
Probability Theory

Klaus Ritter

TU Kaiserslautern, WS 2017/18

Literature

In particular,

H. Bauer, Probability Theory, de Gruyter, Berlin, 1996.

P. Billingsley, Probability and Measure, Wiley, New York, 1995.

P. Gänssler, W. Stute, Wahrscheinlichkeitstheorie, Springer, Berlin, 1977.

A. Klenke, Probability Theory, Springer, Berlin, 2014.

Prerequisites

Stochastic Methods and Measure and Integration Theory


Contents

I Introduction

II Basic Concepts of Probability Theory
1 Random Variables and Distributions
2 Convergence in Probability
3 Convergence in Distribution
4 Uniform Integrability
5 Kernels and Product Measures
6 Independence

III Limit Theorems
1 Zero-One Laws
2 The Strong Law of Large Numbers
3 The Weak Law of Large Numbers
4 Characteristic Functions
5 The One-Dimensional Central Limit Theorem
6 The Law of the Iterated Logarithm
7 The Multi-dimensional Central Limit Theorem

IV Brownian Motion
1 Stochastic Processes
2 Donsker's Invariance Principle
3 The Brownian Motion

V Conditional Expectations and Martingales
1 The Radon-Nikodym Theorem
2 Conditional Expectations

A Measure and Integration
1 Measure Spaces and Measurable Mappings
2 Borel Sets
3 The Lebesgue Measure
4 Real-valued Measurable Mappings
5 Integration
6 Lp-Spaces
7 Dynkin Classes
8 Product Spaces
9 Carathéodory's Theorem
10 The Factorization Lemma

Literature

Definitions


Chapter I

Introduction

A stochastic model: a probability space (Ω, A, P) together with a collection of random variables (measurable mappings) Ω → R, say.

Main topics in this course:

(i) limit theorems,

(ii) conditional probabilities and expectations,

(iii) discrete-time martingales,

(iv) Brownian motion.

Example 1. Limit theorems like the law of large numbers or the central limit theorem deal with sequences X1, X2, . . . of random variables and their partial sums

S_n = ∑_{i=1}^n X_i

(physics: position of a particle after n collisions; gambling: cumulative gain after n trials). Under which conditions and in which sense does n^{−α} · S_n converge as n tends to infinity?
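To make the question concrete, here is a minimal simulation sketch (our illustration, not part of the original notes), assuming i.i.d. ±1 trials: for α = 1 the rescaled sums n^{−1} · S_n settle near the mean 0, while for α = 1/2 the values n^{−1/2} · S_n keep fluctuating on a constant scale — the regimes of the law of large numbers and the central limit theorem, respectively.

```python
import random

random.seed(1)
s = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    s += random.choice((-1, 1))          # X_n: fair +/-1 trial, zero mean
    if n in checkpoints:
        print(n, s / n, s / n ** 0.5)    # n^(-1)*S_n -> 0, n^(-1/2)*S_n stays O(1)
```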

Example 2. Consider two random variables X1 and X2. If P(X2 = v) > 0 then the conditional probability of {X1 ∈ A} given {X2 = v} is defined by

P(X1 ∈ A | X2 = v) = P({X1 ∈ A} ∩ {X2 = v}) / P(X2 = v).

How can we reasonably extend this definition to the case P(X2 = v) = 0, e.g., for X2 being normally distributed? How does the observation X2 = v change our stochastic model?

Example 3. Martingales may be used to model fair games, and a particular case of a martingale S0, S1, . . . arises in Example 1 if X1, X2, . . . is an independent sequence with zero mean. A gambling strategy is defined by a sequence H1, H2, . . . of random variables, where Hn may depend in any way on the outcomes of the first n − 1 trials. The cumulative gain after n trials is given by the discrete integral

∑_{k=1}^n H_k · X_k = ∑_{k=1}^n H_k · (S_k − S_{k−1}).

Can we tilt the martingale in favor of the gambler by a suitable strategy?

Example 4. The fluctuation of a stock price defines a function on the time interval

[0,∞[ with values in R (for simplicity, we admit negative stock prices at this point).

What is a reasonable σ-algebra on the space Ω of all mappings [0,∞[→ R or on the

subspace of all continuous mappings? How can we define (non-discrete) probability

measures on these spaces in order to model the random dynamics of stock prices?

Analogously for random perturbations in physics, biology, etc.

More generally, the same questions arise for mappings I → S with an arbitrary

non-empty set I and S ⊂ Rd (physics: phase transition in ferromagnetic materials,

the orientation of magnetic dipoles on a set I of sites; medicine: spread of diseases,

certain biometric parameters for a set I of individuals; environmental science: the

concentration of certain pollutants in a region I).

Example 5. Suppose that X1, X2, . . . is an independent sequence with zero mean and variance one. Rescaling the partial sums S_n in time and space according to

S^{(m)}_{n/m} = ∑_{k=1}^n X_k/√m

and using piecewise linear interpolation of S^{(m)}_0, S^{(m)}_{1/m}, . . . at the knots 0, 1/m, . . . we get a random variable S^{(m)} taking values in the space C([0,∞[). By the central limit theorem, S^{(m)}_1 converges in distribution to a standard normally distributed random variable as m tends to infinity. The infinite-dimensional counterpart of this result, which deals with probability measures on the space C([0,∞[), guarantees convergence of S^{(m)} to a Brownian motion. On the latter we quote from Schilling, Partzsch (2014, p. vi): 'Brownian motion is arguably the single most important stochastic process.'


Chapter II

Basic Concepts of Probability Theory

Context for probability theoretical concepts: a probability space (Ω, A, P).

Terminology: A ∈ A is an event, and P(A) is the probability of the event A.

1 Random Variables and Distributions

Given: a probability space (Ω,A, P ) and a measurable space (Ω′,A′).

Definition 1. X : Ω → Ω′ is a random element if X is A-A′-measurable. Particular

cases:

(i) X is a (real) random variable if (Ω′, A′) = (R, B),

(ii) X is a numerical random variable if (Ω′, A′) = (R̄, B̄),

(iii) X is a k-dimensional (real) random vector if (Ω′, A′) = (R^k, B^k),

(iv) X is a k-dimensional numerical random vector if (Ω′, A′) = (R̄^k, B̄^k).

As is customary, we use the abbreviation

{X ∈ A} = {ω ∈ Ω : X(ω) ∈ A}

for any X : Ω → Ω′ and A ⊂ Ω′.

Definition 2.

(i) The distribution (probability law) of a random element X : Ω→ Ω′ (with respect

to P ) is the image measure

PX = X(P ).

Notation: X ∼ Q if PX = Q.


(ii) Given: probability spaces (Ω1, A1, P1), (Ω2, A2, P2) and random elements X1 : Ω1 → Ω′, X2 : Ω2 → Ω′. Then X1 and X2 are identically distributed if

(P1)_{X1} = (P2)_{X2}.

Definition 3. A property Π holds P-almost surely (P-a.s., a.s., with probability one) if

∃ A ∈ A : A ⊂ {ω ∈ Ω : Π holds for ω} ∧ P(A) = 1.

Remark 4.

(i) For random elements X, Y : Ω → Ω′,

X = Y P-a.s. ⇒ P_X = P_Y,

but the converse is not true in general. For instance, let P be the uniform distribution on Ω = {0, 1} and define X(ω) = ω and Y(ω) = 1 − ω.

(ii) For every probability measure Q on (Ω′, A′) there exists a probability space (Ω, A, P) and a random element X : Ω → Ω′ such that X ∼ Q. Take (Ω, A, P) = (Ω′, A′, Q) and X = id_{Ω′}.

Carathéodory's Theorem is a general tool for the construction of probability measures, see Section A.9.

(iii) A major part of probability theory deals with properties of random elements

that can be formulated in terms of their distributions only.

Example 5. A discrete distribution P_X is specified by a countable set ∅ ≠ D ⊂ Ω′ and a mapping p : D → R such that

∀ ω′ ∈ D : p(ω′) ≥ 0 ∧ ∑_{ω′∈D} p(ω′) = 1,

namely,

P_X = ∑_{ω′∈D} p(ω′) · ε_{ω′}.

Here ε_{ω′} denotes the Dirac measure at the point ω′ ∈ Ω′, see Example A.1.3. Hence

P_X(A′) = P(X ∈ A′) = ∑_{ω′∈A′∩D} p(ω′), A′ ∈ A′.

Assume that {ω′} ∈ A′ for every ω′ ∈ D. Then

P(X = ω′) = p(ω′),

and p, extended by zero to a mapping Ω′ → R, is the density of P_X w.r.t. the counting measure on A′.


If |D| < ∞ then p(ω′) = 1/|D| yields the uniform distribution on D.

For (Ω′, A′) = (R, B),

B(n, p) = ∑_{k=0}^n \binom{n}{k} · p^k (1 − p)^{n−k} · ε_k

is the binomial distribution with parameters n ∈ N and p ∈ [0, 1]. In particular, for n = 1 we get the Bernoulli distribution

B(1, p) = (1 − p) · ε_0 + p · ε_1.

Further examples include the geometric distribution G(p) with parameter p ∈ ]0, 1],

G(p) = ∑_{k=1}^∞ p · (1 − p)^{k−1} · ε_k,

and the Poisson distribution π(λ) with parameter λ > 0,

π(λ) = ∑_{k=0}^∞ exp(−λ) · λ^k/k! · ε_k.

Example 6. Consider a distribution on (R^k, B^k) that is defined in terms of a probability density f : R^k → [0,∞[ w.r.t. the Lebesgue measure λ_k. In this case

P_X(A′) = P(X ∈ A′) = ∫_{A′} f dλ_k, A′ ∈ B^k.

We present some examples in the case k = 1. The normal distribution

N(µ, σ²) = f · λ_1

with parameters µ ∈ R and σ², where σ > 0, is obtained by

f(x) = 1/√(2πσ²) · exp(−(x − µ)²/(2σ²)), x ∈ R.

The exponential distribution with parameter λ > 0 is obtained by

f(x) = 0 if x < 0, f(x) = λ · exp(−λx) if x ≥ 0.

The uniform distribution on D ∈ B with λ_1(D) ∈ ]0,∞[ is obtained by

f = 1/λ_1(D) · 1_D.


Distribution Functions

Definition 7. Let X = (X1, . . . , Xk) : Ω → R^k be a random vector. Then

F_X : R^k → [0, 1], (x1, . . . , xk) ↦ P_X(×_{i=1}^k ]−∞, xi]) = P(⋂_{i=1}^k {Xi ≤ xi})

is called the distribution function of X.

Theorem 8. Given: probability spaces (Ω1, A1, P1), (Ω2, A2, P2) and random vectors X1 : Ω1 → R^k, X2 : Ω2 → R^k. Then

(P1)_{X1} = (P2)_{X2} ⇔ F_{X1} = F_{X2}.

Proof. '⇒' holds trivially. '⇐': By Theorem A.2.4, B^k = σ(E) for

E = {×_{i=1}^k ]−∞, xi] : x1, . . . , xk ∈ R},

and E is closed w.r.t. intersections. Use Theorem A.1.5.

For notational convenience, we consider the case k = 1 in the sequel. We refer to Übung 3.4 in Maß- und Integrationstheorie (2016) for the following two facts.

Theorem 9.

(i) FX is non-decreasing,

(ii) FX is right-continuous,

(iii) limx→−∞ FX(x) = 0 and limx→∞ FX(x) = 1,

(iv) FX is continuous at x iff P (X = x) = 0.

Theorem 10. For every function F that satisfies (i)–(iii) from Theorem 9,

∃1 probability measure Q on B ∀ x ∈ R : Q(]−∞, x]) = F(x).

Expectation and Variance

Remark 11. Define ∞^r = ∞ for r > 0. For 1 ≤ p < q < ∞ and X ∈ Z(Ω, A),

∫|X|^p dP ≤ (∫|X|^q dP)^{p/q},

due to Hölder's inequality, see Theorem A.6.2.
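For completeness, the one-line argument (our aside; a standard application of Hölder's inequality with the conjugate exponents q/p and q/(q − p), using P(Ω) = 1):

∫|X|^p dP = ∫|X|^p · 1 dP ≤ (∫(|X|^p)^{q/p} dP)^{p/q} · (P(Ω))^{(q−p)/q} = (∫|X|^q dP)^{p/q}.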


Notation:

L = L(Ω, A, P) = {X ∈ Z(Ω, A) : ∫|X| dP < ∞}

is the class of P-integrable random variables, and analogously

L̄ = L̄(Ω, A, P) = {X ∈ Z̄(Ω, A) : ∫|X| dP < ∞}

is the class of P-integrable numerical random variables. We consider P_X as a distribution on (R, B) if P(X ∈ R) = 1 for a numerical random variable X, and we consider L as a subspace of L̄.

Definition 12. For X ∈ L̄ or X ∈ Z̄+,

E(X) = ∫X dP

is the expectation of X. For X ∈ Z(Ω, A) such that X² ∈ L,

Var(X) = ∫(X − E(X))² dP

and √Var(X) are the variance and the standard deviation of X, respectively.

Remark 13. The Transformation Theorem A.5.12 implies

∫_Ω |X|^p dP < ∞ ⇔ ∫_R |x|^p P_X(dx) < ∞

for X ∈ Z(Ω, A), in which case, for p = 1,

E(X) = ∫_R x P_X(dx),

and for p = 2,

Var(X) = ∫_R (x − E(X))² P_X(dx).

Thus E(X) and Var(X) depend only on P_X.

Example 14.

X ∼ B(n, p): E(X) = n · p, Var(X) = n · p · (1 − p);
X ∼ G(p): E(X) = 1/p, Var(X) = (1 − p)/p²;
X ∼ π(λ): E(X) = λ, Var(X) = λ;

see the lecture on Stochastische Methoden.

X is Cauchy distributed with parameter α > 0 if X ∼ f · λ_1 where

f(x) = α/(π(α² + x²)), x ∈ R.


Since ∫_0^t x/(1 + x²) dx = 1/2 · log(1 + t²), neither E(X⁺) < ∞ nor E(X⁻) < ∞, and therefore X ∉ L.
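In detail (a short verification, our addition): with the density f from above,

E(X⁺) = ∫_0^∞ x · α/(π(α² + x²)) λ_1(dx) = α/(2π) · lim_{t→∞} log(1 + t²/α²) = ∞,

and E(X⁻) = ∞ follows by the symmetry of f.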

If X ∼ N(µ, σ²) then

E(X) = µ ∧ Var(X) = σ².

If X is exponentially distributed with parameter λ > 0 then

E(X) = 1/λ ∧ Var(X) = 1/λ².

See the lecture on Stochastische Methoden.

2 Convergence in Probability

Motivated by the Examples A.5.11 and A.6.8 we introduce a notion of convergence

that is weaker than convergence in mean and convergence almost surely.

In the sequel, X, Xn, etc. are random variables on a common probability space

(Ω,A, P ).

Definition 1. (Xn)n converges to X in probability if

∀ ε > 0 : lim_{n→∞} P(|Xn − X| > ε) = 0.

Notation: Xn →P X.

Theorem 2 (Chebyshev–Markov Inequality). Let (Ω, A, µ) be a measure space and f ∈ Z(Ω, A). For every u > 0 and every 1 ≤ p < ∞,

µ({|f| ≥ u}) ≤ 1/u^p · ∫|f|^p dµ.

Proof. We have

u^p · µ({|f| ≥ u}) = ∫_{{|f|≥u}} u^p dµ ≤ ∫_Ω |f|^p dµ.

Corollary 3. If E(X²) < ∞ and ε > 0, then

P(|X − E(X)| ≥ ε) ≤ 1/ε² · Var(X).

Theorem 4.

d(X, Y) = ∫ min(1, |X − Y|) dP

defines a semi-metric on Z(Ω, A), and

Xn →P X ⇔ lim_{n→∞} d(Xn, X) = 0.


Proof. '⇒': For ε > 0,

∫ min(1, |Xn − X|) dP = ∫_{{|Xn−X|>ε}} min(1, |Xn − X|) dP + ∫_{{|Xn−X|≤ε}} min(1, |Xn − X|) dP ≤ P(|Xn − X| > ε) + min(1, ε).

'⇐': Let 0 < ε < 1. Use Theorem 2 to obtain

P(|Xn − X| > ε) = P(min(1, |Xn − X|) > ε) ≤ 1/ε · ∫ min(1, |Xn − X|) dP = 1/ε · d(Xn, X).

Remark 5. By Theorems 4 and A.6.11,

Xn →Lp X ⇒ Xn →P X.

Example A.5.11 shows that '⇐' does not hold in general.

Remark 6. By Theorems 4 and A.5.10,

Xn →P-a.s. X ⇒ Xn →P X.

Example A.6.8 shows that '⇐' does not hold in general. The Law of Large Numbers deals with convergence almost surely or convergence in probability, see the introductory Example I.1 and Sections III.2 and III.3.

Subsequence Criteria

Corollary 7.

Xn →P X ⇒ ∃ subsequence (X_{n_k})_{k∈N} : X_{n_k} →P-a.s. X.

Proof. Due to Theorems 4 and A.6.7 there exists a subsequence (X_{n_k})_{k∈N} such that

min(1, |X_{n_k} − X|) →P-a.s. 0.

Remark 8. In any semi-metric space (M, d) a sequence (a_n)_{n∈N} converges to a iff

∀ subsequence (a_{n_k})_{k∈N} ∃ subsequence (a_{n_{k_ℓ}})_{ℓ∈N} : lim_{ℓ→∞} d(a_{n_{k_ℓ}}, a) = 0.

Corollary 9. Xn →P X iff

∀ subsequence (X_{n_k})_{k∈N} ∃ subsequence (X_{n_{k_ℓ}})_{ℓ∈N} : X_{n_{k_ℓ}} →P-a.s. X.

Proof. '⇒': Corollary 7. '⇐': Remarks 6 and 8 together with Theorem 4.


Remark 10. We conclude that, in general, there is no semi-metric on Z(Ω, A) that defines the a.s.-convergence. However, if Ω is countable, then

Xn →P-a.s. X ⇔ Xn →P X.

Proof: Übung 2.3.

Lemma 11. Let −→ denote convergence almost everywhere or convergence in probability. If X^{(i)}_n −→ X^{(i)} for i = 1, . . . , m and f : R^m → R is continuous, then

f(X^{(1)}_n, . . . , X^{(m)}_n) −→ f(X^{(1)}, . . . , X^{(m)}).

Proof. Trivial for convergence almost everywhere, and by Corollary 9 the conclusion holds for convergence in probability, too.

Corollary 12. Let Xn →P X. Then

Xn →P Y ⇔ X = Y P-a.s.

Proof. Corollary 9 and Lemma A.6.6.

3 Convergence in Distribution

Given: a metric space (M, ρ). Let M(M) denote the set of all probability measures on the Borel σ-algebra B(M) in M. Moreover, let

C_b(M) = {f : M → R : f bounded, continuous}.

Definition 1.

(i) A sequence (Qn)_{n∈N} in M(M) converges weakly to Q ∈ M(M) if

∀ f ∈ C_b(M) : lim_{n→∞} ∫f dQn = ∫f dQ.

Notation: Qn →w Q.

(ii) A sequence (Xn)_{n∈N} of random elements with values in M converges in distribution to a random element X with values in M if Qn →w Q for the distributions Qn of Xn and Q of X, respectively.

Notation: Xn →d X.

Remark 2. For convergence in distribution the random elements need not be defined

on a common probability space.

In the sequel: Qn, Q ∈M(M) for n ∈ N.


Example 3.

(i) For xn, x ∈ M,

ε_{x_n} →w ε_x ⇔ lim_{n→∞} ρ(xn, x) = 0.

For the proof of '⇐', note that

∫f dε_{x_n} = f(xn), ∫f dε_x = f(x).

For the proof of '⇒', suppose that lim sup_{n→∞} ρ(xn, x) > 0. Take

f(y) = min(ρ(y, x), 1), y ∈ M,

and observe that f ∈ C_b(M) and

lim sup_{n→∞} ∫f dε_{x_n} = lim sup_{n→∞} min(ρ(xn, x), 1) > 0

while ∫f dε_x = 0.

(ii) For the Euclidean distance ρ on M = R^k we have (M, B(M)) = (R^k, B^k). Now consider, in particular, k = 1 and

Qn = N(µn, σn²)

where σn > 0. For f ∈ C_b(R),

∫f dQn = 1/√(2π) · ∫_R f(σn · x + µn) · exp(−x²/2) λ_1(dx).

Put N(µ, 0) = ε_µ. Then

lim_{n→∞} µn = µ ∧ lim_{n→∞} σn = σ ⇒ Qn →w N(µ, σ²).

Otherwise (Qn)_{n∈N} does not converge weakly. Übung 4.1.

(iii) For M = C([0, T]) let ρ(x, y) = sup_{t∈[0,T]} |x(t) − y(t)|. Cf. the introductory Examples I.4 and I.5.

Remark 4. Note that Qn →w Q does not imply

∀ A ∈ B(M) : lim_{n→∞} Qn(A) = Q(A).

For instance, assume lim_{n→∞} ρ(xn, x) = 0 with xn ≠ x for every n ∈ N. Then

ε_{x_n}({x}) = 0, ε_x({x}) = 1.

Theorem 5 (Portemanteau Theorem). The following properties are equivalent:

(i) Qn →w Q,

(ii) ∀ f ∈ C_b(M) uniformly continuous : lim_{n→∞} ∫f dQn = ∫f dQ,

(iii) ∀ A ⊂ M closed : lim sup_{n→∞} Qn(A) ≤ Q(A),

(iv) ∀ A ⊂ M open : lim inf_{n→∞} Qn(A) ≥ Q(A),

(v) ∀ A ∈ B(M) : Q(∂A) = 0 ⇒ lim_{n→∞} Qn(A) = Q(A).

Proof. At first we show that '(ii) ⇒ (iii) ⇒ (i)', which yields the equivalence of (i)–(iv).

'(ii) ⇒ (iii)': Let A ⊂ M be closed, and put

A_m = {x ∈ M : dist(x, A) < 1/m}.

Note that A_m ↓ A. Let ε > 0, and take m ∈ N such that Q(A_m) < Q(A) + ε. Define ϕ : R → R by

ϕ(z) = 1 if z ≤ 0, ϕ(z) = 1 − z if 0 < z < 1, ϕ(z) = 0 otherwise,

and f : M → R by

f(x) = ϕ(m · dist(x, A)).

Then f ∈ C_b(M) is uniformly continuous, and 0 ≤ f ≤ 1 with f|_A = 1 and f|_{A_m^c} = 0. Hence Qn(A) ≤ ∫f dQn and

lim sup_{n→∞} Qn(A) ≤ ∫f dQ ≤ Q(A_m) ≤ Q(A) + ε.

'(iii) ⇒ (i)': Let f ∈ C_b(M). Assume that 0 < f < 1 without loss of generality. For every m ∈ N and every P ∈ M(M),

∫f dP ≤ ∑_{k=1}^m k/m · P({(k−1)/m ≤ f < k/m}) = 1/m · ∑_{k=1}^m P({f ≥ (k−1)/m})
= ∑_{k=1}^m (k−1)/m · P({(k−1)/m ≤ f < k/m}) + 1/m ≤ ∫f dP + 1/m.

Therefore

lim sup_{n→∞} ∫f dQn ≤ 1/m · ∑_{k=1}^m Q({f ≥ (k−1)/m}) ≤ ∫f dQ + 1/m,

which implies lim sup_{n→∞} ∫f dQn ≤ ∫f dQ. In fact, the latter holds true for any bounded and upper semi-continuous mapping f : M → R. For f being continuous consider 1 − f, too, to obtain lim_{n→∞} ∫f dQn = ∫f dQ.

'(i) ⇒ (v)': Let A ∈ B(M) with Q(∂A) = 0, and put f = 1_A as well as

f_* = sup{g : g lower semi-continuous, g ≤ f}, f^* = inf{h : h upper semi-continuous, h ≥ f}.


Then f_* is lower semi-continuous, f^* is upper semi-continuous, and f_* ≤ f ≤ f^* with equality Q-a.s., since Q(∂A) = 0. From the proof of '(iii) ⇒ (i)' we get

∫f_* dQ ≤ lim inf_{n→∞} ∫f_* dQn ≤ lim sup_{n→∞} ∫f^* dQn ≤ ∫f^* dQ.

Hence lim_{n→∞} ∫f dQn = ∫f dQ.

'(v) ⇒ (i)': Übung 3.4.

Lemma 6. Consider metric spaces M, M′ and a continuous mapping π : M → M′. Then

Qn →w Q ⇒ π(Qn) →w π(Q).

Proof. Note that f ∈ C_b(M′) implies f ∘ π ∈ C_b(M). Employ Theorem A.5.12 and the definition of weak convergence.

Weak Convergence of Measures on (R,B)

In the sequel, we study the particular case (M,B(M)) = (R,B), i.e., convergence in

distribution for random variables. The Central Limit Theorem deals with this notion

of convergence, see the introductory Example I.1 and Section III.5.

Notation: for any Q ∈ M(R),

F_Q(x) = Q(]−∞, x]), x ∈ R,

and for any function F : R → R,

Cont(F) = {x ∈ R : F continuous at x}.

Theorem 7.

Qn →w Q ⇔ ∀ x ∈ Cont(F_Q) : lim_{n→∞} F_{Q_n}(x) = F_Q(x).

Moreover, if Qn →w Q and Cont(F_Q) = R then

lim_{n→∞} sup_{x∈R} |F_{Q_n}(x) − F_Q(x)| = 0.

Proof. '⇒': If x ∈ Cont(F_Q) and A = ]−∞, x] then Q(∂A) = Q({x}) = 0, see Theorem 1.9. Hence Theorem 5 implies

lim_{n→∞} F_{Q_n}(x) = lim_{n→∞} Qn(A) = Q(A) = F_Q(x).

'⇐': Consider a non-empty open set A ⊂ R. Take pairwise disjoint open intervals A1, A2, . . . such that A = ⋃_{i=1}^∞ Ai. Fatou's Lemma implies

lim inf_{n→∞} Qn(A) = lim inf_{n→∞} ∑_{i=1}^∞ Qn(Ai) ≥ ∑_{i=1}^∞ lim inf_{n→∞} Qn(Ai).


Note that R \ Cont(F_Q) is countable. Fix ε > 0, and take

A′_i = ]a′_i, b′_i] ⊂ Ai

for i ∈ N such that

a′_i, b′_i ∈ Cont(F_Q) ∧ Q(Ai) ≤ Q(A′_i) + ε · 2^{−i}.

Then

lim inf_{n→∞} Qn(Ai) ≥ lim inf_{n→∞} Qn(A′_i) = Q(A′_i) ≥ Q(Ai) − ε · 2^{−i}.

We conclude that

lim inf_{n→∞} Qn(A) ≥ Q(A) − ε,

and therefore Qn →w Q by Theorem 5.

Uniform convergence: see Übung 1.3.

Corollary 8.

Qn →w Q ∧ Qn →w Q̃ ⇒ Q = Q̃.

Proof. By Theorem 7, F_Q(x) = F_{Q̃}(x) if x ∈ D = Cont(F_Q) ∩ Cont(F_{Q̃}). Since D is dense in R and F_Q as well as F_{Q̃} are right-continuous, we get F_Q = F_{Q̃}. Apply Theorem 1.10.

Given: random variables Xn, X on (Ω,A, P ) for n ∈ N.

Theorem 9.

Xn →P X ⇒ Xn →d X,

and

Xn →d X ∧ X constant a.s. ⇒ Xn →P X.

Proof. Assume Xn →P X. For ε > 0 and x ∈ R,

P(X ≤ x − ε) − P(|X − Xn| > ε) ≤ P({X ≤ x − ε} ∩ {|X − Xn| ≤ ε}) ≤ P(Xn ≤ x)
= P({Xn ≤ x} ∩ {X ≤ x + ε}) + P({Xn ≤ x} ∩ {X > x + ε}) ≤ P(X ≤ x + ε) + P(|X − Xn| > ε).

Thus

F_X(x − ε) ≤ lim inf_{n→∞} F_{X_n}(x) ≤ lim sup_{n→∞} F_{X_n}(x) ≤ F_X(x + ε).

For x ∈ Cont(F_X) we get lim_{n→∞} F_{X_n}(x) = F_X(x). Apply Theorem 7.

Now assume that Xn →d X and P_X = ε_x. Let ε > 0 and take f ∈ C_b(R) such that f ≥ 0, f(x) = 0, and f(y) = 1 if |x − y| ≥ ε. Then

P(|X − Xn| > ε) = P(|x − Xn| > ε) = ∫ 1_{R\[x−ε,x+ε]} dP_{X_n} ≤ ∫f dP_{X_n}

and

lim_{n→∞} ∫f dP_{X_n} = ∫f dP_X = 0.


Example 10. Consider the uniform distribution P on Ω = {0, 1}. Put

Xn(ω) = ω, X(ω) = 1 − ω.

Then P_{X_n} = P_X and therefore

Xn →d X.

However, {|Xn − X| < 1} = ∅, and therefore Xn →P X does not hold.

Theorem 11 (Skorohod). There exists a probability space (Ω, A, P) with the following property. If

Qn →w Q,

then there exist Xn, X ∈ Z(Ω, A) for n ∈ N such that

∀ n ∈ N : Qn = P_{X_n} ∧ Q = P_X ∧ Xn →P-a.s. X.

Proof. Take Ω = ]0, 1[, A = B(Ω), and consider the uniform distribution P on Ω. Define

X_Q(ω) = inf{z ∈ R : ω ≤ F_Q(z)}, ω ∈ ]0, 1[,

for any Q ∈ M(R). Since X_Q is non-decreasing, we have X_Q ∈ Z(Ω, A). Furthermore,

P_{X_Q} = Q, (1)

see Übung 2.4.

Assuming Qn →w Q we define Xn = X_{Q_n} and X = X_Q. Since X is non-decreasing, we conclude that Ω \ Cont(X) is countable. Thus it suffices to show

∀ ω ∈ Cont(X) : lim_{n→∞} Xn(ω) = X(ω).

Let ω ∈ Cont(X) and ε > 0. Put x = X(ω) and take x_i ∈ Cont(F_Q) such that

x − ε < x1 < x < x2 < x + ε.

Hence

F_Q(x1) < ω < F_Q(x2).

By assumption there exists n0 ∈ N such that

F_{Q_n}(x1) < ω < F_{Q_n}(x2)

for n ≥ n0. Hence Xn(ω) ∈ ]x1, x2], i.e., |Xn(ω) − X(ω)| < ε.


Remark 12. By (1) we have a general method to transform uniformly distributed

‘random numbers’ from ]0, 1[ into ‘random numbers’ with distribution Q.
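As a computational aside (our sketch, not part of the notes): for the exponential distribution with parameter λ, the quantile transform X_Q(ω) = inf{z ∈ R : ω ≤ F_Q(z)} from the proof of Theorem 11 has the closed form −log(1 − ω)/λ, so uniform 'random numbers' are turned into exponential ones as follows.

```python
import math
import random

def exponential_quantile(omega, lam):
    # X_Q(omega) = inf{z : omega <= F_Q(z)} with F_Q(z) = 1 - exp(-lam*z), z >= 0
    return -math.log(1.0 - omega) / lam

random.seed(0)
lam = 2.0
samples = [exponential_quantile(random.random(), lam) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to E(X) = 1/lam = 0.5
```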

Remark 13.

(i) Put

C^{(r)} = {f : R → R : f, f^{(1)}, . . . , f^{(r)} bounded, f^{(r)} uniformly continuous}.

Then

Qn →w Q ⇔ ∃ r ∈ N0 ∀ f ∈ C^{(r)} : lim_{n→∞} ∫f dQn = ∫f dQ,

see Gänssler, Stute (1977, p. 66).

(ii) The Lévy distance

d(Q, R) = inf{h ∈ ]0,∞[ : ∀ x ∈ R : F_Q(x − h) − h ≤ F_R(x) ≤ F_Q(x + h) + h}

defines a metric on M(R), and

Qn →w Q ⇔ lim_{n→∞} d(Qn, Q) = 0,

see Chow, Teicher (1978, Thm. 8.1.3).

(iii) Suppose that (M, ρ) is a complete separable metric space. Then there exists a metric d on M(M) such that (M(M), d) is complete and separable as well, and

Qn →w Q ⇔ lim_{n→∞} d(Qn, Q) = 0,

see Parthasarathy (1967, Sec. II.6) or Klenke (2013, Abschn. 13.2), and cf. Übung 3.3.

Compactness

Finally, we present a compactness criterion, which is very useful for the construction of probability measures on B(M).

Lemma 14. Let x_{n,ℓ} ∈ R for n, ℓ ∈ N with

∀ ℓ ∈ N : sup_{n∈N} |x_{n,ℓ}| < ∞.

Then there exists an increasing sequence (n_i)_{i∈N} in N such that

∀ ℓ ∈ N : (x_{n_i,ℓ})_{i∈N} converges.

Proof. See Billingsley (1979, Thm. 25.13).


Definition 15.

(i) P ⊂ M(M) is tight if

∀ ε > 0 ∃ K ⊂ M compact ∀ P ∈ P : P(K) ≥ 1 − ε.

(ii) P ⊂ M(M) is relatively compact if every sequence in P contains a subsequence that converges weakly.

Theorem 16 (Prohorov). Assume that M is a complete separable metric space and P ⊂ M(M). Then

P relatively compact ⇔ P tight.

Proof. See Parthasarathy (1967, Thm. II.6.7) for the general case. Here: M = R.

'⇒': Suppose that P is not tight. Then, for some ε > 0, there exists a sequence (Pn)_{n∈N} in P such that

Pn([−n, n]) < 1 − ε.

For a suitable subsequence, P_{n_k} →w P ∈ M(R). Take m > 0 such that

P(]−m, m[) > 1 − ε.

Theorem 5 implies

P(]−m, m[) ≤ lim inf_{k→∞} P_{n_k}(]−m, m[) ≤ lim inf_{k→∞} P_{n_k}([−n_k, n_k]) ≤ 1 − ε,

which is a contradiction.

'⇐': Consider any sequence (Pn)_{n∈N} in P and the corresponding sequence (Fn)_{n∈N} of distribution functions. Use Lemma 14 to obtain a subsequence (F_{n_i})_{i∈N} and a non-decreasing function G : Q → [0, 1] with

∀ q ∈ Q : lim_{i→∞} F_{n_i}(q) = G(q).

Put

F(x) = inf{G(q) : q ∈ Q ∧ x < q}, x ∈ R.

Claim (Helly's Theorem):

(i) F is non-decreasing and right-continuous,

(ii) ∀ x ∈ Cont(F) : lim_{i→∞} F_{n_i}(x) = F(x).

Proof: Ad (i): Obviously F is non-decreasing. For x ∈ R and ε > 0 take δ₂ > 0 such that

∀ q ∈ Q ∩ ]x, x + δ₂[ : G(q) ≤ F(x) + ε.

Thus, for z ∈ ]x, x + δ₂[,

F(x) ≤ F(z) ≤ F(x) + ε.


Ad (ii): If x ∈ Cont(F) and ε > 0, take δ₁ > 0 such that

F(x) − ε ≤ F(x − δ₁).

Thus, for q1, q2 ∈ Q with

x − δ₁ < q1 < x < q2 < x + δ₂,

we get

F(x) − ε ≤ F(x − δ₁) ≤ G(q1) ≤ lim inf_{i→∞} F_{n_i}(x) ≤ lim sup_{i→∞} F_{n_i}(x) ≤ G(q2) ≤ F(x) + ε.

Claim:

lim_{x→−∞} F(x) = 0 ∧ lim_{x→∞} F(x) = 1.

Proof: For ε > 0 take m ∈ Q such that

∀ n ∈ N : Pn(]−m, m]) ≥ 1 − ε.

Thus

G(m) − G(−m) = lim_{i→∞} (F_{n_i}(m) − F_{n_i}(−m)) = lim_{i→∞} P_{n_i}(]−m, m]) ≥ 1 − ε.

Since F(m) ≥ G(m) and F(−m − 1) ≤ G(−m), we obtain

F(m) − F(−m − 1) ≥ 1 − ε.

It remains to apply Theorems 1.10 and 7.

4 Uniform Integrability

In the sequel: Xn, X random variables on a common probability space (Ω,A, P ).

Definition 1. (Xn)_{n∈N} is uniformly integrable (u.i.) if

lim_{α→∞} sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP = 0.

Remark 2.

(i) (Xn)_{n∈N} u.i. ⇒ (∀ n ∈ N : Xn ∈ L¹) ∧ sup_{n∈N} ‖Xn‖_1 < ∞.

(ii) ∃ Y ∈ L¹ ∀ n ∈ N : |Xn| ≤ Y ⇒ (Xn)_{n∈N} u.i.

(iii) ∃ p > 1 : (∀ n ∈ N : Xn ∈ L^p) ∧ sup_{n∈N} ‖Xn‖_p < ∞ ⇒ (Xn)_{n∈N} u.i.

Proof: ∫_{{|Xn|≥α}} |Xn| dP = 1/α^{p−1} · ∫_{{|Xn|≥α}} α^{p−1} · |Xn| dP ≤ 1/α^{p−1} · ‖Xn‖_p^p.


Example 3. For the uniform distribution P on [0, 1] and

Xn = n · 1_{[0,1/n]}

we have Xn ∈ L¹ and ‖Xn‖_1 = 1, but for any α > 0 and n ≥ α,

∫_{{|Xn|≥α}} |Xn| dP = n · P([0, 1/n]) = 1,

so that (Xn)_{n∈N} is not u.i.
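Numerically (our sketch, with the exact values computed by hand rather than by integration): the tail integral of Xn = n · 1_{[0,1/n]} equals 1 whenever n ≥ α, so the supremum over n never decays, no matter how large the cut-off α.

```python
def tail_integral(n, alpha):
    # integral of |X_n| over {|X_n| >= alpha} under the uniform distribution
    # on [0, 1]: X_n takes the single nonzero value n on a set of probability 1/n
    return n * (1.0 / n) if n >= alpha else 0.0

for alpha in (2.0, 10.0, 100.0):
    print(alpha, max(tail_integral(n, alpha) for n in range(1, 1001)))
# prints 1.0 for every alpha: sup_n of the tail integral does not tend to 0
```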

Lemma 4. (Xn)_{n∈N} is u.i. iff

sup_{n∈N} E(|Xn|) < ∞ (1)

and

∀ ε > 0 ∃ δ > 0 ∀ A ∈ A : (P(A) < δ ⇒ sup_{n∈N} ∫_A |Xn| dP < ε). (2)

Proof. '⇒': For (1), see Remark 2.(i). Moreover,

∫_A |Xn| dP = ∫_{A∩{|Xn|≥α}} |Xn| dP + ∫_{A∩{|Xn|<α}} |Xn| dP ≤ ∫_{{|Xn|≥α}} |Xn| dP + α · P(A).

For ε > 0 take α > 0 with

sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP < ε/2

and δ = ε/(2α) to obtain (2).

'⇐': Put M = sup_{n∈N} E(|Xn|). Theorem 2.2 yields P(|Xn| ≥ α) ≤ M/α. Let ε > 0 and take δ > 0 according to (2); then, for α > M/δ,

sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP < ε.

Theorem 5. Let 1 ≤ p <∞, and assume Xn ∈ Lp for every n ∈ N. Then

(Xn)n∈N converges in Lp

iff

(Xn)n∈N converges in probability ∧ (|Xn|p)n∈N is u.i.

Proof. '⇒': Assume Xn →Lp X. It follows that (Xn)_{n∈N} is bounded in L^p, and from Remark 2.5 we get Xn →P X. Observe that

‖1_A · Xn‖_p ≤ ‖1_A · (Xn − X)‖_p + ‖1_A · X‖_p

for every A ∈ A. Let ε > 0 and take k ∈ N such that

sup_{n>k} ‖Xn − X‖_p < ε. (3)

By Remark 2.(ii),

(|X1 − X|^p, . . . , |Xk − X|^p, |X|^p, |X|^p, . . . ) is u.i.

By Lemma 4,

P(A) < δ ⇒ (sup_{1≤n≤k} ‖1_A · (Xn − X)‖_p < ε ∧ ‖1_A · X‖_p < ε)

for a suitable δ > 0. Together with (3) this implies

P(A) < δ ⇒ sup_{n∈N} ‖1_A · Xn‖_p < 2 · ε.

'⇐': Let ε > 0 and put A = A_{m,n} = {|Xm − Xn| > ε}. Then

‖Xm − Xn‖_p ≤ ‖1_A · (Xm − Xn)‖_p + ‖1_{A^c} · (Xm − Xn)‖_p ≤ ‖1_A · Xm‖_p + ‖1_A · Xn‖_p + ε.

By assumption Xn →P X for some X ∈ Z(Ω, A). Take δ > 0 according to (2) for (|Xn|^p)_{n∈N}, and note that

A_{m,n} ⊂ {|Xm − X| > ε/2} ∪ {|Xn − X| > ε/2}.

Hence, for m, n sufficiently large,

P(A_{m,n}) < δ,

which implies

‖Xm − Xn‖_p ≤ 2 · ε^{1/p} + ε.

Apply Theorem A.6.7.

Remark 6.

(i) Theorem 5 yields a generalization of Lebesgue's convergence theorem: if Xn ∈ L¹ for every n ∈ N and Xn →P X, then

(Xn)_{n∈N} u.i. ⇔ X ∈ L¹ ∧ Xn →L¹ X.

(ii) Uniform integrability is a property of the distributions only.


Convergence of Expectations

Theorem 7.

Xn →d X ⇒ E(|X|) ≤ lim inf_{n→∞} E(|Xn|).

Proof. From Skorohod's Theorem 3.11 we get a probability space (Ω̃, Ã, P̃) with random variables X̃n, X̃ such that

X̃n →P̃-a.s. X̃ ∧ P̃_{X̃n} = P_{Xn} ∧ P̃_{X̃} = P_X.

Thus E(|X̃|) = E(|X|) and E(|X̃n|) = E(|Xn|). Apply Fatou's Lemma A.5.5.

Theorem 8. If

Xn →d X ∧ (Xn)_{n∈N} u.i.,

then

X ∈ L¹ ∧ lim_{n→∞} E(Xn) = E(X).

Proof. Notation as previously. Now (|X̃n|)_{n∈N} is u.i., see Remark 6.(ii). Hence, by Remark 6.(i), X̃ ∈ L¹ and X̃n →L¹ X̃. Thus E(|X|) < ∞ and

lim_{n→∞} E(Xn) = lim_{n→∞} E(X̃n) = E(X̃) = E(X).

Example 9. Example 3 continued. With X = 0 we have Xn →P-a.s. X, and therefore Xn →d X. But E(Xn) = 1 > 0 = E(X).

5 Kernels and Product Measures

Recall the construction and properties of product (probability) measures from Sec-

tion V.2 in the Lecture Notes on Maß- und Integrationstheorie (2016).

Two-Stage Experiments

Given: measurable spaces (Ω1,A1) and (Ω2,A2).

Motivation: a two-stage experiment, where the output ω1 ∈ Ω1 of the first stage determines the probabilistic model for the second stage.

Example 1. Choose one out of n coins and throw it once. Parameters: a1, . . . , an ≥ 0 such that ∑_{i=1}^n a_i = 1, and b1, . . . , bn ∈ [0, 1].

Let

Ω1 = {1, . . . , n}, A1 = P(Ω1)

and define

µ = ∑_{i=1}^n a_i · ε_i,


i.e., a_i = µ({i}) is the probability of choosing the i-th coin. Moreover, let

Ω2 = {H, T}, A2 = P(Ω2)

and define

K(i, ·) = b_i · ε_H + (1 − b_i) · ε_T,

i.e., b_i = K(i, {H}) is the probability of obtaining H when throwing the i-th coin.

Definition 2. K : Ω1 × A2 → R is a Markov (transition) kernel (from (Ω1,A1) to

(Ω2,A2)), if

(i) K(ω1, ·) is a probability measure on A2 for every ω1 ∈ Ω1,

(ii) K(·, A2) is A1-B-measurable for every A2 ∈ A2.

Example 3. Two extremal cases, which are not mutually exclusive.

(i) The model for the second stage is not influenced by the output of the first stage, i.e., for a probability measure ν on A2,

∀ ω1 ∈ Ω1 : K(ω1, ·) = ν.

In Example 1 this means b1 = · · · = bn.

(ii) The output of the first stage determines the output of the second stage, i.e., for an A1-A2-measurable mapping f : Ω1 → Ω2,

∀ ω1 ∈ Ω1 : K(ω1, ·) = ε_{f(ω1)}.

In Example 1 this means b1, . . . , bn ∈ {0, 1}.

Given:

• a probability measure µ on A1 and

• a Markov kernel K from (Ω1,A1) to (Ω2,A2).

Question: stochastic model (Ω,A, P ) for the compound experiment? Reasonable, and

assumed in the sequel,

Ω = Ω1 × Ω2, A = A1 ⊗ A2.

How to define P?

Example 4. In Example 1, a reasonable requirement for P is

P(A1 × Ω2) = µ(A1), A1 ⊂ Ω1,

and

P({i} × A2) = K(i, A2) · µ({i}), A2 ⊂ Ω2.

Consequently, for A ⊂ Ω,

P(A) = ∑_{i=1}^n P({(ω1, ω2) ∈ A : ω1 = i}) = ∑_{i=1}^n P({i} × {ω2 ∈ Ω2 : (i, ω2) ∈ A})
= ∑_{i=1}^n K(i, {ω2 ∈ Ω2 : (i, ω2) ∈ A}) · µ({i})
= ∫_{Ω1} K(ω1, {ω2 ∈ Ω2 : (ω1, ω2) ∈ A}) µ(dω1).

May we generally use the right-hand side integral for the definition of P?
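A small numerical sketch of the coin example (ours; the parameter values are made up): we evaluate the right-hand side, which here is the finite sum P(A) = ∑_{i=1}^n K(i, A(i)) · µ({i}), for the event A = 'the throw shows H'.

```python
# Example 1 with n = 3 coins and made-up parameters:
a = [0.5, 0.3, 0.2]   # mu({i}): probability of choosing coin i
b = [0.9, 0.5, 0.1]   # K(i, {H}): probability that coin i shows H

# A = {(i, H) : i = 1, ..., n}, so the section A(i) = {H} for every i
p_heads = sum(ai * bi for ai, bi in zip(a, b))
print(p_heads)        # 0.5*0.9 + 0.3*0.5 + 0.2*0.1 = 0.62

# consistency check of the marginal requirement P(A1 x Omega2) = mu(A1):
print(sum(ai * bi + ai * (1 - bi) for ai, bi in zip(a, b)))  # = 1.0
```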

Lemma 5. Let f ∈ Z(Ω, A). Then, for ω1 ∈ Ω1, the ω1-section

f(ω1, ·) : Ω2 → R

of f is A2-B-measurable, and for ω2 ∈ Ω2 the ω2-section

f(·, ω2) : Ω1 → R

of f is A1-B-measurable.

Proof. In the case of an ω1-section: fix ω1 ∈ Ω1. Then Ω2 → Ω1 × Ω2 : ω2 ↦ (ω1, ω2) is A2-A-measurable due to Theorem A.8.5.(i). Apply Theorem A.1.7.

Remark 6. In particular, for A ∈ A and f = 1_A,

f(ω1, ·) = 1_A(ω1, ·) = 1_{A(ω1)},

where¹

A(ω1) = {ω2 ∈ Ω2 : (ω1, ω2) ∈ A}

is the ω1-section of A. By Lemma 5,

∀ ω1 ∈ Ω1 : A(ω1) ∈ A2.

Analogously for the ω2-section

A(ω2) = {ω1 ∈ Ω1 : (ω1, ω2) ∈ A}

of A.

Lemma 7. Let f ∈ Z+(Ω, A). Then

g : Ω1 → [0,∞], ω1 ↦ ∫_{Ω2} f(ω1, ω2) K(ω1, dω2)

is A1-B([0,∞])-measurable.

¹ poor notation


Proof. Let F denote the set of all functions f ∈ Z+(Ω, A) with the measurability property as claimed. We show that

∀ A1 ∈ A1, A2 ∈ A2 : 1_{A1×A2} ∈ F. (1)

Indeed,

∫_{Ω2} 1_{A1×A2}(ω1, ω2) K(ω1, dω2) = 1_{A1}(ω1) · K(ω1, A2).

Furthermore, we show that

∀ A ∈ A : 1_A ∈ F. (2)

To this end let

D = {A ∈ A : 1_A ∈ F}

and

E = {A1 × A2 : A1 ∈ A1 ∧ A2 ∈ A2}.

Then E ⊂ D by (1), E is closed w.r.t. intersections, and σ(E) = A. It easily follows that D is a Dynkin class. Hence Theorem A.7.4 yields

A = σ(E) = δ(E) ⊂ D ⊂ A,

which implies (2). From Theorems A.4.6 and A.5.9.(iv) we get

f1, f2 ∈ F ∧ α ∈ [0,∞[ ⇒ α · f1 + f2 ∈ F. (3)

Finally, Theorem A.5.3 and Theorem A.4.4.(iii) imply that

fn ∈ F ∧ fn ↑ f ⇒ f ∈ F. (4)

Use Theorem A.4.8 together with (2)–(4) to conclude that F = Z+.

Theorem 8.

∃1 probability measure µ × K on A ∀ A1 ∈ A1 ∀ A2 ∈ A2 :

(µ × K)(A1 × A2) = ∫_{A1} K(ω1, A2) µ(dω1). (5)

Moreover,

∀ A ∈ A : (µ × K)(A) = ∫_{Ω1} K(ω1, A(ω1)) µ(dω1). (6)

Proof. 'Existence': For A ∈ A and ω1 ∈ Ω1,

K(ω1, A(ω1)) = ∫_{Ω2} 1_{A(ω1)}(ω2) K(ω1, dω2) = ∫_{Ω2} 1_A(ω1, ω2) K(ω1, dω2).

According to Lemma 7, µ × K is well-defined via (6). Using Theorem A.5.3, it is easy to verify that µ × K is a probability measure on A.

For A1 ∈ A1 and A2 ∈ A2,

K(ω1, (A1 × A2)(ω1)) = K(ω1, A2) if ω1 ∈ A1, and = 0 otherwise.

Hence µ × K satisfies (5).

'Uniqueness': Apply Theorem A.1.5 with A0 = {A1 × A2 : Ai ∈ Ai}.


Example 9. In Example 4 we have P = µ×K.

Theorem 10 (Fubini's Theorem).

(i) For f ∈ Z+(Ω, A),

∫_Ω f d(µ × K) = ∫_{Ω1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ(dω1).

(ii) For f (µ × K)-integrable and

Ã1 = {ω1 ∈ Ω1 : f(ω1, ·) is K(ω1, ·)-integrable}

we have

(a) Ã1 ∈ A1 and µ(Ã1) = 1,

(b) Ã1 → R : ω1 ↦ ∫_{Ω2} f(ω1, ·) dK(ω1, ·) is integrable w.r.t. µ|_{Ã1∩A1},

(c) ∫_Ω f d(µ × K) = ∫_{Ã1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ|_{Ã1∩A1}(dω1).

Proof. Ad (i): algebraic induction. Ad (ii): consider f+ and f− and use (i).

Remark 11. For brevity, we write

∫_{Ω1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ(dω1) = ∫_{Ã1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ|_{Ã1∩A1}(dω1)

if f is (µ × K)-integrable. For f ∈ Z(Ω, A),

f is (µ × K)-integrable ⇔ ∫_{Ω1} ∫_{Ω2} |f|(ω1, ω2) K(ω1, dω2) µ(dω1) < ∞.

Multi-Stage Experiments

Now we construct a stochastic model for a series of experiments, where the outputs of the first i − 1 stages determine the model for the i-th stage.

Given:

• measurable spaces (Ωi, Ai) for i ∈ I, where I = {1, . . . , n} or I = N.

Put

(Ω′_i, A′_i) = (×_{j=1}^i Ωj, ⊗_{j=1}^i Aj),

and note that

×_{j=1}^i Ωj = Ω′_{i−1} × Ωi ∧ ⊗_{j=1}^i Aj = A′_{i−1} ⊗ Ai


for i ∈ I \ {1}. Furthermore, let

Ω = ×_{i∈I} Ωi, A = ⊗_{i∈I} Ai. (7)

Given:

• a probability measure µ on A1,

• Markov kernels Ki from (Ω′_{i−1}, A′_{i−1}) to (Ωi, Ai) for i ∈ I \ {1}.

Theorem 12. For I = {1, . . . , n},

∃1 probability measure ν on A ∀ A1 ∈ A1 . . . ∀ An ∈ An :

ν(A1 × · · · × An) = ∫_{A1} . . . ∫_{A_{n−1}} Kn((ω1, . . . , ω_{n−1}), An) K_{n−1}((ω1, . . . , ω_{n−2}), dω_{n−1}) · · · µ(dω1).

Moreover, for f ν-integrable (the short version),

∫_Ω f dν = ∫_{Ω1} . . . ∫_{Ωn} f(ω1, . . . , ωn) Kn((ω1, . . . , ω_{n−1}), dωn) · · · µ(dω1). (8)

Notation: ν = µ × K2 × · · · × Kn.

Proof. Induction, using Theorems 8 and 10.

Remark 13. Particular case of Theorem 12 with

µ = P1, ∀ i ∈ I \ {1} ∀ ω′_{i−1} ∈ Ω′_{i−1} : Ki(ω′_{i−1}, ·) = Pi (9)

for probability measures Pi on Ai:

∃1 probability measure P1 × · · · × Pn on A ∀ A1 ∈ A1 . . . ∀ An ∈ An :

(P1 × · · · × Pn)(A1 × · · · × An) = P1(A1) · · · Pn(An).

Moreover, for every P1 × · · · × Pn-integrable function f, Fubini's Theorem reads

∫_Ω f d(P1 × · · · × Pn) = ∫_{Ω1} . . . ∫_{Ωn} f(ω1, . . . , ωn) Pn(dωn) · · · P1(dω1).

Analogously for any other order of integration.

Definition 14. P1 × · · · × Pn is called the product probability measure corresponding to Pi for i = 1, . . . , n, and (Ω, A, P1 × · · · × Pn) is called the product probability space corresponding to (Ωi, Ai, Pi) for i = 1, . . . , n.

Example 15.

(i) In Example 4 with b = b1 = · · · = bn and ν = b · ε_H + (1 − b) · ε_T we have P = µ × ν.

(ii) For countable spaces Ωi and σ-algebras Ai = P(Ωi) we get

(P1 × P2)(A) = ∑_{ω1∈Ω1} P2(A(ω1)) · P1({ω1}), A ⊂ Ω.

(iii) For uniform distributions Pi on finite spaces Ωi, P1 × · · · × Pn is the uniform distribution on Ω.

Remark 16. Theorems 8, 10, and 12 and Remark 13 extend to σ-finite measures µ instead of probability measures and σ-finite kernels K or Ki, resp., instead of Markov kernels, with the resulting measure on A being σ-finite, too. Recall that µ is σ-finite if

∃ A_{1,1}, A_{1,2}, . . . ∈ A1 pairwise disjoint : Ω1 = ⋃_{i=1}^∞ A_{1,i} ∧ ∀ i ∈ N : µ(A_{1,i}) < ∞.

By definition, a mapping K : Ω1 × A2 → R̄ is a kernel if K(ω1, ·) is a measure on A2 for every ω1 ∈ Ω1 and if K(·, A2) is A1-B̄-measurable for every A2 ∈ A2. Moreover, K is a σ-finite kernel if, additionally,

∃ A_{2,1}, A_{2,2}, . . . ∈ A2 pairwise disjoint : Ω2 = ⋃_{i=1}^∞ A_{2,i} ∧ ∀ i ∈ N : sup_{ω1∈Ω1} K(ω1, A_{2,i}) < ∞.

Example 17.

(i) For every integrable random variable X on any probability space (Ω, A, P) with X ≥ 0,

E(X) = ∫_{]0,∞[} (1 − F_X(u)) λ_1(du),

see Übung 7.4 in Maß- und Integrationstheorie (2016).

(ii) For the Lebesgue measure,

λ_n = λ_1 × · · · × λ_1.

Theorem 18 (Ionescu-Tulcea). For I = N,

∃1 probability measure P on A ∀ n ∈ N ∀ A1 ∈ A1 . . . ∀ An ∈ An :

P(A1 × · · · × An × ×_{i=n+1}^∞ Ωi) = (µ × K2 × · · · × Kn)(A1 × · · · × An). (10)

Proof. 'Existence': Consider the σ-algebras

⊗_{i=1}^n Ai and Ân = σ(π_{1,...,n})

in ×_{i=1}^n Ωi and ×_{i=1}^∞ Ωi, respectively. Define a probability measure Pn on Ân by

Pn(A × ×_{i=n+1}^∞ Ωi) = (µ × K2 × · · · × Kn)(A), A ∈ ⊗_{i=1}^n Ai.

Then (8) yields the following consistency property:

P_{n+1}(A × Ω_{n+1} × ×_{i=n+2}^∞ Ωi) = Pn(A × ×_{i=n+1}^∞ Ωi), A ∈ ⊗_{i=1}^n Ai.

Thus

P̂(A) = Pn(A), A ∈ Ân,

yields a well-defined mapping P̂ on the algebra

Â = ⋃_{n∈N} Ân

of cylinder sets. Obviously, P̂ is a content with P̂(×_{i=1}^∞ Ωi) = 1, and (10) holds for P = P̂.

Claim: P̂ is σ-continuous at ∅. It suffices to show that for every sequence of sets

A^{(n)} = B^{(n)} × ×_{i=n+1}^∞ Ωi

with B^{(n)} ∈ ⊗_{i=1}^n Ai and A^{(n)} ↓ ∅ we have

lim_{n→∞} (µ × K2 × . . . × Kn)(B^{(n)}) = 0.

Assume the contrary, i.e., inf_{n∈N} (µ × K2 × . . . × Kn)(B^{(n)}) > 0.

Observe that

B^{(n)} × Ω_{n+1} ⊃ B^{(n+1)}. (11)

Recursively, we define ω* ∈ Ω such that

∀ n ∈ N : B^{(n+1)}(ω*_1, . . . , ω*_n) ≠ ∅, (12)

which implies (ω*_1, . . . , ω*_n) ∈ B^{(n)} for every n ∈ N, see (11). Consequently,

ω* ∈ ⋂_{n=1}^∞ A^{(n)},

contradicting A^{(n)} ↓ ∅.

Put

ω^n = (ω1, . . . , ωn)

for n ≥ 1 and ωi ∈ Ωi. Consider the probability measure K_{n+1}(ω^n, ·) on A_{n+1} as well as the Markov kernels

K_{n+m}((ω^n, ·), ·) : ×_{i=n+1}^{n+m−1} Ωi × A_{n+m} → R

for m ≥ 2. By

Q_{n,m}(ω^n, ·) = K_{n+1}(ω^n, ·) × · · · × K_{n+m}((ω^n, ·), ·)

we obtain a probability measure on ⊗_{i=n+1}^{n+m} Ai for m ≥ 1; actually Q_{n,m} is a Markov kernel. Finally, we put

f_{n,m}(ω^n) = Q_{n,m}(ω^n, B^{(n+m)}(ω^n))

for n, m ≥ 1.

Recursively, we define ω* ∈ Ω such that

∀ n ∈ N : inf_{m≥1} f_{n,m}(ω*_1, . . . , ω*_n) > 0. (13)

Take m = 1 in (13) to obtain (12).

Due to (11) we have B^{(n+m)}(ω^n) × Ω_{n+m+1} ⊃ B^{(n+m+1)}(ω^n), and therefore

f_{n,m}(ω^n) = Q_{n,m+1}(ω^n, B^{(n+m)}(ω^n) × Ω_{n+m+1}) ≥ f_{n,m+1}(ω^n).

Furthermore, 0 ≤ f_{n,m} ≤ 1 and

f_{n,m}(ω^n) = ∫_{×_{i=n+1}^{n+m} Ωi} 1_{B^{(n+m)}}(ω^n, ω_{n+1}, . . . , ω_{n+m}) Q_{n,m}(ω^n, d(ω_{n+1}, . . . , ω_{n+m})).

In particular, f_{n,m} is ⊗_{i=1}^n Ai-B-measurable.

For n = 1,

∫_{Ω1} f_{1,m}(ω1) µ(dω1) = (µ × K2 × . . . × K_{1+m})(B^{(1+m)}).

Therefore

∫_{Ω1} inf_{m≥1} f_{1,m}(ω1) µ(dω1) = inf_{m≥1} (µ × K2 × . . . × K_{1+m})(B^{(1+m)}) > 0,

which yields

∃ ω*_1 ∈ Ω1 : inf_{m≥1} f_{1,m}(ω*_1) > 0.

For n ≥ 1,

∫_{Ω_{n+1}} f_{n+1,m}(ω^n, ω_{n+1}) K_{n+1}(ω^n, dω_{n+1}) = f_{n,m+1}(ω^n).

Therefore

∫_{Ω_{n+1}} inf_{m≥1} f_{n+1,m}(ω^n, ω_{n+1}) K_{n+1}(ω^n, dω_{n+1}) = inf_{m≥1} f_{n,m+1}(ω^n).

Proceed inductively to establish (13).

By Theorem A.9.3, P̂ is σ-additive, and it remains to apply Theorem A.9.4.

'Uniqueness': By (10), P is uniquely determined on the class of measurable rectangles. Apply Theorem A.1.5.

Outlook: the probabilistic method.

Example 19. The queuing model, see Übung 5.3. Here Ki((ω1, . . . , ω_{i−1}), ·) only depends on ω_{i−1}. See also Übung 5.1.

Outlook: Markov processes.


Product Measures – The General Case

Given: a non-empty set I and probability spaces (Ωi, Ai, Pi) for i ∈ I. Recall the definition (7). Put

P0(I) = {J ⊂ I : J non-empty, finite}.

Theorem 20.

∃1 probability measure P on A ∀ S ∈ P0(I) ∀ Ai ∈ Ai, i ∈ S :

P(×_{i∈S} Ai × ×_{i∈I\S} Ωi) = ∏_{i∈S} Pi(Ai). (14)

Notation: P = ×_{i∈I} Pi.

Proof. See Remark 13 in the case of a finite set I.

If |I| = |N|, assume I = N without loss of generality. The particular case of Theorem 18 with (9) for probability measures Pi on Ai shows

∃1 probability measure P on A ∀ n ∈ N ∀ A1 ∈ A1 . . . ∀ An ∈ An :

P(A1 × · · · × An × ×_{i=n+1}^∞ Ωi) = P1(A1) · · · Pn(An).

If I is uncountable, we use Theorem A.8.7. For S ⊂ I non-empty and countable and for B ∈ ⊗_{i∈S} Ai we put

P((π^I_S)^{−1}(B)) = (×_{i∈S} Pi)(B).

Hereby we get a well-defined mapping P : A → R, which clearly is a probability measure and satisfies (14). Use Theorem A.1.5 to obtain the uniqueness result.

Definition 21. P = ×i∈I Pi is called the product measure corresponding to Pi for

i ∈ I, and (Ω,A, P ) is called the product measure space corresponding to (Ωi,Ai, Pi)

for i ∈ I.

6 Independence

‘. . . the concept of independence . . . plays a central role in probability theory; it is precisely

this concept that distinguishes probability theory from the general theory of measure spaces’,

see Shiryayev (1984, p. 27).

In the sequel, (Ω,A, P ) denotes a probability space and I is a non-empty set.

Independence of Events

Definition 1. Let Ai ∈ A for i ∈ I. Then (Ai)_{i∈I} is independent if

P(⋂_{i∈S} Ai) = ∏_{i∈S} P(Ai) (1)

for every S ∈ P0(I). Elementary case: |I| = 2.


In the sequel, Ei ⊂ A for i ∈ I.

Definition 2. (Ei)_{i∈I} is independent if (1) holds for every S ∈ P0(I) and all Ai ∈ Ei for i ∈ S.

Remark 3.

(i) (Ei)_{i∈I} independent ∧ ∀ i ∈ I : Ẽi ⊂ Ei ⇒ (Ẽi)_{i∈I} independent.

(ii) (Ei)_{i∈I} independent ⇔ ∀ S ∈ P0(I) : (Ei)_{i∈S} independent.

Lemma 4.

(Ei)_{i∈I} independent ⇒ (δ(Ei))_{i∈I} independent.

Proof. Without loss of generality, I = {1, . . . , n} and n ≥ 2, see Remark 3.(ii). Put

D1 = {A ∈ δ(E1) : ({A}, E2, . . . , En) independent}.

Then D1 is a Dynkin class and E1 ⊂ D1, hence δ(E1) = D1. Thus

(δ(E1), E2, . . . , En) is independent.

Repeat this step for 2, . . . , n.

Theorem 5. If

(Ei)i∈I independent ∧ ∀ i ∈ I : Ei closed w.r.t. intersections (2)

then

(σ(Ei))i∈I independent.

Proof. Use Theorem A.7.4 and Lemma 4.

Corollary 6. Assume that I = ⋃_{j∈J} Ij for pairwise disjoint sets Ij ≠ ∅. If (2) holds, then

(σ(⋃_{i∈Ij} Ei))_{j∈J} is independent.

Proof. Let

Ẽj = {⋂_{i∈S} Ai : S ∈ P0(Ij) ∧ Ai ∈ Ei for i ∈ S}.

Then Ẽj is closed w.r.t. intersections and (Ẽj)_{j∈J} is independent. Finally,

σ(⋃_{i∈Ij} Ei) = σ(Ẽj).


Independence of Random Elements

In the sequel, (Ωi,Ai) denotes a measurable space for i ∈ I, and Xi : Ω → Ωi is

A-Ai-measurable for i ∈ I.

Definition 7. (Xi)i∈I is independent if (σ(Xi))i∈I is independent.

Remark 8. See Übung 6.2 for a characterization of independence in terms of prediction problems.

Example 9. Actually, the essence of independence. Assume that

(Ω, A, P) = (×_{i∈I} Ωi, ⊗_{i∈I} Ai, ×_{i∈I} Pi)

for probability measures Pi on Ai. Let

Xi = πi.

Then, for S ∈ P0(I) and Ai ∈ Ai for i ∈ S,

P(⋂_{i∈S} {Xi ∈ Ai}) = P(×_{i∈S} Ai × ×_{i∈I\S} Ωi) = ∏_{i∈S} Pi(Ai) = ∏_{i∈S} P(Xi ∈ Ai).

Hence (Xi)_{i∈I} is independent. Furthermore, P_{Xi} = Pi.

Theorem 10. Given: probability spaces (Ωi,Ai, Pi) for i ∈ I. Then there exist

(i) a probability space (Ω,A, P ) and

(ii) A-Ai-measurable mappings Xi : Ω→ Ωi for i ∈ I

such that

(Xi)_{i∈I} independent ∧ ∀ i ∈ I : P_{Xi} = Pi.

Proof. See Example 9.

Theorem 11. Let Fi ⊂ Ai for i ∈ I. If

∀ i ∈ I : σ(Fi) = Ai ∧ Fi closed w.r.t. intersections

then

(Xi)_{i∈I} independent ⇔ (X_i^{−1}(Fi))_{i∈I} independent.

Proof. By Lemma II.3.6 in Maß- und Integrationstheorie (2016) we have σ(Xi) = X_i^{−1}(Ai) = σ(X_i^{−1}(Fi)). '⇒': See Remark 3.(i). '⇐': Note that X_i^{−1}(Fi) is closed w.r.t. intersections. Use Theorem 5.

Example 12. Independence of a family of random variables Xi, i.e., (Ωi, Ai) = (R, B) for i ∈ I. In this case (Xi)_{i∈I} is independent iff

∀ S ∈ P0(I) ∀ ci ∈ R, i ∈ S : P(⋂_{i∈S} {Xi ≤ ci}) = ∏_{i∈S} P(Xi ≤ ci).


Theorem 13. Let

(i) I = ⋃_{j∈J} Ij for pairwise disjoint sets Ij ≠ ∅,

(ii) (Ω̃j, Ãj) be measurable spaces for j ∈ J,

(iii) fj : ×_{i∈Ij} Ωi → Ω̃j be (⊗_{i∈Ij} Ai)-Ãj-measurable mappings for j ∈ J.

Put

Yj = (Xi)_{i∈Ij} : Ω → ×_{i∈Ij} Ωi.

Then

(Xi)_{i∈I} independent ⇒ (fj ∘ Yj)_{j∈J} independent.

Proof.

σ(fj ∘ Yj) = Y_j^{−1}(f_j^{−1}(Ãj)) ⊂ Y_j^{−1}(⊗_{i∈Ij} Ai) = σ(Xi : i ∈ Ij) = σ(⋃_{i∈Ij} X_i^{−1}(Ai)).

Use Corollary 6 and Remark 3.(i).

Example 14. For an independent sequence (Xi)_{i∈N} of random variables,

(max(X1, X3), 1_{[0,∞[}(X2), lim sup_{n→∞} 1/n · ∑_{i=1}^n Xi)

are independent.

Independence and Product Measures

Remark 15. Consider the mapping

X : Ω → ×_{i∈I} Ωi : ω ↦ (Xi(ω))_{i∈I}.

Clearly X is A-⊗_{i∈I} Ai-measurable. By definition, P_X(A) = P(X ∈ A) for A ∈ ⊗_{i∈I} Ai. In particular, for measurable rectangles A ∈ ⊗_{i∈I} Ai, i.e.,

A = ×_{i∈S} Ai × ×_{i∈I\S} Ωi (3)

with S ∈ P0(I) and Ai ∈ Ai,

P_X(A) = P(⋂_{i∈S} {Xi ∈ Ai}). (4)


Definition 16.

(i) P_X is called the joint distribution of the random elements Xi, i ∈ I.

(ii) Let P̃ denote a probability measure on (×_{i∈I} Ωi, ⊗_{i∈I} Ai), and let i ∈ I. Then P̃_{πi} is called a (one-dimensional) marginal distribution of P̃.

Example 17. Let Ω = {1, . . . , 6}² and consider the uniform distribution P on A = P(Ω), which is a model for rolling a die twice.

Moreover, let Ωi = N and Ai = P(Ωi), so that ⊗_{i=1}^2 Ai = P(N²). Consider the random variables

X1(ω1, ω2) = ω1, X2(ω1, ω2) = ω1 + ω2.

Then

P_X(A) = |A ∩ M|/36, A ⊂ N²,

where

M = {(k, ℓ) ∈ N² : 1 ≤ k ≤ 6 ∧ k + 1 ≤ ℓ ≤ k + 6}.

Claim: (X1, X2) is not independent. Proof:

P({X1 = 1} ∩ {X2 = 3}) = P_X({(1, 3)}) = P({(1, 2)}) = 1/36,

but

P(X1 = 1) · P(X2 = 3) = 1/6 · P({(1, 2), (2, 1)}) = 1/3 · 1/36.

We add that

P_{X1} = ∑_{k=1}^6 1/6 · ε_k, P_{X2} = ∑_{ℓ=2}^{12} (6 − |ℓ − 7|)/36 · ε_ℓ.
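These numbers are easy to confirm by brute-force enumeration over the 36 equally likely outcomes; a short sketch (ours):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
p = Fraction(1, 36)

# X1 = first throw, X2 = sum of both throws
px1_1 = sum(p for (w1, w2) in outcomes if w1 == 1)              # P(X1 = 1) = 1/6
px2_3 = sum(p for (w1, w2) in outcomes if w1 + w2 == 3)         # P(X2 = 3) = 1/18
joint = sum(p for (w1, w2) in outcomes if w1 == 1 and w1 + w2 == 3)

print(joint, px1_1 * px2_3)   # 1/36 vs 1/108: X1 and X2 are not independent
```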

Theorem 18.

(Xi)_{i∈I} independent ⇔ P_X = ×_{i∈I} P_{Xi}.

Proof. For A given by (3),

(×_{i∈I} P_{Xi})(A) = ∏_{i∈S} P_{Xi}(Ai) = ∏_{i∈S} P(Xi ∈ Ai).

On the other hand, we have (4). Thus '⇐' holds trivially. Use Theorem A.1.5 to obtain '⇒'.

In the sequel, we consider random variables Xi, i.e., (Ωi,Ai) = (R,B) for i ∈ I.

Theorem 19. Let I = {1, . . . , n}. If

(X1, . . . , Xn) independent ∧ ∀ i ∈ I : Xi ≥ 0 (Xi integrable)

then (∏_{i=1}^n Xi is integrable and)

E(∏_{i=1}^n Xi) = ∏_{i=1}^n E(Xi).


Proof. Use Fubini's Theorem and Theorem 18 to obtain

E(|∏_{i=1}^n Xi|) = ∫_{R^n} |x1 · · · xn| P_{(X1,...,Xn)}(d(x1, . . . , xn))
= ∫_{R^n} |x1 · · · xn| (P_{X1} × · · · × P_{Xn})(d(x1, . . . , xn))
= ∏_{i=1}^n ∫_R |xi| P_{Xi}(dxi) = ∏_{i=1}^n E(|Xi|).

Drop | · | if the random variables are integrable.

Uncorrelated Random Variables

Definition 20. X1, X2 ∈ L2 are uncorrelated if

E(X1 ·X2) = E(X1) · E(X2).

Theorem 21 (Bienaymé). Let X1, . . . , Xn ∈ L² be pairwise uncorrelated. Then

Var(∑_{i=1}^n Xi) = ∑_{i=1}^n Var(Xi).

Proof. We have

Var(∑_{i=1}^n Xi) = E(∑_{i=1}^n (Xi − E(Xi)))²
= ∑_{i=1}^n E(Xi − E(Xi))² + ∑_{i,j=1, i≠j}^n E((Xi − E(Xi)) · (Xj − E(Xj))).

Moreover,

E((Xi − E(Xi)) · (Xj − E(Xj))) = E(Xi · Xj) − E(Xi) · E(Xj).

(The latter quantity is called the covariance of Xi and Xj.)

Convolutions

Definition 22. The convolution product of probability measures P1, . . . , Pn on B is

defined by

P1 ∗ · · · ∗ Pn = s(P1 × · · · × Pn)

where

s(x1, . . . , xn) = x1 + · · ·+ xn.

Theorem 23. Let (X1, . . . , Xn) be independent and S = ∑_{i=1}^n Xi. Then

P_S = P_{X1} ∗ · · · ∗ P_{Xn}.


Proof. Put X = (X1, . . . , Xn). Since S = s ∘ (X1, . . . , Xn) we get

P_S = s(P_X) = s(P_{X1} × · · · × P_{Xn}).

Remark 24. The class of probability measures on B forms an Abelian semi-group w.r.t. ∗, and P ∗ ε0 = P.

Theorem 25. For all probability measures P1, P2 on B and every P1 ∗ P2-integrable function f,

∫_R f d(P1 ∗ P2) = ∫_R ∫_R f(x + y) P1(dx) P2(dy).

If P1 = h1 · λ_1 then P1 ∗ P2 = h · λ_1 with

h(x) = ∫_R h1(x − y) P2(dy).

If, additionally, P2 = h2 · λ_1, then

h(x) = ∫_R h1(x − y) · h2(y) λ_1(dy).

Proof. Use Fubini's Theorem and the transformation theorem. See Billingsley (1979, p. 230).

Example 26.

(i) Put N(µ, 0) = ε_µ. By Theorems 21 and 25,

N(µ1, σ1²) ∗ N(µ2, σ2²) = N(µ1 + µ2, σ1² + σ2²)

for µi ∈ R and σi ≥ 0.

(ii) Consider n independent Bernoulli trials, i.e., (X1, . . . , Xn) independent with

P_{Xi} = p · ε1 + (1 − p) · ε0

for every i ∈ {1, . . . , n}, where p ∈ [0, 1]. Inductively, we get for k ∈ {1, . . . , n}

∑_{i=1}^k Xi ∼ B(k, p),

see also Übung 4.4. Thus, for any n, m ∈ N,

B(n, p) ∗ B(m, p) = B(n + m, p).
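The identity B(n, p) ∗ B(m, p) = B(n + m, p) can also be checked numerically by convolving the weight sequences directly; a short sketch (ours):

```python
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(q, r):
    # weights of the convolution product: (q * r)(k) = sum_j q(j) * r(k - j)
    out = [0.0] * (len(q) + len(r) - 1)
    for j, qj in enumerate(q):
        for i, ri in enumerate(r):
            out[j + i] += qj * ri
    return out

n, m, p = 3, 4, 0.3
lhs = convolve(binom_pmf(n, p), binom_pmf(m, p))
rhs = binom_pmf(n + m, p)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))   # ~1e-16: the pmfs agree
```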


Chapter III

Limit Theorems

Given: a sequence (Xn)n∈N of random variables on a probability space (Ω,A, P ) and

weights 0 < an ↑ ∞.

Put

Sn = ∑_{i=1}^n Xi, n ∈ N0.

For instance, Sn might be the cumulative gain after n trials or (one of the coordinates

of) the position of a particle after n collisions.

Question: Convergence of Sn/an for suitable weights an in a suitable sense? Particular

case: an = n.

Definition 1. (Xn)_{n∈N} is independent and identically distributed (i.i.d.) if (Xn)_{n∈N} is independent and

∀ n ∈ N : P_{Xn} = P_{X1}.

In this case (Sn)_{n∈N0} is called the associated random walk.

1 Zero-One Laws

Kolmogorov’s Zero-One Law

Definition 1. For σ-algebras An ⊂ A, n ∈ N, the corresponding tail σ-algebra is

A∞ = ⋂_{n∈N} σ(⋃_{m≥n} Am),

and A ∈ A∞ is called a tail (terminal) event.

Example 2. Let An = σ(Xn). Put C = ⊗_{i=1}^∞ B. Then

A∞ = ⋂_{n∈N} σ(Xm : m ≥ n)

and

A ∈ A∞ ⇔ ∀n ∈ N ∃C ∈ C : A = {(Xn, Xn+1, . . . ) ∈ C}.


For instance,

{(Sn)n∈N converges}, {(Sn/an)n∈N converges} ∈ A∞,

and the function lim inf_{n→∞} Sn/an is A∞-B-measurable. However, Sn as well as lim inf_{n→∞} Sn are not A∞-B-measurable, in general. Analogously for the lim sup's.

Theorem 3 (Kolmogorov's Zero-One Law). Let (An)n∈N be an independent sequence of σ-algebras An ⊂ A. Then

∀A ∈ A∞ : P(A) ∈ {0, 1}.

Proof. We show that A∞ and A∞ are independent (terminology), which implies P(A) = P(A) · P(A) for every A ∈ A∞. Put

Ân = σ(A1 ∪ · · · ∪ An).

Note that A∞ ⊂ σ(An+1 ∪ An+2 ∪ · · · ). By Corollary II.6.6 and Remark II.6.3.(i),

Ân, A∞ independent,

and therefore ⋃_{n∈N} Ân and A∞ are independent, too. Thus, by Theorem II.6.5,

σ(⋃_{n∈N} Ân), A∞ independent.

Finally,

A∞ ⊂ σ(⋃_{n∈N} An) = σ(⋃_{n∈N} Ân).

Corollary 4. Let X ∈ Z(Ω,A∞). Under the assumptions of Theorem 3, X is constant

P -a.s.

Remark 5. Assume that (Xn)n∈N is independent. Then

P((Sn)n∈N converges), P((Sn/an)n∈N converges) ∈ {0, 1}.

In case of convergence P-a.s., lim_{n→∞} Sn/an is constant P-a.s.

The Borel-Cantelli Lemma

Definition 6. Let An ∈ A for n ∈ N. Then

lim inf_{n→∞} An = ⋃_{n∈N} ⋂_{m≥n} Am,  lim sup_{n→∞} An = ⋂_{n∈N} ⋃_{m≥n} Am.

Remark 7.

(i) (lim inf_{n→∞} An)^c = lim sup_{n→∞} An^c.

(ii) P(lim inf_{n→∞} An) ≤ lim inf_{n→∞} P(An) ≤ lim sup_{n→∞} P(An) ≤ P(lim sup_{n→∞} An).

(iii) If (An)n∈N is independent, then P(lim sup_{n→∞} An) ∈ {0, 1} (Borel's Zero-One Law).

Proof: Übung 6.3.

Theorem 8 (Borel-Cantelli Lemma). Let A = lim sup_{n→∞} An with An ∈ A.

(i) If ∑_{n=1}^∞ P(An) < ∞, then P(A) = 0.

(ii) If ∑_{n=1}^∞ P(An) = ∞ and (An)n∈N is independent, then P(A) = 1.

Proof. Ad (i): For every n ∈ N,

P(A) ≤ P(⋃_{m≥n} Am) ≤ ∑_{m=n}^∞ P(Am).

By assumption, the right-hand side tends to zero as n tends to ∞.

Ad (ii): We have

P(A^c) = P(lim inf_{n→∞} An^c) ≤ ∑_{n=1}^∞ P(⋂_{m≥n} Am^c).

Use 1 − x ≤ exp(−x) for x ≥ 0 to obtain, for n ≤ ℓ,

P(⋂_{m=n}^ℓ Am^c) = ∏_{m=n}^ℓ (1 − P(Am)) ≤ ∏_{m=n}^ℓ exp(−P(Am)) = exp(−∑_{m=n}^ℓ P(Am)).

By assumption, the right-hand side tends to zero as ℓ tends to ∞, so that P(⋂_{m≥n} Am^c) = 0 for every n ∈ N. Thus P(A^c) = 0.

Example 9. A fair coin is tossed an infinite number of times. Determine the probability that 0 occurs twice in a row infinitely often. Model: (Xn)n∈N i.i.d. with

P(X1 = 0) = P(X1 = 1) = 1/2.

Put

An = {Xn = Xn+1 = 0}.

Then (A2n)n∈N is independent and P(A2n) = 1/4. Thus P(lim sup_{n→∞} An) = 1.
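A finite-sample illustration in Python (a simulation, of course, proves nothing about the limit): in a long run of tosses the pattern '00' keeps occurring, with relative frequency about 1/4.

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1_000_000)        # fair coin tosses
hits = np.sum((x[:-1] == 0) & (x[1:] == 0))   # events A_n = {X_n = X_{n+1} = 0}
print(hits, hits / (len(x) - 1))              # about 1/4 of all positions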

Remark 10. A stronger version of Theorem 8.(ii) requires only pairwise independence, see Bauer (1996, p. 70).

Example 11. Let (Xn)n∈N be i.i.d. with

P(X1 = 1) = p = 1 − P(X1 = −1)

for some constant p ∈ [0, 1]. Put An = σ(Xn) and

A = lim sup_{n→∞} {Sn = 0},


and note that

A ∉ A∞ = ⋂_{n∈N} σ(Xm : m ≥ n),

at least in the standard setting from Example II.6.9. Clearly

1/2 · (Sn + n) ∼ B(n, p).

Use Stirling's Formula

n! ≈ (n/e)^n · √(2πn)

to obtain

P(S2n = 0) = (2n choose n) · p^n · (1 − p)^n ≈ r^n/√(πn),

where r = 4p · (1 − p) ∈ [0, 1].

Suppose that p ≠ 1/2. Then r < 1, and therefore

∑_{n=0}^∞ P(Sn = 0) = ∑_{n=0}^∞ P(S2n = 0) < ∞.

The Borel-Cantelli Lemma implies P(A) = 0.

Suppose that p = 1/2. Then

∑_{n=0}^∞ P(Sn = 0) = ∑_{n=0}^∞ P(S2n = 0) = ∞,

but ({Sn = 0})n∈N is not independent. Using the Central Limit Theorem one can show that P(A) = 1, see Example 5.12.

2 The Strong Law of Large Numbers

Throughout this section: (Xn)n∈N independent.

The L2 Case

Put

C = {(Sn)n∈N converges in R}.

By Remark 1.5, P(C) ∈ {0, 1}. First we provide sufficient conditions for P(C) = 1 to hold.


Theorem 1 (Kolmogorov inequality). If

∀ i ∈ {1, . . . , n} : Xi ∈ L2 ∧ E(Xi) = 0,

then for every u > 0,

P(sup_{1≤k≤n} |Sk| ≥ u) ≤ (1/u²) · Var(Sn).

Proof. Let 1 ≤ k ≤ n. We show that

∀B ∈ σ(X1, . . . , Xk) : ∫_B Sk² dP ≤ ∫_B Sn² dP. (1)

Note that

Sn² = (Sn − Sk)² + 2Sk(Sn − Sk) + Sk².

Moreover, for B ∈ σ(X1, . . . , Xk),

1B · Sk is σ(X1, . . . , Xk)-B-measurable,
Sn − Sk is σ(Xk+1, . . . , Xn)-B-measurable.

Use Theorem II.6.13 to obtain

1B · Sk, Sn − Sk independent.

Hence Theorem II.6.19 yields

E(1B · Sk · (Sn − Sk)) = E(1B · Sk) · E(Sn − Sk) = 0,

and thereby

E(1B · Sn²) ≥ 2 · E(1B · Sk · (Sn − Sk)) + E(1B · Sk²) = E(1B · Sk²).

This completes the proof of (1).

Put

Ak = ⋂_{ℓ=1}^{k−1} {|Sℓ| < u} ∩ {|Sk| ≥ u}.

Then Ak ∈ σ(X1, . . . , Xk), and by (1),

u² · P(sup_{1≤k≤n} |Sk| ≥ u) = u² · ∑_{k=1}^n P(Ak) ≤ ∑_{k=1}^n ∫_{Ak} Sk² dP
≤ ∑_{k=1}^n ∫_{Ak} Sn² dP ≤ ∫_Ω Sn² dP = Var(Sn).

Theorem 2. If

∀n ∈ N : Xn ∈ L2 ∧ E(Xn) = 0

and

∑_{i=1}^∞ Var(Xi) < ∞,

then P(C) = 1.


Proof. Clearly

ω ∈ C ⇔ ∀ ε > 0 ∃n ∈ N ∀ k ∈ N : |Sn+k(ω) − Sn(ω)| < ε.

Put

M = inf_{n∈N} sup_{k∈N} |Sn+k − Sn|.

Then C = {M = 0}.

Let ε > 0. For every n ∈ N,

{M > ε} ⊂ {sup_{k∈N} |Sn+k − Sn| > ε},

and

{sup_{1≤k≤r} |Sn+k − Sn| > ε} ↑ {sup_{k∈N} |Sn+k − Sn| > ε}

as r tends to ∞. Hence

P(M > ε) ≤ lim_{r→∞} P(sup_{1≤k≤r} |Sn+k − Sn| > ε),

and Kolmogorov's inequality yields

P(sup_{1≤k≤r} |Sn+k − Sn| > ε) ≤ (1/ε²) · ∑_{i=n+1}^{n+r} Var(Xi) ≤ (1/ε²) · ∑_{i=n+1}^∞ Var(Xi).

The last bound tends to zero as n → ∞. Thus P(M > ε) = 0 for every ε > 0, which implies P(M > 0) = 0.

Example 3. Let (Yn)n∈N be i.i.d. with PY1 = 1/2 · (ε1 + ε−1). Then E(Yn) = 0 and Var(Yn) = 1, so that

∑_{i=1}^∞ Yi · (1/i) converges P-a.s.
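A short simulation of this random signed harmonic series (illustration only; seed and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
signs = rng.choice([-1.0, 1.0], size=N)           # Y_i with P(Y_i = ±1) = 1/2
partial = np.cumsum(signs / np.arange(1, N + 1))  # partial sums of sum Y_i / i
print(partial[-5:])                               # the partial sums settle down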

Remark 4. Consider the truncated random variables

Xi^(c) = Xi · 1{|Xi|<c}, c > 0.

According to the three-series theorem, P(C) = 1 iff there exists c > 0 such that (iff for all c > 0)

(i) ∑_{i=1}^∞ P(|Xi| > c) < ∞,

(ii) ∑_{i=1}^∞ E(Xi^(c)) converges, and

(iii) ∑_{i=1}^∞ Var(Xi^(c)) < ∞.

See Gänssler, Stute (1977, Satz 2.2.9).

Recall that 0 < an ↑ ∞. We now study almost sure convergence of (Sn/an)n∈N.


Lemma 5 (Kronecker's Lemma). For every sequence (xn)n∈N in R,

∑_{i=1}^∞ xi/ai converges ⇒ lim_{n→∞} (1/an) · ∑_{i=1}^n xi = 0.

Proof. Put c = ∑_{i=1}^∞ xi/ai and cn = ∑_{i=1}^n xi/ai. It is straightforward to verify that

(1/an) · ∑_{i=1}^n xi = cn − (1/an) · ∑_{i=2}^n (ai − ai−1) · ci−1.

Moreover, since ai−1 ≤ ai and lim_{i→∞} ai = ∞,

c = lim_{n→∞} (1/an) · ∑_{i=2}^n (ai − ai−1) · ci−1.

Theorem 6 (Strong Law of Large Numbers, L2 Case). If

∀n ∈ N : Xn ∈ L2 ∧ ∑_{i=1}^∞ (1/ai²) · Var(Xi) < ∞, (2)

then

(1/an) · ∑_{i=1}^n (Xi − E(Xi)) → 0 P-a.s.

Proof. Put Yn = (1/an) · (Xn − E(Xn)). Then E(Yn) = 0 and (Yn)n∈N is independent. Moreover,

∑_{i=1}^∞ Var(Yi) = ∑_{i=1}^∞ (1/ai²) · Var(Xi) < ∞.

Thus ∑_{i=1}^∞ Yi converges P-a.s. due to Theorem 2. Apply Lemma 5.

Remark 7. In particular, if (Xn)n∈N is i.i.d. and X1 ∈ L2, then Theorem 6 with an = n implies

(1/n) · ∑_{i=1}^n Xi → E(X1) P-a.s.

In fact, this conclusion already holds if X1 ∈ L1, see Theorem 11 below.

Remark 8. Assume

sup_{n∈N} Var(Xn) < ∞.

Then another possible choice of an in Theorem 6 is

an = √n · (log n)^{1/2+ε}

for any ε > 0, and we have

lim_{n→∞} (Sn − E(Sn))/an = 0 P-a.s.

Note that lim_{n→∞} an/n = 0. Precise description of the fluctuation of Sn(ω) for P-a.e. ω ∈ Ω: law of the iterated logarithm, see Section 6 and also Example 5.12.


The I.I.D. Case

Lemma 9. Let Ui, Vi, W ∈ Z(Ω,A) such that

∑_{i=1}^∞ P(Ui ≠ Vi) < ∞.

Then

(1/n) · ∑_{i=1}^n Ui → W P-a.s. ⇔ (1/n) · ∑_{i=1}^n Vi → W P-a.s.

Proof. The Borel-Cantelli Lemma implies P(lim sup_{i→∞} {Ui ≠ Vi}) = 0.

Lemma 10. For X ∈ Z+(Ω,A),

E(X) ≤ ∑_{k=0}^∞ P(X > k) ≤ E(X) + 1.

Proof. We have

E(X) = ∑_{k=1}^∞ ∫_{{k−1<X≤k}} X dP,

and therefore

E(X) ≤ ∑_{k=1}^∞ k · P(k − 1 < X ≤ k) = ∑_{k=0}^∞ P(X > k)

as well as

E(X) ≥ ∑_{k=1}^∞ (k − 1) · P(k − 1 < X ≤ k) ≥ ∑_{k=0}^∞ P(X > k) − 1.

Theorem 11 (Strong Law of Large Numbers, i.i.d. Case). Let (Xn)n∈N be i.i.d. Then

∃Z ∈ Z(Ω,A) : (1/n) · Sn → Z P-a.s. ⇔ X1 ∈ L1,

in which case Z = E(X1) P-a.s.

Proof. '⇒': Clearly

P(|X1| > n) = P(An), where An = {|Xn| > n}.

Note that

(1/n) · Xn = (1/n) · Sn − ((n−1)/n) · (1/(n−1)) · Sn−1 → 0 P-a.s.

Hence

P(lim sup_{n→∞} An) = 0.


Since (An)n∈N is independent, the Borel-Cantelli Lemma implies

∑_{n=1}^∞ P(An) < ∞.

Use Lemma 10 to obtain E(|X1|) < ∞.

'⇐': Consider the truncated random variables

Yn = Xn if |Xn| < n, and Yn = 0 otherwise.

We have

∑_{i=1}^∞ (1/i²) · Var(Yi) < ∞. (3)

Proof: Observe that

Var(Yi) ≤ E(Yi²) = ∑_{k=1}^i E(Yi² · 1{k−1≤|Yi|<k}) = ∑_{k=1}^i E(Xi² · 1{k−1≤|Xi|<k}) ≤ ∑_{k=1}^i k² · P(k − 1 ≤ |X1| < k).

Thus

∑_{i=1}^∞ (1/i²) · Var(Yi) ≤ ∑_{k=1}^∞ k² · P(k − 1 ≤ |X1| < k) · ∑_{i=k}^∞ 1/i²
≤ 2 · ∑_{k=1}^∞ k · P(k − 1 ≤ |X1| < k) ≤ 2 · (E(|X1|) + 1) < ∞,

cf. the proof of Lemma 10.

Theorem 6 and (3) yield

(1/n) · ∑_{i=1}^n (Yi − E(Yi)) → 0 P-a.s. (4)

Furthermore,

lim_{n→∞} E(Yn) = lim_{n→∞} ∫_{{|X1|<n}} X1 dP = E(X1),

according to the dominated convergence theorem, and therefore

lim_{n→∞} (1/n) · ∑_{i=1}^n E(Yi) = E(X1). (5)

From (4) and (5) we get

(1/n) · ∑_{i=1}^n Yi → E(X1) P-a.s.


Finally,

∑_{i=1}^∞ P(Xi ≠ Yi) < ∞,

since, by Lemma 10,

∑_{i=1}^∞ P(Xi ≠ Yi) = ∑_{i=1}^∞ P(|Xi| ≥ i) ≤ ∑_{i=0}^∞ P(|X1| > i) ≤ E(|X1|) + 1 < ∞.

It remains to observe Lemma 9.

Theorem 12. Let (Xn)n∈N be i.i.d.

(i) If E(X1⁻) < ∞ ∧ E(X1⁺) = ∞, then

(1/n) · Sn → ∞ P-a.s.

(ii) If E(|X1|) = ∞, then

lim sup_{n→∞} |(1/n) · Sn| = ∞ P-a.s.

Proof. (i) follows from Theorem 11, and (ii) is an application of the Borel-Cantelli Lemma, see Gänssler, Stute (1977, p. 131).

Remark 13. We already have Sn/n → E(X1) P-a.s. if the random variables Xn are identically distributed, P-integrable, and pairwise independent. See Bauer (1996, §12).

Two Applications

Remark 14. The basic idea of Monte Carlo algorithms: to compute a quantity a ∈ R,

(i) find a probability measure µ on (R,B) such that ∫_R x µ(dx) = a,

(ii) take an i.i.d. sequence (Xn)n∈N with PX1 = µ and approximate a by 1/n · Sn(ω).

Clearly Sn/n is an unbiased estimator for a, i.e.,

E((1/n) · Sn) = a.

Due to the Strong Law of Large Numbers, Sn/n converges almost surely to a. If X1 ∈ L2, then

E((1/n) · Sn − a)² = Var((1/n) · Sn − a) = Var((1/n) · ∑_{i=1}^n (Xi − a)) = (1/n) · Var(X1),

i.e., the variance of X1 is the key quantity for the error of the Monte Carlo algorithm in the mean square sense. Moreover,

(1/(n−1)) · ∑_{i=1}^n (Xi − Sn/n)² → Var(X1) P-a.s.

provides a simple estimator for this variance, see Übung 6.4.

Applications: see, e.g., Übung 7.1 and 8.2.
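A minimal Monte Carlo sketch along the lines of Remark 14; the integrand and the uniform sampling are chosen here purely for illustration (a = ∫_0^1 e^x dx = e − 1, realized as E(exp(U)) with U uniform on [0, 1]):

import numpy as np

rng = np.random.default_rng(42)
n = 10**6
x = np.exp(rng.random(n))            # X_i = exp(U_i), so E(X_1) = e - 1
estimate = x.mean()                  # S_n / n
var_hat = x.var(ddof=1)              # the variance estimator from Remark 14
print(estimate, np.e - 1)            # estimate vs. exact value
print(np.sqrt(var_hat / n))          # mean square error scale (Var(X_1)/n)^(1/2)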


Remark 15. Let (Xn)n∈N be i.i.d. with µ = PX1 and corresponding distribution function F = FX1. Suppose that µ is unknown, but observations X1(ω), . . . , Xn(ω) are available for 'estimation of µ'.

Fix C ∈ B. Due to Theorem 11,

(1/n) · ∑_{i=1}^n 1C(Xi) → µ(C) P-a.s.

The particular case C = ]−∞, x] leads to the definitions

Fn(x, ω) = (1/n) · |{i ∈ {1, . . . , n} : Xi(ω) ≤ x}|, x ∈ R,

and

µn(·, ω) = (1/n) · ∑_{i=1}^n εXi(ω)

of the empirical distribution function Fn(·, ω) and the empirical distribution µn(·, ω), resp. We obtain

∀x ∈ R ∃A ∈ A : P(A) = 1 ∧ (∀ω ∈ A : lim_{n→∞} Fn(x, ω) = F(x)).

Therefore

∃A ∈ A : P(A) = 1 ∧ (∀ q ∈ Q ∀ω ∈ A : lim_{n→∞} Fn(q, ω) = F(q)),

which implies

∃A ∈ A : P(A) = 1 ∧ (∀ω ∈ A : µn(·, ω) w→ µ),

see Helly's Theorem (ii), p. 17, and Theorem II.3.7.

A refined analysis yields the Glivenko-Cantelli Theorem:

∃A ∈ A : P(A) = 1 ∧ (∀ω ∈ A : lim_{n→∞} sup_{x∈R} |Fn(x, ω) − F(x)| = 0),

see Klenke (2013, Satz 5.23) and Übung 7.2.
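The empirical distribution function is immediate to compute; a sketch for a standard normal sample (sample size and evaluation points are arbitrary choices):

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)
sample = rng.standard_normal(10_000)

def F_n(x):
    # F_n(x) = (1/n) * #{i : X_i <= x}
    return np.mean(sample <= x)

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # distribution function of N(0, 1)
for x in (-1.0, 0.0, 1.0):
    print(x, F_n(x), Phi(x))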

3 The Weak Law of Large Numbers

As previously, 0 < an ↑ ∞.

In the sequel: (Xn)n∈N pairwise uncorrelated. Particular case: (Xn)n∈N pairwise independent and Xn ∈ L2 for every n ∈ N, see Theorem II.6.19.

Theorem 1 (Khintchine). If

lim_{n→∞} (1/an²) · ∑_{i=1}^n Var(Xi) = 0,

then

(1/an) · ∑_{i=1}^n (Xi − E(Xi)) P→ 0.


Proof. Without loss of generality E(Xn) = 0 for every n ∈ N. For ε > 0 the Chebyshev-Markov inequality and Bienaymé's Theorem yield

P(|(1/an) · Sn| ≥ ε) ≤ (1/ε²) · Var((1/an) · Sn) = (1/ε²) · (1/an²) · ∑_{i=1}^n Var(Xi). (1)

Remark 2. Assume that sup_{n∈N} Var(Xn) < ∞. Then Theorem 1 is applicable for any sequence (an)n∈N with lim_{n→∞} an/√n = ∞.

Example 3. Consider an independent sequence (Xn)n∈N with

P(Xn = 0) = 1 − 1/(n log(n+1)), P(Xn = ±n) = 1/(2n log(n+1)).

Hence

E(Xn) = 0, Var(Xn) = n/log(n+1),

and

(1/n²) · ∑_{i=1}^n Var(Xi) ≤ 1/log(n+1).

Thus (1/n) · Sn P→ 0 due to Theorem 1, but (1/n) · Sn → 0 P-a.s. does not hold, see Übung 7.3.

4 Characteristic Functions

We use the notation 〈·, ·〉 and ‖ · ‖ for the Euclidean inner product and norm. Recall

that M(Rk) denotes the class of all probability measures on (Rk,Bk).

Given: a probability measure µ ∈M(Rk).

Definition 1. f : Rk → C is µ-integrable if ℜf and ℑf are µ-integrable, in which case

∫ f dµ = ∫ ℜf dµ + ı · ∫ ℑf dµ.

Definition 2. The mapping µ̂ : Rk → C with

µ̂(y) = ∫ exp(ı〈x, y〉) µ(dx), y ∈ Rk,

is called the Fourier transform of µ.

Example 3. For a discrete probability measure

µ = ∑_{j=1}^∞ αj · εxj

we have

µ̂(y) = ∑_{j=1}^∞ αj · exp(ı〈xj, y〉).


For instance, if µ = π(λ) is the Poisson distribution with parameter λ > 0, then

µ̂(y) = exp(−λ) · ∑_{j=0}^∞ (λ^j/j!) · exp(ıjy) = exp(−λ) · exp(λ · exp(ıy)) = exp(λ · (exp(ıy) − 1)).

Definition 4. Let

fk(x) = (2π)^{−k/2} · exp(−‖x‖²/2), x ∈ Rk.

The probability measure fk · λk is called the k-dimensional standard normal distribution.

Example 5. Let µ = f · λk with any probability density f : Rk → [0,∞[ with respect to the k-dimensional Lebesgue measure λk. Then

µ̂(y) = ∫ exp(ı〈x, y〉) · f(x) λk(dx).

For any λk-integrable function f, the right-hand side defines its Fourier transform, see also Functional Analysis. For instance, if µ is the k-dimensional standard normal distribution, then

µ̂(y) = exp(−‖y‖²/2).

Proof: For k = 1 the statement follows from

µ̂′(y) = (2π)^{−1/2} · ∫ ıx · exp(ıxy) · exp(−x²/2) dx = −y · µ̂(y)

and µ̂(0) = 1. Use Fubini's Theorem for k > 1.

We study properties of the Fourier transformation µ ↦ µ̂, mapping M(Rk) to C(Rk).

The Range Problem

Theorem 6.

(i) µ̂ is uniformly continuous on Rk,

(ii) |µ̂(y)| ≤ 1 = µ̂(0) for y ∈ Rk,

(iii) for n ∈ N, a1, . . . , an ∈ C, and y1, . . . , yn ∈ Rk,

∑_{j,ℓ=1}^n aj · āℓ · µ̂(yj − yℓ) ≥ 0

(µ̂ is positive semi-definite).


Proof. Ad (i): Observe that

|exp(ı〈x, y1〉) − exp(ı〈x, y2〉)| ≤ ‖x‖ · ‖y1 − y2‖. (1)

For ε > 0 take r > 0 such that µ(B) ≥ 1 − ε, where B = {x ∈ Rk : ‖x‖ ≤ r}. Then

|µ̂(y1) − µ̂(y2)| ≤ ∫_B |exp(ı〈x, y1〉) − exp(ı〈x, y2〉)| µ(dx) + 2ε ≤ r · ‖y1 − y2‖ + 2ε.

Properties (ii) and (iii) are easily verified.

Remark 7. Bochner’s Theorem states that every continuous, positive semi-definite

function ϕ : Rk → C with ϕ(0) = 1 is the Fourier transform of a probability measure

on (Rk,Bk). See Bauer (1996, p. 184) for references.

Convolutions

In the sequel: X, Y, . . . are k-dimensional random vectors on a probability space

(Ω,A, P ).

Definition 8. The characteristic function of X is given by

ϕX = P̂X.

Remark 9. Due to Theorem A.5.12,

ϕX(y) = ∫_{Rk} exp(ı〈x, y〉) PX(dx) = ∫_Ω exp(ı〈X(ω), y〉) P(dω).

Theorem 10.

(i) For every linear mapping T : Rk → Rℓ,

ϕTX = ϕX ∘ T⊤.

(ii) For independent random vectors X and Y,

ϕX+Y = ϕX · ϕY.

In particular, for a ∈ Rk,

ϕX+a = exp(ı〈a, ·〉) · ϕX.

Proof. Ad (i): Let z ∈ Rℓ. Use PTX = T(PX) to obtain

ϕTX(z) = ∫_{Rk} exp(ı〈T(x), z〉) PX(dx) = ϕX(T⊤(z)).

Ad (ii): Let z ∈ Rk. Fubini's Theorem and Theorem II.6.18 imply

ϕX+Y(z) = ∫_{R2k} exp(ı〈x + y, z〉) P(X,Y)(d(x, y)) = ϕX(z) · ϕY(z).


Corollary 11 (Convolution Theorem). For probability measures µj ∈ M(R),

(µ1 ∗ µ2)ˆ = µ̂1 · µ̂2.

Proof. Use Theorem 10.(ii) and Theorem II.6.23.

Example 12. For µ = N(m, σ²) with σ ≥ 0 and m ∈ R,

µ̂(y) = exp(ımy) · exp(−σ²y²/2).

See Examples 3 and 5 and Theorem 10.

The Uniqueness Theorem

Theorem 13 (Uniqueness Theorem). For probability measures µj ∈ M(Rk),

µ̂1 = µ̂2 ⇒ µ1 = µ2.

Proof. See Bauer (1996, Thm. 23.4) or Billingsley (1979, Sec. 29) for the case k > 1.

Here: the case k = 1. Let T > 0 and a < b. Consider the bounded function

fa,b(y) = (1/(2πıy)) · (exp(−ıay) − exp(−ıby)), y ∈ R \ {0},

and put

Ia,b,T = ∫_{[−T,T]} fa,b(y) · µ̂(y) λ1(dy).

We show that

lim_{T→∞} Ia,b,T = µ(]a, b[) + (µ({a}) + µ({b}))/2. (2)

Fubini's Theorem yields

Ia,b,T = ∫_R ∫_{[−T,T]} fa,b(y) · exp(ıxy) λ1(dy) µ(dx).

Let (sinus integralis)

Si(x) = ∫_0^x (sin(y)/y) dy

for x ≥ 0. It is well known that lim_{x→∞} Si(x) = π/2. Note that

∫_{[−T,T]} fa,b(y) · exp(ıxy) λ1(dy) = ∫_{−T}^T (1/(2πıy)) · (exp(ıy(x−a)) − exp(ıy(x−b))) dy
= ∫_0^T (1/(πy)) · (sin(y(x−a)) − sin(y(x−b))) dy
= ga,b,T(x)/π,

where

ga,b,T(x) = sgn(x−a) · Si(T|x−a|) − sgn(x−b) · Si(T|x−b|).


We conclude that

Ia,b,T = ∫_R (ga,b,T(x)/π) µ(dx).

Use

lim_{T→∞} ga,b,T(x) = 0 if x < a or x > b,  π if a < x < b,  π/2 if x = a or x = b,

to obtain (2).

According to (2), µ̂ determines the distribution function of µ. Apply Theorem II.1.10.

Example 14. For independent random variables X1 and X2 with Xj ∼ π(λj) we have X1 + X2 ∼ π(λ1 + λ2).

Proof: Theorem 10 and Example 3 yield

ϕX1+X2(y) = exp(λ1 · (exp(ıy) − 1)) · exp(λ2 · (exp(ıy) − 1)) = exp((λ1 + λ2) · (exp(ıy) − 1)).

Use Theorem 13.

Remark 15. Suppose that µ̂ is λ1-integrable. Then µ = f · λ1 with a continuous density f, which is given by

f(x) = (2π)^{−1} · ∫ exp(−ıxy) · µ̂(y) λ1(dy), x ∈ R.

Proof: Employ (2), see Billingsley (1979, p. 347).

The Continuity Theorem

Lemma 16. For every ε > 0 and every probability measure µ ∈ M(R),

µ({x ∈ R : |x| ≥ 1/ε}) ≤ (7/ε) · ∫_0^ε (1 − ℜµ̂(y)) dy.

Proof. Clearly

ℜµ̂(y) = ∫_R cos(xy) µ(dx).

Hence, with the convention sin(0)/0 = 1,

(1/ε) · ∫_0^ε (1 − ℜµ̂(y)) dy = (1/ε) · ∫_{[0,ε]} ∫_R (1 − cos(xy)) µ(dx) λ1(dy)
= ∫_R ((1/ε) · ∫_0^ε (1 − cos(xy)) dy) µ(dx)
= ∫_R (1 − sin(εx)/(εx)) µ(dx)
≥ inf_{|z|≥1} (1 − sin(z)/z) · µ({x ∈ R : |εx| ≥ 1}).

Finally,

inf_{|z|≥1} (1 − sin(z)/z) ≥ 1/7.

Theorem 17 (Continuity Theorem, Lévy).

(i) Let µ, µn ∈ M(Rk) for n ∈ N. Then

µn w→ µ ⇒ ∀ y ∈ Rk : lim_{n→∞} µ̂n(y) = µ̂(y).

(ii) Let µn ∈ M(Rk) for n ∈ N, and let ϕ : Rk → C be continuous at 0. Then

∀ y ∈ Rk : lim_{n→∞} µ̂n(y) = ϕ(y) ⇒ ∃µ ∈ M(Rk) : µ̂ = ϕ ∧ µn w→ µ.

Proof. Ad (i): Note that x ↦ exp(ı〈x, y〉) is bounded and continuous on Rk.

Ad (ii): See Bauer (1996, Thm. 23.8) or Billingsley (1979, Sec. 29) for the case k > 1. Here: the case k = 1.

We first show that

{µn : n ∈ N} is tight. (3)

By Lemma 16,

µn({x ∈ R : |x| ≥ 1/ε}) ≤ cn(ε) with cn(ε) = (7/ε) · ∫_0^ε (1 − ℜµ̂n(y)) dy.

Note that

c(ε) = (7/ε) · ∫_0^ε (1 − ℜϕ(y)) dy

is well-defined, at least for sufficiently small ε > 0, and the dominated convergence theorem yields lim_{n→∞} cn(ε) = c(ε).

Let δ > 0, and note that ϕ(0) = 1. By continuity of ϕ at 0 there exists ε > 0 such that c(ε) ≤ δ/2. Take n0 ∈ N such that |cn(ε) − c(ε)| ≤ δ/2 for every n ≥ n0. Hence, for n ≥ n0,

µn({x ∈ R : |x| ≥ 1/ε}) ≤ δ,

and hereby we get (3).

Thus, by Prohorov's Theorem,

{µn : n ∈ N} is relatively compact. (4)


We fix a probability measure µ ∈ M(R) such that µnk w→ µ for a suitable subsequence of (µn)n∈N. By assumption and (i), we get µ̂ = ϕ as well as the following fact:

if µnk w→ ν for any subsequence (µnk)k∈N, then ν = µ, (5)

see Theorem 13.

We claim that µn w→ µ. Due to Remarks II.2.8 and II.3.13.(ii) it suffices to show that every subsequence of (µn)n∈N contains a subsequence that converges weakly to µ. The latter property follows from (4) and (5).

Example 18. For the uniform distribution µn on [−n, n] we have

µ̂n(y) = 1 if y = 0, and µ̂n(y) = sin(ny)/(ny) otherwise.

Hence

lim_{n→∞} µ̂n(y) = 1 if y = 0, and = 0 otherwise,

but (µn)n∈N does not converge weakly.

Corollary 19. Weak convergence in M(Rk) is equivalent to pointwise convergence

of Fourier transforms.

Example 20. Let µn = B(n, pn) and assume that

lim_{n→∞} n · pn = λ > 0.

Then

µn w→ π(λ).

Proof: Übung 3.1 or 8.1.
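Numerically, the approximation is already visible for moderate n; a small check with n · pn = 2 (all values chosen for illustration):

import math

lam, n = 2.0, 1000
p = lam / n
for k in range(5):
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)      # B(n, p)({k})
    poisson = math.exp(-lam) * lam**k / math.factorial(k)  # pi(lam)({k})
    print(k, round(binom, 6), round(poisson, 6))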

5 The One-Dimensional Central Limit Theorem

Given: a triangular array of random variables Xnk, where n ∈ N and k ∈ {1, . . . , rn} with rn ∈ N.

Assumptions:

(i) Xnk ∈ L2 for every n ∈ N and k ∈ {1, . . . , rn},

(ii) (Xn1, . . . , Xnrn) independent for every n ∈ N.

Put

Sn = ∑_{k=1}^{rn} Xnk, sn² = Var(Sn).


Additional assumption:

(iii) sn² > 0 for every n ∈ N.

Normalization:

X*nk = (Xnk − E(Xnk))/sn, S*n = ∑_{k=1}^{rn} X*nk = (Sn − E(Sn))/sn.

Clearly

E(X*nk) = E(S*n) = 0 ∧ Var(X*nk) = Var(Xnk)/sn² ∧ Var(S*n) = 1.

Question: convergence in distribution of the sequence of standardized sums S*n?

For notational convenience: all random variables Xnk are defined on a common probability space (Ω,A, P).

Example 1. (Xk)k∈N i.i.d. with X1 ∈ L2 and Var(X1) = σ² > 0. Put m = E(X1), and take

rn = n, Xnk = Xk.

Then

S*n = (∑_{k=1}^n Xk − n·m)/(√n · σ).

Definition 2.

(i) Lyapunov condition:

∃ δ > 0: lim_{n→∞} ∑_{k=1}^{rn} E(|X*nk|^{2+δ}) = 0.

(ii) Lindeberg condition:

∀ ε > 0: lim_{n→∞} ∑_{k=1}^{rn} ∫_{{|X*nk|≥ε}} |X*nk|² dP = 0.

(iii) Feller condition:

lim_{n→∞} max_{1≤k≤rn} Var(X*nk) = 0.

(iv) The random variables Xnk are asymptotically negligible if

∀ ε > 0: lim_{n→∞} max_{1≤k≤rn} P(|X*nk| > ε) = 0.

Lemma 3. The conditions from Definition 2 satisfy

(i) ⇒ (ii) ⇒ (iii) ⇒ (iv).

Moreover, (iii) implies limn→∞ rn =∞.


Proof. From

∫_{{|X*nk|≥ε}} |X*nk|² dP ≤ (1/ε^δ) · ∫_{{|X*nk|≥ε}} |X*nk|^{2+δ} dP ≤ (1/ε^δ) · E(|X*nk|^{2+δ})

we get '(i) ⇒ (ii)'. From

Var(X*nk) ≤ ε² + ∫_{{|X*nk|≥ε}} |X*nk|² dP

we get '(ii) ⇒ (iii)'. The Chebyshev-Markov inequality yields '(iii) ⇒ (iv)'. Finally,

1 = Var(S*n) ≤ rn · max_{1≤k≤rn} Var(X*nk),

so that (iii) implies lim_{n→∞} rn = ∞.

Example 4. Example 1 continued. We take rn = n and

X*nk = (Xk − m)/(√n · σ)

to obtain

∑_{k=1}^n ∫_{{|X*nk|≥ε}} |X*nk|² dP = (1/σ²) · ∫_{{|X1−m|≥ε·√n·σ}} (X1 − m)² dP.

Hence the Lindeberg condition is satisfied.

In the sequel,

ϕnk = ϕX*nk

denotes the characteristic function of X*nk. Moreover,

τnk² = Var(X*nk).

Use (4.1) and Theorem A.5.10 to conclude that

ϕ′nk(0) = ı · E(X*nk) = 0 and ϕ″nk(0) = −E(|X*nk|²) = −τnk².

Lemma 5. For y ∈ R and ε > 0,

|ϕnk(y) − (1 − τnk²/2 · y²)| ≤ y² · (ε · |y| · τnk² + ∫_{{|X*nk|≥ε}} |X*nk|² dP).

Proof. For u ∈ R,

|exp(ıu) − (1 + ıu − u²/2)| ≤ min(u², |u|³/6),

see Billingsley (1979, Eqn. (26.4)). Hence

|ϕnk(y) − (1 − τnk²/2 · y²)|
= |E(exp(ı·X*nk·y)) − E(1 + ı·X*nk·y − |X*nk|²·y²/2)|
≤ E(min(y²·|X*nk|², |y|³·|X*nk|³))
≤ |y|³ · ∫_{{|X*nk|<ε}} ε·|X*nk|² dP + y² · ∫_{{|X*nk|≥ε}} |X*nk|² dP
≤ ε·|y|³·τnk² + y² · ∫_{{|X*nk|≥ε}} |X*nk|² dP.

Lemma 6. Put

∆n(y) = ∏_{k=1}^{rn} ϕnk(y) − exp(−y²/2), y ∈ R.

If the Lindeberg condition is satisfied, then

∀ y ∈ R : lim_{n→∞} ∆n(y) = 0.

Proof. Since |ϕnk(y)| ≤ 1 and |exp(−τnk²/2 · y²)| ≤ 1, and since ∑_{k=1}^{rn} τnk² = Var(S*n) = 1, we get

|∆n(y)| = |∏_{k=1}^{rn} ϕnk(y) − ∏_{k=1}^{rn} exp(−τnk²/2 · y²)| ≤ ∑_{k=1}^{rn} |ϕnk(y) − exp(−τnk²/2 · y²)|

by induction, see Billingsley (1979, Lemma 27.1).

We assume

max_{1≤k≤rn} τnk² · y² ≤ 1,

which holds for fixed y ∈ R if n is sufficiently large, see Lemma 3. Using

0 ≤ u ≤ 1/2 ⇒ |exp(−u) − (1 − u)| ≤ u²

and Lemma 5 we obtain

|∆n(y)| ≤ ∑_{k=1}^{rn} |ϕnk(y) − (1 − τnk²/2 · y²)| + ∑_{k=1}^{rn} τnk⁴/4 · y⁴
≤ y² · (ε · |y| + ∑_{k=1}^{rn} ∫_{{|X*nk|≥ε}} |X*nk|² dP) + y⁴/4 · max_{1≤k≤rn} τnk²

for every ε > 0. Thus Lemma 3 yields

lim sup_{n→∞} |∆n(y)| ≤ |y|³ · ε.


Theorem 7 (Central Limit Theorem). The following properties are equivalent:

(i) (X*nk)n,k satisfies the Lindeberg condition.

(ii) PS*n w→ N(0, 1) and (X*nk)n,k satisfies the Feller condition.

(iii) PS*n w→ N(0, 1) and the random variables X*nk are asymptotically negligible.

Proof. '(i) ⇒ (ii)': Due to Lemma 3 we only have to prove the weak convergence. Recall that µ̂(y) = exp(−y²/2) for the standard normal distribution µ. Consider the characteristic function ϕn = ϕS*n of S*n. By Theorem 4.10.(ii),

ϕn = ∏_{k=1}^{rn} ϕnk,

and therefore Lemma 6 implies

∀ y ∈ R : lim_{n→∞} ϕn(y) = µ̂(y).

It remains to apply Corollary 4.19.

Lemma 3 yields '(ii) ⇒ (iii)'. See Billingsley (1979, pp. 314–315) for the proof of '(iii) ⇒ (i)'.

In the sequel we consider an i.i.d. sequence (Xn)n∈N with X1 ∈ L2 and

σ2 = Var(X1) > 0,

and we put

m = E(X1),

cf. Example 4.

Corollary 8. We have

(∑_{k=1}^n Xk − n·m)/(√n · σ) d→ Z,

where Z ∼ N(0, 1).

Proof. Theorem 7 and Example 4.

Example 9. Example 4 continued, and Corollary 8 reformulated. Let

Φ(x) = (1/√(2π)) · ∫_{−∞}^x exp(−u²/2) du, x ∈ R,

denote the distribution function of the standard normal distribution, and let

δn = sup_{x∈R} |P(Sn ≤ m·n + x·σ·√n) − Φ(x)|. (1)

Due to the Central Limit Theorem and Theorem II.3.7 we have lim_{n→∞} δn = 0. Consequently, for

an = m·n + a·σ·√n, bn = m·n + b·σ·√n

with a < b,

lim_{n→∞} P(an ≤ Sn ≤ bn) = Φ(b) − Φ(a).
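A simulation sketch of this limit for exponentially distributed summands (so m = σ = 1; the values of a, b, n, and the number of repetitions are arbitrary illustrative choices):

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, reps, a, b = 400, 20_000, -1.0, 1.0
S = rng.exponential(1.0, size=(reps, n)).sum(axis=1)   # replications of S_n
an, bn = n + a * sqrt(n), n + b * sqrt(n)              # m = sigma = 1 here
Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
print(np.mean((S >= an) & (S <= bn)), Phi(b) - Phi(a))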


Theorem 10 (Berry-Esseen). Suppose, additionally, that X1 ∈ L3 and m = 0. For δn given by (1),

∀n ∈ N : δn ≤ (6 · E(|X1|³)/σ³) · (1/√n).

Proof. See Gänssler, Stute (1977, Section 4.2).

Example 11. Example 9 continued with

PX1 = (1/2) · (ε1 + ε−1). (2)

Since (−Xn)n∈N is i.i.d. as well, and since P−X1 = PX1, we have

P(S2n ≤ 0) = P(S2n ≥ 0),

which yields

P(S2n ≤ 0) = (1/2) · (1 + P(S2n = 0)).

From Example 1.11 we know that

P(S2n = 0) ≈ 1/√(πn),

and therefore

δ2n ≥ P(S2n ≤ 0) − 1/2 = (1/2) · P(S2n = 0) ≈ 1/(2√(πn)).

Hence the upper bound from Theorem 10 cannot be improved in terms of powers of n.

Example 12. Example 9 continued in the case m = 0. Let

Bc = lim sup_{n→∞} {Sn/√n ≥ c}, c > 0.

Using Remark 1.7.(ii) we get

P(Bc) ≥ P(lim sup_{n→∞} {Sn/√n > c}) ≥ lim sup_{n→∞} P(Sn/√n > c) = 1 − Φ(c/σ) > 0.

Kolmogorov's Zero-One Law yields P(Bc) = 1, and therefore

P(lim sup_{n→∞} Sn/√n = ∞) = P(⋂_{c∈N} Bc) = 1.

By symmetry,

P(lim inf_{n→∞} Sn/√n = −∞) = 1.

In particular, for PX1 given by (2),

P(lim sup_{n→∞} {Sn = 0}) = 1,

see also Example 1.11.


6 The Law of the Iterated Logarithm

Given: an i.i.d. sequence (Xn)n∈N of random variables on (Ω,A, P) such that

X1 ∈ L2 ∧ E(X1) = 0 ∧ Var(X1) = σ² > 0.

Remark 1. For every ε > 0, with probability one,

lim_{n→∞} Sn/(√n · (log n)^{1/2+ε}) = 0,

see Remark 2.8. On the other hand, with probability one,

lim sup_{n→∞} Sn/√n = ∞ ∧ lim inf_{n→∞} Sn/√n = −∞,

see Example 5.12.

Question: precise description of the fluctuation of (Sn(ω))n∈N for P-almost every ω? In particular: existence of a deterministic sequence (γ(n))n∈N of positive reals such that, with probability one,

lim sup_{n→∞} Sn/γ(n) = 1 ∧ lim inf_{n→∞} Sn/γ(n) = −1?

Notation: L((un)n∈N) is the set of all limit points in R of a sequence (un)n∈N in R. Let

γ(n) = √(2n · ln(ln n) · σ²), n ≥ 3.

Theorem 2 (Strassen's Law of the Iterated Logarithm). With probability one,

L((Sn/γ(n))n∈N) = [−1, 1].

Proof. See Bauer (1996, §33).

Corollary 3 (Hartman and Wintner's Law of the Iterated Logarithm). With probability one,

lim sup_{n→∞} Sn/γ(n) = 1 ∧ lim inf_{n→∞} Sn/γ(n) = −1.
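For a single path of the simple random walk (σ² = 1) one can watch Sn/γ(n); over any finite horizon the running extremes stay of order ±1, a heuristic check of the law of the iterated logarithm (illustration only; seed and horizon are arbitrary):

import numpy as np

rng = np.random.default_rng(11)
N = 1_000_000
S = np.cumsum(rng.choice([-1.0, 1.0], size=N))   # random walk path
k = np.arange(3, N + 1)
gamma = np.sqrt(2 * k * np.log(np.log(k)))       # gamma(n) with sigma^2 = 1
ratio = S[2:] / gamma                            # S_n / gamma(n) for n >= 3
print(ratio.max(), ratio.min())                  # of order +1 and -1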

7 The Multi-dimensional Central Limit Theorem

Multi-dimensional Normal Distributions

In the sequel, νk = fk · λk denotes the k-dimensional standard normal distribution, see Definition 4.4. We consider a k-dimensional random vector

X = (X1, . . . , Xk)⊤.


Theorem 1. X ∼ νk iff (X1, . . . , Xk) is i.i.d. and X1 ∼ N(0, 1).

Proof. '⇒': Take Aj ∈ B and apply Fubini's Theorem to obtain

P(X ∈ A1 × · · · × Ak) = ∏_{j=1}^k ∫_{Aj} f1(xj) λ1(dxj).

Choose Ai = R for i ≠ j to conclude that PXj = f1 · λ1 = N(0, 1), and therefore

P(⋂_{j=1}^k {Xj ∈ Aj}) = ∏_{j=1}^k P(Xj ∈ Aj).

'⇐': By Theorem II.6.18 we have PX = ⨯_{j=1}^k N(0, 1), and Fubini's Theorem yields

P(X ∈ A1 × · · · × Ak) = ∏_{j=1}^k P(Xj ∈ Aj) = (fk · λk)(A1 × · · · × Ak)

for every choice of Aj ∈ B. It remains to apply Theorem A.1.5.

See Übung 9.1 for a generalization of Theorem 1.

Theorem 2. For probability measures µ1, µ2 on Bk we have µ1 = µ2 iff

Tµ1 = Tµ2

for every linear mapping T : Rk → R.

Proof. Let y ∈ Rk, and put Tx = 〈x, y〉. Then T⊤z = z · y, and for µ ∈ {µ1, µ2},

µ̂(y) = µ̂(T⊤1) = (Tµ)ˆ(1),

see Theorem 4.10.(i). It remains to apply Theorem 4.13.

Recall that N(b, 0) = εb for any b ∈ R.

Definition 3. A probability measure µ on Bk is called a k-dimensional normal distribution if Tµ is a one-dimensional normal distribution for every linear mapping T : Rk → R. Furthermore, X is normally distributed if PX is a k-dimensional normal distribution.

Example 4. Let X ∼ νk, and take a = (a1, . . . , ak) ∈ R^{1×k}. Use Theorem 1 to conclude that (a1X1, . . . , akXk) is independent and ajXj ∼ N(0, aj²). By Example II.6.26.(i),

aX ∼ N(0, aa⊤),

so that νk is a k-dimensional normal distribution.

More generally, for X normally distributed, L ∈ R^{ℓ×k}, and b ∈ R^ℓ, the random vector LX + b is normally distributed, too. We shall see that every normal distribution arises in this way with X ∼ νk.


Multi-dimensional Expectations and Covariance Matrices

In the sequel, we assume that X1, . . . , Xk ∈ L2.

The notions of expectation and variance are extended to random vectors as follows.

Definition 5. The expectation and the covariance matrix of X are defined by

E(X) = (E(X1), . . . , E(Xk))⊤ ∈ Rk

and

Cov(X) = (Cov(Xi, Xj))_{1≤i,j≤k} ∈ R^{k×k},

respectively, where

Cov(Xi, Xj) = E((Xi − E(Xi)) · (Xj − E(Xj))).

The mean and the covariance matrix of a probability measure µ on Bk are defined by E(X) and Cov(X), respectively, for the random vector X = idRk on the probability space (Rk,Bk, µ).

Let Ek denote the identity matrix in Rk×k.

Example 6. If X ∼ νk, then E(X) = 0 and Cov(X) = Ek by Theorems II.6.19 and 1.

Lemma 7. For L ∈ R^{ℓ×k} and b ∈ R^ℓ,

E(LX + b) = L·E(X) + b and Cov(LX + b) = L·Cov(X)·L⊤.

Proof. See Irle (2001, Abschn. 9.10).

Remark 8. Covariance matrices are symmetric and non-negative definite. To verify the latter, let a ∈ R^{1×k} and observe

0 ≤ Var(aX) = a·Cov(X)·a⊤.

Theorem 9.

(i) For every b ∈ Rk and every symmetric, non-negative definite matrix K ∈ R^{k×k} there exists a k-dimensional normal distribution with mean b and covariance matrix K.

(ii) A normal distribution is uniquely determined by its mean and its covariance matrix.


Proof. Ad (i): See Example 6 in the case K = Ek and b = 0. In the general case, we get the factorization

K = LL⊤ (1)

with L ∈ R^{k×k} by diagonalization of K. Then Y = LX + b is normally distributed with mean b and covariance matrix K, if X ∼ νk.

Ad (ii): Let µ be a k-dimensional normal distribution with mean b and covariance matrix K, and let Tx = ax with a ∈ R^{1×k}. Then Tµ = N(ab, aKa⊤) by Lemma 7. Apply Theorem 2.

Notation: the k-dimensional normal distribution with mean b ∈ Rk and covariance

matrix K ∈ Rk×k is denoted by N(b,K).

Remark 10. If K is symmetric and positive definite, then the Cholesky decomposition yields (1) with a lower triangular matrix L.
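This is also how one samples from N(b, K) in practice: with K = LL⊤, take Y = LX + b for X ∼ νk. A NumPy sketch (the concrete K and b are arbitrary choices):

import numpy as np

rng = np.random.default_rng(5)
K = np.array([[4.0, 1.0], [1.0, 2.0]])     # symmetric, positive definite
b = np.array([1.0, -1.0])
L = np.linalg.cholesky(K)                  # lower triangular, K = L @ L.T
X = rng.standard_normal((100_000, 2))      # rows are samples from nu_2
Y = X @ L.T + b                            # samples from N(b, K)
print(Y.mean(axis=0))                      # ~ b
print(np.cov(Y.T))                         # ~ K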

Example 11. Let K ∈ R^{k×k} be symmetric and non-negative definite. Take L ∈ R^{k×k} with (1), and assume that X ∼ νk.

Assume that det K > 0. For A ∈ Bk,

N(0, K)(A) = P(LX ∈ A) = νk(L^{−1}A) = (2π)^{−k/2} · ∫_{L^{−1}A} exp(−‖x‖²/2) λk(dx) = ∫_A gk(x; K) λk(dx)

with

gk(x; K) = ((2π)^k · det K)^{−1/2} · exp(−x⊤K^{−1}x/2).

It follows that N(0, K) = gk(· ; K) · λk.

There exists an orthonormal basis e1, . . . , ek of Rk and c1, . . . , ck > 0 with Kei = ci·ei. Hence

x⊤K^{−1}x = ∑_{i=1}^k 〈x, ei〉²/ci

for x ∈ Rk, which yields

{x ∈ Rk : x⊤K^{−1}x = r²} = {∑_{i=1}^k ai·√ci·ei : ∑_{i=1}^k ai² = r²}

for every r ≥ 0.

Assume that det K = 0. Then Ke = 0 for some e ∈ Rk \ {0}, which implies e⊤LX ∼ ε0. Put A = {x ∈ Rk : e⊤x = 0}. It follows that N(0, K)(A) = 1, while λk(A) = 0. Hence there exists no Lebesgue density for N(0, K).

A Multi-dimensional Central Limit Theorem

Lemma 12. For Y ∼ N(b, K),

ϕY(y) = exp(ı〈y, b〉) · exp(−y⊤Ky/2).


Proof. With the notation from the previous proof,

ϕLX(y) = ϕX(L⊤y) = exp(−‖L⊤y‖²/2) = exp(−y⊤Ky/2),

see Example 4.5, and furthermore ϕY(y) = exp(ı〈y, b〉) · ϕLX(y).

A variant of the multi-dimensional Central Limit Theorem reads as follows.

Theorem 13. Let (Xn)n∈N be an independent sequence of identically distributed k-dimensional random vectors with components in L2. Then

(1/√n) · (∑_{k=1}^n Xk − n · E(X1)) d→ Z,

where Z ∼ N(0, Cov(X1)).

Proof. Assume E(X1) = 0 without loss of generality, and put S′n = (1/√n) · ∑_{k=1}^n Xk as well as K = Cov(X1). Let y ∈ Rk, and put T(x) = 〈x, y〉. Then

ϕT(S′n)(1) = ϕS′n(y).

The real-valued random variables T(Xk) form an i.i.d. sequence with zero mean and variance y⊤Ky. By Corollary 5.8 and Lemma 12,

lim_{n→∞} ϕT(S′n)(1) = exp(−y⊤Ky/2) = ϕZ(y).

Apply Theorem 4.17.

Example 14. Let the assumptions from Theorem 13 be satisfied. Put Sn = ∑_{k=1}^n Xk and K = Cov(X1). If N(0, K)(∂A) = 0, then

lim_{n→∞} P(Sn ∈ √n · A + n · E(X1)) = N(0, K)(A).

We add that

N(0, K)(∂A) = 0 ⇔ λk(∂A) = 0

if det K > 0.


Chapter IV

Brownian Motion

We construct a Brownian motion by means of an infinite-dimensional counterpart to

the Central Limit Theorem. We refer to Klenke (2013, p. 467) for the following quote on Brownian motion: 'Ohne Übertreibung kann man sagen, daß dies das zentrale Objekt der Stochastik ist' (without exaggeration one can say that this is the central object of probability theory).

1 Stochastic Processes

Given: a probability space (Ω,A, P ), a set I 6= ∅, and a measurable space (Ω′,A′).

Definition 1. A mapping

X : I × Ω→ Ω′

such that

∀ t ∈ I : X(t, ·) A-A′-measurable

is called a stochastic process with state space (Ω′,A′) and parameter set I. The

mappings

X(·, ω) : I → Ω′

with ω ∈ Ω are called the realizations (paths, trajectories) of X.

Remark 2. Equivalently, a stochastic process is a family of random elements with

a common state space on a common probability space. Frequently used notation:

X = (Xt)t∈I with

Xt = X(t, ·) : Ω→ Ω′.

In the sequel, X = (Xt)t∈I denotes a stochastic process on (Ω,A, P ) with state

space (Ω′,A′). Note that X may be identified with the mapping

Ψ : Ω → (Ω′)^I, given by Ψ(ω) = X(·, ω),

which is A-⊗_{t∈I}A′-measurable, since πs(Ψ(ω)) = Xs(ω).

Later on we will denote Ψ by X as well, which is consistent with the notation from

Remark II.6.15. See also Definition II.6.16.


Definition 3.

(i) The measure PΨ on ⊗_{t∈I}A′ is called the distribution of X. For m ∈ N and t1, . . . , tm ∈ I the joint distribution of Xt1, . . . , Xtm is called a finite-dimensional marginal distribution of X.

(ii) The σ-algebra generated by X is defined by

FX = σ(Xs : s ∈ I).

Lemma 4. We have

FX = σ(Ψ).

Proof. For every σ-algebra A1 in Ω,

Ψ is A1-⊗_{t∈I}A′-measurable ⇔ ∀ s ∈ I : Xs is A1-A′-measurable ⇔ FX ⊂ A1.

Definition 5. A family (X^(j))j∈J of stochastic processes on a common probability space is independent if (F^{X^(j)})j∈J is independent.

In this lecture we study the case

I ⊂ [0,∞[,

and typically the elements t ∈ I are considered as time instances. In the limit theorems from the previous chapter we have I = N and (Ω′,A′) = (Rk,Bk), with k = 1 most often.

Definition 6. The σ-algebra generated by X up to time t ∈ I is defined by

FXt = σ(Xs : 0 ≤ s ≤ t).

Lemma 7. We have

FXt = σ(Ψt),

where Ψt(ω) = Ψ(ω)|[0,t].

Proof. Apply Lemma 4 to the restriction of X onto [0, t]× Ω.

Processes with Continuous Paths

In the sequel,

I = [0,∞[

and

(Ω′,A′) = (Rk,Bk).

Definition 8. X is a stochastic process with continuous paths if X(·, ω) ∈ C(I,Rk) for every ω ∈ Ω.


For simplicity we focus on the case k = 1; the general case k ∈ N may be treated

analogously. See also Remark 18 below.

Example 9. Let X have continuous paths. Then

{sup_{s∈[0,t]} |Xs| ≤ c} = ⋂_{s∈[0,t]∩Q} {|Xs| ≤ c} ∈ FXt.

We discuss σ-algebras and probability measures on the path space

M = C(I,R).

Put

ρ(f, g) = ∑_{T=1}^∞ (1/2^T) · sup_{t∈[0,T]} min(|f(t) − g(t)|, 1), f, g ∈ M.

Remark 10. For f, fn ∈ M we have limn→∞ ρ(f, fn) = 0 iff (fn)n∈N converges uni-

formly to f on every compact subset of I.

Theorem 11. (M,ρ) is a complete separable metric space.

Proof. Employ the respective properties of the spaces C([0, T ],R), equipped with the

supremum norm.

We consider the Borel σ-algebra B(M) on M as well as the Borel σ-algebra Bm on Rm. For 0 ≤ t1 < · · · < tm we define κt1,...,tm : M → Rm by

κt1,...,tm(f) = (f(t1), . . . , f(tm)).

Theorem 12.

B(M) = σ(κt : t ∈ I).

Proof. Let G = σ(κt : t ∈ I). Since κt is continuous for every t ∈ I, it follows that G ⊂ B(M).

Let B(f, r) = {g ∈ M : ρ(f, g) < r} for f ∈ M and r > 0. Note that

sup_{t∈[0,T]} min(|f(t) − g(t)|, 1) = sup_{t∈[0,T]∩Q} min(|f(t) − g(t)|, 1)

for f, g ∈ M, which implies the G-B-measurability of ρ(f, ·). It follows that B(f, r) ∈ G, i.e., G contains all open balls. Let M0 be a countable dense subset of M. Consider an open set ∅ ≠ A ⊂ M. There exist mappings

c : A → M0, r : A → ]0,∞[ ∩ Q

such that

f ∈ B(c(f), r(f)) ⊂ A

for all f ∈ A. It follows that

A = ⋃_{f∈A} B(c(f), r(f)),

where the right-hand side actually is a countable union of open balls. This implies A ∈ G, and therefore B(M) ⊂ G.


Example 13. We have {f} ∈ B(M) for every f ∈ M. Moreover,

{f ∈ M : sup_{t∈[0,∞[} |f(t)|/(1 + t) < ∞} ∈ B(M).

Corollary 14. Let X be a process with continuous paths. Then Φ : Ω → M, defined by

Φ(ω) = X(·, ω),

is A-B(M)-measurable. Moreover,

FX = σ(Φ),

and the analogous statement holds for FXt and Φt(ω) = Φ(ω)|[0,t].

Proof. Note that κt(Φ(ω)) = Xt(ω). Apply Theorem 12, and consider (M,B(M)) instead of ((Ω′)^I, ⊗_{t∈I}A′) as well as Φ instead of Ψ in the proofs of Lemma 4 and Lemma 7.

Later on we will also denote Φ by X.

Remark 15. Consider the measurable space (Ω2,A2) = (R^I, ⊗_{t∈I}B), and the embedding ı : C(I,R) → R^I. We claim that B(M) = σ(ı).

Proof: Observe that κt = πt ∘ ı. Use Theorem 11 to conclude that ı is B(M)-A2-measurable. Conversely, Theorem A.10.1 implies that κt is σ(ı)-B-measurable.

It follows that B(M) is the trace of A2 in M, i.e.,

B(M) = {A ∩ M : A ∈ A2}.

Therefore any set of the form κ^{−1}_{t1,...,tm}(B1 × · · · × Bm) with Bi ∈ B is called a measurable rectangle in M, and any set of the form κ^{−1}_{t1,...,tm}(B) with B ∈ Bm is called a cylinder set in M, cf. Section A.8.

Definition 16. Let X be a process with continuous paths. Then PΦ is called the distribution of X (on B(M)).

Theorem 17. For µ1, µ2 ∈ M(M) we have µ1 = µ2 iff

κt1,...,tm µ1 = κt1,...,tm µ2

for all m ∈ N and 0 ≤ t1 < · · · < tm.

Proof. By Theorem 12 the class of measurable rectangles is a generator of B(M), which is closed with respect to intersections. Apply Theorem A.1.5.

Remark 18. Let k ∈ N. Consider a product metric on M^k, and observe that C(I,Rk) may be identified with M^k in a canonical way. We have B(C(I,Rk)) = ⊗_{j=1}^k B(M), see Theorem 11 and Theorem A.8.8.


Canonical Processes

Let Ω = C(I,Rk) and A = B(Ω) according to Remark 18.

Definition 19. The canonical process X = (Xt)t∈I on (Ω,B(Ω), µ) with any µ ∈ M(Ω) is defined by

Xt(f) = f(t), t ∈ I, f ∈ Ω.

Remark 20. Let X denote the canonical process on (Ω,B(Ω), µ). By definition, X(·, f) = f, so that X has continuous paths and µ is the distribution of X.

2 Donsker’s Invariance Principle

We present an infinite-dimensional central limit theorem, which deals with scaling

limits of random walks.

Given: an i.i.d. sequence (ξj)j∈N of square integrable random variables on a probability space (Ω,A, P) such that

E(ξ1) = 0, E(ξ1²) = σ² > 0.

Put In = {ℓ/n : ℓ ∈ N0} as well as

X^[n]_{ℓ/n} = (1/(σ√n)) · ∑_{j=1}^ℓ ξj, ℓ ∈ N0.

Terminology: a random variable Y on (Ω,A, P ) and a σ-algebra G ⊂ A are called

independent if (σ(Y ),G) is independent.

Remark 1. Let s, t ∈ In with s < t. Use

X^[n]_t − X^[n]_s = (1/(σ√n)) · ∑_{j=sn+1}^{tn} ξj

to conclude that

(i) E(X^[n]_t − X^[n]_s) = 0,

(ii) Var(X^[n]_t − X^[n]_s) = t − s,

(iii) X^[n]_t − X^[n]_s is independent of F^{X^[n]}_s.

According to Example III.5.12, lim sup_{n→∞} X^[n]_1 = ∞ and lim inf_{n→∞} X^[n]_1 = −∞ with probability one. However, X^[n]_1 d→ Z ∼ N(0, 1) due to the Central Limit Theorem.


Theorem 2. Fix m, n0 ∈ N, and let µ^[n]_{m,n0} denote the distribution of

X^[n]_{m,n0} = (X^[n·n0]_{1/n0}, . . . , X^[n·n0]_{m/n0})⊤.

Then

µ^[n]_{m,n0} w→ N(0, K)

as n tends to infinity, where

K = (min(ℓ/n0, k/n0))_{1≤ℓ,k≤m}.

Proof. Put

∆^[n]_ℓ = √n0 · (X^[n·n0]_{ℓ/n0} − X^[n·n0]_{(ℓ−1)/n0})

for ℓ = 1, . . . , m. Since

∆^[n]_ℓ = (1/(σ√n)) · ∑_{j=1}^n ξ(ℓ−1)·n+j,

we get P∆^[n]_ℓ w→ N(0, 1), see Corollary III.5.8. Furthermore, we have independence of

∆^[n] = (∆^[n]_1, . . . , ∆^[n]_m),

and therefore

P∆^[n] w→ N(0, Em)

follows from Übung 8.1.b) and Theorem III.7.1.

Note that

X^[n]_{m,n0} = (1/√n0) · L∆^[n]

with the lower triangular matrix

L = (1{i ≥ j})_{1≤i,j≤m} ∈ R^{m×m}, (1)

i.e., all entries on and below the diagonal equal 1. Moreover,

(1/√n0) · LZ ∼ N(0, K)

if Z ∼ N(0, Em), see Example III.7.4 and Lemma III.7.7. Use Lemma II.3.6 to conclude that µ^[n]_{m,n0} w→ N(0, K).

Let I = [0,∞[. We consider M = C(I,R), equipped with the metric of locally uniform convergence, see p. 67, and the corresponding Borel σ-algebra B(M).

We extend (X^[n]_t)_{t∈In} to a process (X^[n]_t)_{t∈I} with continuous paths by piecewise linear interpolation, i.e.,

X^[n]_t(ω) = (t − ℓ/n) · n · X^[n]_{(ℓ+1)/n}(ω) + ((ℓ+1)/n − t) · n · X^[n]_{ℓ/n}(ω)

for t ∈ [ℓ/n, (ℓ+1)/n] and every ω ∈ Ω. We study the distribution µ^[n] of X^[n], which depends on n and Pξ1.
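Generating the grid values of X^[n] is a one-liner; plotting (t, X) with linear interpolation displays the path. A sketch with Rademacher steps (the step distribution and n are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(9)
n, sigma = 10_000, 1.0
xi = rng.choice([-1.0, 1.0], size=n)        # i.i.d. steps with mean 0, variance 1
t = np.arange(n + 1) / n                    # grid points l/n in [0, 1]
X = np.concatenate([[0.0], np.cumsum(xi)]) / (sigma * np.sqrt(n))
# (t, X) are the grid values of X^[n]; in between the path is piecewise linear
print(X[-1])                                # X^[n]_1, approximately N(0, 1)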


Theorem 3 (Donsker). There exists a probability measure µ* on B(M) such that

µ^[n] w→ µ*

for every choice of Pξ1. Moreover,

κµ* = N(0, Kκ)

for every κ = κt1,...,tm with m ∈ N and 0 ≤ t1 < · · · < tm, where

Kκ = (min(ti, tj))_{1≤i,j≤m}. (2)

Sketch of the proof. For details, see Klenke (2013, Abschn. 21.8). The proof consists of two major steps:

1. Show that {µ^[n] : n ∈ N} is tight.

2. Show that κµ^[n] w→ N(0, Kκ) for every κ = κt1,...,tm.

In the proof of 1.) one considers the modulus of continuity of f ∈ M on [0, T], which is defined by

mT(f; δ) = sup{|f(t) − f(s)| : s, t ∈ [0, T], |t − s| ≤ δ}.

According to the Arzelà-Ascoli Theorem, K ⊂ M is relatively compact iff

sup_{f∈K} |f(0)| < ∞ and lim_{δ→0} sup_{f∈K} mT(f; δ) = 0

for every T > 0. Remark 1.(ii) provides a first indication of tightness. See Theorem 2 for a particular case (along a subsequence) of the convergence in 2.).

By 1.) and Prohorov's Theorem, {µ^[n] : n ∈ N} is relatively compact. We fix a convergent subsequence and let µ* denote its limit. We claim that µ^[n] w→ µ*. Due to Remarks II.2.8 and II.3.13.(iii) it suffices to show that every subsequence (µ^[nk])k∈N of (µ^[n])n∈N contains a subsequence (µ^[nkℓ])ℓ∈N that converges weakly to µ*. Relative compactness yields the existence of a convergent sub-subsequence, and µ^[nkℓ] w→ ν implies κν = κµ* for every κ according to 2.), see Lemma II.3.6. Use Theorem 1.17 to conclude that ν = µ*.

Definition 4. The measure µ∗ according to Theorem 3 is called the Wiener measure

on B(M).


3 The Brownian Motion

In the sequel, I = [0,∞[ and X = (Xt)t∈I denotes a stochastic process on (Ω,A, P )

with state space (Rk,Bk).

Definition 1. X is called a k-dimensional Brownian motion (Wiener process), if

(i) X has continuous paths,

(ii) X0 = 0 a.s.,

(iii) Xt −Xs with 0 ≤ s < t is

(a) independent of FXs ,

(b) N(0, (t− s) · Ek)-distributed.

We address the questions of existence and uniqueness of a Brownian motion.

Definition 2. X has

(i) independent increments, if

Xt1 − Xt0, . . . , Xtm − Xtm−1

is independent for all m ≥ 2 and 0 = t0 < · · · < tm,

(ii) stationary increments, if the distributions of Xt − Xs and Xt−s − X0 coincide for all 0 ≤ s < t.

Lemma 3. Assume that X0 is constant P-a.s. Then

X has independent increments ⇔ ∀ 0 ≤ s < t : Xt − Xs is independent of FXs.

Proof. '⇐': Follows inductively. '⇒': Fix s < t and put

C = ⋃_{m∈N, 0=s0<···<sm=s} σ(Xs0, . . . , Xsm).

Since C is closed w.r.t. ∩ and σ(C) = FXs, it suffices to show that C and σ(Xt − Xs) are independent, see Theorem II.6.5. By assumption, for 0 = s0 < · · · < sm = s,

Xs0, Xs1 − Xs0, . . . , Xsm − Xsm−1, Xt − Xs independent.

Moreover,

σ(Xs0, Xs1 − Xs0, . . . , Xsm − Xsm−1) = σ(Xs0, Xs1, . . . , Xsm).

Use Corollary II.6.6 to conclude that σ(Xs0, Xs1, . . . , Xsm) and σ(Xt − Xs) are independent, which yields the independence of C and σ(Xt − Xs), as claimed.


One-dimensional Brownian Motion

Theorem 4. Let k = 1 and let X have continuous paths. Then the following properties are equivalent:

(i) X is a one-dimensional Brownian motion,

(ii) the distribution of X is the Wiener measure,

(iii) every finite-dimensional marginal distribution of X is a normal distribution and

E(Xt) = 0, Cov(Xs, Xt) = min(s, t)

for s, t ∈ I.

Let

Y = (Xt1 , . . . , Xtm)>

and

Z = (Xt1 −Xt0 , . . . , Xtm −Xtm−1)>

for 0 = t0 ≤ t1 < · · · < tm. Moreover, let

D = diag(t1, t2 − t1, . . . , tm − tm−1) ∈ Rm×m,

L be given by (2.1), and define K = Kκ according to (2.2). A direct computation

yields

LDL> = K.

Suppose that Xt0 = 0 P -a.s., which follows in particular from (i) or (ii). Then we

have Y = LZ P -a.s., and therefore Y ∼ N(0, K) iff Z ∼ N(0, D).

‘(i)⇒ (iii)’: Use Theorem III.7.1 and Lemma 3 to conclude that Z ∼ N(0, D). Hence

Y ∼ N(0, K).

‘(ii) ⇒ (i)’: By Theorem 2.3, Y ∼ N(0, K) so that Z ∼ N(0, D). Apply Theorem

III.7.1 and Lemma 3.

Corollary 5. Let M = C(I,R), and let µ∗ denote the Wiener measure. Then the

canonical process on (M,B(M), µ∗) is a one-dimensional Brownian motion.

Proof. Apply Remark 1.20 and Theorem 4.

Remark 6. According to Theorem 4 the distribution of a one-dimensional Brownian

motion is uniquely determined.


Multi-dimensional Brownian Motion

The components of X are real-valued processes, which are denoted by X^(j) with j ∈ {1, . . . , k}, i.e.,

Xt = (X^(1)_t, . . . , X^(k)_t), t ∈ I.

Consider the product measurable space

(M,B(M)) = (C(I,Rk), ⊗_{j=1}^k B(C(I,R))),

see Remark 1.18. Note that Corollary 1.14 and Definition 1.16 extend to Rk-valued processes with continuous paths.

Remark 7. Let X have continuous paths, let µ denote the distribution of X, and let µ^(j) denote the distribution of X^(j). Then, by Corollary 1.14,

(X^(1), . . . , X^(k)) independent ⇔ µ = ⨯_{j=1}^k µ^(j).

See also Übung 9.4 and cf. Theorem II.6.18.

Theorem 8. The following properties are equivalent:

(i) X is a k-dimensional Brownian motion,

(ii) (X^(1), . . . , X^(k)) is independent and X^(j) is a one-dimensional Brownian motion for every j ∈ {1, . . . , k}.

Proof. '(i) ⇒ (ii)': It is easily verified that X^(1), . . . , X^(k) are one-dimensional Brownian motions. Let m ∈ N and 0 = t0 < · · · < tm, and put

∆^(j)_i = X^(j)_{ti} − X^(j)_{ti−1}

for i ∈ {1, . . . , m} and j ∈ {1, . . . , k}, as well as

∆^(j) = (∆^(j)_1, . . . , ∆^(j)_m), ∆i = (∆^(1)_i, . . . , ∆^(k)_i).

Due to Theorem II.6.13 and Übung 9.4 it suffices to show that

(∆^(1), . . . , ∆^(k)) is independent. (1)

By Lemma 3,

(∆1, . . . , ∆m) is independent,

and for every i ∈ {1, . . . , m}

(∆^(1)_i, . . . , ∆^(k)_i) is independent

by Theorem III.7.1. It follows that

(∆^(1)_1, . . . , ∆^(k)_m) is independent.

Apply Theorem II.6.13 to get (1).

'(ii) ⇒ (i)' is established in a similar way.


Corollary 9. Let µ* denote the Wiener measure. Then the canonical process on

(M,B(M), µ) with µ = ⨯_{j=1}^k µ*

is a Brownian motion.

Proof. Remark 7 yields the independence of (X^(1), . . . , X^(k)), and by Theorem 4 the components X^(j) are one-dimensional Brownian motions. Apply Theorem 8.

Remark 10. According to Theorems 4 and 8, the distribution of any k-dimensional Brownian motion is given by ⨯_{j=1}^k µ* with µ* denoting the Wiener measure.

Remark 11. Brownian motion

(i) is a fundamental object in the theory of martingales, Markov processes, and stochastic differential equations,

(ii) is the building block to model continuous-time random dynamics, e.g., in mathematical finance, physics, or engineering applications,

(iii) provides a link between probability theory and analysis, called stochastic analysis.

Example 12. Let X be a one-dimensional Brownian motion, and let µ ∈ R as well as s0, σ > 0. Then

St = s0 · exp((µ − σ²/2) · t + σ · Xt), t ∈ I,

is called a geometric Brownian motion with initial value s0, drift µ, and volatility σ. The process S is used to define the dynamics of the stock price in the Black-Scholes model.
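Simulating a geometric Brownian motion reduces to simulating X on a grid, since the Brownian increments are independent N(0, dt); a sketch with arbitrary illustrative parameters:

import numpy as np

rng = np.random.default_rng(2)
n, s0, mu, sigma = 1000, 100.0, 0.05, 0.2
dt = 1.0 / n
t = np.arange(n + 1) * dt
dX = np.sqrt(dt) * rng.standard_normal(n)   # increments X_{t+dt} - X_t ~ N(0, dt)
X = np.concatenate([[0.0], np.cumsum(dX)])  # Brownian path on the grid
S = s0 * np.exp((mu - sigma**2 / 2) * t + sigma * X)
print(S[0], S[-1])                          # initial and terminal value of S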

Example 13. We consider a boundary value problem for an elliptic partial differential

equation. Let ∅ 6= D ⊂ Rk be a bounded open set, and let f : ∂D → R be a continuous

function. Find a function

u : clD → R

such that

(i) u is continuous on clD and u(x) = f(x) for x ∈ ∂D,

(ii) u is harmonic on D, i.e., u is twice continuously differentiable on D and

k∑j=1

∂2u

∂x2j

(x) = 0, x ∈ D.

Consider a k-dimensional Brownian motion X, let x ∈ clD, and put

τx = inft ∈ I : x+Xt ∈ ∂D.

Then τx is a numerical random variable, and τx < ∞ P -a.s. follows from Example

III.5.12. Put

Y x = x+Xτx

Page 79: Probability Theory - TU Kaiserslautern · Introduction A stochastic model: ... Basic Concepts of Probability Theory Context for probability theoretical concepts: a probability space

IV.3. The Brownian Motion 76

on {τ_x < ∞}. Then Y^x is a random variable, which takes values in ∂D. Put

u_*(x) = E(f(Y^x)), x ∈ cl D.

We have a stochastic representation and uniqueness result: u = u_* for every solution u of the boundary value problem. Moreover, we have an existence result: under mild geometrical assumptions on ∂D, which ensure continuity of u_*, e.g., if D is convex, u_* is the unique solution of the boundary value problem.

See Müller-Gronbach, Novak, Ritter (2012, Sec. 4.5.5) for further details, references, and a simulation technique that takes into account the geometry of D.
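
The representation u = u_* suggests a simple Monte Carlo method: simulate the Brownian motion with small time steps until it leaves D and average the boundary data at the exit points. The following Python sketch (step size, sample size, and the choice of D as the unit disk are illustrative assumptions) uses boundary data whose harmonic extension is known in closed form, so the output can be checked:

import numpy as np

def dirichlet_mc(f, x, dt=1e-3, n_paths=500, seed=None):
    # crude Euler walk: step the path until x + X_t leaves the unit disk,
    # then evaluate f at (a projection onto) the boundary point
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_paths):
        pos = np.array(x, dtype=float)
        while np.linalg.norm(pos) < 1.0:
            pos += rng.normal(0.0, np.sqrt(dt), size=2)
        values.append(f(pos / np.linalg.norm(pos)))
    return np.mean(values)

f = lambda y: y[0]**2 - y[1]**2              # harmonic extension: u(x) = x_1² − x_2²
print(dirichlet_mc(f, x=(0.5, 0.0), seed=1))  # approx. u(0.5, 0) = 0.25

This naive scheme overshoots the boundary by one step; the technique in Müller-Gronbach, Novak, Ritter (2012) refines exactly this point.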

Invariance and Path Properties

Given: a one-dimensional Brownian motion X = (X_t)_{t∈I}.

Theorem 14.

(i) Y_t = −X_t defines a one-dimensional Brownian motion (symmetry).

(ii) For every c > 0

Y_t = c^{−1/2} · X_{c·t}

defines a one-dimensional Brownian motion (scaling invariance).

(iii) For every T > 0 and t ∈ [0, T]

Y_t = X_T − X_{T−t}

defines a one-dimensional Brownian motion with parameter set [0, T] (time reversal).

(iv) For every T > 0

Y_t = X_{T+t} − X_T

defines a one-dimensional Brownian motion, which is independent of (X_t)_{t∈[0,T]} (Markov property).

Proof. Use Theorem 4 to establish (i)–(iii). For (iv) see Exercise 10.3.
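
Each invariance is a statement about distributions, so it can be checked empirically. A small Python sanity check of the scaling invariance (ii) (a sketch; sample size and parameters are illustrative): for c > 0 the rescaled variable Y_t = c^{−1/2} X_{ct} should again have variance t.

import numpy as np

rng = np.random.default_rng(7)
c, t, n = 4.0, 0.5, 100_000
X_ct = rng.normal(0.0, np.sqrt(c * t), size=n)  # X_{ct} ~ N(0, c·t)
Y_t = X_ct / np.sqrt(c)
print(Y_t.var())  # approx. t = 0.5, as required for a Brownian motion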

A variant of the strong law of large numbers for Brownian motion reads as follows.

Theorem 15. For every α > 1/2

lim_{t→∞} t^{−α} · X_t = 0 P-a.s.

Proof. Reduction to the discrete-time case, see Exercise 10.4.b).

Theorem 16.

Y_t = t · X_{1/t} for t > 0, Y_0 = 0,

defines a one-dimensional Brownian motion (projective reflection).


Proof. Use Theorem 4 and note that continuity at t = 0 follows from Theorem 15, see Exercise 10.4.c).

The law of the iterated logarithm holds for Brownian motion, too.

Theorem 17.

lim sup_{t→∞} X_t / √(2t · log log t) = 1 P-a.s.

Proof. See Schilling, Partzsch (2014, Sec. 12.1).

Remark 18. Let c, ε > 0, and put

A_{c,ε} = {f ∈ C(I, R) : sup_{t∈]0,ε[} |f(t)|/t^{1/2} ≤ c},

B_{c,ε} = {f ∈ C(I, R) : sup_{t∈]1/ε,∞[} |f(t)|/t^{1/2} ≤ c}.

Use Theorems 4 and 16 to conclude that

P(X_· ∈ A_{c,ε}) = P(Y_· ∈ A_{c,ε}) = P(X_· ∈ B_{c,ε}).

Moreover, P(X_· ∈ B_{c,ε}) = 0 by Theorem 17 (or Example III.5.12). Thus X is not locally Hölder continuous of order 1/2 at t = 0 with probability one. In particular, X is not differentiable at t = 0 with probability one. By Theorem 14.(iv) the same conclusion holds true at any fixed t ∈ I.

Theorem 19. With probability one X is nowhere locally Hölder continuous of any order γ > 1/2. In particular, X is nowhere differentiable with probability one.

Proof. See Klenke (2013, p. 477) as well as Schilling, Partzsch (2014, Chap. 10).

The following result is known as Lévy's modulus of continuity.

Theorem 20 (Lévy's modulus of continuity).

lim sup_{δ→0} m_1(X; δ) / √(2δ · log δ^{−1}) = 1 P-a.s.

Proof. See Schilling, Partzsch (2014, Sec. 10.3).
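
A rough numerical look at Theorem 20 (a sketch with illustrative parameters; the discretization limits how small δ can be, and the lag-based maximum is only a crude proxy for m_1(X; δ)):

import numpy as np

rng = np.random.default_rng(2)
n = 2**20
dt = 1.0 / n
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

for k in (2**6, 2**10, 2**14):                # δ = k · dt
    delta = k * dt
    osc = np.max(np.abs(X[k:] - X[:-k]))      # crude proxy for m_1(X; δ)
    print(delta, osc / np.sqrt(2 * delta * np.log(1.0 / delta)))

The printed ratios are of order one, in line with the theorem.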

Consider the level sets

Z_b(ω) = {t ∈ I : X_t(ω) = b}.

Theorem 21. For every b ∈ R with probability one, Z_b is closed, unbounded, of Lebesgue measure zero, and has no isolated points.

Proof. By continuity of X_·(ω) and by Theorem 17 (or Example III.5.12), Z_b(ω) is closed and unbounded for P-a.e. ω ∈ Ω. Use Exercise 10.1.b and Fubini's Theorem to obtain

∫_Ω ∫_I 1_{{b}}(X_t(ω)) λ(dt) P(dω) = ∫_I P(X_t = b) λ(dt) = 0

to derive λ(Z_b(ω)) = 0 for P-a.e. ω ∈ Ω. See Schilling, Partzsch (2014, Sec. 11.4) for the remaining part of the proof.


Chapter V

Conditional Expectations and Martingales

'Access to the martingale concept is afforded by one of the truly basic ideas of probability theory, that of conditional expectation.', see Bauer (1996, p. 109).

1 The Radon-Nikodym-Theorem

Given: a measure space (Ω,A, µ). Put Z+ = Z+(Ω,A).

Uniqueness of Densities

Theorem 1. Let f, g ∈ Z+ such that f · µ = g · µ. Then

(i) f µ-integrable ⇒ f = g µ-a.e.,

(ii) µ σ-finite ⇒ f = g µ-a.e.

Proof. Ad (i): It suffices to verify that

f, g µ-integrable ∧ (∀A ∈ A : ∫_A f dµ ≤ ∫_A g dµ) ⇒ f ≤ g µ-a.e.

To this end, take A = {f > g}. By assumption,

−∞ < ∫_A f dµ ≤ ∫_A g dµ < ∞

and therefore ∫_A (f − g) dµ ≤ 0. However,

1_A · (f − g) ≥ 0,

hence ∫_A (f − g) dµ ≥ 0. Thus

∫ 1_A · (f − g) dµ = 0.

Theorem A.5.7 implies 1_A · (f − g) = 0 µ-a.e., and by definition of A we get µ(A) = 0.

Ad (ii): see Elstrodt (1996, p. 141).


Existence of Densities

Remark 2. Let (Ω, A, µ) = (R^k, B_k, λ_k) and x ∈ R^k. There is no density f ∈ Z+ w.r.t. λ_k such that ε_x = f · λ_k. This follows from ε_x({x}) = 1 and

(f · λ_k)({x}) = ∫_{{x}} f dλ_k = 0.

Definition 3. A measure ν on A is absolutely continuous w.r.t. µ if

∀A ∈ A : µ(A) = 0 ⇒ ν(A) = 0.

Notation: ν ≪ µ.

Remark 4.

(i) ν = f · µ ⇒ ν ≪ µ.

(ii) In Remark 2 neither ε_x ≪ λ_1 nor λ_1 ≪ ε_x.

(iii) Let µ denote the counting measure on A. Then ν ≪ µ for every measure ν on A.

(iv) Let µ denote the counting measure on B_1. Then there is no density f ∈ Z+ such that λ_1 = f · µ.

Lemma 5. Let f_n →^{L_p} f and A ∈ A. If p = 1 or µ(A) < ∞ then

∫_A f_n dµ → ∫_A f dµ.

Proof. See Exercise 6.3.b in Maß- und Integrationstheorie (2016). See also L_p and its dual space, e.g., in Elstrodt (1996, §VII.3).

Theorem 6 (Radon, Nikodym). For every σ-finite measure µ and every measure ν on A we have

ν ≪ µ ⇒ ∃ f ∈ Z+ : ν = f · µ.

Proof. See Elstrodt (1996, §VII.2).

Here we consider the particular case

(∀A ∈ A : ν(A) ≤ µ(A)) ∧ µ(Ω) < ∞.

A class U = {A_1, …, A_n} is called a (finite measurable) partition of Ω if A_1, …, A_n ∈ A are pairwise disjoint and ⋃_{i=1}^n A_i = Ω. The set of all partitions is partially ordered by

U ⊏ V if ∀A ∈ U ∃B ∈ V : A ⊂ B.

The infimum of two partitions is given by

U ∧ V = {A ∩ B : A ∈ U, B ∈ V}.


For any partition U we define

f_U = ∑_{A∈U} α_A · 1_A

with

α_A = ν(A)/µ(A) if µ(A) > 0, and α_A = 0 otherwise.

Clearly σ(U) consists of all unions of elements of U, together with ∅, and f_U ∈ S+(Ω, σ(U)) ⊂ S+(Ω, A) as well as

∀A ∈ σ(U) : ν(A) = ∫_A f_U dµ.

(Thus we have ν|_{σ(U)} = f_U · µ|_{σ(U)}.) Let U ⊏ V and A ∈ V. Then

ν(A) = ∫_A f_V dµ = ∫_A f_U dµ,

since A ∈ σ(U). Hence

∫_A f_V² dµ = ∫_A f_V · f_U dµ,

since f_V|_A is constant, and therefore

0 ≤ ∫ (f_U − f_V)² dµ = ∫ f_U² dµ − ∫ f_V² dµ. (1)

Put

β = sup{∫ f_U² dµ : U partition},

and note that 0 ≤ β ≤ µ(Ω) < ∞, since f_U ≤ 1. Consider a sequence of functions f_n = f_{U_n} such that

lim_{n→∞} ∫ f_n² dµ = β.

Due to (1) we may assume that U_{n+1} ⊏ U_n. Then, by (1), (f_n)_{n∈N} is a Cauchy sequence in L_2, so that there exists f ∈ L_2 with

lim_{n→∞} ‖f_n − f‖_2 = 0 ∧ 0 ≤ f ≤ 1 µ-a.e.,

see Theorem A.6.7. Cf. Exercise 1.4.

We claim that ν = f · µ. Let A ∈ A. Put

Ũ_n = U_n ∧ {A, A^c}

and

f̃_n = f_{Ũ_n}.

Then

ν(A) = ∫_A f̃_n dµ = ∫_A f_n dµ + ∫_A (f̃_n − f_n) dµ,

and (1) yields lim_{n→∞} ‖f̃_n − f_n‖_2 = 0. It remains to apply Lemma 5.
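
The construction above is explicit enough to compute. A small Python illustration (toy measures, not from the notes): for µ = λ_1 on [0,1] and ν with ν([a,b]) = (b² − a²)/2, i.e. density f(x) = x, so that ν(A) ≤ µ(A) as in the special case treated here, the simple functions f_U on the dyadic partitions U_n approximate f.

import numpy as np

nu = lambda a, b: (b**2 - a**2) / 2   # ν of an interval [a, b]; density f(x) = x
mu = lambda a, b: b - a               # µ = Lebesgue measure on [0, 1]

def f_U(x, n):
    # value of f_U(x) for the dyadic partition U_n into 2^n intervals,
    # with alpha_A = ν(A)/µ(A) on the interval A containing x
    a = np.floor(x * 2**n) / 2**n
    b = a + 2.0**(-n)
    return nu(a, b) / mu(a, b)

for n in (1, 4, 8, 12):
    print(n, f_U(0.3, n))             # converges to f(0.3) = 0.3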


2 Conditional Expectations

Given: a probability space (Ω,A, P ).

Unless stated otherwise, we consider the Borel σ-algebra B on R; for any σ-algebra

G ⊂ A, G-B-measurable mappings are called G-measurable, for short.

Remark 1. Let A, B ∈ A with P(B) > 0. Recall that

P(A | B) = P(A ∩ B) / P(B)

denotes the elementary conditional probability of A given B. The probability measure P(· | B) has the density 1/P(B) · 1_B w.r.t. P.

For every random variable X ∈ L_1 = L_1(Ω, A, P) the elementary conditional expectation of X given B is defined by

E(X | B) = 1/P(B) · E(1_B · X) = ∫ X dP(· | B).

Obviously, for A ∈ A,

E(1_A | B) = P(A | B).

Remark 2. Let I be a countable set, and let X ∈ L_1. Consider a partition

B_i ∈ A, i ∈ I,

of Ω such that

∀ i ∈ I : P(B_i) > 0.

By

G = {⋃_{j∈J} B_j : J ⊂ I}

we get a σ-algebra G ⊂ A. In fact, G = σ({B_i : i ∈ I}). Define a random variable E(X | G) by

E(X | G)(ω) = ∑_{i∈I} E(X | B_i) · 1_{B_i}(ω), ω ∈ Ω. (1)

Then E(X | G) is G-measurable and belongs to L_1. Furthermore,

∫_{B_j} E(X | G) dP = E(X | B_j) · P(B_j) = ∫_{B_j} X dP,

and hence for every G ∈ G

∫_G E(X | G) dP = ∫_G X dP.

The transition from X to E(X | G) means coarsening. Cf. the proof of Theorem 1.6.

We consider extremal cases. If |I| = 1, then G = {∅, Ω} and

E(X | G) = E(X).

If Ω is countable and I = Ω with B_i = {i}, then G = P(Ω) and

E(X | G) = X.
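
For a finite partition, formula (1) is just a block-wise average, which is easy to check by simulation. A Python sketch (sample-based, names illustrative), using Ω = [0,1] with P = λ, X(ω) = ω², and the partition B_1 = [0, 1/2], B_2 = ]1/2, 1]:

import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform(size=100_000)        # draws from P = λ on [0, 1]
X = omega**2

labels = (omega > 0.5).astype(int)       # 0 on B_1, 1 on B_2
cond_exp = np.empty_like(X)
for i in (0, 1):
    block = labels == i
    cond_exp[block] = X[block].mean()    # E(X | B_i), constant on each block

print(np.unique(np.round(cond_exp, 3)))  # approx. {1/12, 7/12}
print(X.mean(), cond_exp.mean())         # E(E(X | G)) = E(X) = 1/3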


Definition and Elementary Properties

Given: a random variable X on (Ω, A, P) with X ∈ L_1 and a σ-algebra G ⊂ A.

Definition 3. A random variable Z on (Ω, A, P) such that Z ∈ L_1 and

(i) Z is G-measurable,

(ii) ∀G ∈ G : ∫_G Z dP = ∫_G X dP

is called (a version of) the conditional expectation of X given G. Notation: Z = E(X | G). For X = 1_A with A ∈ A, Z is called (a version of) the conditional probability of A given G. Notation: Z = P(A | G).

Theorem 4. For all X and G the conditional expectation E(X | G) exists and is uniquely determined P|_G-a.s.

Proof. At first assume that X ≥ 0. By

Q(G) = ∫_G X dP, G ∈ G,

we get a finite measure Q on (Ω, G), see Theorem A.5.14, and we have Q ≪ P|_G. The Radon-Nikodym Theorem yields a G-measurable mapping Z : Ω → [0,∞[ such that

∀G ∈ G : Q(G) = ∫_G Z dP.

For a general X we consider the decomposition into its positive and its negative part.

Uniqueness is shown as follows. From ∫_G Z_1 dP|_G = ∫_G Z_2 dP|_G for every G ∈ G we get Z_1 = Z_2 P|_G-a.s. See the proof of Theorem 1.1.

In the sequel, we write X = Y or X ≥ Y if the respective relation holds a.s. Likewise we identify mappings that coincide a.s.

The 'explicit' calculation of conditional expectations is non-trivial in general. See, however, Exercise 12.1.

Remark 5. Obviously E(X | G) = X if X is G-measurable. Moreover,

E(X) = ∫_Ω E(X | G) dP = E(E(X | G)).

In particular for (1) with X = 1_A and A ∈ A this is the total probability formula, i.e.,

P(A) = ∑_{i∈I} P(A | B_i) · P(B_i).

Lemma 6. E(· | G) is positive and linear.

Proof. This follows from the respective properties of integrals and from closure properties of classes of measurable mappings.


Lemma 7. For Y G-measurable with X · Y ∈ L_1 we have

E(X · Y | G) = Y · E(X | G).

Proof. Obviously Y · E(X | G) is G-measurable. In the particular case Y = 1_C with C ∈ G we have for any G ∈ G

∫_G Y · E(X | G) dP = ∫_{G∩C} E(X | G) dP = ∫_{G∩C} X dP = ∫_G X · Y dP.

Use algebraic induction.

Lemma 8. For σ-algebras G_1 ⊂ G_2 ⊂ A we have

E(E(X | G_1) | G_2) = E(X | G_1) = E(E(X | G_2) | G_1).

Proof. Note that E(X | G_1) is G_2-measurable. Hence the first identity follows from Remark 5 (or Lemma 7). For the second identity let G ∈ G_1 and observe that

∫_G E(X | G_1) dP = ∫_G X dP = ∫_G E(X | G_2) dP.

Lemma 9. If X and G are independent, then

E(X | G) = E(X).

Proof. Obviously E(X) is G-measurable. Let G ∈ G. By assumption X and 1_G are independent. Hence

∫_G E(X) dP = E(X) · E(1_G) = E(X · 1_G) = ∫_G X dP.

Theorem 10 (Jensen Inequality). Let J ⊂ R be an interval such that X(ω) ∈ J for every ω ∈ Ω. Let ϕ : J → R be convex with ϕ ∘ X ∈ L_1. Then

ϕ ∘ E(X | G) ≤ E(ϕ ∘ X | G).

Proof. Gänssler, Stute (1977, Chap. V.4).

Remark 11. Consider the particular case J = R and ϕ(u) = |u|^{p/q}, where 1 ≤ q ≤ p < ∞. Then

(E(|X|^q | G))^{1/q} ≤ (E(|X|^p | G))^{1/p}

for any X ∈ L_p as well as

(∫_Ω |E(X | G)|^p dP)^{1/p} ≤ (∫_Ω |X|^p dP)^{1/p}. (2)

Consequently, E(· | G) is an idempotent bounded linear operator on L_p(Ω, A, P) with norm 1. In particular for p = 2 it is the orthogonal projection onto the subspace L_2(Ω, G, P).


Factorization of Conditional Expectations

Frequently one encounters the following situation:

G = σ(Y )

for some measurable space (Ω′,A′) and a random element Y : Ω→ Ω′.

Definition 12. The conditional expectation of X given Y is defined by

E(X |Y ) = E(X |σ(Y )).

Remark 13. Application of Theorem A.10.1 yields the factorization of the conditional expectation: there exists an A′-measurable mapping g : Ω′ → R such that

E(X | Y) = g ∘ Y.

Any two such mappings coincide P_Y-a.s.

Definition 14. The conditional expectation of X given Y = y is defined by

E(X | Y = y) = g(y)

with g as previously. Analogous definition for conditional probabilities, where X = 1_A with A ∈ A.

Example 15. Let (Ω, A, P) = ([0,1], B([0,1]), λ) and (Ω′, A′) = (R, B). Moreover,

X(ω) = ω², Y(ω) = 1 if ω ∈ [0, 1/2], and Y(ω) = ω − 1/2 if ω ∈ ]1/2, 1].

Then

σ(Y) = {A ∪ B : A ∈ {∅, [0, 1/2]}, B ⊂ ]1/2, 1], B ∈ A}

as well as

E(X | Y)(ω) = 1/12 if ω ∈ [0, 1/2], and E(X | Y)(ω) = ω² if ω ∈ ]1/2, 1],

and

E(X | Y = y) = 1/12 if y = 1, and E(X | Y = y) = (y + 1/2)² if y ∈ ]0, 1/2].

Note that P(Y = y) = 0 for all y ∈ ]0, 1/2].

Remark 16. Obviously, for A′ ∈ A′,

∫_{{Y ∈ A′}} X dP = ∫_{A′} E(X | Y = y) P_Y(dy), (3)

and in particular

P(A ∩ {Y ∈ A′}) = ∫_{A′} P(A | Y = y) P_Y(dy)

for A ∈ A.

The mapping E(X | Y = ·) is uniquely determined P_Y-a.s. by (3) for all A′ ∈ A′ and by its A′-measurability.


Conditional Expectations and Optimal Prediction

According to the next result the conditional expectation is the optimal prediction in the mean-square sense. Cf. linear regression as well as Exercise 6.2 and Lemma 9.

Theorem 17. Let X ∈ L_2. Then, for every A′-measurable mapping ϕ : Ω′ → R,

∫_Ω (X − E(X | Y))² dP ≤ ∫_Ω (X − ϕ ∘ Y)² dP

with equality iff ϕ = E(X | Y = ·) P_Y-a.s.

Proof. Put Z_* = E(X | Y) and Z = ϕ ∘ Y. Jensen's inequality ensures Z_* ∈ L_2, see (2) with p = 2. Consider the non-trivial case Z ∈ L_2. We have

E(X − Z)² = E(X − Z_*)² + E(Z_* − Z)² + 2 · E((X − Z_*)(Z_* − Z)),

where E(Z_* − Z)² ≥ 0. Lemma 6 and Lemma 7 imply

E((X − Z_*)(Z_* − Z)) = ∫_Ω E((X − Z_*)(Z_* − Z) | Y) dP
= ∫_Ω (Z_* − Z) · E((X − Z_*) | Y) dP
= ∫_Ω (Z_* − Z) · (E(X | Y) − Z_*) dP = 0,

since E(X | Y) − Z_* = 0.
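
Theorem 17 can be observed numerically: among predictors ϕ(Y), the conditional expectation attains the smallest mean-square error. A Monte Carlo sketch in the setting of Example 15 (sample size and the competing predictor are illustrative choices):

import numpy as np

rng = np.random.default_rng(5)
omega = rng.uniform(size=200_000)
X = omega**2
Y = np.where(omega <= 0.5, 1.0, omega - 0.5)

Z_star = np.where(Y == 1.0, 1.0 / 12.0, (Y + 0.5)**2)  # E(X | Y) from Example 15
Z_other = Y**2                                         # some other predictor ϕ(Y)

print(np.mean((X - Z_star)**2))    # smallest attainable mean-square error
print(np.mean((X - Z_other)**2))   # strictly larger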

Regular Conditional Distributions

We study the relation between Markov kernels and conditional probabilities.

Given: a probability space (Ω, A, P), measurable spaces (Ω′, A′) and (Ω′′, A′′), as well as an A-A′-measurable mapping Y : Ω → Ω′ and an A-A′′-measurable mapping X : Ω → Ω′′.

Lemma 18. For every mapping P_{X|Y} : Ω′ × A′′ → R the following properties are equivalent:

(i) P_{X|Y} is a Markov kernel from (Ω′, A′) to (Ω′′, A′′) and

P_{(Y,X)} = P_Y × P_{X|Y}, (4)

(ii) P_{X|Y}(y, ·) is a probability measure on (Ω′′, A′′) for every y ∈ Ω′, and for all A′′ ∈ A′′ we have

P_{X|Y}(·, A′′) = P(X ∈ A′′ | Y = ·).

Given (i), X and Y are independent iff

P_{X|Y}(y, ·) = P_X

for P_Y-a.e. y ∈ Ω′.


Proof. Let A′ ∈ A′ and A′′ ∈ A′′. In any case

P_{(Y,X)}(A′ × A′′) = ∫_{A′} P(X ∈ A′′ | Y = y) P_Y(dy).

Moreover, if P_{X|Y} is a Markov kernel,

P_Y × P_{X|Y}(A′ × A′′) = ∫_{A′} P_{X|Y}(y, A′′) P_Y(dy).

Hence we have equivalence of (i) and (ii). See Exercise 12.4 for the characterization of independence.

Example 19. Let (Ω′, A′) = (Ω′′, A′′) = (R, B). Suppose that

P_{(Y,X)} = f · λ_2

for some probability density f on (R², B_2). Let A′, A′′ ∈ B. Fubini's Theorem, which in particular includes a measurability property, yields

P_{(Y,X)}(A′ × A′′) = ∫_{A′×A′′} f d(λ_1 × λ_1) = ∫_{A′} ∫_{A′′} f(y, x) λ_1(dx) λ_1(dy).

Choose A′′ = R to obtain

P_Y = h · λ_1

with the probability density

h(y) = ∫_R f(y, ·) dλ_1, y ∈ R.

For y, x ∈ R we define

f(x | y) = f(y, x)/h(y) if h(y) > 0, and f(x | y) = 1_{[0,1]}(x) otherwise,

as well as the Markov kernel P_{X|Y} : Ω′ × A′′ → [0,1] via

P_{X|Y}(y, A′′) = ∫_{A′′} f(x | y) λ_1(dx),

i.e., P_{X|Y}(y, ·) = f(· | y) · λ_1. Hereby

P_{(Y,X)}(A′ × A′′) = ∫_{A′} ∫_{A′′} f(x | y) λ_1(dx) · h(y) λ_1(dy) = ∫_{A′} P_{X|Y}(y, A′′) P_Y(dy).

According to Lemma 18,

P(X ∈ A′′ | Y = y) = ∫_{A′′} f(x | y) λ_1(dx)

for P_Y-a.e. y ∈ Ω′. The mapping f(· | ·) is called the conditional density of X given Y.
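
For a concrete f the conditional density can be written down and compared with simulation. A Python sketch (assumption: standard bivariate normal with correlation ρ, so that f(· | y) is the N(ρy, 1 − ρ²) density; the crude conditioning window is an illustrative device):

import numpy as np

rng = np.random.default_rng(11)
rho, n = 0.8, 500_000
Y = rng.normal(size=n)
X = rho * Y + np.sqrt(1 - rho**2) * rng.normal(size=n)  # (Y, X) bivariate normal

y0 = 1.0
sel = np.abs(Y - y0) < 0.02         # crude conditioning on {Y ≈ y0}
print(X[sel].mean(), X[sel].var())  # approx. ρ·y0 = 0.8 and 1 − ρ² = 0.36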


Appendix A

Measure and Integration

We collect some basic definitions and facts from measure and integration theory, as needed in the course of this lecture.

For more details and for a consistent presentation see the Lecture Notes on Maß- und Integrationstheorie (2016). A basic reference is Elstrodt (2011), see also Gänssler, Stute (1977, Chap. 1) and Klenke (2013).

1 Measure Spaces and Measurable Mappings

Given: a non-empty set Ω.

Definition 1. A ⊂ P(Ω) is a σ-algebra (in Ω) if

(i) Ω ∈ A,

(ii) A_1, A_2, … ∈ A ⇒ ⋃_{n=1}^∞ A_n ∈ A,

(iii) A ∈ A ⇒ A^c ∈ A.

In this case (Ω, A) is called a measurable space, and elements A ∈ A are called measurable sets.

Definition 2. µ : A → [0,∞[ ∪ {∞} is a measure (on A) if

(i) A is a σ-algebra,

(ii) µ(∅) = 0,

(iii) µ is σ-additive, i.e.,

A_1, A_2, … ∈ A pairwise disjoint ⇒ µ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ µ(A_i).

In this case (Ω, A, µ) is called a measure space. If additionally µ(Ω) = 1, then µ is called a probability measure (on A) and (Ω, A, µ) is called a probability space.


See Theorem 9.4 for further properties of probability measures.

Example 3. Let A denote a σ-algebra in Ω. The Dirac measure in ω ∈ Ω is given by

ε_ω(A) = 1_A(ω), A ∈ A.

The counting measure is given by

µ(A) = |A|, A ∈ A.

The uniform distribution, in the case |Ω| < ∞ and A = P(Ω), is given by

µ(A) = |A| / |Ω|, A ⊂ Ω.

Given: a class E ⊂ P(Ω).

Definition 4. The σ-algebra generated by E is given by

σ(E) = ⋂{A : A σ-algebra in Ω ∧ E ⊂ A}.

Theorem 5. For probability measures µ1, µ2 on a σ-algebra A and E ⊂ A with

(i) σ(E) = A and E is closed w.r.t. intersections,

(ii) µ1|E = µ2|E

we have µ1 = µ2.

In the sequel, (Ωi,Ai) are measurable spaces.

Definition 6. f : Ω_1 → Ω_2 is A_1-A_2-measurable if f^{−1}(A) ∈ A_1 for every A ∈ A_2.

Theorem 7. If f : Ω_1 → Ω_2 is A_1-A_2-measurable and g : Ω_2 → Ω_3 is A_2-A_3-measurable, then g ∘ f : Ω_1 → Ω_3 is A_1-A_3-measurable.

Theorem 8. If A_2 = σ(E) with E ⊂ P(Ω_2) and f : Ω_1 → Ω_2, then

f is A_1-A_2-measurable ⇔ ∀E ∈ E : f^{−1}(E) ∈ A_1.

Given: a measure µ on (Ω1,A1) and an A1-A2-measurable mapping f : Ω1 → Ω2.

Remark 9.

f(µ) : A_2 → [0,∞[ ∪ {∞}, A ↦ µ(f^{−1}(A)),

defines a measure on A_2.

Definition 10. f(µ) is called the image measure of µ under f .


2 Borel Sets

Definition 1. For every topological space (Ω, G),

B(Ω) = σ(G)

is the Borel σ-algebra (in Ω w.r.t. G). Elements A ∈ B(Ω) are called Borel sets.

Theorem 2. For every pair of topological spaces (Ω1,G1) and (Ω2,G2) and every

mapping f : Ω1 → Ω2,

f continuous ⇒ f is B(Ω1)-B(Ω2)-measurable.

Remark 3. Put R̄ = R ∪ {±∞}, and consider the natural topologies on R and R̄. Then O ⊂ R̄ is an open set iff O ∩ R is an open set in R and ]a,∞] ⊂ O for some a < ∞ if ∞ ∈ O, as well as [−∞,a[ ⊂ O for some a > −∞ if −∞ ∈ O.

By definition, 0 · (±∞) = 0.

Notation

B_k = B(R^k), B = B_1, B̄_k = B(R̄^k), B̄ = B̄_1.

Theorem 4.

B_k = σ({F ⊂ R^k : F closed}) = σ({K ⊂ R^k : K compact}) = σ({]−∞, a] : a ∈ R^k}) = σ({]−∞, a] : a ∈ Q^k}).

3 The Lebesgue Measure

Theorem 1. There exists a uniquely determined measure λ_k on B_k with

(i) λ_k([0,1]^k) = 1,

(ii) ∀ b ∈ R^k ∀A ∈ B_k : λ_k(A + b) = λ_k(A).

Definition 2. λ_k is called the k-dimensional Lebesgue measure.

Remark 3. For a, b ∈ R^k with a_i ≤ b_i we have λ_k([a, b]) = ∏_{i=1}^k λ_1([a_i, b_i]).

For further properties of (probability) measures we refer to Section III.3 in the Lecture

Notes on Maß- und Integrationstheorie (2016).

4 Real-valued Measurable Mappings

Given: a measurable space (Ω, A). Put

Z(Ω, A) = {f : Ω → R : f is A-B-measurable},
Z+(Ω, A) = {f ∈ Z(Ω, A) : f ≥ 0},
Z̄(Ω, A) = {f : Ω → R̄ : f is A-B̄-measurable},
Z̄+(Ω, A) = {f ∈ Z̄(Ω, A) : f ≥ 0}.


Remark 1. Every function f : Ω → R may also be considered as a function with values in R̄, and in this case f ∈ Z(Ω, A) iff f ∈ Z̄(Ω, A).

Theorem 2. For ≺ ∈ {≤, <, ≥, >} and f : Ω → R̄,

f ∈ Z̄(Ω, A) ⇔ ∀ a ∈ R : {ω ∈ Ω : f(ω) ≺ a} ∈ A.

Theorem 3. For f, g ∈ Z̄(Ω, A) and ≺ ∈ {≤, <, ≥, >, =, ≠},

{ω ∈ Ω : f(ω) ≺ g(ω)} ∈ A.

Theorem 4. For every sequence f_1, f_2, … ∈ Z̄(Ω, A),

(i) inf_{n∈N} f_n, sup_{n∈N} f_n ∈ Z̄(Ω, A),

(ii) lim inf_{n→∞} f_n, lim sup_{n→∞} f_n ∈ Z̄(Ω, A),

(iii) if (f_n)_{n∈N} converges at every point ω ∈ Ω, then lim_{n→∞} f_n ∈ Z̄(Ω, A).

Remark 5. For f ∈ Z̄(Ω, A) we have f^+, f^−, |f| ∈ Z̄+(Ω, A).

Theorem 6. For f, g ∈ Z̄(Ω, A),

f ± g, f · g, f/g ∈ Z̄(Ω, A),

provided that these functions are well defined.

Definition 7. f ∈ Z(Ω, A) is called a simple function if |f(Ω)| < ∞. Put

S(Ω, A) = {f ∈ Z(Ω, A) : f simple},
S+(Ω, A) = {f ∈ S(Ω, A) : f ≥ 0}.

Theorem 8. For every (bounded) function f ∈ Z̄+(Ω, A) there exists a sequence f_1, f_2, … ∈ S+(Ω, A) such that f_n ↑ f (with uniform convergence).
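
A standard choice behind Theorem 8 is the dyadic truncation f_n = min(n, ⌊2^n f⌋/2^n), which takes finitely many values and increases pointwise to f. A small Python sketch of this construction (the function and evaluation points are illustrative):

import numpy as np

def simple_approx(f_values, n):
    # dyadic approximation: at most n*2^n + 1 distinct values, hence simple
    return np.minimum(n, np.floor(f_values * 2**n) / 2**n)

x = np.linspace(0.0, 1.0, 5)
f = np.exp(x)                      # a non-negative measurable function
for n in (1, 3, 6, 10):
    print(n, simple_approx(f, n))  # increases pointwise to f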

5 Integration

Given: a measure space (Ω,A, µ). Notation: S+ = S+(Ω,A).

Definition 1. The integral of f ∈ S+ w.r.t. µ is given by

∫ f dµ = ∑_{i=1}^n α_i · µ(A_i)

if

f = ∑_{i=1}^n α_i · 1_{A_i}

with α_i ≥ 0 and A_i ∈ A.


Notation: Z̄+ = Z̄+(Ω, A).

Definition 2. The integral of f ∈ Z̄+ w.r.t. µ is given by

∫ f dµ = sup{∫ g dµ : g ∈ S+ ∧ g ≤ f}.

Theorem 3 (Monotone convergence, Beppo Levi). Let f_n ∈ Z̄+ such that

∀n ∈ N : f_n ≤ f_{n+1}.

Then

∫ sup_n f_n dµ = sup_n ∫ f_n dµ.

Remark 4. For every f ∈ Z̄+ there exists a sequence of functions f_n ∈ S+ such that f_n ↑ f, see Theorem 4.8.

Theorem 5 (Fatou's Lemma). For every sequence (f_n)_n in Z̄+,

∫ lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫ f_n dµ.

Definition 6. A property Π holds µ-almost everywhere (µ-a.e., a.e.) if

∃A ∈ A : {ω ∈ Ω : Π does not hold for ω} ⊂ A ∧ µ(A) = 0.

Theorem 7. Let f ∈ Z̄+. Then

∫ f dµ = 0 ⇔ f = 0 µ-a.e.

Notation: Z̄ = Z̄(Ω, A).

Definition 8. f ∈ Z̄ is quasi-µ-integrable if

∫ f^+ dµ < ∞ ∨ ∫ f^− dµ < ∞.

In this case the integral of f (w.r.t. µ) is

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ.

f ∈ Z̄ is µ-integrable if

∫ f^+ dµ < ∞ ∧ ∫ f^− dµ < ∞.

Theorem 9.

(i) f µ-integrable ⇒ |f| < ∞ µ-a.e.,

(ii) f µ-integrable ∧ g ∈ Z̄ ∧ f = g µ-a.e. ⇒ g µ-integrable ∧ ∫ f dµ = ∫ g dµ,

(iii) equivalent properties for f ∈ Z̄ are:

(a) f µ-integrable,

(b) |f| µ-integrable,

(c) ∃ g : g µ-integrable ∧ |f| ≤ g µ-a.e.,

(iv) for f and g µ-integrable and c ∈ R:

(a) f + g is well-defined µ-a.e. and µ-integrable with ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ,

(b) c · f is µ-integrable with ∫ (cf) dµ = c · ∫ f dµ,

(c) f ≤ g µ-a.e. ⇒ ∫ f dµ ≤ ∫ g dµ.

Theorem 10 (Dominated convergence, Lebesgue). Assume that

(i) f_n ∈ Z̄ for n ∈ N,

(ii) ∃ g µ-integrable ∀n ∈ N : |f_n| ≤ g µ-a.e.,

(iii) f ∈ Z̄ such that lim_{n→∞} f_n = f µ-a.e.

Then f is µ-integrable and

∫ f dµ = lim_{n→∞} ∫ f_n dµ.

Example 11. Consider f_n = n · 1_{]0,1/n[} on (R, B, λ_1). Then lim_{n→∞} f_n = 0 pointwise, but ∫ f_n dλ_1 = 1 for every n, so the domination assumption in Theorem 10 cannot be dropped.
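
A quick numerical look at Example 11 (grid-based Riemann sums; the grid is an illustrative choice):

import numpy as np

x = np.linspace(1e-6, 1.0, 1_000_000)
dx = x[1] - x[0]
for n in (1, 10, 100):
    f_n = np.where(x < 1.0 / n, float(n), 0.0)
    print(n, f_n.sum() * dx)  # each integral is approx. 1, while f_n(x) → 0 pointwise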

Given: a measurable space (Ω′, A′) and an A-A′-measurable mapping f : Ω → Ω′.

Theorem 12 (Transformation 'Theorem').

(i) For g ∈ Z̄+(Ω′, A′),

∫_{Ω′} g df(µ) = ∫_Ω g ∘ f dµ. (1)

(ii) For g ∈ Z̄(Ω′, A′),

g is f(µ)-integrable ⇔ g ∘ f is µ-integrable,

in which case (1) holds.

Definition 13. For f ∈ Z̄ and A ∈ A,

∫_A f dµ = ∫ 1_A · f dµ.

Theorem 14. For f ∈ Z̄+,

ν(A) = ∫_A f dµ, A ∈ A,

defines a measure on A. If ∫ f dµ = 1 then ν is a probability measure.


Definition 15. ν = f · µ (probability) measure with (probability) density f w.r.t. µ.

Theorem 16. Let ν = f · µ with f ∈ Z̄+ and let g ∈ Z̄(Ω, A). Then

g (quasi-)ν-integrable ⇔ g · f (quasi-)µ-integrable,

in which case

∫ g dν = ∫ g · f dµ.

6 Lp-Spaces

Given: a measure space (Ω, A, µ) and 1 ≤ p < ∞. Put Z = Z(Ω, A) and Z̄ = Z̄(Ω, A).

Definition 1.

L_p = L_p(Ω, A, µ) = {f ∈ Z : ∫ |f|^p dµ < ∞}.

In particular, for p = 1: integrable functions and L = L_1, and for p = 2: square-integrable functions. Put

‖f‖_p = (∫ |f|^p dµ)^{1/p}, f ∈ L_p.

Theorem 2 (Hölder inequality). Let 1 < p, q < ∞ such that 1/p + 1/q = 1 and let f ∈ L_p, g ∈ L_q. Then

∫ |f · g| dµ ≤ ‖f‖_p · ‖g‖_q.

In particular, for p = q = 2 this is the Cauchy-Schwarz inequality.

Theorem 3. L_p is a vector space and ‖·‖_p is a semi-norm on L_p. Furthermore,

‖f‖_p = 0 ⇔ f = 0 µ-a.e.

Definition 4. Let f, f_n ∈ L_p for n ∈ N. (f_n)_n converges to f in L_p (in mean of order p) if

lim_{n→∞} ‖f − f_n‖_p = 0.

In particular, for p = 1: convergence in mean, and for p = 2: mean-square convergence. Notation: f_n →^{L_p} f.

Remark 5. Let f, f_n ∈ Z̄ for n ∈ N. Recall (define) that (f_n)_n converges to f µ-a.e. if µ(A^c) = 0 for A = {lim_{n→∞} f_n = f} ∈ A. Notation: f_n →^{µ-a.e.} f.

Lemma 6. Let f, g, f_n ∈ L_p for n ∈ N such that f_n →^{L_p} f. Then

f_n →^{L_p} g ⇔ f = g µ-a.e.

Analogously for convergence almost everywhere.


Theorem 7 (Fischer-Riesz). Consider a sequence (f_n)_n in L_p. Then

(i) (f_n)_n Cauchy sequence ⇒ ∃ f ∈ L_p : f_n →^{L_p} f (completeness),

(ii) f_n →^{L_p} f ⇒ ∃ subsequence (f_{n_k})_k : f_{n_k} →^{µ-a.e.} f.

Example 8. Let (Ω, A, µ) = ([0,1], B([0,1]), λ_1|_{B([0,1])}). Define

A_1 = [0,1],
A_2 = [0,1/2], A_3 = [1/2,1],
A_4 = [0,1/3], A_5 = [1/3,2/3], A_6 = [2/3,1],

etc. Put f_n = 1_{A_n}. Then f_n →^{L_p} 0, but {(f_n)_n converges} = ∅.

Remark 9. Define

L_∞ = L_∞(Ω, A, µ) = {f ∈ Z : ∃ c ∈ [0,∞[ : |f| ≤ c µ-a.e.}

and

‖f‖_∞ = inf{c ∈ [0,∞[ : |f| ≤ c µ-a.e.}, f ∈ L_∞.

f ∈ L_∞ is called essentially bounded and ‖f‖_∞ is called the essential supremum of |f|. Note that |f| ≤ ‖f‖_∞ µ-a.e.

The definitions and results of this section, except f_n →^{L_∞} 0 in Example 8, extend to the case p = ∞, where q = 1 in Theorem 2. In Theorem 7.(ii) we even have f_n →^{L_∞} f ⇒ f_n →^{µ-a.e.} f.

Remark 10. Put

N_p = {f ∈ L_p : f = 0 µ-a.e.}.

Then the quotient space 𝕃_p = L_p/N_p is a Banach space. In particular, for p = 2, 𝕃_2 is a Hilbert space, with semi-inner product on L_2 given by

⟨f, g⟩ = ∫ f · g dµ, f, g ∈ L_2.

Theorem 11. If µ(Ω) = 1 and 1 ≤ p < q ≤ ∞ then L_q ⊂ L_p and

‖f‖_p ≤ ‖f‖_q, f ∈ L_q.

7 Dynkin Classes

Given: a non-empty set Ω.

Definition 1. A ⊂ P(Ω) is a Dynkin class (in Ω) if

(i) Ω ∈ A,

(ii) A_1, A_2, … ∈ A pairwise disjoint ⇒ ⋃_{n=1}^∞ A_n ∈ A,

(iii) A_1, A_2 ∈ A ∧ A_1 ⊂ A_2 ⇒ A_2 \ A_1 ∈ A.

Theorem 2. For every Dynkin class A

A σ-algebra ⇔ A closed w.r.t. intersections.

Given: a class E ⊂ P(Ω).

Definition 3. The Dynkin class generated by E is given by

δ(E) = ⋂{A : A Dynkin class in Ω ∧ E ⊂ A}.

Theorem 4. E closed w.r.t. intersections ⇒ σ(E) = δ(E).

8 Product Spaces

Given: a non-empty set I and measurable spaces (Ω_i, A_i) for i ∈ I, as well as mappings f_i : Ω → Ω_i for i ∈ I and some non-empty set Ω.

Definition 1. The σ-algebra generated by (f_i)_{i∈I} (and (A_i)_{i∈I}) is given by

σ(f_i : i ∈ I) = σ(⋃_{i∈I} f_i^{−1}(A_i)).

Put σ(f) = σ(f_i : i ∈ I) in the case |I| = 1 and f = f_i.

Theorem 2. For every measurable space (Ω̃, Ã) and every mapping g : Ω̃ → Ω,

g is Ã-σ(f_i : i ∈ I)-measurable ⇔ ∀ i ∈ I : f_i ∘ g is Ã-A_i-measurable.

Put Y = ⋃_{i∈I} Ω_i and define

⨉_{i∈I} Ω_i = {ω ∈ Y^I : ω(i) ∈ Ω_i for i ∈ I}.

Notation: ω = (ω_i)_{i∈I} for ω ∈ ⨉_{i∈I} Ω_i. Moreover,

P_0(I) = {J ⊂ I : J non-empty, finite}.

Definition 3.

(i) A measurable rectangle is any set of the form

A = ⨉_{j∈J} A_j × ⨉_{i∈I\J} Ω_i

with J ∈ P_0(I) and A_j ∈ A_j for j ∈ J. Notation: R is the class of measurable rectangles.

(ii) The product (measurable) space (Ω, A) with components (Ω_i, A_i), i ∈ I, is given by

Ω = ⨉_{i∈I} Ω_i, A = σ(R).

Notation: A = ⊗_{i∈I} A_i, the product σ-algebra.

Put Ω = ⨉_{i∈I} Ω_i. For any ∅ ≠ S ⊂ I let

π_S^I : Ω → ⨉_{i∈S} Ω_i, (ω_i)_{i∈I} ↦ (ω_i)_{i∈S},

denote the projection of Ω onto ⨉_{i∈S} Ω_i. For short, π_S instead of π_S^I and π_i instead of π_{{i}}.

Theorem 4.

(i) ⊗_{i∈I} A_i = σ(π_i : i ∈ I).

(ii) ∀ i ∈ I : A_i = σ(E_i) ⇒ ⊗_{i∈I} A_i = σ(⋃_{i∈I} π_i^{−1}(E_i)).

Theorem 5.

(i) For every measurable space (Ω̃, Ã) and every mapping g : Ω̃ → Ω,

g is Ã-⊗_{i∈I} A_i-measurable ⇔ ∀ i ∈ I : π_i ∘ g is Ã-A_i-measurable.

(ii) For every ∅ ≠ S ⊂ I the projection π_S^I is ⊗_{i∈I} A_i-⊗_{i∈S} A_i-measurable.

Remark 6. From Theorems 4.(i) and 5 we get

⊗_{i∈I} A_i = σ(π_S^I : S ∈ P_0(I)).

The sets

(π_S^I)^{−1}(B) = B × (⨉_{i∈I\S} Ω_i)

with S ∈ P_0(I) and B ∈ ⊗_{i∈S} A_i are called cylinder sets. Notation: C is the class of cylinder sets.

Theorem 7. For every A ∈ ⊗_{i∈I} A_i there exists a non-empty countable set S ⊂ I and a set B ∈ ⊗_{i∈S} A_i such that A = (π_S^I)^{−1}(B).

Given: a non-empty countable set I and a family of topological spaces (Ω_i, G_i), i ∈ I. Consider the product topology G on Ω = ⨉_{i∈I} Ω_i.

Theorem 8. If every space (Ω_i, G_i) has a countable basis, then

B(Ω) = ⊗_{i∈I} B(Ω_i).

In particular, B_k = ⊗_{i=1}^k B and B̄_k = ⊗_{i=1}^k B̄.


9 Carathéodory's Theorem

Given: a non-empty set Ω.

Definition 1. A ⊂ P(Ω) is an algebra (in Ω) if

(i) Ω ∈ A,

(ii) A,B ∈ A⇒ A ∪B ∈ A,

(iii) A ∈ A⇒ Ac ∈ A.

Definition 2. µ : A → [0,∞[ ∪ {∞} is a content (on A) if

(i) A is an algebra,

(ii) µ(∅) = 0,

(iii) µ is additive, i.e.,

A,B ∈ A disjoint ⇒ µ(A ∪B) = µ(A) + µ(B).

Theorem 3. The following are equivalent for a content µ on A with µ(Ω) < ∞:

(i) A_1, A_2, … ∈ A pairwise disjoint ∧ ⋃_{i=1}^∞ A_i ∈ A ⇒ µ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ µ(A_i) (σ-additivity),

(ii) A_1, A_2, … ∈ A ∧ ⋃_{i=1}^∞ A_i ∈ A ⇒ µ(⋃_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ µ(A_i) (σ-subadditivity),

(iii) A_1, A_2, … ∈ A ∧ A_n ↑ A ∈ A ⇒ lim_{n→∞} µ(A_n) = µ(A) (σ-continuity from below),

(iv) A_1, A_2, … ∈ A ∧ A_n ↓ A ∈ A ⇒ lim_{n→∞} µ(A_n) = µ(A) (σ-continuity from above),

(v) A_1, A_2, … ∈ A ∧ A_n ↓ ∅ ⇒ lim_{n→∞} µ(A_n) = 0 (σ-continuity at ∅).

Theorem 4. For every σ-additive content µ on A with µ(Ω) < ∞,

∃ µ* measure on σ(A) : µ*|_A = µ.

10 The Factorization Lemma

Given: a set Ω_1 ≠ ∅, a measurable space (Ω_2, A_2), and a mapping X : Ω_1 → Ω_2.

Theorem 1. For every Y : Ω_1 → R,

Y σ(X)-B-measurable ⇔ ∃ g : Ω_2 → R A_2-B-measurable : Y = g ∘ X.


Literature

H. Bauer, Probability Theory, de Gruyter, Berlin, 1996.

P. Billingsley, Probability and Measure, Wiley, New York, 1st ed. 1979, 3rd ed. 1995.

Y. S. Chow, H. Teicher, Probability Theory, Springer, New York, 1st ed. 1978, 3rd ed. 1997.

J. Elstrodt, Maß- und Integrationstheorie, Springer, Berlin, 1st ed. 1996, 7th ed. 2011.

P. Gänssler, W. Stute, Wahrscheinlichkeitstheorie, Springer, Berlin, 1977.

A. Irle, Wahrscheinlichkeitstheorie und Statistik, Teubner, Stuttgart, 1st ed. 2001, 2nd ed. 2005.

O. Kallenberg, Foundations of Modern Probability, Springer, New York, 2002.

A. Klenke, Wahrscheinlichkeitstheorie, Springer, Berlin, 1st ed. 2006, 3rd ed. 2013. [In English: Probability Theory, Springer, Berlin, 2014.]

T. Müller-Gronbach, E. Novak, K. Ritter, Monte Carlo-Algorithmen, Springer, Berlin, 2012.

K. R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, 1967.

R. L. Schilling, Wahrscheinlichkeit, de Gruyter, Berlin, 2017.

R. L. Schilling, L. Partzsch, Brownian Motion, de Gruyter, Berlin, 1st ed. 2012, 2nd ed. 2014.

A. N. Širjaev, Wahrscheinlichkeit, Deutscher Verlag der Wissenschaften, Berlin, 1988. [In English: Probability, Springer, New York, 1984.]

D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, 1991.

J. Yeh, Martingales and Stochastic Analysis, World Scientific, Singapore, 1995.

Bauer (1996) is available via http://www.degruyter.com/viewbooktoc/product/3786

Elstrodt (2011), Irle (2005), Klenke (2013), and Müller-Gronbach et al. (2012) are available via http://link.springer.com


Definitions

absolutely continuous measure, 79

additivity, 97

algebra, 97

almost everywhere, 91

almost surely, 4

asymptotically negligible, 55

Bernoulli distribution, 5

binomial distribution, 5

Black-Scholes model, 75

Borel set, 89

Borel σ-algebra, 89

boundary value problem

elliptic PDE, 75

Brownian motion, 72

Markov property, 76

projective reflection, 76

scaling invariance, 76

symmetry, 76

time reversal, 76

canonical process, 69

Cauchy distribution, 7

characteristic function, 50

conditional density, 86

conditional expectation, 82, 84

elementary, 81

conditional probability, 82, 84

elementary, 81

content, 97

convergence

almost everywhere, 93

in Lp, 93

in distribution, 10

in mean, 93

in mean-square, 93

in probability, 8

weak, 10

convolution, 35

counting measure, 88

covariance, 35

covariance matrix, 62

cylinder set, 68, 96

density, 93

Dirac measure, 88

discrete distribution, 4

distribution, 3

distribution function, 6

Dynkin class, 94

generated by a class of sets, 95

empirical distribution, 47

empirical distribution function, 47

essential supremum, 94

essentially bounded function, 94

event, 3

expectation, 7, 62

exponential distribution, 5

Feller condition, 55

Fourier transform

of a probability measure, 48

of an integrable function, 49

Fourier transformation, 49

geometric Brownian motion, 75

geometric distribution, 5

i.i.d., 37

identically distributed, 4

image measure, 88

independence

of a family of classes, 31

of a family of events, 30

of a family of processes, 66

of a family of random elements, 32

integrable function, 91

complex-valued, 48


integral, 91

of a complex-valued function, 48

of a non-negative function, 91

of a simple function, 90

joint distribution, 34

kernel, 27

σ-finite, 27

Lebesgue measure, 89

Lévy distance, 16

limes inferior, 38

limes superior, 38

Lindeberg condition, 55

Lyapunov condition, 55

marginal distribution, 34

finite-dimensional, 66

Markov kernel, 22

mean, 62

measurable mapping, 88

measurable rectangle, 68, 95

measurable set, 87

measurable space, 87

measure, 87

σ-finite, 27

with density, 93

measure space, 87

Monte Carlo algorithm, 46

normal distribution

multi-dimensional, 49, 61

one-dimensional, 5

Poisson distribution, 5

positive semi-definite function, 49

probability density, 93

probability measure, 87

with density, 93

probability space, 87

product measurable space, 96

product measure, 30

product measure space, 30

product probability measure, 26

product probability space, 26

product σ-algebra, 96

quasi-integrable function, 91

random element, 3

random variable, 3

random vector, 3

random walk, 37

relatively compact set of measures, 17

section

of a mapping, 23

of a set, 23

σ-additivity, 87, 97

σ-algebra, 87

generated by a class of sets, 88

generated by a family of mappings, 66, 95

σ-continuity at ∅, 97

σ-continuity from above, 97

σ-continuity from below, 97

σ-subadditivity, 97

simple function, 90

square-integrable function, 93

standard deviation, 7

stochastic process, 65

distribution, 66, 68

parameter set, 65

realization, 65

state space, 65

with continuous paths, 66

with independent increments, 72

with stationary increments, 72

tail σ-algebra, 37

tail (terminal) event, 37

tightness, 17

unbiased estimator, 46

uncorrelated random variables, 35

uniform distribution

on a finite set, 5, 88

on a subset of Rk, 5

uniform integrability, 18

variance, 7

Wiener measure, 71

with probability one, 4