Notes from Limit Theorems 2

Mihai Nica


Notes. These are my notes from the class Limit Theorems 2 taught by Professor McKean in Spring 2012. I have tried to carefully go over the bigger theorems from the course and fill in all the details explicitly. There is also a lot of information that is folded in from other sources.

• The section on Martingales is supplemented with some notes from "A First Look at Rigorous Probability Theory" by Jeffrey S. Rosenthal, which has a really nice introduction to Martingales.

• The section on the law of the iterated logarithm is supplemented with some inequalities which I looked up on the internet...mostly Wikipedia and PlanetMath.

• In the section on the Ergodic theorem, I use a notation I found on Wikipedia that I like for continued fractions. In my pen-and-paper notes, there is also a little section about Ergodic theory for geodesics on surfaces, which is really cute. However, I couldn't figure out a good way to draw the pictures so it hasn't been typed up yet.

• The section on Brownian Motion is supplemented by the book Brownian Motion and Martingales in Analysis by Richard Durrett, which is really wonderful. Some of the slick results are taken straight from there.

• I also include an appendix with results that I found myself reviewing as I went through this stuff.


Contents

Chapter 1. Martingales
1. Definitions and Examples
2. Stopping times
3. Martingale Convergence Theorem
4. Applications

Chapter 2. The Law of the Iterated Logarithm
1. First Half of the Law of the Iterated Logarithm
2. Second Half of the Law of the Iterated Logarithm

Chapter 3. Ergodic Theorem
1. Motivation
2. Birkhoff's Theorem
3. Continued Fractions

Chapter 4. Brownian Motion
1. Motivation
2. Levy's Construction
3. Construction from Durrett's Book
4. Some Properties

Chapter 5. Appendix
1. Conditional Random Variables
2. Extension Theorems


CHAPTER 1

Martingales

1. Definitions and Examples

This section on Martingales contains heavy use of conditional random variables. I do a quick review of this topic from Limit Theorems 1 in the appendix.

Definition 1.1. A sequence of random variables $X_0, X_1, \ldots$ is called a martingale if $E(|X_n|) < \infty$ for all $n$ and, with probability 1:

$$E(X_{n+1} \mid X_0, X_1, \ldots, X_n) = X_n$$

Intuitively, this says that the average value of $X_{n+1}$ is the same as that of $X_n$, even if we are given the values of $X_0$ to $X_n$. Note that conditioning on $X_0, \ldots, X_n$ is just different notation for conditioning on $\sigma(X_0, \ldots, X_n)$, which is the sigma algebra generated by preimages of Borel sets through $X_0, \ldots, X_n$. One can make more general martingales by replacing $\sigma(X_0, \ldots, X_n)$ with an arbitrary increasing chain of sigma algebras $\mathcal{F}_n$; the results here carry over to that setting too.

Example 1.2. Sometimes martingales are called "fair games". The analogy is that the random variable $X_n$ represents the bankroll of the gambler at time $n$. The game is fair, because at any point in time the expected fortune of the gambler stays constant.

Definition 1.3. A submartingale is when $E(X_{n+1} \mid X_0, X_1, \ldots, X_n) \geq X_n$ (i.e. the capital is increasing) and a supermartingale is when $E(X_{n+1} \mid X_0, X_1, \ldots, X_n) \leq X_n$ (i.e. the capital is decreasing). Most of the theorems for martingales work for submartingales, just change the inequality in the right place. To avoid confusion between sub-, super-, and ordinary martingales, we will sometimes call a martingale a "fair martingale".

Example 1.4. The symmetric random walk, $X_n = Z_0 + Z_1 + \cdots + Z_n$ with each $Z_n = \pm 1$ with probability $\frac{1}{2}$, is a martingale. In terms of the fair game, this is gambling on the outcome of a fair coin.
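As a quick sanity check (my own sketch, not from the lecture), here is a minimal Python simulation of many sample paths of this walk, confirming empirically that $E(X_n)$ stays at $E(X_0) = 0$ for every $n$:

```python
import numpy as np

rng = np.random.default_rng(0)

n_paths, n_steps = 100_000, 50
Z = rng.choice([-1, 1], size=(n_paths, n_steps))  # fair coin-flip bets
X = np.cumsum(Z, axis=1)                          # X_n: partial sums of the Z's

# The martingale property forces E(X_n) = E(X_0) = 0 at every time n.
print(X.mean(axis=0)[:5])  # all entries near 0
```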

Remark. Using the tower property of conditional expectation, we see that:

$$E(X_{n+2} \mid X_0, X_1, \ldots, X_n) = E\big(E(X_{n+2} \mid X_0, X_1, \ldots, X_{n+1}) \mid X_0, \ldots, X_n\big) = E(X_{n+1} \mid X_0, \ldots, X_n) = X_n$$

With a simple argument by induction, we get that in general, for $m \geq n$:

$$E(X_m \mid X_0, X_1, \ldots, X_n) = X_n$$

In particular, $E(X_n) = E(X_0)$ for every $n$. If $\tau$ is a random "time" (a non-negative integer) that is independent of the $X_n$'s, then $E(X_\tau)$ is a weighted average of $E(X_n)$'s, so we still have $E(X_\tau) = E(X_0)$. What if $\tau$ is dependent on the $X_n$'s? In general we cannot have equality: for the example of the simple symmetric random walk (coin-flip betting), $\tau =$ first time that $X_n = -1$ has $E(X_\tau) = -1 \neq 0 = E(X_0)$. The next section gives some conditions where equality holds.
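To see this numerically (my own sketch, with arbitrary parameters): for the truncated time $\min(\tau, M)$ the optional stopping theorem (proved below) gives $E(X_{\min(\tau,M)}) = 0$ exactly, even though $X_{\min(\tau,M)} \to X_\tau = -1$ almost surely; the rare paths that haven't yet hit $-1$ sit at large positive values and balance the expectation:

```python
import numpy as np

rng = np.random.default_rng(1)

n_paths, M = 20_000, 2_000
steps = rng.integers(0, 2, size=(n_paths, M), dtype=np.int8) * 2 - 1
X = steps.cumsum(axis=1, dtype=np.int16)

never = ~(X == -1).any(axis=1)           # paths that haven't hit -1 by time M
stopped = np.where(never, X[:, -1], -1)  # X at the bounded time min(tau, M)

print(stopped.mean())    # ~ 0: optional stopping holds for the bounded time
print(np.mean(~never))   # most paths have already stopped, each contributing -1
```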

2. Stopping times

Definition 2.1. For a martingale $X_n$, a non-negative integer valued random variable $\tau$ is a stopping time if it has the property that:

$$\{\tau = n\} \in \sigma(X_1, X_2, \ldots, X_n)$$

Intuitively, this is saying that one can determine if $\tau = n$ just by looking at the first $n$ steps of the martingale.

Example 2.2. In the example of the random coin flipping, if we let $\tau$ be the first time so that $X_n = 10$, then $\tau$ is a stopping time.

Example 2.3. We are often interested in $X_\tau$, the value of the martingale at the random time $\tau$. This is precisely defined as $X_\tau(\omega) = X_{\tau(\omega)}(\omega)$. Another handy rewriting is: $X_\tau = \sum_k X_k 1_{\{\tau = k\}}$.

Lemma 2.4. If $X_n$ is a submartingale and $\tau_1, \tau_2$ are bounded stopping times, so that $\exists M$ s.t. $0 \leq \tau_1 \leq \tau_2 \leq M$ with probability 1, then $E(X_{\tau_1}) \leq E(X_{\tau_2})$, with equality for fair martingales.

Proof. For fixed $k$, the event $\{\tau_1 < k \leq \tau_2\}$ can be written as $\{\tau_1 < k \leq \tau_2\} = \{\tau_1 \leq k-1\} \cap \{\tau_2 \leq k-1\}^C$, from which we see that $\{\tau_1 < k \leq \tau_2\} \in \sigma(X_0, X_1, \ldots, X_{k-1})$ because $\tau_1$ and $\tau_2$ are both stopping times. We then have the following manipulation, using a telescoping series, linearity of the expectation, the fact that $E(Y 1_A) = E(E(Y \mid X_0, X_1, \ldots, X_{k-1}) 1_A)$ for events $A \in \sigma(X_0, X_1, \ldots, X_{k-1})$, and finally the fact that $E(X_k \mid X_0, X_1, \ldots, X_{k-1}) - X_{k-1} \geq 0$ since $X_n$ is a (sub)martingale (with equality for fair martingales):

$$\begin{aligned}
E(X_{\tau_2}) - E(X_{\tau_1}) &= E(X_{\tau_2} - X_{\tau_1}) \\
&= E\left(\sum_{k=\tau_1+1}^{\tau_2} X_k - X_{k-1}\right) \\
&= E\left(\sum_{k=1}^{M} (X_k - X_{k-1}) 1_{\{\tau_1 < k \leq \tau_2\}}\right) \\
&= E\left(\sum_{k=1}^{M} \big(E(X_k \mid X_0, \ldots, X_{k-1}) - X_{k-1}\big) 1_{\{\tau_1 < k \leq \tau_2\}}\right) \\
&= \sum_{k=1}^{M} E\Big(\big(E(X_k \mid X_0, \ldots, X_{k-1}) - X_{k-1}\big) 1_{\{\tau_1 < k \leq \tau_2\}}\Big) \\
&\geq \sum_{k=1}^{M} E\big(0 \cdot 1_{\{\tau_1 < k \leq \tau_2\}}\big) = 0
\end{aligned}$$

where the inequality is equality in the case of a fair martingale.


Theorem 2.5. Say $X_n$ is a martingale and $\tau$ a bounded stopping time (that is, $\exists M$ s.t. $0 \leq \tau \leq M$ with probability 1). Then:

$$E(X_\tau) = E(X_0)$$

Proof. Let $\upsilon$ be the random variable which is constantly 0. This is a stopping time! So by the above lemma, since $0 \leq \upsilon \leq \tau \leq M$, we have that $E(X_\tau) = E(X_\upsilon) = E(X_0)$.

Theorem 2.6. For $X_n$ a martingale and $\tau$ a stopping time which is almost surely finite (that is, $P(\tau < \infty) = 1$), we have:

$$E(X_\tau) = E(X_0) \iff E\left(\lim_{n\to\infty} X_{\min(\tau,n)}\right) = \lim_{n\to\infty} E\left(X_{\min(\tau,n)}\right)$$

Proof. It suffices to show that $E(X_\tau) = E\left(\lim_{n\to\infty} X_{\min(\tau,n)}\right)$ and $E(X_0) = \lim_{n\to\infty} E\left(X_{\min(\tau,n)}\right)$. The first equality holds since $P(\tau < \infty) = 1$ gives $P(\lim_{n\to\infty} X_{\min(\tau,n)} = X_\tau) = 1$, so they agree almost surely. The second holds by the above theorem concerning bounded stopping times: for any $n$, $\min(\tau, n)$ is a bounded stopping time, so we have $E(X_{\min(\tau,n)}) = E(X_0)$, and hence equality holds in the limit too.

Remark. The above theorem can be combined with results like the monotone convergence theorem or the Lebesgue dominated convergence theorem to switch the limits and conclude that $E(X_\tau) = E(X_0)$. Here are some examples:

Example 2.7. If $X_n$ is a martingale and $\tau$ a stopping time so that $P(\tau < \infty) = 1$, $E(|X_\tau|) < \infty$, and $\lim_{n\to\infty} E(X_n 1_{\{\tau > n\}}) = 0$, then $E(X_\tau) = E(X_0)$.

Proof. For any $n$ we have: $X_{\min(\tau,n)} = X_n 1_{\{\tau > n\}} + X_\tau 1_{\{\tau \leq n\}}$. Taking expectations and then the limit as $n \to \infty$ gives:

$$\lim_{n\to\infty} E(X_{\min(\tau,n)}) = \lim_{n\to\infty} E(X_n 1_{\{\tau > n\}}) + \lim_{n\to\infty} E(X_\tau 1_{\{\tau \leq n\}}) = 0 + E(X_\tau)$$

where the first term is 0 by hypothesis, and the second limit is justified since $X_\tau 1_{\{\tau \leq n\}} \to X_\tau$ pointwise almost surely (since $P(\tau < \infty) = 1$), and the dominating majorant $|X_\tau|$ with $E(|X_\tau|) < \infty$ lets us use the Lebesgue dominated convergence theorem to conclude the convergence of the expectation.

Example 2.8. Suppose $X_n$ is a martingale and $\tau$ a stopping time so that $E(\tau) < \infty$ and $|X_{n+1} - X_n| \leq M < \infty$ for some fixed $M$ and for every $n$. Then $E(X_\tau) = E(X_0)$.

Proof. Let $Y = |X_0| + M\tau$. Then $Y$ can be used as a dominating majorant in an application of the L.D.C.T. very similar to the above example to get the conclusion.

3. Martingale Convergence Theorem

The proof relies on the famous upcrossing lemma:

Lemma 3.1. [The Upcrossing Lemma] Let $X_n$ be a submartingale. For fixed $\alpha, \beta \in \mathbb{R}$, $\beta > \alpha$, and $M \in \mathbb{N}$, let $U^{\alpha,\beta}_M$ be the number of "upcrossings" that the martingale $X_n$ makes of the interval $[\alpha, \beta]$ in the time period $1 \leq n \leq M$. (An upcrossing is when $X_n$ goes from being less than $\alpha$ initially to being more than $\beta$ later. Precisely: $U^{\alpha,\beta}_M = \max\{k : \exists t_1 < u_1 < \ldots < t_k < u_k \leq M \text{ s.t. } X_{t_i} \leq \alpha \text{ and } X_{u_i} \geq \beta \ \forall i\}$.) Then:

$$E(U^{\alpha,\beta}_M) \leq \frac{E(|X_M - X_0|)}{\beta - \alpha}$$

Proof. Firstly, we remark that it suffices to prove the result when the submartingale $X_n$ is replaced by $\max(X_n, \alpha)$, since this is still a submartingale, it has the same number of upcrossings as $X_n$, and $|\max(X_M, \alpha) - \max(X_0, \alpha)| \leq |X_M - X_0|$, so the inequality is only strengthened. In other words, we assume without loss of generality that $X_n \geq \alpha$ for all $n$. This simplification is used in exactly one spot later on to get the inequality we need.

Let us now carefully nail down where the upcrossings happen. Define $u_0 = v_0 = 0$ and iteratively define:

$$u_j = \min\left(M, \inf\{k > v_{j-1} : X_k \leq \alpha\}\right), \qquad v_j = \min\left(M, \inf\{k > u_j : X_k \geq \beta\}\right)$$

These record the times where the martingale crosses the interval $[\alpha, \beta]$; the $u_j$'s record when it first crosses to the left of the interval, and the $v_j$'s record crossings to the right of the interval. They are also truncated at time $M$ so that they are bounded stopping times. Moreover, since these times are strictly increasing until they hit $M$, it must be the case that $v_M = M$. We have then, using some crafty telescoping sums:

$$\begin{aligned}
E(X_M) = E(X_{v_M}) &= E\left(X_{v_M} - X_{u_M} + X_{u_M} - X_{v_{M-1}} + X_{v_{M-1}} - \ldots - X_{u_1} + X_{u_1} - X_0 + X_0\right) \\
&= E(X_0) + E\left(\sum_{k=1}^{M} X_{v_k} - X_{u_k}\right) + \sum_{k=1}^{M} E\left(X_{u_k} - X_{v_{k-1}}\right)
\end{aligned}$$

The third term is non-negative! This is because $u_k$ and $v_{k-1}$ are both bounded stopping times with $0 \leq v_{k-1} \leq u_k \leq M$, so our theorem about stopping times gives that this expectation is non-negative. (This is subtle! Most of the time (when we haven't hit time $M$ yet) we expect $X_{u_k} < \alpha$ while $X_{v_{k-1}} > \beta$, so their difference is negative. However, because of the small probability event where $v_{k-1} < M$ and $u_k = M$, we get a big positive number with small probability which balances the whole expectation. Compare to the example of a simple symmetric random walk with a truncated stopping time for $\tau =$ first time that $X_n = -1$.)

Now the second term has $E\left(\sum_{k=1}^{M} X_{v_k} - X_{u_k}\right) \geq E\left((\beta - \alpha) U^{\alpha,\beta}_M\right)$. This is because each upcrossing counted in $U^{\alpha,\beta}_M$ contributes at least $(\beta - \alpha)$ to the sum, null cycles (where $u_k = v_k = M$) contribute nothing, and the possibly one incomplete cycle (where $u_k < M$ but $v_k = M$) must give a non-negative contribution to the sum by the simplification that $X_n \geq \alpha$.

Hence we have:

$$E(X_M) \geq E(X_0) + (\beta - \alpha) E\left(U^{\alpha,\beta}_M\right) + 0$$

which gives the desired result.
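To make the bookkeeping of the $u_j$'s and $v_j$'s concrete, here is a small Python helper (my own illustration) that counts completed upcrossings of $[\alpha, \beta]$ by scanning a path exactly as in the definition:

```python
def count_upcrossings(path, alpha, beta):
    """Count completed upcrossings of [alpha, beta] by the sequence `path`."""
    count = 0
    below = False            # have we dipped to alpha since the last upcrossing?
    for x in path:
        if not below and x <= alpha:
            below = True     # found a t_i with X_{t_i} <= alpha
        elif below and x >= beta:
            count += 1       # found the matching u_i with X_{u_i} >= beta
            below = False
    return count

print(count_upcrossings([0, -1, 2, -2, 3], alpha=0, beta=1))  # prints 2
```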


Theorem 3.2. [Martingale Convergence Theorem] Let $X_n$ be a submartingale with $\sup_n E(|X_n|) < \infty$. Then there exists a random variable $X$ so that $X_n \to X$ almost surely. (That is, $X_n(\omega) \to X(\omega)$ for almost all $\omega \in \Omega$.)

Proof. Firstly, since $\sup_n E(|X_n|) < \infty$, by Fatou's lemma we have $E(\liminf_n |X_n|) \leq \liminf_n E(|X_n|) \leq \sup_n E(|X_n|) < \infty$, from which it follows that $P(|X_n| \to \infty) = 0$. This ensures that the $X_n$ cannot "leak away" probability to $\pm\infty$, which would prevent the limiting random variable from being properly normalized.

Now suppose by contradiction that $P(\liminf X_n < \limsup X_n) > 0$, i.e. there is a non-zero probability of $X_n$ not converging. Then, using the density of the rationals and countable subadditivity, we can find $\alpha$ and $\beta$ so that $P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$. Counting the number of upcrossings $X_n$ makes of $[\alpha, \beta]$, we see that we must have:

$$P\left(\lim_{M\to\infty} U^{\alpha,\beta}_M = \infty\right) \geq P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$$

Hence $E\left(\lim_{M\to\infty} U^{\alpha,\beta}_M\right) = \infty$. By the monotone convergence theorem, however, we have that $\lim_{M\to\infty} E(U^{\alpha,\beta}_M) = E\left(\lim_{M\to\infty} U^{\alpha,\beta}_M\right) = \infty$.

But now we have reached a contradiction! For by the upcrossing lemma:

$$\lim_{M\to\infty} E(U^{\alpha,\beta}_M) \leq \lim_{M\to\infty} \frac{E(|X_M - X_0|)}{\beta - \alpha} \leq \frac{2 \sup_n E(|X_n|)}{\beta - \alpha} < \infty$$

4. Applications

Theorem 4.1. [Levy] Suppose $Z$ is a random variable with $E(|Z|) < \infty$, and that $\mathcal{F}_n$ is a decreasing chain of $\sigma$-algebras, $\mathcal{F}_1 \supset \mathcal{F}_2 \supset \ldots$ (This is saying that they are getting coarser and coarser.) Let $\mathcal{F}_\infty = \cap \mathcal{F}_n$. Then we have almost surely:

$$\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) = E(Z \mid \mathcal{F}_\infty)$$

Proof. We first prove that there is an almost sure limit using the martingale convergence theorem, and then we check the defining properties of $E(Z \mid \mathcal{F}_\infty)$ to verify that this is indeed the limit.

Firstly, let $X_n = E(Z \mid \mathcal{F}_n)$. Then for any fixed $M \in \mathbb{N}$, the sequence $X_M, X_{M-1}, \ldots, X_2, X_1$ is a martingale. (Here we are referring to a slightly more general martingale than in our original definition: the sigma algebra $\sigma(X_1, X_2, \ldots)$ in the definition is replaced by arbitrary increasing sigma algebras $\mathcal{F}_n$. The expectation property of the martingale follows from the fact that $E(E(Z \mid \mathcal{F}) \mid \mathcal{G}) = E(Z \mid \mathcal{G})$ when $\mathcal{G} \subset \mathcal{F}$.) Notice that we had to reverse the order of the sequence to get the sigma algebras to increase (i.e. get finer and finer), so that we really have a martingale. For this reason, the martingale convergence theorem does not apply directly, but the idea of the proof will still work. Suppose by contradiction, as in the proof of the martingale convergence theorem, that $P(\liminf X_n < \limsup X_n) > 0$. Then, as before, find $\alpha$ and $\beta$ so that $P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$. Since there are then infinitely many crossings of the interval $[\alpha, \beta]$, we know that the number of downcrossings $D^{\alpha,\beta}_M$ has $P\left(\lim_{M\to\infty} D^{\alpha,\beta}_M = \infty\right) > 0$, and so $E\left(\lim_{M\to\infty} D^{\alpha,\beta}_M\right) = \infty$. Hence, since $D^{\alpha,\beta}_M$ is increasing in $M$ (the number of downcrossings can only increase if we wait longer), we may find an $M_0 \in \mathbb{N}$ so that $E\left(D^{\alpha,\beta}_{M_0}\right) > \frac{2E(|Z|)}{\beta - \alpha}$.


Taking now the martingale sequence $X_{M_0}, X_{M_0-1}, \ldots, X_2, X_1$, we have a violation of the upcrossing lemma just as we did in the martingale convergence theorem.

Next, to verify that the limit is indeed $E(Z \mid \mathcal{F}_\infty)$, we just need to check the two defining properties, namely that it is $\mathcal{F}_\infty$-measurable and that it has the correct expectation value for events in $\mathcal{F}_\infty$. The limit $\lim_{n\to\infty} E(Z \mid \mathcal{F}_n)$ is $\mathcal{F}_\infty$-measurable: for each fixed $m$, it equals the limit of $E(Z \mid \mathcal{F}_n)$ over $n \geq m$, each of which is $\mathcal{F}_m$-measurable since $\mathcal{F}_n \subset \mathcal{F}_m$; hence the limit is $\mathcal{F}_m$-measurable for every $m$, and so measurable with respect to $\mathcal{F}_\infty = \cap_m \mathcal{F}_m$.

To see that $\lim_{n\to\infty} E(Z \mid \mathcal{F}_n)$ takes the correct expectations for events in $\mathcal{F}_\infty$, notice that for any $A \in \mathcal{F}_\infty \subset \mathcal{F}_n$ we have, for every $n$, that $E(E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$ since $A \in \mathcal{F}_n$; so in the limit, $\lim_{n\to\infty} E(E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$. Hence the problem of proving that $E(\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$ is reduced to an interchange of a limit with an expectation. If $Z$ is bounded, this is justified by the bounded convergence theorem. For $Z$ not bounded, truncating $Z$ by $Z 1_{\{Z \leq N\}}$ with a bit more work will give the same interchange of limits.

Theorem 4.2. [Levy] Suppose $Z$ is a random variable with $E(|Z|) < \infty$, and that $\mathcal{F}_n$ is an increasing chain of $\sigma$-algebras, $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$ (This is saying that they are getting finer and finer.) Let $\mathcal{F}_\infty = \sigma(\cup \mathcal{F}_n)$. Then we have almost surely:

$$\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) = E(Z \mid \mathcal{F}_\infty)$$

Proof. This proof is like the last one. In this case $E(Z \mid \mathcal{F}_n)$ really is a martingale (no reversing needed), so an almost sure limit exists by the martingale convergence theorem. Some more work is needed here... I think you get the desired property by approximation with "tame events": for $A \in \mathcal{F}_\infty$ and every $\varepsilon > 0$ there exists $A_n \in \mathcal{F}_n$ such that $P(A \Delta A_n) < \varepsilon$.

Remark. This result is often known as the "Levy Zero-One Law", since a common application is to consider an event $A \in \mathcal{F}_\infty$, for which the theorem tells us that:

$$\lim_{n\to\infty} P(A \mid \mathcal{F}_n) = \lim_{n\to\infty} E(1_A \mid \mathcal{F}_n) = E(1_A \mid \mathcal{F}_\infty) = 1_A$$

where the last equality holds since $A$ is $\mathcal{F}_\infty$-measurable. This says in particular that this limit is either 0 or 1, since these are the only two values taken on by $1_A$. In this setting, the theorem gives a short proof of the Kolmogorov zero-one law.

Theorem 4.3. [Kolmogorov Zero-One Law] Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. random variables. Define:

$$\mathcal{F}_n = \sigma\left(\bigcup_{k=1}^{n} \sigma(X_k)\right), \qquad \mathcal{F}_\infty = \sigma\left(\bigcup_{n=1}^{\infty} \mathcal{F}_n\right), \qquad \mathcal{F}_{tail} = \bigcap_{n=1}^{\infty} \sigma\left(\bigcup_{k=n}^{\infty} \sigma(X_k)\right)$$


Then any event $A \in \mathcal{F}_{tail}$ has either $P(A) = 0$ or $P(A) = 1$. These are those events which do not depend on any finite number of the $X_n$'s.

Proof. Let $A \in \mathcal{F}_{tail}$. For any $n \in \mathbb{N}$ we have that $P(A) = P(A \mid \mathcal{F}_n) = E(1_A \mid \mathcal{F}_n)$, since $A \in \mathcal{F}_{tail}$ does not depend on the first $n$ variables, so its conditional expectation is a constant. We have then (as in the above "Levy 0-1" remark):

$$P(A) = \lim_{n\to\infty} P(A \mid \mathcal{F}_n) = 1_A$$

since $A \in \mathcal{F}_\infty$. So indeed, the only values of $P(A)$ that are possible are 1 and 0.

Theorem 4.4. [Strong Law of Large Numbers] Suppose $X_1, X_2, \ldots$ are i.i.d. with $E(|X_1|) < \infty$. Then we have almost surely that:

$$\lim_{n\to\infty} \frac{X_1 + X_2 + \ldots + X_n}{n} = E(X_1)$$

Proof. Define $S_n = X_1 + X_2 + \ldots + X_n$, and let $\mathcal{F}_n = \sigma\left(\bigcup_{k=n}^{\infty} \sigma(S_k)\right)$ be the sigma algebra of the tail $S_n, S_{n+1}, \ldots$. We now claim that:

$$E(X_1 \mid \mathcal{F}_n) = \frac{S_n}{n}$$

This can be seen in the following slick way. First notice that by symmetry, we must have $E(X_1 \mid \mathcal{F}_n) = E(X_2 \mid \mathcal{F}_n) = \ldots = E(X_n \mid \mathcal{F}_n)$. By linearity now, $\sum_{k=1}^{n} E(X_k \mid \mathcal{F}_n) = E\left(\sum_{k=1}^{n} X_k \mid \mathcal{F}_n\right) = E(S_n \mid \mathcal{F}_n) = S_n$, since $S_n \in \mathcal{F}_n$. Hence, since they are all equal and sum to $S_n$, we get $E(X_1 \mid \mathcal{F}_n) = \frac{S_n}{n}$ as desired. By Levy's theorem (the one for decreasing chains of $\sigma$-algebras) now:

$$\lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} E(X_1 \mid \mathcal{F}_n) = E\left(X_1 \,\Big|\, \bigcap_k \mathcal{F}_k\right)$$

From here, one can use the Hewitt-Savage zero-one law (which says that permutation invariant events have a zero-one law) to see that the whole sigma algebra $\bigcap_k \mathcal{F}_k$ must be the trivial one, so then $E\left(X_1 \mid \bigcap_k \mathcal{F}_k\right) = E(X_1)$. Alternatively, once we have concluded that such an almost sure limit exists, one could then remark by the Kolmogorov zero-one law that the limit must be a constant (for $\lim_{n\to\infty} \frac{S_n}{n}$ does not depend on any finite number of the $X_n$'s, so any event of the type $\{\lim \frac{S_n}{n} < \alpha\}$ must have probability 0 or 1; by taking a sup over $\alpha$, we find that the limit must be a constant). Combining this with the above, using the fact that conditional expectations preserve the expectation, shows the constant is indeed $E(X_1)$.

Theorem 4.5. [Hewitt-Savage Zero-One Law] Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. random variables. Let $A$ be an event which is unchanged under finite permutations of the indices of the $X_i$'s. (I.e. for every finite permutation $\Pi$: $\omega = (x_1, x_2, \ldots) \in A$ iff $\Pi(\omega) = (x_{\Pi(1)}, x_{\Pi(2)}, \ldots) \in A$, i.e. $\Pi(A) = A$.) Then $P(A) = 0$ or $1$.


Proof. We call an event "tame" if it only depends on finitely many of the $X_i$'s. The proof is a consequence of the fact that for any $\varepsilon$, any event $A$ can be approximated by a "tame event" $B$ so that $P(B \triangle A) < \varepsilon$. (This is completely analogous to the fact that for the usual Lebesgue measure on $\mathbb{R}$, one can approximate any measurable set $U$ by a finite union of open intervals $I_i$ so that $\lambda\left(\cup_{i=1}^{n} I_i \triangle U\right) < \varepsilon$. This comes from the definition of the Lebesgue measure as the inf of the outer measure with open sets, and the fact that every open set is a union of countably many intervals, of which only finitely many are needed to be within $\varepsilon/2$. In the same vein, the probability measure on the space of infinite sequences is generated by the outer measure from tame events. This is usually all packaged up in the Caratheodory extension theorem.) Once we have this tame event $B$, depending only on $X_1, \ldots, X_n$, we let $\Pi$ be the permutation that swaps $1, \ldots, n$ with $n+1, \ldots, 2n$, so that $B$ and $\Pi(B)$ are independent events. We have then:

$$P(A) \approx P(A \cap B) = P(\Pi(A) \cap \Pi(B)) = P(A \cap \Pi(B)) \approx P(B \cap \Pi(B)) = P(B)P(\Pi(B)) = P(B)^2 \approx P(A)^2$$

where each of the approximations holds within (a small multiple of) $\varepsilon$ by the choice of $B$. Since we can do this for every $\varepsilon > 0$, we get $P(A) = P(A)^2$ and the result follows.


CHAPTER 2

The Law of the Iterated Logarithm

We will prove that for a sequence of i.i.d. random variables $X_1, X_2, \ldots$ with mean 0 and variance 1, with $S_n = \sum_{i=1}^{n} X_i$:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} = \sqrt{2}\right) = 1$$

This result gives us finer information about these sums than the law of large numbers or the central limit theorem. We need the theory of martingales to get Doob's inequality, and then a bunch of other sneaky tricks, like the Borel-Cantelli lemmas, to get the result. We will also need a few analytic-type estimates along the way. (Actually, our proof here will only prove the case where the $X_n$'s are $\pm 1$ with probability $1/2$ each. The result can be generalized by using even finer estimates.)
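As a quick numerical illustration (my own, not from the notes), one can watch $S_n/\sqrt{n \log\log n}$ along a long coin-flipping path; the running maximum over a finite path is only a rough indication of the asymptotic ceiling $\sqrt{2}$:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 10**6
S = np.cumsum(rng.choice([-1, 1], size=n))
idx = np.arange(1000, n)   # skip small n, where log(log n) is tiny
ratio = S[idx] / np.sqrt(idx * np.log(np.log(idx)))

# The LIL says limsup of `ratio` is sqrt(2) as n -> infinity.
print(ratio.max(), np.sqrt(2))
```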

1. First Half of the Law of the Iterated Logarithm

To start, we will first prove some helpful lemmas.

Lemma 1.1. [Doob's Inequality] For a submartingale $Z_n$, we have for any $\alpha > 0$ that:

$$P\left(\max_{0 \leq i \leq n} Z_i \geq \alpha\right) \leq \frac{E(|Z_n|)}{\alpha}$$

Proof. (Taken from Rosenthal) Let $A_k$ be the event that $Z_k \geq \alpha$ but $Z_i < \alpha$ for $i < k$, i.e. that the process reaches $\alpha$ for the first time at time $k$. These are disjoint events with $A = \cup A_k = \left\{\max_{0 \leq i \leq n} Z_i \geq \alpha\right\}$, which is the event we want. Now consider:

$$\begin{aligned}
\alpha P(A) &= \sum_{k=0}^{n} \alpha P(A_k) \\
&= \sum_k E(\alpha 1_{A_k}) \\
&\leq \sum_k E(Z_k 1_{A_k}) \quad \text{since } Z_k \geq \alpha \text{ on } A_k \\
&\leq \sum_k E\left(E(Z_n \mid Z_1, Z_2, \ldots, Z_k) 1_{A_k}\right) \quad \text{since it's a submartingale} \\
&= \sum_k E(Z_n 1_{A_k}) \\
&= E(Z_n 1_A) \leq E(|Z_n|)
\end{aligned}$$

and the result follows.
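A quick empirical check (my own sketch, with arbitrary parameters): take $Z_n = |S_n|$ for a simple random walk $S_n$, which is a submartingale, and compare both sides of the inequality:

```python
import numpy as np

rng = np.random.default_rng(7)

n_paths, n, alpha = 50_000, 100, 15.0
S = np.cumsum(rng.choice([-1, 1], size=(n_paths, n)), axis=1)
Z = np.abs(S)                           # |S_n| is a submartingale

lhs = np.mean(Z.max(axis=1) >= alpha)   # P(max_{i<=n} Z_i >= alpha)
rhs = np.mean(Z[:, -1]) / alpha         # E(|Z_n|) / alpha
print(lhs, "<=", rhs)
```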


Remark. This is a "rich man's version of Chebyshev-type inequalities", which are proved using the same trick as in lines 3 and 4 of the inequality train above. The fact that the behavior of the whole martingale can be controlled by the end point of the martingale gives us the little extra oomph we need.

Lemma 1.2. [Hoeffding's Inequality] Let $Y$ be a random variable so that $E(Y) = 0$ and $a, b \in \mathbb{R}$ so that $a \leq Y \leq b$ almost surely. Then $E(e^{tY}) \leq e^{t^2(b-a)^2/8}$.

Proof. Write $Y$ as a convex combination of $a$ and $b$: $Y = \alpha b + (1 - \alpha) a$ where $\alpha = (Y - a)/(b - a)$. By convexity of $e^{(\cdot)}$, we have:

$$e^{tY} \leq \frac{Y - a}{b - a} e^{tb} + \frac{b - Y}{b - a} e^{ta}$$

Taking expectations (and using $E(Y) = 0$), we have:

$$E(e^{tY}) \leq \frac{-a}{b - a} e^{tb} + \frac{b}{b - a} e^{ta} = e^{g(t(b-a))}$$

for $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -\frac{a}{b-a}$. Notice $g(0) = g'(0) = 0$ and $g''(u) \leq \frac{1}{4}$ for all $u$. Hence by Taylor's theorem:

$$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) \leq 0 + 0 + \frac{u^2}{2} \cdot \frac{1}{4} = \frac{u^2}{8}$$

So then $E(e^{tY}) \leq e^{g(t(b-a))} \leq e^{t^2(b-a)^2/8}$.

Lemma 1.3. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then $P(\max_{k \leq n} S_k > \lambda) \leq e^{-\lambda^2/2n}$.

Proof. Using Doob's inequality (applied to the submartingale $e^{tS_k}$) and Hoeffding's inequality, for any $t > 0$ we have:

$$\begin{aligned}
P(\max_{k \leq n} S_k > \lambda) &= P(\max_{k \leq n} e^{tS_k} > e^{t\lambda}) \\
&\leq e^{-t\lambda} E(e^{tS_n}) \\
&= e^{-t\lambda} E(e^{tX_1})^n \\
&\leq e^{-t\lambda} e^{nt^2(b-a)^2/8}
\end{aligned}$$

Set $t = 4\lambda/n(b-a)^2$ to get:

$$P(\max_{k \leq n} S_k > \lambda) \leq e^{-(4\lambda/n(b-a)^2)\lambda} e^{n(4\lambda/n(b-a)^2)^2(b-a)^2/8} = e^{-2\lambda^2/n(b-a)^2}$$

For simple symmetric steps, we have $a = -1$ and $b = 1$, so this gives the result.

Theorem 1.4. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then for any $\varepsilon > 0$:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} > \sqrt{2 + \varepsilon}\right) = 0$$


Or in other words, since this holds for any value of $\varepsilon > 0$:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} \leq \sqrt{2}\right) = 1$$

Proof. Fix some $\theta > 1$ (the choice will be made more precise later). We will show that with the correct choice of $\theta$, the events $A_n = \left\{S_k > \sqrt{(2+\varepsilon)k \log(\log k)} \text{ for some } k,\ \theta^{n-1} \leq k < \theta^n\right\}$ happen only finitely many times, which will show that the limsup can't be more than $\sqrt{2+\varepsilon}$. To do this it suffices to show that $P(A_n)$ is summable, because then the Borel-Cantelli lemma will show that $A_n$ happens finitely often with probability 1. We have (using our previous lemma):

$$\begin{aligned}
P(A_n) &= P\left(S_k > \sqrt{(2+\varepsilon)k \log(\log k)} \text{ for some } k,\ \theta^{n-1} \leq k < \theta^n\right) \\
&\leq P\left(S_k > \sqrt{(2+\varepsilon)\theta^{n-1} \log(\log \theta^{n-1})} \text{ for some } k,\ \theta^{n-1} \leq k < \theta^n\right) \\
&\leq P\left(\max_{k \leq \theta^n} S_k > \sqrt{(2+\varepsilon)\theta^{n-1} \log(\log \theta^{n-1})}\right) \\
&\leq \exp\left(-\frac{\left(\sqrt{(2+\varepsilon)\theta^{n-1} \log(\log \theta^{n-1})}\right)^2}{2\theta^n}\right) \\
&= \exp\left(-\frac{2+\varepsilon}{2} \cdot \frac{\theta^{n-1}\left(\log(n-1) + \log(\log \theta)\right)}{\theta^n}\right) \\
&\approx \exp\left(-\left(1 + \frac{\varepsilon}{2}\right)\theta^{-1} \log(n-1)\right) \quad \text{for large } n
\end{aligned}$$

So choosing $\theta < 1 + \frac{\varepsilon}{2}$ gives us that $\left(1 + \frac{\varepsilon}{2}\right)\theta^{-1} > 1$, so this is:

$$P(A_n) \leq (n-1)^{-(1+\frac{\varepsilon}{2})\theta^{-1}}$$

from which we see that $P(A_n)$ is summable (it's a p-series!). By the Borel-Cantelli lemma, this means that $A_n$ happens only finitely many times with probability 1, which is the desired result.

2. Second Half of the Law of the Iterated Logarithm

To prove the other half, we need some more estimates.

Lemma 2.1. [Mill's Inequality] This is an estimate concerning the tail of the Gaussian probability density function:

$$\frac{\lambda}{\lambda^2 + 1} e^{-\lambda^2/2} \leq \int_\lambda^\infty e^{-y^2/2} dy \leq \frac{1}{\lambda} e^{-\lambda^2/2}$$

Proof. To prove the lower bound, we find a remarkable anti-derivative:

$$\int_\lambda^\infty e^{-y^2/2} dy \geq \int_\lambda^\infty e^{-y^2/2} \left(\frac{y^4 + 2y^2 - 1}{y^4 + 2y^2 + 1}\right) dy = \left[-\frac{y}{y^2+1} e^{-y^2/2}\right]_\lambda^\infty = \frac{\lambda}{\lambda^2+1} e^{-\lambda^2/2}$$


The upper bound is found by using the estimate $y/\lambda \geq 1$ in the range of integration:

$$\int_\lambda^\infty e^{-y^2/2} dy \leq \int_\lambda^\infty \frac{y}{\lambda} e^{-y^2/2} dy = \frac{1}{\lambda}\left[-e^{-y^2/2}\right]_\lambda^\infty = \frac{1}{\lambda} e^{-\lambda^2/2}$$

Theorem 2.2. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then for any $\varepsilon > 0$:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} \geq \sqrt{2 - 2\varepsilon}\right) = 1$$

Or in other words, since this holds for any value of $\varepsilon > 0$:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} \geq \sqrt{2}\right) = 1$$

Proof. As in the proof of the other half of the law, the idea is to prove that the appropriate events happen infinitely often using the Borel-Cantelli lemmas. Fix $\theta > 1$ (the choice will be made precise later). Let:

$$B_n = \left\{S_{\theta^n} - S_{\theta^{n-1}} \geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}\right\}$$

We will show that these occur infinitely often, and then show why this gives the result. Notice that the $B_n$'s are independent, as each $B_n$ depends only on the values of $X_k$ for $\theta^{n-1} \leq k \leq \theta^n$; so to prove that $B_n$ happens i.o. it suffices to show, via the (second) Borel-Cantelli lemma, that $P(B_n)$ is not summable. Consider:

$$\begin{aligned}
P(B_n) &= P\left(S_{\theta^n} - S_{\theta^{n-1}} \geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}\right) \\
&= P\left(S_{\theta^n - \theta^{n-1}} \geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}\right) \\
&\approx \frac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-y^2/2} dy, \qquad \lambda = \frac{\sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}}{\sqrt{\theta^n - \theta^{n-1}}}
\end{aligned}$$

where the first equality holds using the Markov property of the sums (equivalently, look at the definition as sums of the $X_i$'s and the fact that the $X_i$'s are i.i.d.), and the approximation comes asymptotically as $\theta^n - \theta^{n-1} \to \infty$ from the central limit theorem. Now use Mill's inequality with this $\lambda$, the lower bound of the integral, to get:

$$P(B_n) \geq \frac{1}{\sqrt{2\pi}} \frac{\lambda}{\lambda^2 + 1} e^{-\lambda^2/2} = \frac{1}{\sqrt{2\pi}} \frac{1}{\lambda + \lambda^{-1}} e^{-\lambda^2/2}$$


But now notice that $\lambda = \frac{\sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}}{\sqrt{\theta^n - \theta^{n-1}}} \approx \sqrt{2-\varepsilon}\,\frac{\sqrt{\log n}}{\sqrt{1-\theta^{-1}}}$, so $\lambda^2 \approx (2-\varepsilon)\frac{\log n}{1-\theta^{-1}}$. So our estimate is:

$$P(B_n) \geq C\,\frac{1}{\sqrt{\log n} + \sqrt{\log n}^{-1}} \exp\left(-\frac{2-\varepsilon}{2(1-\theta^{-1})}\log n\right) \geq C\, n^{-\frac{1-\varepsilon/2}{1-\theta^{-1}}} (\log n)^{-1/2}$$

where the $C$'s are some constants. By choosing $\theta$ large enough, $\frac{1-\varepsilon/2}{1-\theta^{-1}} < 1$ and this will not be summable! We have then that $B_n$ occurs infinitely often.

Now, we will show that these events $B_n$ occurring infinitely often is enough to see that $S_{\theta^n} \geq \sqrt{(2-2\varepsilon)\theta^n \log(\log(\theta^n))}$ infinitely often too. To do this we will use the first half of the law of the iterated logarithm we already proved, namely that for any $\eta > 0$, the events $\left\{S_k > \sqrt{(2+\eta)k \log(\log k)}\right\}$ happen only finitely often with probability 1. By symmetry, the events $\left\{S_k < -\sqrt{(2+\eta)k \log(\log k)}\right\}$ happen only finitely often too. Hence, the events $A_n = \left\{S_{\theta^{n-1}} < -\sqrt{(2+\eta)\theta^{n-1} \log(\log \theta^{n-1})}\right\}$ happen only finitely often with probability 1. Now, since the $B_n$'s occur infinitely often with probability 1, and the $A_n$'s occur only finitely often with probability 1, the events $B_n \cap A_n^c$ will occur infinitely often with probability 1 too. This will give us the infinite sequence we need, for on the event $B_n \cap A_n^c$ we have the inequalities:

$$S_{\theta^n} - S_{\theta^{n-1}} \geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))}, \qquad S_{\theta^{n-1}} \geq -\sqrt{(2+\eta)\theta^{n-1} \log(\log \theta^{n-1})}$$

Hence, with probability 1, we have that for infinitely many values of $n$:

$$\begin{aligned}
S_{\theta^n} &\geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))} + S_{\theta^{n-1}} \\
&\geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))} - \sqrt{(2+\eta)\theta^{n-1} \log(\log \theta^{n-1})} \\
&\geq \sqrt{(2-\varepsilon)\theta^n \log(\log(\theta^n))} - \sqrt{\frac{2+\eta}{\theta}\,\theta^n \log(\log \theta^n)} \\
&= \left(\sqrt{2-\varepsilon} - \sqrt{\frac{2+\eta}{\theta}}\right)\sqrt{\theta^n \log(\log(\theta^n))}
\end{aligned}$$

So by fixing $\eta$ (any choice will do) and then choosing $\theta$ large enough, we can make the coefficient $\left(\sqrt{2-\varepsilon} - \sqrt{\frac{2+\eta}{\theta}}\right) \geq \sqrt{2-2\varepsilon}$. (Note that this doesn't disrupt our choice of $\theta$ previously, because that too was a choice to make $\theta$ large, so we can always find $\theta$ so big as to suit both our needs.) We have then that for infinitely many $n$:

$$\frac{S_{\theta^n}}{\sqrt{\theta^n \log(\log(\theta^n))}} \geq \sqrt{2-2\varepsilon}$$

So then:

$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} \geq \sqrt{2-2\varepsilon}\right) = 1$$

The two halves of the law of the iterated logarithm give the full result:


$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n \log(\log n)}} = \sqrt{2}\right) = 1$$


CHAPTER 3

Ergodic Theorem

1. Motivation

The study of Ergodic Theory was first motivated by statistical mechanics. Here, one is interested in the long term average of systems. For example, say we have some particles with position $Q(t)$ at time $t$, and momentum $P(t)$ at time $t$. Let $f$ be a function on this state space; for example $f$ might be the pressure/temperature/some other macroscopic variable. Can we find a distribution $G$ so that:

$$\lim_{T\to\infty} \frac{1}{T} \int_0^T f(Q(s), P(s))\, ds = \int f\, dG$$

Gibbs et al. worked on this problem, and it turns out that $G = \frac{1}{Z} e^{-H/kT}$, with $Z$ the partition function, $H$ the Hamiltonian, $T$ temperature, and $k$ Boltzmann's constant, has this property! These types of long term averages can be useful. We will start with a simple example.

Example 1.1. Let $\Omega = [0, 1) = \{\theta : 0 \leq \theta < 1\}$, where we think of $\Omega$ as a circle with perimeter 1 (and $\theta$ the position on the circle). For some fixed angle $\omega$, let $T : \Omega \to \Omega$ be rotation by $\omega$, that is $T(\theta) = \theta + \omega \mod 1$. This is clearly measure preserving in the sense that for any set $B$ we have $m(B) = m(T^{-1}(B))$, where $m$ is the usual Lebesgue measure. Could it be that:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\, ds$$

If $\omega$ is rational, this doesn't have a chance, because $T^n$ eventually cycles back to the identity, so $T^n x$ will only sample finitely many points. However, if $\omega$ is irrational, this is true! We can prove it in this case using Fourier analysis. When $f(x) = e^{2\pi i m x}$, for $m \in \mathbb{N}$, we have the geometric series:

$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i m(x + n\omega)} = \frac{1}{N} e^{2\pi i m x}\, \frac{e^{2\pi i m N\omega} - 1}{e^{2\pi i m \omega} - 1} \to 0 = \int_0^1 f(s)\, ds$$

where the fact that $\omega$ is irrational ensures that $e^{2\pi i m \omega} - 1 \neq 0$. In the case $m = 0$, $f$ is constant, so of course the result holds. Now for any $f \in C^2(\Omega)$, we can expand $f$ as a Fourier series to see the result holds. This lets us calculate, for example:

$$\lim_{N\to\infty} \frac{\#\{k \leq N : x + k\omega \in (a, b)\}}{N} = b - a$$

For if $f = 1_{(a,b)}$, notice that $\frac{\#\{k \leq N : x + k\omega \in (a,b)\}}{N} = \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x)$. By approximating $f$ by $C^2$ functions (in the $L^1$ sense) from above and below, and applying the limit calculated above, we get the result.
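A quick numerical check of this equidistribution (my own illustration; the golden-ratio angle is just one arbitrary choice of irrational $\omega$):

```python
import numpy as np

omega = (np.sqrt(5) - 1) / 2           # an irrational rotation angle
x, N = 0.1, 100_000
orbit = (x + omega * np.arange(N)) % 1.0

a, b = 0.2, 0.5
print(np.mean((orbit > a) & (orbit < b)))   # ~ b - a = 0.3
```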

Is there a way we can do this kind of thing using probability methods (rather than Fourier)? The next result is a nice theorem in this direction.

2. Birkhoff’s Theorem

Theorem 2.1. [Birkhoff-Khinchin Ergodic Theorem] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Let $\mathcal{F}_0 = \{A \in \mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f : \Omega \to \mathbb{R}$ a random variable with $E(|f|) < \infty$, we have almost surely:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0)$$

Corollary 2.2. In the case that $\mathcal{F}_0$ is the trivial field, $E(f \mid \mathcal{F}_0) = E(f)$ is a constant, so this is exactly the statement we had above. This happens precisely when $T^{-1}A = A \Rightarrow P(A) = 0$ or $1$. In this case we say that the map $T$ is "ergodic".

The proof of this theorem relies on the following lemma.

Lemma 2.3. [Maximal Ergodic Lemma] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Say $f : \Omega \to \mathbb{R}$ is a random variable with $E(|f|) < \infty$. Let $S_n = \sum_{k=0}^{n-1} f(T^k x)$ and let $A = \{\sup_{n \geq 1} S_n > 0\}$ be the event that this is positive at some point. Then:

$$E(f 1_A) = \int_A f\, dP \geq 0$$

Proof. Define $f^+(x) = f(Tx)$, let $m_n = \max\{0, S_1, S_2, \ldots, S_n\}$, and define $m_n^+$ in the same way, replacing $f$ by $f^+$ in the definition of $S_k$. Notice that by this definition the $m_n$'s are non-decreasing, and the event $A = \{\sup_{n\geq 1} S_n > 0\}$ is the same as saying that $m_n > 0$ for $n$ large enough. For this reason, it will be enough to restrict our attention to the events $\{m_n > 0\}$. Notice that if we are in the event $\{m_n > 0\}$, then we have:

$$\begin{aligned}
S_1 + m_n^+ &= S_1 + \max\{0, S_1^+, S_2^+, \ldots, S_n^+\} \\
&= S_1 + \max\{0, S_2 - S_1, S_3 - S_1, \ldots, S_{n+1} - S_1\} \\
&= \max\{S_1, S_2, \ldots, S_{n+1}\} \\
&= m_{n+1}
\end{aligned}$$

where we used that we're on the event $\{m_n > 0\}$ in the last step to see the last equality, and we used $S_k^+ = \sum_{j=0}^{k-1} f(T^j(Tx)) = \sum_{j=1}^{k} f(T^j x) = S_{k+1} - S_1$ in the second equality.


We have then:

$$\begin{aligned}
E(f 1_{\{m_n > 0\}}) &= E(S_1 1_{\{m_n > 0\}}) \\
&= E((m_{n+1} - m_n^+) 1_{\{m_n > 0\}}) \\
&= E(m_{n+1} 1_{\{m_n > 0\}}) - E(m_n^+ 1_{\{m_n > 0\}}) \\
&\geq E(m_{n+1} 1_{\{m_n > 0\}}) - E(m_n^+)
\end{aligned}$$

The last inequality holds since on the event $\{m_n = 0\}$ we have $S_1 \leq 0$, so $m_n^+ = m_{n+1} - S_1 \geq m_{n+1} \geq 0$, so $E(m_n^+ 1_{\{m_n = 0\}}) \geq 0$. Hence $E(m_n^+) = E(m_n^+ 1_{\{m_n > 0\}}) + E(m_n^+ 1_{\{m_n = 0\}}) \geq E(m_n^+ 1_{\{m_n > 0\}})$. From here, we note that $E(m_n^+) = E(m_n)$ since the map $T$ is measure preserving, and the only difference between $m_n^+$ and $m_n$ is the map $x \to Tx$. We have then:

$$\begin{aligned}
E(f 1_{\{m_n > 0\}}) &\geq E(m_{n+1} 1_{\{m_n > 0\}}) - E(m_n) \\
&= E(m_{n+1} 1_{\{m_n > 0\}}) - E(m_n 1_{\{m_n > 0\}}) \\
&= E((m_{n+1} - m_n) 1_{\{m_n > 0\}}) \\
&\geq 0
\end{aligned}$$

The second equality holds since $m_n = 0$ off the event $\{m_n > 0\}$, and the last inequality holds since the $m_n$'s are non-decreasing. Finally, to get the result, notice that $\{m_n > 0\}$ increases to $\{\sup S_n > 0\}$, so by a convergence theorem argument (the indicators increase to $1_A$ and $|f|$ is integrable), we have:

$$E(f 1_{\{\sup S_n > 0\}}) = \lim_{n\to\infty} E(f 1_{\{m_n > 0\}}) \geq 0$$

With this in hand, we can prove Birkhoff’s theorem:

Theorem 2.4. [Birkhoff-Khinchin Ergodic Theorem] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Let $\mathcal{F}_0 = \{A \in \mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f : \Omega \to \mathbb{R}$ a random variable with $E(|f|) < \infty$, we have almost surely:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0)$$

Proof. Firstly, we will argue that $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$ converges a.s. to some random variable, and then we (as usual) check that it has the two defining properties of conditional expectation.

Define $S_N = \sum_{n=0}^{N-1} f(T^n x)$ as before, so that we are interested in the limit of $S_n/n$. Suppose by contradiction that $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$ does not converge a.s. By the usual trick with rational numbers, we can then find $a, b \in \mathbb{R}$ so that the event $A = \left\{\liminf \frac{S_n}{n} \leq a < b \leq \limsup \frac{S_n}{n}\right\}$ has $P(A) > 0$. Notice moreover that $A$ is a $T$-invariant event, i.e. $x \in A \Rightarrow Tx \in A$, since applying $T$ shifts the terms in $S_n$ by one, which does not affect the limsup or liminf of $S_n/n$. (Indeed, these don't depend on any finite number of the terms!) For this reason, we may define a new probability measure on the set $A$: namely, we think of $(A, \tilde{\mathcal{F}}, \tilde{P})$ as a new probability space, with $\tilde{\mathcal{F}} = \{A \cap B : B \in \mathcal{F}\}$ and $\tilde{P}(E) = P(E)/P(A)$. The fact that $A$ is $T$-invariant means that $T^n x \in A$ whenever $x \in A$, so we can still talk about $S_n$ and so on on this space. The fact that $P(A) > 0$ means that there is no problem re-normalizing like this. So we now have $\tilde{P}(A) = 1$: $A$ is the whole space. With this new space as our framework, we let $f'(\omega) = f(\omega) - b$; then we get new sums $S_n'$ with $\frac{S_n'}{n} = \frac{S_n}{n} - b$, and then $A = \left\{\liminf \frac{S_n'}{n} \leq a - b < 0 \leq \limsup \frac{S_n'}{n}\right\}$.

Notice then that $\tilde{P}(\limsup \frac{S_n'}{n} \geq 0) \geq \tilde{P}(A) = 1$, so $\tilde{P}(\sup S_n' > 0) = 1$: this is the whole space $A$. We have then by the maximal ergodic lemma that:

$$0 \leq E(f' 1_{\{\sup S_n' > 0\}}) = E(f') = E(f) - b$$

The same argument on $f''(\omega) = a - f(\omega)$ gives:

$$0 \leq E(f'' 1_{\{\sup S_n'' > 0\}}) = a - E(f)$$

But this is a contradiction now, for we have:

$$a \geq E(f) \geq b$$

which is impossible since $a < b$. This contradiction means that it's impossible to separate the liminf and the limsup like this; in other words, we have almost sure convergence.

Next, it remains only to see that the random variable this converges to is $E(f \mid \mathcal{F}_0)$. Let us denote $\bar{f} = \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$. We must show $\bar{f}$ is $\mathcal{F}_0$-measurable and that $E(\bar{f} 1_A) = E(f 1_A)$ for all $A \in \mathcal{F}_0$. Notice that applying $x \to Tx$ does not change $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$, as it only affects finitely many terms. This shows that $\bar{f}(x) = \bar{f}(Tx)$, which is the reason why $\bar{f}$ is $\mathcal{F}_0$-measurable. More formally, to see that $\bar{f}^{-1}(B)$ is $T$-invariant for any Borel set $B$, just write out the definitions:

$$T(\bar{f}^{-1}(B)) = \{Tx \in \Omega : \bar{f}(x) \in B\} = \{Tx \in \Omega : \bar{f}(Tx) \in B\} = \{y \in \Omega : \bar{f}(y) \in B\} = \bar{f}^{-1}(B)$$

So indeed, $\bar{f}^{-1}(B) \in \mathcal{F}_0$ means $\bar{f}$ is $\mathcal{F}_0$-measurable. To see that $\bar{f}$ has the right expectation values, we first prove the result for indicator functions and then use the "ladder" of integration to get the result we need. Consider that for sets $A \in \mathcal{F}_0$ and $B \in \mathcal{F}$ we have:

$$\int_A 1_B(x)\, dP = \int 1_A(x) 1_B(x)\, dP = \int 1_A(Tx) 1_B(Tx)\, dP = \int 1_A(x) 1_B(Tx)\, dP = \int_A 1_B(Tx)\, dP$$

where the second equality uses the fact that $P$ is $T$-invariant and the third equality uses the fact that $A \in \mathcal{F}_0 \Rightarrow 1_A(x) = 1_A(Tx)$. Since $\int_A 1_B(x)\, dP = \int_A 1_B(Tx)\, dP$,


by following along with the construction of the Lebesgue integral starting from indicator functions, we conclude that $\int_A f(x)\, dP = \int_A f(Tx)\, dP$ for any integrable $f$. Applying this inductively, we see that for any $N \in \mathbb{N}$:

$$\int_A \frac{1}{N} \sum_{k=0}^{N-1} f(T^k x)\, dP = \int_A f(x)\, dP$$

When $f$ is bounded, we can take the limit as $N \to \infty$ and use the bounded convergence theorem to conclude:

$$\int_A \bar{f}\, dP = \lim_{N\to\infty} \int_A \frac{1}{N} \sum_{k=0}^{N-1} f(T^k x)\, dP = \int_A f(x)\, dP$$

For general $f$, we can use a truncation argument and the monotone convergence theorem to finish the result.

Example 2.5. If we look at our first example of rotation by an angle $\omega$, we concluded (using Fourier analysis) that when $\omega$ is irrational and $f$ has a Fourier series:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\, ds$$

By Birkhoff's theorem, we know that:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0)$$

So we conclude that $\int_0^1 f(s)\, ds = E(f \mid \mathcal{F}_0)$. Since this holds for every $f$, it must be that $\mathcal{F}_0$ is the trivial field. Notice that this improves our result a little bit, since we may now apply it to any integrable $f$, not just $f$ which are $C^2$.

Example 2.6. In the first example, we were essentially looking at $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(x+n\omega)}$. Now let's ask about the series $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(2^n x)}$. This is harder to handle with Fourier techniques, but we can still use Birkhoff's theorem. Again take $\Omega = [0, 1)$ to be our space, but instead of thinking of this as a circle, think of it as the space of binary sequences (which are the binary expansions of each number between 0 and 1), $\Omega = \{0.e_1 e_2 \ldots : e_i \in \{0, 1\}\}$. Let $T : \Omega \to \Omega$ by $T(0.e_1 e_2 e_3 \ldots) = 0.e_2 e_3 \ldots$. This translates to $T(x) = 2x \mod 1$ (this is the reason that applying it $n$ times gives $2^n x$). It's not hard to verify that this is measure preserving. By the Kolmogorov Zero-One law, the field $\mathcal{F}_0$ of $T$-invariant events must be the trivial field, for by applying $T$ $N$ times, we see that an event $A \in \mathcal{F}_0$ cannot depend on the first $N$ digits $e_1, e_2, \ldots, e_N$. Since this works for any $N$, $\mathcal{F}_0$ is a subset of the tail field, which by K-0-1 is trivial. Hence, by Birkhoff's Theorem, we have:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0) = E(f) = \int_0^1 f\, dP$$

For the Fourier basis function $f(x) = e^{2\pi i m x}$, this is saying that:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i m(2^n x)} = 0$$
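A numerical check of this (my own sketch): a Lebesgue-random $x$ is simulated by its binary digits, and since $T$ is the left shift on digits, $T^n x$ can be read off a sliding digit window. (Naively iterating `x = 2*x % 1` in float64 loses all its bits after about 53 steps, so we work with digits directly.)

```python
import numpy as np

rng = np.random.default_rng(3)

N, W = 50_000, 53
bits = rng.integers(0, 2, size=N + W)        # binary digits of a "random" x
weights = 0.5 ** np.arange(1, W + 1)

# orbit[n] approximates T^n x = 0.e_{n+1} e_{n+2} ... using a W-digit window
orbit = np.array([bits[n:n + W] @ weights for n in range(N)])

avg = np.mean(np.exp(2j * np.pi * orbit))    # Birkhoff average of e^{2 pi i x}
print(abs(avg))                              # close to 0
```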

Example 2.7. We can use Birkhoff's theorem to give yet another proof of the strong law of large numbers. Let $(X_1, X_2, \ldots)$ be a sequence of i.i.d. random variables with finite mean, and let $\Omega$ be the probability space for these sequences. Define $T : \Omega \to \Omega$ by $T(x_1, x_2, x_3, \ldots) = (x_2, x_3, \ldots)$. Notice that since the $X$'s are i.i.d., this is measure preserving. As in the previous example, the Kolmogorov zero-one law tells us the field $\mathcal{F}_0$ of $T$-invariant events is trivial. Let $f(x_1, x_2, \ldots) = x_1$. By Birkhoff's theorem:

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} x_n = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0) = E(f) = E(X_1)$$

which is exactly the strong law of large numbers.

3. Continued Fractions

One way to specify a number $x \in [0, 1)$ is the binary expansion. Each binary digit tells you "which half" of the number line $x$ is in: e.g. the first digit says if it's in $\left[0, \frac{1}{2}\right)$ or $\left[\frac{1}{2}, 1\right)$, and then we treat that interval like $[0, 1)$ and start over again for the next digit. Another way to play this game would be to draw the harmonic series $\frac{1}{n}$ on the number line, and then specify which interval $\left[\frac{1}{n+1}, \frac{1}{n}\right)$ the number is in. Call this first number $n_1$, and we'll have then that $\frac{1}{n_1+1} \leq x < \frac{1}{n_1}$. From this we may conclude that:

$$x = \frac{1}{n_1 + \varepsilon_1}$$

for some $\varepsilon_1 \in [0, 1)$. Play the same game again for $\varepsilon_1$, and we get:

$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \varepsilon_2}}$$

Continuing this indefinitely gives us the "continued fraction expansion" for $x$. Since this is hard to write, we will adopt the convention that $x = [n_1; n_2; n_3; \ldots]$ means the continued fraction expansion with $n_1$ and then $n_2$ and so on.
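Here is a small Python sketch (my own) that extracts these digits by repeatedly inverting and taking integer parts; exact rational arithmetic via `fractions.Fraction` sidesteps floating-point drift:

```python
from fractions import Fraction

def cf_digits(x, k):
    """First k continued-fraction digits [n1; n2; ...] of x in (0,1)."""
    digits = []
    for _ in range(k):
        if x == 0:
            break
        inv = 1 / x
        n = int(inv)     # the digit n_i: which interval [1/(n+1), 1/n) we are in
        digits.append(n)
        x = inv - n      # the remainder epsilon_i; repeat the game on it
    return digits

print(cf_digits(Fraction(415, 943), 6))
```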

Proposition 3.1. If the sequence $[n_1; n_2; n_3; \ldots]$ is cyclic (that is, it repeats after some finite number of steps), then $x = [n_1; n_2; n_3; \ldots]$ is algebraic.


Proof. The easiest way to see this is an example. Suppose we look at $x = [1; 1; 1; \ldots]$. Then:

$$x = \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}$$

So then:

$$\frac{1}{x} = 1 + x$$

But then $x^2 + x - 1 = 0$, so $x$ is the root of a quadratic equation. In this case $x = \frac{\sqrt{5}-1}{2}$ is the golden section. In general, if the continued fraction expansion is eventually periodic, writing the repeating block as a fractional linear transformation of $x$ in the same way shows that $x$ is the root of a quadratic polynomial; in particular it is algebraic.

Definition 3.2. We write $x = [n_1; n_2; n_3; \ldots]$ to mean:

$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \ddots}}}$$

Problem 3.3. Let $T : (0, 1) \to (0, 1)$ by $T([n_1; n_2; \ldots]) = [n_2; n_3; \ldots]$. This is the map $T(x) = \frac{1}{x} \mod 1$. Is there a probability density $P$ we can put on $(0, 1)$ so that $T$ will be measure preserving?

Proof. [Gauss] The probability density $dP = \frac{1}{\log 2}\frac{1}{1+x}\, dx$ will do the trick! Indeed, just notice that by the definition of $T$:

$$T^{-1}(a, b) = \bigcup_{n=1}^{\infty} \left(\frac{1}{b+n}, \frac{1}{a+n}\right)$$

So the requirement $P(T^{-1}(a, b)) = P(a, b)$ gives (using $\rho$ as a probability density function):

$$\int_a^b \rho(x)\, dx = \sum_{n=1}^{\infty} \int_{\frac{1}{b+n}}^{\frac{1}{a+n}} \rho(x)\, dx$$

Taking the derivative w.r.t. $b$ here gives:

$$\rho(x) = \sum_{n=1}^{\infty} \rho\left(\frac{1}{x+n}\right) \frac{1}{(x+n)^2}$$


This is hard to solve, but it's easy to verify that $\rho(x) = \frac{1}{1+x}$ works, since the LHS is $\frac{1}{1+x}$ while the RHS is:

$$\begin{aligned}
\sum_{n=1}^{\infty} \rho\left(\frac{1}{x+n}\right) \frac{1}{(x+n)^2} &= \sum_{n=1}^{\infty} \frac{1}{1 + \frac{1}{x+n}} \frac{1}{(x+n)^2} \\
&= \sum_{n=1}^{\infty} \frac{x+n}{1 + (x+n)} \frac{1}{(x+n)^2} \\
&= \sum_{n=1}^{\infty} \frac{1}{(x+n+1)(x+n)} \\
&= \sum_{n=1}^{\infty} \frac{1}{x+n} - \frac{1}{x+n+1} \\
&= \frac{1}{x+1}
\end{aligned}$$

which is a telescoping sum, so we can evaluate it exactly. The factor of $\frac{1}{\log 2}$ normalizes $\rho$ so that $\int_0^1 \rho(x)\, dx = 1$. Indeed:

$$\int_0^1 \frac{1}{\log 2} \frac{1}{x+1}\, dx = \frac{1}{\log 2}\left[\log(1+x)\right]_0^1 = \frac{\log 2 - \log 1}{\log 2} = 1$$
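A quick Monte Carlo sanity check of the invariance (my own sketch): sample from the Gauss density by inverse transform, apply $T(x) = 1/x \bmod 1$, and compare the mass of an interval before and after:

```python
import numpy as np

rng = np.random.default_rng(5)

# Inverse-transform sampling from the Gauss density 1/(log 2 * (1+x)):
# its CDF is F(x) = log(1+x)/log 2, so F^{-1}(u) = 2^u - 1.
u = rng.random(1_000_000)
x = 2.0**u - 1.0

Tx = (1.0 / x) % 1.0                 # the continued-fraction shift map

a, b = 0.25, 0.6
print(np.mean((x > a) & (x < b)))    # P(a < X < b)
print(np.mean((Tx > a) & (Tx < b)))  # same, since T preserves the Gauss measure
```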

Theorem 3.4. The shift map $T : [0, 1] \to [0, 1]$ given by $T([n_1; n_2; \ldots]) = [n_2; n_3; \ldots]$ is ergodic.

Proof. Fix $N \in \mathbb{N}$ and a list of integers $n_1, n_2, \ldots, n_N$. Now define:

$$n(x) := \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{\ddots + \cfrac{1}{n_N + x}}}}$$

For each choice of $n_1, n_2, \ldots, n_N$, the image of $[0, 1]$ through $n(x)$ is an interval whose endpoints are $n(0)$ and $n(1)$. As $N$ increases, the interval $[n(0), n(1)]$ gets smaller and smaller. An easy proof by induction shows that $n(x)$ can be written as:

$$n(x) = \frac{Ax + B}{Cx + D}$$

for $A, B, C, D \in \mathbb{R}$ with $0 \leq A \leq B$ and $1 \leq C \leq D$ and with $AD - BC = \pm 1$, where the sign depends on the parity of $N$. Now, let $I = [n(0), n(1)]$ and let $J = (a, b)$ be an arbitrary interval.

Claim. $|I \cap T^{-N}(J)| \geq \frac{1}{2}|I||J|$ holds for all $N \in \mathbb{N}$.

Proof. Take $x \in I \cap T^{-N}(J)$. Notice that $x \in I$ if and only if $x = n(y)$ for some $y \in [0, 1]$, by definition of $I$. So we can write $x$ as a continued fraction $x = [n_1; n_2; \ldots; n_{N-1}; n_N + y]$. On the other hand, $x \in T^{-N}(J)$ if and only if $T^N x \in J$. But $T^N x = T^N([n_1; n_2; \ldots; n_{N-1}; n_N + y]) = y$ by definition of $T$. This shows that such an $x \in T^{-N}(J)$ if and only if $y \in J$.

We have then, using the observation that $n$ is a fractional linear transformation:

$$I \cap T^{-N}(J) = \{n(y) : y \in J\} = [n(a), n(b)]$$


This shows:

$$\begin{aligned}
|I \cap T^{-N}(J)| = |n(b) - n(a)| &= \left|\frac{Ab+B}{Cb+D} - \frac{Aa+B}{Ca+D}\right| \\
&= \left|\frac{b-a}{(Ca+D)(Cb+D)}\right| \\
&\geq \frac{|b-a|}{(C+D)^2} \quad \text{since } a, b < 1 \\
&\geq \frac{1}{2}|b-a||I| = \frac{1}{2}|J||I|
\end{aligned}$$

The last inequality holds by writing out $|I|$ and using $AD - BC = \pm 1$ and the fact that $1 \leq C \leq D$, so that $C + D \leq 2D$:

$$|I| = |n(0) - n(1)| = \left|\frac{A+B}{C+D} - \frac{B}{D}\right| = |AD - BC|\,\frac{1}{D(C+D)} = \frac{1}{D(C+D)} \leq \frac{2}{(C+D)^2}$$

Finally, to see that $T$ is ergodic, take any Borel set $B \in \mathcal{F}$. By approximating $B$ by intervals, the inequality from the claim still holds:

$$\left|I \cap T^{-N}B\right| \geq \frac{1}{2}|I||B|$$

Take any set $A$ now. Again, by approximating $A$ by intervals $I$, we can use the above inequality to get:

$$\left|A \cap T^{-N}B\right| \geq \frac{1}{2}|A||B|$$

This gives what we want, for if $B$ is $T$-invariant, we have $T^{-N}B = B$ for every $N$. The choice $A = B^c$ in the above gives:

$$\frac{1}{2}|B||B^c| \leq |B^c \cap T^{-N}B| = |B^c \cap B| = 0$$

So $|B||B^c| = 0$, which is only possible if $|B| = 1$ or $|B| = 0$. This says all $T$-invariant sets are either measure zero or full measure. In other words, $T$ is ergodic.


CHAPTER 4

Brownian Motion

1. Motivation

Our aim is to discuss a stochastic process on $[0, 1]$ (that is, a probability space $(\Omega, \mathcal{F}, P)$ and a collection of random variables $B_t(\omega)$, for $t \in [0, 1]$) which has the following properties:

• $B_0(\omega) = 0$ for every $\omega \in \Omega$
• Fix a $T \in [0, 1]$, and define for $t > 0$: $B^+_t = B_{T+t} - B_T$. We want $B^+_t$ to look statistically identical to $B_t$. (This says the process has some sort of "time homogeneous" property.)
• We want $B^+_t$ as defined above to be independent of $(B_s)_{s \leq T}$. (This says that the process has some sort of Markov property.)
• $E(B_t^2) < \infty$
• $E(B_t) = 0$
• $B_t(\omega)$ is continuous for every (or almost every) $\omega \in \Omega$.

This process is supposed to describe something like a piece of dust that you can sometimes see wiggling about in a sunbeam. Notice that the time homogeneity and Markov properties together mean we can write:

$$B_T = \sum_{k=1}^{N} B_{\frac{kT}{N}} - B_{\frac{(k-1)T}{N}}$$

which is a sum of many independent increments. By the central limit theorem, this suggests that $B_t \sim N(0, \sigma^2)$ is normally distributed (to get this more rigorously would take a bit more work, since the above set up is not exactly the set up for the central limit theorem). This is often taken as an "axiom":

• $B_t \sim N(0, \sigma^2)$

A quick calculation shows that $\sigma^2 \propto t$. Let $f(t) = \sigma^2$ be the variance of $B_t$. Then:

$$\begin{aligned}
f(t+s) = E\left((B_{t+s})^2\right) &= E\left((B_{t+s} - B_s + B_s)^2\right) \\
&= E\left((B_{t+s} - B_s)^2\right) + E\left(B_s^2\right) + 2E\left((B_{t+s} - B_s)B_s\right) \\
&= f(t) + f(s) + 2 \cdot 0
\end{aligned}$$

where we used the time homogeneity property and the Markov property. This functional relation means that $f(t)$ must be linear! ($f(0) = 0$ holds since $B_0$ is known exactly.) Hence $f(t) = c \cdot t$. It doesn't hurt to take $c = 1$, since anything we get can be rescaled for other values of $c$ if need be. Sometimes this is taken as the "axiom":

• $B_t \sim N(0, t)$


The following property also turns out to be very useful:

Proposition 1.1. $E(B_a B_b) = \min(a, b)$

Proof. Suppose W.L.O.G. $a < b$. Then: $E(B_a B_b) = E(B_a(B_b - B_a + B_a)) = E(B_a(B_b - B_a)) + E(B_a^2) = 0 + a = \min(a, b)$.

It remains to see that such a process really exists. The main difficulty is proving that the process is continuous. There is more than one way to skin the cat for this; each method is useful because it gives a different insight into what is going on.

2. Levy’s Construction

We will construct Brownian motion on $t \in [0, 1]$ as a uniform limit of continuous functions $B^N_t$ as $N \to \infty$. Each $B^N_t$ will be an approximation of the Brownian motion that is piecewise linear between the dyadic rationals of the form $\frac{a}{2^N}$. The real trick in the construction is the remarkable observation that the corrections from $B^N_t$ to $B^{N+1}_t$ are independent of the construction so far up to level $N$, which is the crucial fact that makes the construction so nice and allows it to converge. The property of Brownian motion that makes this possible is captured in the proposition below:

Proposition 2.1. Let $B_t$ be a Brownian path and $0 < a < b < 1$. Consider the line segment joining $B_a$ and $B_b$: $\ell(t) = B_a + (t-a)\frac{B_b - B_a}{b-a}$. Consider the value of the Brownian path at the midpoint time, $B_{\frac{a+b}{2}}$. The difference from this point to the line $\ell(t)$ is independent of $B_a$ and $B_b$. That is to say, $X = B_{\frac{a+b}{2}} - \ell\left(\frac{a+b}{2}\right) = B_{\frac{a+b}{2}} - \frac{1}{2}B_a - \frac{1}{2}B_b$ is independent of $B_a$ and $B_b$. Moreover, $X$ is normally distributed, $X \sim N\left(0, \frac{1}{4}(b-a)\right)$.

Proof. Firstly, we notice that the random variables $X, B_a$, and $B_b$ have a joint normal distribution. This can be seen without much difficulty by expanding the definition of $X$ to write any linear combination of $X, B_a$, and $B_b$ as a linear combination of $B_{\frac{a+b}{2}}, B_a$, and $B_b$. From here, rewrite this as a linear combination of $B_a$, $B_{\frac{a+b}{2}} - B_a$, and $B_b - B_{\frac{a+b}{2}}$. By the hypotheses on our Brownian motion, each of these is an independent Gaussian variable, so any linear combination of them is again Gaussian. Hence any linear combination of $X, B_a$, and $B_b$ is Gaussian. This property is a characterization of the joint Gaussian distribution. The observation that $X, B_a$, and $B_b$ are jointly normal substantially simplifies the verification of their independence, since jointly normal variables are independent if and only if they are uncorrelated. From here we calculate (with the help of the useful covariance relation $E(B_a B_b) = \min(a, b)$):

$$\begin{aligned}
E(B_a X) &= E\left(B_a\left(B_{\frac{a+b}{2}} - \frac{1}{2}B_a - \frac{1}{2}B_b\right)\right) \\
&= E\left(B_a B_{\frac{a+b}{2}}\right) - \frac{1}{2}E\left(B_a^2\right) - \frac{1}{2}E(B_a B_b) \\
&= a - \frac{1}{2}a - \frac{1}{2}a = 0
\end{aligned}$$

A similar calculation holds for $E(B_b X)$. Since these are uncorrelated and jointly normal, they are independent. A quick calculation using the covariance relation again gives $X \sim N\left(0, \frac{1}{4}(b-a)\right)$.


This remarkable fact gives us a nice way to construct Brownian motion starting with an infinite sequence of standard ($E(Z) = 0$, $E(Z^2) = 1$) i.i.d. Gaussian variables $(Z_0, Z_1, Z_2, \ldots)$. The idea is to first construct $B_0 = 0$, $B_1 = Z_0$. Then, once $B_0$ and $B_1$ are constructed, by the above proposition we know that $B_{1/2} - \frac{1}{2}B_0 - \frac{1}{2}B_1$ can be modeled by $\sqrt{\frac{1}{4}}Z_1$, so set $B_{1/2} = \frac{1}{2}B_1 + \sqrt{\frac{1}{4}}Z_1$. Once $B_0, B_{1/2}, B_1$ are constructed, the above proposition gives us a way to get $B_{\frac{1}{4}}$ and $B_{\frac{3}{4}}$ using two more normal variables $\sqrt{\frac{1}{8}}Z_2$ and $\sqrt{\frac{1}{8}}Z_3$, and so on.

The above proposition and paragraph contain the basic idea. It becomes a bit of a mouthful to write it all down. A confused reader should focus on understanding the construction above before digesting the details below.

To formalize the process, we let $B^N_t$ be the construction at the $N$-th level, which will have the correct values at points of the form $\frac{a}{2^N}$, $0 \leq a \leq 2^N$. We fill in between these points with a piecewise linear function. After some bookkeeping, the easiest way to write this down is as follows. First define some "tent" functions which make little peaks of unit height on the interval $\left[\frac{2k}{2^n}, \frac{2(k+1)}{2^n}\right]$:

$$T_{n,k}(t) = \begin{cases} 2^n\left(t - \frac{2k}{2^n}\right) & t \in \left[\frac{2k}{2^n}, \frac{2k+1}{2^n}\right] \\[2pt] 2^n\left(\frac{2k+2}{2^n} - t\right) & t \in \left[\frac{2k+1}{2^n}, \frac{2k+2}{2^n}\right] \\[2pt] 0 & t \notin \left[\frac{2k}{2^n}, \frac{2(k+1)}{2^n}\right] \end{cases}$$

Notice that for every level $n$, $0 \leq k \leq 2^{n-1} - 1$ means there are $2^{n-1}$ tents, and notice that these tents have disjoint supports and unit height.

Now, at every level of the construction we make sure that $B^N_t$ has the right value at points of the form $\frac{a}{2^N}$ by adding in the right tents, with heights distributed by scaled normal variables:

$$B^N_t = Z_0 t + \sum_{n=1}^{N} \sum_{k=0}^{2^{n-1}-1} \sqrt{\frac{1}{2^{n+1}}}\, Z_{n,k}\, T_{n,k}(t)$$

Explanation of this formula: the "$Z_0 t$" is the initial level-0 construction. The sum $1 \leq n \leq N$ sums over the $N$ levels of construction, and the sum $0 \leq k \leq 2^{n-1} - 1$ is over the $2^{n-1}$ tents that get added at the $n$-th level. Each tent has a height distributed like $\sqrt{\frac{1}{2^{n+1}}}Z \sim N(0, \frac{1}{2^{n+1}})$, where $Z \sim N(0, 1)$. (This is the content of the proposition above!) For convenience, we label the infinite sequence of normal variables so that $Z_{n,k}$ controls the height of the $k$-th tent on the $n$-th level.
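Here is a compact Python sketch of this construction (my own implementation of the formula above), building $B^N_t$ on a dyadic grid:

```python
import numpy as np

def levy_brownian(N, rng):
    """Levy's construction B^N_t sampled on the grid t = a/2^N, a = 0..2^N."""
    t = np.linspace(0.0, 1.0, 2**N + 1)
    B = rng.standard_normal() * t              # level 0: Z_0 * t
    for n in range(1, N + 1):
        for k in range(2**(n - 1)):
            # tent of unit height on [2k/2^n, (2k+2)/2^n], peak at (2k+1)/2^n
            mid = (2*k + 1) / 2**n
            tent = np.clip(1 - 2**n * np.abs(t - mid), 0.0, None)
            B += np.sqrt(1 / 2**(n + 1)) * rng.standard_normal() * tent
    return t, B

t, B = levy_brownian(10, np.random.default_rng(0))
print(B[-1])   # B_1 is N(0,1); increments should be independent N(0, dt)
```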

Finally, we get the Brownian motion as $B_t = \lim_{N\to\infty} B^N_t$, which puts the Brownian motion on the same probability space as the infinite sequence of normal variables. To see that this is continuous, we show that the convergence is uniform almost surely. Since each $B^N_t$ is continuous, and a uniform limit of continuous functions is continuous, this gives that $B_t$ is continuous.

Proposition 2.2. The family of functions $B^N_t$ converges uniformly almost surely.

Proof. As you might suspect, the trick is to use the right summable sequence with a clever application of the Borel-Cantelli lemma. Let

$$H_n = \max_{t \in [0,1]} \left|\sum_{k=0}^{2^{n-1}-1} \sqrt{\frac{1}{2^{n+1}}}\, Z_{n,k}\, T_{n,k}(t)\right|$$


be the maximum height contribution to $B_t$ at level $n$. Since the tent functions $T_{n,k}(t)$ have disjoint supports, this is $H_n = \sqrt{\frac{1}{2^{n+1}}} \max_{0 \leq k \leq 2^{n-1}-1} |Z_{n,k}|$. We now make the following estimate:

$$\begin{aligned}
P\left(H_n > 2^{-\frac{n}{2}}\sqrt{2n}\right) &= P\left(\max_{0 \leq k \leq 2^{n-1}-1} |Z_{n,k}| > 2^{-\frac{n}{2}} \cdot 2^{\frac{n+1}{2}} \cdot 2^{\frac{1}{2}}\sqrt{n}\right) \\
&\leq 2^{n-1} P\left(|Z| > 2\sqrt{n}\right) \\
&= 2^n P\left(Z > 2\sqrt{n}\right) \\
&= \frac{2^n}{\sqrt{2\pi}} \int_{2\sqrt{n}}^{\infty} \exp\left(-\frac{x^2}{2}\right) dx \\
&\leq \frac{2^n}{\sqrt{2\pi}} \frac{1}{2\sqrt{n}} \exp\left(-\frac{(2\sqrt{n})^2}{2}\right) \quad \text{(this is Mill's ratio)} \\
&= C \cdot \frac{1}{\sqrt{n}} \cdot \left(\frac{2}{e^2}\right)^n
\end{aligned}$$

which is a summable sequence! Hence, we know by the Borel-Cantelli lemma that this happens only finitely often almost surely. That is to say, for almost every $\omega \in \Omega$, we can find $N \in \mathbb{N}$ so that $H_n(\omega) \leq 2^{-\frac{n}{2}}\sqrt{2n}$ for all $n > N$. But then we have that for all $q > p > N$ and any $t \in [0, 1]$:

$$\begin{aligned}
|B^p_t - B^q_t| &= \left|\sum_{n=p+1}^{q} \sum_{k=0}^{2^{n-1}-1} \sqrt{\frac{1}{2^{n+1}}}\, Z_{n,k}\, T_{n,k}(t)\right| \\
&\leq \sum_{n=p+1}^{q} |H_n| \\
&\leq \sum_{n=p+1}^{q} 2^{-\frac{n}{2}}\sqrt{2n} \\
&\leq \sum_{n=N}^{\infty} 2^{-\frac{n}{2}}\sqrt{2n}
\end{aligned}$$

But since $2^{-\frac{n}{2}}\sqrt{2n}$ is summable, this tail can be made arbitrarily small, and we see then that $B^N_t$ is Cauchy in the uniform norm. Since this holds for almost every $\omega \in \Omega$, we indeed have uniform convergence almost surely.

Finally, to see that the limiting process is really what we want, we just verify that $E\left((B_t - B_s)^2\right) = |t - s|$, from which it's easy to check the properties we want. To see this, we just use the density of the dyadic rationals in $[0, 1]$. The above construction fixes points of the form $\frac{a}{2^n}$ at step $n$, that is to say $B_{\frac{a}{2^n}} = B^n_{\frac{a}{2^n}}$. Hence for $t, s$ dyadic rationals, we have $E\left((B_t - B_s)^2\right) = E\left((B^n_t - B^n_s)^2\right) = |t - s|$, which is easily checked from the construction above/the earlier proposition.


For arbitrary $t$ now, but $s$ still taken to be a dyadic rational, we take a sequence of dyadic rationals $t_n \to t$. We then have, using Fatou's lemma:

$$E\left((B_t - B_s)^2\right) = E\left(\lim_{n\to\infty}(B_{t_n} - B_s)^2\right) \leq \lim_{n\to\infty} E\left((B_{t_n} - B_s)^2\right) = \lim_{n\to\infty}|t_n - s| = |t - s|$$

Now consider, for any $n \in \mathbb{N}$:

$$E\left((B_t - B_s)^2\right) = E\left(\big((B_t - B_{t_n}) - (B_s - B_{t_n})\big)^2\right) = E\left((B_t - B_{t_n})^2\right) + E\left((B_s - B_{t_n})^2\right) - 2E\left((B_t - B_{t_n})(B_s - B_{t_n})\right)$$

Since this holds for any $n \in \mathbb{N}$, we get:

$$E\left((B_t - B_s)^2\right) = \lim_{n\to\infty}\left(E\left((B_t - B_{t_n})^2\right) + E\left((B_s - B_{t_n})^2\right) - 2E\left((B_t - B_{t_n})(B_s - B_{t_n})\right)\right) = 0 + \lim_{n\to\infty}|t_n - s| + 0 = |t - s|$$

where we have observed that the two outside limits are 0 by using $E\left((B_t - B_s)^2\right) \leq |t - s|$ in a clever way. First, $\lim_{n\to\infty} E\left((B_t - B_{t_n})^2\right) \leq \lim_{n\to\infty}|t - t_n| = 0$, and secondly, with the help of Hölder:

$$\lim_{n\to\infty}\left|E\left((B_t - B_{t_n})(B_s - B_{t_n})\right)\right| \leq \lim_{n\to\infty}\sqrt{E((B_t - B_{t_n})^2)}\sqrt{E((B_s - B_{t_n})^2)} \leq \lim_{n\to\infty}\sqrt{|t - t_n|}\sqrt{|s - t_n|} = 0$$

Once we have $E\left((B_t - B_s)^2\right) = |t - s|$ for arbitrary $t$ and dyadic $s$, the same argument repeated again shows that $E\left((B_t - B_s)^2\right) = |t - s|$ when both $t$ and $s$ are arbitrary.

3. Construction from Durrett's Book

(I call this "Durrett's construction" since I read it out of Durrett's book "Brownian Motion and Martingales in Analysis".)

The above construction is pretty elementary and gives all the desired properties. The following construction is a bit more technical; in particular it uses a few extension results like Caratheodory and Kolmogorov. However, it gives immediately that not only is the Brownian motion continuous, but it is Hölder continuous for exponents $\gamma < \frac{1}{2}$. This construction uses a few "extension theorems", which are gone over briefly in the appendix.

Definition 3.1. (Constructing Brownian Motion with the Kolmogorov Extension Theorem)

The Kolmogorov Extension Theorem gives us a quick way to define a measure on a space of functions. However, since the space of functions $f : T \to \mathbb{R}$ is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion below is a great example: constructing with the Kolmogorov theorem is bad, while if we take more care and construct it on only countably many points, we get what we want.

Let
$$P_{t_1,t_2,\ldots,t_n}(A_1 \times A_2 \times \cdots \times A_n) = \int_{A_1} dx_1 \int_{A_2} dx_2 \cdots \int_{A_n} dx_n \prod_{i=1}^{n} p_{t_i - t_{i-1}}(x_{i-1}, x_i),$$
where $p_t(x,y) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{|y-x|^2}{2t}\right)$, with the conventions $t_0 = 0$ and $x_0 = 0$. This is naively what you get as the distribution of $(B_{t_1}, B_{t_2}, \ldots, B_{t_n})$ if you use the Markov property and normal distribution of the Brownian motion. By Kolmogorov, we get a measure $P$ on the entire space of functions $f : [0,1] \to \mathbb{R}$. This defines the Brownian motion!
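Since the density factors over increments, sampling from $P_{t_1,\ldots,t_n}$ is just summing independent Gaussian increments. A minimal sketch (my own illustration; the name `sample_fdd` is hypothetical, and the times are assumed increasing):

```python
import numpy as np

def sample_fdd(times, n_paths, rng=None):
    # Draws (B_{t_1}, ..., B_{t_n}) from P_{t_1,...,t_n}: by the product
    # form of the density, the increments B_{t_i} - B_{t_{i-1}} are
    # independent N(0, t_i - t_{i-1}) (with t_0 = 0, x_0 = 0).
    rng = np.random.default_rng() if rng is None else rng
    dt = np.diff(np.concatenate(([0.0], np.asarray(times, dtype=float))))
    incr = rng.standard_normal((n_paths, dt.size)) * np.sqrt(dt)
    return incr.cumsum(axis=1)

X = sample_fdd([0.25, 0.5, 1.0], n_paths=5)   # five draws of (B_1/4, B_1/2, B_1)
```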

Proposition 3.2. With the above description of $P$, it will be impossible to see that the Brownian motion is almost surely continuous, because the set of continuous functions $C \subset \{f : [0,1] \to \mathbb{R}\}$ is not even measurable.

Proof. Suppose by contradiction that $C$ is measurable. Then we can find a sequence $t_1, t_2, \ldots$ of times and Borel sets $B_1, B_2, \ldots$ so that $C = \{f : f(t_i) \in B_i \text{ for all } i\}$. (The proof of this fact comes by showing that sets of the form $\{f : f(t_i) \in B_i \text{ for all } i\}$ form a sigma-algebra which contains the cylinder sets used to generate $\sigma(\mathcal{A})$.) Take any continuous function $f$ now, and alter its value at a single point $t \notin \{t_1, t_2, \ldots\}$ to get a function $\tilde{f}$ which agrees with $f$ at $t_1, t_2, \ldots$ but is not continuous. But then $\tilde{f} \in C = \{f : f(t_i) \in B_i \text{ for all } i\}$, even though $\tilde{f}$ is not continuous, which is a contradiction.

This result means that our construction is not good. It is better to construct $B_t$ as follows:

Definition 3.3. (Constructing Brownian Motion with Uniform Continuity)
Step 1. (Define on dyadic rationals.) Let $P_{t_1,\ldots,t_n}$ be as above. Use the countable Kolmogorov Extension Theorem to get a measure $P$ on the set of functions $\Omega = \{f : [0,1] \cap D_2 \to \mathbb{R}\}$ from the dyadic rationals to $\mathbb{R}$.
Step 2. Check that functions in $\Omega$ are almost surely Hölder continuous, i.e. for almost all $f \in \Omega$, $|f(t) - f(s)| \le C|t-s|^{\gamma}$.
Step 3. Conclude that for almost every $f \in \Omega$ there is a unique way to extend $f$ to a continuous function $\bar{f} : [0,1] \to \mathbb{R}$, since the dyadic rationals are dense in $[0,1]$ (see the worked statement just below).
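To spell out Step 3 (a step I am filling in, under the assumption that Step 2 holds with exponent $\gamma$ and constant $C$): for any $t \in [0,1]$ and any dyadic sequence $s_k \to t$, the values $f(s_k)$ form a Cauchy sequence since $|f(s_k) - f(s_j)| \le C|s_k - s_j|^{\gamma}$, so
$$\bar{f}(t) := \lim_{k\to\infty} f(s_k)$$
is well defined (independent of the chosen sequence), and $\bar{f}$ satisfies the same bound $|\bar{f}(t) - \bar{f}(s)| \le C|t-s|^{\gamma}$ by passing to the limit.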

Step 1 is pretty simple, but step 2 requires some verification and is the real heart of the problem:

Proposition 3.4. Fix $\gamma < \frac{1}{2}$. For almost every $f \in \Omega$, there is a constant $C$ so that $|f(t) - f(s)| \le C|t-s|^{\gamma}$.

We first prove a lemma.

Lemma 3.5. Fix $\gamma < \frac{1}{2}$. Then there exists $\delta > 0$ so that for almost every $f \in \Omega$, there is an $N \in \mathbb{N}$ (which depends on $f$) so that for $n \ge N$ we have:
$$|f(x) - f(y)| \le |x - y|^{\gamma}$$
whenever $x = i2^{-n}$, $y = j2^{-n}$ and $|x - y| \le \left(\frac{1}{2}\right)^{n(1-\delta)}$.

Proof. Take $m \in \mathbb{N}$ so large that $m > \frac{1}{1-2\gamma}$. We use the inequality $E|f(t) - f(s)|^{2m} \le C_m|t-s|^m$ with $C_m = E|f(1)|^{2m}$. This follows from the property that $f(t) \sim f(s) + N(0, t-s)$, i.e. $f(t) - f(s) \sim \sqrt{t-s}\,Z$ with $Z \sim N(0,1)$, so that
$$E|f(t) - f(s)|^{2m} = |t-s|^{m}\,E|Z|^{2m} = |t-s|^{m}\,E|f(1)|^{2m}.$$
For any $n \in \mathbb{N}$ now, consider the following estimates:

$$P\left(|f(x)-f(y)| > |x-y|^{\gamma} \text{ for some } x = i2^{-n},\ y = j2^{-n} \text{ with } |x-y| \le \left(\tfrac{1}{2}\right)^{n(1-\delta)}\right) \le \sum |x-y|^{-2m\gamma}\,E\left(|f(x)-f(y)|^{2m}\right)$$
where the sum on the right hand side is taken over all the possible $x, y$ that satisfy the inequality $|x-y| \le \left(\tfrac{1}{2}\right)^{n(1-\delta)}$ (there are finitely many, since we are restricting ourselves to dyadic rationals $x = i2^{-n}$, $y = j2^{-n}$). We have used the Chebyshev inequality $P(|X| > a) \le a^{-2m}E(|X|^{2m})$ here. Now, by the above inequality, we have:

$$\begin{aligned}
\text{LHS} &\le C_m \sum |x-y|^{-2m\gamma}|x-y|^{m} \\
&= C_m \sum |x-y|^{-2m\gamma+m} \\
&\le C_m\, 2^{n}\, 2^{n\delta}\left(2^{-n(1-\delta)}\right)^{-2m\gamma+m} \\
&= C_m\, 2^{-n\left(-(1+\delta)+(1-\delta)(-2m\gamma+m)\right)}
\end{aligned}$$
The last bound comes in because $|x-y| \le 2^{-n(1-\delta)}$ for $x, y$ in our sum, and there are at most $2^n$ choices for $x$ and $2^{n\delta}$ choices for $y$ once $x$ has been fixed (remember, they are all $n$-th level dyadic rationals). Now, the term that appears in the exponent is:
$$\varepsilon = -(1+\delta) + (1-\delta)(-2m\gamma + m)$$
Since $m$ is so large that $-2m\gamma + m > 1$, we can choose $\delta$ so small that $\varepsilon > 0$. We will then have
$$\text{LHS} \le 2^{-n\varepsilon},$$
which is a summable sequence! By the Borel Cantelli lemma, it must be the case that for almost every $f \in \Omega$ the event here happens only finitely many times. This is exactly the statement of the lemma which we wanted to prove.

Proposition 3.6. Fix $\gamma < \frac{1}{2}$. For almost every $f \in \Omega$, there is a constant $C$ so that $|f(t) - f(s)| \le C|t-s|^{\gamma}$.

Proof. For almost every $f \in \Omega$, find $\delta > 0$ and $N \in \mathbb{N}$ as in the lemma. Take any $t, s \in D_2 \cap [0,1]$ with $t - s < 2^{-N(1-\delta)}$. Choose $m > N$ now so that $2^{-(m+1)(1-\delta)} \le t - s \le 2^{-m(1-\delta)}$. Write now $t = i2^{-m} - 2^{-q_1} - 2^{-q_2} - \cdots - 2^{-q_k} > (i-1)2^{-m}$ and $s = j2^{-m} + 2^{-r_1} + \cdots + 2^{-r_l} < (j+1)2^{-m}$ for some choice of $q$'s and $r$'s with $m < q_1 < \cdots < q_k$ and $m < r_1 < \cdots < r_l$. Since $t - s < 2^{-m(1-\delta)}$ and $i2^{-m}$, $j2^{-m}$ each differ from $t$, $s$ by less than $2^{-m}$, the points $i2^{-m}$ and $j2^{-m}$ are within order $2^{-m(1-\delta)}$ of each other, so we can apply the result from the lemma to conclude:
$$|f(i2^{-m}) - f(j2^{-m})| \le \left(2^{m\delta}\,2^{-m}\right)^{\gamma} = 2^{-m(1-\delta)\gamma}$$


Now, we use the result of the lemma again many times to see that (using our clever rewriting of $t$):
$$\begin{aligned}
|f(t) - f(i2^{-m})| &\le |f(i2^{-m} - 2^{-q_1}) - f(i2^{-m})| + |f(i2^{-m} - 2^{-q_1} - 2^{-q_2}) - f(i2^{-m} - 2^{-q_1})| \\
&\quad + \cdots + |f(i2^{-m} - 2^{-q_1} - \cdots - 2^{-q_k}) - f(i2^{-m} - 2^{-q_1} - \cdots - 2^{-q_{k-1}})| \\
&\le |2^{-q_1}|^{\gamma} + \cdots + |2^{-q_k}|^{\gamma} \\
&\le \sum_{j=m+1}^{\infty}(2^{-j})^{\gamma} \\
&\le C2^{-\gamma m}
\end{aligned}$$
since $m < q_p$ for each $p$, and where we summed the geometric series to bound the sum. We similarly get a bound on $|f(s) - f(j2^{-m})|$. Finally then:

$$\begin{aligned}
|f(t) - f(s)| &\le C2^{-\gamma m(1-\delta)} + C2^{-\gamma m} + C2^{-\gamma m} \\
&\le C'2^{-\gamma m(1-\delta)} \\
&= C'2^{\gamma(1-\delta)}\left(2^{-(m+1)(1-\delta)}\right)^{\gamma} \\
&\le C'2^{\gamma(1-\delta)}|t-s|^{\gamma}
\end{aligned}$$
by the choice of $m$ so that $2^{-(m+1)(1-\delta)} \le t - s$.

So from here we see that the Brownian motion is almost surely Hölder continuous for exponents $\gamma < \frac{1}{2}$. This result lets us find a unique extension of $f(t)$ from the dyadic rationals to all of $[0,1]$ which is not only continuous but moreover Hölder continuous for exponents $\gamma < \frac{1}{2}$, a stronger result than our first construction. For ease of notation, we now change our notation a little bit: we refer to $\omega \in \Omega$ instead of $f$, and we have a family of random variables $B_t(\omega) = \omega(t)$. What we have just proven is that for fixed $\omega$, the map $t \to B_t(\omega)$ is indeed a Hölder continuous path for exponents $\gamma < \frac{1}{2}$.
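A hedged numeric look at this modulus on a simulated path (again reusing the hypothetical `levy_brownian` helper from earlier; the exponent $0.4$ is just one choice of $\gamma < 1/2$):

```python
import numpy as np

# Empirical look at the Holder modulus: for gamma < 1/2 the ratio
# max|B_{t+h} - B_t| / h**gamma stays bounded as h -> 0 (illustration
# only, not a proof; reuses the hypothetical levy_brownian helper).
t, B = levy_brownian(levels=12, rng=np.random.default_rng(2))
gamma = 0.4
for gap in (1, 4, 16, 64, 256):
    h = gap / (len(B) - 1)
    ratio = np.max(np.abs(B[gap:] - B[:-gap])) / h ** gamma
    print(f"h = {h:.6f}   max-increment / h^0.4 = {ratio:.3f}")
```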

4. Some Properties

The following slick result shows that the Brownian motion is nowhere Hölder continuous for $\gamma > \frac{1}{2}$, which in particular shows that it is nowhere differentiable.

Proposition 4.1. For $\gamma > \frac{1}{2}$, the set of functions which are Hölder continuous with exponent $\gamma$ at some point is a null set. In other words, the Brownian motion is almost surely nowhere Hölder continuous for exponents $\gamma > \frac{1}{2}$.

Proof. Fix a $\gamma > \frac{1}{2}$ and $C \in \mathbb{R}$. Choose $m \in \mathbb{N}$ so large that $\gamma > \frac{m+1}{2m}$. Define the events, starting at $n > m$:
$$A_n = \left\{\omega : \exists s \in [0,1] \text{ such that } |B_t - B_s| \le C|t-s|^{\gamma}\ \forall t \in \left[s - \tfrac{m}{n},\, s + \tfrac{m}{n}\right]\right\}$$

Define the random variables:
$$Y_{n,k}(\omega) = \max_{j=1,\ldots,2m}\left|B\left(\frac{k+j}{n}\right) - B\left(\frac{k+j-1}{n}\right)\right|$$
And finally, the events:
$$B_n = \left\{\omega : \text{at least one of the } Y_{n,k} \le 2C\left(\tfrac{m}{n}\right)^{\gamma}\right\}$$
We now claim that $A_n \subset B_n$: for $\omega \in A_n$, we find an $s$ so that $|B_t - B_s| \le C|t-s|^{\gamma}$ for all $t \in [s - \frac{m}{n}, s + \frac{m}{n}]$; in particular, $|B_t - B_s| \le C\left(\frac{m}{n}\right)^{\gamma}$ for all such $t$. By the pigeonhole principle, inside this interval we can find $k$ so that $\left\{\frac{k}{n}, \frac{k+1}{n}, \frac{k+2}{n}, \ldots, \frac{k+2m}{n}\right\} \subset [s - \frac{m}{n}, s + \frac{m}{n}]$. But then, for this $k$, we have:

$$\begin{aligned}
Y_{n,k}(\omega) &= \max_{j=1,\ldots,2m}\left|B\left(\frac{k+j}{n}\right) - B\left(\frac{k+j-1}{n}\right)\right| \\
&\le \max_{j=1,\ldots,2m}\left(\left|B\left(\frac{k+j}{n}\right) - B(s)\right| + \left|B(s) - B\left(\frac{k+j-1}{n}\right)\right|\right) \\
&\le 2C\left(\frac{m}{n}\right)^{\gamma}
\end{aligned}$$
So $\omega \in B_n$ by definition. Now consider that:

$$\begin{aligned}
P(A_n) &\le P(B_n) \\
&\le \sum_{k=0}^{n-2m} P\left(Y_{n,k} \le 2C\left(\tfrac{m}{n}\right)^{\gamma}\right) \\
&\le \sum_{k=0}^{n-2m} P\left(\left|B_{\frac{k+j}{n}} - B_{\frac{k+j-1}{n}}\right| \le 2C\left(\tfrac{m}{n}\right)^{\gamma} \text{ for each } j = 1, \ldots, 2m\right) \\
&\le n\, P\left(\left|B_{\frac{1}{n}} - B_0\right| < 2C\left(\tfrac{m}{n}\right)^{\gamma}\right)^{2m} \\
&= n\, P\left(|B_1 - B_0| < 2C\left(\tfrac{m}{n}\right)^{\gamma}\sqrt{n}\right)^{2m} \\
&\le n\left(\frac{2}{\sqrt{2\pi}}\, 2C\left(\tfrac{m}{n}\right)^{\gamma}\sqrt{n}\right)^{2m} \\
&= D\, n^{\left(\frac{1}{2}-\gamma\right)2m+1} = D\, n^{m+1-2m\gamma} \to 0
\end{aligned}$$

where we used the independence of the Brownian increments over disjoint intervals, the scaling relation $P(B_t > a) = P(B_{ct} > \sqrt{c}\,a)$, and the easy inequality $P(|N(0,1)| < \lambda) \le \frac{2}{\sqrt{2\pi}}\lambda$, which comes from integrating the p.d.f. Finally, by the choice of $m$ so that $\gamma > \frac{m+1}{2m}$, we know that $m + 1 - 2m\gamma < 0$, so this probability does indeed go to zero. But then, as the events $A_n$ are increasing, this means that the $A_n$ are all zero probability events, which is the result we wanted.
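A hedged Monte Carlo illustration of the decay $P(B_n) \le D\,n^{m+1-2m\gamma}$; all parameter choices below are my own, picked so that the exponent is $-1$ (with $\gamma = 1$ and $m = 2$ we have $\gamma > \frac{m+1}{2m} = \frac{3}{4}$):

```python
import numpy as np

# Estimate P(B_n) = P(some run of 2m consecutive 1/n-increments is small),
# using exact N(0, 1/n) increments. Expect decay ~ n**(m + 1 - 2*m*gamma).
rng = np.random.default_rng(1)
gamma, C, m = 1.0, 1.0, 2              # gamma > (m + 1) / (2m) = 3/4
for n in (250, 1000, 4000):
    thresh = 2 * C * (m / n) ** gamma
    incr = np.abs(rng.standard_normal((1000, n))) / np.sqrt(n)
    small = incr <= thresh             # which increments are "too small"
    runs = np.lib.stride_tricks.sliding_window_view(small, 2 * m, axis=1)
    p_hat = runs.all(axis=2).any(axis=1).mean()
    print(n, p_hat)                    # roughly quarters as n quadruples
```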


CHAPTER 5

Appendix

1. Conditional Random Variables

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X, Y : \Omega \to \mathbb{R}$ random variables. $\mathcal{B}$ is the Borel sigma algebra of $\mathbb{R}$.

Definition 1.1. We define $\sigma(X) \subset \mathcal{F}$ to be the sigma-algebra generated by the preimages of Borel sets through $X$. That is:
$$\sigma(X) = \sigma\left(\{X^{-1}(B) : B \in \mathcal{B}\}\right)$$

Remark. The sub-algebra $\sigma(X)$ is coarser than all of $\mathcal{F}$. Intuitively, the random variable $X$ can only "detect" events up to sets in $\sigma(X)$.

Definition 1.2. Let $\Sigma \subset \mathcal{F}$ be a subalgebra of $\mathcal{F}$. We say a random variable $X : \Omega \to \mathbb{R}$ is $\Sigma$-measurable if $X^{-1}(B) \in \Sigma$ for all $B \in \mathcal{B}$. Equivalently, if $\sigma(X) \subset \Sigma$.

Example 1.3. Every random variable is always $\mathcal{F}$-measurable, since $\sigma(X) \subset \mathcal{F}$.

Definition 1.4. Given $X$ and $Y$, we can define a new random variable $Z = E(Y|X)$ to be the unique random variable with the following two properties:
1. $Z$ is $\sigma(X)$-measurable.
2. For any $B \in \mathcal{B}$ we have $E\left(Z\mathbf{1}_{X \in B}\right) = E\left(Y\mathbf{1}_{X \in B}\right)$.

Remark. The existence of this random variable is proven by taking the Radon-Nikodym derivative of the measure $S \mapsto E(Y\mathbf{1}_S)$ with respect to $P$, with both measures restricted to the sigma-field $\sigma(X)$.

Remark. There is no problem with picking any subalgebra $\Sigma \subset \mathcal{F}$ instead of $\sigma(X)$: the second condition is simply that for any $S \in \Sigma$ we have $E(Z\mathbf{1}_S) = E(Y\mathbf{1}_S)$, and the condition above is the special case $\Sigma = \sigma(X)$.

Remark. $Z = E(Y|X)$ is a random variable $Z : \Omega \to \mathbb{R}$, but it is often thought of as a function $Z : \mathbb{R} \to \mathbb{R}$ whose input is the random variable $X$. This works because $Z$ is $\sigma(X)$-measurable. The following two little results clear this up a bit:

Proposition 1.5. If $f : \mathbb{R} \to \mathbb{R}$ is measurable, and $Z : \Omega \to \mathbb{R}$ is $\Sigma$-measurable, then the random variable $f \circ Z$ is $\Sigma$-measurable too.

Proof. For any $B \in \mathcal{B}$ we have $(f \circ Z)^{-1}(B) = Z^{-1}(f^{-1}(B)) \in \Sigma$, since $f^{-1}(B) \in \mathcal{B}$ as $f$ is measurable and $Z$ is $\Sigma$-measurable.

Proposition 1.6. If $Z$ is a $\sigma(X)$-measurable random variable, then we may think of $Z$ as a function $\bar{Z} : \mathbb{R} \to \mathbb{R}$ whose input is $X$.


Proof. Define $\bar{Z} : \mathbb{R} \to \mathbb{R}$ by $\bar{Z}(x) = Z(\omega)$ for any representative $\omega \in X^{-1}(x)$. We must justify why this value is independent of the choice of $\omega \in X^{-1}(x)$. Indeed, for $\omega_1, \omega_2 \in X^{-1}(x)$, let $z = Z(\omega_1)$. Since $Z$ is $\sigma(X)$-measurable, we have that:
$$Z^{-1}(z) \in \sigma(X) \Rightarrow Z^{-1}(z) = X^{-1}(B) \text{ for some } B \in \mathcal{B}$$
But then $\omega_1 \in Z^{-1}(z) = X^{-1}(B)$, so that $X(\omega_1) \in B$. Since $X(\omega_1) = X(\omega_2) = x$, we then have $\omega_2 \in X^{-1}(B) = Z^{-1}(z)$, which means that $Z(\omega_1) = Z(\omega_2) = z$, as desired. Hence $\bar{Z}$ is well defined! With this definition of $\bar{Z}$, we see that $Z = \bar{Z} \circ X$. We often conflate $\bar{Z}$ with $Z$ in practice.
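On a finite sample space, Definition 1.4 and the factorization $Z = \bar{Z} \circ X$ can be computed directly. A minimal sketch (the probabilities and values below are made up for illustration):

```python
import numpy as np

# E(Y | X) on a finite sample space Omega = {0, 1, 2, 3} (made-up numbers):
# Z is constant on each level set {X = x}, equal to the P-weighted
# average of Y there, which is exactly the factorization Z = Zbar o X.
p = np.array([0.1, 0.2, 0.3, 0.4])    # the measure P on Omega
X = np.array([0, 0, 1, 1])            # X only "detects" the sets {X=0}, {X=1}
Y = np.array([5.0, 8.0, 1.0, 2.0])

Z = np.empty_like(Y)
for x in np.unique(X):
    level_set = (X == x)
    Z[level_set] = np.sum(p[level_set] * Y[level_set]) / np.sum(p[level_set])

# Property 2 of Definition 1.4, checked with B = {0}:
B_mask = (X == 0)
assert np.isclose(np.sum(p * Z * B_mask), np.sum(p * Y * B_mask))
```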

2. Extension Theorems

Theorem 2.1. [Caratheodory Extension Theorem]
Fix some $(\Omega, \mathcal{A}, P_0)$, where $\Omega$ is a set, $\mathcal{A}$ is an algebra of sets (a.k.a. a field of sets), and $P_0$ is a finitely additive probability measure on $\mathcal{A}$. Suppose we have the additional property that:

For sequences of sets $A_1, A_2, \ldots \in \mathcal{A}$ which are pairwise disjoint and have $\cup A_n \in \mathcal{A}$ too, we necessarily have $P_0(\cup A_n) = \sum P_0(A_n)$.

Then there is a unique extension to a probability space $(\Omega, \sigma(\mathcal{A}), P)$ so that $P$ and $P_0$ agree on $\mathcal{A}$.

Proof. [sketch] The idea is exactly the same as the construction of the Lebesgue measure on $[0,1]$ from the premeasure generated by $\mu((a,b)) = b - a$ on the algebra of open sets. Define an outer measure:
$$P^{*}(E) := \inf_{E \subset \cup A_n} \sum P_0(A_n)$$
From here you check that $P^{*}$ is indeed an outer measure: countable subadditivity and monotonicity are easy. To get that $P^{*}(A) = P_0(A)$ for $A \in \mathcal{A}$ requires the special property we are given above. Once this is done, you can define measurable sets a-la Caratheodory: $E$ is measurable iff for all $A \in \mathcal{A}$ we have $P^{*}(A) = P^{*}(A \cap E) + P^{*}(A \cap E^{c})$. Then you verify that $\sigma(\mathcal{A})$ is a subset of these measurable sets, and declare $P = P^{*}$ to be the measure on $\sigma(\mathcal{A})$.

Remark. The above condition needed in the theorem can be replaced with "continuity from above at $\emptyset$":

For $A_1, A_2, \ldots \in \mathcal{A}$ which are decreasing down to $\emptyset$, we necessarily have $P_0(A_n) \to 0$.

The equivalence of these two conditions is not too difficult. The first condition is more intuitive, while this second condition is sometimes easier to verify in practice.

Theorem 2.2. [Countable Kolmogorov Extension Theorem]
Suppose for every $n \ge 1$ we have a probability measure $P_n$ on $\mathbb{R}^n$. Suppose also that these probability measures satisfy the following consistency condition for every Borel set $E \subset \mathbb{R}^n$:
$$P_{n+k}(E \times \mathbb{R}^k) = P_n(E)$$
Then there exists a unique measure $P$ on the infinite product space $\mathbb{R}^{\infty}$ of sequences, so that for every Borel set $E \subset \mathbb{R}^n$: $P(E \times \mathbb{R} \times \mathbb{R} \times \cdots) = P_n(E)$.


Proof. [sketch] Take $\Omega = \mathbb{R}^{\infty}$ to be the set of real-valued sequences. Define the field of cylinder sets to be:
$$\mathcal{A} = \left\{E \times \mathbb{R} \times \mathbb{R} \times \cdots : E \subset \mathbb{R}^n \text{ is Borel},\ n \ge 1\right\}$$
with finitely additive measure $P_0(E \times \mathbb{R} \times \mathbb{R} \times \cdots) := P_n(E)$. The given condition on the $P_n$'s shows this is well defined. To see continuity from above at $\emptyset$, notice that if $A_k \downarrow \emptyset$, then we must have $A_k = E_k \times \mathbb{R} \times \mathbb{R} \times \cdots$ for some Borel sets $E_k$ with $E_k \downarrow \emptyset$. But then of course, since the $P_n$ are probability measures, we have $P_0(A_k) = P_n(E_k) \to 0$. By application of the Caratheodory extension theorem, we get the desired measure!

Theorem 2.3. [Kolmogorov Extension Theorem]
Let $T$ be any interval $T \subset \mathbb{R}$. Suppose we have a family of probability measures $P_{t_1, t_2, \ldots, t_n}$ on $\mathbb{R}^n$ whenever $t_1, t_2, \ldots, t_n$ is a finite collection of points in $T$. Suppose also that these probability measures satisfy the following consistency condition:
$$P_{t_1, \ldots, t_n, s_1, \ldots, s_m}(E \times \mathbb{R}^m) = P_{t_1, \ldots, t_n}(E)$$
Then there exists a unique measure $P$ on the set of functions $\{f : T \to \mathbb{R}\}$ so that:
$$P\left(\{f : (f(t_1), f(t_2), \ldots, f(t_n)) \in E\}\right) = P_{t_1, t_2, \ldots, t_n}(E)$$
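For the Brownian motion measures $P_{t_1,\ldots,t_n}$ from Chapter 4, the consistency condition (when an interior time point is marginalized out) boils down to the Chapman-Kolmogorov identity, a step I am filling in here since it is what makes that definition consistent:
$$\int_{\mathbb{R}} p_t(x,y)\,p_s(y,z)\,dy = p_{t+s}(x,z),$$
i.e. convolving two Gaussian kernels of variances $t$ and $s$ gives the Gaussian kernel of variance $t+s$; marginalizing a trailing coordinate just integrates a density to $1$.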

Remark. This is very similar to the countable version, but requires some more work to make it go through. However, since the space of functions $f : T \to \mathbb{R}$ is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion in Chapter 4 is a great example: constructing with the uncountable Kolmogorov theorem is bad, while constructing with the countable one is good.