Probability With Martingales

Probability with Martingales - David Williams, Statistical Laboratory, DPMMS, Cambridge University - Cambridge University Press


Transcript of Probability With Martingales

  • Probability with Martingales
    David Williams
    Statistical Laboratory, DPMMS
    Cambridge University
    Cambridge University Press

  • Published in the United States of America by Cambridge University Press, New York

    © Cambridge University Press 1991

    This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

    First published 1991. Twelfth printing 2010.

    Printed in the United Kingdom at the University Press, Cambridge

    A catalogue record for this publication is available from the British Library

    ISBN 978-0-521-40605-5 paperback

    Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

  • Contents

    Preface - please read! xi

    A Question of Terminology xiii

    A Guide to Notation xiv

    Chapter 0: A Branching-Process Example 1
    0.0. Introductory remarks. 0.1. Typical number of children, X. 0.2. Size of nth generation, Z_n. 0.3. Use of conditional expectations. 0.4. Extinction probability, π. 0.5. Pause for thought: measure. 0.6. Our first martingale. 0.7. Convergence (or not) of expectations. 0.8. Finding the distribution of M_∞. 0.9. Concrete example.

    PART A: FOUNDATIONS

    Chapter 1: Measure Spaces 14
    1.0. Introductory remarks. 1.1. Definitions of algebra,


    First Borel-Cantelli Lemma (BC1). 2.8. Definitions, lim inf E_n, (E_n, ev). 2.9. Exercise.

    Chapter 3: Random Variables 29
    3.1. Definitions. Σ-measurable function, mΣ, (mΣ)⁺, bΣ. 3.2. Elementary Propositions on measurability. 3.3. Lemma. Sums and products of measurable functions are measurable. 3.4. Composition Lemma. 3.5. Lemma on measurability of infs, lim infs of functions. 3.6. Definition. Random variable. 3.7. Example. Coin tossing. 3.8. Definition.


    Chapter 7: An Easy Strong Law 71
    7.1. 'Independence means multiply' - again! 7.2. Strong Law - first version.

    7.3. Chebyshev's inequality. 7.4. Weierstrass approximation theorem.

    Chapter 8: Product Measure 75

    8.0. Introduction and advice. 8.1. Product measurable structure, Σ1 × Σ2. 8.2. Product measure, Fubini's Theorem. 8.3. Joint laws, joint pdfs. 8.4. Independence and product measure. 8.5. B(R)^n = B(R^n). 8.6. The n-fold extension. 8.7. Infinite products of probability triples. 8.8. Technical note on the existence of joint laws.

    PART B: MARTINGALE THEORY

    Chapter 9: Conditional Expectation 83
    9.1. A motivating example. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933). 9.3. The intuitive meaning. 9.4. Conditional expectation as least-squares-best predictor. 9.5. Proof of Theorem 9.2. 9.6. Agreement with traditional expression. 9.7. Properties of conditional expectation: a list. 9.8. Proofs of the properties in Section 9.7. 9.9. Regular conditional probabilities and pdfs. 9.10. Conditioning under independence assumptions. 9.11. Use of symmetry: an example.

    Chapter 10: Martingales 93
    10.1. Filtered spaces. 10.2. Adapted processes. 10.3. Martingale, supermartingale, submartingale. 10.4. Some examples of martingales. 10.5. Fair and unfair games. 10.6. Previsible process, gambling strategy. 10.7. A fundamental principle: you can't beat the system! 10.8. Stopping time. 10.9. Stopped supermartingales are supermartingales. 10.10. Doob's Optional-Stopping Theorem. 10.11. Awaiting the almost inevitable. 10.12. Hitting times for simple random walk. 10.13. Non-negative superharmonic functions for Markov chains.

    Chapter 11: The Convergence Theorem 106
    11.1. The picture that says it all. 11.2. Upcrossings. 11.3. Doob's Upcrossing Lemma. 11.4. Corollary. 11.5. Doob's 'Forward' Convergence Theorem. 11.6. Warning. 11.7. Corollary.


    Chapter 12: Martingales bounded in ℒ² 110
    12.0. Introduction. 12.1. Martingales in ℒ²: orthogonality of increments. 12.2. Sums of zero-mean independent random variables in ℒ². 12.3. Random signs. 12.4. A symmetrization technique: expanding the sample space. 12.5. Kolmogorov's Three-Series Theorem. 12.6. Cesàro's Lemma. 12.7. Kronecker's Lemma. 12.8. A Strong Law under variance constraints. 12.9. Kolmogorov's Truncation Lemma. 12.10. Kolmogorov's Strong Law of Large Numbers (SLLN). 12.11. Doob decomposition. 12.12. The angle-brackets process ⟨M⟩. 12.13. Relating convergence of M to finiteness of ⟨M⟩_∞. 12.14. A trivial 'Strong Law' for martingales in ℒ². 12.15. Lévy's extension of the Borel-Cantelli Lemmas. 12.16. Comments.

    Chapter 13: Uniform Integrability 126
    13.1. An 'absolute continuity' property. 13.2. Definition. UI family. 13.3. Two simple sufficient conditions for the UI property. 13.4. UI property of conditional expectations. 13.5. Convergence in probability. 13.6. Elementary proof of (BDD). 13.7. A necessary and sufficient condition for ℒ¹ convergence.

    Chapter 14: UI Martingales 133
    14.0. Introduction. 14.1. UI martingales. 14.2. Lévy's 'Upward' Theorem. 14.3. Martingale proof of Kolmogorov's 0-1 law. 14.4. Lévy's 'Downward' Theorem. 14.5. Martingale proof of the Strong Law. 14.6. Doob's Submartingale Inequality. 14.7. Law of the Iterated Logarithm: special case. 14.8. A standard estimate on the normal distribution. 14.9. Remarks on exponential bounds; large deviation theory. 14.10. A consequence of Hölder's inequality. 14.11. Doob's ℒ^p inequality. 14.12. Kakutani's Theorem on 'product' martingales. 14.13. The Radon-Nikodym theorem. 14.14. The Radon-Nikodym theorem and conditional expectation. 14.15. Likelihood ratio; equivalent measures. 14.16. Likelihood ratio and conditional expectation. 14.17. Kakutani's Theorem revisited; consistency of LR test. 14.18. Note on Hardy spaces, etc.

    Chapter 15: Applications 153
    15.0. Introduction - please read! 15.1. A trivial martingale-representation result. 15.2. Option pricing; discrete Black-Scholes formula. 15.3. The Mabinogion sheep problem. 15.4. Proof of Lemma 15.3(c). 15.5. Proof of result 15.3(d). 15.6. Recursive nature of conditional probabilities. 15.7. Bayes' formula for bivariate normal distributions. 15.8. Noisy observation of a single random variable. 15.9. The Kalman-Bucy filter. 15.10. Harnesses entangled. 15.11. Harnesses unravelled, 1. 15.12. Harnesses unravelled, 2.


    PART C: CHARACTERISTIC FUNCTIONS

    Chapter 16: Basic Properties of CFs 172
    16.1. Definition. 16.2. Elementary properties. 16.3. Some uses of characteristic functions. 16.4. Three key results. 16.5. Atoms. 16.6. Lévy's Inversion Formula. 16.7. A table.

    Chapter 17: Weak Convergence 179
    17.1. The 'elegant' definition. 17.2. A 'practical' formulation. 17.3. Skorokhod representation. 17.4. Sequential compactness for Prob(R). 17.5. Tightness.

    Chapter 18: The Central Limit Theorem 185
    18.1. Lévy's Convergence Theorem. 18.2. o and O notation. 18.3. Some important estimates. 18.4. The Central Limit Theorem. 18.5. Example. 18.6. CF proof of Lemma 12.4.

    APPENDICES

    Chapter A1: Appendix to Chapter 1 192
    A1.1. A non-measurable subset A of S¹. A1.2.


    Chapter A9: Appendix to Chapter 9 214
    A9.1. Infinite products: setting things up. A9.2. Proof of A9.1(e).

    Chapter A13: Appendix to Chapter 13 217
    A13.1. Modes of convergence: definitions. A13.2. Modes of convergence: relationships.

    Chapter A14: Appendix to Chapter 14 219
    A14.1. The

  • Preface - please read!

    The most important chapter in this book is Chapter E: Exercises. I have left the interesting things for you to do. You can start now on the 'EG' exercises, but see 'More about exercises' later in this Preface.

    The book, which is essentially the set of lecture notes for a third-year undergraduate course at Cambridge, is as lively an introduction as I can manage to the rigorous theory of probability. Since much of the book is devoted to martingales, it is bound to become very lively: look at those Exercises on Chapter 10! But, of course, there is that initial plod through the measure-theoretic foundations. It must be said however that measure theory, that most arid of subjects when done for its own sake, becomes amazingly more alive when used in probability, not only because it is then applied, but also because it is immensely enriched.

    You cannot avoid measure theory: an event in probability is a measurable set, a random variable is a measurable function on the sample space, the expectation of a random variable is its integral with respect to the probability measure; and so on. To be sure, one can take some central results from measure theory as axiomatic in the main text, giving careful proofs in appendices; and indeed that is exactly what I have done.

    Measure theory for its own sake is based on the fundamental addition rule for measures. Probability theory supplements that with the multiplication rule which describes independence; and things are already looking up. But what really enriches and enlivens things is that we deal with lots of σ-algebras, not just the one


    Taylor (1966), Laha and Rohatgi (1979), and Neveu (1965). As regards measure theory, I learnt it from Dunford and Schwartz (1958) and Halmos (1959). After reading this book, you must read the still-magnificent Breiman (1968), and, for an excellent indication of what can be done with discrete martingales, Hall and Heyde (1980).

    Of course, intuition is much more important than knowledge of measure theory, and you should take every opportunity to sharpen your intuition. There is no better whetstone for this than Aldous (1989), though it is a very demanding book. For appreciating the scope of probability and for learning how to think about it, Karlin and Taylor (1981), Grimmett and Stirzaker (1982), Hall (1988), and Grimmett's recent superb book, Grimmett (1989), on percolation are strongly recommended.

    More about exercises. In compiling Chapter E, which consists exactly of the homework sheet I give to the Cambridge students, I have taken into account the fact that this book, like any other mathematics book, implicitly contains a vast number of other exercises, many of which are easier than those in Chapter E. I refer of course to the exercises you create by reading the statement of a result, and then trying to prove it for yourself, before you read the given proof. One other point about exercises: you will, for example, surely forgive my using expectation E in Exercises on Chapter 4 before E is treated with full rigour in Chapter 6.

    Acknowledgements. My first thanks must go to the students who have endured the course on which the book is based and whose quality has made me try hard to make it worthy of them; and to those, especially David Kendall, who had developed the course before it became my privilege to teach it. My thanks to David Tranah and other staff of CUP for their help in converting the course into this book. Next, I must thank Ben Garling, James Norris and Chris Rogers without whom the book would have contained more errors and obscurities. (The many faults which surely remain in it are my responsibility.) Helen Rutherford and I typed part of the book, but the vast majority of it was typed by Sarah Shea-Simonds in a virtuoso performance worthy of Horowitz. My thanks to Helen and, most especially, to Sarah. Special thanks to my wife, Sheila, too, for all her help.

    But my best thanks - and yours if you derive any benefit from the book - must go to three people whose names appear in capitals in the Index: J.L. Doob, A.N. Kolmogorov and P. Lévy: without them, there wouldn't have been much to write about, as Doob (1953) splendidly confirms.

    Statistical Laboratory, Cambridge
    David Williams
    October 1990

  • A Question of Terminology

    Random variables: functions or equivalence classes?

    At the level of this book, the theory would be more 'elegant' if we regarded a random variable as an equivalence class of measurable functions on the sample space, two functions belonging to the same equivalence class if and only if they are equal almost everywhere. Then the conditional-expectation map

    X ↦ E(X|G)

    would be a truly well-defined contraction map from L^p(Ω, F, P) to L^p(Ω, G, P) for p ≥ 1; and we would not have to keep mentioning versions (representatives of equivalence classes) and would be able to avoid the endless 'almost surely' qualifications.

    I have however chosen the 'inelegant' route: firstly, I prefer to work with functions, and confess to preferring

    4 + 5 = 2 mod 7 to [4]_7 + [5]_7 = [2]_7.

    But there is a substantive reason. I hope that this book will tempt you to progress to the much more interesting, and more important, theory where the parameter set of our process is uncountable (e.g. it may be the time-parameter set [0,∞)). There, the equivalence-class formulation just will not work: the 'cleverness' of introducing quotient spaces loses the subtlety which is essential even for formulating the fundamental results on existence of continuous modifications, etc., unless one performs contortions which are hardly elegant. Even if these contortions allow one to formulate results, one would still have to use genuine functions to prove them; so where does the reality lie?!


  • A Guide to Notation

    ► signifies something important, ►► something very important, and ►►► the Martingale Convergence Theorem.

    I use ':=' to signify 'is defined to equal'. This Pascal notation is particularly convenient because it can also be used in the reversed sense. I use analysts' (as opposed to category theorists') conventions:

    ► N := {1,2,3,...} ⊂ {0,1,2,...} =: Z⁺.

    Everyone is agreed that R⁺ := [0,∞). For a set B contained in some universal set S, I_B denotes the indicator function of B: that is, I_B : S → {0,1} and

    I_B(s) = 1 if s ∈ B; I_B(s) = 0 otherwise.

    For a, b ∈ R, a ∧ b := min(a, b), a ∨ b := max(a, b).

    CF: characteristic function; DF: distribution function; pdf: probability density function.

    σ-algebra,


    B(S): the Borel σ-algebra on S, B := B(R) (1.2)
    C • X: discrete stochastic integral (10.6)
    dλ/dμ: Radon-Nikodym derivative (5.14)
    dQ/dP: Likelihood Ratio (14.13)
    E(X): expectation E(X) := ∫_Ω X(ω) P(dω) of X (6.3)
    E(X; F): ∫_F X dP (6.3)
    E(X|G): conditional expectation (9.3)
    (E_n, ev): lim inf E_n (2.8)

    (E_n, i.o.): lim sup E_n (2.6)
    f_X: probability density function (pdf) of X (6.12)
    f_{X,Y}: joint pdf (8.3)
    f_{X|Y}: conditional pdf (9.6)
    F_X: distribution function of X (3.9)
    lim inf: for sets, (2.8)
    lim sup: for sets, (2.6)
    x = ↑lim x_n: x_n ↑ x in that x_n ≤ x_{n+1} (∀n) and x_n → x.
    log: natural (base e) logarithm
    ℒ_X, Λ_X: law of X (3.9)
    ℒ^p, L^p: Lebesgue spaces (6.7, 6.13)
    Leb: Lebesgue measure (1.8)
    mΣ: space of Σ-measurable functions (3.1)

    M^T: process M stopped at time T (10.9)
    ⟨M⟩: angle-brackets process (12.12)

    μ(f): integral of f with respect to μ (5.0, 5.2)
    μ(f; A): ∫_A f dμ (5.0, 5.2)

  • Chapter 0

    A Branching-Process Example

    (This Chapter is not essential for the remainder of the book. You can start with Chapter 1 if you wish.)

    0.0. Introductory remarks
    The purpose of this chapter is threefold: to take something which is probably well known to you from books such as the immortal Feller (1957) or Ross (1976), so that you start on familiar ground; to make you start to think about some of the problems involved in making the elementary treatment into rigorous mathematics; and to indicate what new results appear if one applies the somewhat more advanced theory developed in this book. We stick to one example: a branching process. This is rich enough to show that the theory has some substance.

    0.1. Typical number of children, X
    In our model, the number of children of a typical animal (see Notes below for some interpretations of 'child' and 'animal') is a random variable X with values in Z⁺. We assume that

    P(X = 0) > 0.

    We define the generating function f of X as the map f : [0,1] → [0,1], where

    f(θ) := E(θ^X) = Σ_{k∈Z⁺} θ^k P(X = k).

    Standard theorems on power series imply that, for θ ∈ [0,1),

    f'(θ) = E(Xθ^{X-1}) = Σ_k k θ^{k-1} P(X = k)

    and

    μ := E(X) = f'(1) = Σ_k k P(X = k).


    Of course, f'(1) is here interpreted as

    lim_{θ↑1} f'(θ) = lim_{θ↑1} (1 - f(θ))/(1 - θ),

    since f(1) = 1. We assume that μ < ∞.

    Notes. The first application of branching-process theory was to the question of survival of family names; and in that context, animal = man, and child = son.

    In another context, 'animal' can be 'neutron', and 'child' of that neutron will signify a neutron released if and when the parent neutron crashes into a nucleus. Whether or not the associated branching process is supercritical can be a matter of real importance.

    We can often find branching processes embedded in richer structures and can then use the results of this chapter to start the study of more interesting things.

    For superb accounts of branching processes, see Athreya and Ney (1972), Harris (1963), Kendall (1966, 1975).

    0.2. Size of nth generation, Z_n
    To be a bit formal: suppose that we are given a doubly infinite sequence

    (a) {X_r^{(m)} : m, r ∈ N}

    of independent identically distributed random variables (IID RVs), each with the same distribution as X:

    P(X_r^{(m)} = k) = P(X = k).

    The idea is that for n ∈ Z⁺ and r ∈ N, the variable X_r^{(n+1)} represents the number of children (who will be in the (n+1)th generation) of the rth animal (if there is one) in the nth generation. The fundamental rule therefore is that if Z_n signifies the size of the nth generation, then

    (b) Z_{n+1} = X_1^{(n+1)} + ... + X_{Z_n}^{(n+1)}.

    We assume that Z_0 = 1, so that (b) gives a full recursive definition of the sequence (Z_m : m ∈ Z⁺) from the sequence (a). Our first task is
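The recursive rule (b) is easy to simulate. Here is a minimal sketch in Python (the helper names are mine, and the geometric offspring law P(X = k) = pq^k used for illustration is the one that reappears in Section 0.9):

```python
import random

def simulate_generations(sample_child_count, n_generations, seed=None):
    # Z_0 = 1; Z_{n+1} = X_1 + ... + X_{Z_n}, with the X's drawn IID.
    rng = random.Random(seed)
    sizes = [1]
    for _ in range(n_generations):
        sizes.append(sum(sample_child_count(rng) for _ in range(sizes[-1])))
    return sizes

def geometric_children(rng, p=0.5):
    # P(X = k) = p * q^k for k = 0, 1, 2, ... (here q = 1 - p, so mu = q/p)
    k = 0
    while rng.random() >= p:
        k += 1
    return k

sizes = simulate_generations(geometric_children, 20, seed=1)
```

Note that once some Z_n = 0 the population stays extinct, since an empty sum is 0.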


    to calculate the distribution function of Z_n, or equivalently to find the generating function

    (c) f_n(θ) := E(θ^{Z_n}) = Σ_k θ^k P(Z_n = k).

    0.3. Use of conditional expectations
    The first main result is that for n ∈ Z⁺ (and θ ∈ [0,1])

    (a) f_{n+1}(θ) = f_n(f(θ)),

    so that for each n ∈ Z⁺, f_n is the n-fold composition

    (b) f_n = f ∘ f ∘ ... ∘ f.

    Note that the 0-fold composition is by convention the identity map f_0(θ) = θ, in agreement with - indeed, forced by - the fact that Z_0 = 1.

    To prove (a), we use - at the moment in intuitive fashion - the following very special case of the very useful Tower Property of Conditional Expectation:

    (c) E(U) = E[E(U|V)];

    to find the expectation of a random variable U, first find the conditional expectation E(U|V) of U given V, and then find the expectation of that. We prove the ultimate form of (c) at a later stage.

    We apply (c) with U = θ^{Z_{n+1}} and V = Z_n:

    E(θ^{Z_{n+1}}) = E[E(θ^{Z_{n+1}} | Z_n)].

    Now, for k ∈ Z⁺, the conditional expectation of θ^{Z_{n+1}} given that Z_n = k satisfies

    (d) E(θ^{Z_{n+1}} | Z_n = k) = E(θ^{X_1^{(n+1)} + ... + X_k^{(n+1)}} | Z_n = k).

    But Z_n is constructed from variables X_r^{(m)} with m ≤ n, and so Z_n is independent of X_1^{(n+1)}, ..., X_k^{(n+1)}. The conditional expectation given Z_n = k in the right-hand term in (d) must therefore agree with the absolute expectation

    (e) E(θ^{X_1^{(n+1)} + ... + X_k^{(n+1)}}).


    But the expression at (e) is an expectation of the product of independent random variables and, as part of the family of 'Independence means multiply' results, we know that this expectation of a product may be rewritten as the product of expectations. Since (for every n and r)

    E(θ^{X_r^{(n+1)}}) = f(θ),

    we have proved that

    E(θ^{Z_{n+1}} | Z_n = k) = f(θ)^k,

    and this is what it means to say that

    E(θ^{Z_{n+1}} | Z_n) = f(θ)^{Z_n}.

    [If V takes only integer values, then when V = k, the conditional expectation E(U|V) of U given V is equal to the conditional expectation E(U|V = k) of U given that V = k. (Sounds reasonable!)] Property (c) now yields

    E(θ^{Z_{n+1}}) = E(f(θ)^{Z_n}),

    and, since

    E(f(θ)^{Z_n}) = f_n(f(θ)),

    result (a) is proved. □

    Independence and conditional expectations are two of the main topics in this course.

    0.4. Extinction probability, π
    Let π_n := P(Z_n = 0). Then π_n = f_n(0), so that, by (0.3,b),

    (a) π_{n+1} = f(π_n).

    Measure theory confirms our intuition about the extinction probability:

    (b) π := P(Z_m = 0 for some m) = ↑lim π_n.

    Because f is continuous, it follows from (a) that

    (c) π = f(π).

    The function f is analytic on (0,1), and is non-decreasing and convex (of non-decreasing slope). Also, f(1) = 1 and f(0) = P(X = 0) > 0. The slope f'(1) of f at 1 is μ = E(X). The celebrated pictures opposite now make the following Theorem obvious.

    THEOREM

    If E(X) > 1, then the extinction probability π is the unique root of the equation π = f(π) which lies strictly between 0 and 1. If E(X) ≤ 1, then π = 1.
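Relation (0.4,a) also gives a practical way to compute π: iterate π_{n+1} = f(π_n) from π_0 = 0, and the sequence climbs to the smallest root of t = f(t) in [0,1]. A sketch, with an assumed two-point offspring law P(X = 0) = 1/4, P(X = 2) = 3/4 (my example, not the book's):

```python
def extinction_probability(f, iterations=200):
    # pi_n = f_n(0) = P(Z_n = 0) increases to pi, the smallest fixed point of f in [0, 1].
    t = 0.0
    for _ in range(iterations):
        t = f(t)
    return t

# Supercritical: E(X) = 3/2 > 1, and t = 1/4 + 3t^2/4 has roots 1/3 and 1.
pi = extinction_probability(lambda t: 0.25 + 0.75 * t * t)   # close to 1/3
```

For a subcritical or critical law the same iteration converges to 1, in line with the Theorem.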


    [Figure: graphs of y = f(x) against y = x.

    Case 1: subcritical, μ = f'(1) < 1. Clearly, π = 1. The critical case μ = 1 has a similar picture.

    Case 2: supercritical, μ = f'(1) > 1. Now, π < 1.]


    0.5. Pause for thought: measure
    Now that we have finished revising what introductory courses on probability theory say about branching-process theory, let us think about why we must find a more precise language. To be sure, the claim at (0.4,b) that

    (a) π = ↑lim π_n

    is intuitively plausible, but how could one prove it? We certainly cannot prove it at present because we have no means of stating with pure-mathematical precision what it is supposed to mean. Let us discuss this further.

    Back in Section 0.2, we said 'Suppose that we are given a doubly infinite sequence {X_r^{(m)} : m, r ∈ N} of independent identically distributed random variables each with the same distribution as X'. What does this mean? A random variable is a (certain kind of) function on a sample space Ω. We could follow elementary theory in taking Ω to be the set of all outcomes, in other words, taking Ω to be the Cartesian product

    Ω = (Z⁺)^{N×N},

    the typical element ω of Ω being ω = (ω_r^{(s)} : r ∈ N, s ∈ N), and then setting X_r^{(s)}(ω) = ω_r^{(s)}. Now Ω is an uncountable set, so that we are outside the 'combinatorial' context which makes sense of π_n in the elementary theory. Moreover, if one assumes the Axiom of Choice, one can prove that it is impossible to assign to all subsets of Ω a probability satisfying the 'intuitively obvious' axioms and making the X's IID RVs with the correct common distribution. So, we have to know that the set of ω corresponding to the event 'extinction occurs' is one to which one can uniquely assign a probability (which will then provide a definition of π). Even then, we have to prove (a).

    Example. Consider for a moment what is in some ways a bad attempt to construct a 'probability theory'. Let C be the class of subsets C of N for which the 'density'

    ρ(C) := lim_{n→∞} n⁻¹ #{k : 1 ≤ k ≤ n, k ∈ C}

    exists.


    Hence the logic which will allow us correctly to deduce (a) from the fact that

    {Z_n = 0} ↑ {extinction occurs}

    fails for the (N, C, ρ) set-up: (N, C, ρ) is not 'a probability triple'. □

    There are problems. Measure theory resolves them, but provides a huge bonus in the form of much deeper results such as the Martingale Convergence Theorem, which we now take a first look at - at an intuitive level, I hasten to add.

    0.6. Our first martingale
    Recall from (0.2,b) that

    Z_{n+1} = X_1^{(n+1)} + ... + X_{Z_n}^{(n+1)},

    where the X^{(n+1)} variables are independent of the values Z_1, Z_2, ..., Z_n. It is clear from this that

    P(Z_{n+1} = j | Z_0 = i_0, Z_1 = i_1, ..., Z_n = i_n) = P(Z_{n+1} = j | Z_n = i_n),

    a result which you will probably recognize as stating that the process Z = (Z_n : n ≥ 0) is a Markov chain. We therefore have

    E(Z_{n+1} | Z_0 = i_0, Z_1 = i_1, ..., Z_n = i_n) = Σ_j j P(Z_{n+1} = j | Z_n = i_n) = E(Z_{n+1} | Z_n = i_n),

    or, in a condensed and better notation,

    (a) E(Z_{n+1} | Z_0, Z_1, ..., Z_n) = E(Z_{n+1} | Z_n).

    Of course, it is intuitively obvious that

    (b) E(Z_{n+1} | Z_n) = μZ_n,

    because each of the Z_n animals in the nth generation has on average μ children. We can confirm result (b) by differentiating the result

    E(θ^{Z_{n+1}} | Z_n) = f(θ)^{Z_n}

    with respect to θ and setting θ = 1.


    Now define

    (c) M_n := Z_n/μ^n, n ≥ 0.

    Then

    E(M_{n+1} | Z_0, Z_1, ..., Z_n) = M_n,

    which exactly says that

    (d) M is a martingale relative to the Z process.

    Given the history of Z up to stage n, the next value M_{n+1} of M is on average what it is now: M is 'constant on average' in this very sophisticated sense of conditional expectation given 'past' and 'present'. The true statement

    (e) E(M_n) = 1, ∀n

    is of course infinitely cruder. A statement S is said to be true almost surely (a.s.) or with probability 1 if (surprise, surprise!)

    P(S is true) = 1.

    Because our martingale M is non-negative (M_n ≥ 0, ∀n), the

    Martingale Convergence Theorem implies that it is almost surely true that

    (f) M_∞ := lim M_n exists.

    Note that if M_∞ > 0 for some outcome (which can happen with positive probability only when μ > 1), then the statement

    Z_n/μ^n → M_∞ (a.s.)

    is a precise formulation of 'exponential growth'. A particularly fascinating question is: suppose that μ > 1; what is the behaviour of Z conditional on the value of M_∞?
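A quick Monte Carlo sketch of this 'constant on average' property (my own illustration, with an assumed offspring law P(X = 0) = 1/4, P(X = 2) = 3/4, so μ = 3/2): the sample mean of M_n stays near 1 even though a sizeable fraction of paths have already died out.

```python
import random

def child_count(rng):
    # assumed offspring law: P(X = 0) = 1/4, P(X = 2) = 3/4, so mu = 3/2
    return 0 if rng.random() < 0.25 else 2

mu, n, trials = 1.5, 8, 20000
rng = random.Random(0)
m_values = []
for _ in range(trials):
    z = 1                                  # Z_0 = 1
    for _ in range(n):
        z = sum(child_count(rng) for _ in range(z))
    m_values.append(z / mu**n)             # M_n = Z_n / mu^n

mean_m = sum(m_values) / trials            # should be close to E(M_n) = 1
frac_extinct = sum(m == 0 for m in m_values) / trials   # close to pi_8, about 1/3
```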

    0.7. Convergence (or not) of expectations
    We know that M_∞ := lim M_n exists with probability 1, and that E(M_n) = 1, ∀n. We might be tempted to believe that E(M_∞) = 1. However, we already know that if μ ≤ 1, then, almost surely, the process dies out and M_n is eventually 0. Hence

    (a) if μ ≤ 1, then M_∞ = 0 (a.s.) and 0 = E(M_∞) ≠ lim E(M_n) = 1.


    This is an excellent example to keep in mind when we come to study Fatou's Lemma, valid for any sequence (Y_n) of non-negative random variables:

    E(lim inf Y_n) ≤ lim inf E(Y_n).

    What is 'going wrong' at (a) is that (when μ ≤ 1) for large n, the chances are that M_n will be large if M_n is not 0 and, very roughly speaking, this large value times its small probability will keep E(M_n) at 1. See the concrete examples in Section 0.9.

    Of course, it is very important to know when

    (b) lim E(·) = E(lim ·),

    and we do spend quite a considerable time studying this. The best general theorems are rarely good enough to get the best results for concrete problems, as is evidenced by the fact that

    (c) E(M_∞) = 1 if and only if both μ > 1 and E(X log X) < ∞,

    where X is the typical number of children. Of course 0 log 0 = 0. If μ > 1 and E(X log X) = ∞, then, even though the process may not die out, M_∞ = 0, a.s.

    0.8. Finding the distribution of M_∞
    Since M_n → M_∞ (a.s.), it is obvious that for λ > 0,

    exp(-λM_n) → exp(-λM_∞) (a.s.)

    Now since each M_n ≥ 0, the whole sequence (exp(-λM_n)) is bounded in absolute value by the constant 1, independently of the outcome of our experiment. The Bounded Convergence Theorem says that we can now assert what we would wish:

    (a) E exp(-λM_∞) = lim E exp(-λM_n).

    Since M_n = Z_n/μ^n and E(θ^{Z_n}) = f_n(θ), we have

    (b) E exp(-λM_n) = f_n(exp(-λ/μ^n)),

    so that, in principle (if very rarely in practice), we can calculate the left-hand side of (a). However, for a non-negative random variable Y, the distribution function y ↦ P(Y ≤ y) is completely determined by the map

    λ ↦ E exp(-λY) on (0,∞).


    Hence, in principle, we can find the distribution of M_∞.

    We have seen that the real problem is to calculate the function

    L(λ) := E exp(-λM_∞).

    Using (b), the fact that f_{n+1} = f ∘ f_n, and the continuity of L (another consequence of the Bounded Convergence Theorem), you can immediately establish the functional equation:

    (c) L(λμ) = f(L(λ)).
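As a numerical sanity check of the functional equation (c), one can use the closed forms derived for the geometric case in Section 0.9 (the parameter values below are assumed for illustration):

```python
p = 0.3
q = 1 - p                    # supercritical geometric case: mu = q/p > 1
mu = q / p

def f(theta):                # generating function of P(X = k) = p * q^k
    return p / (1 - q * theta)

def L(lam):                  # E exp(-lam * M_inf), closed form from Section 0.9
    return (p * lam + q - p) / (q * lam + q - p)

for lam in (0.1, 1.0, 5.0):
    assert abs(L(lam * mu) - f(L(lam))) < 1e-12
```

Note also L(0) = 1 and L(λ) → p/q = π as λ → ∞, matching P(M_∞ = 0) = π.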

    0.9. Concrete example
    This concrete example is just about the only one in which one can calculate everything explicitly, but, in the way of mathematics, it is useful in many contexts.

    We take the 'typical number of children' X to have a geometric distribution:

    (a) P(X = k) = pq^k (k ∈ Z⁺),

    where 0 < p = 1 - q < 1.


    Then you can check that if H is another such matrix, then

    G(H(θ)) = (GH)(θ),

    so that composition of fractional linear transformations corresponds to matrix multiplication.

    Suppose that p ≠ q. Then, by the S⁻¹AS = Λ method, for example, we find that the nth power of the matrix corresponding to f is

    A^n = (q - p)⁻¹ (1 p; 1 q)(p^n 0; 0 q^n)(q -p; -1 1),

    so that

    f_n(θ) = (pμ^n(1 - θ) + qθ - p)/(qμ^n(1 - θ) + qθ - p).
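This closed form for f_n is easy to check against the direct n-fold composition of f; a sketch with assumed values p = 0.3, q = 0.7:

```python
p, q = 0.3, 0.7              # any p != q with p + q = 1 would do
mu = q / p

def f(theta):                # f(theta) = p / (1 - q*theta) for the geometric law
    return p / (1 - q * theta)

def f_n_closed(n, theta):    # the displayed closed form for f_n
    return (p * mu**n * (1 - theta) + q * theta - p) / \
           (q * mu**n * (1 - theta) + q * theta - p)

theta = 0.4
iterated = theta             # f_0 = identity
for n in range(1, 8):
    iterated = f(iterated)
    assert abs(iterated - f_n_closed(n, theta)) < 1e-12
```

As n → ∞ with μ > 1, f_n(0) → p/q, recovering the extinction probability π of Section 0.4.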

    If μ = q/p < 1, then lim_n f_n(θ) = 1, corresponding to the fact that the process dies out.

    Suppose now that μ > 1. Then you can easily check that, for λ ≥ 0,

    L(λ) := E exp(-λM_∞) = lim_n f_n(exp(-λ/μ^n)) = (pλ + q - p)/(qλ + q - p),

    from which we deduce that

    P(M_∞ = 0) = π,

    and

    P(x < M_∞ ≤ x + dx) = (1 - π)² e^{-(1-π)x} dx (x > 0),

    or, better,

    P(M_∞ > x) = (1 - π) e^{-(1-π)x} (x > 0).

    Suppose that μ < 1. In this case, it is interesting to ask: what is the

    distribution of Z_n conditioned by Z_n ≠ 0? We find that

    (d) P(Z_n = k | Z_n ≠ 0) = P(Z_n = k)/(1 - f_n(0)) = α_n β_n^{k-1} (k ∈ N),

    where

    α_n = (p - q)/(p - qμ^n),  β_n = q(1 - μ^n)/(p - qμ^n),


    so 0 < α_n < 1 and α_n + β_n = 1. As n → ∞, we see that

    α_n → 1 - μ,  β_n → μ,

    so (this is justified)

    (e) lim_{n→∞} P(Z_n = k | Z_n ≠ 0) = (1 - μ) μ^{k-1} (k ∈ N).

    Suppose that μ = 1. You can show by induction that

    f_n(θ) = (n - (n - 1)θ)/((n + 1) - nθ),

    and that

    E(e^{-λZ_n/n} | Z_n ≠ 0) → 1/(1 + λ),

    corresponding to

    (f) P(Z_n/n > x | Z_n ≠ 0) → e^{-x} (x > 0).
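The induction formula for the critical case can likewise be checked numerically (my sketch; p = q = 1/2 gives f(θ) = 1/(2 - θ)):

```python
def f(theta):                 # p = q = 1/2: f(theta) = (1/2)/(1 - theta/2)
    return 1.0 / (2.0 - theta)

def f_n_closed(n, theta):     # claimed closed form when mu = 1
    return (n - (n - 1) * theta) / ((n + 1) - n * theta)

theta = 0.3
iterated = theta
for n in range(1, 30):
    iterated = f(iterated)
    assert abs(iterated - f_n_closed(n, theta)) < 1e-12

# In particular P(Z_n != 0) = 1 - f_n(0) = 1/(n + 1), as used in 'The Fatou factor'.
```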

    'The Fatou factor'
    We know that when μ ≤ 1, we have E(M_n) = 1, ∀n, but E(M_∞) = 0. Can we get some insight into this?

    First consider the case when μ < 1. Result (e) makes it plausible that for large n,

    E(Z_n | Z_n ≠ 0) is roughly (1 - μ) Σ_k k μ^{k-1} = 1/(1 - μ).

    We know that

    P(Z_n ≠ 0) = 1 - f_n(0) is roughly (1 - μ)μ^n,

    so we should have (roughly)

    E(M_n) = E(M_n | Z_n ≠ 0) P(Z_n ≠ 0) ≈ (1/μ^n) · 1/(1 - μ) · (1 - μ)μ^n = 1,

    which might help explain how the 'balance' E(M_n) = 1 is achieved by big values times small probabilities.


    Now consider the case when μ = 1. Then

    P(Z_n ≠ 0) = 1/(n + 1),

    and, from (f), Z_n/n conditioned by Z_n ≠ 0 is roughly exponential with mean 1, so that M_n = Z_n conditioned by Z_n ≠ 0 is on average of size about n, the correct order of magnitude for balance.

    Warning. We have just been using for 'correct intuitive explanations' exactly the type of argument which might have misled us into thinking that E(M_∞) = 1 in the first place. But, of course, the result

    E(M_n) = E(M_n | Z_n ≠ 0) P(Z_n ≠ 0) = 1

    is a matter of obvious fact.

  • PART A: FOUNDATIONS

    Chapter 1

    Measure Spaces

    1.0. Introductory remarks
    Topology is about open sets. The characterizing property of a continuous function f is that the inverse image f⁻¹(G) of an open set G is open.

    Measure theory is about measurable sets. The characterizing property of a measurable function f is that the inverse image f⁻¹(A) of any measurable set A is measurable.

    In topology, one axiomatizes the notion of 'open set', insisting in particular that the union of any collection of open sets is open, and that the intersection of a finite collection of open sets is open.

    In measure theory, one axiomatizes the notion of 'measurable set', insisting that the union of a countable collection of measurable sets is measurable, and that the intersection of a countable collection of measurable sets is also measurable. Also, the complement of a measurable set must be measurable, and the whole space must be measurable. Thus the measurable sets form a σ-algebra, a structure stable (or 'closed') under countably many set operations. Without the insistence that 'only countably many operations are allowed', measure theory would be self-contradictory - a point lost on certain philosophers of probability.

The probability that a point chosen at random on the surface of the unit sphere S² in R³ falls into the subset F of S² is just the area of F divided by the total area 4π. What could be easier?

However, Banach and Tarski showed (see Wagon (1985)) that if the Axiom of Choice is assumed, as it is throughout conventional mathematics, then there exists a subset F of the unit sphere S² in R³ such that for



3 ≤ k < ∞ (and even for k = ∞), S² is the disjoint union of k exact copies of F:

S² = ⋃_{i=1}^{k} r_i^{(k)} F,

where each r_i^{(k)} is a rotation. If F has an 'area', then that area must simultaneously be 4π/3, 4π/4, ..., 0. The only conclusion is that the set F is non-measurable (not Lebesgue measurable): it is so complicated that one cannot assign an area to it. Banach and Tarski have not broken the Law of Conservation of Area: they have simply operated outside its jurisdiction.

Remarks. (i) Because every rotation r has a fixed point x on S² such that r(x) = x, it is not possible to find a subset A of S² and a rotation r such that A ∪ r(A) = S² and A ∩ r(A) = ∅. So, we could not have taken k = 2.

(ii) Banach and Tarski even proved that given any two bounded subsets A and B of R³ each with non-empty interior, it is possible to decompose A into a certain finite number n of disjoint pieces A = ⋃_{i=1}^{n} A_i and B into the same number n of disjoint pieces B = ⋃_{i=1}^{n} B_i in such a way that, for each i, A_i is Euclid-congruent to B_i!!! So, we can disassemble A and rebuild it as B.

(iii) Section A1.1 (optional!) in the appendix to this chapter gives an Axiom-of-Choice construction of a non-measurable subset of S².

This chapter introduces σ-algebras, π-systems, and measures, and emphasizes monotone-convergence properties of measures. We shall see in later chapters that, although not all sets are measurable, it is always the case for probability theory that enough sets are measurable.

1.1. Definitions of algebra, σ-algebra
Let S be a set.

Algebra on S
A collection Σ₀ of subsets of S is called an algebra on S (or algebra of subsets of S) if

(i) S ∈ Σ₀,
(ii) F ∈ Σ₀ ⇒ F^c := S \ F ∈ Σ₀,
(iii) F, G ∈ Σ₀ ⇒ F ∪ G ∈ Σ₀.

[Note that ∅ = S^c ∈ Σ₀ and

F, G ∈ Σ₀ ⇒ F ∩ G = (F^c ∪ G^c)^c ∈ Σ₀.]
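For a finite set S these closure axioms are easy to check mechanically. The following sketch (illustrative code, with a hypothetical helper name `is_algebra`) verifies conditions (i)-(iii) and then confirms that closure under intersection comes for free via De Morgan:

```python
from itertools import chain, combinations

def is_algebra(S, F):
    """Check that the collection F of frozensets is an algebra on S."""
    S = frozenset(S)
    if S not in F:                                            # (i) whole space
        return False
    if any(S - A not in F for A in F):                        # (ii) complements
        return False
    if any(A | B not in F for A in F for B in F):             # (iii) finite unions
        return False
    # De Morgan: A ∩ B = (A^c ∪ B^c)^c, so intersections are automatic
    assert all(A & B in F for A in F for B in F)
    return True

S = {1, 2, 3}
power_set = {frozenset(x) for x in chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))}
trivial = {frozenset(), frozenset(S)}
not_closed = {frozenset(), frozenset(S), frozenset({1})}      # missing {2, 3}

print(is_algebra(S, power_set), is_algebra(S, trivial), is_algebra(S, not_closed))
# → True True False
```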


Thus, an algebra on S is a family of subsets of S stable under finitely many set operations.

Exercise (optional). Let C be the class of subsets C of N for which the 'density'

lim_{m→∞} m⁻¹ #{k : 1 ≤ k ≤ m, k ∈ C}

exists. Show that C is not an algebra on N.


σ(C), σ-algebra generated by a class C of subsets
Let C be a class of subsets of S. Then σ(C), the σ-algebra generated by C, is the smallest σ-algebra Σ on S such that C ⊆ Σ. It is the intersection of all σ-algebras on S which have C as a subclass. (Obviously, the class of all subsets of S is a σ-algebra which extends C.)

1.2. Examples. Borel σ-algebras, B(S), B = B(R)
Let S be a topological space.

B(S)
B(S), the Borel σ-algebra on S, is the σ-algebra generated by the family of open subsets of S. With slight abuse of notation,

B(S) := σ(open sets).

B := B(R)
It is standard shorthand that B := B(R).

The σ-algebra B is the most important of all σ-algebras. Every subset of R which you meet in everyday use is an element of B; and indeed it is difficult (but possible!) to find a subset of R constructed explicitly (without the Axiom of Choice) which is not in B.

Elements of B can be quite complicated. However, the collection

π(R) := {(−∞, x] : x ∈ R}

(not a standard notation) is very easy to understand, and it is often the case that all we need to know about B is that

(a) B = σ(π(R)).


1.3. Definitions concerning set functions
Let S be a set, let Σ₀ be an algebra on S, and let μ₀ be a non-negative set function

μ₀ : Σ₀ → [0, ∞].

Additive
Then μ₀ is called additive if μ₀(∅) = 0 and, for F, G ∈ Σ₀,

F ∩ G = ∅ ⇒ μ₀(F ∪ G) = μ₀(F) + μ₀(G).

Countably additive
The map μ₀ is called countably additive (or σ-additive) if μ₀(∅) = 0 and whenever (F_n : n ∈ N) is a sequence of disjoint sets in Σ₀ with union F = ⋃ F_n in Σ₀ (note that this is an assumption since Σ₀ need not be a σ-algebra), then

μ₀(F) = Σ_n μ₀(F_n).

Of course (why?), a countably additive set function is additive.

1.4. Definition of measure space
Let (S, Σ) be a measurable space, so that Σ is a σ-algebra on S. A map

μ : Σ → [0, ∞]

is called a measure on (S, Σ) if μ is countably additive. The triple (S, Σ, μ) is then called a measure space.

1.5. Definitions concerning measures
Let (S, Σ, μ) be a measure space. Then μ (or indeed the measure space (S, Σ, μ)) is called

finite if μ(S) < ∞,

σ-finite if there is a sequence (S_n : n ∈ N) of elements of Σ such that

μ(S_n) < ∞ (∀n ∈ N) and ⋃ S_n = S.

Warning. Intuition is usually OK for finite measures, and adapts well for σ-finite measures. However, measures which are not σ-finite can be crazy; fortunately, there are no such measures in this book.


Probability measure, probability triple
Our measure μ is called a probability measure if

μ(S) = 1,

and (S, Σ, μ) is then called a probability triple.

μ-null element of Σ, almost everywhere (a.e.)
An element F of Σ is called μ-null if μ(F) = 0. A statement S about points s of S is said to hold almost everywhere (a.e.) if

F := {s : S(s) is false} ∈ Σ and μ(F) = 0.

1.6. LEMMA. Uniqueness of extension, π-systems
Moral: σ-algebras are 'difficult', but π-systems are 'easy'; so we aim to work with the latter.

►(a) Let S be a set. Let I be a π-system on S, that is, a family of subsets of S stable under finite intersection:

I₁, I₂ ∈ I ⇒ I₁ ∩ I₂ ∈ I.

Let Σ := σ(I). Suppose that μ₁ and μ₂ are measures on (S, Σ) such that μ₁(S) = μ₂(S) < ∞ and μ₁ = μ₂ on I. Then

μ₁ = μ₂ on Σ.

►►(b) Corollary. If two probability measures agree on a π-system, then they agree on the σ-algebra generated by that π-system.

The example B = σ(π(R)) is of course the most important example of the Σ = σ(I) in the theorem.

This result will play an important role. Indeed, it will be applied more frequently than will the celebrated existence result in Section 1.7. Because of this, the proof of Lemma 1.6 given in Sections A1.2-A1.4 of the appendix to this chapter should perhaps be consulted - but read the remainder of this chapter first.

1.7. THEOREM. Carathéodory's Extension Theorem

►► Let S be a set, let Σ₀ be an algebra on S, and let

Σ := σ(Σ₀).

If μ₀ is a countably additive map μ₀ : Σ₀ → [0, ∞], then there exists a measure μ on (S, Σ) such that

μ = μ₀ on Σ₀.

If μ₀(S) < ∞, then, by Lemma 1.6, this extension is unique - an algebra is a π-system!

In a sense, this result should have more ► signs than any other, for without it we could not construct any interesting models. However, once we have our model, we make no further use of the theorem.

The proof of this result given in Sections A1.5-A1.8 of the appendix is there for completeness. It will do no harm to assume the result for this course. Let us now see how the theorem is used.

1.8. Lebesgue measure Leb on ((0,1], B(0,1])
Let S = (0,1]. For F ⊆ S, say that F ∈ Σ₀ if F may be written as a finite union

(*) F = (a₁, b₁] ∪ ... ∪ (a_r, b_r]

where r ∈ N, 0 ≤ a₁ ≤ b₁ ≤ ... ≤ a_r ≤ b_r ≤ 1. Then Σ₀ is an algebra on (0,1] and

Σ := σ(Σ₀) = B(0,1].

(We write B(0,1] instead of B((0,1]).) For F as at (*), let

μ₀(F) := Σ_k (b_k − a_k).

1.9. LEMMA. Elementary inequalities
Let (S, Σ, μ) be a measure space. Then, for A, B, F_n ∈ Σ:

(a) μ(A ∪ B) ≤ μ(A) + μ(B);
(b) μ(⋃_n F_n) ≤ Σ_n μ(F_n).

1.10. LEMMA. Monotone-convergence properties of measures
►(a) If F_n ∈ Σ (n ∈ N) and F_n ↑ F, then μ(F_n) ↑ μ(F).

►(b) If G_n ∈ Σ, G_n ↓ G and μ(G_k) < ∞ for some k, then μ(G_n) ↓ μ(G).

Proof of (b). For n ∈ N, let F_n := G_k \ G_{k+n}, and now apply part (a). □

Example - to indicate what can 'go wrong'. For n ∈ N, let

H_n := (n, ∞).

Then Leb(H_n) = ∞, ∀n, but H_n ↓ ∅.

►(c) The union of a countable number of μ-null sets is μ-null.

This is a trivial corollary of results (1.9,b) and (1.10,a).

1.11. Example/Warning
Let (S, Σ, μ) be ([0,1], B[0,1], Leb). Let ε(k) be a sequence of strictly positive numbers with ε(k) ↓ 0. For a single point x of S, we have

(a) {x} ⊆ (x − ε(k), x + ε(k)) ∩ S,

so that for every k, μ({x}) ≤ 2ε(k), and so μ({x}) = 0. That {x} is B(S)-measurable follows because {x} is the intersection of the countable number of open subsets of S on the right-hand side of (a).

Let V = Q ∩ [0,1], the set of rationals in [0,1]. Since V is a countable union of singletons, V = {v_n : n ∈ N}, it is clear that V is B[0,1]-measurable and that Leb(V) = 0. We can include V in an open subset of S of measure at most 4ε(k) as follows:

V ⊆ G_k = ⋃_{n∈N} [(v_n − ε(k)2⁻ⁿ, v_n + ε(k)2⁻ⁿ) ∩ S] =: ⋃_n I_{n,k}.

Clearly, H := ⋂_k G_k satisfies Leb(H) = 0 and V ⊆ H. Now, it is a consequence of the Baire category theorem (see the appendix to this chapter) that H is uncountable, so

(b) the set H is an uncountable set of measure 0; moreover,

⋂_k ⋃_n I_{n,k} ≠ ⋃_n ⋂_k I_{n,k}.

Throughout the subject, we have to be careful about interchanging orders of operations.

  • Chapter 2

    Events

2.1. Model for experiment: (Ω, F, P)
A model for an experiment involving randomness takes the form of a probability triple (Ω, F, P) in the sense of Section 1.5.

Sample space
Ω is a set called the sample space.

Sample point
A point ω of Ω is called a sample point.

Event
The σ-algebra F on Ω is called the family of events, so that an event is an element of F, that is, an F-measurable subset of Ω. By definition of probability triple, P is a probability measure on (Ω, F).

2.2. The intuitive meaning
Tyche, Goddess of Chance, chooses a point ω of Ω 'at random' according to the law P in that, for F in F, P(F) represents the 'probability' (in the sense understood by our intuition) that the point ω chosen by Tyche belongs to F.

The chosen point ω determines the outcome of the experiment. Thus there is a map

Ω → set of outcomes,
ω ↦ outcome.

There is no reason why this 'map' (the co-domain lies in our intuition!) should be one-one. Often it is the case that although there is some obvious 'minimal' or 'canonical' model for an experiment, it is better to use some richer model. (For example, we can read off many properties of coin tossing by imbedding the associated random walk in a Brownian motion.)


2.3. Examples of (Ω, F) pairs
We leave the question of assigning probabilities until later.

(a) Experiment: Toss coin twice. We can take

Ω = {HH, HT, TH, TT},  F = P(Ω) := set of all subsets of Ω.

In this model, the intuitive event 'At least one head is obtained' is described by the mathematical event (element of F) {HH, HT, TH}.

(b) Experiment: Toss coin infinitely often. We can take

Ω = {H, T}^N,

so that a typical point ω of Ω is a sequence

ω = (ω₁, ω₂, ...),  ω_n ∈ {H, T}.

We certainly wish to speak of the intuitive event 'ω_n = W', where W ∈ {H, T}, and it is natural to choose

F = σ({ω ∈ Ω : ω_n = W} : n ∈ N, W ∈ {H, T}).

Although F ≠ P(Ω) (accept this!), it turns out that F is big enough; for example, we shall see in Section 3.7 that the truth set of the statement 'the proportion of heads in the first n tosses converges to 1/2' is an element of F.


2.4. Almost surely (a.s.)
►A statement S about outcomes is said to be true almost surely (a.s.), or with probability 1 (w.p.1), if

F := {ω : S(ω) is true} ∈ F and P(F) = 1.

(a) Proposition. If F_n ∈ F (n ∈ N) and P(F_n) = 1, ∀n, then P(⋂_n F_n) = 1.

Proof. P(F_n^c) = 0, ∀n, so, by Lemma 1.10(c), P(⋃_n F_n^c) = 0. But ⋂ F_n = (⋃ F_n^c)^c. □

(b) Something to think about. Some distinguished philosophers have tried to develop probability without measure theory. One of the reasons for difficulty is the following.

When the discussion (2.3,b) is extended to define the appropriate probability measure for fair coin tossing, the Strong Law of Large Numbers (SLLN) states that F ∈ F and P(F) = 1, where F, the truth set of the statement 'proportion of heads in n tosses → 1/2', is defined formally in (2.3,b).

Let A be the set of all maps α : N → N such that α(1) < α(2) < α(3) < ⋯.

2.5. The notations lim sup, lim inf
(a) For a sequence (x_n : n ∈ N) of real numbers, we define

lim sup x_n := inf_m { sup_{n≥m} x_n } = ↓lim_m { sup_{n≥m} x_n } ∈ [−∞, ∞].


Obviously, y_m := sup_{n≥m} x_n is monotone non-increasing in m, so that the limit of the sequence y_m exists in [−∞, ∞]. The use of ↑lim or ↓lim to signify monotone limits will be handy, as will y_n ↓ y_∞ to signify y_∞ = ↓lim y_n.

(b) Analogously,

lim inf x_n := sup_m { inf_{n≥m} x_n } = ↑lim_m { inf_{n≥m} x_n } ∈ [−∞, ∞].

(c) We have

x_n converges in [−∞, ∞] ⇔ lim sup x_n = lim inf x_n,

and then lim x_n = lim sup x_n = lim inf x_n.

(d) Note that

(i) if z > lim sup x_n, then x_n < z eventually (that is, for all sufficiently large n);
(ii) if z < lim sup x_n, then x_n > z infinitely often (that is, for infinitely many n).

2.6. Definitions. lim sup E_n, (E_n, i.o.)

The event (in the rigorous formulation: the truth set of the statement) 'number of heads / number of tosses → 1/2' is built out of simple events such as 'the n-th toss results in heads' in a rather complicated way. We need a systematic method of being able to handle complicated combinations of events. The idea of taking lim infs and lim sups of sets provides what is required.

It might be helpful to note the tautology that, if E is an event, then E = {ω : ω ∈ E}.

Suppose now that (E_n : n ∈ N) is a sequence of events.

►(a) We define

(E_n, i.o.) := (E_n infinitely often) := lim sup E_n := ⋂_m ⋃_{n≥m} E_n
= {ω : for every m, ∃n(ω) ≥ m such that ω ∈ E_{n(ω)}}
= {ω : ω ∈ E_n for infinitely many n}.


►(b) (Reverse Fatou Lemma - needs FINITENESS of P)

P(lim sup E_n) ≥ lim sup P(E_n).

Proof. Let G_m := ⋃_{n≥m} E_n. Then (look at the definition in (a)) G_m ↓ G, where G := lim sup E_n. By result (1.10,b), P(G_m) ↓ P(G). But, clearly,

P(G_m) ≥ sup_{n≥m} P(E_n).

Hence,

P(G) ≥ ↓lim_m { sup_{n≥m} P(E_n) } =: lim sup P(E_n). □

2.7. First Borel-Cantelli Lemma (BC1)
►► Let (E_n : n ∈ N) be a sequence of events such that

Σ_n P(E_n) < ∞.

Then

P(lim sup E_n) = P(E_n, i.o.) = 0.

Proof. With G_m := ⋃_{n≥m} E_n and G := lim sup E_n as in (2.6,b), we have, by (1.10,b) and (1.9),

P(G) = ↓lim P(G_m),  P(G_m) ≤ Σ_{n≥m} P(E_n) → 0 (m → ∞). □

2.8. Definitions. lim inf E_n, (E_n, ev)
►(a) We define

(E_n, ev) := (E_n eventually) := lim inf E_n := ⋃_m ⋂_{n≥m} E_n
= {ω : ∃m(ω) such that ω ∈ E_n, ∀n ≥ m(ω)}
= {ω : ω ∈ E_n for all large n}.

(b) Note that (E_n, ev)^c = (E_n^c, i.o.).

►►(c) (Fatou's Lemma for sets - true for ALL measure spaces)

P(lim inf E_n) ≤ lim inf P(E_n).

Exercise. Prove this in analogy with the proof of result (2.6,b), using (1.10,a) rather than (1.10,b).
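BC1 can be seen numerically. The sketch below (illustrative code, not from the book) takes independent events E_n with P(E_n) = n⁻², so that Σ P(E_n) < ∞, and checks that the tail bound P(⋃_{n≥m} E_n) ≤ Σ_{n≥m} n⁻² used in the proof is already tiny for moderate m, with simulated frequencies to match:

```python
import random

M, N = 100, 2000
tail_bound = sum(1.0 / n**2 for n in range(M, N + 1))   # about 1/M

rng = random.Random(1)
trials = 2000
hits_after_M = 0
for _ in range(trials):
    # simulate the events E_n, n = M..N, independently with P(E_n) = n^-2
    if any(rng.random() < 1.0 / n**2 for n in range(M, N + 1)):
        hits_after_M += 1

freq = hits_after_M / trials
print(tail_bound, freq)   # freq should not exceed tail_bound by much
```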


2.9. Exercise
For an event E, define the indicator function I_E on Ω via

I_E(ω) := 1 if ω ∈ E;  0 if ω ∉ E.

Let (E_n : n ∈ N) be a sequence of events. Prove that, for each ω,

I_{lim sup E_n}(ω) = lim sup I_{E_n}(ω),

and establish the corresponding result for lim infs.
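The identity in Exercise 2.9 is easy to test on an eventually periodic example, where lim sups over an infinite tail can be computed from a finite window. The following sketch (illustrative code) uses Ω = {0, 1, 2} and E_n := {n mod 3}, for which lim sup E_n = Ω and lim inf E_n = ∅:

```python
Omega = {0, 1, 2}

def E(n):
    return {n % 3}                 # an eventually periodic sequence of events

N = 120                            # window long enough to contain many periods

def limsup_sets():
    # ⋂_m ⋃_{n≥m} E_n over the window (exact here since E_n is periodic)
    result = set(Omega)
    for m in range(1, N - 10):     # keep each tail at least 10 terms long
        result &= set().union(*(E(n) for n in range(m, N)))
    return result

def liminf_sets():
    # ⋃_m ⋂_{n≥m} E_n over the window
    result = set()
    for m in range(1, N - 10):
        tail = set(Omega)
        for n in range(m, N):
            tail &= E(n)
        result |= tail
    return result

def ind(A, w):
    return 1 if w in A else 0

LS, LI = limsup_sets(), liminf_sets()
for w in Omega:
    assert ind(LS, w) == max(ind(E(n), w) for n in range(N - 12, N))  # limsup of indicators
    assert ind(LI, w) == min(ind(E(n), w) for n in range(N - 12, N))  # liminf of indicators

print(LS, LI)   # {0, 1, 2} and set()
```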

  • Chapter 3

    Random Variables

Let (S, Σ) be a measurable space, so that Σ is a σ-algebra on S.

3.1. Definitions. Σ-measurable function, mΣ, (mΣ)⁺, bΣ
Suppose that h : S → R. For A ⊆ R, define

h⁻¹(A) := {s ∈ S : h(s) ∈ A}.

Then h is called Σ-measurable if h⁻¹ : B → Σ, that is, h⁻¹(A) ∈ Σ, ∀A ∈ B.

[Picture of a Σ-measurable function h omitted.]

We write mΣ for the class of Σ-measurable functions on S, and (mΣ)⁺ for the class of non-negative elements in mΣ. We denote by bΣ the class of bounded Σ-measurable functions on S.

Note. Because lim sups of sequences even of finite-valued functions may be infinite, and for other reasons, it is convenient to extend these definitions to functions h taking values in [−∞, ∞] in the obvious way: h is called Σ-measurable if h⁻¹ : B[−∞, ∞] → Σ.

Which of the various results stated for real-valued functions extend to functions with values in [−∞, ∞], and what these extensions are, should be obvious.

    Borel function

A function h from a topological space S to R is called Borel if h is B(S)-measurable. The most important case is when S itself is R.



3.2. Elementary propositions on measurability
(a) The map h⁻¹ preserves all set operations:

h⁻¹(⋃_γ A_γ) = ⋃_γ h⁻¹(A_γ),  h⁻¹(A^c) = (h⁻¹(A))^c, etc.

Proof. This is just definition chasing. □

►(b) If C ⊆ B and σ(C) = B, then h⁻¹ : C → Σ ⇒ h ∈ mΣ.

Proof. Let E be the class of elements B in B such that h⁻¹(B) ∈ Σ. By result (a), E is a σ-algebra, and, by hypothesis, E ⊇ C. □

(c) If S is topological and h : S → R is continuous, then h is Borel.

Proof. Take C to be the class of open subsets of R, and apply result (b). □

►(d) For any measurable space (S, Σ), a function h : S → R is Σ-measurable if

{h ≤ c} := {s ∈ S : h(s) ≤ c} ∈ Σ  (∀c ∈ R).

Proof. Take C = π(R) and apply result (b). Similar results hold with {h ≤ c} replaced by {h < c}, {h ≥ c}, {h > c}, etc.

3.3. LEMMA. Sums and products of measurable functions are measurable

► mΣ is an algebra over R, that is, if λ ∈ R and h, h₁, h₂ ∈ mΣ, then

h₁ + h₂ ∈ mΣ,  h₁h₂ ∈ mΣ,  λh ∈ mΣ.

Example of proof. Let c ∈ R. Then for s ∈ S, it is clear that h₁(s) + h₂(s) > c if and only if for some rational q, we have

h₁(s) > q > c − h₂(s).

In other words,

{h₁ + h₂ > c} = ⋃_{q∈Q} ({h₁ > q} ∩ {h₂ > c − q}),

a countable union of elements of Σ. □


3.4. COMPOSITION LEMMA.
If h ∈ mΣ and f ∈ mB, then f ∘ h ∈ mΣ.

Proof. Draw the picture: (f ∘ h)⁻¹ = h⁻¹ ∘ f⁻¹ carries B into Σ via B. □

Note. There are obvious generalizations based on the following definition (important in more advanced theory): if (S₁, Σ₁) and (S₂, Σ₂) are measurable spaces and h : S₁ → S₂, then h is called Σ₁/Σ₂-measurable if h⁻¹ : Σ₂ → Σ₁. From this point of view, what we have called Σ-measurable should read Σ/B-measurable (or perhaps Σ/B[−∞, ∞]-measurable).

3.5. LEMMA on measurability of infs, lim infs of functions
►► Let (h_n : n ∈ N) be a sequence of elements of mΣ. Then

(i) inf h_n,  (ii) lim inf h_n,  (iii) lim sup h_n

are Σ-measurable (into ([−∞, ∞], B[−∞, ∞]), but we shall still write inf h_n ∈ mΣ (for example)). Further,

(iv) {s : lim h_n(s) exists in R} ∈ Σ.

Proof. (i) {inf h_n ≥ c} = ⋂_n {h_n ≥ c}.

(ii) Let L_n(s) := inf{h_r(s) : r ≥ n}. Then L_n ∈ mΣ, by part (i). But

L(s) := lim inf h_n(s) = ↑lim L_n(s) = sup L_n(s),

and {L ≤ c} = ⋂_n {L_n ≤ c} ∈ Σ.


3.7. Example. Coin tossing
Let Ω = {H, T}^N, ω = (ω₁, ω₂, ...), ω_n ∈ {H, T}. As in (2.3,b), we define

F = σ({ω : ω_n = W} : n ∈ N, W ∈ {H, T}).

Let

X_n(ω) := 1 if ω_n = H;  0 if ω_n = T.

The definition of F guarantees that each X_n is a random variable. By Lemma 3.3,

S_n := X₁ + X₂ + ⋯ + X_n = number of heads in n tosses

is a random variable. Next, for p ∈ [0,1], the truth set

Λ_p := {ω : n⁻¹ S_n(ω) → p}

of the statement 'proportion of heads converges to p' is an element of F, by Lemma 3.5.


3.8. Definition. σ-algebra generated by a collection of functions on Ω
The σ-algebra Y := σ(Y_γ : γ ∈ C) generated by a collection (Y_γ : γ ∈ C) of maps Y_γ : Ω → R is defined to be the smallest σ-algebra Y on Ω such that each map Y_γ (γ ∈ C) is Y-measurable. Clearly,

σ(Y_γ : γ ∈ C) = σ({ω ∈ Ω : Y_γ(ω) ∈ B} : γ ∈ C, B ∈ B).

If X is a random variable for some (Ω, F), then, of course, σ(X) ⊆ F.

Remarks. (i) The idea introduced in this section is something which you will pick up gradually as you work through the course. Don't worry about it now; think about it, yes!

(ii) Normally, π-systems come to our aid. For example, if (X_n : n ∈ N) is a collection of functions on Ω, and 𝒳_n denotes σ(X_k : k ≤ n), then the union ⋃ 𝒳_n is a π-system (indeed, an algebra) which generates σ(X_n : n ∈ N).

3.9. Definitions. Law, distribution function
Suppose that X is a random variable carried by some probability triple (Ω, F, P). We have

X⁻¹ : B → F  and  P : F → [0,1],

or indeed

X⁻¹ : B → σ(X)  and  P : σ(X) → [0,1].

Define the law L_X of X by

L_X := P ∘ X⁻¹,  L_X : B → [0,1].

Then (Exercise!) L_X is a probability measure on (R, B). Since π(R) = {(−∞, c] : c ∈ R} is a π-system which generates B, Uniqueness Lemma 1.6 shows that L_X is determined by the function F_X : R → [0,1] defined as follows:

F_X(c) := L_X(−∞, c] = P(X ≤ c) = P{ω : X(ω) ≤ c}.

The function F_X is called the distribution function of X.

3.10. Properties of distribution functions
Suppose that F is the distribution function F = F_X of some random variable X. Then

(a) F : R → [0,1], F ↑ (that is, x ≤ y ⇒ F(x) ≤ F(y)),
(b) lim_{x→∞} F(x) = 1, lim_{x→−∞} F(x) = 0,
(c) F is right-continuous.

Proof of (c). By using Lemma (1.10,b), we see that

P(X ≤ x + n⁻¹) ↓ P(X ≤ x),

that is, F(x + n⁻¹) ↓ F(x). □


3.11. Existence of random variable with given distribution function
► If F has the properties (a,b,c) in Section 3.10, then, by analogy with Section 1.8 on the existence of Lebesgue measure, we can construct a unique probability measure L on (R, B) such that

L(−∞, x] = F(x), ∀x.

Take (Ω, F, P) = (R, B, L), X(ω) = ω. Then it is tautological that

F_X(x) = F(x), ∀x.

Note. The measure L just described is called the Lebesgue-Stieltjes measure associated with F. Its existence is proved in the next section.

3.12. Skorokhod representation of a random variable with prescribed distribution function

Again let F : R → [0,1] have properties (3.10,a,b,c). We can construct a random variable with distribution function F carried by

(Ω, F, P) = ([0,1], B[0,1], Leb)

as follows. Define (the right-hand equalities, which you can prove, are there for clarification only)

(a1) X⁺(ω) := inf{z : F(z) > ω} = sup{y : F(y) ≤ ω},
(a2) X⁻(ω) := inf{z : F(z) ≥ ω} = sup{y : F(y) < ω}.

[A picture here shows cases to watch out for: flat stretches and jumps of F, with X⁻(F(x)) ≤ x ≤ X⁺(F(x)).]

By definition of X⁻,

(ω ≤ F(c)) ⇒ (X⁻(ω) ≤ c).


Now,

(z > X⁻(ω)) ⇒ (F(z) ≥ ω),

so, by the right-continuity of F, F(X⁻(ω)) ≥ ω, and

(X⁻(ω) ≤ c) ⇒ (ω ≤ F(X⁻(ω)) ≤ F(c)).

Thus, (ω ≤ F(c)) ⇔ (X⁻(ω) ≤ c), so that

P(X⁻ ≤ c) = Leb[0, F(c)] = F(c),

and X⁻ has distribution function F.
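The map X⁻ is exactly the 'generalized inverse' used in inverse-transform sampling. A sketch (illustrative code; the bisection-based inverse, the bracketing interval and the tolerance are implementation choices, not from the book):

```python
import math, random

def X_minus(F, omega, lo=-50.0, hi=50.0, tol=1e-9):
    """inf{z : F(z) >= omega} by bisection; assumes F non-decreasing,
    right-continuous, with F(lo) < omega <= F(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= omega:
            hi = mid
        else:
            lo = mid
    return hi

def F_exp(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0   # distribution function of Exp(1)

print(X_minus(F_exp, 0.5))   # the median: log 2 ≈ 0.6931

# feed in uniform omegas: the resulting samples have distribution function F
rng = random.Random(2)
n = 5000
mean = sum(X_minus(F_exp, 1.0 - rng.random()) for _ in range(n)) / n
print(mean)                  # near E(X) = 1
```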


3.13. Generated σ-algebras - a discussion
Suppose that (Ω, F, P) is a model for some experiment, and that the experiment has been performed, so that (see Section 2.2) Tyche has made her choice of ω.

Let (Y_γ : γ ∈ C) be a collection of random variables associated with our experiment, and suppose that someone reports to you the following information about the chosen point ω:

(*) the values Y_γ(ω), that is, the observed values of the random variables Y_γ (γ ∈ C).

Then the intuitive significance of the σ-algebra Y := σ(Y_γ : γ ∈ C) is that it consists precisely of those events F for which, for each and every ω, you can decide whether or not F has occurred (that is, whether or not ω ∈ F) on the basis of the information (*); the information (*) is precisely equivalent to the following information:

(**) the values I_F(ω) (F ∈ Y).

(a) Exercise. Prove that the σ-algebra σ(Y) generated by a single random variable Y is given by

σ(Y) = Y⁻¹(B) := ({ω : Y(ω) ∈ B} : B ∈ B),

and that σ(Y) is generated by the π-system

π(Y) := ({ω : Y(ω) ≤ x} : x ∈ R) = Y⁻¹(π(R)). □

The following results might help clarify things. Good advice: stop reading this section after (c)! Results (b) and (c) are proved in the appendix to this chapter.

(b) If Y : Ω → R, then Z : Ω → R is σ(Y)-measurable if and only if there exists a Borel function f on R such that Z = f(Y).

(c) If Y₁, Y₂, ..., Y_n are functions from Ω to R, then a function Z : Ω → R is σ(Y₁, Y₂, ..., Y_n)-measurable if and only if there exists a Borel function f on Rⁿ such that Z = f(Y₁, Y₂, ..., Y_n). We shall see in the appendix that the more correct measurability condition on f is that f be 'Bⁿ-measurable'.

(d) If (Y_γ : γ ∈ C) is a collection (parametrized by the infinite set C) of functions from Ω to R, then Z : Ω → R is σ(Y_γ : γ ∈ C)-measurable if and only if there exists a countable sequence (γ_i : i ∈ N) of elements of C and a Borel function f on R^N such that

Z = f(Y_{γ₁}, Y_{γ₂}, ...).

Warning - for the over-enthusiastic only. For uncountable C, B(R^C) is much larger than the C-fold product σ-algebra ∏_{γ∈C} B. It is the latter rather than the former which gives the appropriate type of f in (d).

3.14. The Monotone-Class Theorem
In the same way that Uniqueness Lemma 1.6 allows us to deduce results about σ-algebras from results about π-systems, the following 'elementary' version of the Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of π-systems. Generally, we shall not use the theorem in the main text, preferring 'just to use bare hands'. However, for product measure in Chapter 8, it becomes indispensable.

THEOREM.

►► Let H be a class of bounded functions from a set S into R satisfying the following conditions:

(i) H is a vector space over R;
(ii) the constant function 1 is an element of H;
(iii) if (f_n) is a sequence of non-negative functions in H such that f_n ↑ f, where f is a bounded function on S, then f ∈ H.

Then if H contains the indicator function of every set in some π-system I, then H contains every bounded σ(I)-measurable function on S.

For proof, see the appendix to this chapter.

  • Chapter 4

    Independence

Let (Ω, F, P) be a probability triple.

4.1. Definitions of independence
Note. We focus attention on the σ-algebra formulation (and describe the more familiar forms of independence in terms of it) to acclimatize ourselves to thinking of σ-algebras as the natural means of summarizing information. Section 4.2 shows that the fancy σ-algebra definitions agree with the ones from elementary courses.

Independent σ-algebras
►Sub-σ-algebras G₁, G₂, ... of F are called independent if, whenever G_i ∈ G_i (i ∈ N) and i₁, ..., i_n are distinct, then

P(G_{i₁} ∩ ⋯ ∩ G_{i_n}) = ∏_{k=1}^{n} P(G_{i_k}).


4.2. The π-system Lemma; and the more familiar definitions
We know from elementary theory that events E₁, E₂, ... are independent if and only if whenever n ∈ N and i₁, ..., i_n are distinct, then

P(E_{i₁} ∩ ⋯ ∩ E_{i_n}) = ∏_{k=1}^{n} P(E_{i_k}),

corresponding results involving complements of the E_i, etc., being consequences of this.

We now use the Uniqueness Lemma 1.6 to obtain a significant generalization of this idea, allowing us to study independence via (manageable) π-systems rather than (awkward) σ-algebras.

Let us concentrate on the case of two σ-algebras.

►►(a) LEMMA. Suppose that G and H are sub-σ-algebras of F, and that I and J are π-systems with

σ(I) = G,  σ(J) = H.

Then G and H are independent if and only if I and J are independent in that

P(I ∩ J) = P(I)P(J) whenever I ∈ I and J ∈ J.


Suppose now that X and Y are two random variables on (Ω, F, P) such that, whenever x, y ∈ R,

(b) P(X ≤ x; Y ≤ y) = P(X ≤ x)P(Y ≤ y).

Then, by Lemma (a) applied to the π-systems π(X) and π(Y), the σ-algebras σ(X) and σ(Y) are independent: the elementary definition of independence of X and Y agrees with the fancy one.

4.3. Second Borel-Cantelli Lemma (BC2)
►► Let (E_n : n ∈ N) be a sequence of independent events, and write p_n := P(E_n). If Σ p_n = ∞, then

P(lim sup E_n) = P(E_n, i.o.) = 1.

Proof. With G := lim sup E_n, we have G^c = ⋃_m ⋂_{n>m} E_n^c, and, for each m,

P(⋂_{n>m} E_n^c) = ∏_{n>m} (1 − p_n),

because of independence, the limit as r ↑ ∞ (of the products over m < n ≤ r) being justified by the monotonicity of the two sides.

For x ≥ 0, 1 − x ≤ exp(−x), so that, since Σ p_n = ∞,

∏_{n>m} (1 − p_n) ≤ exp(−Σ_{n>m} p_n) = 0.

So, P[(lim sup E_n)^c] = 0. □

Exercise. Prove that if 0 ≤ p_n < 1 and S := Σ p_n < ∞, then ∏(1 − p_n) > 0. Hint. First show that if S < 1, then ∏(1 − p_n) ≥ 1 − S.


4.4. Example
Let (X_n : n ∈ N) be a sequence of independent random variables, each exponentially distributed with rate 1:

P(X_n > x) = e^{−x},  x ≥ 0.

Then, for α > 0,

P(X_n > α log n) = n^{−α},

so that, using (BC1) and (BC2),

(a0) P(X_n > α log n for infinitely many n) = 0 if α > 1; 1 if α ≤ 1.

Similarly (check the convergence or divergence of the relevant series),

(a1) P(X_n > log n + α log log n, i.o.) = 0 if α > 1; 1 if α ≤ 1,
(a2) P(X_n > log n + log log n + α log log log n, i.o.) = 0 if α > 1; 1 if α ≤ 1,

and so on. By combining in an appropriate way (think about this!) the sequence of statements (a0), (a1), (a2), ... with the statement that the union of a countable number of null sets is null while the intersection of a sequence of probability-1 sets has probability 1, we can obviously make remarkably precise statements about the size of the big elements in the sequence (X_n).

    I have included in the appendix to this chapter the statementof atruly fantastic theorem about precise descriptionof long-term behaviour:Strassen's Law.
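The dichotomy in (a0) is visible in simulation. The following sketch (illustrative code, not from the book) draws X_n ~ Exp(1) for n up to 200000 and counts exceedances of α log n for α = 2 (summable case: typically only a handful, near the start) and α = 1/2 (divergent case: infinitely many in the limit, hundreds in this range):

```python
import math, random

rng = random.Random(3)
N = 200_000
count_2, count_half = 0, 0
for n in range(2, N + 1):
    x = rng.expovariate(1.0)            # Exp(1), so P(x > a log n) = n^(-a)
    if x > 2.0 * math.log(n):
        count_2 += 1
    if x > 0.5 * math.log(n):
        count_half += 1

print(count_2, count_half)
```

Raising N only grows the α = 1/2 count (roughly like 2√N), while the α = 2 count stays put, mirroring probability 1 versus probability 0.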

A number of exercises in Chapter E are now accessible to you.

4.5. A fundamental question for modelling
Can we construct a sequence (X_n : n ∈ N) of independent random variables, X_n having prescribed distribution function F_n? We have to be able to answer Yes to this question - for example, to be able to construct a rigorous model for the branching-process model of Chapter 0, or indeed for Example 4.4 to make sense. Equation (0.2,b) makes it clear that a Yes answer to our question is all that is needed for a rigorous branching-process model.

The trick answer based on the existence of Lebesgue measure given in the next section does settle the question. A more satisfying answer is provided by the theory of product measure, a topic deferred to Chapter 8.

4.6. A coin-tossing model with applications
Let (Ω, F, P) be ([0,1], B[0,1], Leb). For ω ∈ Ω, expand ω in binary:

ω = 0.ω₁ω₂ ⋯ .

(The existence of two different expansions of a dyadic rational is not going to cause any problems because the set D (say) of dyadic rationals in [0,1] has Lebesgue measure 0 - it is a countable set!) As an Exercise, you can prove that the sequence (ξ_n : n ∈ N), where

ξ_n(ω) := ω_n,

is a sequence of independent variables each taking the values 0 or 1 with probability 1/2 for either possibility. Clearly, (ξ_n : n ∈ N) provides a model for coin tossing.
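This construction is concrete enough to test. The sketch below (illustrative code) extracts the binary digits ω_n of uniform samples ω and checks empirically that the first two digits behave like fair, uncorrelated coin tosses:

```python
import random

def digit(omega, n):
    """n-th binary digit of omega in [0, 1): xi_n(omega)."""
    return int(omega * 2**n) % 2

rng = random.Random(4)
samples = [rng.random() for _ in range(20000)]

freq1 = sum(digit(w, 1) for w in samples) / len(samples)               # ≈ 1/2
freq2 = sum(digit(w, 2) for w in samples) / len(samples)               # ≈ 1/2
both = sum(digit(w, 1) * digit(w, 2) for w in samples) / len(samples)  # ≈ 1/4

print(freq1, freq2, both)
```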

Now define

Y₁(ω) := 0.ω₁ω₃ω₆ ⋯ ,
Y₂(ω) := 0.ω₂ω₅ω₉ ⋯ ,
Y₃(ω) := 0.ω₄ω₈ω₁₃ ⋯ ,

and so on. We now need a bit of common sense. Since the sequence ω₁, ω₃, ω₆, ... has the same 'coin-tossing' properties as the full sequence (ω_n : n ∈ N), it is clear that

Y₁ has the uniform distribution on [0,1];

and similarly for the other Y's.


Since the sequences (1,3,6,...), (2,5,9,...), ... which give rise to Y₁, Y₂, ... are disjoint, and therefore correspond to different sets of tosses of our 'coin', it is intuitively obvious that

► Y₁, Y₂, ... are independent random variables, each uniformly distributed on [0,1].

Now suppose that a sequence (F_n : n ∈ N) of distribution functions is given. By the Skorokhod representation of Section 3.12, we can find functions g_n on [0,1] such that

X_n := g_n(Y_n) has distribution function F_n.

But because the Y-variables are independent, the same is obviously true of the X-variables.

► We have therefore succeeded in constructing a family (X_n : n ∈ N) of independent random variables with prescribed distribution functions.

Exercise. Satisfy yourself that you could if forced carry through these intuitive arguments rigorously. Obviously, this is again largely a case of utilizing the Uniqueness Lemma 1.6 in much the same way as we did in Section 4.2.
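The bit-splitting trick is also easy to try out. A sketch (illustrative code; the diagonal position formula n(i,k) = T(i+k−1) − (i−1) with T(m) = m(m+1)/2 is one convenient choice matching the pattern (1,3,6,...), (2,5,9,...)): build Y₁ and Y₂ from disjoint sets of digits of a single ω, here stored as its first 300 binary digits, and check they look uniform and uncorrelated.

```python
import random

TOTAL = 300                      # we keep the first 300 binary digits of omega

def digit(omega_bits, n):
    """n-th binary digit of omega, stored as a TOTAL-bit integer."""
    return (omega_bits >> (TOTAL - n)) & 1

def Y(omega_bits, i, bits=20):
    """Y_i(omega) built from digits at positions n(i,k) = T(i+k-1) - (i-1)."""
    value = 0.0
    for k in range(1, bits + 1):
        m = i + k - 1
        n = m * (m + 1) // 2 - (i - 1)
        value += digit(omega_bits, n) / 2.0**k
    return value

rng = random.Random(5)
pairs = [(Y(w, 1), Y(w, 2)) for w in (rng.getrandbits(TOTAL) for _ in range(20000))]

mean1 = sum(y1 for y1, _ in pairs) / len(pairs)                       # ≈ 1/2
mean2 = sum(y2 for _, y2 in pairs) / len(pairs)                       # ≈ 1/2
cov = sum(y1 * y2 for y1, y2 in pairs) / len(pairs) - mean1 * mean2   # ≈ 0

print(mean1, mean2, cov)
```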

4.7. Notation: IID RVs
Many of the most important problems in probability concern sequences of random variables (RVs) which are independent and identically distributed (IID). Thus, if (X_n) is a sequence of IID variables, then the X_n are independent and all have the same distribution function F (say):

P(X_n ≤ x) = F(x), ∀n, ∀x.

4.8. Stochastic processes; Markov chains


    Our concernwill be mainly with processes X = (X\342\200\236: n \342\202\254Z\"*\") indexed(or parametrized) by Z\"^. We think of Xn as the value of the process X attime n. For u; G fi, the map 7i \302\273-*Xn(u;) is called the sample path of Xcorresponding to the sample point lj.

    A very important example of a stochastic process is provided by aMarkov chain.

►► Let E be a finite or countable set. Let P = (p_ij : i, j ∈ E) be a stochastic E × E matrix, so that for i, j ∈ E we have

p_ij ≥ 0,  Σ_k p_ik = 1.

Let μ be a probability measure on E, so that μ is specified by the values μ_i := μ({i}) (i ∈ E). By a time-homogeneous Markov chain Z = (Z_n : n ∈ Z^+) on E with initial distribution μ and 1-step transition matrix P is meant a stochastic process Z such that, whenever n ∈ Z^+ and i_0, i_1, ..., i_n ∈ E,

(a) P(Z_0 = i_0; Z_1 = i_1; ...; Z_n = i_n) = μ_{i_0} p_{i_0 i_1} ⋯ p_{i_{n-1} i_n}.

Exercise. Give a construction of such a chain Z expressing Z_n(ω) explicitly in terms of the values at ω of a suitable family of independent random variables. See the appendix to this chapter.
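For the exercise above, one standard construction (a sketch of my own, not the book's appendix; all helper names are illustrative) realises Z_n(ω) as an explicit function of independent Uniform[0,1] variables U_0, U_1, ...: sample Z_0 from μ using U_0 via the quantile transform on E, then sample Z_{n+1} from the row p_{Z_n,·} using U_{n+1}.

```python
import random

def sample_from(dist, u):
    """Return the first index i with F(i) >= u, where F is the c.d.f. of the
    probability vector `dist` -- the quantile transform on a finite set."""
    acc = 0.0
    for i, p in enumerate(dist):
        acc += p
        if u <= acc:
            return i
    return len(dist) - 1  # guard against floating-point rounding

def markov_chain(mu, P, us):
    """Z_n written explicitly as a function of independent uniforms
    u_0, u_1, ...: Z_0 from mu, then Z_{n+1} from row P[Z_n]."""
    z = [sample_from(mu, us[0])]
    for u in us[1:]:
        z.append(sample_from(P[z[-1]], u))
    return z

random.seed(1)
mu = [0.5, 0.5]
P = [[0.9, 0.1], [0.2, 0.8]]      # a two-state stochastic matrix
path = markov_chain(mu, P, [random.random() for _ in range(10)])
```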

    4.9. Monkey typing Shakespeare

Many interesting events must have probability 0 or 1, and we often show that an event F has probability 0 or 1 by using some argument based on independence to show that P(F)^2 = P(F).

Here is a silly example, to which we apply a silly method, but one which both illustrates very clearly the use of the monotonicity properties of measures in Lemma 1.10 and has a lot of the flavour of the Kolmogorov 0-1 law. See the 'Easy exercise' towards the end of this section for an instantaneous solution to the problem.

Let us agree that correctly typing WS, the Collected Works of Shakespeare, amounts to typing a particular sequence of N symbols on a typewriter. A monkey types symbols at random, one per unit time, producing an infinite sequence (X_n) of IID RVs with values in the set of all possible symbols. We agree that

ε := inf{P(X_1 = x) : x is a symbol} > 0.

Let H be the event that the monkey produces infinitely many copies of WS. Let H_k be the event that the monkey will produce at least k copies of WS in

  • .,(4-9) Chapter 4: Independence 4^

all, and let H_{m,k} be the event that it will produce at least k copies by time m. Finally, let H^(m) be the event that the monkey produces infinitely many copies of WS over the time period [m+1, ∞).

Because the monkey's behaviour over [1, m] is independent of its behaviour over [m+1, ∞), we have

P(H_{m,k} ∩ H^(m)) = P(H_{m,k}) P(H^(m)).

But logic tells us that, for every m, H^(m) = H. Hence,

P(H_{m,k} ∩ H) = P(H_{m,k}) P(H).

But, as m ↑ ∞, H_{m,k} ↑ H_k, and (H_{m,k} ∩ H) ↑ (H_k ∩ H) = H, it being obvious that H_k ⊇ H. Hence, by Lemma 1.10(a),

P(H) = P(H_k) P(H).

However, as k ↑ ∞, H_k ↓ H, and so, by Lemma 1.10(b),

P(H) = P(H) P(H),

whence P(H) = 0 or 1.

The Kolmogorov 0-1 law produces a huge class of important events F for which we must have P(F) = 0 or P(F) = 1. Fortunately, it does not tell us which - and it therefore generates a lot of interesting problems!

Easy exercise. Use the Second Borel-Cantelli Lemma to prove that P(H) = 1. Hint. Let E_1 be the event that the monkey produces WS right away, that is, during time period [1, N]. Then P(E_1) ≥ ε^N.

Tricky exercise (to which we shall return). If the monkey types only capital letters, and is on every occasion equally likely to type any of the 26, how long on average will it take him to produce the sequence

    'ABRACADABRA'?
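The 0-1 argument says nothing about how long the monkey takes; for the tricky exercise, a Monte Carlo sketch on a toy instance is a useful sanity check. This is my own illustrative code (the suffix-tracking scheme and the two-letter alphabet are assumptions; the real answer to the exercise is deferred, as the text says, to a later chapter). It estimates the mean waiting time for the pattern 'AB' when the monkey has only two keys; elementary recurrences give mean 4 for that toy case.

```python
import random

def waiting_time(pattern, alphabet, rng):
    """Time of first completed copy of `pattern` in an IID stream of
    uniformly random symbols, tracked with the longest-suffix state."""
    state, t = 0, 0
    while state < len(pattern):
        c = rng.choice(alphabet)
        t += 1
        # fall back to the longest prefix of `pattern` that is a suffix
        # of what has been typed so far (naive but correct for short patterns)
        s = pattern[:state] + c
        while s and not pattern.startswith(s):
            s = s[1:]
        state = len(s)
    return t

rng = random.Random(42)
times = [waiting_time("AB", "AB", rng) for _ in range(20000)]
estimate = sum(times) / len(times)   # should be close to 4 for this toy case
```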

The next three sections involve quite subtle topics which take time to assimilate. They are not strictly necessary for subsequent chapters. The Kolmogorov 0-1 law is used in one of our two proofs of the Strong Law for IID RVs, but by that stage a quick martingale proof (of the 0-1 law) will have been provided.

Note. Perhaps the otherwise-wonderful TeX makes its 𝒯 too like 𝒥. Below, I use 𝒦 instead of 𝒥 to avoid the confusion. Script X, 𝒳, is too like Greek chi, χ; but we have to live with that.


4.10. Definition. Tail σ-algebras
►► Let X_1, X_2, ... be random variables. Define

𝒯_n := σ(X_{n+1}, X_{n+2}, ...),  𝒯 := ∩_n 𝒯_n.

The σ-algebra 𝒯 is called the tail σ-algebra of the sequence (X_n : n ∈ N). Now, 𝒯 contains many important events: for example,

(b1) F_1 := (lim_k X_k exists) := {ω : lim_k X_k(ω) exists},

(b2) F_2 := (Σ X_k converges),

(b3) F_3 := ( lim_k (X_1 + ⋯ + X_k)/k exists ).

Also, there are many important variables which are in m𝒯: for example,

(c) ξ_1 := limsup_k (X_1 + X_2 + ⋯ + X_k)/k,

which may be ±∞, of course.

Exercise. Prove that F_1, F_2 and F_3 are 𝒯-measurable, that the event H in the monkey problem is a tail event, and that the various events of probability 0 and 1 in Section 4.4 are tail events.

Hint - to be read only after you have already tried hard. Look at F_3 for example. For each n, logic tells us that F_3 is equal to the set

F_3^(n) := {ω : lim_k (X_{n+1}(ω) + ⋯ + X_{n+k}(ω))/k exists}.

Now, X_{n+1}, X_{n+2}, ... are all random variables on the triple (Ω, 𝒯_n, P). That F_3^(n) ∈ 𝒯_n now follows from Lemmas 3.3 and 3.5.

4.11. THEOREM. Kolmogorov's 0-1 Law
►► Let (X_n : n ∈ N) be a sequence of independent random variables, and let 𝒯 be the tail σ-algebra of (X_n : n ∈ N). Then 𝒯 is P-trivial: that is,

(i) F ∈ 𝒯 ⇒ P(F) = 0 or P(F) = 1;
(ii) if ξ is a 𝒯-measurable random variable, then ξ is almost deterministic in that for some constant c in [−∞, ∞],

P(ξ = c) = 1.


We allow ξ = ±∞ at (ii) for obvious reasons.

Proof of (i). Let 𝒳_n := σ(X_1, ..., X_n) and, as in Section 4.10, 𝒯_n := σ(X_{n+1}, X_{n+2}, ...).

Step 1: We claim that 𝒳_n and 𝒯_n are independent.

Proof of claim. The class 𝒦 of events of the form

{ω : X_1(ω) ≤ x_1; ... ; X_n(ω) ≤ x_n}  (x_i ∈ R)


Hence, P(ξ = c) = 1. □

Remarks. The examples in Section 4.10 show how striking this result is. For example, if X_1, X_2, ... is a sequence of independent random variables, then

either P(Σ X_n converges) = 0 or P(Σ X_n converges) = 1.

The Three Series Theorem (Theorem 12.5) completely settles the question of which possibility occurs.

So, you can see that the 0-1 law poses numerous interesting questions.

Example. In the branching-process example of Chapter 0, the variable

M_∞ := lim Z_n/μ^n

is measurable on the tail σ-algebra of the sequence (Z_n : n ∈ N) but need not be almost deterministic. But then the variables (Z_n : n ∈ N) are not independent.

4.12. Exercise/Warning
Let Y_0, Y_1, Y_2, ... be independent random variables with

P(Y_n = +1) = P(Y_n = −1) = 1/2, ∀n.

For n ∈ N, define

X_n := Y_0 Y_1 ⋯ Y_n.

Prove that the variables X_1, X_2, ... are independent. Define

𝒴 := σ(Y_1, Y_2, ...),  𝒯_n := σ(X_r : r > n).

Prove that

∩_n σ(𝒴, 𝒯_n) ≠ σ(𝒴, ∩_n 𝒯_n).
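A quick Monte Carlo sanity check (my own sketch, not a proof) of the first assertion: X_1 = Y_0 Y_1 and X_2 = Y_0 Y_1 Y_2 are each ±1 with probability 1/2, and the joint frequency of {X_1 = 1, X_2 = 1} should be near 1/4, as independence requires.

```python
import random

rng = random.Random(7)
n_trials = 40000
joint = m1 = m2 = 0
for _ in range(n_trials):
    y = [rng.choice([-1, 1]) for _ in range(3)]   # Y_0, Y_1, Y_2
    x1 = y[0] * y[1]
    x2 = y[0] * y[1] * y[2]
    m1 += (x1 == 1)
    m2 += (x2 == 1)
    joint += (x1 == 1 and x2 == 1)

p1, p2, p12 = m1 / n_trials, m2 / n_trials, joint / n_trials
# For independence we expect p12 to be close to p1 * p2, i.e. about 0.25
```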

Chapter 5

Integration

5.0. Notation, etc.: μ(f) :=: ∫ f dμ; μ(f; A)
Let (S, Σ, μ) be a measure space. We are interested in defining, for suitable elements f of mΣ, the (Lebesgue) integral of f with respect to μ, for which we shall use the alternative notations:

►► μ(f) :=: ∫_S f(s) μ(ds) :=: ∫_S f dμ.

It is worth mentioning now that we shall also use the equivalent notations for A ∈ Σ:

μ(f; A) :=: ∫_A f(s) μ(ds) :=: μ(f I_A)

(with a true definition on the extreme right!) It should be clear that, for example,

μ(f; f > x) := μ(f; A), where A = {s ∈ S : f(s) > x}.

Something else worth emphasizing now is that, of course, summation is a special type of integration. If (a_n : n ∈ N) is a sequence of real numbers, then with S = N, Σ = 𝒫(N), and μ the measure on (S, Σ) with μ({k}) = 1 for every k in N, the map s ↦ a_s is μ-integrable if and only if Σ |a_n| < ∞, and then

Σ a_n = ∫_S a_s μ(ds) = ∫ a dμ.

We begin by considering the integral of a function f in (mΣ)^+, allowing such an f to take values in the extended half-line [0, ∞].



5.1. Integrals of non-negative simple functions, SF^+
If A is an element of Σ, we define

μ_0(I_A) := μ(A) ≤ ∞.

The use of μ_0 rather than μ signifies that we currently have only a naive integral defined for simple functions.

An element f of (mΣ)^+ is called simple, and we shall then write f ∈ SF^+, if f may be written as a finite sum

(a) f = Σ_{k=1}^m a_k I_{A_k},

where a_k ∈ [0, ∞] and A_k ∈ Σ. We then define

(b) μ_0(f) := Σ_{k=1}^m a_k μ(A_k) ≤ ∞ (with 0·∞ := 0 =: ∞·0).

The first point to be checked is that μ_0(f) is well-defined; for f will have many different representations of the form (a), and we must ensure that they yield the same value of μ_0(f) in (b). Various desirable properties also need to be checked, namely (c), (d) and (e) now to be stated:

(c) if f, g ∈ SF^+ and μ(f ≠ g) = 0, then μ_0(f) = μ_0(g);
(d) ('Linearity') if f, g ∈ SF^+ and c ≥ 0, then f + g and cf are in SF^+, and
μ_0(f + g) = μ_0(f) + μ_0(g),  μ_0(cf) = c μ_0(f);
(e) (Monotonicity) if f, g ∈ SF^+ and f ≤ g, then μ_0(f) ≤ μ_0(g);
(f) if f, g ∈ SF^+ then f ∧ g and f ∨ g are in SF^+.

Checking all the properties just mentioned is a little messy, but it involves no point of substance, and in particular no analysis. We skip this, and turn our attention to what matters: the Monotone-Convergence Theorem.

5.2. Definition of μ(f), f ∈ (mΣ)^+
► For f ∈ (mΣ)^+ we define

(a) μ(f) := sup{μ_0(h) : h ∈ SF^+, h ≤ f} ≤ ∞.

Clearly, for f ∈ SF^+, we have μ(f) = μ_0(f).

The following result is important.


LEMMA
► (b) If f ∈ (mΣ)^+ and μ(f) = 0, then

μ({f > 0}) = 0.

Proof. Obviously, {f > 0} = ↑lim {f > n^{-1}}. Hence, using (1.10,a), we see that if μ({f > 0}) > 0, then, for some n, μ({f > n^{-1}}) > 0, and then

μ(f) ≥ μ_0(n^{-1} I_{f > 1/n}) > 0. □

5.3. Monotone-Convergence Theorem (MON)
►►► (a) If (f_n) is a sequence of elements of (mΣ)^+ such that f_n ↑ f, then

μ(f_n) ↑ μ(f) ≤ ∞,

or, in other notation,

∫_S f_n(s) μ(ds) ↑ ∫_S f(s) μ(ds).

This theorem is really all there is to integration theory. We shall see that other key results such as the Fatou Lemma and the Dominated-Convergence Theorem follow trivially from it.

The (MON) theorem is proved in the Appendix. Obviously, the theorem relates very closely to Lemma 1.10(a), the monotonicity result for measures. The proof of (MON) is not at all difficult, and may be read once you have looked at the following definition of α^(r).

It is convenient to have an explicit way, given f ∈ (mΣ)^+, of obtaining a sequence (f^(r)) of simple functions such that f^(r) ↑ f. For r ∈ N, define the r-th staircase function α^(r) : [0, ∞] → [0, ∞] as follows:

(b) α^(r)(x) := 0 if x = 0; (i − 1)2^{-r} if (i − 1)2^{-r} < x ≤ i 2^{-r} ≤ r; r if x > r.

Then f^(r) := α^(r) ∘ f is simple and f^(r) ↑ f, so that, by (MON),

μ(f) = ↑lim μ(f^(r)) = ↑lim μ_0(f^(r)).

We have made α^(r) left-continuous so that if f_n ↑ f then α^(r)(f_n) ↑ α^(r)(f).
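A direct transcription of the staircase function makes its two key properties concrete. This is an illustrative sketch of my own (the test function f(x) = x² and the evaluation grid are arbitrary choices): α^(r)(f) never exceeds f, and the approximations increase with r.

```python
import math

def staircase(r, x):
    """The r-th staircase function alpha^(r): left-continuous dyadic
    rounding-down, capped at r, per definition (b)."""
    if x == 0:
        return 0.0
    if x > r:
        return float(r)
    # smallest i with x <= i * 2^-r; return (i - 1) * 2^-r
    i = math.ceil(x * 2**r)
    return (i - 1) / 2**r

f = lambda x: x * x                      # a non-negative function on [0, 2]
xs = [k / 100 for k in range(201)]
# f^(r) := alpha^(r) o f increases to f pointwise as r grows
approx = [[staircase(r, f(x)) for x in xs] for r in (1, 2, 3, 8)]
```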


Often, we need to apply convergence theorems such as (MON) where the hypothesis (f_n ↑ f in the case of (MON)) holds almost everywhere rather than everywhere. Let us see how such adjustments may be made.

(c) If f, g ∈ (mΣ)^+ and f = g (a.e.), then μ(f) = μ(g).

Proof. Let f^(r) = α^(r) ∘ f, g^(r) = α^(r) ∘ g. Then f^(r) = g^(r) (a.e.) and so, by (5.1,c), μ_0(f^(r)) = μ_0(g^(r)). Now let r ↑ ∞, and use (MON). □

► (d) If f ∈ (mΣ)^+ and (f_n) is a sequence in (mΣ)^+ such that, except on a μ-null set N, f_n ↑ f, then

μ(f_n) ↑ μ(f).

Proof. We have μ(f) = μ(f I_{S\N}) and μ(f_n) = μ(f_n I_{S\N}). But f_n I_{S\N} ↑ f I_{S\N} everywhere. The result now follows from (MON). □

From now on, (MON) is understood to include this extension. We do not bother to spell out such extensions for the other convergence theorems, often stating results with 'almost everywhere' but proving them under the assumption that the exceptional null set is empty.

Note on the Riemann integral
If, for example, f is a non-negative Riemann integrable function on ([0,1], B[0,1], Leb) with Riemann integral I, then there exist an increasing sequence (L_n) of elements of SF^+ and a decreasing sequence (U_n) of elements of SF^+ such that

L_n ↑ L ≤ f ≤ U := ↓lim U_n, and μ(L_n) ↑ I, μ(U_n) ↓ I.

If we define

f̃ := L if L = U; 0 otherwise,

then it is clear that f̃ is Borel measurable, while (since μ(L) = μ(U) = I) {f ≠ f̃} is a subset of the Borel set {L ≠ U}, which Lemma 5.2(b) shows to be of measure 0. So f is Lebesgue measurable (see Section A1.11) and the Riemann integral of f equals the integral of f associated with ([0,1], Leb[0,1], Leb), Leb[0,1] denoting the σ-algebra of Lebesgue measurable subsets of [0,1].


5.4. The Fatou Lemmas for functions (FATOU)
► (a) For a sequence (f_n) in (mΣ)^+,

μ(liminf f_n) ≤ liminf μ(f_n).

Proof. We have

(*) liminf f_n = ↑lim_k g_k, where g_k := inf_{n≥k} f_n.

For n ≥ k, we have f_n ≥ g_k, so that μ(f_n) ≥ μ(g_k), whence

μ(g_k) ≤ inf_{n≥k} μ(f_n);

and on combining this with an application of (MON) to (*), we obtain

μ(liminf f_n) = ↑lim_k μ(g_k) ≤ ↑lim_k inf_{n≥k} μ(f_n) = liminf μ(f_n). □


5.7. Integrable function, ℒ^1(S, Σ, μ)

► For f ∈ mΣ, we say that f is μ-integrable, and write f ∈ ℒ^1(S, Σ, μ), if

μ(|f|) = μ(f^+) + μ(f^−) < ∞,

and then we define

μ(f) := μ(f^+) − μ(f^−).

5.9. Dominated-Convergence Theorem (DOM)
►► Suppose that f_n, f ∈ mΣ, that f_n(s) → f(s) for every s in S, and that the sequence (f_n) is dominated by an element g of ℒ^1(S, Σ, μ)^+:

|f_n(s)| ≤ g(s), ∀s ∈ S, ∀n ∈ N.

Then μ(|f_n − f|) → 0, so that μ(f_n) → μ(f).


Proof. We have |f_n − f| ≤ 2g, where μ(2g) < ∞, so by the reverse Fatou Lemma 5.4(b),

limsup μ(|f_n − f|) ≤ μ(limsup |f_n − f|) = μ(0) = 0.

Since

|μ(f_n) − μ(f)| = |μ(f_n − f)| ≤ μ(|f_n − f|),

the theorem is proved. □

5.10. Scheffé's Lemma (SCHEFFÉ)
► (i) Suppose that f_n, f ∈ ℒ^1(S, Σ, μ)^+; in particular, f_n and f are non-negative. Suppose that f_n → f (a.e.). Then

μ(|f_n − f|) → 0 if and only if μ(f_n) → μ(f).

Proof. The 'only if' part is trivial. Suppose now that

(a) μ(f_n) → μ(f).

Since (f_n − f)^− ≤ f, (DOM) shows that

(b) μ((f_n − f)^−) → 0.

Next,

μ((f_n − f)^+) = μ(f_n − f; f_n > f) = μ(f_n) − μ(f) − μ(f_n − f; f_n ≤ f).

But

|μ(f_n − f; f_n ≤ f)| ≤ μ((f_n − f)^−) → 0,

so that (a) and (b) together imply that

(c) μ((f_n − f)^+) → 0.

Of course, (b) and (c) now yield the desired result. □

Here is the second part of Scheffé's Lemma.
► (ii) Suppose that f_n, f ∈ ℒ^1(S, Σ, μ) and that f_n → f (a.e.). Then

μ(|f_n − f|) → 0 if and only if μ(|f_n|) → μ(|f|).

Exercise. Prove the 'if' part of (ii) by using Fatou's Lemma to show that μ(f_n^+) → μ(f^+), and then applying (i). Of course, the 'only if' part is trivial.

5.11. Remark on uniform integrability
The theory of uniform integrability, which we shall establish later for probability triples, gives better insight into the matter of convergence of integrals.

5.12. The standard machine
What I call the standard machine is a much cruder alternative to the Monotone-Class Theorem.

The idea is that to prove that a 'linear' result is true for all functions h in a space such as ℒ^1(S, Σ, μ):

• first, we show the result is true for the case when h is an indicator function - which it normally is by definition;
• then, we use linearity to obtain the result for h in SF^+;
• next, we use (MON) to obtain the result for h ∈ (mΣ)^+, integrability conditions on h usually being superfluous at this stage;
• finally, we show, by writing h = h^+ − h^−, and using linearity, that the claimed result is true.

It seems to me that, when it works, it is easier to use than the Monotone-Class Theorem.


5.14. The measure fμ, f ∈ (mΣ)^+
Let f ∈ (mΣ)^+. For A ∈ Σ, define

(a) (fμ)(A) := μ(f; A) := μ(f I_A).

A trivial exercise on the results of Section 5.5 and (MON) shows that

(b) fμ is a measure on (S, Σ).

For h ∈ (mΣ)^+ and A ∈ Σ, we can conjecture that

(c) (h(fμ))(A) := (fμ)(h I_A) = μ(fh I_A).

If h is the indicator of a set in Σ, then (c) is immediate by definition. Our standard machine produces (c), so that we have

(d) h(fμ) = (hf)μ.

Result (d) is often used in the following form:

► (e) if f ∈ (mΣ)^+ and h ∈ mΣ, then h ∈ ℒ^1(S, Σ, fμ) if and only if fh ∈ ℒ^1(S, Σ, μ), and then (fμ)(h) = μ(fh).

Proof. We need only prove this for h ≥ 0, in which case it merely says that the measures at (d) agree on S. □

Terminology, and the Radon-Nikodym theorem
If λ denotes the measure fμ on (S, Σ), we say that λ has density f relative to μ, and express this in symbols via

dλ/dμ = f.

We note that in this case we have, for F ∈ Σ,

(f) μ(F) = 0 implies that λ(F) = 0;

so that only certain measures have density relative to μ. The Radon-Nikodym theorem (proved in Chapter 14) tells us that

(g) if μ and λ are σ-finite measures on (S, Σ) such that (f) holds, then λ = fμ for some f ∈ (mΣ)^+.

Chapter 6

Expectation

6.0. Introductory remarks
We work with a probability triple (Ω, ℱ, P), and write ℒ^p for ℒ^p(Ω, ℱ, P). Recall that a random variable (RV) is an element of mℱ, that is, an ℱ-measurable function from Ω to R.

Expectation is just the integral relative to P. Jensen's inequality, which makes critical use of the fact that P(Ω) = 1, is very useful and powerful: it implies the Schwarz, Hölder, ... inequalities for general (S, Σ, μ). (See Section 6.13.) We study the geometry of the space ℒ^2(Ω, ℱ, P) in some detail, with a view to several later applications.

6.1. Definition of expectation
For a random variable X ∈ ℒ^1 = ℒ^1(Ω, ℱ, P), we define the expectation E(X) of X by

E(X) := ∫_Ω X dP = ∫_Ω X(ω) P(dω).

We also define E(X) (≤ ∞) for X ∈ (mℱ)^+. In short, E(X) = P(X).

That our present definitions agree with those in terms of probability density function (if it exists) etc. will be confirmed in Section 6.12.

6.2. Convergence theorems
Suppose that (X_n) is a sequence of RVs, that X is a RV, and that X_n → X almost surely:

P(X_n → X) = 1.

We rephrase the convergence theorems of Chapter 5 in our new notation:


►► (MON) if 0 ≤ X_n ↑ X, then E(X_n) ↑ E(X) ≤ ∞;

►► (FATOU) if X_n ≥ 0, then E(X) ≤ liminf E(X_n);

► (DOM) if |X_n(ω)| ≤ Y(ω) ∀(n, ω), where E(Y) < ∞, then

E(|X_n − X|) → 0,

so that

E(X_n) → E(X);

► (SCHEFFÉ) if E(|X_n|) → E(|X|), then

E(|X_n − X|) → 0;

►► (BDD) if for some finite constant K, |X_n(ω)| ≤ K ∀(n, ω), then

E(|X_n − X|) → 0.

The newly-added Bounded Convergence Theorem (BDD) is an immediate consequence of (DOM), obtained by taking Y(ω) = K, ∀ω; because of the fact that P(Ω) = 1, we have E(Y) < ∞. It has a direct elementary proof which we shall examine in Section 13.7; but you might well be able to provide it now.

As has been mentioned previously, uniform integrability is the key concept which gives a proper understanding of convergence theorems. We shall study this, via the elementary (BDD) result, in Chapter 13.

6.3. The notation E(X; F)
For X ∈ ℒ^1 (or (mℱ)^+) and F ∈ ℱ, we define

► E(X; F) := ∫_F X(ω) P(dω) := E(X I_F),

where, as ever, I_F denotes the indicator function of F.

Of course, this tallies with the μ(f; A) notation of Chapter 5.

6.4. Markov's inequality
Suppose that Z ∈ mℱ and that g : R → [0, ∞] is B-measurable and non-decreasing. (We know that g(Z) = g ∘ Z ∈ (mℱ)^+.) Then

► E g(Z) ≥ E(g(Z); Z ≥ c) ≥ g(c) P(Z ≥ c).


Examples: for Z ∈ (mℱ)^+, c P(Z ≥ c) ≤ E(Z) (c > 0);
for X ∈ ℒ^1, c P(|X| ≥ c) ≤ E(|X|) (c > 0).

►► Considerable strength can often be obtained by choosing the optimum θ for the given c in

► P(Y ≥ c) ≤ e^{−θc} E(e^{θY})  (θ > 0, c ∈ R).
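For a concrete instance of optimising θ, here is a numerical sketch of my own; the standard normal is an assumed example, using the known moment generating function E e^{θY} = e^{θ²/2}. Minimising e^{−θc} E e^{θY} over a grid of θ recovers the classical bound e^{−c²/2}, attained at θ = c.

```python
import math

def chernoff_bound(mgf, c, thetas):
    """Numerically optimise the exponential Markov bound:
    min over theta of exp(-theta*c) * E exp(theta*Y)."""
    return min(math.exp(-t * c) * mgf(t) for t in thetas)

# Assumed example: Y ~ N(0,1), for which E exp(theta*Y) = exp(theta^2/2);
# the optimal theta is then c, giving the bound exp(-c^2/2).
mgf_normal = lambda t: math.exp(t * t / 2.0)
c = 3.0
grid = [k / 100.0 for k in range(1, 1001)]   # theta in (0, 10]
bound = chernoff_bound(mgf_normal, c, grid)
```

Note how much stronger this is than plain Markov applied to |Y|: the bound decays like e^{−c²/2} rather than 1/c.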

6.5. Sums of non-negative RVs
We collect together some useful results.

(a) If X ∈ (mℱ)^+ and E(X) < ∞, then P(X < ∞) = 1. This is obvious.

► (b) If (Z_k) is a sequence in (mℱ)^+, then

E(Σ_k Z_k) = Σ_k E(Z_k).

This is an obvious consequence of linearity and (MON).

► (c) If (Z_k) is a sequence in (mℱ)^+ such that Σ_k E(Z_k) < ∞, then

Σ_k Z_k < ∞ (a.s.), and so Z_k → 0 (a.s.).

This is an immediate consequence of (a) and (b).

(d) The First Borel-Cantelli Lemma is a consequence of (c). For suppose that (F_k) is a sequence of events such that Σ_k P(F_k) < ∞. Take Z_k = I_{F_k}. Then E(Z_k) = P(F_k) and, by (c),

Σ_k I_{F_k} = number of events F_k which occur

is a.s. finite.

6.6. Jensen's inequality for convex functions
►► A function c : G → R, where G is an open subinterval of R, is called convex on G if its graph lies below any of its chords: for x, y ∈ G and 0 ≤ p = 1 − q ≤ 1,

c(px + qy) ≤ p c(x) + q c(y).


THEOREM. Jensen's inequality
►► Suppose that c : G → R is a convex function on an open subinterval G of R and that X is a random variable such that

E(|X|) < ∞, P(X ∈ G) = 1, E|c(X)| < ∞.

Then

E c(X) ≥ c(E(X)).

Proof. The fact that c is convex may be rewritten as follows: for u, v, w ∈ G with u < v < w, we have

Δ_{u,v} ≤ Δ_{u,w} ≤ Δ_{v,w}, where Δ_{u,v} := (c(v) − c(u))/(v − u).

It is now clear (why?!) that c is continuous on G, and that for each v in G the monotone limits

(D₋c)(v) := ↑lim_{u↑v} Δ_{u,v},  (D₊c)(v) := ↓lim_{w↓v} Δ_{v,w}

exist and satisfy (D₋c)(v) ≤ (D₊c)(v). The functions D₋c and D₊c are non-decreasing, and for every v in G and any m in [(D₋c)(v), (D₊c)(v)] we have

c(x) ≥ m(x − v) + c(v), x ∈ G.

In particular we have, almost surely, with μ := E(X),

c(X) ≥ m(X − μ) + c(μ), m ∈ [(D₋c)(μ), (D₊c)(μ)],

and Jensen's inequality follows on taking expectations. □
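As a numerical illustration (my own sketch, with c = exp and a simulated Gaussian X as assumed choices): the sample average of c(X) dominates c applied to the sample average. For the empirical distribution this is in fact exact - with c = exp it is the AM-GM inequality - and here the two sides estimate E e^X = e^{1/2} and e^{E X} = 1.

```python
import math
import random

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50000)]
c = math.exp                        # a convex function on G = R

mean_of_c = sum(c(x) for x in xs) / len(xs)   # empirical E c(X)
c_of_mean = c(sum(xs) / len(xs))              # c of empirical E X
# Jensen: E c(X) >= c(E X); for X ~ N(0,1), E e^X = e^{1/2} while e^{E X} = 1
```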

Remark. For later use, we shall need the obvious fact that

(a) c(x) = sup_{q∈G} [(D₋c)(q)(x − q) + c(q)] = sup_n (a_n x + b_n)  (x ∈ G)

for some sequences (a_n) and (b_n) in R. (Recall that c is continuous.)

6.7. Monotonicity of ℒ^p norms
►► For 1 ≤ p < ∞, we say that X ∈ ℒ^p = ℒ^p(Ω, ℱ, P) if

E(|X|^p) < ∞,

and then we define

►► ||X||_p := {E(|X|^p)}^{1/p}.

The monotonicity property referred to in the section title is the following:

► (a) if 1 ≤ p ≤ r < ∞ and Y ∈ ℒ^r, then Y ∈ ℒ^p and ||Y||_p ≤ ||Y||_r.


6.8. The Schwarz inequality
►► (a) If X, Y ∈ ℒ^2, then XY ∈ ℒ^1 and

|E(XY)| ≤ ||X||_2 ||Y||_2.

Proof. We may restrict attention to |X| and |Y|, so suppose that X, Y ≥ 0. Write X_n := X ∧ n, Y_n := Y ∧ n, so that X_n and Y_n are bounded. For any a, b ∈ R,

0 ≤ E[(aX_n + bY_n)^2] = a^2 E(X_n^2) + 2ab E(X_n Y_n) + b^2 E(Y_n^2),

and since the quadratic in a/b (or b/a, or ...) does not have two distinct real roots,

(2E(X_n Y_n))^2 ≤ 4E(X_n^2)E(Y_n^2) ≤ 4E(X^2)E(Y^2).

Now let n ↑ ∞ using (MON). □

The following is an immediate consequence of (a):

(b) if X and Y are in ℒ^2, then so is X + Y, and we have the triangle law:

||X + Y||_2 ≤ ||X||_2 + ||Y||_2.

6.9. Covariance and variance
Suppose that X, Y ∈ ℒ^2. Then, by the monotonicity of norms, X, Y ∈ ℒ^1, so that we may define

μ_X := E(X),  μ_Y := E(Y).

Since the constant functions with values μ_X, μ_Y are in ℒ^2, we see that

(a) X̃ := X − μ_X,  Ỹ := Y − μ_Y

are in ℒ^2. By the Schwarz inequality, X̃Ỹ ∈ ℒ^1, and so we may define

(b) Cov(X, Y) := E(X̃Ỹ) = E[(X − μ_X)(Y − μ_Y)].

The Schwarz inequality further justifies expanding out the product in the final [ ] bracket to yield the alternative formula:

(c) Cov(X, Y) = E(XY) − μ_X μ_Y.

As you know, the variance of X is defined by

(d) Var(X) := E[(X − μ_X)^2] = E(X^2) − μ_X^2 = Cov(X, X).
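Formulas (b) and (c) agree not merely in expectation but as an algebraic identity, which a quick numerical check makes concrete. This is my own sketch; the correlated Gaussian sample is an arbitrary assumed example (Cov(X, Y) = 0.5 by construction).

```python
import random

rng = random.Random(3)
n = 10000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]   # Cov(X, Y) = 0.5 by design

mean = lambda zs: sum(zs) / len(zs)
mx, my = mean(xs), mean(ys)

cov_centred = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])   # formula (b)
cov_product = mean([x * y for x, y in zip(xs, ys)]) - mx * my       # formula (c)
```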

Inner product, angle
For U, V ∈ ℒ^2, we define the inner (or scalar) product

(e) ⟨U, V⟩ := E(UV),

and if ||U||_2 ≠ 0 and ||V||_2 ≠ 0, we define the cosine of the angle θ between U and V by

(f) cos θ := ⟨U, V⟩ / (||U||_2 ||V||_2).


Quotienting (or lack of it!): L²
Our space ℒ^2 does not quite satisfy the requirements for an inner-product space, because the best we can say is that (see (5.2,b))

||U||_2 = 0 if and only if U = 0 almost surely.

In functional analysis, we find an elegant solution by defining an equivalence relation

U ∼ V if and only if U = V almost surely,

and defining L² as 'ℒ^2 quotiented out by this equivalence relation'. Of course, one needs to check that if, for i = 1, 2, we have c_i ∈ R and U_i, V_i ∈ ℒ^2 with U_i