theory_of_probability_Zitcovic.pdf


  • 7/21/2019 theory_of_probability_Zitcovic.pdf


    Theory of Probability
    Measure theory, classical probability and stochastic analysis

    Lecture Notes

    by Gordan Žitković

    Department of Mathematics, The University of Texas at Austin


    Contents

    I Theory of Probability I

    1 Measurable spaces
    1.1 Families of Sets
    1.2 Measurable mappings
    1.3 Products of measurable spaces
    1.4 Real-valued measurable functions
    1.5 Additional Problems

    2 Measures
    2.1 Measure spaces
    2.2 Extensions of measures and the coin-toss space
    2.3 The Lebesgue measure
    2.4 Signed measures
    2.5 Additional Problems

    3 Lebesgue Integration
    3.1 The construction of the integral
    3.2 First properties of the integral
    3.3 Null sets
    3.4 Additional Problems

    4 Lebesgue Spaces and Inequalities
    4.1 Lebesgue spaces
    4.2 Inequalities
    4.3 Additional problems

    5 Theorems of Fubini-Tonelli and Radon-Nikodym
    5.1 Products of measure spaces
    5.2 The Radon-Nikodym Theorem
    5.3 Additional Problems


    6 Basic Notions of Probability
    6.1 Probability spaces
    6.2 Distributions of random variables, vectors and elements
    6.3 Independence
    6.4 Sums of independent random variables and convolution
    6.5 Do independent random variables exist?
    6.6 Additional Problems

    7 Weak Convergence and Characteristic Functions
    7.1 Weak convergence
    7.2 Characteristic functions
    7.3 Tail behavior
    7.4 The continuity theorem
    7.5 Additional Problems

    8 Classical Limit Theorems
    8.1 The weak law of large numbers
    8.2 An iid-central limit theorem
    8.3 The Lindeberg-Feller Theorem
    8.4 Additional Problems

    9 Conditional Expectation
    9.1 The definition and existence of conditional expectation
    9.2 Properties
    9.3 Regular conditional distributions
    9.4 Additional Problems

    10 Discrete Martingales
    10.1 Discrete-time filtrations and stochastic processes
    10.2 Martingales
    10.3 Predictability and martingale transforms
    10.4 Stopping times
    10.5 Convergence of martingales
    10.6 Additional problems

    11 Uniform Integrability
    11.1 Uniform integrability
    11.2 First properties of uniformly-integrable martingales
    11.3 Backward martingales
    11.4 Applications of backward martingales
    11.5 Exchangeability and de Finetti's theorem (*)
    11.6 Additional Problems

    Index


    Preface

    These notes were written (and are still being heavily edited) to help students with the graduate courses Theory of Probability I and II offered by the Department of Mathematics, University of Texas at Austin.

    Statements, proofs, or entire sections marked by an asterisk (∗) are not a part of the syllabus and can be skipped when preparing for midterm, final and prelim exams.

    GORDAN ŽITKOVIĆ
    Austin, TX
    December 2010.


    Part I

    Theory of Probability I


    Chapter 1

    Measurable spaces

    Before we delve into measure theory, let us fix some notation and terminology.

    $\subseteq$ denotes a subset (not necessarily proper).

    A set $A$ is said to be countable if there exists an injection (one-to-one mapping) from $A$ into $\mathbb{N}$. Note that finite sets are also countable. Sets which are not countable are called uncountable.

    For two functions $f : B \to C$, $g : A \to B$, the composition $f \circ g : A \to C$ of $f$ and $g$ is given by $(f \circ g)(x) = f(g(x))$, for all $x \in A$.

    $\{A_n\}_{n \in \mathbb{N}}$ denotes a sequence. More generally, $(A_\gamma)_{\gamma \in \Gamma}$ denotes a collection indexed by the set $\Gamma$.

    1.1 Families of Sets

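    Definition 1.1 (Order properties) A (countable) family $\{A_n\}_{n \in \mathbb{N}}$ of subsets of a non-empty set $S$ is said to be

    1. increasing if $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$,
    2. decreasing if $A_n \supseteq A_{n+1}$ for all $n \in \mathbb{N}$,
    3. pairwise disjoint if $A_n \cap A_m = \emptyset$ for $m \neq n$,
    4. a partition of $S$ if $\{A_n\}_{n \in \mathbb{N}}$ is pairwise disjoint and $\cup_n A_n = S$.

    We use the notation $A_n \nearrow A$ to denote that the sequence $\{A_n\}_{n \in \mathbb{N}}$ is increasing and $A = \cup_n A_n$. Similarly, $A_n \searrow A$ means that $\{A_n\}_{n \in \mathbb{N}}$ is decreasing and $A = \cap_n A_n$.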


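    Here is a list of some properties that a family $\mathcal{S}$ of subsets of a nonempty set $S$ can have:

    (A1) $\emptyset \in \mathcal{S}$,
    (A2) $S \in \mathcal{S}$,
    (A3) $A \in \mathcal{S} \Rightarrow A^c \in \mathcal{S}$,
    (A4) $A, B \in \mathcal{S} \Rightarrow A \cup B \in \mathcal{S}$,
    (A5) $A, B \in \mathcal{S}$, $A \subseteq B \Rightarrow B \setminus A \in \mathcal{S}$,
    (A6) $A, B \in \mathcal{S} \Rightarrow A \cap B \in \mathcal{S}$,
    (A7) $A_n \in \mathcal{S}$ for all $n \in \mathbb{N} \Rightarrow \cup_n A_n \in \mathcal{S}$,
    (A8) $A_n \in \mathcal{S}$ for all $n \in \mathbb{N}$ and $A_n \nearrow A$ implies $A \in \mathcal{S}$,
    (A9) $A_n \in \mathcal{S}$ for all $n \in \mathbb{N}$ and $\{A_n\}_{n \in \mathbb{N}}$ is pairwise disjoint implies $\cup_n A_n \in \mathcal{S}$.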

    Definition 1.2 (Families of sets) A family $\mathcal{S}$ of subsets of a non-empty set $S$ is called

    1. an algebra if it satisfies (A1), (A3) and (A4),
    2. a $\sigma$-algebra if it satisfies (A1), (A3) and (A7),
    3. a $\pi$-system if it satisfies (A6),
    4. a $\lambda$-system if it satisfies (A2), (A5) and (A8).

    Problem 1.3 Show that:

    1. Every $\sigma$-algebra is an algebra.
    2. Each algebra is a $\pi$-system and each $\sigma$-algebra is an algebra and a $\lambda$-system.
    3. A family $\mathcal{S}$ is a $\sigma$-algebra if and only if it satisfies (A1), (A3), (A6) and (A9).
    4. A $\lambda$-system which is a $\pi$-system is also a $\sigma$-algebra.
    5. There are $\pi$-systems which are not algebras.
    6. There are algebras which are not $\sigma$-algebras. (Hint: Pick all finite subsets of an infinite set. That is not an algebra yet, but sets can be added to it so as to become an algebra which is not a $\sigma$-algebra.)
    7. There are $\lambda$-systems which are not $\pi$-systems.
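On a finite set the properties (A1)-(A9) can be checked mechanically, which is a convenient way to experiment with Definition 1.2. The following is a minimal sketch, not part of the original notes; the helper names (`powerset`, `is_algebra`, `is_pi_system`, `is_lambda_system`) and the two example families are made up for illustration. It relies on the fact that on a finite set every increasing sequence of subsets is eventually constant, so (A8) holds automatically.

```python
from itertools import combinations

def powerset(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_algebra(F, S):
    """(A1), (A3), (A4): empty set, complements, pairwise unions."""
    S = frozenset(S)
    return (frozenset() in F
            and all(S - A in F for A in F)
            and all(A | B in F for A in F for B in F))

def is_pi_system(F):
    """(A6): closed under pairwise intersections."""
    return all(A & B in F for A in F for B in F)

def is_lambda_system(F, S):
    """(A2) and (A5); (A8) is automatic on a finite set."""
    S = frozenset(S)
    return (S in F
            and all(B - A in F for A in F for B in F if A <= B))

S = {1, 2, 3, 4}
# Subsets with an even number of elements.
even = {A for A in powerset(S) if len(A) % 2 == 0}
# A nested (hypothetical) family of three sets.
chain = {frozenset(), frozenset({1}), frozenset({1, 2})}

print(is_lambda_system(even, S), is_pi_system(even))  # True False
print(is_pi_system(chain), is_algebra(chain, S))      # True False
```

The family `even` fails (A6) because {1, 2} and {2, 3} intersect in a singleton, while `chain` fails (A3) because it does not contain the complement of the empty set.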

    Definition 1.4 (Generated $\sigma$-algebras) For a family $\mathcal{A}$ of subsets of a non-empty set $S$, the intersection of all $\sigma$-algebras on $S$ that contain $\mathcal{A}$ is denoted by $\sigma(\mathcal{A})$ and is called the $\sigma$-algebra generated by $\mathcal{A}$.


    Remark 1.5 Since the family $2^S$ of all subsets of $S$ is a $\sigma$-algebra, the concept of a generated $\sigma$-algebra is well defined: there is always at least one $\sigma$-algebra containing $\mathcal{A}$, namely $2^S$. $\sigma(\mathcal{A})$ is itself a $\sigma$-algebra (why?) and it is the smallest (in the sense of set inclusion) $\sigma$-algebra that contains $\mathcal{A}$. In the same vein, one can define the algebra, the $\pi$-system and the $\lambda$-system generated by $\mathcal{A}$. The only important property is that intersections of $\sigma$-algebras, $\pi$-systems and $\lambda$-systems are themselves $\sigma$-algebras, $\pi$-systems and $\lambda$-systems.

    Problem 1.6 Show, by means of an example, that the union of a family of algebras (on the same $S$) does not need to be an algebra. Repeat for $\sigma$-algebras, $\pi$-systems and $\lambda$-systems.
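For a finite $S$ the generated $\sigma$-algebra can be computed by brute force: start from the given family and keep adding complements and unions until nothing new appears. The result is a $\sigma$-algebra containing the family, and any $\sigma$-algebra containing the family must contain every set added along the way, so it agrees with the intersection in Definition 1.4. The sketch below is illustrative only; the function name `generated_sigma_algebra` is an assumption, and the approach does not extend to infinite $S$.

```python
def generated_sigma_algebra(A, S):
    """Close A under complements and unions; valid only for finite S."""
    S = frozenset(S)
    F = {frozenset(), S} | {frozenset(a) for a in A}
    while True:
        new = {S - X for X in F} | {X | Y for X in F for Y in F}
        if new <= F:          # nothing new: F is closed
            return F
        F |= new

S = {1, 2, 3}
F1 = generated_sigma_algebra([{1}], S)   # {}, {1}, {2,3}, S
F2 = generated_sigma_algebra([{2}], S)   # {}, {2}, {1,3}, S
union = F1 | F2

print(len(F1), len(F2))                          # 4 4
print(frozenset({1, 2}) in union)                # False
print(len(generated_sigma_algebra(union, S)))    # 8
```

The union of the two families contains {1} and {2} but not {1, 2}, so it is not closed under unions; closing it up produces all eight subsets of $S$.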

    Definition 1.7 (Topology) A topology on a set $S$ is a family $\tau$ of subsets of $S$ which contains $\emptyset$ and $S$ and is closed under finite intersections and arbitrary (countable or uncountable!) unions. The elements of $\tau$ are often called the open sets. A set $S$ on which a topology is chosen (i.e., a pair $(S, \tau)$ of a set and a topology on it) is called a topological space.

    Remark 1.8 Almost all topologies in these notes will be generated by a metric, i.e., a set $A \subseteq S$ will be open if and only if for each $x \in A$ there exists $\varepsilon > 0$ such that $\{y \in S : d(x, y) < \varepsilon\} \subseteq A$. The prime example is $\mathbb{R}$, where a set is declared open if it can be represented as a union of open intervals.

    Definition 1.9 (Borel $\sigma$-algebras) If $(S, \tau)$ is a topological space, then the $\sigma$-algebra $\sigma(\tau)$, generated by all open sets, is called the Borel $\sigma$-algebra on $(S, \tau)$.

    Remark 1.10 We often abuse terminology and call $S$ itself a topological space, if the topology $\tau$ on it is clear from the context. In the same vein, we often speak of the Borel $\sigma$-algebra on a set $S$.

    Example 1.11 Some important $\sigma$-algebras. Let $S$ be a non-empty set:

    1. The set $\mathcal{S} = 2^S$ (also denoted by $\mathcal{P}(S)$) consisting of all subsets of $S$ is a $\sigma$-algebra.
    2. At the other extreme, the family $\mathcal{S} = \{\emptyset, S\}$ is the smallest $\sigma$-algebra on $S$. It is called the trivial $\sigma$-algebra on $S$.
    3. The set $\mathcal{S}$ of all subsets of $S$ which are either countable or whose complements are countable is a $\sigma$-algebra. It is called the countable-cocountable $\sigma$-algebra and is the smallest $\sigma$-algebra on $S$ which contains all singletons, i.e., for which $\{x\} \in \mathcal{S}$ for all $x \in S$.
    4. The Borel $\sigma$-algebra on $\mathbb{R}$ (generated by all open sets as defined by the Euclidean metric on $\mathbb{R}$) is denoted by $\mathcal{B}(\mathbb{R})$.

    Problem 1.12 Show that $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{A})$, for any of the following choices of the family $\mathcal{A}$:

    1. $\mathcal{A}$ = {all open subsets of $\mathbb{R}$},
    2. $\mathcal{A}$ = {all closed subsets of $\mathbb{R}$},


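    3. $\mathcal{A}$ = {all open intervals in $\mathbb{R}$},
    4. $\mathcal{A}$ = {all closed intervals in $\mathbb{R}$},
    5. $\mathcal{A}$ = {all left-closed right-open intervals in $\mathbb{R}$},
    6. $\mathcal{A}$ = {all left-open right-closed intervals in $\mathbb{R}$},
    7. $\mathcal{A}$ = {all open intervals in $\mathbb{R}$ with rational end-points}, and
    8. $\mathcal{A}$ = {all intervals of the form $(-\infty, r]$, where $r$ is rational}.

    (Hint: An arbitrary open interval $I = (a, b)$ in $\mathbb{R}$ can be written as $I = \cup_{n \in \mathbb{N}} [a + n^{-1}, b - n^{-1}]$.)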

    1.2 Measurable mappings

    Definition 1.13 (Measurable spaces) A pair $(S, \mathcal{S})$ consisting of a non-empty set $S$ and a $\sigma$-algebra $\mathcal{S}$ of its subsets is called a measurable space.

    If $(S, \mathcal{S})$ is a measurable space, and $A \in \mathcal{S}$, we often say that $A$ is measurable in $\mathcal{S}$.

    Definition 1.14 (Pull-backs and push-forwards) For a function $f : S \to T$ and subsets $A \subseteq S$, $B \subseteq T$, we define the

    1. push-forward $f(A)$ of $A \subseteq S$ as
       $f(A) = \{f(x) : x \in A\} \subseteq T$,
    2. pull-back $f^{-1}(B)$ of $B \subseteq T$ as
       $f^{-1}(B) = \{x \in S : f(x) \in B\} \subseteq S$.

    It is often the case that the notation is abused and the pull-back of $B$ under $f$ is denoted simply by $\{f \in B\}$. This notation presupposes, however, that the domain of $f$ is clear from the context.

    Problem 1.15 Show that the pull-back operation preserves the elementary set operations, i.e., for $f : S \to T$, and $B, \{B_n\}_{n \in \mathbb{N}} \subseteq T$,

    1. $f^{-1}(T) = S$, $f^{-1}(\emptyset) = \emptyset$,
    2. $f^{-1}(\cup_n B_n) = \cup_n f^{-1}(B_n)$,
    3. $f^{-1}(\cap_n B_n) = \cap_n f^{-1}(B_n)$, and
    4. $f^{-1}(B^c) = [f^{-1}(B)]^c$.


    Give examples showing that the push-forward analogues of the statements (1), (3) and (4) above are not true.

    (Note: The assumption that the families in (2) and (3) above are countable is not necessary. Uncountable unions or intersections commute with the pull-back, too.)
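The contrast between pull-backs and push-forwards is easy to see on a small finite example. The sketch below is not from the notes; the sets `S`, `T` and the squaring map `f` are chosen here only for illustration (the map is neither injective nor onto `T`), and `pushforward` and `pullback` are hypothetical helper names.

```python
def pushforward(f, A):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def pullback(f, B, S):
    """f^{-1}(B) = {x in S : f(x) in B}."""
    return {x for x in S if f(x) in B}

S = {-2, -1, 0, 1, 2}
T = {0, 1, 4, 9}

def f(x):
    return x * x

# Pull-backs commute with intersections and complements.
B1, B2 = {0, 1}, {1, 4}
assert pullback(f, B1 & B2, S) == pullback(f, B1, S) & pullback(f, B2, S)
assert pullback(f, T - B1, S) == S - pullback(f, B1, S)

# Push-forwards do not.
A1, A2 = {-1, -2}, {1, 2}
print(sorted(pushforward(f, A1 & A2)))                    # []
print(sorted(pushforward(f, A1) & pushforward(f, A2)))    # [1, 4]
print(pushforward(f, S) == T)                             # False
print(pushforward(f, S - A1) == T - pushforward(f, A1))   # False
```

The failures come from exactly two sources: `f` identifies distinct points (so disjoint sets can have overlapping images), and `f` misses the point 9 of `T`.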

    Definition 1.16 (Measurability) A mapping $f : S \to T$, where $(S, \mathcal{S})$ and $(T, \mathcal{T})$ are measurable spaces, is said to be $(\mathcal{S}, \mathcal{T})$-measurable if $f^{-1}(B) \in \mathcal{S}$ for each $B \in \mathcal{T}$.

    Remark 1.17 When $T = \mathbb{R}$, we tacitly assume that the Borel $\sigma$-algebra is defined on $T$, and we simply call $f$ measurable. In particular, a function $f : \mathbb{R} \to \mathbb{R}$, which is measurable with respect to the pair of the Borel $\sigma$-algebras is often called a Borel function.

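    Proposition 1.18 (A measurability criterion) Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be two measurable spaces, and let $\mathcal{C}$ be a subset of $\mathcal{T}$ such that $\mathcal{T} = \sigma(\mathcal{C})$. If $f : S \to T$ is a mapping with the property that $f^{-1}(C) \in \mathcal{S}$, for any $C \in \mathcal{C}$, then $f$ is $(\mathcal{S}, \mathcal{T})$-measurable.

    PROOF Let $\mathcal{D}$ be the family of subsets of $T$ defined by
    $$\mathcal{D} = \{B \subseteq T : f^{-1}(B) \in \mathcal{S}\}.$$
    By the assumptions of the proposition, we have $\mathcal{C} \subseteq \mathcal{D}$. On the other hand, by Problem 1.15, the family $\mathcal{D}$ has the structure of a $\sigma$-algebra, i.e., $\mathcal{D}$ is a $\sigma$-algebra that contains $\mathcal{C}$. Remembering that $\mathcal{T} = \sigma(\mathcal{C})$ is the smallest $\sigma$-algebra that contains $\mathcal{C}$, we conclude that $\mathcal{T} \subseteq \mathcal{D}$. Consequently, $f^{-1}(B) \in \mathcal{S}$ for all $B \in \mathcal{T}$.

    Problem 1.19 Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be measurable spaces.

    1. Suppose that $S$ and $T$ are topological spaces, and that $\mathcal{S}$ and $\mathcal{T}$ are the corresponding Borel $\sigma$-algebras. Show that each continuous function $f : S \to T$ is $(\mathcal{S}, \mathcal{T})$-measurable. (Hint: Remember that the function $f$ is continuous if the pull-backs of open sets are open.)
    2. Let $f : S \to \mathbb{R}$ be a function. Show that $f$ is measurable if and only if $\{x \in S : f(x) \le q\} \in \mathcal{S}$, for all rational $q$.
    3. Find an example of $(S, \mathcal{S})$, $(T, \mathcal{T})$ and a measurable function $f : S \to T$ such that $f(A) = \{f(x) : x \in A\} \notin \mathcal{T}$ for all nonempty $A \in \mathcal{S}$.

    Proposition 1.20 (Compositions of measurable maps) Let $(S, \mathcal{S})$, $(T, \mathcal{T})$ and $(U, \mathcal{U})$ be measurable spaces, and let $f : S \to T$ and $g : T \to U$ be measurable functions. Then the composition $h = g \circ f : S \to U$, given by $h(x) = g(f(x))$, is $(\mathcal{S}, \mathcal{U})$-measurable.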


    PROOF It is enough to observe that $h^{-1}(B) = f^{-1}(g^{-1}(B))$, for any $B \subseteq U$.

    Corollary 1.21 (Compositions with continuous maps) Let $(S, \mathcal{S})$ be a measurable space, $T$ be a topological space and $\mathcal{T}$ the Borel $\sigma$-algebra on $T$. Let $g : T \to \mathbb{R}$ be a continuous function. Then the map $g \circ f : S \to \mathbb{R}$ is measurable for each measurable function $f : S \to T$.

    Definition 1.22 (Generation by a function) Let $f : S \to T$ be a map from the set $S$ into a measurable space $(T, \mathcal{T})$. The $\sigma$-algebra generated by $f$, denoted by $\sigma(f)$, is the intersection of all $\sigma$-algebras $\mathcal{S}$ on $S$ which make $f$ $(\mathcal{S}, \mathcal{T})$-measurable.

    The letter $\Gamma$ will typically be used to denote an abstract index set. We only assume that it is nonempty, but make no other assumptions about its cardinality.

    Definition 1.23 (Generation by several functions) Let $(f_\gamma)_{\gamma \in \Gamma}$ be a family of maps from a set $S$ into a measurable space $(T, \mathcal{T})$. The $\sigma$-algebra generated by $(f_\gamma)_{\gamma \in \Gamma}$, denoted by $\sigma\big((f_\gamma)_{\gamma \in \Gamma}\big)$, is the intersection of all $\sigma$-algebras on $S$ which make each $f_\gamma$, $\gamma \in \Gamma$, measurable.

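    Problem 1.24 In the setting of Definitions 1.22 and 1.23, show that

    1. for $f : S \to T$, we have
       $$\sigma(f) = \{f^{-1}(B) : B \in \mathcal{T}\}. \qquad (1.1)$$
    2. for a family $f_\gamma : S \to T$, $\gamma \in \Gamma$, we have
       $$\sigma\big((f_\gamma)_{\gamma \in \Gamma}\big) = \sigma\Big(\bigcup_{\gamma \in \Gamma} f_\gamma^{-1}(\mathcal{T})\Big), \qquad (1.2)$$
       where $f_\gamma^{-1}(\mathcal{T}) = \{f_\gamma^{-1}(B) : B \in \mathcal{T}\}$.

    (Note: Note how the right-hand sides differ in (1.1) and (1.2).)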
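The difference between (1.1) and (1.2) shows up already on a six-point set. In the sketch below (an illustration, not the author's construction) the target spaces carry their full power sets, `pullback_family` computes the right-hand side of (1.1), and `close` is a hypothetical helper that closes a family under complements and unions, which is enough to produce the generated $\sigma$-algebra when $S$ is finite. The two maps `parity` and `third` are assumptions chosen for the example.

```python
from itertools import combinations

def powerset(T):
    items = list(T)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def pullback_family(f, S, T):
    """{f^{-1}(B) : B subset of T}, the right-hand side of (1.1)."""
    return {frozenset(x for x in S if f(x) in B) for B in powerset(T)}

def close(F, S):
    """Close F under complements and unions (finite S only)."""
    S, F = frozenset(S), set(F)
    while True:
        new = {S - X for X in F} | {X | Y for X in F for Y in F}
        if new <= F:
            return F
        F |= new

S = range(6)

def parity(x):
    return x % 2

def third(x):
    return x % 3

sf = pullback_family(parity, S, {0, 1})
sg = pullback_family(third, S, {0, 1, 2})

print(len(sf), len(sg))                       # 4 8
print(close(sf, S) == sf)                     # True
print(len(sf | sg), len(close(sf | sg, S)))   # 10 64
```

For a single map the family of pull-backs is already closed, so no further generation is needed. For two maps the union of the pull-back families has only 10 members and is not closed; generating from it yields all 64 subsets, because the two residues together determine the point.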

    1.3 Products of measurable spaces

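    Definition 1.25 (Products, choice functions) Let $(S_\gamma)_{\gamma \in \Gamma}$ be a family of sets, parametrized by some (possibly uncountable) index set $\Gamma$. The product $\prod_{\gamma \in \Gamma} S_\gamma$ is the set of all functions $s : \Gamma \to \cup_{\gamma \in \Gamma} S_\gamma$ (called choice functions) with the property that $s(\gamma) \in S_\gamma$.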


    Remark 1.26

    1. When $\Gamma$ is finite, each function $s : \Gamma \to \cup_{\gamma \in \Gamma} S_\gamma$ can be identified with an ordered tuple $(s(\gamma_1), \dots, s(\gamma_n))$, where $n$ is the cardinality (number of elements) of $\Gamma$, and $\gamma_1, \dots, \gamma_n$ is some ordering of its elements. With this identification, it is clear that our definition of a product coincides with the well-known definition in the finite case.

    2. The celebrated Axiom of Choice in set theory postulates that no matter what the family $(S_\gamma)_{\gamma \in \Gamma}$ is, there exists at least one choice function. In other words, the axiom of choice simply asserts that products of sets are non-empty.

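    Definition 1.27 (Natural projections) For $\gamma_0 \in \Gamma$, the function $\pi_{\gamma_0} : \prod_{\gamma \in \Gamma} S_\gamma \to S_{\gamma_0}$ defined by
    $$\pi_{\gamma_0}(s) = s(\gamma_0), \text{ for } s \in \prod_{\gamma \in \Gamma} S_\gamma,$$
    is called the (natural) projection onto the coordinate $\gamma_0$.

    Definition 1.28 (Products of measurable spaces) Let $\{(S_\gamma, \mathcal{S}_\gamma)\}_{\gamma \in \Gamma}$ be a family of measurable spaces. The product $\otimes_{\gamma \in \Gamma} (S_\gamma, \mathcal{S}_\gamma)$ is a measurable space $(\prod_{\gamma \in \Gamma} S_\gamma, \otimes_{\gamma \in \Gamma} \mathcal{S}_\gamma)$, where $\otimes_{\gamma \in \Gamma} \mathcal{S}_\gamma$ is the smallest $\sigma$-algebra that makes all natural projections $(\pi_\gamma)_{\gamma \in \Gamma}$ measurable.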

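    Example 1.29 When $\Gamma$ is finite, the above definition can be made more intuitive. Suppose, just for simplicity, that $\Gamma = \{1, 2\}$, so that $(S_1, \mathcal{S}_1) \otimes (S_2, \mathcal{S}_2)$ is a measurable space of the form $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$, where $\mathcal{S}_1 \otimes \mathcal{S}_2$ is the smallest $\sigma$-algebra on $S_1 \times S_2$ which makes both $\pi_1$ and $\pi_2$ measurable. The pull-backs under $\pi_1$ of sets in $\mathcal{S}_1$ are given by
    $$\pi_1^{-1}(B_1) = \{(x, y) \in S_1 \times S_2 : x \in B_1\} = B_1 \times S_2, \text{ for } B_1 \in \mathcal{S}_1.$$
    Similarly
    $$\pi_2^{-1}(B_2) = S_1 \times B_2, \text{ for } B_2 \in \mathcal{S}_2.$$
    Therefore, by Problem 1.24,
    $$\mathcal{S}_1 \otimes \mathcal{S}_2 = \sigma\big(\{B_1 \times S_2, S_1 \times B_2 : B_1 \in \mathcal{S}_1, B_2 \in \mathcal{S}_2\}\big).$$
    Equivalently (why?)
    $$\mathcal{S}_1 \otimes \mathcal{S}_2 = \sigma\big(\{B_1 \times B_2 : B_1 \in \mathcal{S}_1, B_2 \in \mathcal{S}_2\}\big).$$
    In a completely analogous fashion, we can show that, for finitely many measurable spaces $(S_1, \mathcal{S}_1), \dots, (S_n, \mathcal{S}_n)$, we have
    $$\bigotimes_{i=1}^n \mathcal{S}_i = \sigma\big(\{B_1 \times B_2 \times \dots \times B_n : B_1 \in \mathcal{S}_1, B_2 \in \mathcal{S}_2, \dots, B_n \in \mathcal{S}_n\}\big).$$
    The same goes for countable products. Uncountable products, however, behave very differently.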
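The two descriptions of $\mathcal{S}_1 \otimes \mathcal{S}_2$ in the example above can be compared directly when both spaces are finite. The following sketch uses made-up spaces (`S1`, `S2`) and $\sigma$-algebras (`F1`, `F2`); `close` is a hypothetical helper that closes a family under complements and unions, which is all that is needed on a finite set.

```python
from itertools import product

def close(F, S):
    """Close F under complements and unions (finite S only)."""
    S, F = frozenset(S), set(F)
    while True:
        new = {S - X for X in F} | {X | Y for X in F for Y in F}
        if new <= F:
            return F
        F |= new

S1, S2 = {0, 1}, {"a", "b", "c"}
F1 = [set(), {0}, {1}, {0, 1}]                       # power set of S1
F2 = [set(), {"a"}, {"b", "c"}, {"a", "b", "c"}]     # a sigma-algebra on S2
S = frozenset(product(S1, S2))

# Pull-backs of the two projections: B1 x S2 and S1 x B2.
strips = ({frozenset(product(B1, S2)) for B1 in F1}
          | {frozenset(product(S1, B2)) for B2 in F2})
# Measurable rectangles B1 x B2.
rectangles = {frozenset(product(B1, B2)) for B1 in F1 for B2 in F2}

print(len(rectangles))                             # 10
print(close(strips, S) == close(rectangles, S))    # True
print(len(close(rectangles, S)))                   # 16
```

There are only 10 distinct rectangles but 16 sets in the product $\sigma$-algebra, so the rectangles generate the product $\sigma$-algebra without being a $\sigma$-algebra themselves.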


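    Problem 1.30 We know that the Borel $\sigma$-algebra (based on the usual Euclidean topology) can be constructed on each $\mathbb{R}^n$. A $\sigma$-algebra on $\mathbb{R}^n$ (for $n > 1$) can also be constructed as a product $\sigma$-algebra $\otimes_{i=1}^n \mathcal{B}(\mathbb{R})$. A third possibility is to consider the mixed case where $1 < m < n$ is picked and the $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m) \otimes \mathcal{B}(\mathbb{R}^{n-m})$ is constructed on $\mathbb{R}^n$ (which is now interpreted as a product of $\mathbb{R}^m$ and $\mathbb{R}^{n-m}$). Show that we get the same $\sigma$-algebra in all three cases.

    Problem 1.31 Let $(P, \mathcal{P})$, $\{(S_\gamma, \mathcal{S}_\gamma)\}_{\gamma \in \Gamma}$ be measurable spaces and set $S = \prod_{\gamma \in \Gamma} S_\gamma$, $\mathcal{S} = \otimes_{\gamma \in \Gamma} \mathcal{S}_\gamma$. Prove that a map $f : P \to S$ is $(\mathcal{P}, \mathcal{S})$-measurable if and only if the composition $\pi_\gamma \circ f : P \to S_\gamma$ is $(\mathcal{P}, \mathcal{S}_\gamma)$-measurable for each $\gamma \in \Gamma$. (Note: Loosely speaking, this result states that a vector-valued mapping is measurable if and only if all of its components are measurable.)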

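    Definition 1.32 (Cylinder sets) Let $\{(S_\gamma, \mathcal{S}_\gamma)\}_{\gamma \in \Gamma}$ be a family of measurable spaces, and let $(\prod_{\gamma \in \Gamma} S_\gamma, \otimes_{\gamma \in \Gamma} \mathcal{S}_\gamma)$ be its product. A subset $C \subseteq \prod_{\gamma \in \Gamma} S_\gamma$ is called a cylinder set if there exist a finite subset $\{\gamma_1, \dots, \gamma_n\}$ of $\Gamma$, as well as a measurable set $B \in \mathcal{S}_{\gamma_1} \otimes \mathcal{S}_{\gamma_2} \otimes \dots \otimes \mathcal{S}_{\gamma_n}$ such that
    $$C = \{s \in \prod_{\gamma \in \Gamma} S_\gamma : (s(\gamma_1), \dots, s(\gamma_n)) \in B\}.$$
    A cylinder set for which the set $B$ can be chosen of the form $B = B_1 \times \dots \times B_n$, for some $B_1 \in \mathcal{S}_{\gamma_1}, \dots, B_n \in \mathcal{S}_{\gamma_n}$, is called a product cylinder set. In that case
    $$C = \{s \in \prod_{\gamma \in \Gamma} S_\gamma : s(\gamma_1) \in B_1, s(\gamma_2) \in B_2, \dots, s(\gamma_n) \in B_n\}.$$

    Problem 1.33

    1. Show that the family of product cylinder sets generates the product $\sigma$-algebra.
    2. Show that (not-necessarily-product) cylinders are measurable in the product $\sigma$-algebra.
    3. Which of the 4 families of sets from Definition 1.2 does the collection of all product cylinders belong to in general? How about (not-necessarily-product) cylinders?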

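    Example 1.34 The following example will play a major role in probability theory. Hence the name coin-toss space. Here $\Gamma = \mathbb{N}$ and for $i \in \mathbb{N}$, $(S_i, \mathcal{S}_i)$ is the discrete two-element space $S_i = \{-1, 1\}$, $\mathcal{S}_i = 2^{S_1}$. The product $\prod_{i \in \mathbb{N}} S_i = \{-1, 1\}^{\mathbb{N}}$ can be identified with the set of all sequences $s = (s_1, s_2, \dots)$, where $s_i \in \{-1, 1\}$, $i \in \mathbb{N}$. For each cylinder set $C$, there exists (why?) $n \in \mathbb{N}$ and a subset $B$ of $\{-1, 1\}^n$ such that
    $$C = \{s = (s_1, \dots, s_n, s_{n+1}, \dots) \in \{-1, 1\}^{\mathbb{N}} : (s_1, \dots, s_n) \in B\}.$$
    The product cylinders are even simpler: they are always of the form $C = \{-1, 1\}^{\mathbb{N}}$ or $C = C_{n_1, \dots, n_k; b_1, \dots, b_k}$, where
    $$C_{n_1, \dots, n_k; b_1, \dots, b_k} = \{s = (s_1, s_2, \dots) \in \{-1, 1\}^{\mathbb{N}} : s_{n_1} = b_1, \dots, s_{n_k} = b_k\}, \qquad (1.3)$$
    for some $k \in \mathbb{N}$, $1 \le n_1 < n_2 < \dots < n_k \in \mathbb{N}$ and $b_1, b_2, \dots, b_k \in \{-1, 1\}$.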


    We know that the $\sigma$-algebra $\mathcal{S} = \otimes_{i \in \mathbb{N}} \mathcal{S}_i$ is generated by all projections $\pi_i : \{-1, 1\}^{\mathbb{N}} \to \{-1, 1\}$, $i \in \mathbb{N}$, where $\pi_i(s) = s_i$. Equivalently, by Problem 1.33, $\mathcal{S}$ is generated by the collection of all cylinder sets.

    Problem 1.35 One can obtain the product $\sigma$-algebra $\mathcal{S}$ on $\{-1, 1\}^{\mathbb{N}}$ as the Borel $\sigma$-algebra corresponding to a particular topology which makes $\{-1, 1\}^{\mathbb{N}}$ compact. Here is how. Start by defining a mapping $d : \{-1, 1\}^{\mathbb{N}} \times \{-1, 1\}^{\mathbb{N}} \to [0, \infty)$ by
    $$d(s^1, s^2) = 2^{-i(s^1, s^2)}, \text{ where } i(s^1, s^2) = \inf\{i \in \mathbb{N} : s^1_i \neq s^2_i\}, \qquad (1.4)$$
    for $s^j = (s^j_1, s^j_2, \dots)$, $j = 1, 2$.

    1. Show that $d$ is a metric on $\{-1, 1\}^{\mathbb{N}}$.
    2. Show that $\{-1, 1\}^{\mathbb{N}}$ is compact under $d$. (Hint: Use the diagonal argument.)
    3. Show that each cylinder of $\{-1, 1\}^{\mathbb{N}}$ is both open and closed under $d$.
    4. Show that each open ball is a cylinder.
    5. Show that $\{-1, 1\}^{\mathbb{N}}$ is separable, i.e., it admits a countable dense subset.
    6. Conclude that $\mathcal{S}$ coincides with the Borel $\sigma$-algebra on $\{-1, 1\}^{\mathbb{N}}$ under the metric $d$.
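The metric in (1.4) can be explored numerically on truncated sequences. The sketch below is only a finite sanity check and not a proof of anything in Problem 1.35: it represents points of the coin-toss space by their first `N` coordinates (an assumption made here so that the space is finite), uses the convention that the infimum of the empty set is infinite so that $d(s, s) = 0$, and the names `first_difference`, `d`, `ball` and `cylinder` are illustrative.

```python
from itertools import product

def first_difference(s, t):
    """Smallest 1-based index i with s_i != t_i, or None if s == t."""
    for i, (a, b) in enumerate(zip(s, t), start=1):
        if a != b:
            return i
    return None

def d(s, t):
    """d(s, t) = 2^{-i(s, t)}, with d(s, s) = 0."""
    i = first_difference(s, t)
    return 0.0 if i is None else 2.0 ** (-i)

N = 4
points = list(product((-1, 1), repeat=N))

# Strong (ultrametric) form of the triangle inequality on all triples.
assert all(d(x, z) <= max(d(x, y), d(y, z))
           for x in points for y in points for z in points)

# The open ball of radius 2^{-k} around s fixes the first k coordinates.
s, k = points[0], 2
ball = {t for t in points if d(s, t) < 2.0 ** (-k)}
cylinder = {t for t in points if t[:k] == s[:k]}
print(ball == cylinder)   # True
print(len(ball))          # 4
```

The check suggests why open balls are cylinders: two sequences are within $2^{-k}$ of each other exactly when they agree in their first $k$ coordinates.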

    1.4 Real-valued measurable functions

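    Let $L^0(S, \mathcal{S}; \mathbb{R})$ (or, simply, $L^0(S; \mathbb{R})$ or $L^0(\mathbb{R})$ or $L^0$ when the domain $(S, \mathcal{S})$ or the co-domain $\mathbb{R}$ are clear from the context) be the set of all $\mathcal{S}$-measurable functions $f : S \to \mathbb{R}$. The set of non-negative measurable functions is denoted by $L^0_+$ or $L^0([0, \infty))$.

    Proposition 1.36 (Measurable functions form a vector space) $L^0$ is a vector space, i.e.
    $$\alpha f + \beta g \in L^0, \text{ whenever } \alpha, \beta \in \mathbb{R},\ f, g \in L^0.$$

    PROOF Let us define a mapping $F : S \to \mathbb{R}^2$ by $F(x) = (f(x), g(x))$. By Problem 1.30, the Borel $\sigma$-algebra on $\mathbb{R}^2$ is the same as the product $\sigma$-algebra when we interpret $\mathbb{R}^2$ as a product of two copies of $\mathbb{R}$. Therefore, since its compositions with the coordinate projections are precisely the functions $f$ and $g$, Problem 1.31 implies that $F$ is $(\mathcal{S}, \mathcal{B}(\mathbb{R}^2))$-measurable.

    Consider the function $\varphi : \mathbb{R}^2 \to \mathbb{R}$ given by $\varphi(x, y) = \alpha x + \beta y$. It is linear, and, therefore, continuous. By Corollary 1.21, the composition $\varphi \circ F : S \to \mathbb{R}$ is $(\mathcal{S}, \mathcal{B}(\mathbb{R}))$-measurable, and it only remains to note that
    $$(\varphi \circ F)(x) = \varphi(F(x)) = \alpha f(x) + \beta g(x), \text{ i.e., } \varphi \circ F = \alpha f + \beta g.$$

    In a similar manner (the functions $(x, y) \mapsto \max(x, y)$ and $(x, y) \mapsto xy$ are continuous from $\mathbb{R}^2$ to $\mathbb{R}$ - why?) one can prove the following proposition.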


    Proposition 1.37 (Products and maxima preserve measurability) Let $f, g$ be in $L^0$. Then

    1. $fg \in L^0$,
    2. $\max(f, g)$ and $\min(f, g) \in L^0$.

    Even though the map $x \mapsto 1/x$ is not defined on the whole $\mathbb{R}$, the following problem is not too hard:

    Problem 1.38 Suppose that $f \in L^0$ has the property that $f(x) \neq 0$ for all $x \in S$. Then the function $1/f$ is also in $L^0$.

    For $A \subseteq S$, the indicator function $1_A$ is defined by
    $$1_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}$$

    Despite their simplicity, indicators will be extremely useful throughout these notes.

    Problem 1.39 Show that for $A \subseteq S$, we have $A \in \mathcal{S}$ if and only if $1_A \in L^0$.

    Remark 1.40 Since it contains the products of pairs of its elements, the set $L^0$ has the structure of an algebra (not to be confused with the algebra of sets defined above). It is true, however, that any algebra $\mathcal{A}$ of subsets of a non-empty set $S$, together with the operations of union, intersection and complement forms a Boolean algebra. Alternatively, it can be given the (algebraic) structure of a commutative ring with a unit. Indeed, under the operation of symmetric difference, $\mathcal{A}$ is an Abelian group (prove that!). If, in addition, the operation of intersection is introduced in lieu of multiplication, the resulting structure is, indeed, the one of a commutative ring.

    Additionally, a natural partial order given by $f \le g$ if $f(x) \le g(x)$, for all $x \in S$, can be introduced on $L^0$. This order is compatible with the operations of addition and multiplication and has the additional property that each pair $\{f, g\} \subseteq L^0$ admits a least upper bound, i.e., the element $h \in L^0$ such that $f \le h$, $g \le h$ and $h \le k$, for any other $k$ with the property that $f, g \le k$. Indeed, we simply take $h(x) = \max(f(x), g(x))$. A similar statement can be made for a greatest lower bound. A vector space with a partial order which satisfies the above properties is called a vector lattice.

    Since a limit of a sequence of real numbers does not necessarily belong to $\mathbb{R}$, it is often necessary to consider functions which are allowed to take the values $-\infty$ and $\infty$. The set $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ is called the extended set of real numbers. Most (but not all) of the algebraic and topological structure from $\mathbb{R}$ can be lifted to $\bar{\mathbb{R}}$. In some cases there is no unique way to do that, so we choose one of them as a matter of convention.

    1. Arithmetic operations. For $x, y \in \bar{\mathbb{R}}$, all the arithmetic operations are defined in the usual way when $x, y \in \mathbb{R}$. When one or both are in $\{-\infty, \infty\}$, we use the following convention, where $\oplus \in \{+, -, \cdot, /\}$: we define $x \oplus y = z$ if, for all pairs of sequences $\{x_n\}_{n \in \mathbb{N}}$, $\{y_n\}_{n \in \mathbb{N}}$ in $\mathbb{R}$ such that $x = \lim_n x_n$, $y = \lim_n y_n$ and $x_n \oplus y_n$ is well-defined for all $n \in \mathbb{N}$, we have
    $$z = \lim_n (x_n \oplus y_n).$$
    Otherwise, $x \oplus y$ is not defined. This basically means that all intuitively obvious conventions (such as $\infty + \infty = \infty$ and $\frac{a}{0} = \infty$ for $a > 0$) hold. In measure theory, however, we do make one important exception to the above rule. We set
    $$0 \cdot \infty = \infty \cdot 0 = 0 \cdot (-\infty) = (-\infty) \cdot 0 = 0.$$

    2. Order. < x


    The set of all measurable functions $f : S \to \bar{\mathbb{R}}$ is denoted by $L^0(S, \mathcal{S}; \bar{\mathbb{R}})$, and, as always, we leave out $S$ and $\mathcal{S}$ when no confusion can arise. The set of extended non-negative measurable functions often plays a role, so we denote it by $L^0([0, \infty])$ or $L^0_+(\bar{\mathbb{R}})$. Unlike $L^0(\mathbb{R})$, $L^0(\bar{\mathbb{R}})$ is not a vector space, but it retains all the order structure. Moreover, it is particularly useful because, unlike $L^0(\mathbb{R})$, it is closed with respect to the limiting operations. More precisely, for a sequence $\{f_n\}_{n \in \mathbb{N}}$ in $L^0(\bar{\mathbb{R}})$, we define the functions $\limsup_n f_n : S \to [-\infty, \infty]$ and $\liminf_n f_n : S \to [-\infty, \infty]$ by
    $$(\limsup_n f_n)(x) = \limsup_n f_n(x) = \inf_n \Big( \sup_{k \ge n} f_k(x) \Big),$$
    and
    $$(\liminf_n f_n)(x) = \liminf_n f_n(x) = \sup_n \Big( \inf_{k \ge n} f_k(x) \Big).$$
    Then, we have the following result, where the supremum and infimum of a sequence of functions are defined pointwise (just like the limits superior and inferior).

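    Proposition 1.43 (Limiting operations preserve measurability) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $L^0(\bar{\mathbb{R}})$. Then

    1. $\sup_n f_n, \inf_n f_n \in L^0(\bar{\mathbb{R}})$,
    2. $\limsup_n f_n, \liminf_n f_n \in L^0(\bar{\mathbb{R}})$,
    3. if $f(x) = \lim_n f_n(x)$ exists in $\bar{\mathbb{R}}$ for each $x \in S$, then $f \in L^0(\bar{\mathbb{R}})$, and
    4. the set $A = \{\lim_n f_n \text{ exists in } \bar{\mathbb{R}}\}$ is in $\mathcal{S}$.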

    PROOF

    1. We show only the statement for the supremum. It is clear that it is enough to show that the set $\{\sup_n f_n \le a\}$ is in $\mathcal{S}$ for all $a \in (-\infty, \infty]$ (why?). This follows, however, directly from the simple identity
       $$\{\sup_n f_n \le a\} = \cap_n \{f_n \le a\},$$
       and the fact that $\sigma$-algebras are closed with respect to countable intersections.

    2. Define $g_n = \sup_{k \ge n} f_k$ and use part 1. above to conclude that $g_n \in L^0(\bar{\mathbb{R}})$ for each $n \in \mathbb{N}$. Another appeal to part 1. yields that $\limsup_n f_n = \inf_n g_n$ is in $L^0(\bar{\mathbb{R}})$. The statement about the limit inferior follows in the same manner.

    3. If the limit $f(x) = \lim_n f_n(x)$ exists for all $x \in S$, then $f = \liminf_n f_n$, which is measurable by part 2. above.

    4. The statement follows from the fact that $A = f^{-1}(\{0\})$, where
       $$f(x) = \arctan\big(\limsup_n f_n(x)\big) - \arctan\big(\liminf_n f_n(x)\big).$$
       (Note: The unexpected use of the function $\arctan$ is really nothing to be puzzled by. The only property needed is its measurability (it is continuous) and monotonicity+bijectivity from $[-\infty, \infty]$ to $[-\pi/2, \pi/2]$. We compose the limits superior and inferior with it so that we don't run into problems while trying to subtract $+\infty$ from itself.)
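The role of $\arctan$ in part 4 can be illustrated with floating-point infinities, which behave like the extended reals in this one respect: the difference of two infinities is undefined. The snippet below is a small sketch, with `gap` a made-up name for the function $f$ from the proof evaluated at a single point.

```python
import math

def gap(limsup, liminf):
    """arctan(limsup) - arctan(liminf); zero exactly when the two agree."""
    return math.atan(limsup) - math.atan(liminf)

# Direct subtraction breaks down when both limits are +infinity ...
print(math.isnan(math.inf - math.inf))   # True
# ... while the arctan-transformed difference is well defined.
print(gap(math.inf, math.inf))           # 0.0
print(gap(1.0, -1.0) > 0)                # True
```

Because $\arctan$ is strictly increasing on $[-\infty, \infty]$, the transformed difference vanishes precisely at the points where the limits superior and inferior coincide, including the points where both are infinite.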

    1.5 Additional Problems

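    Problem 1.44 Which of the following are $\sigma$-algebras on $\mathbb{R}$?

    1. $\mathcal{S} = \{A \subseteq \mathbb{R} : 0 \in A\}$.
    2. $\mathcal{S} = \{A \subseteq \mathbb{R} : A \text{ is finite}\}$.
    3. $\mathcal{S} = \{A \subseteq \mathbb{R} : A \text{ is finite, or } A^c \text{ is finite}\}$.
    4. $\mathcal{S} = \{A \subseteq \mathbb{R} : A \text{ is countable or } A^c \text{ is countable}\}$.
    5. $\mathcal{S} = \{A \subseteq \mathbb{R} : A \text{ is open}\}$.
    6. $\mathcal{S} = \{A \subseteq \mathbb{R} : A \text{ is open or } A \text{ is closed}\}$.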

    Problem 1.45 A partition of a set $S$ is a family $\mathcal{P}$ of non-empty subsets of $S$ with the property that each $\omega \in S$ belongs to exactly one $A \in \mathcal{P}$.

    1. Show that the number of different algebras on a finite set $S$ is equal to the number of different partitions of $S$. (Note: This number for $S_n = \{1, 2, \dots, n\}$ is called the $n$-th Bell number $B_n$, and no nice closed-form expression for it is known. See below, though.)
    2. How many algebras are there on the set $S = \{1, 2, 3\}$?
    3. Does there exist an algebra with 754 elements?
    4. For $n \in \mathbb{N}$, let $a_n$ be the number of different algebras on the set $\{1, 2, \dots, n\}$. Show that $a_1 = 1$, $a_2 = 2$, $a_3 = 5$, and that the following recursion holds (where $a_0 = 1$ by definition),
       $$a_{n+1} = \sum_{k=0}^{n} \binom{n}{k} a_k.$$
    5. Show that the exponential generating function for the sequence $\{a_n\}_{n \in \mathbb{N}}$ is $f(x) = e^{e^x - 1}$, i.e., that
       $$\sum_{n=0}^{\infty} a_n \frac{x^n}{n!} = e^{e^x - 1} \quad \text{or, equivalently,} \quad a_n = \frac{d^n}{dx^n} e^{e^x - 1} \Big|_{x=0}.$$
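The recursion in part 4 is easy to run, and for very small $n$ its output can be compared with a brute-force count of algebras. The code below is a sketch for experimentation rather than a solution to the problem; `bell_numbers` and `count_algebras` are hypothetical names, and the brute-force search enumerates all families of subsets, so it is only practical for $n \le 3$.

```python
from itertools import combinations
from math import comb

def bell_numbers(n_max):
    """a_0, ..., a_{n_max} via a_{n+1} = sum_k C(n, k) a_k, a_0 = 1."""
    a = [1]
    for n in range(n_max):
        a.append(sum(comb(n, k) * a[k] for k in range(n + 1)))
    return a

def count_algebras(n):
    """Count families on {1..n} satisfying (A1), (A3), (A4) by brute force."""
    S = frozenset(range(1, n + 1))
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(S, r)]
    count = 0
    for r in range(1, len(subsets) + 1):
        for fam in combinations(subsets, r):
            F = set(fam)
            if (frozenset() in F
                    and all(S - A in F for A in F)
                    and all(A | B in F for A in F for B in F)):
                count += 1
    return count

print(bell_numbers(6))                          # [1, 1, 2, 5, 15, 52, 203]
print([count_algebras(n) for n in (1, 2, 3)])   # [1, 2, 5]
```

The brute-force counts agree with the values $a_1 = 1$, $a_2 = 2$, $a_3 = 5$ stated in the problem.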

    Problem 1.46 Let $(S, \mathcal{S})$ be a measurable space. For $f, g \in L^0$ show that the sets $\{f = g\} = \{x \in S : f(x) = g(x)\}$, $\{f < g\} = \{x \in S : f(x) < g(x)\}$ are in $\mathcal{S}$.

    Problem 1.47 Show that all

    1. monotone,
    2. convex

    functions $f : \mathbb{R} \to \mathbb{R}$ are measurable.


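    Problem 1.48 Let $(S, \mathcal{S})$ be a measurable space and let $f : S \to \mathbb{R}$ be a Borel-measurable function. Show that the graph
    $$G_f = \{(x, y) \in S \times \mathbb{R} : f(x) = y\}$$
    of $f$ is a measurable subset in the product space $(S \times \mathbb{R}, \mathcal{S} \otimes \mathcal{B}(\mathbb{R}))$.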


    Chapter 2

    Measures

    2.1 Measure spaces

    Definition 2.1 (Measure) Let $(S, \mathcal{S})$ be a measurable space. A mapping $\mu : \mathcal{S} \to [0, \infty]$ is called a (positive) measure if

    1. $\mu(\emptyset) = 0$, and
    2. $\mu(\cup_n A_n) = \sum_{n \in \mathbb{N}} \mu(A_n)$, for all pairwise disjoint sequences $\{A_n\}_{n \in \mathbb{N}}$ in $\mathcal{S}$.

    A triple $(S, \mathcal{S}, \mu)$ consisting of a non-empty set, a $\sigma$-algebra $\mathcal{S}$ on it and a measure $\mu$ on $\mathcal{S}$ is called a measure space.

    Remark 2.2

    1. A mapping whose domain is some nonempty set $\mathcal{A}$ of subsets of some set $S$ is sometimes called a set function.

    2. If the requirement 2. in the definition of the measure is weakened so that it is only required that $\mu(A_1 \cup \dots \cup A_n) = \mu(A_1) + \dots + \mu(A_n)$, for $n \in \mathbb{N}$, and pairwise disjoint $A_1, \dots, A_n \in \mathcal{S}$, we say that the mapping $\mu$ is a finitely-additive measure. If we want to stress that a mapping $\mu$ satisfies the original requirement 2. for sequences of sets, we say that $\mu$ is $\sigma$-additive (countably additive).

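    Definition 2.3 (Terminology) A measure $\mu$ on the measurable space $(S, \mathcal{S})$ is called

    1. a probability measure, if $\mu(S) = 1$,

    2. a finite measure, if $\mu(S) < \infty$,

    3. a $\sigma$-finite measure, if there exists a sequence $\{A_n\}_{n \in \mathbb{N}}$ in $\mathcal{S}$ such that $\cup_n A_n = S$ and $\mu(A_n)$ […]

    […] $1$. Then $\{B_n\}_{n \in \mathbb{N}}$ is a pairwise disjoint sequence in $\mathcal{S}$ with $\cup_{k=1}^{n} B_k = A_n$ for each $n \in \mathbb{N}$ (why?). By $\sigma$-additivity we have

    \[ \mu(\cup_n A_n) = \mu(\cup_n B_n) = \sum_{n \in \mathbb{N}} \mu(B_n) = \lim_n \sum_{k=1}^{n} \mu(B_k) = \lim_n \mu(\cup_{k=1}^{n} B_k) = \lim_n \mu(A_n). \]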

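    4. Consider the increasing sequence $\{B_n\}_{n \in \mathbb{N}}$ in $\mathcal{S}$ given by $B_n = A_1 \setminus A_n$. By De Morgan laws, finiteness of $\mu(A_1)$ and (3) above, we have

    \[ \mu(A_1) - \mu(\cap_n A_n) = \mu(A_1 \setminus (\cap_n A_n)) = \mu(\cup_n B_n) = \lim_n \mu(B_n) = \lim_n \mu(A_1 \setminus A_n) = \mu(A_1) - \lim_n \mu(A_n). \]

    Subtracting both sides from $\mu(A_1) < \infty$ produces the statement.

    5. We start from the observation that for $A_1, A_2 \in \mathcal{S}$ the set $A_1 \cup A_2$ can be written as a disjoint union

    \[ A_1 \cup A_2 = (A_1 \setminus A_2) \cup (A_2 \setminus A_1) \cup (A_1 \cap A_2), \]

    so that

    \[ \mu(A_1 \cup A_2) = \mu(A_1 \setminus A_2) + \mu(A_2 \setminus A_1) + \mu(A_1 \cap A_2). \]

    On the other hand,

    \[ \mu(A_1) + \mu(A_2) = \big(\mu(A_1 \setminus A_2) + \mu(A_1 \cap A_2)\big) + \big(\mu(A_2 \setminus A_1) + \mu(A_1 \cap A_2)\big) = \mu(A_1 \setminus A_2) + \mu(A_2 \setminus A_1) + 2\mu(A_1 \cap A_2), \]

    and so

    \[ \mu(A_1) + \mu(A_2) - \mu(A_1 \cup A_2) = \mu(A_1 \cap A_2) \geq 0. \]

    Induction can be used to show that

    \[ \mu(A_1 \cup \dots \cup A_n) \leq \sum_{k=1}^{n} \mu(A_k). \]

    Since all $\mu(A_n)$ are nonnegative, we now have

    \[ \mu(A_1 \cup \dots \cup A_n) \leq \alpha, \text{ for each } n \in \mathbb{N}, \text{ where } \alpha = \sum_{n \in \mathbb{N}} \mu(A_n). \]

    The sequence $\{B_n\}_{n \in \mathbb{N}}$ given by $B_n = \cup_{k=1}^{n} A_k$ is increasing, so the continuity of measure with respect to increasing sequences implies that

    \[ \mu(\cup_n A_n) = \mu(\cup_n B_n) = \lim_n \mu(B_n) = \lim_n \mu(A_1 \cup \dots \cup A_n) \leq \alpha. \]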


    Remark 2.7 The condition (A1)


    2.2 Extensions of measures and the coin-toss space

    Example 1.34 of Chapter 1 has introduced a measurable space $(\{-1, 1\}^{\mathbb{N}}, \mathcal{S})$, where $\mathcal{S}$ is the product $\sigma$-algebra on $\{-1, 1\}^{\mathbb{N}}$. The purpose of the present section is to turn $(\{-1, 1\}^{\mathbb{N}}, \mathcal{S})$ into a measure space, i.e., to define a suitable measure on it. It is easy to construct just any measure on $\{-1, 1\}^{\mathbb{N}}$, but the one we are after is the one which will justify the name coin-toss space.

    The intuition we have about tossing a fair coin infinitely many times should help us start with the definition of the coin-toss measure - denoted by $\mu_C$ - on cylinders. Since the coordinate spaces $\{-1, 1\}$ are particularly simple, each product cylinder is of the form $C = \{-1, 1\}^{\mathbb{N}}$ or $C = C_{n_1,\dots,n_k;b_1,\dots,b_k}$, as given by (1.3), for a choice $1 \leq n_1 < n_2 < \dots < n_k \in \mathbb{N}$ of coordinates and the corresponding values $b_1, \dots, b_k \in \{-1, 1\}$. In the language of elementary probability, each cylinder corresponds to the event when the outcome of the $n_i$-th coin is $b_i \in \{-1, 1\}$, for $i = 1, \dots, k$. The measure (probability) of this event can only be given by

    \[ \mu_C(C_{n_1,\dots,n_k;b_1,\dots,b_k}) = \underbrace{\tfrac{1}{2} \times \tfrac{1}{2} \times \dots \times \tfrac{1}{2}}_{k \text{ times}} = 2^{-k}. \tag{2.2} \]
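Formula (2.2) can be illustrated on finitely many tosses. In the sketch below a cylinder is represented (our choice, not the notes') as a dictionary mapping each constrained coordinate $n_i$ to its value $b_i$; the function names are hypothetical. The second function counts sign patterns of a fixed finite length, which is the elementary-probability reading of (2.2).

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(constraints):
    """Formula (2.2): a cylinder fixing k coordinates has measure 2^{-k}.
    `constraints` maps a 1-based coordinate index n_i to b_i in {-1, 1}."""
    return Fraction(1, 2 ** len(constraints))


def cylinder_frequency(constraints, m):
    """Fraction of the 2^m sign patterns of length m satisfying the constraints
    (m must be at least the largest constrained coordinate)."""
    assert m >= max(constraints, default=0)
    hits = sum(
        all(s[n - 1] == b for n, b in constraints.items())
        for s in product((-1, 1), repeat=m)
    )
    return Fraction(hits, 2 ** m)
```

Splitting a cylinder along one additional coordinate produces two disjoint cylinders whose measures add up to the original one, which is the simplest instance of the finite additivity asked for in Problem 2.13.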

    The hard part is to extend this definition to all elements of $\mathcal{S}$, and not only cylinders. For example, in order to state the law of large numbers later on, we will need to be able to compute the measure of the set

    \[ \Big\{ s \in \{-1, 1\}^{\mathbb{N}} : \lim_n \frac{1}{n} \sum_{k=1}^{n} s_k = \tfrac{1}{2} \Big\}, \]

    which is clearly not a cylinder.

    Problem 1.33 states, however, that cylinders form an algebra and generate the $\sigma$-algebra $\mathcal{S}$. Luckily, this puts us close to the conditions of the following important theorem of Carathéodory. The proof does not use unfamiliar methodology, but we omit it because it is quite long and tricky.

    Theorem 2.10 (Carathéodory's Extension Theorem) Let $S$ be a non-empty set, let $\mathcal{A}$ be an algebra of its subsets and let $\mu : \mathcal{A} \to [0, \infty]$ be a set-function with the following properties:

    1. $\mu(\emptyset) = 0$, and

    2. $\mu(A) = \sum_{n=1}^{\infty} \mu(A_n)$, if $\{A_n\}_{n \in \mathbb{N}}$ is a pairwise-disjoint family in $\mathcal{A}$ and $A = \cup_n A_n \in \mathcal{A}$.

    Then, there exists a measure $\tilde{\mu}$ on $\sigma(\mathcal{A})$ with the property that $\tilde{\mu}(A) = \mu(A)$ for $A \in \mathcal{A}$.

    Remark 2.11 In words, a $\sigma$-additive measure on an algebra $\mathcal{A}$ can be extended to a $\sigma$-additive measure on the $\sigma$-algebra generated by $\mathcal{A}$. It is clear that the $\sigma$-additivity requirement of Theorem 2.10 is necessary, but it is quite surprising that it is actually sufficient.

    In order to apply Theorem 2.10 in our situation, we need to check that $\mu_C$ is indeed a countably-additive measure on the algebra $\mathcal{A}$ of all cylinders. The following problem will help pinpoint the hard part of the argument:

    Problem 2.12 Let $\mathcal{A}$ be an algebra on the non-empty set $S$, and let $\mu : \mathcal{A} \to [0, \infty]$ be a finite ($\mu(S) < \infty$) and finitely-additive set function on $S$ with the following, additional, property:

    \[ \lim_n \mu(A_n) = 0, \text{ whenever } A_n \searrow \emptyset. \tag{2.3} \]


    Then $\mu$ is $\sigma$-additive on $\mathcal{A}$, i.e., it satisfies the conditions of Theorem 2.10.

    The part about finite additivity is easy (but messy) and we leave it to the reader:

    Problem 2.13 Show that the set-function $\mu_C$, defined by (2.2) on the algebra $\mathcal{A}$ of cylinders, is finitely additive.

    Lemma 2.14 (Conditions of Carathéodory's theorem) The set-function $\mu_C$, defined by (2.2) on the algebra $\mathcal{A}$ of cylinders, has the property (2.3).

    PROOF By Problem 1.35, cylinders are closed sets, and so $\{A_n\}_{n \in \mathbb{N}}$ is a sequence of closed sets whose intersection is empty. The same problem states that $\{-1, 1\}^{\mathbb{N}}$ is compact, so, by the finite-intersection property (footnote 1), we have $A_{n_1} \cap \dots \cap A_{n_k} = \emptyset$, for some finite collection $n_1, \dots, n_k$ of indices. Since $\{A_n\}_{n \in \mathbb{N}}$ is decreasing, we must have $A_n = \emptyset$, for all $n \geq n_k$, and, consequently, $\lim_n \mu_C(A_n) = 0$.

    Proposition 2.15 (Existence of the coin-toss measure) There exists a measure $\mu_C$ on $(\{-1, 1\}^{\mathbb{N}}, \mathcal{S})$ with the property that (2.2) holds for all cylinders.

    PROOF Thanks to Lemma 2.14, Theorem 2.10 can now be used.

    In order to prove uniqueness, we will need the celebrated $\pi$-$\lambda$ Theorem of Eugene Dynkin:

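    Theorem 2.16 (Dynkin's $\pi$-$\lambda$ Theorem) Let $\mathcal{P}$ be a $\pi$-system on a non-empty set $S$, and let $\Lambda$ be a $\lambda$-system which contains $\mathcal{P}$. Then $\Lambda$ also contains the $\sigma$-algebra $\sigma(\mathcal{P})$ generated by $\mathcal{P}$.

    PROOF Using the result of part 4. of Problem 1.3, we only need to prove that $\lambda(\mathcal{P})$ (where $\lambda(\mathcal{P})$ denotes the $\lambda$-system generated by $\mathcal{P}$) is a $\pi$-system. For $A \subseteq S$, let $\mathcal{G}_A$ denote the family of all subsets of $S$ whose intersections with $A$ are in $\lambda(\mathcal{P})$:

    \[ \mathcal{G}_A = \{C \subseteq S : C \cap A \in \lambda(\mathcal{P})\}. \]

    Claim 1: $\mathcal{G}_A$ is a $\lambda$-system for $A \in \lambda(\mathcal{P})$.

    Since $A \in \lambda(\mathcal{P})$, clearly $S \in \mathcal{G}_A$.

    For an increasing family $\{C_n\}_{n \in \mathbb{N}}$ in $\mathcal{G}_A$ we have $(\cup_n C_n) \cap A = \cup_n (C_n \cap A)$. Each $C_n \cap A$ is in $\lambda(\mathcal{P})$, and the family $\{C_n \cap A\}_{n \in \mathbb{N}}$ is increasing, so $(\cup_n C_n) \cap A \in \lambda(\mathcal{P})$.

    1 The finite-intersection property refers to the following fact, familiar from real analysis: If a family of closed sets of a compact topological space has empty intersection, then it admits a finite subfamily with an empty intersection.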


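    Finally, for $C_1, C_2 \in \mathcal{G}_A$ with $C_1 \subseteq C_2$, we have

    \[ (C_2 \setminus C_1) \cap A = (C_2 \cap A) \setminus (C_1 \cap A) \in \lambda(\mathcal{P}), \]

    because $C_1 \cap A \subseteq C_2 \cap A$.

    Since $\mathcal{P}$ is a $\pi$-system, for any $A \in \mathcal{P}$, we have $\mathcal{P} \subseteq \mathcal{G}_A$. Therefore, $\lambda(\mathcal{P}) \subseteq \mathcal{G}_A$, because $\mathcal{G}_A$ is a $\lambda$-system. In other words, for $A \in \mathcal{P}$ and $B \in \lambda(\mathcal{P})$, we have $A \cap B \in \lambda(\mathcal{P})$.

    That means, however, that $\mathcal{P} \subseteq \mathcal{G}_B$, for any $B \in \lambda(\mathcal{P})$. Using the fact that $\mathcal{G}_B$ is a $\lambda$-system we must also have $\lambda(\mathcal{P}) \subseteq \mathcal{G}_B$, for any $B \in \lambda(\mathcal{P})$, i.e., $A \cap B \in \lambda(\mathcal{P})$, for all $A, B \in \lambda(\mathcal{P})$, which shows that $\lambda(\mathcal{P})$ is a $\pi$-system.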
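On a finite set the theorem can be observed directly by computing closures. The sketch below assumes the usual definition of a $\lambda$-system (contains $S$, closed under proper differences and increasing unions); on a finite set increasing unions are automatic, so only the first two conditions are enforced. The function names are ours.

```python
def lambda_closure(S, family):
    """Smallest lambda-system on the finite set S containing `family`:
    add S and close under proper differences (B minus A, for A a subset of B)."""
    fam = {frozenset(S)} | {frozenset(A) for A in family}
    changed = True
    while changed:
        changed = False
        for A in list(fam):
            for B in list(fam):
                if A <= B and (B - A) not in fam:
                    fam.add(B - A)
                    changed = True
    return fam


def sigma_closure(S, family):
    """Smallest sigma-algebra (an algebra, since S is finite) containing `family`:
    close under complements and pairwise unions."""
    S = frozenset(S)
    fam = {S, frozenset()} | {frozenset(A) for A in family}
    changed = True
    while changed:
        changed = False
        for A in list(fam):
            for B in list(fam):
                for C in (S - A, A | B):
                    if C not in fam:
                        fam.add(C)
                        changed = True
    return fam
```

With $S = \{1,2,3,4\}$, the family $\{\{1,2\},\{2,3\}\}$ is not a $\pi$-system and its $\lambda$-closure has only 6 elements, while the generated $\sigma$-algebra has 16. Adding the intersection $\{2\}$ makes the family a $\pi$-system, and the two closures then coincide, as the theorem predicts.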

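    Proposition 2.17 (Measures which agree on a $\pi$-system) Let $(S, \mathcal{S})$ be a measurable space, and let $\mathcal{P}$ be a $\pi$-system which generates $\mathcal{S}$. Suppose that $\mu_1$ and $\mu_2$ are two measures on $\mathcal{S}$ with the property that $\mu_1(S) = \mu_2(S) < \infty$ and

    \[ \mu_1(A) = \mu_2(A), \text{ for all } A \in \mathcal{P}. \]

    Then $\mu_1 = \mu_2$, i.e., $\mu_1(A) = \mu_2(A)$, for all $A \in \mathcal{S}$.

    PROOF Let $\mathcal{L}$ be the family of all subsets $A$ of $S$ for which $\mu_1(A) = \mu_2(A)$. Clearly $\mathcal{P} \subseteq \mathcal{L}$, but $\mathcal{L}$ is, potentially, bigger. In fact, it follows easily from the elementary properties of measures (see Proposition 2.6) and the fact that $\mu_1(S) = \mu_2(S)$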


    Our next task is to probe the structure of the $\sigma$-algebra $\mathcal{S}$ on $\{-1, 1\}^{\mathbb{N}}$ a little bit more and show that $\mathcal{S} \neq 2^{\{-1,1\}^{\mathbb{N}}}$. It is interesting that such a result (which deals exclusively with the structure of $\mathcal{S}$) requires the use of a measure in its proof.

    Example 2.21 ((*) A non-measurable subset of $\{-1, 1\}^{\mathbb{N}}$) Since $\sigma$-algebras are closed under countable set operations, and since the product $\sigma$-algebra $\mathcal{S}$ for the coin-toss space $\{-1, 1\}^{\mathbb{N}}$ is generated by sets obtained by restricting finite collections of coordinates, one is tempted to think that $\mathcal{S}$ contains all subsets of $\{-1, 1\}^{\mathbb{N}}$. That is not the case. We will use the axiom of choice, together with the fact that a measure $\mu_C$ can be defined on the whole of $\{-1, 1\}^{\mathbb{N}}$, to construct an example of a non-measurable set.

    Let us start by constructing a relation $\sim$ on $\{-1, 1\}^{\mathbb{N}}$ in the following way: we set $s^1 \sim s^2$ if and only if there exists $n \in \mathbb{N}$ such that $s^1_k = s^2_k$, for $k \geq n$ (here, as always, $s^i = (s^i_1, s^i_2, \dots)$, $i = 1, 2$). In words, $s^1$ and $s^2$ are related if they only differ in a finite number of coordinates. It is easy to check that $\sim$ is an equivalence relation and that it splits $\{-1, 1\}^{\mathbb{N}}$ into disjoint equivalence classes. One of the many equivalent forms of the axiom of choice states that there exists a subset $N$ of $\{-1, 1\}^{\mathbb{N}}$ which contains exactly one element from each of the equivalence classes.

    Let us suppose that $N$ is an element in $\mathcal{S}$ and see if we can reach a contradiction. Let $\mathcal{F}$ denote the set of all finite subsets of $\mathbb{N}$. For each nonempty $n = \{n_1, \dots, n_k\} \in \mathcal{F}$, let us define the mapping $T_n : \{-1, 1\}^{\mathbb{N}} \to \{-1, 1\}^{\mathbb{N}}$ in the following manner:

    \[ (T_n(s))_l = \begin{cases} -s_l, & l \in n, \\ s_l, & l \notin n. \end{cases} \]

    In words, $T_n$ flips the signs of the elements of its argument on the positions corresponding to $n$. We define $T_\emptyset = \mathrm{Id}$, i.e., $T_\emptyset(s) = s$.

    Since $n$ is finite, $T_n$ preserves the $\sim$-equivalence class of each element. Consequently (and using the fact that $N$ contains exactly one element from each equivalence class) the sets $N$ and $T_n(N) = \{T_n(s) : s \in N\}$ are disjoint. Similarly and more generally, the sets $T_n(N)$ and $T_{n'}(N)$ are also disjoint whenever $n \neq n'$. On the other hand, each $s \in \{-1, 1\}^{\mathbb{N}}$ is equivalent to some $\hat{s} \in N$, i.e., it can be obtained from $\hat{s}$ by flipping a finite number of coordinates. Therefore, the family

    \[ \mathcal{N} = \{T_n(N) : n \in \mathcal{F}\} \]

    forms a partition of $\{-1, 1\}^{\mathbb{N}}$.

    The mapping $T_n$ has several other nice properties. First of all, it is immediate that it is involutory, i.e., $T_n \circ T_n = \mathrm{Id}$. To show that it is $(\mathcal{S}, \mathcal{S})$-measurable, we need to prove that its composition with each projection map $\pi_k : S \to \{-1, 1\}$ is measurable. This follows immediately from the fact that for $k \in \mathbb{N}$

    \[ (\pi_k \circ T_n)^{-1}(\{1\}) = \begin{cases} C_{k;1}, & k \notin n, \\ C_{k;-1}, & k \in n, \end{cases} \]

    where, for $i \in \{-1, 1\}$, $C_{k;i} = \{s \in \{-1, 1\}^{\mathbb{N}} : s_k = i\}$ is a cylinder. If we combine the involutivity and measurability of $T_n$, we immediately conclude that $T_n(A) \in \mathcal{S}$ for each $A \in \mathcal{S}$. In particular, $\mathcal{N} \subseteq \mathcal{S}$.

    In addition to preserving measurability, the map $T_n$ also preserves the measure (footnote 2) $\mu_C$, i.e., $\mu_C(T_n(A)) = \mu_C(A)$, for all $A \in \mathcal{S}$.

    2 Actually, we say that a map $f$ from a measure space $(S, \mathcal{S}, \mu_S)$ to the measure space $(T, \mathcal{T}, \mu_T)$ is measure preserving if it is measurable and $\mu_S(f^{-1}(A)) = \mu_T(A)$, for all $A \in \mathcal{T}$. The involutivity of the map $T_n$ implies that this general definition agrees with our usage in this example.

    To prove that, let us pick $n \in \mathcal{F}$ and consider the set-function


    $\mu_n : \mathcal{S} \to [0, 1]$ given by

    \[ \mu_n(A) = \mu_C(T_n(A)). \]

    It is a simple matter to show that $\mu_n$ is, in fact, a measure on $(S, \mathcal{S})$ with $\mu_n(S) = 1$. Moreover, thanks to the simple form (2.2) of the action of the measure $\mu_C$ on cylinders, it is clear that $\mu_n = \mu_C$ on the algebra of all cylinders. It suffices to invoke Proposition 2.17 to conclude that $\mu_n = \mu_C$ on the entire $\mathcal{S}$, i.e., that $T_n$ preserves $\mu_C$.

    The above properties of the maps $T_n$, $n \in \mathcal{F}$, imply the following: $\mathcal{N}$ is a partition of $S$ into countably many measurable subsets of equal measure. Such a partition $\{N_1, N_2, \dots\}$ cannot exist, however. Indeed, if it did, one of the following two cases would occur:

    1. $\mu_C(N_1) = 0$. In that case $\mu_C(S) = \mu_C(\cup_k N_k) = \sum_k \mu_C(N_k) = \sum_k 0 = 0 \neq 1 = \mu_C(S)$.

    2. $\mu_C(N_1) = \alpha > 0$. In that case $\mu_C(S) = \mu_C(\cup_k N_k) = \sum_k \mu_C(N_k) = \sum_k \alpha = \infty \neq 1 = \mu_C(S)$.

    Therefore, the set $N$ cannot be measurable in $\mathcal{S}$.

    (Note: Somewhat heavier set-theoretic machinery can be used to prove that most of the subsets of $S$ are not in $\mathcal{S}$, in the sense that the cardinality of the set $\mathcal{S}$ is strictly smaller than the cardinality of the set $2^S$ of all subsets of $S$.)

    2.3 The Lebesgue measure

    As we shall see, the coin-toss space can be used as a sort of a universal measure space in probability theory. We use it here to construct the Lebesgue measure on $[0, 1]$. We start with the notion somewhat dual to the already introduced notion of the pull-back in Definition 1.14. We leave it as an exercise for the reader to show that the set function $f_*\mu$ from Definition 2.22 is indeed a measure.

    Definition 2.22 (Push-forwards) Let $(S, \mathcal{S}, \mu)$ be a measure space, let $(T, \mathcal{T})$ be a measurable space, and let $f : S \to T$ be a measurable map. The measure $f_*\mu$ on $(T, \mathcal{T})$, defined by

    \[ f_*\mu(B) = \mu(f^{-1}(B)), \text{ for } B \in \mathcal{T}, \]

    is called the push-forward of the measure $\mu$ by $f$.

    Let $f : \{-1, 1\}^{\mathbb{N}} \to [0, 1]$ be the mapping given by

    \[ f(s) = \sum_{k=1}^{\infty} \frac{1 + s_k}{2} \, 2^{-k}, \quad s \in \{-1, 1\}^{\mathbb{N}}. \]

    The idea is to use $f$ to establish a correspondence between all real numbers in $[0, 1]$ and their expansions in the binary system, with the coding $-1 \mapsto 0$ and $1 \mapsto 1$. It is interesting to note that $f$ is not one-to-one (footnote 3), as it, for example, maps $s^1 = (1, -1, -1, \dots)$ and $s^2 = (-1, 1, 1, \dots)$ into the same value - namely $\tfrac{1}{2}$.

    3 The reason for this is, poetically speaking, that $[0, 1]$ is not the Cantor set.

    Let us show, first, that the map $f$ is continuous in the metric $d$


    defined by part (1.4) of Problem 1.33. Indeed, we pick $s^1$ and $s^2$ in $\{-1, 1\}^{\mathbb{N}}$ and remember that for $d(s^1, s^2) = 2^{-n}$, the first $n - 1$ coordinates of $s^1$ and $s^2$ coincide. Therefore,

    \[ |f(s^1) - f(s^2)| \leq \sum_{k=n}^{\infty} 2^{-k} = 2^{-n+1} = 2 d(s^1, s^2). \]

    Hence, the map $f$ is Lipschitz and, therefore, continuous.

    The continuity of $f$ (together with the fact that $\mathcal{S}$ is the Borel $\sigma$-algebra for the topology induced by the metric $d$) implies that $f : (\{-1, 1\}^{\mathbb{N}}, \mathcal{S}) \to ([0, 1], \mathcal{B}([0, 1]))$ is a measurable mapping. Therefore, the push-forward $\lambda = f_*\mu_C$ is well defined on $([0, 1], \mathcal{B}([0, 1]))$, and we call it the Lebesgue measure on $[0, 1]$.
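The map $f$ and its push-forward can be explored on finite prefixes. The following sketch is an illustration only: `f_truncated` (a name of ours) computes the partial sum of the series for a finite prefix, which coincides with $f(s)$ when all later coordinates of $s$ equal $-1$, and `dyadic_mass` counts prefixes rather than measuring sets in $\{-1,1\}^{\mathbb{N}}$.

```python
from fractions import Fraction
from itertools import product


def f_truncated(prefix):
    """Partial sum sum_{k <= m} (1 + s_k)/2 * 2^{-k} for a prefix (s_1, ..., s_m).
    It equals f(s) when every later coordinate of s is -1."""
    return sum(Fraction(1 + s, 2) * Fraction(1, 2 ** k)
               for k, s in enumerate(prefix, start=1))


def dyadic_mass(k, n, m):
    """Fraction of the 2^m prefixes of length m whose partial sum lies in
    the dyadic interval [k/2^n, (k+1)/2^n)."""
    a, b = Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)
    hits = sum(a <= f_truncated(s) < b for s in product((-1, 1), repeat=m))
    return Fraction(hits, 2 ** m)
```

The prefix $(1, -1, \dots, -1)$ is sent to exactly $\tfrac{1}{2}$, while $(-1, 1, \dots, 1)$ of length $m$ is sent to $\tfrac{1}{2} - 2^{-m}$, which approaches $\tfrac{1}{2}$ as $m$ grows; this is the finite shadow of the non-injectivity noted above. The mass assigned to a dyadic interval of length $2^{-n}$ is $2^{-n}$ for every $m \geq n$.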

    Proposition 2.23 (Intuitive properties of the Lebesgue measure) The Lebesgue measure $\lambda$ on $([0, 1], \mathcal{B}([0, 1]))$ satisfies

    \[ \lambda([a, b)) = b - a, \quad \lambda(\{a\}) = 0, \quad \text{for } 0 \leq a < b \leq 1. \tag{2.4} \]

    PROOF

    1. Consider $a, b$ of the form $a = \frac{k}{2^n}$ and $b = \frac{k+1}{2^n}$, for $n \in \mathbb{N}$ and $k < 2^n$. For such $a, b$ we have $f^{-1}([a, b)) = C_{1,\dots,n;c_1,c_2,\dots,c_n}$, where $c_1 c_2 \dots c_n$ is the base-2 expansion of $k$ (after the recoding $-1 \mapsto 0$, $1 \mapsto 1$). By the very definition of $\lambda$ and the form (2.2) of the action of the coin-toss measure $\mu_C$ on cylinders, we have

    \[ \lambda([a, b)) = \mu_C\big(f^{-1}([a, b))\big) = \mu_C(C_{1,\dots,n;c_1,c_2,\dots,c_n}) = 2^{-n} = \frac{k+1}{2^n} - \frac{k}{2^n}. \]

    Therefore, (2.4) holds for $a, b$ of the form $a = \frac{k}{2^n}$ and $b = \frac{l}{2^n}$, for $n \in \mathbb{N}$, $k$ […] $a$, and use the continuity with respect to increasing sequences to get, for $a < b \in (0, 1)$,

    \[ \lambda([a, b)) = \lambda\big(\cup_n [a, p_n)\big) = \lim_n \lambda([a, p_n)) = \lim_n (p_n - a) = (b - a). \]

    The Lebesgue measure has another important property:

    Problem 2.24 Show that the Lebesgue measure is translation invariant. More precisely, for $B \in \mathcal{B}([0, 1])$ and $x \in [0, 1)$, we have


    1. $B +_1 x = \{b + x \pmod 1 : b \in B\}$ is in $\mathcal{B}([0, 1])$, and

    2. $\lambda(B +_1 x) = \lambda(B)$,

    where, for $a \in [0, 2)$, we define

    \[ a \pmod 1 = \begin{cases} a, & a \leq 1, \\ a - 1, & a > 1. \end{cases} \]

    (Geometrically, the set $B +_1 x$ is obtained from $B$ by translating it to the right by $x$ and then shifting the part that is sticking out by $1$ to the left.) (Hint: Use Proposition 2.17 for the second part.)

    Finally, the notion of the Lebesgue measure is just as useful on the entire $\mathbb{R}$ as on its compact subset $[0, 1]$. For a general $B \in \mathcal{B}(\mathbb{R})$, we can define the Lebesgue measure of $B$ by measuring its intersections with all intervals of the form $[n, n + 1)$, and adding them together, i.e.,

    \[ \lambda(B) = \sum_{n=-\infty}^{\infty} \lambda\big((B \cap [n, n + 1)) - n\big). \]

    Note how we are overloading the notation and using the letter $\lambda$ for both the Lebesgue measure on $[0, 1]$ and the Lebesgue measure on $\mathbb{R}$.

    It is quite tedious, but requires no new tools, to show that many of the properties of $\lambda$ on $[0, 1]$ transfer to $\lambda$ on $\mathbb{R}$:

    Problem 2.25 Let $\lambda$ be the Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Show that

    1. $\lambda([a, b)) = b - a$, $\lambda(\{a\}) = 0$ for $a < b$,

    2. $\lambda$ is $\sigma$-finite but not finite,

    3. $\lambda(B + x) = \lambda(B)$, for all $B \in \mathcal{B}(\mathbb{R})$ and $x \in \mathbb{R}$, where $B + x = \{b + x : b \in B\}$.

    Remark 2.26 The existence of the Lebesgue measure allows us to show quickly that the converse of the implication in the Borel-Cantelli Lemma does not hold without additional conditions, even if $\mu$ is a probability measure. Indeed, let $\mu = \lambda$ be the Lebesgue measure on $[0, 1]$. Set $A_n = (0, \frac{1}{n}]$, for $n \in \mathbb{N}$, so that

    \[ \limsup_n A_n = \cap_n \cup_{k \geq n} A_k = \cap_n A_n = \emptyset, \]

    which implies that $\mu(\limsup_n A_n) = 0$. On the other hand

    \[ \sum_{n \in \mathbb{N}} \mu(A_n) = \sum_{n \in \mathbb{N}} \frac{1}{n} = \infty. \]

    We will see later that the converse does hold if the family of sets $\{A_n\}_{n \in \mathbb{N}}$ satisfies the additional condition of independence.
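The two halves of Remark 2.26 can be made concrete with exact rational arithmetic. This is a small sketch with names of our choosing: a fixed point $x > 0$ belongs to $A_n = (0, \frac{1}{n}]$ only for $n \leq 1/x$, hence to finitely many of the sets, while the partial sums of the measures keep growing.

```python
from fractions import Fraction


def in_A(n, x):
    """Membership of x in A_n = (0, 1/n]."""
    return 0 < x <= Fraction(1, n)


def indices_containing(x, n_max):
    """All n <= n_max with x in A_n."""
    return [n for n in range(1, n_max + 1) if in_A(n, x)]


def harmonic_partial_sum(n_max):
    """sum_{n <= n_max} lambda(A_n) = sum_{n <= n_max} 1/n."""
    return sum(Fraction(1, n) for n in range(1, n_max + 1))
```

For instance, $x = 3/100$ lies in $A_n$ exactly for $n = 1, \dots, 33$, no matter how far the search is extended, whereas the partial sums of $\lambda(A_n)$ are unbounded (a finite computation can only suggest this; the divergence of the harmonic series is the actual argument).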

    2.4 Signed measures

    In addition to (positive) measures, it is sometimes useful to know a few things about measure-like set functions which take values in $\mathbb{R}$ (and not in $[0, \infty]$).


    Definition 2.27 (Signed measures) Let $(S, \mathcal{S})$ be a measurable space. A mapping $\mu : \mathcal{S} \to (-\infty, \infty]$ is called a signed measure (or real measure) if

    1. $\mu(\emptyset) = 0$, and

    2. for any pairwise disjoint sequence $\{A_n\}_{n \in \mathbb{N}}$ in $\mathcal{S}$ the series $\sum_n \mu(A_n)$ is summable and $\mu(\cup_n A_n) = \sum_n \mu(A_n)$.

    The notion of convergence here is applied to sequences that may take the value $\infty$, so we need to be precise about how it is defined. Remember that $a^+ = \max(a, 0)$ and $a^- = \max(-a, 0)$.

    Definition 2.28 (Summability for sequences) A sequence $\{a_n\}_{n \in \mathbb{N}}$ in $(-\infty, \infty]$ is said to be summable if $\sum_{n \in \mathbb{N}} a_n^-$


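    where the supremum is taken over all finite measurable partitions $D_1, \dots, D_n$, $n \in \mathbb{N}$, of $S$. The number $|\mu|(S) \in [0, \infty]$ is called the total variation (norm) of $\mu$.

    The central result about signed measures is the following:

    Theorem 2.32 (Hahn-Jordan decomposition) Let $(S, \mathcal{S})$ be a measurable space, and let $\mu$ be a signed measure on $\mathcal{S}$. Then there exist two (positive) measures $\mu^+$ and $\mu^-$ such that

    1. $\mu^-$ is finite,

    2. $\mu(A) = \mu^+(A) - \mu^-(A)$,

    3. $|\mu|(A) = \mu^+(A) + \mu^-(A)$.

    Measures $\mu^+$ and $\mu^-$ with the above properties are unique. Moreover, there exists a set $D \in \mathcal{S}$ such that $\mu^+(A) = \mu(A \cap D^c)$ and $\mu^-(A) = -\mu(A \cap D)$ for all $A \in \mathcal{S}$.

    PROOF (*) Call a set $B \in \mathcal{S}$ negative if $\mu(C) \leq 0$, for all $C \in \mathcal{S}$, $C \subseteq B$. Let $\mathcal{P}$ be the collection of all negative sets - it is nonempty because $\emptyset \in \mathcal{P}$. Set

    \[ \beta = \inf\{\mu(B) : B \in \mathcal{P}\}, \]

    and let $\{B_n\}_{n \in \mathbb{N}}$ be a sequence of negative sets with $\mu(B_n) \to \beta$. We define $D = \cup_n B_n$ and note that $D$ is a negative set with $\mu(D) = \beta$ (why?). In particular, $\beta > -\infty$.

    Our first order of business is to show that $D^c$ is a positive set, i.e., that $\mu(E) \geq 0$ for all $E \subseteq D^c$. Suppose, to the contrary, that $\mu(B) < 0$, for some $B \in \mathcal{S}$, $B \subseteq D^c$. The set $B$ cannot be a negative set - otherwise $D \cup B$ would be a negative set with $\mu(D \cup B) = \mu(D) + \mu(B) = \beta + \mu(B) < \beta$. Therefore, there exists a measurable subset $E$ of $B$ with $\mu(E) > 0$, i.e., the set

    \[ \mathcal{E}_1 = \{E \subseteq B : E \in \mathcal{S}, \mu(E) > 0\} \]

    is non-empty. Pick $k_1 \in \mathbb{N}$ such that

    \[ \frac{1}{k_1} \geq \sup\{\mu(E) : E \in \mathcal{E}_1\}, \]

    and an almost-maximal set $E_1 \in \mathcal{E}_1$ with

    \[ \frac{1}{k_1} \geq \mu(E_1) > \frac{1}{k_1 + 1}. \]

    We set $B_1 = B \setminus E_1$ and observe that, since $0 < \mu(E_1) < \infty$, we have

    \[ \mu(B_1) = \mu(B) - \mu(E_1) < 0. \]

    Replacing $B$ by $B_1$, the above discussion can be repeated and a constant $k_2$ and an almost-maximal set $E_2$ with $\frac{1}{k_2} \geq \mu(E_2) > \frac{1}{k_2 + 1}$ can be constructed. Continuing in the same manner, we obtain the sequence $\{E_n\}_{n \in \mathbb{N}}$ of pairwise disjoint subsets of $B$ with $\frac{1}{k_n} \geq \mu(E_n) > \frac{1}{k_n + 1}$.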


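    Given that $\mu(B) < 0$, it cannot have subsets of measure $\infty$. Therefore, $\mu(\cup_n E_n) < \infty$ and

    \[ \sum_{n \in \mathbb{N}} \frac{1}{k_n + 1} \text{ […] } \mu(B_0) > 0, \]

    a contradiction. Therefore, $D^c$ is a positive set.

    Having split $S$ into a disjoint union of a positive and a negative set, we define

    \[ \mu^+(A) = \mu(A \cap D^c) \text{ and } \mu^-(A) = -\mu(A \cap D), \]

    so that both $\mu^+$ and $\mu^-$ are (positive) measures with $\mu^-$ finite and $\mu = \mu^+ - \mu^-$.

    Finally, we need to show that $|\mu| = \mu^+ + \mu^-$. Take $A \in \mathcal{S}$ and $\{B_1, \dots, B_n\} \in P_{[0,\infty)}(A)$. Then

    \[ \sum_{k=1}^{n} |\mu(B_k)| = \sum_{k=1}^{n} \big|\mu^+(B_k) - \mu^-(B_k)\big| \leq \sum_{k=1}^{n} \big(\mu^+(B_k) + \mu^-(B_k)\big) = \mu^+(A) + \mu^-(A). \]

    To show that the obtained upper bound is tight, we consider the partition $\{A \cap D, A \cap D^c\}$ of $A$ for which we have

    \[ |\mu(A \cap D)| + |\mu(A \cap D^c)| = \mu^-(A \cap D) + \mu^+(A \cap D^c) = \mu^+(A) + \mu^-(A). \]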
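On a finite set the decomposition can be written down explicitly, which may help in reading the proof. The sketch below assumes a signed measure given by point weights, $\mu(A) = \sum_{x \in A} w(x)$; the function name and the dictionary representation are ours. The set $D$ of points with negative weight plays the role of the negative set in the theorem.

```python
from itertools import combinations


def hahn_jordan(weights):
    """For mu(A) = sum of weights over A on a finite set, return
    (D, mu, mu_plus, mu_minus), where D is the set of points of negative weight,
    mu_plus(A) = mu(A minus D) and mu_minus(A) = -mu(A intersected with D)."""
    D = {x for x, w in weights.items() if w < 0}

    def mu(A):
        return sum(weights[x] for x in A)

    def mu_plus(A):
        return mu(set(A) - D)

    def mu_minus(A):
        return -mu(set(A) & D)

    return D, mu, mu_plus, mu_minus


def two_set_variation(weights, A):
    """|mu(A intersect D)| + |mu(A minus D)|: the value attained by the
    two-set partition used at the end of the proof."""
    D, mu, _, _ = hahn_jordan(weights)
    return abs(mu(set(A) & D)) + abs(mu(set(A) - D))
```

For any choice of weights, `mu_plus` and `mu_minus` are nonnegative, their difference is `mu`, and their sum equals `two_set_variation`, mirroring items 2 and 3 of Theorem 2.32.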

    2.5 Additional Problems

    Problem 2.33 (Local separation by constants) Let $(S, \mathcal{S}, \mu)$ be a measure space and let the functions $f, g \in \mathcal{L}^0(S, \mathcal{S}, \mu)$ satisfy $\mu(\{x \in S : f(x) < g(x)\}) > 0$. Prove or construct a counterexample for the following statement:

    There exist constants $a, b \in \mathbb{R}$ such that $\mu(\{x \in S : f(x) \leq a < b \leq g(x)\}) > 0$.

    Problem 2.34 (A pseudometric on sets) Let $(S, \mathcal{S}, \mu)$ be a finite measure space. For $A, B \in \mathcal{S}$ define

    \[ d(A, B) = \mu(A \triangle B), \]

    where $\triangle$ denotes the symmetric difference: $A \triangle B = (A \setminus B) \cup (B \setminus A)$. Show that $d$ is a pseudometric (footnote 4) on $\mathcal{S}$, and for $A \in \mathcal{S}$ describe the set of all $B \in \mathcal{S}$ with $d(A, B) = 0$.

    4 Let $X$ be a nonempty set. A function $d : X \times X \to [0, \infty)$ is called a pseudometric if

    1. $d(x, y) + d(y, z) \geq d(x, z)$, for all $x, y, z \in X$,

    2. $d(x, y) = d(y, x)$, for all $x, y \in X$, and

    3. $d(x, x) = 0$, for all $x \in X$.

    Note how the only difference between a metric and a pseudometric is that for a metric $d(x, y) = 0$ implies $x = y$, while no such requirement is imposed on a pseudometric.


    Problem 2.35 (Complete measure spaces) A measure space $(S, \mathcal{S}, \mu)$ is called complete if all subsets of null sets are themselves in $\mathcal{S}$. For a (possibly incomplete) measure space $(S, \mathcal{S}, \mu)$ we define the completion $(S, \mathcal{S}^*, \mu^*)$ in the following way:

    \[ \mathcal{S}^* = \{A \cup N' : A \in \mathcal{S} \text{ and } N' \subseteq N \text{ for some } N \in \mathcal{S} \text{ with } \mu(N) = 0\}. \]

    For $B \in \mathcal{S}^*$ with representation $B = A \cup N'$ we set $\mu^*(B) = \mu(A)$.

    1. Show that $\mathcal{S}^*$ is a $\sigma$-algebra.

    2. Show that the definition $\mu^*(B) = \mu(A)$ above does not depend on the choice of the decomposition $B = A \cup N'$, i.e., that $\mu(\hat{A}) = \mu(A)$ if $B = \hat{A} \cup \hat{N}'$ is another decomposition of $B$ into a set $\hat{A}$ in $\mathcal{S}$ and a subset $\hat{N}'$ of a null set in $\mathcal{S}$.

    3. Show that $\mu^*$ is a measure on $(S, \mathcal{S}^*)$ and that $(S, \mathcal{S}^*, \mu^*)$ is a complete measure space with the property that $\mu^*(A) = \mu(A)$, for $A \in \mathcal{S}$.

    Problem 2.36 (The Cantor set) The Cantor set is defined as the collection of all real numbers $x$ in $[0, 1]$ with the representation

    \[ x = \sum_{n=1}^{\infty} c_n 3^{-n}, \text{ where } c_n \in \{0, 2\}. \]

    Show that it is Borel-measurable and compute its Lebesgue measure.

    Problem 2.37 (The uniform measure on a circle) Let $S^1$ be the unit circle, and let $f : [0, 1) \to S^1$ be the winding map

    \[ f(x) = \big(\cos(2\pi x), \sin(2\pi x)\big), \quad x \in [0, 1). \]

    1. Show that the map $f$ is $(\mathcal{B}([0, 1)), \mathcal{S}^1)$-measurable, where $\mathcal{S}^1$ denotes the Borel $\sigma$-algebra on $S^1$ (with the topology inherited from $\mathbb{R}^2$).

    2. For $\alpha \in (0, 2\pi)$, let $R_\alpha$ denote the (counter-clockwise) rotation of $\mathbb{R}^2$ with center $(0, 0)$ and angle $\alpha$. Show that $R_\alpha(A) = \{R_\alpha(x) : x \in A\}$ is in $\mathcal{S}^1$ if and only if $A \in \mathcal{S}^1$.

    3. Let $\mu^1$ be the push-forward of the Lebesgue measure $\lambda$ by the map $f$. Show that $\mu^1$ is rotation-invariant, i.e., that $\mu^1(A) = \mu^1\big(R_\alpha(A)\big)$.

    (Note: The measure $\mu^1$ is called the uniform measure (or the uniform distribution) on $S^1$.)

    Problem 2.38 (Asymptotic densities) We say that the subset $A$ of $\mathbb{N}$ admits asymptotic density if the limit

    \[ d(A) = \lim_n \frac{\#(A \cap \{1, 2, \dots, n\})}{n} \]

    exists (remember that $\#$ denotes the number of elements of a set). Let $\mathcal{D}$ be the collection of all subsets of $\mathbb{N}$ which admit asymptotic density.

    1. Is $\mathcal{D}$ an algebra? A $\sigma$-algebra?

    2. Is the map $A \mapsto d(A)$ finitely-additive on $\mathcal{D}$? A measure?


    Problem 2.39 (A subset of the coin-toss space) An element in $\{-1, 1\}^{\mathbb{N}}$ (i.e., a sequence $s = (s_1, s_2, \dots)$ where $s_n \in \{-1, 1\}$ for all $n \in \mathbb{N}$) is said to be eventually periodic if there exist $N_0, K \in \mathbb{N}$ such that $s_n = s_{n+K}$ for all $n \geq N_0$. Let $P \subseteq \{-1, 1\}^{\mathbb{N}}$ be the collection of all eventually-periodic sequences. Show that $P$ is measurable in the product $\sigma$-algebra $\mathcal{S}$ and compute $\mu_C(P)$.

    Problem 2.40 (Regular measures) The measure space $(S, \mathcal{S}, \mu)$, where $(S, d)$ is a metric space and $\mathcal{S}$ is a $\sigma$-algebra on $S$ which contains the Borel $\sigma$-algebra $\mathcal{B}(d)$ on $S$, is called regular if for each $A \in \mathcal{S}$ and each $\varepsilon > 0$ there exist a closed set $C$ and an open set $O$ such that $C \subseteq A \subseteq O$ and $\mu(O \setminus C) < \varepsilon$.

    1. Suppose that $(S, \mathcal{S}, \mu)$ is a regular measure space, and let $(S, \mathcal{B}(d), \mu|_{\mathcal{B}(d)})$ be the measure space obtained from $(S, \mathcal{S}, \mu)$ by restricting the measure $\mu$ onto the $\sigma$-algebra of Borel sets. Show that $\mathcal{S} \subseteq \mathcal{B}(d)^*$, where $(S, \mathcal{B}(d)^*, (\mu|_{\mathcal{B}(d)})^*)$ is the completion of $(S, \mathcal{B}(d), \mu|_{\mathcal{B}(d)})$ (in the sense of Problem 2.35).

    2. Suppose that $(S, d)$ is a metric space and that $\mu$ is a finite measure on $\mathcal{B}(d)$. Show that $(S, \mathcal{B}(d), \mu)$ is a regular measure space.

    (Hint: Consider a collection $\mathcal{A}$ of subsets $A$ of $S$ such that for each $\varepsilon > 0$ there exist a closed set $C$ and an open set $O$ with $C \subseteq A \subseteq O$ and $\mu(O \setminus C) < \varepsilon$. Argue that $\mathcal{A}$ is a $\sigma$-algebra. Then show that each closed set can be written as an intersection of open sets; use (but prove, first) the fact that the map

    \[ x \mapsto d(x, C) = \inf\{d(x, y) : y \in C\} \]

    is continuous on $S$ for any nonempty $C \subseteq S$.)

    3. Show that $(S, \mathcal{B}(d), \mu)$ is regular if $\mu$ is not necessarily finite, but has the property that $\mu(A)$ […]

    […] $0$ such that $f(x) = \infty$ for $x \in A$. On the other hand, show that $\int f \, d\mu = 0$ for $f$ of the form

    \[ f(x) = \infty \cdot 1_A(x) = \begin{cases} \infty, & x \in A, \\ 0, & x \notin A, \end{cases} \]

    whenever $\mu(A) = 0$. (Note: Relate this to our convention that $\infty \times 0 = 0 \times \infty = 0$.)

    Finally, we are ready to define the integral for general measurable functions. Each $f \in \mathcal{L}^0$ can be written as a difference of two functions in $\mathcal{L}^0_+$ in many ways. There exists a decomposition which is, in a sense, minimal. We define

    \[ f^+ = \max(f, 0), \quad f^- = \max(-f, 0), \]

    so that $f = f^+ - f^-$ (and both $f^+$ and $f^-$ are measurable). The minimality we mentioned above is reflected in the fact that for each $x \in S$, at most one of $f^+(x)$ and $f^-(x)$ is non-zero.

    Definition 3.9 (Integrable functions) A function $f \in \mathcal{L}^0$ is said to be integrable if

    \[ \int f^+ \, d\mu < \infty \text{ and } \int f^- \, d\mu < \infty. \]

    The collection of all integrable functions in $\mathcal{L}^0$ is denoted by $\mathcal{L}^1$. The family of integrable functions is tailor-made for the following definition:


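    Definition 3.10 (The Lebesgue integral) For $f \in \mathcal{L}^1$, we define the Lebesgue integral $\int f \, d\mu$ of $f$ by

    \[ \int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu. \]

    Remark 3.11

    1. We have seen so far two cases in which an integral for a function $f \in \mathcal{L}^0$ can be defined: when $f \geq 0$ or when $f \in \mathcal{L}^1$. It is possible to combine the two and define the Lebesgue integral for all functions $f \in \mathcal{L}^0$ with $f^- \in \mathcal{L}^1$. The set of all such functions is denoted by $\mathcal{L}^{0-1}$ and we set

    \[ \int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu \in (-\infty, \infty], \text{ for } f \in \mathcal{L}^{0-1}. \]

    Note that no problems of the form $\infty - \infty$ arise here, and also note that, like $\mathcal{L}^0_+$, $\mathcal{L}^{0-1}$ is only a convex cone, and not a vector space. While the notation $\mathcal{L}^0$ and $\mathcal{L}^1$ is quite standard, the one we use for $\mathcal{L}^{0-1}$ is not.

    2. For $A \in \mathcal{S}$ and $f \in \mathcal{L}^{0-1}$ we usually write $\int_A f \, d\mu$ for $\int f 1_A \, d\mu$.

    Problem 3.12 Show that the Lebesgue integral remains a monotone operation in $\mathcal{L}^{0-1}$. More precisely, show that if $f \in \mathcal{L}^{0-1}$ and $g \in \mathcal{L}^0$ are such that $g(x) \geq f(x)$, for all $x \in S$, then $g \in \mathcal{L}^{0-1}$ and $\int g \, d\mu \geq \int f \, d\mu$.

    3.2 First properties of the integral

    The wider the generality to which a definition applies, the harder it is to prove theorems about it. Linearity of the integral is a trivial matter for functions in $\mathcal{L}^{\mathrm{Simp},0}_+$, but you will see how much we need to work to get it for $\mathcal{L}^0_+$. In fact, it seems that the easiest route towards linearity is through two important results: an approximation theorem and a convergence theorem. Before that, we need to pick some low-hanging fruit:

    Problem 3.13 Show that for $f, f_1, f_2 \in \mathcal{L}^0_+([0, \infty])$ and $\alpha \in [0, \infty]$ we have

    1. if $f_1(x) \leq f_2(x)$ for all $x \in S$ then $\int f_1 \, d\mu \leq \int f_2 \, d\mu$,

    2. $\int \alpha f \, d\mu = \alpha \int f \, d\mu$.

    Theorem 3.14 (Monotone convergence theorem) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{L}^0_+([0, \infty])$ with the property that

    \[ f_1(x) \leq f_2(x) \leq \dots \text{ for all } x \in S. \]

    Then

    \[ \lim_n \int f_n \, d\mu = \int f \, d\mu, \]

    where $f(x) = \lim_n f_n(x)$, for $x \in S$, so that $f \in \mathcal{L}^0_+([0, \infty])$.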


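    PROOF The (monotonicity) property (1) of Problem 3.13 above implies immediately that the sequence $\int f_n \, d\mu$ is non-decreasing and that $\int f_n \, d\mu \leq \int f \, d\mu$. Therefore, $\lim_n \int f_n \, d\mu \leq \int f \, d\mu$. To show the opposite inequality, we deal with the case $\int f \, d\mu$ […] $0$ and $g \in \mathcal{L}^{\mathrm{Simp},0}_+$ with $g(x) \leq f(x)$, for all $x \in S$, and $\int g \, d\mu \geq \int f \, d\mu - \varepsilon$ (the case $\int f \, d\mu = \infty$ is similar and left to the reader). For $0 < c$ […] $0$, the increasing convergence $f_n \nearrow f$ implies that $\cup_n A_n = S$. By non-negativity of $f_n$ and monotonicity,

    \[ \int f_n \, d\mu \geq \int f_n 1_{A_n} \, d\mu \geq c \int g 1_{A_n} \, d\mu, \]

    and so

    \[ \sup_n \int f_n \, d\mu \geq c \sup_n \int g 1_{A_n} \, d\mu. \]

    Let $g = \sum_{i=1}^{k} \alpha_i 1_{B_i}$ be a simple-function representation of $g$. Then

    \[ \int g 1_{A_n} \, d\mu = \int \sum_{i=1}^{k} \alpha_i 1_{B_i \cap A_n} \, d\mu = \sum_{i=1}^{k} \alpha_i \mu(B_i \cap A_n). \]

    Since $A_n \nearrow S$, we have $A_n \cap B_i \nearrow B_i$, $i = 1, \dots, k$, and the continuity of measure implies that $\mu(A_n \cap B_i) \nearrow \mu(B_i)$. Therefore,

    \[ \int g 1_{A_n} \, d\mu \nearrow \sum_{i=1}^{k} \alpha_i \mu(B_i) = \int g \, d\mu. \]

    Consequently,

    \[ \lim_n \int f_n \, d\mu = \sup_n \int f_n \, d\mu \geq c \int g \, d\mu, \text{ for all } c \in (0, 1), \]

    and the proof is completed when we let $c \to 1$.

    Remark 3.15

    1. The monotone convergence theorem is a testament to the incredible robustness of the Lebesgue integral. This stability with respect to limiting operations is one of the reasons why it is a de-facto industry standard.

    2. The monotonicity condition in the monotone convergence theorem cannot be dropped. Take, for example, $S = [0, 1]$, $\mathcal{S} = \mathcal{B}([0, 1])$, and $\mu = \lambda$ (the Lebesgue measure), and define

    \[ f_n = n 1_{(0, n^{-1}]}, \text{ for } n \in \mathbb{N}. \]

    Then $f_n(0) = 0$ for all $n \in \mathbb{N}$ and $f_n(x) = 0$ for $n > \frac{1}{x}$ and $x > 0$. In either case $f_n(x) \to 0$. On the other hand

    \[ \int f_n \, d\lambda = n \lambda\big((0, \tfrac{1}{n}]\big) = 1, \]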


    so that

    \[ \lim_n \int f_n \, d\lambda = 1 > 0 = \int \lim_n f_n \, d\lambda. \]

    We will see later that, while the equality of the limit of the integrals and the integral of the limit will not hold in general, they will always be ordered in a specific way if the functions $\{f_n\}_{n \in \mathbb{N}}$ are non-negative (that will be the content of Fatou's lemma below).
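The counterexample in part 2 of Remark 3.15 is simple enough to tabulate. In the sketch below (names ours) the integral is not computed numerically; it is the closed form $n \cdot \lambda((0, \frac{1}{n}]) = n \cdot \frac{1}{n}$ from the remark, evaluated in exact rational arithmetic.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n * indicator of the interval (0, 1/n]."""
    return n if 0 < x <= Fraction(1, n) else 0


def integral_f(n):
    """Integral of f_n against the Lebesgue measure: n * lambda((0, 1/n])."""
    return n * Fraction(1, n)
```

At a fixed point such as $x = \frac{1}{7}$ the values $f_n(x)$ vanish for every $n > 7$, while `integral_f(n)` stays equal to $1$ for all $n$, so the pointwise limit and the limit of the integrals disagree.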

    Proposition 3.16 (Approximation by simple functions) For each $f \in \mathcal{L}^0_+([0, \infty])$ there exists a sequence $\{g_n\}_{n \in \mathbb{N}} \subseteq \mathcal{L}^{\mathrm{Simp},0}_+$ such that

    1. $g_n(x) \leq g_{n+1}(x)$, for all $n \in \mathbb{N}$ and all $x \in S$,

    2. $g_n(x) \leq f(x)$ for all $x \in S$,

    3. $f(x) = \lim_n g_n(x)$, for all $x \in S$, and

    4. the convergence $g_n \to f$ is uniform on each set of the form $\{f \leq M\}$, $M > 0$, and, in particular, on the whole $S$ if $f$ is bounded.

    PROOF For $n \in \mathbb{N}$, let $A^n_k$, $k = 1, \dots, n2^n$, be a collection of subsets of $S$ given by

    \[ A^n_k = \{\tfrac{k-1}{2^n} \leq f < \tfrac{k}{2^n}\} = f^{-1}\big([\tfrac{k-1}{2^n}, \tfrac{k}{2^n})\big), \quad k = 1, \dots, n2^n. \]

    Note that the sets $A^n_k$, $k = 1, \dots, n2^n$, are disjoint and that the measurability of $f$ implies that $A^n_k \in \mathcal{S}$ for $k = 1, \dots, n2^n$. Define the function $g_n \in \mathcal{L}^{\mathrm{Simp},0}_+$ by

    \[ g_n = \sum_{k=1}^{n2^n} \frac{k-1}{2^n} 1_{A^n_k} + n 1_{\{f \geq n\}}. \]

    The statements 2., 3., and 4. follow immediately from the following three simple observations:

    $g_n(x) \leq f(x)$ for all $x \in S$,

    $g_n(x) = n$ if $f(x) = \infty$, and

    $g_n(x) > f(x) - 2^{-n}$ when $f(x) < n$.

    Finally, we leave it to the reader to check the simple fact that $\{g_n\}_{n \in \mathbb{N}}$ is non-decreasing.
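The construction in the proof acts pointwise on the value $y = f(x)$, so it can be sketched as a function of $y$ alone. The code below is an illustration under that reading; the name `g` and the use of `math.inf` to represent the value $\infty$ are our choices.

```python
import math
from fractions import Fraction


def g(n, y):
    """Value of g_n at a point where f takes the value y in [0, inf]."""
    if y >= n:                      # also covers y = math.inf
        return Fraction(n)
    # y lies in [(k-1)/2^n, k/2^n) with k - 1 = floor(2^n * y)
    return Fraction(math.floor(Fraction(y) * 2 ** n), 2 ** n)
```

For sample values of $y$ one can observe the three observations of the proof: $g_n \leq y$, $g_n = n$ when $y = \infty$, and $y - g_n < 2^{-n}$ once $n > y$; the sequence is also non-decreasing in $n$ on these samples.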

    Problem 3.17 Show, by means of an example, that the sequence $\{g_n\}_{n \in \mathbb{N}}$ would not necessarily be monotone if we defined it in the following way:

    \[ g_n = \sum_{k=1}^{n^2} \frac{k-1}{n} 1_{\{f \in [\frac{k-1}{n}, \frac{k}{n})\}} + n 1_{\{f \geq n\}}. \]


    Proposition 3.18 (Linearity of the integral for non-negative functions) For 1, 2 0 andf1, f2 L0+

    [0, ] we have

    (1f1+ 2f2) d= 1 f1 d + 2 f2 d.

    PROOF Thanks to Problem 3.13 it is enough to prove the statement for 1 =2 = 1. Let {g1n}nNand {g2n}nNbe sequences in LSimp,0+ which approximate f1 and f2 in the sense of Proposition 3.16.The sequence {gn}nNgiven bygn = g1n+ g2n,n N, has the following properties:

    gn LSimp,0+ forn N, gn(x)is a nondecreasing sequence for eachx S,

    gn(x)

    f1(x) + f2(x), for allx

    S.

    Therefore, we can apply the linearity of integration for the simple functions and the monotoneconvergence theorem (Theorem 3.14) to conclude that

    (f1+ f2) d= limn

    (g1n+ g

    2n) d= limn

    g1n d +

    g2n d

    =

    f1 d +

    f2 d.

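    Corollary 3.19 (Countable additivity of the integral) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{L}^0_+([0, \infty])$. Then

    \[ \int \sum_{n \in \mathbb{N}} f_n \, d\mu = \sum_{n \in \mathbb{N}} \int f_n \, d\mu. \]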

    PROOF Apply the monotone convergence theorem to the partial sums $g_n = f_1 + \dots + f_n$, and use linearity of integration.

    Once we have established a battery of properties for non-negative functions, an extension to $\mathcal{L}^1$ is not hard. We leave it to the reader to prove all the statements in the following problem:

    Problem 3.20 The family $\mathcal{L}^1$ of integrable functions has the following properties:

    1. $f \in \mathcal{L}^1$ iff $\int |f| \, d\mu < \infty$,
    2. $\mathcal{L}^1$ is a vector space,
    3. $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$, for $f \in \mathcal{L}^1$,
    4. $\int |f + g| \, d\mu \le \int |f| \, d\mu + \int |g| \, d\mu$, for all $f, g \in \mathcal{L}^1$.


    We conclude the present section with two results, which, together with the monotone convergence theorem, play the central role in the Lebesgue integration theory.

    Theorem 3.21 (Fatou's lemma) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{L}^0_+([0, \infty])$. Then

    \[ \int \liminf_n f_n \, d\mu \le \liminf_n \int f_n \, d\mu. \]

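    PROOF Set $g_n(x) = \inf_{k \ge n} f_k(x)$, so that $g_n \in \mathcal{L}^0_+([0, \infty])$ and $g_n(x)$ is a non-decreasing sequence for each $x \in S$. The monotone convergence theorem and the fact that $\liminf_n f_n(x) = \sup_n g_n(x) = \lim_n g_n(x)$, for all $x \in S$, imply that

    \[ \int g_n \, d\mu \nearrow \int \liminf_n f_n \, d\mu. \]

    On the other hand, $g_n(x) \le f_k(x)$ for all $k \ge n$, and so

    \[ \int g_n \, d\mu \le \inf_{k \ge n} \int f_k \, d\mu. \]

    Therefore,

    \[ \lim_n \int g_n \, d\mu \le \lim_n \inf_{k \ge n} \int f_k \, d\mu = \liminf_n \int f_n \, d\mu. \]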

    Remark 3.22

    1. The inequality in Fatou's lemma does not have to be an equality, even if the limit $\lim_n f_n(x)$ exists for all $x \in S$. You can use the sequence $\{f_n\}_{n \in \mathbb{N}}$ of Remark 3.15 to see that.

    2. Like the monotone convergence theorem, Fatou's lemma requires that all functions $\{f_n\}_{n \in \mathbb{N}}$ be non-negative. This requirement is necessary - to see that, simply consider the sequence $\{-f_n\}_{n \in \mathbb{N}}$, where $\{f_n\}_{n \in \mathbb{N}}$ is the sequence of Remark 3.15 above.

    3. The strength of Fatou's lemma comes from the fact that, apart from non-negativity, it requires no special properties for the sequence $\{f_n\}_{n \in \mathbb{N}}$. Its conclusion is not as strong as that of the monotone convergence theorem, but it proves to be very useful in various settings because it gives an upper bound (namely $\liminf_n \int f_n \, d\mu$) on the integral of the non-negative function $\liminf_n f_n$.
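
    A strict inequality in Fatou's lemma is easy to see on a concrete sequence. The sketch below is an illustration only: the sequence of Remark 3.15 is not reproduced in this section, so it assumes the standard example $f_n = n \mathbf{1}_{(0, 1/n]}$ on $[0, 1]$ with the Lebesgue measure, and the function names are hypothetical. Each integral equals $1$, while $f_n(x) \to 0$ for every $x$.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n * indicator of (0, 1/n], an assumed example on [0, 1]."""
    return n if 0 < x <= Fraction(1, n) else 0


def integral_f(n):
    """Lebesgue integral of f_n: height n times the length 1/n of (0, 1/n]."""
    return n * Fraction(1, n)


def tail_inf(x, start, stop):
    """inf of f_k(x) over start <= k < stop, a finite stand-in for inf over k >= start."""
    return min(f(k, x) for k in range(start, stop))


if __name__ == "__main__":
    x = Fraction(1, 10)
    print([f(n, x) for n in range(8, 13)])       # values of f_n(x) around n = 1/x
    print([integral_f(n) for n in range(1, 6)])  # the integrals do not shrink
```

    Under this assumption the left-hand side of Fatou's lemma is $0$ and the right-hand side is $1$. Replacing $f_n$ by $-f_n$ gives integrals equal to $-1$ and pointwise limit $0$, so the inequality fails without non-negativity.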

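    Theorem 3.23 (Dominated convergence theorem) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{L}^0$ with the property that there exists $g \in \mathcal{L}^1$ such that $|f_n(x)| \le g(x)$, for all $x \in S$ and all $n \in \mathbb{N}$. If $f(x) = \lim_n f_n(x)$ for all $x \in S$, then $f \in \mathcal{L}^1$ and

    \[ \int f \, d\mu = \lim_n \int f_n \, d\mu. \]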


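    PROOF The condition $|f_n(x)| \le g(x)$, for all $x \in S$ and all $n \in \mathbb{N}$, implies that $g(x) \ge 0$, for all $x \in S$. Since $f_n^+ \le g$, $f_n^- \le g$ and $g \in \mathcal{L}^1$, we immediately have $f_n \in \mathcal{L}^1$, for all $n \in \mathbb{N}$. The limiting function $f$ inherits the same properties $f^+ \le g$ and $f^- \le g$ from $\{f_n\}_{n \in \mathbb{N}}$, so $f \in \mathcal{L}^1$, too.

    Clearly $g(x) + f_n(x) \ge 0$ for all $n \in \mathbb{N}$ and all $x \in S$, so we can apply Fatou's lemma to get

    \[ \int g \, d\mu + \liminf_n \int f_n \, d\mu = \liminf_n \int (g + f_n) \, d\mu \ge \int \liminf_n (g + f_n) \, d\mu = \int (g + f) \, d\mu = \int g \, d\mu + \int f \, d\mu. \]

    In the same way (since $g(x) - f_n(x) \ge 0$, for all $x \in S$, as well), we have

    \[ \int g \, d\mu - \limsup_n \int f_n \, d\mu = \liminf_n \int (g - f_n) \, d\mu \ge \int \liminf_n (g - f_n) \, d\mu = \int (g - f) \, d\mu = \int g \, d\mu - \int f \, d\mu. \]

    Therefore

    \[ \limsup_n \int f_n \, d\mu \le \int f \, d\mu \le \liminf_n \int f_n \, d\mu, \]

    and, consequently,

    \[ \int f \, d\mu = \lim_n \int f_n \, d\mu. \]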

    Remark 3.24 The dominated convergence theorem combines the lack of monotonicity requirements of Fatou's lemma and the strong conclusion of the monotone convergence theorem. The price to be paid is the uniform boundedness requirement (domination of the whole sequence by a single integrable function $g$). There is a way to relax this requirement a little bit (using the concept of uniform integrability), but not too much. Still, it is an unexpectedly useful theorem.

    3.3 Null sets

    An important property of the Lebesgue integral, inherited directly from the underlying measure, is that it is blind to sets of measure zero. To make this statement precise, we need to introduce some language:

    Definition 3.25 (Null sets) Let $(S, \mathcal{S}, \mu)$ be a measure space.

    1. $N \in \mathcal{S}$ is said to be a null set if $\mu(N) = 0$.

    2. A function $f : S \to \mathbb{R}$ is called a null function if there exists a null set $N$ such that $f(x) = 0$ for $x \in N^c$.

    3. Two functions $f, g$ are said to be equal almost everywhere - denoted by $f = g$, a.e. - if $f - g$ is a null function, i.e., if there exists a null set $N$ such that $f(x) = g(x)$ for all $x \in N^c$.

    Remark 3.26

    1. In addition to almost-everywhere equality, one can talk about the almost-everywhere version of any relation between functions which can be defined on points. For example, we write $f \le g$, a.e., if $f(x) \le g(x)$ for all $x \in S$, except, maybe, for $x$ in some null set $N$.


    2. One can also define the a.e. equality of sets: we say that $A = B$, a.e., for $A, B \in \mathcal{S}$ if $\mathbf{1}_A = \mathbf{1}_B$, a.e. It is not hard to show (do it!) that $A = B$, a.e., if and only if $\mu(A \triangle B) = 0$. (Remember that $\triangle$ denotes the symmetric difference: $A \triangle B = (A \setminus B) \cup (B \setminus A)$.)

    3. When a property (equality of functions, e.g.) holds almost everywhere, the set where it fails to hold is not necessarily null. Indeed, there is no guarantee that it is measurable at all. What is true is that it is contained in a measurable (and null) set. Any such (measurable) null set is often referred to as the exceptional set.

    Problem 3.27 Prove the following statements:

    1. The almost-everywhere equality is an equivalence relation between functions.

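    2. The family $\{A \in \mathcal{S} : \mu(A) = 0 \text{ or } \mu(A^c) = 0\}$ is a $\sigma$-algebra (the so-called $\mu$-trivial $\sigma$-algebra).

    The blindness property of the Lebesgue integral we referred to above can now be stated formally:

    Proposition 3.28 (The blindness property of the Lebesgue integral) Suppose that $f = g$, a.e., for some $f, g \in \mathcal{L}^0_+$. Then

    \[ \int f \, d\mu = \int g \, d\mu. \]

    PROOF Let $N$ be an exceptional set for $f = g$, a.e., i.e., $f = g$ on $N^c$ and $\mu(N) = 0$. Then $f \mathbf{1}_{N^c} = g \mathbf{1}_{N^c}$, and so $\int f \mathbf{1}_{N^c} \, d\mu = \int g \mathbf{1}_{N^c} \, d\mu$. On the other hand, $f \mathbf{1}_N \le \infty \mathbf{1}_N$ and $\int \infty \mathbf{1}_N \, d\mu = 0$, so, by monotonicity, $\int f \mathbf{1}_N \, d\mu = 0$. Similarly, $\int g \mathbf{1}_N \, d\mu = 0$. It remains to use the additivity of integration to conclude that

    \[ \int f \, d\mu = \int f \mathbf{1}_{N^c} \, d\mu + \int f \mathbf{1}_N \, d\mu = \int g \mathbf{1}_{N^c} \, d\mu + \int g \mathbf{1}_N \, d\mu = \int g \, d\mu. \]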

    A statement which can be seen as a converse of Proposition 3.28 also holds:

    Problem 3.29 Let $f \in \mathcal{L}^0_+$ be such that $\int f \, d\mu = 0$. Show that $f = 0$, a.e. (Hint: What is the negation of the statement "$f = 0$, a.e." for $f \in \mathcal{L}^0_+$?)

    The monotone convergence theorem and the dominated convergence theorem, as stated, both require the sequence $\{f_n\}_{n \in \mathbb{N}}$ of functions to converge for each $x \in S$. A slightly weaker notion of convergence suffices, though:

    Definition 3.30 (Almost-everywhere convergence) A sequence of functions $\{f_n\}_{n \in \mathbb{N}}$ is said to converge almost everywhere to the function $f$, if there exists a null set $N$ such that

    \[ f_n(x) \to f(x) \text{ for all } x \in N^c. \]


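    Remark 3.31 If we want to emphasize that $f_n(x) \to f(x)$ for all $x \in S$, we say that $\{f_n\}_{n \in \mathbb{N}}$ converges to $f$ everywhere.

    Proposition 3.32 (Monotone (almost-everywhere) convergence theorem) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathcal{L}^0_+([0, \infty])$ with the property that

    \[ f_n \le f_{n+1}, \text{ a.e., for all } n \in \mathbb{N}. \]

    Then

    \[ \lim_n \int f_n \, d\mu = \int f \, d\mu, \]

    if $f \in \mathcal{L}^0_+$ and $f_n \to f$, a.e.

    PROOF There are "$\infty + 1$" a.e.-statements we need to deal with: one for each $n \in \mathbb{N}$ in $f_n \le f_{n+1}$, a.e., and an extra one when we assume that $f_n \to f$, a.e. Each of them comes with an exceptional set; more precisely, let $\{A_n\}_{n \in \mathbb{N}}$ be such that $f_n(x) \le f_{n+1}(x)$ for $x \in A_n^c$ and let $B$ be such that $f_n(x) \to f(x)$ for $x \in B^c$. Define $A \in \mathcal{S}$ by $A = (\cup_n A_n) \cup B$ and note that $A$ is a null set, as a countable union of null sets. Moreover, consider the functions $\tilde{f}$, $\{\tilde{f}_n\}_{n \in \mathbb{N}}$ defined by $\tilde{f} = f \mathbf{1}_{A^c}$, $\tilde{f}_n = f_n \mathbf{1}_{A^c}$. Thanks to the definition of the set $A$, $\tilde{f}_n(x) \le \tilde{f}_{n+1}(x)$, for all $n \in \mathbb{N}$ and $x \in S$; hence $\tilde{f}_n \nearrow \tilde{f}$, everywhere. Therefore, the monotone convergence theorem (Theorem 3.14) can be used to conclude that

    \[ \int \tilde{f}_n \, d\mu \to \int \tilde{f} \, d\mu. \]

    Finally, Proposition 3.28 implies that $\int \tilde{f}_n \, d\mu = \int f_n \, d\mu$ for $n \in \mathbb{N}$ and $\int \tilde{f} \, d\mu = \int f \, d\mu$.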

    Problem 3.33 State and prove a version of the dominated convergence theorem where the almost-everywhere convergence is used. Is it necessary for all $\{f_n\}_{n \in \mathbb{N}}$ to be dominated by $g$ for all $x \in S$, or only almost everywhere?

    Remark 3.34 There is a subtlety that needs to be pointed out. If a sequence $\{f_n\}_{n \in \mathbb{N}}$ of measurable functions converges to the function $f$ everywhere, then $f$ is necessarily a measurable function (see Proposition 1.43). However, if $f_n \to f$ only almost everywhere, there is no guarantee that $f$ is measurable. There is, however, always a measurable function which is equal to $f$ almost everywhere; you can take $\liminf_n f_n$, for example.

    3.4 Additional Problems

    Problem 3.35 (The monotone-class theorem) Prove the following result, known as the monotone-class theorem (remember that $a_n \nearrow a$ means that $a_n$ is a non-decreasing sequence and $a_n \to a$):

    Let $\mathcal{H}$ be a class of bounded functions from $S$ into $\mathbb{R}$ satisfying the following conditions:

    1. $\mathcal{H}$ is a vector space,
    2. the constant function $1$ is in $\mathcal{H}$, and
    3. if $\{f_n\}_{n \in \mathbb{N}}$ is a sequence of non-negative functions in $\mathcal{H}$ such that $f_n(x) \nearrow f(x)$, for all $x \in S$, and $f$ is bounded, then $f \in \mathcal{H}$.

    Then, if $\mathcal{H}$ contains the indicator $\mathbf{1}_A$ of every set $A$ in some $\pi$-system $\mathcal{P}$, then $\mathcal{H}$ necessarily contains every bounded $\sigma(\mathcal{P})$-measurable function on $S$.


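    (Hint: Use Theorems 3.16 and 2.16.)

    Problem 3.36 (A form of continuity for Lebesgue integration) Let $(S, \mathcal{S}, \mu)$ be a measure space, and suppose that $f \in \mathcal{L}^1$. Show that for each $\varepsilon > 0$ there exists $\delta > 0$ such that if $A \in \mathcal{S}$ and $\mu(A) < \delta$, then

    \[ \left| \int_A f \, d\mu \right| < \varepsilon. \]

    Problem 3.37 (Sums as integrals) Consider the measure space $(\mathbb{N}, 2^{\mathbb{N}}, \mu)$, where $\mu$ is the counting measure.

    1. For a function $f : \mathbb{N} \to [0, \infty]$, show that

    \[ \int f \, d\mu = \sum_{n=1}^{\infty} f(n). \]

    2. Use the monotone convergence theorem to show the following special case of Fubini's theorem:

    \[ \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} a_{kn} = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} a_{kn}, \]

    whenever $\{a_{kn} : k, n \in \mathbb{N}\}$ is a double sequence in $[0, \infty]$.

    3. Show that $f : \mathbb{N} \to \mathbb{R}$ is in $\mathcal{L}^1$ if and only if the series $\sum_{n=1}^{\infty} f(n)$ converges absolutely.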

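    Problem 3.38 (A criterion for integrability) Let $(S, \mathcal{S}, \mu)$ be a finite measure space. For $f \in \mathcal{L}^0_+$, show that $f \in \mathcal{L}^1$ if and only if

    \[ \sum_{n \in \mathbb{N}} \mu(\{f \ge n\}) < \infty. \]

    Problem 3.39 (A limit of integrals) Let $(S, \mathcal{S}, \mu)$ be a measure space, and suppose $f \in \mathcal{L}^1_+$ is such that $\int f \, d\mu = c > 0$. Show that the limit

    \[ \lim_n \int n \log\big( 1 + (f/n)^{\alpha} \big) \, d\mu \]

    exists in $[0, \infty]$ for each $\alpha > 0$ and compute its value. (Hint: Prove and use the inequality $\log(1 + x^{\alpha}) \le \alpha x$, valid for $x \ge 0$ and $\alpha \ge 1$.)

    Problem 3.40 (Integrals converge but the functions don't . . . ) Construct a sequence $\{f_n\}_{n \in \mathbb{N}}$ of continuous functions $f_n : [0, 1] \to [0, 1]$ such that $\int f_n \, d\lambda \to 0$, but the sequence $\{f_n(x)\}_{n \in \mathbb{N}}$ is divergent for each $x \in [0, 1]$.

    Problem 3.41 (. . . or they do, but are not dominated) Construct a sequence $\{f_n\}_{n \in \mathbb{N}}$ of continuous functions $f_n : [0, 1] \to [0, \infty)$ such that $\int f_n \, d\lambda \to 0$, and $f_n(x) \to 0$ for all $x$, but $f \notin \mathcal{L}^1$, where $f(x) = \sup_n f_n(x)$.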


    Problem 3.42 (Functions measurable in the generated $\sigma$-algebra) Let $S \ne \emptyset$ be a set and let $f : S \to \mathbb{R}$ be a function. Prove that a function $g : S \to \mathbb{R}$ is measurable with respect to the pair $(\sigma(f), \mathcal{B}(\mathbb{R}))$ if and only if there exists a Borel function $h : \mathbb{R} \to \mathbb{R}$ such that $g = h \circ f$.

    Problem 3.43 (A change-of-variables formula) Let $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ be measure spaces, and let $F : S \to T$ be a measurable function with the property that $\nu = F_* \mu$ (i.e., $\nu$ is the push-forward of $\mu$ through $F$). Show that for every $f \in \mathcal{L}^0_+(T, \mathcal{T})$ or $f \in \mathcal{L}^1(T, \mathcal{T}, \nu)$, we have

    \[ \int f \, d\nu = \int (f \circ F) \, d\mu. \]

    Problem 3.44 (The Riemann Integral) A finite collection $\Delta = \{t_0, \dots, t_n\}$, where $a = t_0 < t_1 < \dots < t_n = b$ and $n \in \mathbb{N}$, is called a partition of the interval $[a, b]$. The set of all partitions of $[a, b]$ is denoted by $P([a, b])$.

    For a bounded function $f : [a, b] \to \mathbb{R}$ and $\Delta = \{t_0, \dots, t_n\} \in P([a, b])$, we define its upper and lower Darboux sums $U(f, \Delta)$ and $L(f, \Delta)$ by

    \[ U(f, \Delta) = \sum_{k=1}^{n} \Big( \sup_{t \in (t_{k-1}, t_k]} f(t) \Big) (t_k - t_{k-1}) \]

    and

    \[ L(f, \Delta) = \sum_{k=1}^{n} \Big( \inf_{t \in (t_{k-1}, t_k]} f(t) \Big) (t_k - t_{k-1}). \]

    A function $f : [a, b] \to \mathbb{R}$ is said to be Riemann integrable if it is bounded and

    \[ \sup_{\Delta \in P([a,b])} L(f, \Delta) = \inf_{\Delta \in P([a,b])} U(f, \Delta). \]

    In that case the common value of the supremum and the infimum above is called the Riemann integral of the function $f$ - denoted by $(R) \int_a^b f(x) \, dx$.
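
    The Darboux sums can be computed directly in simple cases. The sketch below is illustrative and not part of the problem: it assumes a continuous non-decreasing $f$, for which the supremum over $(t_{k-1}, t_k]$ is $f(t_k)$ and the infimum is $f(t_{k-1})$, and the helper names are hypothetical. Rational arithmetic keeps the sums exact.

```python
from fractions import Fraction


def darboux_sums_increasing(f, partition):
    """Upper and lower Darboux sums of a continuous non-decreasing f.

    For such f, the sup over (t_{k-1}, t_k] is f(t_k) and the inf is f(t_{k-1}).
    """
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(t1) * (t1 - t0) for t0, t1 in pairs)
    lower = sum(f(t0) * (t1 - t0) for t0, t1 in pairs)
    return upper, lower


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * Fraction(k, n) for k in range(n + 1)]


if __name__ == "__main__":
    for n in (1, 10, 100):
        U, L = darboux_sums_increasing(lambda t: t * t, uniform_partition(0, 1, n))
        print(n, L, U, U - L)
```

    For $f(t) = t^2$ on $[0, 1]$ and a uniform partition into $n$ pieces, the gap $U - L$ equals $(f(1) - f(0))/n = 1/n$, so the supremum of the lower sums and the infimum of the upper sums coincide.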

    1. Suppose that a bounded Borel-measurable function $f : [a, b] \to \mathbb{R}$ is Riemann-integrable. Show that

    \[ \int_{[a,b]} f \, d\lambda = (R) \int_a^b f(x) \, dx. \]

    2. Find an example of a bounded and Borel-measurable function $f : [a, b] \to \mathbb{R}$ which is not Riemann-integrable.

    3. Show that every continuous function is Riemann integrable.

    4. It can be shown that for a bounded Borel-measurable function $f : [a, b] \to \mathbb{R}$ the following criterion holds (and you can use it without proof): $f$ is Riemann integrable if and only if there exists a Borel set $D \subseteq [a, b]$ with $\lambda(D) = 0$ such that $f$ is continuous at $x$, for each $x \in [a, b] \setminus D$. Show that

    - all monotone functions are Riemann-integrable,
    - $f \circ g$ is Riemann integrable if $f : [c, d] \to \mathbb{R}$ is continuous and $g : [a, b] \to [c, d]$ is Riemann integrable,


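    - products of Riemann-integrable functions are Riemann-integrable.

    5. Let $([a, b], \mathcal{B}([a, b])^*, \lambda^*)$ be the completion of $([a, b], \mathcal{B}([a, b]), \lambda)$. Show that each Riemann-integrable function on $[a, b]$ is $\mathcal{B}([a, b])^*$-measurable. (Hint: Pick a sequence $\{\Delta_n\}_{n \in \mathbb{N}}$ in $P([a, b])$ so that $\Delta_n \subseteq \Delta_{n+1}$ and $U(f, \Delta_n) - L(f, \Delta_n) \to 0$. Using those partitions and the function $f$, define two sequences of Borel-measurable functions $\{\underline{f}_n\}_{n \in \mathbb{N}}$ and $\{\overline{f}_n\}_{n \in \mathbb{N}}$ so that $\underline{f}_n \nearrow \underline{f}$, $\overline{f}_n \searrow \overline{f}$, $\underline{f} \le f \le \overline{f}$, and

    \[ \int (\overline{f} - \underline{f}) \, d\lambda = 0. \]

    Conclude that $f$ agrees with a Borel measurable function on a complement of a subset of the set $\{\underline{f} \ne \overline{f}\}$, which has Lebesgue measure $0$.)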


    Chapter 4

    Lebesgue Spaces and Inequalities

    4.1 Lebesgue spaces

    We have seen how the family of all functions $f \in \mathcal{L}^1$ forms a vector space and how the map $f \mapsto \|f\|_{L^1}$, from $\mathcal{L}^1$ to $[0, \infty)$, defined by $\|f\|_{L^1} = \int |f| \, d\mu$, has the following properties:

    1. $f = 0$ implies $\|f\|_{L^1} = 0$, for $f \in \mathcal{L}^1$,
    2. $\|f + g\|_{L^1} \le \|f\|_{L^1} + \|g\|_{L^1}$, for $f, g \in \mathcal{L}^1$,
    3. $\|\alpha f\|_{L^1} = |\alpha| \, \|f\|_{L^1}$, for $\alpha \in \mathbb{R}$ and $f \in \mathcal{L}^1$.

    Any map from a vector space into $[0, \infty)$ with the properties 1., 2., and 3. above is called a pseudo norm. A pair $(V, \|\cdot\|)$, where $V$ is a vector space and $\|\cdot\|$ is a pseudo norm on $V$, is called a pseudo-normed space.

    If a pseudo norm happens to satisfy the (stronger) axiom

    1'. $f = 0$ if and only if $\|f\|_{L^1} = 0$, for $f \in \mathcal{L}^1$,

    instead of 1., it is called a norm, and the pair $(V, \|\cdot\|)$ is called a normed space.

    The pseudo-norm $\|\cdot\|_{L^1}$ is, in general, not a norm. Indeed, by Problem 3.29, we have $\|f\|_{L^1} = 0$ iff $f = 0$, a.e., and unless $\emptyset$ is the only null set, there are functions different from the constant function $0$ with this property.
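
    A small finite example makes the failure of axiom 1'. concrete. The following Python sketch is an illustration under assumed data: the three-point space, the weights and the name `l1_pseudonorm` are all hypothetical. The measure gives mass zero to one point, so a function supported there is non-zero but has pseudo norm $0$.

```python
from fractions import Fraction


def l1_pseudonorm(f, weights):
    """||f|| = sum_i |f(i)| * mu({i}) for a measure mu on {0, ..., len(weights) - 1}."""
    return sum(abs(v) * w for v, w in zip(f, weights))


# mu({0}) = mu({1}) = 1/2 and mu({2}) = 0, so {2} is a non-empty null set
weights = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
f = [0, 0, 7]   # not the zero function, but f = 0 a.e.
g = [1, -2, 3]

if __name__ == "__main__":
    print(l1_pseudonorm(f, weights), l1_pseudonorm(g, weights))
```

    Properties 1., 2., and 3. still hold for this map; only the implication from $\|f\| = 0$ to $f = 0$ is lost.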

    Remark 4.1 There is a relatively simple procedure one can use to turn a pseudo-normed space $(V, \|\cdot\|)$ into a normed one. Declare two elements $x, y$ in $V$ equivalent (denoted by $x \sim y$) if $\|y - x\| = 0$, and let $\tilde{V}$ be the quotient space $V / \sim$ (the set of all equivalence classes). It is easy to show that $\|x\| = \|y\|$ whenever $x \sim y$, so the pseudo-norm $\|\cdot\|$ can be seen as defined on $\tilde{V}$. Moreover, it follows directly from the properties of the pseudo norm that $(\tilde{V}, \|\cdot\|)$ is, in fact, a normed space. The idea is, of course, to bundle together the elements of $V$ which differ by such a small amount that $\|\cdot\|$ cannot detect it.

    This construction can be applied to the case of the pseudo-norm $\|\cdot\|_{L^1}$ on $\mathcal{L}^1$, and the resulting normed space is denoted by $L^1$. The normed space $L^1$ has properties similar to those of $\mathcal{L}^1$, but its elements are not functions anymore - they are equivalence classes of measurable functions. Such a point of view is very useful in analysis, but it sometimes leads to confusion in probability (especially when one works with stochastic processes with infinite time-index sets). Therefore, we will stick to $\mathcal{L}^1$ and deal with the fact that it is only a pseudo-normed space.


    A pseudo-norm $\|\cdot\|$ on a vector space $V$ can be used to define a pseudo metric (pseudo-distance function) on $V$ by the following simple prescription:

    \[ d(x, y) = \|y - x\|, \quad x, y \in V. \]

    Just like a pseudo norm, a pseudo metric has most of the properties of a metric:

    1. $d(x, y) \in [0, \infty)$, for $x, y \in V$,
    2. $d(x, y) + d(y, z) \ge d(x, z)$, for $x, y, z \in V$,
    3. $d(x, y) = d(y, x)$, for $x, y \in V$,
    4. $x = y$ implies $d(x, y) = 0$, for $x, y \in V$.

    The missing axiom is the stronger version of 4. given by

    4'. $x = y$ if and only if $d(x, y) = 0$, for $x, y \in V$.

    Luckily, a pseudo metric is sufficient for the notion of convergence, where we say that a sequence $\{x_n\}_{n \in \mathbb{N}}$ in $V$ converges towards $x \in V$ if $d(x_n, x) \to 0$, as $n \to \infty$. If we apply it to our original example $(\mathcal{L}^1, \|\cdot\|_{L^1})$, we have the following definition:

    Definition 4.2 (Convergence in $\mathcal{L}^1$) For a sequence $\{f_n\}_{n \in \mathbb{N}}$ in $\mathcal{L}^1$, we say that $\{f_n\}_{n \in \mathbb{N}}$ converges to $f$ in $\mathcal{L}^1$ if

    \[ \|f_n - f\|_{L^1} \to 0. \]

    To get some intuition about convergence in $\mathcal{L}^1$, here is a problem:

    Problem 4.3 Show that the conclusion of the dominated convergence theorem (Theorem 3.23) can be replaced by "$f_n \to f$ in $\mathcal{L}^1$". Does the original conclusion follow from the new one?

    The only problem that arises when one defines convergence using a pseudo metric (as opposed to a bona-fide metric) is that limits are not unique. This is, however, merely an inconvenience and one gets used to it quite readily:

    Problem 4.4 Suppose that $\{f_n\}_{n \in \mathbb{N}}$ converges to $f$ in $\mathcal{L}^1$. Show that $\{f_n\}_{n \in \mathbb{N}}$ also converges to $g \in \mathcal{L}^1$ if and only if $f = g$, a.e.

    In addition to the space $\mathcal{L}^1$, one can introduce many other vector spaces of similar flavor. For $p \in [1, \infty)$, let $\mathcal{L}^p$ denote the family of all functions $f \in \mathcal{L}^0$ such that $|f|^p \in \mathcal{L}^1$.

    Problem 4.5 Show that, for each $p \in (0, \infty)$, there exists a constant $C > 0$ (depending on $p$, but independent of $a, b$) such that $(a + b)^p \le C(a^p + b^p)$ for all $a, b \ge 0$. Deduce that $\mathcal{L}^p$ is a vector space for all $p \in (0, \infty)$.

    We will see soon that the map $\|\cdot\|_{L^p}$, defined by

    \[ \|f\|_{L^p} = \Big( \int |f|^p \, d\mu \Big)^{1/p}, \quad f \in \mathcal{L}^p, \]


    is a pseudo norm on $\mathcal{L}^p$. The hard part of the proof - showing that $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$ - will be a direct consequence of an important inequality of Minkowski which will be proved below.

    Finally, there is a nice way to extend the definition of $\mathcal{L}^p$ to $p = \infty$.

    Definition 4.6 (Essential supremum) A number $a \in \bar{\mathbb{R}}$ is called an essential supremum of the function $f \in \mathcal{L}^0$ - and is denoted by $a = \operatorname{esssup} f$ - if

    1. $\mu(\{f > a\}) = 0$,
    2. $\mu(\{f > b\}) > 0$ for any $b < a$.

    A function $f \in \mathcal{L}^0$ with $\operatorname{esssup} f <$ [...] $> 0\}) = \lambda(\{1, 1/2, 1/3, \dots\}) = 0$, but $\sup_{x \in [0,1]} f(x) = \infty$.

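    Let $\mathcal{L}^{\infty}$ denote the family of all essentially bounded functions in $\mathcal{L}^0$. Define $\|f\|_{L^{\infty}} = \operatorname{esssup} |f|$, for $f \in \mathcal{L}^{\infty}$.

    Problem 4.8 Show that $\mathcal{L}^{\infty}$ is a vector space, and that $\|\cdot\|_{L^{\infty}}$ is a pseudo-norm on $\mathcal{L}^{\infty}$.

    The convergence in $\mathcal{L}^p$ for $p > 1$ is defined similarly to the $\mathcal{L}^1$-convergence:

    Definition 4.9 (Convergence in $\mathcal{L}^p$) Let $p \in [1, \infty]$. We say that a sequence $\{f_n\}_{n \in \mathbb{N}}$ in $\mathcal{L}^p$ converges in $\mathcal{L}^p$ to $f \in \mathcal{L}^p$ if

    \[ \|f_n - f\|_{L^p} \to 0, \text{ as } n \to \infty. \]

    Problem 4.10 Show that $\{f_n\}_{n \in \mathbb{N}} \subseteq \mathcal{L}^{\infty}$ converges to $f \in \mathcal{L}^{\infty}$ in $\mathcal{L}^{\infty}$ if and only if there exist functions $\{\tilde{f}_n\}_{n \in \mathbb{N}}$, $\tilde{f}$ in $\mathcal{L}^0$ such that

    1. $\tilde{f}_n = f_n$, a.e., and $\tilde{f} = f$, a.e., and

    2. $\tilde{f}_n \to \tilde{f}$ uniformly (we say that $g_n \to g$ uniformly if $\sup_x |g_n(x) - g(x)| \to 0$, as $n \to \infty$).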


    4.2 Inequalities

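    Definition 4.11 (Conjugate exponents) We say that $p, q \in [1, \infty]$ are conjugate exponents if $\frac{1}{p} + \frac{1}{q} = 1$.

    Lemma 4.12 (Young's inequality) For all $x, y \ge 0$ and conjugate exponents $p, q \in [1, \infty)$ we have

    \[ \frac{x^p}{p} + \frac{y^q}{q} \ge xy. \tag{4.1} \]

    The equality holds if and only if $x^p = y^q$.

    PROOF If $x = 0$ or $y = 0$, the inequality trivially holds, so we assume that $x > 0$ and $y > 0$. The function $\log$ is strictly concave on $(0, \infty)$ and $\frac{1}{p} + \frac{1}{q} = 1$, so

    \[ \log\big( \tfrac{1}{p} \xi + \tfrac{1}{q} \eta \big) \ge \tfrac{1}{p} \log(\xi) + \tfrac{1}{q} \log(\eta), \]

    for all $\xi, \eta > 0$, with equality if and only if $\xi = \eta$. If we substitute $\xi = x^p$ and $\eta = y^q$, and exponentiate both sides, we get

    \[ \frac{x^p}{p} + \frac{y^q}{q} \ge \exp\big( \tfrac{1}{p} \log(x^p) + \tfrac{1}{q} \log(y^q) \big) = xy, \]

    with equality if and only if $x^p = y^q$.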
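
    Young's inequality and its equality case are easy to check numerically. The sketch below is only a sanity check in floating-point arithmetic, not a proof; the function name `young_gap` and the sample values are hypothetical choices.

```python
def young_gap(x, y, p):
    """x^p/p + y^q/q - x*y for the conjugate exponent q = p/(p - 1), with p > 1."""
    q = p / (p - 1)
    return x**p / p + y**q / q - x * y


if __name__ == "__main__":
    p = 3.0
    q = p / (p - 1)
    y = 8.0
    x = (y**q) ** (1 / p)  # chosen so that x^p = y^q, the equality case
    print(young_gap(x, y, p))      # close to zero, up to rounding
    print(young_gap(1.0, y, p))    # strictly positive: x^p differs from y^q
```

    The gap vanishes (up to rounding) exactly when $x^p = y^q$ and is positive otherwise, as the lemma states.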

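    Remark 4.13 If you do not want to be fancy, you can prove Young's inequality by locating the maximum of the function $x \mapsto xy - \frac{1}{p} x^p$ using nothing more than elementary calculus.

    Proposition 4.14 (Hölder's inequality) Let $p, q \in [1, \infty]$ be conjugate exponents. For $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^q$, we have

    \[ \int |fg| \, d\mu \le \|f\|_{L^p} \|g\|_{L^q}. \tag{4.2} \]

    The equality holds if and only if there exist constants $\alpha, \beta \ge 0$ with $\alpha + \beta > 0$ such that $\alpha |f|^p = \beta |g|^q$, a.e.

    PROOF We assume that $1 < p, q < \infty$ and that $\|f\|_{L^p} > 0$ and $\|g\|_{L^q} > 0$ - otherwise, the inequality is trivially satisfied. We define $\tilde{f} = |f| / \|f\|_{L^p}$ and $\tilde{g} = |g| / \|g\|_{L^q}$, so that $\|\tilde{f}\|_{L^p} = \|\tilde{g}\|_{L^q} = 1$.

    Plugging $\tilde{f}$ for $x$ and $\tilde{g}$ for $y$ in Young's inequality (Lemma 4.12 above) and integrating, we get

    \[ \frac{1}{p} \int \tilde{f}^p \, d\mu + \frac{1}{q} \int \tilde{g}^q \, d\mu \ge \int \tilde{f} \tilde{g} \, d\mu, \tag{4.3} \]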


    and consequently,

    \[ \int \tilde{f} \tilde{g} \, d\mu \le 1, \tag{4.4} \]

    because $\int \tilde{f}^p \, d\mu = \|\tilde{f}\|^p_{L^p} = 1$, $\int \tilde{g}^q \, d\mu = \|\tilde{g}\|^q_{L^q} = 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Hölder's inequality (4.2) now follows by multiplying both sides of (4.4) by $\|f\|_{L^p} \|g\|_{L^q}$.

    If the equality in (4.2) holds, then it also holds a.e. in Young's inequality (4.3). Therefore, the equality will hold if and only if $\|g\|^q_{L^q} |f|^p = \|f\|^p_{L^p} |g|^q$, a.e. The reader will check that if a pair of constants $\alpha$, $\beta$ as in the s