Probability Theory Presentation 12

8/8/2019 Probability Theory Presentation 12


BST 401 Probability Theory

Xing Qiu Ha Youn Lee

Department of Biostatistics and Computational Biology, University of Rochester

October 14, 2010

Qiu, Lee BST 401

Outline

1 Radon-Nikodym Theorem

2 Introduction of Conditional Expectation


Motivation (I)

A little refresher of your undergraduate probability theory:

There are two types of probability distributions: continuous ones and discrete ones.

Continuous and discrete distributions have different definitions of density functions (p.d.f.).

You can have a mixture of the two. Example: the survey question "How much tax did you pay for year 2008?" A small but non-trivial proportion of U.S. residents didn't have to pay, so you could describe the answer as a discrete random variable taking the values 0 (did not pay) and 1 (paid). But that's a bad survey design.

Better way: for those who did pay, record how much they paid, which can be modeled as a continuous random variable.


Motivation (II)

The more challenging problem: are these two the only types of probability measures? I.e., for every probability measure (or L-S measure), can we always decompose it into a continuous part and a discrete part?

The Radon-Nikodym theorem and the Lebesgue decomposition theorem are all about the structure of L-S measures (probabilities).

Together they claim that every L-S measure can be decomposed (w.r.t. the Lebesgue measure) into an absolutely continuous part and a singular part.

The singular part is much like, but not exactly restricted to, the discrete measures.

The absolutely continuous part can be expressed by integrating a density function w.r.t. the Lebesgue measure.


Motivation (III)

In this sense, the R-N theorem a) defines an abstract derivative between two measures $\nu$ and $\mu$, denoted as $\frac{d\nu}{d\mu}$; b) provides a criterion based on which we can check whether $\frac{d\nu}{d\mu}$ exists or not.

Just like the Lebesgue-Stieltjes integral is an extension of the usual Riemann integral, the Radon-Nikodym derivative is an extension of the usual derivative.

$$\underbrace{\int f(x)\,dx}_{\text{Riemann integral}} = \underbrace{\int f(x)\,d\lambda(x)}_{\text{L-S integral}}, \qquad \underbrace{\frac{dF(x)}{dx}}_{\text{Calculus derivative}} = \underbrace{\frac{d\nu}{d\lambda} = f(x)}_{\text{R-N derivative}},$$

where $\lambda$ is the Lebesgue measure and $F$ ($f$) is the distribution (density) function of $\nu$.


Hahn decomposition theorem

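Every signed measure can be decomposed into a positive and a negative part.

Recall the decomposition of a function into its positive and negative parts, $f = f^+ - f^-$.

Let $\nu$ be a signed measure (i.e., almost a measure, except that $\nu(A)$ can be negative. Analogy: electric charge).

Partition the whole space $\Omega$ into a positive set $\Omega^+$ and a negative set $\Omega^-$: $\Omega = \Omega^+ \cup \Omega^-$, $\Omega^+ \cap \Omega^- = \emptyset$.

For each $A \in \mathcal{F}$, $\nu(A \cap \Omega^+) \ge 0$ and $\nu(A \cap \Omega^-) \le 0$.

This decomposition is unique up to a null set. If $\Omega_0$ is a null set (every measurable subset of $\Omega_0$ has $\nu$-measure $0$), then $\Omega^+ \cup \Omega_0$ and $\Omega^- \setminus \Omega_0$ is an equivalent Hahn decomposition.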
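On a finite space the Hahn decomposition can be read off point by point. The following is a minimal sketch, assuming a hypothetical four-point space with made-up point masses (none of these numbers come from the lecture). It builds one positive/negative split, checks the two sign conditions on every subset, and then moves the zero-mass point to the other side to illustrate uniqueness up to a null set.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical signed measure on a four-point space, given by point masses.
charge = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(0), "d": Fraction(-3)}

def nu(subset):
    """Signed measure of a subset: the sum of its point masses."""
    return sum((charge[s] for s in subset), Fraction(0))

def is_hahn(plus, minus, subsets):
    """Check nu(A & plus) >= 0 and nu(A & minus) <= 0 for every subset A."""
    return all(nu(A & plus) >= 0 and nu(A & minus) <= 0 for A in subsets)

# All subsets of the space (the sigma-algebra here is the full power set).
subsets = [set(c) for r in range(len(charge) + 1) for c in combinations(charge, r)]

# One Hahn decomposition: points with nonnegative mass form the positive set.
omega_plus = {s for s, m in charge.items() if m >= 0}
omega_minus = set(charge) - omega_plus
hahn_ok = is_hahn(omega_plus, omega_minus, subsets)

# Moving the null point "c" to the negative set gives an equivalent decomposition.
alt_plus = omega_plus - {"c"}
alt_minus = omega_minus | {"c"}
alt_ok = is_hahn(alt_plus, alt_minus, subsets)
```

Exact fractions are used so that the sign checks are not affected by floating-point rounding.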


Singularity

$\mu_1$, $\mu_2$ are measures on the same $\mathcal{F}$.

They are said to be mutually singular, written as $\mu_1 \perp \mu_2$, if they concentrate on disjoint sets, i.e., $\exists B \in \mathcal{F}$ such that $\mu_1(B) = 0$ and $\mu_2(B^c) = 0$.

Examples: a space split into two parts, with each measure living on one part; discrete measures w.r.t. the Lebesgue measure.

Not all measures that are singular w.r.t. the Lebesgue measure are discrete measures. a) On $\mathbb{R}^2$, the uniform measure on a circle/line; b) on $\mathbb{R}^1$, the uniform measure on the Cantor set (see Wikipedia).
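The Cantor set example can be made concrete with a short computation. The sketch below builds the intervals left after n middle-third removals and measures their total Lebesgue length. The uniform measure on the Cantor set gives each of the $2^n$ remaining intervals mass $2^{-n}$, so it keeps total mass 1, while the Lebesgue length $(2/3)^n$ shrinks to 0. The function names are illustrative choices, not from the lecture.

```python
from fractions import Fraction

def cantor_stage(n):
    """Intervals remaining after n middle-third removals, as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))   # keep the left third
            nxt.append((b - third, b))   # keep the right third
        intervals = nxt
    return intervals

def lebesgue_length(intervals):
    """Total length of a list of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))

stage5 = cantor_stage(5)
length5 = lebesgue_length(stage5)            # equals (2/3)**5
cantor_mass5 = len(stage5) * Fraction(1, 2 ** 5)  # each interval carries mass 2**-5
```

In the limit the Cantor measure sits entirely on a set of Lebesgue measure zero, yet it has no point masses, so it is singular without being discrete.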


Jordan-Hahn decomposition

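A natural consequence of the Hahn decomposition theorem.

$\nu = \nu^+ - \nu^-$, where $\nu^+$, $\nu^-$ are two measures (meaning: with nonnegative values) that are mutually singular.

$\nu^+(A) = \nu(A \cap \Omega^+)$, for all $A \in \mathcal{F}$.

$\nu^-(A) = -\nu(A \cap \Omega^-)$, for all $A \in \mathcal{F}$.

At least one of $\nu^+$, $\nu^-$ must be finite, otherwise $\nu(\Omega) = \infty - \infty$ is not well defined.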


Total variation

Sometimes, $\nu^+$, $\nu^-$ are called the upper/lower variations of $\nu$.

$|\nu| = \nu^+ + \nu^-$ is called the total variation of $\nu$. Sort of the absolute value of a measure.
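The Jordan-Hahn decomposition and the total variation can be computed directly on a finite space. This is a minimal sketch assuming a hypothetical signed measure given by made-up point masses. The upper and lower variations are written here as `nu_plus` and `nu_minus`, and the positive and negative sets are taken to be the points of nonnegative and negative mass.

```python
from fractions import Fraction

# Hypothetical signed measure on a four-point space, given by point masses.
charge = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(0), "d": Fraction(-3)}
omega_plus = {s for s, m in charge.items() if m >= 0}
omega_minus = set(charge) - omega_plus

def nu(A):
    """Signed measure of A."""
    return sum((charge[s] for s in A), Fraction(0))

def nu_plus(A):
    """Upper variation: nu restricted to the positive set."""
    return nu(set(A) & omega_plus)

def nu_minus(A):
    """Lower variation: minus nu restricted to the negative set (so it is >= 0)."""
    return -nu(set(A) & omega_minus)

def total_variation(A):
    """|nu|(A) = nu_plus(A) + nu_minus(A)."""
    return nu_plus(A) + nu_minus(A)

A = {"a", "b", "d"}
jordan_holds = nu(A) == nu_plus(A) - nu_minus(A)
```

For this set the charges partly cancel in `nu(A)`, while `total_variation(A)` adds up their absolute sizes, which is the sense in which the total variation acts like an absolute value.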


Absolute continuity of measures

Exercise (5.8), Page 465: $\nu(A) = \int_A g\,d\mu$ defines a signed measure on $\mathcal{F}$.

One interesting observation: $\mu(A) = 0 \implies \nu(A) = 0$.

Definition: $\nu$ is said to be absolutely continuous w.r.t. $\mu$ iff $\mu(A) = 0 \implies \nu(A) = 0$ for all $A \in \mathcal{F}$.

Notation: $\nu \ll \mu$.

Calculus analogy: if $F = \int f\,dx$, $F$ must be continuous (w.r.t. the Lebesgue measure).

The name absolute continuity implies that it is a stronger type of continuity. Just as in calculus, only certain continuous functions can be anti-derivatives. Wikipedia has an excellent entry on this topic.
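On a finite space where every subset is measurable, the definition reduces to a check on single points: a set has reference measure zero exactly when all of its points do. The sketch below assumes hypothetical point masses stored in dictionaries, and the helper name `is_absolutely_continuous` is an illustrative choice.

```python
from fractions import Fraction

def is_absolutely_continuous(nu, mu):
    """On a finite space with point masses, nu << mu iff every point
    with mu-mass zero also has nu-mass zero."""
    return all(nu[x] == 0 for x in mu if mu[x] == 0)

# Hypothetical reference measure: no mass at the point "c".
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}

# nu_ok puts no mass where mu vanishes; nu_bad puts mass 1/2 on the mu-null point "c".
nu_ok = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0)}
nu_bad = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

ok_result = is_absolutely_continuous(nu_ok, mu)
bad_result = is_absolutely_continuous(nu_bad, mu)
```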


Radon-Nikodym Theorem

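If $\nu \ll \mu$ (with $\mu$ $\sigma$-finite), then there must exist a measurable function $g$ such that $\nu(A) = \int_A g\,d\mu$.

And this $g$ is almost everywhere unique: if $h$ is another such function, then $g = h$ a.e.

This $g$ is called the density function or the Radon-Nikodym derivative of $\nu$ w.r.t. $\mu$, denoted as $\frac{d\nu}{d\mu}$.

$\nu$ and $\mu$ apparently are defined on the same $\sigma$-algebra.

$\mu$: Lebesgue measure $\implies$ $g$: the usual density function.

If both $\nu$ and $\mu$ are absolutely continuous w.r.t. the Lebesgue measure (meaning both usual densities exist), then the R-N derivative is the ratio of the two densities (where $f > 0$), with $g$ and $f$ now denoting the Lebesgue densities of $\nu$ and $\mu$:

$$\frac{d\nu}{d\mu} = \frac{g}{f}, \qquad g = \frac{d\nu}{dx}, \quad f = \frac{d\mu}{dx}.$$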
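The ratio-of-densities formula can be checked numerically. The sketch below assumes two hypothetical exponential distributions on $[0, \infty)$ (rate 2 for $\nu$, rate 1 for $\mu$; these choices are for illustration only). It integrates $\frac{d\nu}{d\mu}$ against $\mu$ over an interval with a simple midpoint rule and compares the result with the exact value of $\nu$ on that interval.

```python
import math

def g(x):
    """Lebesgue density of nu: exponential with rate 2 (hypothetical choice)."""
    return 2.0 * math.exp(-2.0 * x)

def f(x):
    """Lebesgue density of mu: exponential with rate 1 (hypothetical choice)."""
    return math.exp(-x)

def rn_derivative(x):
    """d(nu)/d(mu) as the ratio of the two Lebesgue densities (f > 0 on [0, inf))."""
    return g(x) / f(x)

def integrate(h, a, b, n=20000):
    """Midpoint rule for the integral of h over [a, b]."""
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step

a, b = 0.5, 2.0
# Integral of (d nu / d mu) with respect to mu, where d mu = f(x) dx.
nu_via_rn = integrate(lambda x: rn_derivative(x) * f(x), a, b)
# Exact nu([a, b]) from the exponential(2) distribution function.
nu_exact = math.exp(-2.0 * a) - math.exp(-2.0 * b)
```

The two values agree up to the discretization error of the midpoint rule, which is what $\nu(A) = \int_A \frac{d\nu}{d\mu}\,d\mu$ predicts.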


Lebesgue Decomposition Theorem

The Radon-Nikodym theorem takes care of the absolutely continuous measures. Now let us deal with the general case.

$\mu$: a reference measure. $\nu$: a signed measure defined on the same $\sigma$-field $\mathcal{F}$ (both $\sigma$-finite).

$\nu = \nu_1 + \nu_2$, where $\nu_1$ and $\nu_2$ are signed measures such that $\nu_1 \ll \mu$ (the absolutely continuous part) and $\nu_2 \perp \mu$ (the singular part).

This decomposition is unique.
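For point masses on a finite space the decomposition is easy to write down: the part of the measure sitting on points where the reference measure is positive is absolutely continuous, and the part sitting on points where the reference measure vanishes is singular. A minimal sketch, assuming hypothetical point masses and an illustrative helper name `lebesgue_decomposition`:

```python
from fractions import Fraction

def lebesgue_decomposition(nu, mu):
    """Split the point masses of nu into a part living where mu > 0
    (absolutely continuous w.r.t. mu) and a part living where mu == 0
    (singular w.r.t. mu)."""
    ac = {x: (nu[x] if mu[x] > 0 else Fraction(0)) for x in nu}
    sing = {x: (nu[x] if mu[x] == 0 else Fraction(0)) for x in nu}
    return ac, sing

# Hypothetical reference measure mu and measure nu on a three-point space.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

nu_ac, nu_sing = lebesgue_decomposition(nu, mu)

# The two parts add back up to nu at every point.
adds_up = all(nu_ac[x] + nu_sing[x] == nu[x] for x in nu)
```

Here the singular part concentrates on the single point `"c"`, which has reference measure zero, while the reference measure concentrates on its complement.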


Density function revisited

Continuous random variables. Def. Reference measure: Lebesgue measure.

Discrete r.v.s. Def. Reference measure: counting measure on the state space.


It's all about averaging

A discrete example. $\Omega = \{s_1, s_2, \ldots, s_N\}$. $X : \Omega \to \mathbb{R}$ is a r.v.

The mathematical expectation of $X$ is the theoretical average of $X$ over the whole space $\Omega$.

We can also do partial averages. Say for some reason we want to restrict the possible outcomes of $X$ to only $A = \{s_1, s_2\}$. What's the theoretical average of $X$ conditional on $A$?

Answer: $\dfrac{\int_A X\,dP}{P(A)}$, denoted as $E(X|A)$.
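The partial average above can be computed by hand on a small example. The sketch below assumes a hypothetical four-point sample space with made-up probabilities and values of X. On a finite space the integral over A is just a weighted sum.

```python
from fractions import Fraction

# Hypothetical probabilities and values of X on a four-point sample space.
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10), "s4": Fraction(4, 10)}
X = {"s1": 5, "s2": 1, "s3": 0, "s4": 2}

def prob(A):
    """P(A) as a sum of point probabilities."""
    return sum((P[s] for s in A), Fraction(0))

def cond_exp(A):
    """E(X | A): the integral of X over A divided by P(A)."""
    return sum((X[s] * P[s] for s in A), Fraction(0)) / prob(A)

A = {"s1", "s2"}
average_on_A = cond_exp(A)          # partial average over A
overall_average = cond_exp(set(P))  # conditioning on the whole space gives E(X)
```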


Conditional expectation and the total expectation

In the same way we can compute $E(X|A^c)$. The total expectation is the weighted average of the two conditional expectations: $EX = P(A)E(X|A) + P(A^c)E(X|A^c)$.

If $\Omega$ is a disjoint union of $A_1, A_2, \ldots, A_K$, we may compute the total expectation by first computing the conditional expectations on the $A_k$, then taking the weighted average of these conditional expectations (Equation 1.1f, page 224):

$$EX = \sum_{k=1}^{K} P(A_k)\,E(X|A_k).$$

In fact, it is just as easy to compute $E(X|B)$ from the $E(X|A_k)$, if $B$ is a member of $\mathcal{G} = \sigma(\{A_1, A_2, \ldots, A_K\})$:

$$E(X|B) = \frac{1}{P(B)} \sum_{A_k \subseteq B} P(A_k)\,E(X|A_k).$$
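Both weighted-average formulas can be verified on a small example. This is a sketch under assumed data: a hypothetical four-point sample space, made-up values of X, and a three-block partition. It computes the total expectation from the block averages and then $E(X|B)$ for a union B of two blocks.

```python
from fractions import Fraction

# Hypothetical probabilities, values of X, and partition of the sample space.
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10), "s4": Fraction(4, 10)}
X = {"s1": 5, "s2": 1, "s3": 0, "s4": 2}
partition = [{"s1", "s2"}, {"s3"}, {"s4"}]

def prob(B):
    """P(B) as a sum of point probabilities."""
    return sum((P[s] for s in B), Fraction(0))

def cond_exp(B):
    """E(X | B) computed directly from the definition."""
    return sum((X[s] * P[s] for s in B), Fraction(0)) / prob(B)

# Total expectation as the weighted average of the block averages.
total = sum((prob(A_k) * cond_exp(A_k) for A_k in partition), Fraction(0))

# B is a union of blocks, so it belongs to the sigma-algebra generated by the partition.
B = partition[0] | partition[2]
cond_on_B = sum(
    (prob(A_k) * cond_exp(A_k) for A_k in partition if A_k <= B), Fraction(0)
) / prob(B)
```

Both results match the direct computations `cond_exp(set(P))` and `cond_exp(B)`.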


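Cond. Exp. and the $\sigma$-algebra

The last slide shows that you can view the conditional expectation as a r.v. on $\mathcal{G}$.

Define this random variable in this way:

$$Y(\omega) = E(X|A_k), \quad \omega \in A_k.$$

This r.v. satisfies the following properties:

1. $Y$ is $\mathcal{G}$-measurable, denoted as $Y \in \mathcal{G}$. It means $Y^{-1}(B) \in \mathcal{G}$ for all $B \in \mathcal{B}(\mathbb{R})$. In general $X \notin \mathcal{G}$, because $X \in \mathcal{F}$ but $\mathcal{F}$ is usually finer than $\mathcal{G}$.

2. For all $B \in \mathcal{G}$, $\int_B Y\,dP = \int_B X\,dP$.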
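The random variable Y and its defining integral identity can be checked exhaustively on a finite example. The sketch below assumes hypothetical probabilities, values of X, and a three-block partition. Every set in the generated sigma-algebra is a union of blocks, so there are only eight sets to test.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probabilities, values of X, and partition generating G.
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10), "s4": Fraction(4, 10)}
X = {"s1": 5, "s2": 1, "s3": 0, "s4": 2}
partition = [{"s1", "s2"}, {"s3"}, {"s4"}]

def integral(Z, B):
    """Integral of Z over B with respect to P (a finite weighted sum)."""
    return sum((Z[s] * P[s] for s in B), Fraction(0))

def prob(B):
    """P(B) as a sum of point probabilities."""
    return sum((P[s] for s in B), Fraction(0))

# Y is constant on each block A_k, with value E(X | A_k).
Y = {}
for block in partition:
    value = integral(X, block) / prob(block)
    for s in block:
        Y[s] = value

# Every B in G is a union of blocks; check the defining identity on each one.
unions = [set().union(*combo)
          for r in range(len(partition) + 1)
          for combo in combinations(partition, r)]
identity_holds = all(integral(Y, B) == integral(X, B) for B in unions)
```

Y takes the same value on `s1` and `s2` even though X does not, which is the loss of detail that comes from the coarser sigma-algebra.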


Conditional expectation and R-N derivative

$Y$ is denoted as $E(X|\mathcal{G})$ and is called the conditional expectation of $X$ given $\mathcal{G}$.

It turns out that $Y = \frac{d\nu}{dP|_{\mathcal{G}}}$, where $\nu$ is a signed measure defined on $\mathcal{G}$ and $P|_{\mathcal{G}}$ is $P$ restricted to $\mathcal{G}$:

$$\nu(A) = \int_A X\,dP, \qquad P|_{\mathcal{G}} : \mathcal{G} \to \mathbb{R}, \quad P|_{\mathcal{G}}(B) = P(B).$$

This construction shows that the conditional expectation is just a special Radon-Nikodym derivative between two measures.

Ha Youn will revisit this subject after the midterm exam.


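Lena Söderberg, an illustration of conditional expectation

$\Omega$ is the canvas ($512 \times 512$ pixels), $\mathcal{F}$ is the $\sigma$-algebra generated by these pixels, $P$ is the discrete uniform distribution. A graph is a random vector $X : \Omega \to \mathbb{R}^3$, $X = (R, G, B)$.

$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \ldots \subset \mathcal{F}_9 = \mathcal{F}$.

These figures represent the conditional expectations $E(X|\mathcal{F}_i)$. Basically $E(X|\mathcal{F}_0) = E(X)$, $E(X|\mathcal{F}) = X$.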
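The image illustration can be imitated on a tiny example. The sketch below assumes that each sigma-algebra in the chain is generated by square tiles of pixels, with the tile side halving from one level to the next. That is a guess suggested by $512 = 2^9$ and the ten levels, not something stated on the slide. It also uses a hypothetical 4 by 4 grayscale image instead of an RGB one. Under the uniform distribution, conditioning on a tile partition means replacing every tile by its average.

```python
from fractions import Fraction

def block_average(image, block):
    """E(X | sigma-algebra generated by block x block tiles) under the uniform
    distribution on pixels: every tile is replaced by its average value."""
    n = len(image)
    out = [[Fraction(0)] * n for _ in range(n)]
    for r0 in range(0, n, block):
        for c0 in range(0, n, block):
            tile = [image[r][c]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            mean = Fraction(sum(tile), len(tile))
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    out[r][c] = mean
    return out

# Hypothetical 4 x 4 grayscale image.
image = [[0, 4, 8, 8],
         [4, 0, 8, 8],
         [1, 1, 2, 6],
         [1, 1, 6, 2]]

coarsest = block_average(image, 4)  # trivial sigma-algebra: the overall mean everywhere
middle = block_average(image, 2)    # four 2 x 2 tiles
finest = block_average(image, 1)    # full sigma-algebra: the image itself
```

Averaging the middle level once more over the whole canvas reproduces the coarsest level, a small instance of the tower property of conditional expectation.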
