Probability Theory Presentation 12
8/8/2019 Probability Theory Presentation 12
BST 401 Probability Theory
Xing Qiu, Ha Youn Lee
Department of Biostatistics and Computational Biology, University of Rochester
October 14, 2010
Qiu, Lee BST 401
Outline
1 Radon-Nikodym Theorem
2 Introduction to Conditional Expectation
Motivation (I)
A little refresher of your undergraduate probability theory: there are two types of probability distributions, continuous ones and discrete ones.
Continuous and discrete distributions have different definitions of density functions (p.d.f.).
You can have a mixture of the two. Example: the survey question "How much tax did you pay for year 2008?" A small but non-trivial proportion of U.S. residents didn't have to pay. So you can describe the answer as a discrete random variable: 0 (did not pay) and 1 (paid). But that's a bad survey design.
A better way: for those who did pay, record how much they paid, which can be modeled as a continuous random variable.
Motivation (II)
The more challenging problem: are these two the only types of probability measures? I.e., for every probability measure (or L-S measure), can we always decompose it into a continuous part and a discrete part?
The Radon-Nikodym theorem and the Lebesgue decomposition theorem are all about the structure of L-S measures (probabilities).
Together they claim that every L-S measure can be decomposed (w.r.t. the L-measure) into an absolutely continuous part and a singular part, where the singular part is much like, but not exactly restricted to, the discrete measures.
And the absolutely continuous part can be expressed by integrating a density function w.r.t. the Lebesgue measure.
Motivation (III)
In this sense, the R-N theorem a) defines an abstract derivative between two measures $\nu$ and $\mu$, denoted as $\frac{d\nu}{d\mu}$; b) provides a criterion with which we can check whether $\frac{d\nu}{d\mu}$ exists or not.
Just as the Lebesgue-Stieltjes integral is an extension of the usual Riemann integral, the Radon-Nikodym derivative is an extension of the usual derivative:
$$\underbrace{\int f(x)\,dx}_{\text{Riemann integral}} = \underbrace{\int f(x)\,d\lambda(x)}_{\text{L-S integral}}, \qquad \underbrace{\frac{dF(x)}{dx}}_{\text{calculus derivative}} = \underbrace{\frac{d\nu}{d\lambda} = f(x)}_{\text{R-N derivative}},$$
where $\lambda$ is the Lebesgue measure and $F$ ($f$) is the distribution (density) function of $\nu$.
Hahn decomposition theorem
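Every signed measure can be decomposed into a positive and a negative part.
Recall the analogous split of a function into its positive and negative parts, $f = f^+ - f^-$.
Let $\nu$ be a signed measure (i.e., almost a measure, except that $\nu(A)$ can be negative; analogy: electric charge).
There is a partition of the whole space into a positive set $\Omega^+$ and a negative set $\Omega^-$: $\Omega = \Omega^+ \cup \Omega^-$, $\Omega^+ \cap \Omega^- = \emptyset$.
For each $A \in \mathcal{F}$, $\nu(A \cap \Omega^+) \ge 0$ and $\nu(A \cap \Omega^-) \le 0$.
This decomposition is unique up to a null set: if $\Omega_0$ is $\nu$-null ($\nu(B) = 0$ for every measurable $B \subseteq \Omega_0$), then $\Omega^+ \cup \Omega_0$ and $\Omega^- \setminus \Omega_0$ form an equivalent Hahn decomposition.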
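On a finite space the Hahn decomposition can be written down directly. The sketch below is a minimal illustration under a stated assumption: $\nu$ is given by hypothetical point weights of either sign. It puts the points with nonnegative weight into $\Omega^+$ and the rest into $\Omega^-$, then checks the defining property on every subset $A$. The names `weight`, `nu` and `all_subsets` are illustrative only.

```python
from itertools import chain, combinations

# Hypothetical signed measure on a finite space: nu({w}) = weight[w] (any sign allowed)
weight = {"a": 2.0, "b": -1.5, "c": 0.0, "d": 3.0, "e": -0.5}
omega = list(weight)

def nu(subset):
    """Signed measure of a subset: add up the point weights."""
    return sum(weight[w] for w in subset)

def all_subsets(points):
    """Every subset of a finite set, from the empty set to the whole set."""
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))

# Hahn decomposition: nonnegative-weight points go to Omega+, the rest to Omega-
omega_plus = {w for w in omega if weight[w] >= 0}
omega_minus = set(omega) - omega_plus

# Defining property, checked for every A in the (finite) sigma-algebra
for A in all_subsets(omega):
    A = set(A)
    assert nu(A & omega_plus) >= 0
    assert nu(A & omega_minus) <= 0

print(sorted(omega_plus), sorted(omega_minus))
```

The point `"c"` carries zero weight, so it is a $\nu$-null set. Moving it from $\Omega^+$ to $\Omega^-$ gives another valid Hahn decomposition, which matches the "unique up to a null set" remark.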
Singularity
$\mu_1, \mu_2$ are measures on the same $\mathcal{F}$.
They are said to be mutually singular, written as $\mu_1 \perp \mu_2$, if they concentrate on disjoint sets, i.e., there exists $B \in \mathcal{F}$ such that $\mu_1(B) = 0$ and $\mu_2(B^c) = 0$.
Examples: two measures that live on the two disjoint parts of a set; discrete measures w.r.t. the L-measure.
Not all measures that are singular w.r.t. the L-measure are discrete measures: a) in $\mathbb{R}^2$, the uniform measure on a circle/line; b) in $\mathbb{R}^1$, the uniform measure on the Cantor set (see Wikipedia).
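Example b) can be checked numerically. This is a minimal sketch resting on one assumption: a draw from the uniform (Cantor) measure is approximated by its first 30 ternary digits, each 0 or 2 with probability 1/2. Exact `Fraction` arithmetic confirms that every draw lands in $C_n$, the $n$-th stage of the Cantor construction, whose Lebesgue measure is only $(2/3)^n$. The helper names are hypothetical.

```python
import random
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals [a, b] whose union is C_n, the n-th stage of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third; the open middle third is removed
        intervals = refined
    return intervals

def sample_cantor(rng, digits=30):
    """Approximate Cantor-distributed draw: ternary digits are 0 or 2, each with probability 1/2."""
    return sum(Fraction(rng.choice((0, 2)), 3 ** k) for k in range(1, digits + 1))

n = 6
c_n = cantor_stage(n)
lebesgue_length = sum(b - a for a, b in c_n)      # Lebesgue measure of C_n
rng = random.Random(0)
samples = [sample_cantor(rng) for _ in range(500)]
share_in_c_n = sum(any(a <= x <= b for a, b in c_n) for x in samples) / len(samples)
print(lebesgue_length, share_in_c_n)
```

As $n$ grows, the Lebesgue length $(2/3)^n$ goes to 0 while every draw stays inside $C_n$. The measure therefore lives on a Lebesgue-null set even though it has no atoms.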
Jordan-Hahn decomposition
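A natural consequence of the Hahn decomposition theorem.
$\nu = \nu^+ - \nu^-$, where $\nu^+, \nu^-$ are two measures (meaning: with nonnegative values) that are mutually singular.
$\nu^+(A) = \nu(A \cap \Omega^+)$, for all $A \in \mathcal{F}$.
$\nu^-(A) = -\nu(A \cap \Omega^-)$, for all $A \in \mathcal{F}$.
At least one of $\nu^+, \nu^-$ must be finite; otherwise $\nu(\Omega) = \infty - \infty$ is not well defined.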
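The formulas for $\nu^+$ and $\nu^-$ can be checked on the same kind of finite example. This is a minimal sketch that assumes $\nu$ is given by hypothetical point weights. It builds $\nu^\pm$ from the Hahn sets and verifies three things: both parts are nonnegative, $\nu = \nu^+ - \nu^-$ on every subset, and the two parts are mutually singular with witness $B = \Omega^-$.

```python
from itertools import chain, combinations

# Hypothetical signed measure on a finite space, given by point weights
weight = {"a": 2.0, "b": -1.5, "c": 0.0, "d": 3.0, "e": -0.5}
omega = set(weight)
omega_plus = {w for w in omega if weight[w] >= 0}   # Hahn positive set
omega_minus = omega - omega_plus                   # Hahn negative set

def nu(A):
    return sum(weight[w] for w in A)

def nu_plus(A):
    """Upper variation: nu+(A) = nu(A intersect Omega+)."""
    return nu(set(A) & omega_plus)

def nu_minus(A):
    """Lower variation: nu-(A) = -nu(A intersect Omega-)."""
    return -nu(set(A) & omega_minus)

points = sorted(omega)
subsets = [set(s) for s in chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))]
for A in subsets:
    assert nu_plus(A) >= 0 and nu_minus(A) >= 0              # both parts are measures
    assert abs(nu(A) - (nu_plus(A) - nu_minus(A))) < 1e-12   # Jordan decomposition

# Mutual singularity, witnessed by B = Omega-: nu+(B) = 0 and nu-(B^c) = 0
assert nu_plus(omega_minus) == 0 and nu_minus(omega_plus) == 0
print(nu_plus(omega), nu_minus(omega))
```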
Total variation
Sometimes $\nu^+, \nu^-$ are called the upper/lower variations of $\nu$.
$|\nu| = \nu^+ + \nu^-$ is called the total variation of $\nu$. It is, roughly, the absolute value of a measure.
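The "absolute value" reading can be made concrete. This is a minimal sketch with hypothetical point weights, where $|\nu|(A)$ is the sum of the absolute weights in $A$. Inside $\nu(A)$, positive and negative mass cancel, so $|\nu(A)|$ can be much smaller than $|\nu|(A)$. Splitting $A$ into two pieces and adding $|\nu|$ of the pieces recovers $|\nu|(A)$ at the best split, namely the split along the Hahn sets.

```python
from itertools import chain, combinations

# Hypothetical signed measure on a finite space, given by point weights
weight = {"a": 2.0, "b": -1.5, "c": 0.0, "d": 3.0, "e": -0.5}

def nu(A):
    return sum(weight[w] for w in A)

def total_variation(A):
    """|nu|(A) = nu+(A) + nu-(A); on a finite space this is the sum of |weight| over A."""
    return sum(abs(weight[w]) for w in A)

def subsets(points):
    points = sorted(points)
    return [set(s) for s in chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))]

A = {"a", "b", "d", "e"}
print(abs(nu(A)), total_variation(A))          # cancellation makes |nu(A)| smaller

# Best two-piece split of A: sum of |nu| over the pieces
best_split = max(abs(nu(B)) + abs(nu(A - B)) for B in subsets(A))
print(best_split)
```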
Absolute continuity of measures
Exercise (5.8), page 465: for an integrable $g$, $\nu(A) = \int_A g \, d\mu$ defines a signed measure on $\mathcal{F}$.
One interesting observation: $\mu(A) = 0 \Rightarrow \nu(A) = 0$.
Definition: $\nu$ is said to be absolutely continuous w.r.t. $\mu$ iff $\mu(A) = 0 \Rightarrow \nu(A) = 0$ for all $A \in \mathcal{F}$. Notation: $\nu \ll \mu$.
Calculus analogy: if $F(x) = \int_{-\infty}^{x} f(t)\,dt$, then $F$ must be continuous (w.r.t. the Lebesgue measure).
The name "absolute continuity" indicates a stronger type of continuity. Just as in calculus, only certain continuous functions can be antiderivatives. Wikipedia has an excellent entry on this topic.
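Here is a minimal sketch of the definition on a finite space, with hypothetical weights for $\mu$ and a hypothetical integrand $g$. It tests $\nu \ll \mu$ literally: every set $A$ with $\mu(A) = 0$ must have $\nu(A) = 0$. A $\nu$ built as $\int_A g\,d\mu$ passes the test. A measure that puts mass on the $\mu$-null point fails it.

```python
from itertools import chain, combinations

def subsets(points):
    points = sorted(points)
    return [set(s) for s in chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))]

def measure(weights):
    """Turn point weights into a set function A -> sum of weights over A."""
    return lambda A: sum(weights[w] for w in A)

def abs_continuous(nu_w, mu_w):
    """nu << mu, checked straight from the definition: mu(A) = 0 implies nu(A) = 0."""
    nu, mu = measure(nu_w), measure(mu_w)
    return all(nu(A) == 0 for A in subsets(mu_w) if mu(A) == 0)

mu_w = {1: 0.5, 2: 0.5, 3: 0.0}              # hypothetical reference measure, no mass at point 3
g = {1: 4.0, 2: -2.0, 3: 7.0}                 # hypothetical integrand
nu_w = {w: g[w] * mu_w[w] for w in mu_w}      # nu(A) = integral of g over A w.r.t. mu
other_w = {1: 0.2, 2: 0.3, 3: 0.5}            # puts mass on the mu-null point 3

print(abs_continuous(nu_w, mu_w), abs_continuous(other_w, mu_w))
```

The value of $g$ at the $\mu$-null point 3 does not matter, because it is multiplied by zero mass. This is also why the density in the Radon-Nikodym theorem is only unique almost everywhere.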
Radon-Nikodym Theorem
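If $\nu \ll \mu$ (with $\mu$ and $\nu$ $\sigma$-finite), then there exists a measurable function $g$ such that $\nu(A) = \int_A g \, d\mu$ for all $A \in \mathcal{F}$.
This $g$ is almost everywhere unique: if $h$ is another such function, then $g = h$ $\mu$-a.e.
This $g$ is called the density function or the Radon-Nikodym derivative of $\nu$ w.r.t. $\mu$, denoted as $\frac{d\nu}{d\mu}$. Of course, $\nu$ and $\mu$ are defined on the same $\sigma$-algebra.
$\mu$: Lebesgue measure $\Rightarrow$ $g$: the usual density function.
If both $\nu$ and $\mu$ are absolutely continuous w.r.t. the Lebesgue measure (meaning both usual densities exist), then the R-N derivative is the ratio of the two densities (on the set where $f > 0$):
$$\frac{d\nu}{d\mu} = \frac{g}{f}, \qquad g = \frac{d\nu}{dx}, \qquad f = \frac{d\mu}{dx}.$$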
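The density-ratio formula can be checked by Monte Carlo. This is a minimal sketch built on hypothetical choices: $\nu = N(0,1)$ and $\mu = N(0, 2^2)$, so $\nu \ll \mu$, and $A = [0.5, 1.5]$. Averaging $1_A \cdot \frac{g}{f}$ over draws from $\mu$ approximates $\int_A \frac{d\nu}{d\mu}\,d\mu$, which should match $\nu(A)$ computed from the normal c.d.f. The same idea underlies importance sampling.

```python
import math
import random

def normal_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, sigma=1.0):
    return 0.5 * (1 + math.erf(x / (sigma * math.sqrt(2))))

def rn_derivative(x):
    """d(nu)/d(mu) = g/f with g the N(0,1) density and f the N(0, 2^2) density."""
    return normal_pdf(x, 1.0) / normal_pdf(x, 2.0)

a, b = 0.5, 1.5
exact = normal_cdf(b) - normal_cdf(a)             # nu([a, b]) computed directly

rng = random.Random(12345)
n = 100_000
draws = (rng.gauss(0.0, 2.0) for _ in range(n))   # samples from mu
# Monte Carlo estimate of the integral over [a, b] of (d nu / d mu) d mu
estimate = sum(rn_derivative(x) for x in draws if a <= x <= b) / n
print(exact, estimate)
```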
Lebesgue Decomposition Theorem
The Radon-Nikodym theorem takes care of the absolutely continuous measures. Now let us deal with the general case.
$\mu$: a reference measure; $\nu$: a signed measure defined on the same $\sigma$-field $\mathcal{F}$ (both $\sigma$-finite).
$\nu = \nu_1 + \nu_2$, where $\nu_1$ and $\nu_2$ are signed measures such that $\nu_1 \ll \mu$ (the absolutely continuous part) and $\nu_2 \perp \mu$ (the singular part).
This decomposition is unique.
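On a finite space the Lebesgue decomposition comes from splitting $\nu$ along the $\mu$-null set $B = \{\mu = 0\}$. This is a minimal sketch with hypothetical weights. $\nu_1$ is $\nu$ restricted to $B^c$ and $\nu_2$ is $\nu$ restricted to $B$. The sketch also computes the R-N derivative of $\nu_1$ where $\mu > 0$.

```python
# Hypothetical finite example: reference measure mu and measure nu given by point weights
mu = {0: 0.0, 1: 0.25, 2: 0.25, 3: 0.5, 4: 0.0}
nu = {0: 0.3, 1: 0.1, 2: 0.2, 3: 0.4, 4: 0.0}

null_set = {w for w in mu if mu[w] == 0}          # B with mu(B) = 0

# Absolutely continuous part nu_1: nu restricted to where mu has mass
nu_ac = {w: (0.0 if w in null_set else nu[w]) for w in nu}
# Singular part nu_2: nu restricted to the mu-null set B
nu_s = {w: (nu[w] if w in null_set else 0.0) for w in nu}

assert all(nu_ac[w] + nu_s[w] == nu[w] for w in nu)          # nu = nu_1 + nu_2
assert all(nu_ac[w] == 0 for w in null_set)                  # nu_1 << mu (pointwise check)
assert sum(mu[w] for w in null_set) == 0                     # mu(B) = 0 ...
assert all(nu_s[w] == 0 for w in nu if w not in null_set)    # ... and nu_2(B^c) = 0

# R-N derivative d(nu_1)/d(mu), defined where mu > 0
rn = {w: nu_ac[w] / mu[w] for w in nu if w not in null_set}
print(nu_s, rn)
```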
Density function revisited
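Continuous random variables: the density is the R-N derivative of the distribution $P_X$ of $X$. Reference measure: the L-measure.
Discrete r.v.s: the density (p.m.f.) is the R-N derivative of $P_X$. Reference measure: the counting measure on the state space.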
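Both cases can be seen side by side. This is a minimal sketch with hypothetical distributions: a Poisson(3) p.m.f., integrated against counting measure, which is just a sum over $A$, and an Exponential(2) density, integrated against Lebesgue measure. The second integral is approximated with the midpoint rule and compared with the c.d.f. difference.

```python
import math

# Discrete case: Poisson(lam) p.m.f. = R-N derivative of P_X w.r.t. counting measure on {0, 1, 2, ...}
lam = 3.0
def pmf(k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

A = {0, 2, 5}
prob_A = sum(pmf(k) for k in A)      # integral over A w.r.t. counting measure = a plain sum

# Continuous case: Exponential(rate) density = R-N derivative of P_X w.r.t. Lebesgue measure
rate = 2.0
def density(x):
    return rate * math.exp(-rate * x)

a, b, n = 0.5, 1.5, 10_000
h = (b - a) / n
integral = h * sum(density(a + (i + 0.5) * h) for i in range(n))   # midpoint rule
exact = math.exp(-rate * a) - math.exp(-rate * b)                  # from the c.d.f. 1 - exp(-rate x)
print(prob_A, integral, exact)
```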
It's all about averaging
A discrete example: $\Omega = \{s_1, s_2, \ldots, s_N\}$, and $X : \Omega \to \mathbb{R}$ is a r.v.
The mathematical expectation of $X$ is the theoretical average of $X$ over the whole space $\Omega$.
We can also take a partial average. Say for some reason we want to restrict the possible outcomes to only $A = \{s_1, s_2\}$. What is the theoretical average of $X$ conditional on $A$?
Answer: $\frac{\int_A X \, dP}{P(A)}$, denoted as $E(X|A)$.
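Here is a minimal sketch of the partial average on a hypothetical five-point space. Exact `Fraction` probabilities avoid rounding. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical finite probability space Omega = {s1, ..., s5} with probabilities P and a r.v. X
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10),
     "s4": Fraction(1, 10), "s5": Fraction(3, 10)}
X = {"s1": 4, "s2": 8, "s3": -1, "s4": 0, "s5": 5}

def expectation(X, P):
    """Average of X over the whole space."""
    return sum(X[s] * P[s] for s in P)

def cond_expectation(X, P, A):
    """E(X|A) = (integral of X over A w.r.t. P) / P(A), assuming P(A) > 0."""
    prob_A = sum(P[s] for s in A)
    return sum(X[s] * P[s] for s in A) / prob_A

A = {"s1", "s2"}
print(expectation(X, P), cond_expectation(X, P, A))
```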
Conditional expectation and the total expectation
In the same way we can compute $E(X|A^c)$. The total expectation is the weighted average of the two conditional expectations: $EX = P(A)E(X|A) + P(A^c)E(X|A^c)$.
If $\Omega$ is a disjoint union of $A_1, A_2, \ldots, A_K$ (each with $P(A_k) > 0$), we may compute the total expectation by first computing the conditional expectations on the $A_k$, then taking the weighted average of these conditional expectations (Equation 1.1f, page 224):
$$EX = \sum_{k=1}^{K} P(A_k)\, E(X|A_k).$$
In fact, it is just as easy to compute $E(X|B)$ from the $E(X|A_k)$ if $B$ is a member of $\mathcal{G} = \sigma(\{A_1, A_2, \ldots, A_K\})$:
$$E(X|B) = \frac{1}{P(B)} \sum_{A_k \subseteq B} P(A_k)\, E(X|A_k).$$
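Both identities can be verified exactly on a hypothetical five-point space. The sketch partitions it into three hypothetical blocks $A_1, A_2, A_3$ and takes $B = A_1 \cup A_3 \in \mathcal{G}$.

```python
from fractions import Fraction

# Hypothetical finite probability space and r.v.
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10),
     "s4": Fraction(1, 10), "s5": Fraction(3, 10)}
X = {"s1": 4, "s2": 8, "s3": -1, "s4": 0, "s5": 5}
partition = [{"s1", "s2"}, {"s3"}, {"s4", "s5"}]     # A_1, A_2, A_3 (hypothetical blocks)

def prob(A):
    return sum(P[s] for s in A)

def cond_expectation(A):
    return sum(X[s] * P[s] for s in A) / prob(A)

# Total expectation: EX equals the P(A_k)-weighted average of E(X|A_k)
EX = sum(X[s] * P[s] for s in P)
weighted = sum(prob(A_k) * cond_expectation(A_k) for A_k in partition)

# E(X|B) for B = A_1 union A_3, rebuilt from the blocks contained in B
B = partition[0] | partition[2]
from_blocks = sum(prob(A_k) * cond_expectation(A_k) for A_k in partition if A_k <= B) / prob(B)
print(EX, weighted, cond_expectation(B), from_blocks)
```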
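Cond. Exp. and the σ-algebra
The last slide shows that you can view conditional expectation as a r.v. on $\mathcal{G}$.
Define this random variable in this way: $Y(\omega) = E(X|A_k)$ for $\omega \in A_k$.
This r.v. satisfies the following properties:
1. $Y$ is $\mathcal{G}$-measurable, denoted as $Y \in \mathcal{G}$. It means $Y^{-1}(B) \in \mathcal{G}$ for all $B \in \mathcal{B}(\mathbb{R})$. In general $X \notin \mathcal{G}$, because $X \in \mathcal{F}$ but $\mathcal{F}$ is usually finer than $\mathcal{G}$.
2. For all $B \in \mathcal{G}$, $\int_B Y \, dP = \int_B X \, dP$.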
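Here is a minimal sketch of $Y$ on a hypothetical finite space. It builds $Y$ block by block and lists $\mathcal{G}$ as all unions of blocks. It then checks both properties. Property 1 uses the fact that, for a finitely-valued r.v., being $\mathcal{G}$-measurable means every level set lies in $\mathcal{G}$. For contrast, the sketch confirms that $X$ itself is not $\mathcal{G}$-measurable here.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space, r.v. X and generating partition of G
P = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(2, 10),
     "s4": Fraction(1, 10), "s5": Fraction(3, 10)}
X = {"s1": 4, "s2": 8, "s3": -1, "s4": 0, "s5": 5}
blocks = [{"s1", "s2"}, {"s3"}, {"s4", "s5"}]

def integral(Z, B):
    """Integral of the r.v. Z over the set B w.r.t. P."""
    return sum(Z[s] * P[s] for s in B)

# Y(omega) = E(X | A_k) for omega in A_k
Y = {}
for A_k in blocks:
    value = integral(X, A_k) / sum(P[s] for s in A_k)
    for s in A_k:
        Y[s] = value

# G = sigma(A_1, ..., A_K): all unions of blocks (including the empty union)
G = [set().union(*combo) for r in range(len(blocks) + 1) for combo in combinations(blocks, r)]

def is_G_measurable(Z):
    """A finitely-valued Z is G-measurable iff each level set {Z = z} belongs to G."""
    return all({s for s in Z if Z[s] == z} in G for z in set(Z.values()))

assert is_G_measurable(Y)                                   # property 1
assert all(integral(Y, B) == integral(X, B) for B in G)     # property 2
x_measurable = is_G_measurable(X)                           # X is finer than G here
print(Y, x_measurable)
```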
Conditional expectation and R-N derivative
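$Y$ is denoted as $E(X|\mathcal{G})$ and is called the conditional expectation of $X$ given $\mathcal{G}$.
It turns out that $Y = \frac{d\nu}{dP_{\mathcal{G}}}$, where, for integrable $X$, $\nu$ is a signed measure defined on $\mathcal{G}$ and $P_{\mathcal{G}}$ is $P$ restricted to $\mathcal{G}$:
$$\nu(A) = \int_A X \, dP, \quad A \in \mathcal{G}; \qquad P_{\mathcal{G}} : \mathcal{G} \to \mathbb{R}, \quad P_{\mathcal{G}}(B) = P(B).$$
The derivative exists because $\nu \ll P_{\mathcal{G}}$: if $P(A) = 0$, then $\int_A X \, dP = 0$.
This construction shows that the conditional expectation is just a special Radon-Nikodym derivative between two measures.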
Ha Youn will revisit this subject after the midterm exam.
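When $\mathcal{G}$ is generated by a finite partition, the Radon-Nikodym derivative can be worked out by hand, which ties this slide back to the elementary definition $Y(\omega) = E(X\mid A_k)$. The derivation below assumes $P(A_k) > 0$ for every $k$.

```latex
\begin{align*}
&\text{Every } \mathcal{G}\text{-measurable } Y \text{ has the form } Y = \sum_{k=1}^{K} c_k \mathbf{1}_{A_k}. \\
&\text{Requiring } \nu(A) = \int_A Y\,dP' \text{ for } A = A_k \text{ gives} \\
&\qquad \nu(A_k) = \int_{A_k} Y\,dP' = c_k\, P(A_k), \\
&\text{so} \quad c_k = \frac{\nu(A_k)}{P(A_k)} = \frac{1}{P(A_k)} \int_{A_k} X\,dP = E(X \mid A_k).
\end{align*}
```

Because every $A \in \mathcal{G}$ is a disjoint union of atoms, matching $\nu$ on the atoms is enough to match it on all of $\mathcal{G}$.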
Lena Söderberg, an Illustration of conditional expectation
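$\Omega$ is the canvas ($512 \times 512$ pixels), $\mathcal{F}$ is the σ-algebra generated by these pixels, and $P$ is the discrete uniform distribution. The image is a random vector $X : \Omega \to \mathbb{R}^3$, $X = (R, G, B)$.
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_9 = \mathcal{F}.$$
These figures represent the conditional expectations $E(X\mid\mathcal{F}_i)$. In particular, $E(X\mid\mathcal{F}_0) = E(X)$ and $E(X\mid\mathcal{F}) = X$.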
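The slide does not say how the intermediate σ-algebras $\mathcal{F}_i$ are built. One construction consistent with ten levels on a $512 = 2^9$ pixel canvas, used here purely as an assumption, is to let $\mathcal{F}_i$ be generated by the $2^i \times 2^i$ grid of equal square blocks. Under the uniform distribution, $E(X\mid\mathcal{F}_i)$ then replaces every pixel by the average color of its block. The function name and the nested-list image format are hypothetical.

```python
from fractions import Fraction


def cond_exp_level(image, i):
    """E(X | F_i) for a 2^n x 2^n image of (R, G, B) tuples.

    ASSUMPTION: F_i is generated by the 2^i x 2^i grid of equal square
    blocks. Under the uniform distribution, E(X | block) is the plain
    average of the pixels in that block, so each pixel gets its block mean.
    """
    size = len(image)
    if size == 0 or size & (size - 1) or (size >> i) == 0:
        raise ValueError("need a 2^n x 2^n image and 0 <= i <= n")
    side = size >> i  # side length of one block at level i
    out = [[None] * size for _ in range(size)]
    for top in range(0, size, side):
        for left in range(0, size, side):
            cells = [image[r][c]
                     for r in range(top, top + side)
                     for c in range(left, left + side)]
            # Exact average of each color channel over the block
            mean = tuple(
                sum((Fraction(p[ch]) for p in cells), Fraction(0)) / len(cells)
                for ch in range(3)
            )
            for r in range(top, top + side):
                for c in range(left, left + side):
                    out[r][c] = mean
    return out
```

Under this assumption, `cond_exp_level(image, 0)` paints the whole canvas with the mean color $E(X)$, and `cond_exp_level(image, 9)` on a $512 \times 512$ image returns the image itself. Averaging a finer level again at a coarser level gives the same result as averaging directly at the coarser level, an instance of the tower property.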