Chapter 15: Likelihood, Bayesian, and Decision Theory
AMS 572
Group Members: Yen-hsiu Chen, Valencia Joseph, Lola Ojo, Andrea Roberson, Dave Roelfs, Saskya Sauer, Olivia Shy, Ping Tung
Introduction
Maximum Likelihood, Bayesian, and Decision Theory are widely applied and have proven themselves useful and necessary in the sciences, such as physics, as well as in research in general.
They provide a practical way to begin and carry out an analysis or experiment.
"To call in the statistician after the experiment is done may be no more than asking him to perform a
post-mortem examination: he may be able to say what the experiment died of."
- R.A. Fisher
15.1 Maximum Likelihood Estimation

15.1.1 Likelihood Function
Objective: Estimating the unknown parameter(s) θ of a population distribution based on a random sample x1, …, xn from that distribution.
Previous chapters: intuitive estimates => the sample mean for the population mean.
To improve estimation, R. A. Fisher (1890-1962) proposed MLE in 1912-1922.
Ronald Aylmer Fisher (1890-1962)
The greatest of Darwin's successors
Known for: 1912: maximum likelihood; 1922: F-test; 1925: analysis of variance (Statistical Methods for Research Workers)
Notable prizes: Royal Medal (1938), Copley Medal (1955)
Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html
Joint p.d.f. vs. Likelihood Function
Identical quantities, different interpretation.
Joint p.d.f. of X1, …, Xn: a function of x1, …, xn for given θ; probability interpretation.
$$f(x_1, \ldots, x_n \mid \theta) = f(x_1 \mid \theta)\, f(x_2 \mid \theta) \cdots f(x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
Likelihood function of θ: a function of θ for given x1, …, xn; no probability interpretation.
$$L(\theta \mid x_1, \ldots, x_n) = f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
Example: Normal Distribution
Suppose x1, …, xn is a random sample from a normal distribution with p.d.f.
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
with parameters $(\mu, \sigma^2)$. Likelihood function:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x_i-\mu)^2}{2\sigma^2}\right\}\right] = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}$$
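As a numerical aside (not in the original slides): a minimal Python sketch, assuming NumPy is available, that evaluates this normal log-likelihood on simulated data and confirms the grid maximizer is close to the sample mean. The sample, variance, and grid values are made up for illustration.

```python
import numpy as np

def normal_loglik(mu, sigma2, x):
    """Log-likelihood of N(mu, sigma2) for a sample x, per the formula above."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # simulated sample (illustrative)

mus = np.linspace(3, 7, 401)                   # grid of candidate means
loglik = [normal_loglik(m, 4.0, x) for m in mus]
print("grid maximizer:", mus[np.argmax(loglik)], " sample mean:", x.mean())
```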
15.1.2 Calculation of Maximum Likelihood Estimators (MLE)
MLE of an unknown parameter θ: the value $\hat\theta = \hat\theta(x_1, \ldots, x_n)$ which maximizes the likelihood function $L(\theta \mid x_1, \ldots, x_n)$.

Example of MLE: 2 independent Bernoulli trials with success probability θ.
θ is known to be 1/4 or 1/3 => parameter space Θ = {1/4, 1/3}.
Using the binomial distribution, the probabilities of observing x = 0, 1, 2 successes can be calculated.

Probability of observing x successes (x = the number of successes):

Parameter space Θ    x = 0    x = 1    x = 2
1/4                  9/16     6/16     1/16
1/3                  4/9      4/9      1/9

• When x = 0, the MLE of θ is $\hat\theta = 1/4$.
• When x = 1 or 2, the MLE of θ is $\hat\theta = 1/3$.
• The MLE $\hat\theta$ is chosen to maximize $L(\theta \mid x)$ for the observed x.
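A small Python sketch of this discrete-Θ search, assuming SciPy is available; it recomputes the table above with scipy.stats.binom and picks the maximizer for each observed x:

```python
from scipy.stats import binom

Theta = [1/4, 1/3]            # parameter space from the example
for x in range(3):            # observed number of successes in 2 trials
    probs = {th: binom.pmf(x, 2, th) for th in Theta}
    mle = max(probs, key=probs.get)
    print(f"x={x}: L(1/4)={probs[1/4]:.4f}, L(1/3)={probs[1/3]:.4f}, MLE={mle}")
```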
15.1.3 Properties of MLEs
Objective: optimality properties in large samples.

Fisher information (continuous case):
$$I(\theta) = \int_{-\infty}^{\infty} \left[\frac{d \ln f(x \mid \theta)}{d\theta}\right]^2 f(x \mid \theta)\, dx = E\left\{\left[\frac{d \ln f(X \mid \theta)}{d\theta}\right]^2\right\}$$

Alternative expressions for the Fisher information:
(1) $I(\theta) = E\left\{\left[\dfrac{d \ln f(X \mid \theta)}{d\theta}\right]^2\right\} = \mathrm{Var}\left\{\dfrac{d \ln f(X \mid \theta)}{d\theta}\right\}$

(2) $I(\theta) = -\displaystyle\int_{-\infty}^{\infty} \left[\dfrac{d^2 \ln f(x \mid \theta)}{d\theta^2}\right] f(x \mid \theta)\, dx = -E\left\{\dfrac{d^2 \ln f(X \mid \theta)}{d\theta^2}\right\}$
Proof of (1): differentiating both sides of $\int_{-\infty}^{\infty} f(x \mid \theta)\, dx = 1$ gives
$$\int_{-\infty}^{\infty} \frac{d f(x \mid \theta)}{d\theta}\, dx = \frac{d}{d\theta}\int_{-\infty}^{\infty} f(x \mid \theta)\, dx = \frac{d}{d\theta}(1) = 0,$$
and since $\frac{d f(x \mid \theta)}{d\theta} = \frac{d \ln f(x \mid \theta)}{d\theta}\, f(x \mid \theta)$,
$$\int_{-\infty}^{\infty} \frac{d \ln f(x \mid \theta)}{d\theta}\, f(x \mid \theta)\, dx = E\left\{\frac{d \ln f(X \mid \theta)}{d\theta}\right\} = 0.$$
The score thus has mean zero, so its second moment equals its variance, which proves (1).

Proof of (2): differentiating the last integral once more,
$$\int_{-\infty}^{\infty} \left[\frac{d^2 \ln f(x \mid \theta)}{d\theta^2}\, f(x \mid \theta) + \left(\frac{d \ln f(x \mid \theta)}{d\theta}\right)^2 f(x \mid \theta)\right] dx = 0,$$
so that
$$I(\theta) = \int_{-\infty}^{\infty} \left(\frac{d \ln f(x \mid \theta)}{d\theta}\right)^2 f(x \mid \theta)\, dx = -E\left\{\frac{d^2 \ln f(X \mid \theta)}{d\theta^2}\right\}.$$
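As a sanity check on identities (1) and (2) (not in the original slides), a Monte Carlo sketch in Python for the Bernoulli(p) model, where the closed form I(p) = 1/(p(1-p)) is known; the choices of p and the sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3
x = rng.binomial(1, p, size=200_000)           # Bernoulli(p) draws

score = x / p - (1 - x) / (1 - p)              # d/dp ln f(x|p)
second = -x / p**2 - (1 - x) / (1 - p)**2      # d^2/dp^2 ln f(x|p)

print("Var(score)          :", score.var())
print("-E[second derivative]:", -second.mean())
print("closed form 1/(p(1-p)):", 1 / (p * (1 - p)))
```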
Define the Fisher information for an i.i.d. sample: let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from p.d.f. $f(x \mid \theta)$. Then
$$I_n(\theta) = -E\left\{\frac{d^2 \ln f(X_1, X_2, \ldots, X_n \mid \theta)}{d\theta^2}\right\} = -E\left\{\frac{d^2}{d\theta^2}\left[\ln f(X_1 \mid \theta) + \cdots + \ln f(X_n \mid \theta)\right]\right\}$$
$$= -E\left\{\frac{d^2 \ln f(X_1 \mid \theta)}{d\theta^2}\right\} - \cdots - E\left\{\frac{d^2 \ln f(X_n \mid \theta)}{d\theta^2}\right\} = I(\theta) + \cdots + I(\theta) = n\, I(\theta)$$
MLE (continued)
• Generalization of the Fisher information for a k-dimensional vector parameter: suppose the p.d.f. of an r.v. X is $f(x \mid \theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The information matrix of $\theta$, $I(\theta)$, is given by its entries
$$I_{ij}(\theta) = E\left\{\left[\frac{\partial \ln f(X \mid \theta)}{\partial \theta_i}\right]\left[\frac{\partial \ln f(X \mid \theta)}{\partial \theta_j}\right]\right\} = -E\left\{\frac{\partial^2 \ln f(X \mid \theta)}{\partial \theta_i\, \partial \theta_j}\right\}$$
MLE (continued)
• Cramér-Rao Lower Bound
Let X1, X2, …, Xn be a random sample from p.d.f. f(x | θ), and let $\hat\theta$ be any estimator of θ with $E(\hat\theta) = \theta + B(\theta)$, where B(θ) is the bias of $\hat\theta$. If B(θ) is differentiable in θ and if certain regularity conditions hold, then
$$\mathrm{Var}(\hat\theta) \ge \frac{[1 + B'(\theta)]^2}{n\, I(\theta)} \quad \text{(Cramér-Rao inequality)}$$
The ratio of the lower bound to the variance of any estimator of θ is called the efficiency of the estimator. An estimator with efficiency equal to 1 is called an efficient estimator.
15.1.4 Large Sample Inference Based on the MLEs
Large sample inference on the unknown parameter θ: the MLE $\hat\theta$ is approximately normal with
$$\mathrm{Var}(\hat\theta) \approx \frac{1}{n\, I(\theta)}.$$
Estimate the Fisher information by
$$\hat I(\hat\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[\frac{d^2 \ln f(X_i \mid \theta)}{d\theta^2}\right]_{\theta = \hat\theta}$$
100(1-α)% CI for θ:
$$\hat\theta - z_{\alpha/2}\frac{1}{\sqrt{n\, \hat I(\hat\theta)}} \;\le\; \theta \;\le\; \hat\theta + z_{\alpha/2}\frac{1}{\sqrt{n\, \hat I(\hat\theta)}}$$
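A minimal sketch of such a large-sample CI for the Poisson mean, where I(θ) = 1/θ so that 1/(n·I(θ̂)) = θ̂/n; the data are simulated and purely illustrative (assumes NumPy/SciPy):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.0, size=50)              # illustrative Poisson sample

theta_hat = x.mean()                           # MLE of the Poisson mean
# For Poisson, I(theta) = 1/theta, so Var(theta_hat) ~ theta_hat / n
se = np.sqrt(theta_hat / len(x))
z = norm.ppf(0.975)                            # z_{alpha/2} for a 95% CI
print(f"95% CI: [{theta_hat - z*se:.3f}, {theta_hat + z*se:.3f}]")
```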
15.1.5 Delta Method for Approximating the Variance of an Estimator
Delta method: estimate a nonlinear function h(θ). Suppose that $E(\hat\theta) \approx \theta$ and $\mathrm{Var}(\hat\theta)$ is a known function of θ. Expand $h(\hat\theta)$ around θ using a first-order Taylor series:
$$h(\hat\theta) \approx h(\theta) + (\hat\theta - \theta)\, h'(\theta)$$
Using $E(\hat\theta - \theta) \approx 0$:
$$\mathrm{Var}[h(\hat\theta)] \approx [h'(\theta)]^2\, \mathrm{Var}(\hat\theta)$$
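A short sketch of the delta method applied to the log-odds h(p) = ln(p/(1-p)) of a sample proportion, for which h'(p) = 1/(p(1-p)) and Var(p̂) = p(1-p)/n; the Monte Carlo comparison and the chosen n and p are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 0.3, 200

# Delta method: Var[h(p_hat)] ~ [h'(p)]^2 * Var(p_hat) = 1/(n p (1-p))
delta_var = (1 / (p * (1 - p))) ** 2 * p * (1 - p) / n

# Monte Carlo check of the approximation
p_hat = rng.binomial(n, p, size=100_000) / n
log_odds = np.log(p_hat / (1 - p_hat))
print("delta method:", delta_var, " simulated:", log_odds.var())
```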
15.2 Likelihood Ratio Tests
The last section presented an inference for point estimation based on likelihood theory. In this section, we present a corresponding inference for testing hypotheses.

Let f(x; θ) be a probability density function, where θ is a real-valued parameter taking values in an interval Θ that could be the whole real line. We call Θ the parameter space. An alternative hypothesis H1 will restrict the parameter to some subset Θ1 of the parameter space Θ. The null hypothesis H0 is then the complement of Θ1 with respect to Θ.

• Consider the two-sided hypothesis H0: θ = θ0 versus H1: θ ≠ θ0, where θ0 is a specified value.

We will test H0 versus H1 on the basis of the random sample X1, X2, …, Xn from f(x; θ). If the null hypothesis holds, we would expect the likelihood
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
to be relatively large when evaluated at the prevailing value θ0. Consider the ratio of two likelihood functions, namely
$$\lambda = \frac{L(\theta_0)}{L(\hat\theta)}.$$
Note that λ ≤ 1, but if H0 is true λ should be close to 1, while if H1 is true, λ should be smaller. For a specified significance level α, we have the decision rule: reject H0 in favor of H1 if λ ≤ c, where c is such that
$$\alpha = P_{\theta_0}[\lambda \le c].$$
This test is called the likelihood ratio test.
Example 1
Let X1, X2, …, Xn be a random sample of size n from a normal distribution with known variance. Obtain the likelihood ratio for testing H0: μ = μ0 versus H1: μ ≠ μ0.
$$L(\mu \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}$$
$$\ln L(\mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum (x_i-\mu)^2}{2\sigma^2}$$
$$\frac{\partial \ln L(\mu)}{\partial \mu} = \frac{\sum (x_i - \mu)}{\sigma^2} = 0 \;\Rightarrow\; \hat\mu = \bar x$$
So $\hat\mu = \bar x$ is a maximum since $\frac{\partial^2 \ln L(\mu)}{\partial \mu^2} = -\frac{n}{\sigma^2} < 0$. Thus $\hat\mu = \bar x$ is the MLE of μ.
$$\lambda = \frac{L(\mu_0)}{L(\hat\mu)} = \frac{(2\pi\sigma^2)^{-n/2}\, e^{-\sum(x_i-\mu_0)^2/2\sigma^2}}{(2\pi\sigma^2)^{-n/2}\, e^{-\sum(x_i-\bar x)^2/2\sigma^2}} = e^{-\left[\sum(x_i-\mu_0)^2 - \sum(x_i-\bar x)^2\right]/2\sigma^2}$$
Writing $x_i - \mu_0 = (x_i - \bar x) + (\bar x - \mu_0)$, the cross terms sum to zero, so $\sum(x_i-\mu_0)^2 = \sum(x_i-\bar x)^2 + n(\bar x - \mu_0)^2$ and
$$\lambda = e^{-\frac{n(\bar x - \mu_0)^2}{2\sigma^2}} = e^{-z_0^2/2}, \qquad z_0 = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$$

Example 1 (continued)
Thus λ ≤ c is equivalent to $e^{-z_0^2/2} \le c$, or $|z_0| \ge c^*$.
So $\alpha = P(|z_0| \ge c^*)$, thus $c^* = z_{\alpha/2}$.
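A minimal sketch of this test in Python (data simulated for illustration; assumes NumPy/SciPy): compute z0 and compare |z0| with z_{α/2}:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma, mu0 = 2.0, 5.0
x = rng.normal(loc=5.5, scale=sigma, size=30)   # illustrative sample

z0 = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
c_star = norm.ppf(1 - 0.05 / 2)                 # z_{alpha/2} at alpha = .05
print(f"|z0| = {abs(z0):.3f}, reject H0: {abs(z0) >= c_star}")
```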
Example 2
Let X1, X2, …, Xn be a random sample from a Poisson distribution with mean θ > 0.
a. Show that the likelihood ratio test of H0: θ = θ0 versus H1: θ ≠ θ0 is based upon the statistic $Y = \sum X_i$. Obtain the null distribution of Y.
$$L(\theta) = \prod_{i=1}^{n} \frac{\theta^{x_i}\, e^{-\theta}}{x_i!} = \frac{\theta^{\sum x_i}\, e^{-n\theta}}{\prod x_i!}$$
$$\ln L(\theta) = \sum x_i \ln\theta - n\theta - \sum \ln x_i!$$
$$\frac{\partial \ln L(\theta)}{\partial \theta} = \frac{\sum x_i}{\theta} - n = 0 \;\Rightarrow\; \hat\theta = \bar x$$
So $\hat\theta = \bar x$ is a maximum since
$$\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\bigg|_{\theta=\hat\theta} = -\frac{\sum x_i}{\hat\theta^2} = -\frac{n}{\hat\theta} < 0;$$
thus $\hat\theta = \bar x$ is the MLE of θ.
The likelihood ratio test statistic is:
$$\lambda = \frac{L(\theta_0)}{L(\hat\theta)} = \frac{\theta_0^{\sum x_i}\, e^{-n\theta_0} / \prod x_i!}{\hat\theta^{\sum x_i}\, e^{-n\hat\theta} / \prod x_i!} = \left(\frac{\theta_0}{\hat\theta}\right)^{\sum x_i} e^{n\hat\theta - n\theta_0} = \left(\frac{n\theta_0}{\sum x_i}\right)^{\sum x_i} e^{\sum x_i - n\theta_0}$$
And it is a function of $Y = \sum X_i$. Under H0, $X_i \sim \text{Poisson}(\theta_0) \Rightarrow Y \sim \text{Poisson}(n\theta_0)$.

Example 2 (continued)
b. For θ0 = 2 and n = 5, find the significance level of the test that rejects H0 if y ≤ 4 or y ≥ 17.
The null distribution of Y is Poisson(10).
$$\alpha = P_{H_0}(Y \le 4) + P_{H_0}(Y \ge 17) = P_{H_0}(Y \le 4) + 1 - P_{H_0}(Y \le 16) = .029 + 1 - .973 = .056$$
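The significance level in part b can be checked directly with scipy.stats.poisson (a verification sketch, not part of the original slides):

```python
from scipy.stats import poisson

# Under H0: theta0 = 2, n = 5, so Y = sum(X_i) ~ Poisson(10)
alpha = poisson.cdf(4, 10) + (1 - poisson.cdf(16, 10))
print(f"alpha = {alpha:.3f}")   # approximately .029 + .027 = .056
```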
Composite Null Hypothesis
The likelihood ratio approach has to be modified slightly when the null hypothesis is composite. When testing the null hypothesis H0: μ = μ0 concerning a normal mean when σ² is unknown, the parameter space
$$\Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}$$
is a subset of $R^2$. The null hypothesis is composite and
$$\Theta_0 = \{(\mu, \sigma^2) : \mu = \mu_0,\ 0 < \sigma^2 < \infty\}.$$
Since the null hypothesis is composite, it isn't certain which value of the parameter(s) prevails even under H0. So we take the maximum of the likelihood over $\Theta_0$. The generalized likelihood ratio test statistic is defined as
$$\lambda = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)} = \frac{L(\hat\theta_0)}{L(\hat\theta)}$$
be a random sample of size n from a normal distribution with unknown mean and variance. Obtain the likelihood ratio test statistic for testing
Example 3 nXXX ,....,, 21
00 : σσ =H01 : σσ ≠H
Let
versus
),( 20σμθ = },{ 2
02
0 σσμ =∞<<−∞=Θ
x=μ̂
( ) 2
2
2
)(
1 212
2
1,.....,/, σ
μ
πσσμ
−−
=∏=ix
n
in eXXL 21
2
2
)(
22 )2( σ
μ
πσ∑=
−−−
n
iix
n
e
In Example 1, we found the unrestricted MLE: $\hat\mu = \bar x$. Since
$$\sum_{i=1}^{n}(x_i - \bar x)^2 \le \sum_{i=1}^{n}(x_i - \mu)^2 \quad \text{for all } \mu,$$
we have $L(\mu, \sigma^2; x) \le L(\bar x, \sigma^2; x)$, so we only need to find the value of $\sigma^2$ maximizing $L(\bar x, \sigma^2; x)$. Now
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum (x_i-\mu)^2}{2\sigma^2}$$
$$\frac{\partial \ln L(\bar x, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum (x_i - \bar x)^2}{2\sigma^4} = 0 \;\Rightarrow\; \hat\sigma^2 = \frac{\sum (x_i - \bar x)^2}{n}$$
So $\hat\sigma^2$ is a maximum since
$$\frac{\partial^2 \ln L(\bar x, \sigma^2)}{\partial (\sigma^2)^2}\bigg|_{\sigma^2 = \hat\sigma^2} = \frac{n}{2\hat\sigma^4} - \frac{\sum (x_i - \bar x)^2}{\hat\sigma^6} = -\frac{n}{2\hat\sigma^4} < 0.$$
Thus $\hat\mu = \bar x$ is the MLE of μ, and $\hat\sigma^2 = \sum (x_i - \bar x)^2 / n$ is the MLE of $\sigma^2$.
Example 3 (continued)
We can also write
$$\hat\sigma^2 = \frac{\sum (x_i - \bar x)^2}{n} = \frac{(n-1)s^2}{n}.$$
Then
$$L(\hat\theta) = \left[2\pi\, \frac{(n-1)s^2}{n}\right]^{-n/2} e^{-\frac{n \sum (x_i - \bar x)^2}{2(n-1)s^2}} = \left[2\pi\, \frac{(n-1)s^2}{n}\right]^{-n/2} e^{-n/2}$$
$$L(\hat\theta_0) = (2\pi\sigma_0^2)^{-n/2}\, e^{-\frac{(n-1)s^2}{2\sigma_0^2}}$$
$$\lambda = \frac{L(\hat\theta_0)}{L(\hat\theta)} = \left[\frac{(n-1)s^2}{n\sigma_0^2}\right]^{n/2} e^{n/2}\, e^{-\frac{(n-1)s^2}{2\sigma_0^2}}$$
Example 3 (continued)
Define
$$u = \frac{(n-1)s^2}{\sigma_0^2} \sim \chi^2(n-1) \text{ under } H_0,$$
so
$$\lambda = k\, u^{n/2}\, e^{-u/2}, \quad \text{where } k = (e/n)^{n/2} \text{ is a constant.}$$
Rejection region: λ ≤ c, where c is such that $\alpha = P_{H_0}[\lambda \le c]$. Define $h(u) = u^{n/2} e^{-u/2}$ and differentiate:
$$h'(u) = \frac{n}{2}\, u^{n/2 - 1} e^{-u/2} - \frac{1}{2}\, u^{n/2} e^{-u/2} = \frac{1}{2}\, u^{n/2 - 1} e^{-u/2}\, (n - u),$$
so h(u) increases for u < n and decreases for u > n. Thus λ ≤ c implies u ≤ c1 or u ≥ c2, where c1 and c2 are such that
$$P_{H_0}(c_1 < \chi^2_{n-1} < c_2) = 1 - \alpha.$$
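A short sketch of this variance test in Python; the equal-tailed choice of c1 and c2 below is a common approximation to the exact likelihood-ratio cutoffs, and the sample is simulated for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
sigma0 = 2.0
x = rng.normal(loc=0.0, scale=2.5, size=25)        # illustrative sample

n = len(x)
u = (n - 1) * x.var(ddof=1) / sigma0**2            # u = (n-1)s^2 / sigma0^2
# Equal-tailed c1, c2 (approximation to the exact LRT constants)
c1, c2 = chi2.ppf([0.025, 0.975], df=n - 1)
print(f"u = {u:.2f}, reject H0: {u <= c1 or u >= c2}")
```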
15.3 Bayesian Inference
Thomas Bayes (c. 1702 - April 17, 1761) was a Presbyterian minister and a mathematician born in London who developed a special case of Bayes' theorem, which was published and studied after his death.
Bayesian inference refers to a statistical inference where new evidence is presented and used to draw updated conclusions from a prior belief. The term 'Bayesian' stems from the well-known Bayes' theorem, which was first derived by Reverend Thomas Bayes.
Source: www.wikipedia.com
Bayes’ Theorem (review): (15.1)f (A|B) = f (A ∩ B) / f (B) = f (B | A) f (A) / f(B)
since, f (A ∩ B)= f (B ∩ A) = f (B | A) f (A)
Some Key Terms in Bayesian Inference (in plain English)
• prior distribution - the probability distribution of an uncertain quantity θ that expresses previous knowledge of θ (for example, from past experience), in the absence of the current evidence
• posterior distribution - the distribution that takes the evidence into account; it is the conditional probability distribution of θ given the data. The posterior probability is computed from the prior and the likelihood function using Bayes' theorem.
• posterior mean - the mean of the posterior distribution
• posterior variance - the variance of the posterior distribution
• conjugate priors - a family of prior probability distributions whose key property is that the posterior probability distribution also belongs to the same family as the prior
15.3.1 Bayesian Estimation
So far we've learned that the Bayesian approach treats θ as a random variable, and the data are then used to update the prior distribution to obtain the posterior distribution of θ. Now let's move on to how we can estimate parameters using this approach. (Using text notation:)
Let θ be an unknown parameter based on a random sample, x1, x2, …, xn, from a distribution with p.d.f./p.m.f. f(x | θ).
Let π(θ) be the prior distribution of θ.
Let π*(θ | x1, x2, …, xn) be the posterior distribution.
**Note that π*(θ | x1, x2, …, xn) is the conditional distribution of θ given the observed data, x1, x2, …, xn.
If we apply Bayes' theorem (Eq. 15.1), our posterior distribution becomes:
$$\pi^*(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)}{\int f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)\, d\theta} = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)}{f^*(x_1, x_2, \ldots, x_n)} \quad (15.2)$$
*Note that f*(x1, x2, …, xn) is the marginal p.d.f. of X1, X2, …, Xn.

Bayesian Estimation (continued)
As seen in equation 15.2, the posterior distribution represents what is known about θ after observing the data X = x1, x2, …, xn. From earlier chapters, we know that the likelihood of a variable θ is f(X | θ). So, to get a better idea of the posterior distribution, we note that:
posterior distribution ∝ likelihood × prior distribution
i.e. π*(θ | X) ∝ f(X | θ) × π(θ)
For a detailed practical example of deriving the posterior mean and using Bayesian estimation, visit:
http://www.stat.berkeley.edu/users/rice/Stat135/Bayes.pdf
Example 15.26
Let x be the number of successes from n i.i.d. Bernoulli trials with unknown success probability p = θ. Show that the beta distribution is a conjugate prior on θ.
★ Goal:
$$\pi^*(\theta) = \pi(\theta \mid x) = \frac{f(x, \theta)}{f(x)} = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta}$$
where
$$f(x \mid \theta)\, \pi(\theta) = f(x, \theta), \qquad f(x) = \int_{-\infty}^{\infty} f(x, \theta)\, d\theta = \int_{-\infty}^{\infty} f(x \mid \theta)\, \pi(\theta)\, d\theta$$

Example 15.26 (continued)
X has a binomial distribution with parameters n and p = θ:
$$f(x \mid \theta) = \binom{n}{x}\, \theta^x (1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n$$
The prior distribution of θ is the beta distribution:
$$\pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1} (1-\theta)^{b-1}, \quad 0 \le \theta \le 1$$
$$f(x, \theta) = f(x \mid \theta)\, \pi(\theta) = \binom{n}{x} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{x+a-1} (1-\theta)^{n-x+b-1}$$
$$f(x) = \int_0^1 f(x, \theta)\, d\theta = \binom{n}{x} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+x)\Gamma(n+b-x)}{\Gamma(n+a+b)}$$

Example 15.26 (continued)
$$\pi^*(\theta) = \pi(\theta \mid x) = \frac{f(x, \theta)}{f(x)} = \frac{\Gamma(n+a+b)}{\Gamma(x+a)\Gamma(n-x+b)}\, \theta^{x+a-1} (1-\theta)^{n-x+b-1}$$
It is a beta distribution with parameters (x+a) and (n-x+b)!
Notes:
1. The parameters a and b of the prior distribution may be interpreted as prior successes and prior failures, with m = a + b being the total number of prior observations. After actually observing x successes and n - x failures in n i.i.d. Bernoulli trials, these parameters are updated to a + x and b + n - x, respectively.
2. The prior and posterior means are, respectively,
$$\frac{a}{m} \quad \text{and} \quad \frac{a+x}{m+n}.$$
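A minimal sketch of this conjugate update with scipy.stats.beta; the prior parameters and data below are made-up illustrations:

```python
from scipy.stats import beta

a, b = 2, 3            # prior "successes" and "failures" (illustrative)
n, x = 20, 12          # observed trials and successes

post = beta(a + x, b + n - x)          # posterior is Beta(a+x, b+n-x)
print("prior mean    :", a / (a + b))
print("posterior mean:", post.mean())  # (a+x)/(a+b+n)
print("95% credible interval:", post.interval(0.95))
```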
15.3.2 Bayesian Testing
Test H0: θ = θ0 versus Ha: θ = θa.
Assumption:
$$\pi_0^* = \pi^*(\theta_0) = P(\theta = \theta_0 \mid x), \qquad \pi_a^* = \pi^*(\theta_a) = P(\theta = \theta_a \mid x), \qquad \pi_0^* + \pi_a^* = 1$$
If
$$\frac{\pi_a^*}{\pi_0^*} > k,$$
we reject H0 in favor of Ha, where k > 0 is a suitably chosen critical constant.
Abraham Wald (1902-1950) was the founder of statistical decision theory. His goal was to provide a unified theoretical framework for diverse problems, i.e. point estimation, confidence interval estimation, and hypothesis testing.
Source: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Wald.html
Statistical Decision Problem
The goal is to choose a decision d from a set of possible decisions D, based on a sample outcome (data) x.
Decision space: D.
Sample space: the set of all sample outcomes, denoted by X.
Decision rule: δ is a function δ(x) which assigns to every sample outcome x ∈ X a decision d ∈ D.
Denote by X the r.v. corresponding to x and the probability distribution of X by f(x | θ). This distribution depends on an unknown parameter θ belonging to a parameter space Θ. Suppose one chooses a decision d when the true parameter is θ; then a loss of L(d, θ) is incurred, known as the loss function. The decision rule is assessed by evaluating its expected loss, called the risk function:
$$R(\delta, \theta) = E[L(\delta(X), \theta)] = \int_x L(\delta(x), \theta)\, f(x \mid \theta)\, dx.$$
Example
Calculate and compare the risk functions for the squared error loss of two estimators of the success probability p from n i.i.d. Bernoulli trials. The first is the usual sample proportion of successes, and the second is the Bayes estimator from Example 15.26:
$$\hat p_1 = \frac{X}{n} \qquad \text{and} \qquad \hat p_2 = \frac{a + X}{m + n}$$
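Working this out (a standard computation, not shown on the original slides): for squared error loss, R(p̂1, p) = p(1-p)/n, the variance of an unbiased estimator, while R(p̂2, p) = Var(p̂2) + bias² = [np(1-p) + (a - mp)²]/(m+n)². A short Python sketch comparing the two over a grid of p; the choices n = 20 and a = b = 2 are illustrative:

```python
import numpy as np

def risk_p1(p, n):
    """Squared-error risk of p1 = X/n: just its variance (unbiased)."""
    return p * (1 - p) / n

def risk_p2(p, n, a, b):
    """Squared-error risk of p2 = (a+X)/(m+n): variance + bias^2, m = a+b."""
    m = a + b
    return (n * p * (1 - p) + (a - m * p) ** 2) / (m + n) ** 2

p = np.linspace(0, 1, 11)
print(np.column_stack([p, risk_p1(p, 20), risk_p2(p, 20, 2, 2)]))
```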
Von Neumann (1928): Minimax
Source: http://jeff560.tripod.com/

How Minimax Works
• Focuses on risk avoidance
• Can be applied to both zero-sum and non-zero-sum games
• Can be applied to multi-stage games
• Can be applied to multi-person games

Classic Example: The Prisoner's Dilemma
Each player evaluates his/her alternatives, attempting to minimize his/her own risk. From a common-sense standpoint, a sub-optimal equilibrium results.
                          Prisoner B stays silent        Prisoner B betrays
Prisoner A stays silent   Both serve six months          Prisoner A serves ten years;
                                                         Prisoner B goes free
Prisoner A betrays        Prisoner A goes free;          Both serve two years
                          Prisoner B serves ten years
Classic Example: With Probabilities
Two-player game with simultaneous moves, where the probabilities with which player two acts are known to both players. Player 1's payoffs (rows = player 1's actions, columns = player 2's actions):

Player 1 \ Player 2   Action A [P(A)=p]   Action B [P(B)=q]   Action C [P(C)=r]   Action D [P(D)=1-p-q-r]
Action A                    -1                   1                  -2                    4
Action B                    -2                   7                   1                    1
Action C                     0                  -1                   0                    3
Action D                     1                   0                   2                    3

When disregarding the probabilities when playing the game, (D, B) is the equilibrium point under minimax. With probabilities (p = q = r = 1/4), player one will choose B. This is…
…how Bayes works
View {(pi, qi, ri)} as θi, where i = 1 in the previous example. Letting i = [1, n], we get a much better idea of what Bayes meant by "states of nature" and how the probabilities of each state enter into one's strategy.
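A small sketch computing both solutions from the payoff table above (assumes NumPy; the action labels are mine): minimax maximizes the worst-case row payoff, while the Bayes rule maximizes expected payoff under the stated probabilities:

```python
import numpy as np

# Player 1's payoffs from the table; rows = player 1's actions A-D,
# columns = player 2's actions A-D
payoff = np.array([[-1, 1, -2, 4],
                   [-2, 7, 1, 1],
                   [ 0, -1, 0, 3],
                   [ 1, 0, 2, 3]])
actions = "ABCD"

# Minimax (risk avoidance): maximize the worst-case payoff across columns
worst = payoff.min(axis=1)
print("minimax action:", actions[worst.argmax()])      # D

# Bayes: maximize expected payoff under P = (1/4, 1/4, 1/4, 1/4)
expected = payoff @ np.full(4, 0.25)
print("Bayes action  :", actions[expected.argmax()])   # B
```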
Conclusion
We covered three theoretical approaches in our presentation.
• Likelihood provides statistical justification for many of the methods used in statistics. MLE is the method used to make inferences about parameters of the underlying probability distribution of a given data set.
• Bayesian and Decision Theory are paradigms used in statistics.
• Bayesian Theory: probabilities are associated with individual events or statements rather than with sequences of events.
• Decision Theory: describes and rationalizes the process of decision making, that is, making a choice among several possible alternatives.
Sources: http://www.answers.com/maximum%20likelihood, http://www.answers.com/bayesian%20theory, http://www.answers.com/decision%20theory
The End. Any questions for the group?