Introduction to Bayesian Methods

Introduction to Bayesian Methods – p.1/??

Introduction

We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study in which the parameter θ is of inferential interest. Here θ may be vector valued. For example,

1. θ = difference in treatment means

2. θ = hazard ratio

3. θ = vector of regression coefficients

4. θ = probability a treatment is effective


Introduction

In parametric inference, we specify a parametric model for the data, indexed by the parameter θ. Letting x denote the data, we denote this model (density) by p(x|θ). The likelihood function of θ is any function proportional to p(x|θ), i.e.,

L(θ) ∝ p(x|θ).

Example

Suppose x|θ ∼ Binomial(N, θ). Then

p(x|θ) = (N choose x) θ^x (1 − θ)^(N−x),  x = 0, 1, ..., N.


Introduction

We can take

L(θ) = θ^x (1 − θ)^(N−x).

The parameter θ is unknown. In the Bayesian mind-set, we express our uncertainty about quantities by specifying distributions for them. Thus, we express our uncertainty about θ by specifying a prior distribution for it. We denote the prior density of θ by π(θ). The word "prior" indicates that this is the density of θ before the data x are observed. By Bayes theorem, we can construct the distribution of θ|x, which is called the posterior distribution of θ. We denote the posterior density of θ by p(θ|x).


Introduction

By Bayes theorem,

p(θ|x) = p(x|θ)π(θ) / ∫_Θ p(x|θ)π(θ) dθ

where Θ denotes the parameter space of θ. The quantity

p(x) = ∫_Θ p(x|θ)π(θ) dθ

is the normalizing constant of the posterior distribution. For most inference problems, p(x) does not have a closed form. Bayesian inference about θ is primarily based on the posterior distribution of θ, p(θ|x).


Introduction

For example, one can compute various posterior summaries, such as the mean, median, mode, variance, and quantiles. For example, the posterior mean of θ is given by

E(θ|x) = ∫_Θ θ p(θ|x) dθ.

Example 1 Given θ, suppose x_1, x_2, ..., x_n are i.i.d. Binomial(1, θ), and θ ∼ Beta(α, λ). The parameters of the prior distribution are often called the hyperparameters.

Let us derive the posterior distribution of θ. Let x = (x_1, x_2, ..., x_n), and thus,


Introduction

p(x|θ) = ∏_{i=1}^n p(x_i|θ) = ∏_{i=1}^n θ^(x_i) (1 − θ)^(1−x_i) = θ^(∑x_i) (1 − θ)^(n−∑x_i),

where ∑x_i = ∑_{i=1}^n x_i. Also,

π(θ) = ( Γ(α + λ)/(Γ(α)Γ(λ)) ) θ^(α−1) (1 − θ)^(λ−1).

Now, we can write the kernel of the posterior density as


Introduction

p(θ|x) ∝ θ^(∑x_i) (1 − θ)^(n−∑x_i) θ^(α−1) (1 − θ)^(λ−1) = θ^(∑x_i + α − 1) (1 − θ)^(n − ∑x_i + λ − 1).

Thus p(θ|x) ∝ θ^(∑x_i + α − 1) (1 − θ)^(n − ∑x_i + λ − 1). We can recognize this kernel as a beta kernel with parameters (∑x_i + α, n − ∑x_i + λ). Thus,

θ|x ∼ Beta(∑x_i + α, n − ∑x_i + λ),

and therefore

p(θ|x) = ( Γ(α + n + λ) / (Γ(∑x_i + α) Γ(n − ∑x_i + λ)) ) × θ^(∑x_i + α − 1) (1 − θ)^(n − ∑x_i + λ − 1).
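As a quick numerical sanity check of the conjugate update above, the following sketch (the data vector and hyperparameter values are made-up illustrations, not from the slides) recovers the Beta posterior parameters and compares the closed-form normalizing constant with a brute-force grid integral:

```python
from math import gamma

# Illustrative data and hyperparameters (assumptions for this sketch)
x = [1, 0, 1, 1, 0, 1, 1, 1]      # i.i.d. Binomial(1, θ) draws
alpha, lam = 2.0, 2.0             # θ ∼ Beta(α, λ) prior
n, s = len(x), sum(x)

# Kernel recognition gives θ|x ∼ Beta(Σx_i + α, n − Σx_i + λ)
a_post, b_post = s + alpha, n - s + lam

# Closed-form normalizing constant p(x) from the Gamma-function identity
p_x = (gamma(alpha + lam) / (gamma(alpha) * gamma(lam))
       * gamma(s + alpha) * gamma(n - s + lam) / gamma(alpha + n + lam))

# Brute-force check: integrate L(θ)π(θ) on a fine grid
h = 1e-4
grid = [i * h for i in range(1, 10000)]
Z = sum(t**s * (1 - t)**(n - s)
        * gamma(alpha + lam) / (gamma(alpha) * gamma(lam))
        * t**(alpha - 1) * (1 - t)**(lam - 1) for t in grid) * h

print(a_post, b_post)             # posterior Beta parameters
print(abs(Z - p_x) / p_x < 1e-3)  # numeric p(x) agrees with closed form
```

Recognizing the kernel avoids the grid integral entirely; the numeric check is only there to confirm the algebra.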


Introduction

Remark In deriving posterior densities, an often used technique is to try to recognize the kernel of the posterior density of θ. This avoids direct computation of p(x) and saves a lot of time in derivations. If the kernel cannot be recognized, then p(x) must be computed directly.

In this example we have

p(x) = p(x_1, ..., x_n) ∝ ∫_0^1 θ^(∑x_i + α − 1) (1 − θ)^(n − ∑x_i + λ − 1) dθ = Γ(∑x_i + α) Γ(n − ∑x_i + λ) / Γ(α + n + λ)


Introduction

Thus

p(x_1, ..., x_n) = ( Γ(α + λ)/(Γ(α)Γ(λ)) ) × ( Γ(∑x_i + α) Γ(n − ∑x_i + λ) / Γ(α + n + λ) )

for x_i = 0, 1 and i = 1, ..., n.

Suppose A_1, A_2, ... are events such that A_i ∩ A_j = ∅ for i ≠ j and ⋃_{i=1}^∞ A_i = Ω, where Ω denotes the sample space. Let B denote an event in Ω. Then Bayes theorem for events can be written as

P(A_i|B) = P(B|A_i) P(A_i) / ∑_{j=1}^∞ P(B|A_j) P(A_j)


Introduction

P(A_i) is the prior probability of A_i, and P(A_i|B) is the posterior probability of A_i given that B has occurred.

Example 2 Bayes theorem is often used in diagnostic tests for cancer. A young person was diagnosed as having a type of cancer that occurs extremely rarely in young people. Naturally, he was very upset. A friend told him that it was probably a mistake. His friend reasoned as follows. No medical test is perfect: there are always incidences of false positives and false negatives.


Introduction

Let C stand for the event that he has cancer and let + stand for the event that an individual responds positively to the test. Assume P(C) = 1/1,000,000 = 10^(−6), P(+|C^c) = .01, and P(+|C) = .99. (So only one per million people his age have the disease, and the test is extremely good relative to most medical tests, giving only 1% false positives and 1% false negatives.) Find the probability that he has cancer given that he has a positive response. (After you make this calculation you will not be surprised to learn that he did not have cancer.)

P(C|+) = P(+|C) P(C) / ( P(+|C) P(C) + P(+|C^c) P(C^c) )

= (.99)(10^(−6)) / ( (.99)(10^(−6)) + (.01)(.999999) )


Introduction

P(C|+) = .00000099/.01000098 = .00009899
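The arithmetic of Example 2 can be sketched in a few lines, using exactly the probabilities assumed on the slide:

```python
# Bayes theorem for events, with the slide's assumed test characteristics
p_c = 1e-6          # prior P(C): one per million
p_pos_c = 0.99      # P(+|C), i.e. 1% false negatives
p_pos_nc = 0.01     # P(+|C^c), i.e. 1% false positives

p_c_pos = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_nc * (1 - p_c))
print(p_c_pos)      # about 9.9e-5: a positive result is almost surely a false positive
```

The tiny prior dominates: even a 99%-accurate test leaves the posterior probability of cancer below one in ten thousand.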

Example 3 Suppose x_1, ..., x_n is a random sample from N(µ, σ^2).

i) Suppose σ^2 is known and µ ∼ N(µ_0, σ_0^2). The posterior density of µ is given by:

p(µ|x) ∝ ( ∏_{i=1}^n p(x_i|µ, σ^2) ) π(µ)

∝ exp{ −(1/(2σ^2)) ∑(x_i − µ)^2 } × exp{ −(1/(2σ_0^2)) (µ − µ_0)^2 }


Introduction

∝ exp{ −(1/2) ((nσ_0^2 + σ^2)/(σ_0^2 σ^2)) [ µ^2 − 2µ (σ_0^2 ∑x_i + µ_0 σ^2)/(nσ_0^2 + σ^2) ] }

∝ exp{ −(1/2) ((nσ_0^2 + σ^2)/(σ_0^2 σ^2)) [ µ − (σ_0^2 ∑x_i + µ_0 σ^2)/(nσ_0^2 + σ^2) ]^2 }

We can recognize this as a normal kernel with mean

µ_post = (σ_0^2 ∑x_i + µ_0 σ^2)/(nσ_0^2 + σ^2)

and variance

σ_post^2 = ( (nσ_0^2 + σ^2)/(σ_0^2 σ^2) )^(−1) = σ_0^2 σ^2/(nσ_0^2 + σ^2).

Thus

µ|x ∼ N( (σ_0^2 ∑x_i + µ_0 σ^2)/(nσ_0^2 + σ^2), σ_0^2 σ^2/(nσ_0^2 + σ^2) ).
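A minimal sketch of this known-variance update; the sample and prior values below are illustrative assumptions, not from the slides:

```python
# Example 3 i): normal mean update with σ² known (illustrative values)
x = [4.8, 5.1, 5.6, 4.9, 5.3]    # sample from N(µ, σ²)
sigma2 = 1.0                      # known σ²
mu0, sigma02 = 0.0, 10.0          # prior µ ∼ N(µ₀, σ₀²)
n, sx = len(x), sum(x)

# Posterior mean and variance from completing the square
mu_post = (sigma02 * sx + mu0 * sigma2) / (n * sigma02 + sigma2)
var_post = sigma02 * sigma2 / (n * sigma02 + sigma2)
print(mu_post, var_post)
```

Note that µ_post is a precision-weighted average of the sample information ∑x_i and the prior mean µ_0: with a diffuse prior (large σ_0^2) it approaches the sample mean, and the posterior variance shrinks like σ^2/n as n grows.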


Introduction

ii) Suppose µ is known and σ^2 is unknown. Let τ = 1/σ^2. τ is often called the precision parameter. Suppose τ ∼ gamma(δ_0/2, γ_0/2). Thus

π(τ) ∝ τ^(δ_0/2 − 1) exp(−τγ_0/2)

Let us derive the posterior distribution of τ.

p(τ|x) ∝ τ^(n/2) exp{ −(τ/2) ∑(x_i − µ)^2 } τ^(δ_0/2 − 1) exp(−τγ_0/2)

p(τ|x) ∝ τ^((n+δ_0)/2 − 1) exp{ −(τ/2)( γ_0 + ∑(x_i − µ)^2 ) }


Introduction

Thus

τ|x ∼ gamma( (n + δ_0)/2, (γ_0 + ∑(x_i − µ)^2)/2 ).
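The precision update can be sketched numerically as well; the sample and hyperparameters below are illustrative assumptions:

```python
# Example 3 ii): precision update with µ known (illustrative values)
x = [4.8, 5.1, 5.6, 4.9, 5.3]
mu = 5.0                          # known mean
delta0, gamma0 = 2.0, 2.0         # τ ∼ gamma(δ₀/2, γ₀/2)
n = len(x)
ss = sum((xi - mu) ** 2 for xi in x)

# τ|x ∼ gamma((n + δ₀)/2, (γ₀ + Σ(x_i − µ)²)/2)
shape_post = (n + delta0) / 2
rate_post = (gamma0 + ss) / 2
print(shape_post, rate_post)      # posterior mean of τ is shape/rate
```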

iii) Now suppose µ and σ^2 are both unknown. Suppose we specify the joint prior

π(µ, τ) = π(µ|τ) π(τ),

where

µ|τ ∼ N(µ_0, τ^(−1) σ_0^2),  τ ∼ gamma(δ_0/2, γ_0/2).


Introduction

The joint posterior density of (µ, τ) is given by

p(µ, τ|x) ∝ ( τ^(n/2) exp{ −(τ/2) ∑(x_i − µ)^2 } ) × ( τ^(1/2) exp{ −(τ/(2σ_0^2)) (µ − µ_0)^2 } ) × ( τ^(δ_0/2 − 1) exp{ −τγ_0/2 } )

= τ^((n+δ_0+1)/2 − 1) exp{ −(τ/2) ( γ_0 + (µ − µ_0)^2/σ_0^2 + ∑(x_i − µ)^2 ) }

The joint posterior does not have a clearly recognizable form. Thus, we need to compute p(x) by brute force.


Introduction

p(x) ∝ ∫_0^∞ ∫_{−∞}^∞ τ^((n+δ_0+1)/2 − 1) exp{ −(τ/2) [ γ_0 + (µ − µ_0)^2/σ_0^2 + ∑(x_i − µ)^2 ] } dµ dτ

= ∫_0^∞ ∫_{−∞}^∞ τ^((n+δ_0+1)/2 − 1) × exp{ −(τ/2) [ γ_0 + µ^2(n + 1/σ_0^2) − 2µ(∑x_i + µ_0/σ_0^2) + (µ_0^2/σ_0^2 + ∑x_i^2) ] } dµ dτ

= ∫_0^∞ τ^((n+δ_0+1)/2 − 1) exp{ −(τ/2)( γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 ) } × ( ∫_{−∞}^∞ exp{ −(τ/2) [ µ^2(n + 1/σ_0^2) − 2µ(∑x_i + µ_0/σ_0^2) ] } dµ ) dτ


Introduction

The integral with respect to µ can be evaluated by completing the square.

∫_{−∞}^∞ exp{ −(τ(n + σ_0^(−2))/2) [ µ − (∑x_i + µ_0 σ_0^(−2))/(n + σ_0^(−2)) ]^2 } dµ × exp{ τ(∑x_i + µ_0 σ_0^(−2))^2 / (2(n + σ_0^(−2))) }

= exp{ τ(∑x_i + µ_0 σ_0^(−2))^2 / (2(n + σ_0^(−2))) } × (2π)^(1/2) τ^(−1/2) (n + σ_0^(−2))^(−1/2)


Introduction

Now we need to evaluate

∫_0^∞ (2π)^(1/2) (n + σ_0^(−2))^(−1/2) τ^(−1/2) τ^((n+δ_0+1)/2 − 1) × exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 ] } × exp{ (τ/2) (∑x_i + µ_0/σ_0^2)^2/(n + 1/σ_0^2) } dτ

= (2π)^(1/2) (n + σ_0^(−2))^(−1/2) ∫_0^∞ τ^((n+δ_0)/2 − 1) × exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (∑x_i + µ_0/σ_0^2)^2/(n + 1/σ_0^2) ] } dτ


Introduction

= (2π)^(1/2) Γ((n + δ_0)/2) (n + 1/σ_0^2)^(−1/2) / [ (1/2)( γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (∑x_i + µ_0/σ_0^2)^2/(n + 1/σ_0^2) ) ]^((n+δ_0)/2)

= (2π)^(1/2) Γ((n + δ_0)/2) 2^((n+δ_0)/2) (n + 1/σ_0^2)^(−1/2) / [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (∑x_i + µ_0/σ_0^2)^2/(n + 1/σ_0^2) ]^((n+δ_0)/2)

≡ p*(x)

Thus,

p(x) = ( (2π)^(−(n+1)/2) σ_0^(−1) (γ_0/2)^(δ_0/2) / Γ(δ_0/2) ) p*(x)


Introduction

The joint posterior density of (µ, τ) can also be obtained in this case by deriving p(µ, τ|x) = p(µ|τ, x) p(τ|x).

Exercise: Find p(µ|τ, x) and p(τ|x).

It is of great interest to find the marginal posterior distributions of µ and τ.

p(µ|x) = ∫_0^∞ p(µ, τ|x) dτ

∝ ∫_0^∞ τ^((n+δ_0+1)/2 − 1) exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 ] } × exp{ −(τ/2) [ µ^2(n + 1/σ_0^2) − 2µ(∑x_i + µ_0/σ_0^2) ] } dτ


Introduction

= ∫_0^∞ τ^((n+δ_0+1)/2 − 1) exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 ] } × exp{ −(τ(n + 1/σ_0^2)/2) ( µ − (∑x_i + µ_0/σ_0^2)/(n + 1/σ_0^2) )^2 } × exp{ (τ/2) (∑x_i + µ_0/σ_0^2)^2/(n + 1/σ_0^2) } dτ

Let a = (∑x_i + µ_0/σ_0^2)/(n + 1/σ_0^2). Then, we can write the integral as


Introduction

= ∫_0^∞ τ^((n+δ_0+1)/2 − 1) × exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 + (n + 1/σ_0^2)(µ − a)^2 − (n + 1/σ_0^2)a^2 ] } dτ

= Γ((n + δ_0 + 1)/2) 2^((n+δ_0+1)/2) / [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 + (n + 1/σ_0^2)(µ − a)^2 − (n + 1/σ_0^2)a^2 ]^((n+δ_0+1)/2)

∝ [ 1 + c(µ − a)^2/(b − ca^2) ]^(−(n+δ_0+1)/2)

where c = n + 1/σ_0^2 and b = γ_0 + µ_0^2/σ_0^2 + ∑x_i^2. We recognize this kernel as that of a t-distribution with location parameter a, dispersion parameter ( (n + δ_0)c/(b − ca^2) )^(−1), and n + δ_0 degrees of freedom.


Introduction

Definition Let y = (y_1, ..., y_p)′ be a p × 1 random vector. Then y is said to have a p-dimensional multivariate t distribution with d degrees of freedom, location parameter m, and dispersion matrix Σ_{p×p} if y has density

p(y) = ( Γ((d + p)/2) (πd)^(−p/2) |Σ|^(−1/2) / Γ(d/2) ) × [ 1 + (1/d)(y − m)′ Σ^(−1) (y − m) ]^(−(d+p)/2)

We write this as y ∼ S_p(d, m, Σ). In our problem, p = 1, d = n + δ_0, m = a, Σ^(−1) = (n + δ_0)c/(b − ca^2), and Σ = ( (n + δ_0)c/(b − ca^2) )^(−1).


Introduction

The marginal posterior distribution of τ is given by

p(τ|x) ∝ ∫_{−∞}^∞ τ^((n+δ_0+1)/2 − 1) × exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 ] } × exp{ (τ/2)(n + 1/σ_0^2)a^2 } × exp{ −(τ(n + 1/σ_0^2)/2)(µ − a)^2 } dµ

∝ τ^((n+δ_0+1)/2 − 1) τ^(−1/2) exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (n + 1/σ_0^2)a^2 ] }

= τ^((n+δ_0)/2 − 1) exp{ −(τ/2) [ γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (n + 1/σ_0^2)a^2 ] }

Thus,

τ|x ∼ gamma( (n + δ_0)/2, (1/2)( γ_0 + µ_0^2/σ_0^2 + ∑x_i^2 − (n + 1/σ_0^2)a^2 ) ).


Introduction

Remark A t distribution can be obtained as a scale mixture of normals. That is, if x|τ ∼ N_p(m, τ^(−1)Σ) and τ ∼ gamma(δ_0/2, γ_0/2), then

p(x) = ∫_0^∞ p(x|τ) π(τ) dτ

is the S_p(δ_0, m, (γ_0/δ_0)Σ) density. That is, x ∼ S_p(δ_0, m, (γ_0/δ_0)Σ).

Note:

p(x|τ) = (2π)^(−p/2) τ^(p/2) |Σ|^(−1/2) × exp{ −(τ/2)(x − m)′ Σ^(−1) (x − m) }.
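The scale-mixture remark can be checked by Monte Carlo for p = 1 and Σ = 1; the hyperparameter values and sample size below are illustrative assumptions. A t with δ_0 degrees of freedom and dispersion γ_0/δ_0 has variance (γ_0/δ_0)·δ_0/(δ_0 − 2) = γ_0/(δ_0 − 2):

```python
import random
from math import sqrt

random.seed(1)
delta0, gamma0, m = 10.0, 10.0, 0.0   # illustrative values

draws = []
for _ in range(200_000):
    # τ ∼ gamma(δ₀/2, γ₀/2): shape δ₀/2, scale 2/γ₀ (gammavariate uses scale)
    tau = random.gammavariate(delta0 / 2, 2 / gamma0)
    # x|τ ∼ N(m, τ⁻¹)
    draws.append(random.gauss(m, 1 / sqrt(tau)))

# Marginally x ∼ S₁(δ₀, m, γ₀/δ₀); its variance is γ₀/(δ₀ − 2) = 1.25 here
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(var, 2))
```

The sample variance of the mixture draws should land near 1.25, the variance of the claimed marginal t density.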


Introduction

Remark Note that in Examples 1 and 3 i), ii), the posterior distribution is of the same family as the prior distribution. When the posterior distribution of a parameter is of the same family as the prior distribution, such prior distributions are called conjugate prior distributions.

In Example 1, a Beta prior on θ led to a Beta posterior for θ. In Example 3 i), a normal prior for µ yielded a normal posterior for µ. In Example 3 ii), a gamma prior for τ yielded a gamma posterior for τ. More on conjugate priors later.


Advantages of Bayesian Methods

1. Interpretation Having a distribution for your unknown parameter θ is easier to understand than a point estimate and a standard error. In addition, we consider the following example of a confidence interval. A 95% confidence interval for a population mean θ can be written as

x̄ ± (1.96) s/√n.

After the data are observed, θ either falls in the realized interval (a, b) or it does not; thus P(a < θ < b) ≠ 0.95.


Advantages of Bayesian Methods

1. Interpretation We have to rely on a repeated-sampling interpretation to make a probability statement as above. Thus, after observing the data, we cannot say that the true θ has a 95% chance of falling in

x̄ ± (1.96) s/√n,

although we are tempted to say this.


Advantages of Bayesian Methods

2. Bayes Inference Obeys the Likelihood Principle

The likelihood principle: if two distinct sampling plans (designs) yield proportional likelihood functions for θ, then inference about θ should be identical under these two designs. Frequentist inference does not obey the likelihood principle, in general.

Example Suppose in 12 independent tosses of a coin, 9 heads and 3 tails are observed. I wish to test the null hypothesis H_0 : θ = 1/2 vs. H_1 : θ > 1/2, where θ is the true probability of heads.


Advantages of Bayesian Methods

Consider the following two choices for the likelihood function:

a) Binomial: n = 12 (fixed), x = number of heads. x ∼ Binomial(12, θ) and the likelihood is

L_1(θ) = (n choose x) θ^x (1 − θ)^(n−x) = (12 choose 9) θ^9 (1 − θ)^3

b) Negative Binomial: n is not fixed; flip until the third tail appears. Here x is the number of heads observed before the third tail, and x ∼ NegBinomial(r = 3, θ).


Advantages of Bayesian Methods

L_2(θ) = (r + x − 1 choose x) θ^x (1 − θ)^r = (11 choose 9) θ^9 (1 − θ)^3

Note that L_1(θ) ∝ L_2(θ). From a Bayesian perspective, the posterior distribution of θ is the same under either design. That is,

p(θ|x) = L_1(θ)π(θ) / ∫ L_1(θ)π(θ) dθ ≡ L_2(θ)π(θ) / ∫ L_2(θ)π(θ) dθ


Advantages of Bayesian Methods

However, under the frequentist paradigm, inferences about θ are quite different under the two designs. The p-value based on the binomial likelihood is

P(x ≥ 9 | θ = 1/2) = ∑_{j=9}^{12} (12 choose j) θ^j (1 − θ)^(12−j) = 299/4096 ≈ 0.073,

while for the negative binomial likelihood, the p-value is

P(x ≥ 9 | θ = 1/2) = ∑_{j=9}^∞ (2 + j choose j) θ^j (1 − θ)^3 ≈ 0.0327.

The two designs lead to different decisions at the 0.05 level: we reject H_0 under the negative binomial design but not under the binomial design.
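The two tail probabilities in this example are easy to verify numerically; the infinite negative binomial tail is computed via its finite complement:

```python
from math import comb

theta = 0.5

# a) Binomial(12, θ): P(x ≥ 9)
p_binom = sum(comb(12, j) * theta**j * (1 - theta)**(12 - j)
              for j in range(9, 13))

# b) NegBinomial(r = 3, θ): P(x ≥ 9) = 1 − P(x ≤ 8),
#    where x counts heads observed before the third tail
p_negbin = 1 - sum(comb(2 + j, j) * theta**j * (1 - theta)**3
                   for j in range(0, 9))

print(round(p_binom, 4), round(p_negbin, 4))  # ≈ 0.073 and ≈ 0.0327
```

Same data, proportional likelihoods, yet the two p-values straddle 0.05; this is the conflict with the likelihood principle.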


Advantages of Bayesian Methods

3. Bayesian Inference Does Not Lead to Absurd Results

Absurd results can be obtained when doing UMVUE estimation. Suppose x ∼ Poisson(λ), and we want to estimate θ = e^(−2λ), 0 < θ < 1. It can be shown that the UMVUE of θ is (−1)^x. Thus, if x is even the UMVUE of θ is 1, and if x is odd the UMVUE of θ is −1!!
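The estimator really is unbiased, which is what makes the example striking: summing the Poisson series gives E[(−1)^x] = e^(−λ) ∑ (−λ)^x/x! = e^(−2λ). A short check with an illustrative rate:

```python
from math import exp, factorial

lam = 1.7   # illustrative Poisson rate (an assumption for this sketch)

# E[(−1)^x] = Σ_x (−1)^x e^(−λ) λ^x / x!, truncated far past convergence
expectation = sum((-1)**x * exp(-lam) * lam**x / factorial(x)
                  for x in range(60))
print(abs(expectation - exp(-2 * lam)) < 1e-12)
```

Unbiasedness holds exactly, yet every realized estimate is ±1 for a parameter known to lie in (0, 1).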


Advantages of Bayesian Methods

4. Bayes Theorem Is a Formula for Learning

Suppose you conduct an experiment and collect observations x_1, ..., x_n. Then

p(θ|x) = p(x|θ)π(θ) / ∫_Θ p(x|θ)π(θ) dθ

where x = (x_1, ..., x_n). Suppose you collect an additional observation x_{n+1} in a new study. Then

p(θ|x, x_{n+1}) = p(x_{n+1}|θ) p(θ|x) / ∫_Θ p(x_{n+1}|θ) p(θ|x) dθ

So your prior in the new study is the posterior from the previous study.
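This sequential-learning property is concrete in the Beta–Bernoulli model of Example 1: updating one observation at a time, with each posterior serving as the next prior, gives exactly the same answer as one batch update. The prior and data values below are illustrative assumptions:

```python
alpha, lam = 1.0, 1.0             # illustrative Beta(α, λ) prior
data = [1, 1, 0, 1, 0, 1, 1]      # illustrative Bernoulli observations

# Sequential route: yesterday's posterior is today's prior
a, b = alpha, lam
for xi in data:
    a, b = a + xi, b + (1 - xi)

# Batch route: update on the whole sample at once
a_batch = alpha + sum(data)
b_batch = lam + len(data) - sum(data)
print((a, b) == (a_batch, b_batch))   # the two routes agree exactly
```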


Advantages of Bayesian Methods

5. Bayes Inference Does Not Require Large-Sample Theory

With modern computing advances, "exact" calculations can be carried out using Markov chain Monte Carlo (MCMC) methods. Bayes methods do not require asymptotics for valid inference. Thus, small-sample Bayesian inference proceeds in the same way as if one had a large sample.


Advantages of Bayesian Methods

6. Bayes Inference Often Has Frequentist Inference as a Special Case

Often one can obtain frequentist answers by choosing a uniform prior for the parameters, i.e., π(θ) ∝ 1, so that

p(θ|x) ∝ L(θ).

In such cases, frequentist answers can be obtained from such a posterior distribution.
