Lecture 2 Bayesian Statistics and Inference. Lecture Contents What is Bayesian inference Prior...

Lecture 2 Bayesian Statistics and Inference


Lecture 2

Bayesian Statistics and Inference


Lecture Contents

• What is Bayesian inference

• Prior distributions

• Examples of conjugate Bayesian analysis

• Credible intervals

• Bayes factors

• Bayesian linear regression


Bayes Theorem

• Bayesian statistics named after Rev. Thomas Bayes (1702-1761)

• Bayes Theorem for probability events A and B:

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$

• Or for a set of mutually exclusive and exhaustive events $A_i$ (i.e. $p(\cup_i A_i) = \sum_i p(A_i) = 1$), then

$$p(A_i \mid B) = \frac{p(B \mid A_i)\,p(A_i)}{\sum_j p(B \mid A_j)\,p(A_j)}$$


Example – coin tossing

• Let A be the event of 2 Heads in three tosses of a fair coin, and B be the event that the 1st toss is a Head.

• Three tosses have 8 equally probable patterns {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}

• A = {HHT,HTH,THH} → p(A) = 3/8

• B = {HHH,HHT,HTH,HTT} → p(B) = 1/2

• A|B = {HHT,HTH}|{HHH,HHT,HTH,HTT} → p(A|B) = 1/2

• B|A = {HHT,HTH}|{HHT,HTH,THH} → p(B|A) = 2/3

• P(A|B) = P(B|A)P(A)/P(B) = (2/3 × 3/8)/(1/2) = 1/2
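The coin-tossing calculation can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides; the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses, e.g. "HHT".
outcomes = ["".join(p) for p in product("HT", repeat=3)]

def prob(event):
    """Probability of an event (a set of outcomes) under equal weights."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o.count("H") == 2}   # exactly two Heads
B = {o for o in outcomes if o[0] == "H"}         # first toss is a Head

p_A, p_B = prob(A), prob(B)
p_A_given_B = Fraction(len(A & B), len(B))       # direct count within B
p_B_given_A = Fraction(len(A & B), len(A))       # direct count within A

# Bayes' theorem should recover the same value as the direct count.
bayes = p_B_given_A * p_A / p_B
print(p_A, p_B, p_A_given_B, p_B_given_A, bayes)
```

Exact fractions are used so that the two routes to p(A|B) can be compared for equality rather than to within rounding error.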


Example 2 – Diagnostic testing

• A new HIV test is claimed to have “95% sensitivity and 98% specificity”

• In a population with an HIV prevalence of 1/1000, what is the chance that a patient testing positive actually has HIV?

Let A be the event patient is truly positive, A’ be the event that they are truly negative

Let B be the event that they test positive


Diagnostic Testing ctd.

• We want p(A|B)

• “95% sensitivity” means that p(B|A) = 0.95

• “98% specificity” means that p(B|A’) = 0.02

So from Bayes Theorem

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B \mid A)\,p(A) + p(B \mid A')\,p(A')} = \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.02 \times 0.999} = 0.045$$

Thus over 95% of those testing positive will, in fact, not have HIV.
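As a quick check of the arithmetic, here is a small sketch of the same calculation. The function name `posterior_positive` is an illustrative choice, not something from the lecture.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """p(truly positive | test positive) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # p(B|A) p(A)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # p(B|A') p(A')
    return true_pos / (true_pos + false_pos)

p = posterior_positive(prevalence=0.001, sensitivity=0.95, specificity=0.98)
print(round(p, 3))  # about 0.045
```

Re-running the function with a higher prevalence shows how strongly the answer depends on the prior probability of disease.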


Being Bayesian!

• So the vital issue in this example is how should this test result change our prior belief that the patient is HIV positive?

• The disease prevalence (p=0.001) can be thought of as a ‘prior’ probability.

• Observing a positive result causes us to modify this probability to p=0.045 which is our ‘posterior’ probability that the patient is HIV positive.

• This use of Bayes theorem applied to observables is uncontroversial; however, its use in general statistical analyses, where parameters are unknown quantities, is more controversial.


Bayesian Inference

In Bayesian inference there is a fundamental distinction between

• Observable quantities x, i.e. the data

• Unknown quantities θ

θ can be statistical parameters, missing data, latent variables…

• Parameters are treated as random variables

In the Bayesian framework we make probability statements about model parameters.

In the frequentist framework, parameters are fixed non-random quantities and the probability statements concern the data.


Prior distributions

As with all statistical analyses we start by positing a model which specifies p(x| θ)

This is the likelihood which relates all variables into a ‘full probability model’

However, from a Bayesian point of view:

• θ is unknown so should have a probability distribution reflecting our uncertainty about it before seeing the data

• Therefore we specify a prior distribution p(θ)

Note this is like the prevalence in the example


Posterior Distributions

Also, x is known, so it should be conditioned on. Here we use Bayes theorem to obtain the conditional distribution for unobserved quantities given the data, which is known as the posterior distribution.

$$p(\theta \mid x) = \frac{p(\theta)\,p(x \mid \theta)}{\int p(\theta)\,p(x \mid \theta)\,d\theta} \propto p(\theta)\,p(x \mid \theta)$$

The prior distribution expresses our uncertainty about θ before seeing the data.

The posterior distribution expresses our uncertainty about θ after seeing the data.
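To make the "prior times likelihood, then normalise" recipe concrete, the sketch below approximates a posterior on a grid. The coin-bias setting, the data (7 heads in 10 tosses) and the flat prior are all made-up assumptions for illustration; they do not come from the lecture.

```python
# Grid of candidate values for theta (a hypothetical coin bias).
grid = [i / 1000 for i in range(1001)]
heads, tosses = 7, 10                 # made-up data

prior = [1.0 for _ in grid]           # flat prior p(theta)
likelihood = [t**heads * (1 - t)**(tosses - heads) for t in grid]

# Numerator of Bayes' theorem, then normalise by the approximate integral.
unnormalised = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

posterior_mean = sum(t * w for t, w in zip(grid, posterior))
print(round(posterior_mean, 3))
```

With a flat prior the exact posterior here is Beta(8, 4), whose mean is 8/12, so the grid answer can be compared against a known value.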


Examples of Bayesian Inference using the Normal distribution

Known variance, unknown mean

It is easier to consider first a model with 1 unknown parameter. Suppose we have a sample of Normal data:

$$x_i \sim N(\mu, \sigma^2), \quad i = 1, \dots, n.$$

Let us assume we know the variance, $\sigma^2$, and we assume a prior distribution for the mean, $\mu$, based on our prior beliefs:

$$\mu \sim N(\mu_0, \sigma_0^2)$$

Now we wish to construct the posterior distribution $p(\mu \mid x)$.


Posterior for Normal distribution mean

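So we have

$$p(\mu) = (2\pi\sigma_0^2)^{-1/2} \exp\left(-\tfrac{1}{2}(\mu - \mu_0)^2/\sigma_0^2\right)$$

$$p(x \mid \mu) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left(-\tfrac{1}{2}(x_i - \mu)^2/\sigma^2\right)$$

and hence

$$p(\mu \mid x) \propto p(\mu)\,p(x \mid \mu)$$

$$= (2\pi\sigma_0^2)^{-1/2} \exp\left(-\tfrac{1}{2}(\mu - \mu_0)^2/\sigma_0^2\right) \times \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left(-\tfrac{1}{2}(x_i - \mu)^2/\sigma^2\right)$$

$$\propto \exp\left(-\tfrac{1}{2}\mu^2\left(1/\sigma_0^2 + n/\sigma^2\right) + \mu\left(\mu_0/\sigma_0^2 + \sum_i x_i/\sigma^2\right) + \text{cons}\right)$$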


Posterior for Normal distribution mean (continued)

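For a Normal distribution with response y, mean $\theta$ and variance $\phi$, we have

$$f(y) = (2\pi\phi)^{-1/2} \exp\left\{-\tfrac{1}{2}(y - \theta)^2/\phi\right\} \propto \exp\left\{-\tfrac{1}{2} y^2 \phi^{-1} + y\theta/\phi + \text{cons}\right\}$$

We can equate this to our posterior as follows:

$$\exp\left(-\tfrac{1}{2}\mu^2\left(1/\sigma_0^2 + n/\sigma^2\right) + \mu\left(\mu_0/\sigma_0^2 + \sum_i x_i/\sigma^2\right) + \text{cons}\right)$$

$$\Rightarrow \phi = \left(1/\sigma_0^2 + n/\sigma^2\right)^{-1} \quad \text{and} \quad \theta = \phi\left(\mu_0/\sigma_0^2 + \sum_i x_i/\sigma^2\right)$$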


Precisions and means

• In Bayesian statistics the precision = 1/variance is often more important than the variance.

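• For the Normal model we have, writing $\theta$ for the posterior mean and $\phi$ for the posterior variance,

$$1/\phi = 1/\sigma_0^2 + n/\sigma^2 \quad \text{and} \quad \theta = \phi\left(\mu_0/\sigma_0^2 + \bar{x}/(\sigma^2/n)\right)$$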

In other words the posterior precision = sum of prior precision and data precision, and the posterior mean is a (precision weighted) average of the prior mean and data mean.


Large sample properties

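As $n \to \infty$:

Posterior precision: $1/\phi = 1/\sigma_0^2 + n/\sigma^2 \to n/\sigma^2$

So posterior variance: $\phi \to \sigma^2/n$

Posterior mean: $\theta = \phi\left(\mu_0/\sigma_0^2 + \bar{x}/(\sigma^2/n)\right) \to \bar{x}$

And so posterior distribution: $p(\mu \mid x) \to N(\bar{x}, \sigma^2/n)$

Compared to $p(\bar{x} \mid \mu) = N(\mu, \sigma^2/n)$ in the frequentist setting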
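The large-sample limits can be seen numerically by holding the sample mean fixed and increasing n. This is a rough sketch under that simplifying assumption; the function name `normal_posterior` and the chosen prior and data values are illustrative.

```python
def normal_posterior(mu0, var0, xbar, var, n):
    """Posterior mean and variance of a Normal mean with known variance."""
    precision = 1.0 / var0 + n / var            # prior precision + data precision
    phi = 1.0 / precision                       # posterior variance
    theta = phi * (mu0 / var0 + n * xbar / var) # precision-weighted mean
    return theta, phi

for n in (10, 100, 1000, 100000):
    theta, phi = normal_posterior(mu0=170.0, var0=9.0, xbar=165.52, var=50.0, n=n)
    # The last column is phi / (sigma^2 / n), which should approach 1.
    print(n, round(theta, 3), round(phi / (50.0 / n), 4))
```

As n grows, the posterior mean moves from between the prior mean and the sample mean towards the sample mean, and the posterior variance approaches the frequentist sampling variance.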


Girls Heights Example

• 10 girls aged 18 had both their heights and weights measured.

• Their heights (in cm) were as follows:

169.6, 166.8, 157.1, 181.1, 158.4, 165.6, 166.7, 156.5, 168.1, 165.3

We will assume the variance is known to be 50.

Two individuals gave the following prior distributions for the mean height:

Individual 1: $p_1(\mu) \sim N(165, 2^2)$

Individual 2: $p_2(\mu) \sim N(170, 3^2)$


Constructing posterior 1

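• To construct the posterior we use the formulae we have just derived

• From the prior, $\mu_0 = 165$, $\sigma_0^2 = 4$

• From the data, $\bar{x} = 165.52$, $\sigma^2 = 50$, $n = 10$

• The posterior is therefore

$$p(\mu \mid x) \sim N(\theta_1, \phi_1),$$

$$\text{where } \phi_1 = \left(\tfrac{1}{4} + \tfrac{10}{50}\right)^{-1} = 2.222, \quad \theta_1 = \phi_1\left(\tfrac{165}{4} + \tfrac{1655.2}{50}\right) = 165.23.$$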


Prior and posterior comparison


Constructing posterior 2

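• Again, to construct the posterior we use the formulae derived earlier

• From the prior, $\mu_0 = 170$, $\sigma_0^2 = 9$

• From the data, $\bar{x} = 165.52$, $\sigma^2 = 50$, $n = 10$

• The posterior is therefore

$$p(\mu \mid x) \sim N(\theta_2, \phi_2),$$

$$\text{where } \phi_2 = \left(\tfrac{1}{9} + \tfrac{10}{50}\right)^{-1} = 3.214, \quad \theta_2 = \phi_2\left(\tfrac{170}{9} + \tfrac{1655.2}{50}\right) = 167.12.$$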
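Both posteriors can be reproduced directly from the ten heights. The sketch below uses the data and priors stated in the example; the function name `posterior_for_mean` is an illustrative choice.

```python
heights = [169.6, 166.8, 157.1, 181.1, 158.4, 165.6, 166.7, 156.5, 168.1, 165.3]
known_var = 50.0   # sigma^2, assumed known in the lecture

def posterior_for_mean(data, var, prior_mean, prior_var):
    """Return (posterior mean, posterior variance) for mu."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / var)
    return post_mean, post_var

theta1, phi1 = posterior_for_mean(heights, known_var, 165.0, 4.0)  # individual 1
theta2, phi2 = posterior_for_mean(heights, known_var, 170.0, 9.0)  # individual 2
print(round(theta1, 2), round(phi1, 3))
print(round(theta2, 2), round(phi2, 3))
```

The second prior is both further from the sample mean and less precise, so its posterior mean is pulled further from the prior mean and its posterior variance is larger.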


Prior 2 comparison

Note this prior is not as close to the data as prior 1, and hence the posterior is somewhere between the prior and the likelihood.


Other conjugate examples

• When the posterior is in the same family as the prior we have conjugacy. Examples include:

Likelihood   Parameter     Prior    Posterior
Normal       Mean          Normal   Normal
Normal       Precision     Gamma    Gamma
Binomial     Probability   Beta     Beta
Poisson      Mean          Gamma    Gamma
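As an illustration of the Binomial row, a Beta(a, b) prior combined with s successes in n trials gives a Beta(a + s, b + n − s) posterior. The sketch below uses made-up prior parameters and data, which are assumptions for illustration only.

```python
def beta_binomial_update(a, b, successes, trials):
    """Beta(a, b) prior + Binomial data -> Beta posterior parameters."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a0, b0 = 2.0, 2.0    # hypothetical prior
s, n = 7, 10         # hypothetical data
a1, b1 = beta_binomial_update(a0, b0, s, n)

# Prior mean, MLE and posterior mean, in that order.
print(beta_mean(a0, b0), s / n, round(beta_mean(a1, b1), 3))
```

The posterior mean falls between the prior mean and the MLE s/n, matching the behaviour seen for the Normal mean.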


In all cases

• The posterior mean is a compromise between the prior mean and the MLE

• The posterior s.d. is less than both the prior s.d. and the s.e. (MLE)

‘A Bayesian is one who, vaguely expecting a horse and catching a glimpse of a donkey, strongly concludes he has seen a mule’ (Senn)

As n → ∞

• The posterior mean → the MLE

• The posterior s.d. → the s.e. (MLE)

• The posterior does not depend on the prior.


Non-informative priors

• We often do not have any prior information, although true Bayesians would argue we always have some prior information!

• We would hope to have good agreement between the frequentist approach and the Bayesian approach with a non-informative prior.

• Diffuse or flat priors are often better terms to use as no prior is strictly non-informative!

• For our example of an unknown mean, candidate priors are a Uniform distribution over a large range or a Normal distribution with a huge variance.


Improper priors

• The limiting prior of both the Uniform and Normal is a Uniform prior on the whole real line.

• Such a prior is defined as improper as it is not strictly a probability distribution and doesn’t integrate to 1.

• Some care has to be taken with improper priors; however, in many cases they are acceptable provided they result in a proper posterior distribution.

• Uniform priors are often used as non-informative priors; however, it is worth noting that a uniform prior on one scale can be very informative on another.

• For example, if we have an unknown variance we may put a uniform prior on the variance, the standard deviation or log(variance), and these will all have different effects.
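The scale dependence can be shown with a short calculation: the same prior question, "how likely is the variance to be below 25?", gets different answers depending on which scale is given the uniform prior. The bounded range (0.01 to 100) and the threshold are arbitrary assumptions chosen so that all three priors are proper.

```python
import math

threshold = 25.0                  # question: P(variance < 25) under each prior
var_lo, var_hi = 0.01, 100.0      # hypothetical range for the variance

# Uniform on the variance itself.
p_var = (threshold - var_lo) / (var_hi - var_lo)

# Uniform on the standard deviation, over the matching range.
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)
p_sd = (math.sqrt(threshold) - sd_lo) / (sd_hi - sd_lo)

# Uniform on log(variance), over the matching range.
p_log = (math.log(threshold) - math.log(var_lo)) / (math.log(var_hi) - math.log(var_lo))

print(round(p_var, 3), round(p_sd, 3), round(p_log, 3))
```

The three priors put roughly a quarter, a half and 85% of their mass below the threshold, so none of them is neutral about the size of the variance.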


Point and Interval Estimation

• In Bayesian inference the outcome of interest for a parameter is its full posterior distribution; however, we may be interested in summaries of this distribution.

• A simple point estimate would be the mean of the posterior (although the median and mode are alternatives).

• Interval estimates are also easy to obtain from the posterior distribution and are given several names, for example credible intervals, Bayesian confidence intervals and highest density regions (HDR). These names are often used interchangeably, although strictly an HDR is the region containing the values of highest posterior density; it coincides with the equal-tailed interval when the posterior is symmetric and unimodal, as for the Normal.


Credible Intervals

• If we consider the heights example with our first prior then our posterior is

p(μ|x) ~ N(165.23, 2.222), and a 95% credible interval for μ is

165.23 ± 1.96 × sqrt(2.222) = (162.31, 168.15).

Similarly, prior 2 results in a 95% credible interval for μ of (163.61, 170.63).

Note that credible intervals can be interpreted in the more natural way that there is a probability of 0.95 that the interval contains μ, rather than the frequentist conclusion that 95% of such intervals contain μ.
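These intervals can be reproduced from the stated posteriors with the standard library. This is a minimal sketch for a Normal posterior only; the helper name `credible_interval` is an illustrative choice.

```python
from statistics import NormalDist

def credible_interval(mean, variance, level=0.95):
    """Equal-tailed credible interval for a Normal posterior."""
    post = NormalDist(mu=mean, sigma=variance ** 0.5)
    tail = (1.0 - level) / 2.0
    return post.inv_cdf(tail), post.inv_cdf(1.0 - tail)

lo1, hi1 = credible_interval(165.23, 2.222)   # posterior under prior 1
lo2, hi2 = credible_interval(167.12, 3.214)   # posterior under prior 2
print(round(lo1, 2), round(hi1, 2))
print(round(lo2, 2), round(hi2, 2))
```

Because the Normal posterior is symmetric, this equal-tailed interval is the same as the mean ± 1.96 standard deviations calculation above.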


Hypothesis Testing

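Another big issue in statistical modelling is the ability to test hypotheses and, more generally, to compare models.

The Bayesian approach is in some ways more straightforward. For an unknown parameter θ, with hypotheses $H_0: \theta \in \Theta_0$ and $H_1: \theta \in \Theta_1$, we simply calculate the posterior probabilities

$$p_0 = P(\theta \in \Theta_0 \mid x), \quad p_1 = P(\theta \in \Theta_1 \mid x)$$

and decide between H0 and H1 accordingly.

We also require the prior probabilities to achieve this:

$$\pi_0 = P(\theta \in \Theta_0), \quad \pi_1 = P(\theta \in \Theta_1)$$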


Bayes factors

• Prior odds on H0 against H1 is π0 /π1

• Posterior odds on H0 against H1 is p0 /p1

• The Bayes factor B in favour of H0 against H1 is

$$B = \frac{p_0/p_1}{\pi_0/\pi_1} = \frac{p_0\,\pi_1}{p_1\,\pi_0}$$

Note that when the hypotheses are simple, B is the likelihood ratio of H0 against H1, i.e. the odds in favour of H0 against H1 that are given by the data. For complex hypotheses, however, B also involves the prior distributions.


Bayes factors – Girls height example prior 1

Let us assume that H0 is μ > 165 and hence H1 is μ ≤ 165. Now we have π0 = π1 = 0.5 under the N(165,4) prior.

The posterior is N(165.23,2.222), which results in p0 = 0.561 and p1 = 0.439, and hence a Bayes factor of 0.561/0.439 = 1.278. Here the Bayes factor is close to 1, so the data have not much altered our beliefs about the hypothesis under discussion.


Bayes factors – Girls height example prior 2

Now under the N(170,9) prior we have π0 = 0.952 and π1 = 0.048, so there is strong a priori evidence for H0 against H1.

The posterior is N(167.12,3.214), which results in p0 = 0.881 and p1 = 0.119, and hence a Bayes factor of (0.881×0.048)/(0.952×0.119) = 0.373. In this case the Bayes factor is smaller than 1, as the data give less evidence for H0 against H1 than the prior distribution does.

It should be noted that care needs to be taken when using Bayes factors with non-informative priors.
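Both Bayes factors follow from Normal tail probabilities. The sketch below recomputes them from the stated priors and posteriors using the standard library; the function name `bayes_factor` is an illustrative choice.

```python
from statistics import NormalDist

def bayes_factor(prior, posterior, cutoff=165.0):
    """Bayes factor for H0: mu > cutoff against H1: mu <= cutoff."""
    pi0 = 1.0 - prior.cdf(cutoff)        # prior probability of H0
    p0 = 1.0 - posterior.cdf(cutoff)     # posterior probability of H0
    prior_odds = pi0 / (1.0 - pi0)
    posterior_odds = p0 / (1.0 - p0)
    return posterior_odds / prior_odds

# NormalDist takes a standard deviation, so variances are square-rooted.
b1 = bayes_factor(NormalDist(165.0, 4.0 ** 0.5), NormalDist(165.23, 2.222 ** 0.5))
b2 = bayes_factor(NormalDist(170.0, 9.0 ** 0.5), NormalDist(167.12, 3.214 ** 0.5))
print(round(b1, 2), round(b2, 2))
```

Small differences from the hand calculation in the third decimal place are expected, because the slides round the probabilities before dividing.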


Bayesian inference with more unknown parameters

We have so far restricted ourselves to an example with only 1 unknown parameter which is generally unrealistic.

For example it would be more common to consider a Normal distribution with both mean and variance unknown.

In such a situation interest may focus on the marginal posterior distribution of the mean treating the variance as a nuisance parameter.

The marginal distribution is created by integrating the joint posterior distribution over the nuisance parameters:

$$p(\mu \mid x) = \int p(\mu, \sigma^2 \mid x)\,d\sigma^2$$


Bayesian inference with more unknown parameters

This integration is one of the reasons why Bayesian statistics has been of less practical use in the past: even for reasonably simple models, Bayesian inference becomes involved.

However the revolution in computer speed and memory size has meant that integrations can be easily approximated by simulation methods as we will describe in the next session.

We will now briefly describe a Bayesian linear regression model before going on to a lab that allows you to try simulation approaches to solve the simple models in these lectures.


Linear Regression example

In our whistle-stop tour of Bayesian statistics we have here skipped over many standard multiple parameter models. We will focus on linear regression here for comparison with the frequentist methods.

I will give brief details as it is less important to know how to calculate posterior distributions analytically when we will generally use simulation-based methods later.

Although the intention is not to scare you, the derivations are rather complex.


Linear Regression

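• Our model is

$$p(y_i \mid x_i, \beta_0, \beta_1, \sigma^2) \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$$

Now we need priors for the 3 unknown parameters $\beta_0, \beta_1, \sigma^2$, which we will consider in more detail in the practical.

For now we will use a convenient non-informative prior based on a uniform distribution on $(\beta_0, \beta_1, \log \sigma^2)$.

This results in

$$p(\beta_0, \beta_1, \sigma^2 \mid x) \propto \sigma^{-2}$$

The posterior can be expressed as follows:

$$p(\beta, \sigma^2 \mid y) = p(\beta \mid \sigma^2, y)\,p(\sigma^2 \mid y)$$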


Linear Regression

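We then get

$$p(\beta \mid \sigma^2, y) \sim N(\hat{\beta}, V_\beta \sigma^2),$$

$$\text{where } \hat{\beta} = (X^T X)^{-1} X^T y, \quad V_\beta = (X^T X)^{-1}$$

$$p(\sigma^2 \mid y) \sim \text{Inv-}\chi^2(n - k, s^2),$$

$$\text{where } s^2 = \frac{1}{n - k}(y - X\hat{\beta})^T (y - X\hat{\beta})$$

Note that the classical estimates of $\beta$ and $\sigma^2$ are $\hat{\beta}$ and $s^2$ respectively, with the classical standard error obtained by setting $\sigma^2 = s^2$.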


Estimating the linear regression

To sample from the posterior distribution given, we firstly calculate the values of $\hat{\beta}$, $V_\beta$ and $s^2$ before sampling from the distributions for $\sigma^2$ and $\beta$ in turn.

Note that in the practical we will return to the heights example and regress the girls' heights on their weights while trying various informative priors.
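The two-step sampling scheme can be sketched in plain Python for a single predictor. The (weight, height) pairs below are invented for illustration, since the girls' weights are not given in these slides, and the scaled inverse chi-squared draw is generated as (n − k)s² divided by a chi-squared variate, which is an implementation choice rather than something stated in the lecture.

```python
import math
import random

# Hypothetical (weight kg, height cm) pairs; NOT the lecture's data.
x = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
y = [158.0, 161.5, 166.0, 167.5, 172.0, 175.5]
n, k = len(x), 2                      # k = number of regression coefficients

# Classical quantities: beta_hat = (X'X)^{-1} X'y and V = (X'X)^{-1}.
sx, sxx = sum(x), sum(v * v for v in x)
sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
det = n * sxx - sx * sx
V = [[sxx / det, -sx / det], [-sx / det, n / det]]
beta_hat = [V[0][0] * sy + V[0][1] * sxy, V[1][0] * sy + V[1][1] * sxy]
resid = [yi - beta_hat[0] - beta_hat[1] * xi for xi, yi in zip(x, y)]
s2 = sum(r * r for r in resid) / (n - k)

def draw(rng):
    """One joint draw: sigma^2 | y first, then beta | sigma^2, y."""
    # chi-squared with n-k d.f. is Gamma(shape=(n-k)/2, scale=2).
    chi2 = rng.gammavariate((n - k) / 2.0, 2.0)
    sigma2 = (n - k) * s2 / chi2
    # beta ~ N(beta_hat, V * sigma2), using a 2x2 Cholesky factor of V.
    l00 = math.sqrt(V[0][0])
    l10 = V[1][0] / l00
    l11 = math.sqrt(V[1][1] - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    scale = math.sqrt(sigma2)
    b0 = beta_hat[0] + scale * l00 * z0
    b1 = beta_hat[1] + scale * (l10 * z0 + l11 * z1)
    return b0, b1, sigma2

rng = random.Random(0)                # fixed seed for reproducibility
draws = [draw(rng) for _ in range(5000)]
slope_mean = sum(d[1] for d in draws) / len(draws)
print(beta_hat, s2, slope_mean)
```

Averaging the sampled slopes should give a value close to the classical estimate, while the spread of the draws reflects uncertainty in both the coefficients and the variance.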


Information for the Practical

In this first practical you will use an MCMC estimation package called WinBUGS to fit the models discussed in the lecture.

This practical is meant to confirm the answers from the lecture notes and also to familiarize you a little with WinBUGS.

We will give more details on WinBUGS in later lectures.