
Foundations of Statistical Inference

Julien Berestycki

Department of Statistics, University of Oxford

MT 2016


Lecture 6: Bayesian Inference


Ideas of probability

The majority of the statistics you have learned so far are called classical or Frequentist. The probability of an event is defined as the proportion of successes in an infinite number of repeatable trials.

By contrast, in Subjective Bayesian inference, probability is a measure of the strength of belief.

We treat parameters as random variables. Before collecting any data we assume that there is uncertainty about the value of a parameter. This uncertainty can be formalised by specifying a pdf (or pmf) for the parameter. We then conduct an experiment to collect some data that will give us information about the parameter. We then use Bayes' Theorem to combine our prior beliefs with the data to derive an updated estimate of our uncertainty about the parameter.


The history of Bayesian Statistics

Bayesian methods originated with Bayes and Laplace (late 1700s to mid 1800s).

In the early 1920s, Fisher put forward an opposing viewpoint: that statistical inference must be based entirely on probabilities with a direct experimental interpretation, i.e. the repeated sampling principle.

In 1939, Jeffreys' book 'Theory of Probability' started a resurgence of interest in Bayesian inference.

This continued throughout the 1950s-60s, especially as problems with the Frequentist approach started to emerge.

The development of simulation-based inference has transformed Bayesian statistics in the last 20-30 years, and it now plays a prominent part in modern statistics.


Problems with Frequentist Inference - I

Frequentist inference generally does not condition on the observed data.

A confidence interval is a set-valued function C(X) ⊆ Θ of the data X which covers the parameter, θ ∈ C(X), for a fraction 1 − α of repeated draws of X taken under the null H0.

This is not the same as the statement that, given data X = x, the interval C(x) covers θ with probability 1 − α. But that is the type of statement we might wish to make. (Observe that this statement makes sense iff θ is a r.v.)

Example 1. Suppose X1, X2 ∼ U(θ − 1/2, θ + 1/2) and let X(1) and X(2) be the order statistics. Then C(X) = [X(1), X(2)] is an α = 1/2 level CI for θ. Suppose that in your data X = x you find x(2) − x(1) > 1/2 (this happens in a quarter of data sets). Then θ ∈ [x(1), x(2)] with probability one.
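A quick simulation confirms both coverage statements; a minimal sketch, assuming NumPy is available (the true θ and the number of replications are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.0                                    # true parameter, fixed for the simulation
X = rng.uniform(theta - 0.5, theta + 0.5, size=(100_000, 2))
lo, hi = X.min(axis=1), X.max(axis=1)          # order statistics X(1), X(2)
covered = (lo <= theta) & (theta <= hi)
print(covered.mean())                          # ~0.50: unconditional coverage of the CI
wide = (hi - lo) > 0.5
print(wide.mean())                             # ~0.25: how often the interval is wider than 1/2
print(covered[wide].mean())                    # 1.0: given a wide interval, coverage is certain
```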


Problems with Frequentist Inference - II

Frequentist Inference depends on data that were never observed

The likelihood principle. Suppose that two experiments relating to θ, E1 and E2, give rise to data y1 and y2, such that the corresponding likelihoods are proportional, that is, for all θ,

$$L(\theta; y_1, E_1) = c\,L(\theta; y_2, E_2).$$

Then the two experiments lead to identical conclusions about θ.

Key point. MLEs respect the likelihood principle, i.e. the MLEs for θ are identical in both experiments. But significance tests do not respect the likelihood principle.


Consider a pure test of significance where we specify just H0. We must choose a test statistic T(x), and define the p-value for data T(x) = t as

$$\text{p-value} = P(T(X) \text{ at least as extreme as } t \mid H_0).$$

The choice of T(X) amounts to a statement about the direction of likely departures from the null, which requires some consideration of alternative models.

Note 1. The calculation of the p-value involves a sum (or integral) over data that were not observed, and this can depend upon the form of the experiment.

Note 2. A p-value is not P(H0 | T(X) = t).


Example

A Bernoulli trial succeeds with probability p.

E1: fix n1 Bernoulli trials, count the number y1 of successes.

E2: count the number n2 of Bernoulli trials needed to obtain a fixed number y2 of successes.

$$L(p; y_1, E_1) = \binom{n_1}{y_1} p^{y_1}(1-p)^{n_1-y_1} \quad \text{(binomial)}$$

$$L(p; n_2, E_2) = \binom{n_2-1}{y_2-1} p^{y_2}(1-p)^{n_2-y_2} \quad \text{(negative binomial)}$$

If n1 = n2 = n and y1 = y2 = y, then L(p; y1, E1) ∝ L(p; n2, E2). So the MLEs for p will be the same under E1 and E2.


Example

But significance tests contradict this: e.g., test H0: p = 1/2 against H1: p < 1/2, and suppose n = 12 and y = 3.

The p-value based on E1 is

$$P\Big(Y \le y \,\Big|\, p = \tfrac12\Big) = \sum_{k=0}^{y} \binom{n}{k} (1/2)^k (1-1/2)^{n-k} \quad (= 0.073)$$

while the p-value based on E2 is

$$P\Big(N \ge n \,\Big|\, p = \tfrac12\Big) = \sum_{k=n}^{\infty} \binom{k-1}{y-1} (1/2)^{y} (1-1/2)^{k-y} \quad (= 0.033)$$

so we reach different conclusions at significance level 0.05.

Note. The p-values disagree because they sum over portions of two different sample spaces.
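Both tail sums are easy to reproduce numerically; a minimal sketch, assuming SciPy is available (scipy.stats.nbinom counts failures before the y-th success, so N ≥ n translates to n − y or more failures):

```python
from scipy.stats import binom, nbinom

n, y, p0 = 12, 3, 0.5
p_E1 = binom.cdf(y, n, p0)             # P(Y <= 3) under Bin(12, 1/2)
p_E2 = nbinom.sf(n - y - 1, y, p0)     # P(failures >= 9) = P(N >= 12)
print(round(p_E1, 3), round(p_E2, 3))  # 0.073 0.033
```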


Bayesian Inference - Revision

Likelihood f(x | θ) and prior distribution π(θ) for ϑ. The posterior distribution of ϑ at ϑ = θ, given x, is

$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta} \quad\Rightarrow\quad \pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$$

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

The same form for θ continuous (π(θ | x) a pdf) or discrete (π(θ | x) a pmf). We call ∫ f(x | θ) π(θ) dθ the marginal likelihood.
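For intuition, posterior ∝ likelihood × prior can be evaluated directly on a grid of θ values; a minimal sketch, assuming NumPy (the flat prior and the binomial likelihood are illustrative choices):

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)        # grid of trial values for theta
prior = np.ones_like(theta)                   # flat prior weights
n, x = 12, 3
likelihood = theta**x * (1 - theta)**(n - x)  # Bin(n, theta) likelihood at X = x
unnorm = likelihood * prior                   # numerator of Bayes' rule
posterior = unnorm / unnorm.sum()             # normalising approximates the marginal likelihood
print((theta * posterior).sum())              # posterior mean, ~ (x+1)/(n+2) = 0.2857
```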

Likelihood principle. Notice that if we base all inference on the posterior distribution, then we respect the likelihood principle: if two likelihood functions are proportional, the constant cancels top and bottom in Bayes' rule, and the two posterior distributions are the same.


Example 1

X ∼ Bin(n, ϑ) for known n and unknown ϑ. Suppose our prior knowledge about ϑ is represented by a Beta distribution on (0,1), and θ is a trial value for ϑ:

$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad 0 < \theta < 1.$$

[Figure: Beta prior densities on (0,1) for (a,b) = (1,1), (20,20), (1,3), (8,2).]


Example 1

Prior probability density:

$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad 0 < \theta < 1.$$

Likelihood:

$$f(x \mid \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, \dots, n$$

Posterior probability density:

$$\pi(\theta \mid x) \propto \text{likelihood} \times \text{prior} \propto \theta^{a-1}(1-\theta)^{b-1}\,\theta^{x}(1-\theta)^{n-x} = \theta^{a+x-1}(1-\theta)^{b+n-x-1}$$

Here the posterior has the same form as the prior (conjugacy), with parameters a, b replaced by the updated values a + x, b + n − x, so

$$\pi(\theta \mid x) = \frac{\theta^{a+x-1}(1-\theta)^{b+n-x-1}}{B(a+x,\, b+n-x)}$$
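Since the update is closed-form, it is one line of code; a sketch assuming SciPy (the function name is ours):

```python
from scipy.stats import beta

def beta_posterior(a, b, x, n):
    """Beta(a, b) prior + x successes in n Bin(n, theta) trials -> Beta(a + x, b + n - x)."""
    return a + x, b + n - x

post = beta(*beta_posterior(a=1, b=1, x=3, n=12))
print(post.mean(), post.std())        # posterior mean and standard deviation
print(post.ppf([0.025, 0.975]))       # central 95% posterior interval
```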


Example 1

For a Beta distribution with parameters a, b,

$$\mu = \frac{a}{a+b}, \qquad \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)}$$

The posterior mean and variance are

$$\frac{a+X}{a+b+n}, \qquad \frac{(a+X)(b+n-X)}{(a+b+n)^2(a+b+n+1)}$$

Suppose X = n and we set a = b = 1 for our prior. Then the posterior mean is

$$\frac{n+1}{n+2}$$

i.e. when we observe events of just one type, our point estimate is not 0 or 1 (which is sensible, especially for small sample sizes).
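A tiny numerical check of this shrinkage (plain Python):

```python
# Posterior mean (n + 1)/(n + 2) after observing X = n successes with a Beta(1, 1) prior:
for n in (1, 5, 20, 100):
    print(n, (n + 1) / (n + 2))   # 0.667, 0.857, 0.955, 0.990 -- never exactly 1, unlike the MLE
```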


Example 1

For large n, the posterior mean and variance are approximately

$$\frac{X}{n}, \qquad \frac{X(n-X)}{n^3}$$

In classical statistics,

$$\hat\theta = \frac{X}{n}, \qquad \frac{\hat\theta(1-\hat\theta)}{n} = \frac{X(n-X)}{n^3}$$


Example 1

Suppose ϑ = 0.3 but the prior mean is 0.7 with standard deviation 0.1. Suppose data X ∼ Bin(n, ϑ) with n = 10 (yielding X = 3) and then n = 100 (yielding X = 30, say).

[Figure: prior density together with the posterior densities for n = 10 and n = 100.]

As n increases, the likelihood overwhelms the information in the prior.
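Matching a Beta(a, b) prior to mean 0.7 and standard deviation 0.1 gives a = 14, b = 6 (since 0.7 · 0.3/(a + b + 1) = 0.01 forces a + b = 20), so the effect can be reproduced numerically; a sketch assuming SciPy:

```python
from scipy.stats import beta

a, b = 14, 6                          # Beta prior with mean 0.7 and sd 0.1
for n, x in ((10, 3), (100, 30)):
    post = beta(a + x, b + n - x)
    print(n, round(post.mean(), 3), round(post.std(), 3))
# n = 10:  mean 0.567 -- the prior still pulls the estimate far above x/n = 0.3
# n = 100: mean 0.367 -- the data have largely overwhelmed the prior
```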


Example 2 - Conjugate priors

Normal distribution when the mean and variance are unknown. Let τ = 1/σ² and θ = (τ, µ); τ is called the precision. The prior for τ is a Gamma distribution with parameters α, β > 0, and conditional on τ, µ has a N(ν, 1/(kτ)) distribution for some k > 0, ν ∈ ℝ.

The prior is

$$\pi(\tau,\mu) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\tau^{\alpha-1}e^{-\beta\tau} \cdot (2\pi)^{-1/2}(k\tau)^{1/2}\exp\left\{-\frac{k\tau}{2}(\mu-\nu)^2\right\}$$

or

$$\pi(\tau,\mu) \propto \tau^{\alpha-1/2}\exp\left[-\tau\left\{\beta+\frac{k}{2}(\mu-\nu)^2\right\}\right]$$

The likelihood is

$$f(x \mid \mu,\tau) = (2\pi)^{-n/2}\,\tau^{n/2}\exp\left\{-\frac{\tau}{2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}$$


Example 2 - Conjugate priors

Thus

$$\pi(\tau,\mu \mid x) \propto \tau^{\alpha+n/2-1/2}\exp\left[-\tau\left\{\beta+\frac{k}{2}(\mu-\nu)^2+\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}\right]$$

Complete the square to see that

$$k(\mu-\nu)^2+\sum(x_i-\mu)^2 = (k+n)\left(\mu-\frac{k\nu+n\bar{x}}{k+n}\right)^2+\frac{nk}{n+k}(\bar{x}-\nu)^2+\sum(x_i-\bar{x})^2$$
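The identity is easy to check symbolically; a sketch assuming SymPy, with a small fixed n:

```python
import sympy as sp

mu, nu, k = sp.symbols('mu nu k', positive=True)
xs = sp.symbols('x0:5')                             # n = 5 data points
n = len(xs)
xbar = sp.Rational(1, n) * sum(xs)
lhs = k * (mu - nu) ** 2 + sum((xi - mu) ** 2 for xi in xs)
rhs = ((k + n) * (mu - (k * nu + n * xbar) / (k + n)) ** 2
       + n * k / (n + k) * (xbar - nu) ** 2
       + sum((xi - xbar) ** 2 for xi in xs))
print(sp.cancel(sp.expand(lhs - rhs)))              # prints 0: the identity holds
```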


Example 2 - Completing the square

Your posterior dependence on µ is entirely in the factor

$$\exp\left[-\tau\left\{\beta+\frac{k}{2}(\mu-\nu)^2+\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2\right\}\right]$$

The idea is to try to write that as

$$c\,\exp\left[-\tau\left\{\beta'+\frac{k'}{2}(\nu'-\mu)^2\right\}\right]$$

so that (conditional on τ) we recognize a Normal density:

$$k(\mu-\nu)^2+\sum(x_i-\mu)^2 = \mu^2(k+n)-\mu\Big(2k\nu+2\sum x_i\Big)+\dots = (k+n)\left(\mu-\frac{k\nu+n\bar{x}}{k+n}\right)^2+\frac{nk}{n+k}(\bar{x}-\nu)^2+\sum(x_i-\bar{x})^2$$


Example 2 - Conjugate priors

Thus the posterior is

$$\pi(\tau,\mu \mid x) \propto \tau^{\alpha'-1/2}\exp\left[-\tau\left\{\beta'+\frac{k'}{2}(\nu'-\mu)^2\right\}\right]$$

where

$$\alpha' = \alpha+\frac{n}{2}, \qquad \beta' = \beta+\frac{1}{2}\cdot\frac{nk}{n+k}(\bar{x}-\nu)^2+\frac{1}{2}\sum(x_i-\bar{x})^2, \qquad k' = k+n, \qquad \nu' = \frac{k\nu+n\bar{x}}{k+n}$$

This is of the same form as the prior, so the class is a conjugate prior.
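These update formulas translate directly into code; a minimal sketch, assuming NumPy (the function name and the data are ours):

```python
import numpy as np

def normal_gamma_update(x, alpha, beta_, k, nu):
    """Return (alpha', beta', k', nu') for the Normal-Gamma conjugate prior."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    alpha_p = alpha + n / 2
    beta_p = (beta_ + 0.5 * n * k / (n + k) * (xbar - nu) ** 2
              + 0.5 * ((x - xbar) ** 2).sum())
    return alpha_p, beta_p, k + n, (k * nu + n * xbar) / (k + n)

x = [1.2, 0.4, 2.1, 1.7, 0.9]                       # illustrative data
print(normal_gamma_update(x, alpha=2.0, beta_=1.0, k=1.0, nu=0.0))
```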


Example 2 - Contd

If we are interested in the posterior distribution of µ alone,

$$\pi(\mu \mid x) = \int \pi(\tau,\mu \mid x)\,d\tau = \int \pi(\mu \mid \tau, x)\,\pi(\tau \mid x)\,d\tau$$

Here we have a simplification if we assume 2α = m ∈ ℕ. Then

$$\tau = W/(2\beta), \qquad \mu = \nu + Z/\sqrt{k\tau}$$

with W ∼ χ²_m and Z ∼ N(0,1). Recalling that Z√(m/W) ∼ t_m (Student with m d.f.), we see that under the prior

$$\sqrt{\frac{km}{2\beta}}\,(\mu-\nu) \sim t_m$$

and under the posterior

$$\sqrt{\frac{k'm'}{2\beta'}}\,(\mu-\nu') \sim t_{m'}, \qquad m' = m+n$$
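So a central posterior interval for µ comes straight from Student-t quantiles; a sketch assuming SciPy, reusing the update formulas above with illustrative data and hyperparameters (note 2α = 4 ∈ ℕ here):

```python
import numpy as np
from scipy.stats import t

x = np.array([1.2, 0.4, 2.1, 1.7, 0.9])            # illustrative data
alpha, beta_, k, nu = 2.0, 1.0, 1.0, 0.0           # prior hyperparameters, m = 2*alpha = 4
n, xbar = len(x), x.mean()
alpha_p = alpha + n / 2
beta_p = beta_ + 0.5 * n * k / (n + k) * (xbar - nu) ** 2 + 0.5 * ((x - xbar) ** 2).sum()
k_p, nu_p = k + n, (k * nu + n * xbar) / (k + n)
m_p = 2 * alpha_p                                   # m' = m + n
scale = np.sqrt(2 * beta_p / (k_p * m_p))           # since sqrt(k'm'/(2*beta'))*(mu - nu') ~ t_{m'}
print(nu_p + scale * t.ppf([0.025, 0.975], df=m_p)) # central 95% posterior interval for mu
```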


Example: estimating the probability of female birth given placenta previa

Result of a German study: 980 births, of which 437 were female. In the general population the proportion of female births is 0.485.

Using a uniform (Beta(1,1)) prior, the posterior is Beta(438, 544):

posterior mean = 0.446, posterior std dev = 0.016, central 95% posterior interval = [0.415, 0.477].

Sensitivity to the proposed prior: α + β − 2 acts as the "prior sample size".
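These summaries are reproduced by a few lines of SciPy (a sketch; the exact Beta quantiles replace a normal approximation):

```python
from scipy.stats import beta

post = beta(438, 544)                 # Beta(1 + 437, 1 + 543): uniform prior plus the data
print(round(post.mean(), 3))          # 0.446
print(round(post.std(), 3))           # 0.016
print(post.ppf([0.025, 0.975]))       # ~ [0.415, 0.477]
```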
