Introduction to Probability and Statistics From a Bayesian Viewpoint, Part 2

Transcript of "Introduction to Bayesian statistics"
Three approaches to probability:

- Axiomatic: probability by definition and properties
- Relative frequency: repeated trials
- Degree of belief (subjective): a personal measure of uncertainty

Problems: "The chance that a meteor strikes Earth is 1%." "The probability of rain today is 30%." "The chance of getting an A on the exam is 50%."
Problems of statistical inference: testing H0: θ = 1 versus Ha: θ > 1.

Classical approach:

- P-value = P(Data | θ = 1). The P-value is NOT P(null hypothesis is true).
- Confidence interval [a, b]: what does it mean?

But the scientist wants to know P(θ = 1 | Data) and P(H0 is true). The problem: in the classical framework, θ is "not random."
Bayesian statistics: a fundamental change in philosophy. θ is assumed to be a random variable, which allows us to assign a probability distribution for θ based on prior information. A 95% "confidence" interval [1.34 < θ < 2.97] then means what we "want" it to mean: P(1.34 < θ < 2.97) = 95%. Likewise, P-values mean what we want them to mean: P(null hypothesis is false).
Estimating P(Heads) for a biased coin. Parameter: p. Data: 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, giving the estimate p-hat = 3/10 = 0.3. But what if we believe the coin is biased in favor of low probabilities? How do we incorporate prior beliefs into the model? We'll see that p-hat = 0.22.

[Figure: density curve over p; horizontal axis 0.1–0.4, vertical axis 0–1]
Bayes Theorem

P(A | B) = P(A and B) / P(B)
         = P(B | A) P(A) / P(B)
         = P(B | A) P(A) / [ P(B | A) P(A) + P(B | A^c) P(A^c) ]
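The two-event form above is easy to check numerically. A minimal sketch (the probabilities below are hypothetical, chosen only to exercise the formula):

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) via Bayes' theorem with the partition {A, A^c}."""
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * (1 - p_a)
    return numerator / denominator

# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.8, P(B|A^c) = 0.4
print(bayes(0.3, 0.8, 0.4))  # 0.24 / 0.52 ≈ 0.4615
```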
Example: a population has 10% liars, and a lie detector gets it "right" 90% of the time. Let A = {actual liar} and R = {lie detector reports you are a liar}. The lie detector reports that a suspect is a liar. What is the probability that the suspect actually is a liar?

P(A | R) = P(R | A) P(A) / [ P(R | A) P(A) + P(R | A^c) P(A^c) ]
         = (.90)(.10) / [ (.90)(.10) + (.10)(.90) ]
         = 1/2 !!!!!
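Plugging the slide's numbers into the formula confirms the surprising 50% answer (a quick check):

```python
# Lie-detector example: 10% of the population are liars,
# and the detector is right 90% of the time.
p_liar = 0.10
p_report_given_liar = 0.90     # true positive rate
p_report_given_honest = 0.10   # false positive rate

numerator = p_report_given_liar * p_liar
posterior = numerator / (numerator + p_report_given_honest * (1 - p_liar))
print(posterior)  # 0.5
```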
More general form of Bayes Theorem

If S = A_1 ∪ A_2 ∪ … ∪ A_n, where the A_i partition the sample space, then

P(A_i | B) = P(A_i and B) / P(B)
           = P(B | A_i) P(A_i) / P(B)
           = P(B | A_i) P(A_i) / Σ_j P(B | A_j) P(A_j)
Example: three urns. Urn A: 1 red, 1 blue. Urn B: 2 reds, 1 blue. Urn C: 2 reds, 3 blues. Roll a fair die: if it's 1, pick Urn A; if 2 or 3, pick Urn B; if 4, 5, or 6, pick Urn C. Then choose one ball. A ball was chosen and it's red. What's the probability it came from Urn C?

P(C | red) = P(red | C) P(C) / [ P(red | A) P(A) + P(red | B) P(B) + P(red | C) P(C) ]
           = (2/5)(3/6) / [ (1/2)(1/6) + (2/3)(2/6) + (2/5)(3/6) ]
           ≈ 0.3956
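The urn calculation can be verified in a few lines (a quick check of the slide's arithmetic):

```python
# Die roll: 1 -> Urn A, 2 or 3 -> Urn B, 4-6 -> Urn C
priors = {"A": 1/6, "B": 2/6, "C": 3/6}
p_red = {"A": 1/2, "B": 2/3, "C": 2/5}   # P(red | urn)

p_red_total = sum(p_red[u] * priors[u] for u in priors)
posterior_c = p_red["C"] * priors["C"] / p_red_total
print(round(posterior_c, 4))  # 0.3956
```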
Bayes Theorem for Statistics. Let θ represent the parameter(s) and let X represent the data. Then

f(θ | X) = f(X | θ) f(θ) / f(X)

- The left-hand side is a function of θ
- The denominator on the right-hand side does not depend on θ, so

f(θ | X) ∝ f(X | θ) f(θ)

- Posterior distribution ∝ Likelihood × Prior distribution; Posterior dist'n = Constant × Likelihood × Prior dist'n
- The equation can be understood at the level of densities
- Goal: explore the posterior distribution of θ
A simple estimation example

Biased coin estimation: P(Heads) = p = ? We observe X_1, …, X_n, i.i.d. 0–1 Bernoulli(p) trials. Let X = Σ X_i be the number of heads in n trials. The likelihood is

f(X | p) ∝ p^X (1 - p)^(n - X)

For the prior distribution, use an uninformative prior: the uniform distribution on (0, 1), f(p) = 1. So the posterior distribution f(p | X) ∝ f(X | p) f(p) is proportional to

p^X (1 - p)^(n - X)
Coin estimation (cont'd)

The posterior density has the form f(p) = C p^x (1 - p)^(n - x): a beta distribution with parameters x + 1 and n - x + 1 (see http://mathworld.wolfram.com/BetaDistribution.html). Data: 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, so n = 10 and x = 3, and the posterior distribution is Beta(3+1, 7+1) = Beta(4, 8).
Coin estimation (cont'd). Posterior dist'n: Beta(4, 8). Mean: 0.33. Mode: 0.30. Median: 0.3238. [ qbeta(.025,4,8), qbeta(.975,4,8) ] = [ .11, .61 ] gives a 95% credible interval for p: P(.11 < p < .61 | X) = .95.
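The slide uses R's qbeta; the same summaries can be reproduced in pure Python by numerically integrating the Beta(4, 8) density and bisecting on the CDF (a sketch for checking the numbers, not a substitute for a proper quantile function):

```python
import math

a, b = 4, 8  # posterior Beta(4, 8)
beta_const = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def pdf(p):
    return p**(a - 1) * (1 - p)**(b - 1) / beta_const

def cdf(x, n=2000):
    # trapezoidal integration of the density on [0, x]
    h = x / n
    total = 0.5 * (pdf(0.0) + pdf(x)) + sum(pdf(i * h) for i in range(1, n))
    return total * h

def quantile(q):
    lo, hi = 0.0, 1.0  # bisection on the CDF
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(a / (a + b))               # mean 0.333...
print((a - 1) / (a + b - 2))     # mode 0.3
print(round(quantile(0.5), 3))   # median ≈ 0.324
print(round(quantile(0.025), 2), round(quantile(0.975), 2))  # ≈ 0.11 0.61
```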
Prior distribution: choice of a beta distribution for the prior.
With a Beta(a, b) prior, f(p) ∝ p^(a - 1) (1 - p)^(b - 1), so

Posterior ∝ Likelihood × Prior
= [ p^x (1 - p)^(n - x) ] [ p^(a - 1) (1 - p)^(b - 1) ]
= p^(x + a - 1) (1 - p)^(n - x + b - 1)

Posterior distribution is Beta(x + a, n - x + b)
Prior distributions

Posterior summaries: Mean = (x + a)/(n + a + b); Mode = (x + a - 1)/(n + a + b - 2). Quantiles can be computed by integrating the beta density. For this example, the prior and posterior distributions have the same general form. Priors which have the same form as the posteriors are called conjugate priors.
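The conjugate update and its summaries fit in a few lines (a sketch; the Beta(1, 1) check reproduces the earlier Beta(4, 8) result):

```python
def beta_binomial_update(a, b, x, n):
    """Prior Beta(a, b) plus x heads in n trials -> posterior Beta(x+a, n-x+b)."""
    post_a, post_b = x + a, n - x + b
    mean = post_a / (post_a + post_b)             # (x+a)/(n+a+b)
    mode = (post_a - 1) / (post_a + post_b - 2)   # (x+a-1)/(n+a+b-2)
    return post_a, post_b, mean, mode

# Uniform prior Beta(1, 1) with the slide's data x = 3, n = 10:
print(beta_binomial_update(1, 1, 3, 10))  # (4, 8, 0.333..., 0.3)
```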
Data example: the maternal condition placenta previa, an unusual condition of pregnancy in which the placenta is implanted very low in the uterus, preventing normal delivery. Is this related to the sex of the baby? The proportion of female births in the general population is 0.485. An early study in Germany found that in 980 placenta previa births, 437 were female (0.4459). Test H0: p = 0.485 versus Ha: p < 0.485.
Placenta previa births. Assume a uniform prior, Beta(1, 1); the posterior is then Beta(438, 544). Posterior summaries: Mean = 0.446, Standard deviation = 0.016. 95% credible interval: [ qbeta(.025,438,544), qbeta(.975,438,544) ] = [ .415, .477 ].
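The posterior mean and standard deviation follow from the standard Beta(a, b) moment formulas (a quick check of the slide's numbers):

```python
import math

a, b = 438, 544   # posterior for the placenta previa data
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(round(mean, 3), round(sd, 3))  # 0.446 0.016
```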
Sensitivity to the prior. Suppose we took a prior more concentrated about the null-hypothesis value, e.g. prior ~ Normal(.485, .01). The posterior is then proportional to

p^437 (1 - p)^543 e^( -(p - .485)^2 / (2 (.01)^2) )

The constant of integration is about 10^(-294), so the mean, summary statistics, credible intervals, etc., require numerical methods. See the S script: http://www.people.carleton.edu/~rdobrow/courses/275w05/Scripts/Bayes.ssc