Frequentist versus Bayesian. Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, ...


Glen Cowan Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester

The Bayesian approach

In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ.

Interpret probability of θ as ‘degree of belief’ (subjective).

Need to start with ‘prior pdf’ π(θ); this reflects degree of belief about θ before doing the experiment.

Our experiment has data x, → likelihood function L(x|θ).

Bayes’ theorem tells how our beliefs should be updated in light of the data x:

p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ′) π(θ′) dθ′

Posterior pdf p(θ|x) contains all our knowledge about θ.
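The update described on this slide can be tried numerically. The following is a minimal sketch, assuming a single Gaussian measurement with known resolution and a parameter grid; the measured value, the resolution and both priors are hypothetical numbers chosen for illustration, not taken from the talk.

```python
import math

def gaussian_likelihood(x, theta, sigma):
    """L(x|theta) for one Gaussian measurement x with known resolution sigma."""
    return math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_on_grid(x, sigma, prior, lo, hi, n_points=2000):
    """Return (grid, posterior, step) with posterior = L * prior / normalisation."""
    step = (hi - lo) / n_points
    grid = [lo + (i + 0.5) * step for i in range(n_points)]  # bin midpoints
    unnorm = [gaussian_likelihood(x, t, sigma) * prior(t) for t in grid]
    norm = sum(unnorm) * step  # approximates the integral in the denominator
    return grid, [u / norm for u in unnorm], step

def flat_prior(theta):
    return 1.0  # improper on the real line, harmless on a finite grid

def informative_prior(theta):
    # Hypothetical prior belief: theta near 0 with width 0.5
    return math.exp(-0.5 * (theta / 0.5) ** 2)

# Hypothetical measurement: x = 1.0 with sigma = 0.5
grid, post_flat, step = posterior_on_grid(1.0, 0.5, flat_prior, -4.0, 6.0)
_, post_inf, _ = posterior_on_grid(1.0, 0.5, informative_prior, -4.0, 6.0)

mean_flat = sum(t * p for t, p in zip(grid, post_flat)) * step
mean_inf = sum(t * p for t, p in zip(grid, post_inf)) * step
print("posterior mean, flat prior:       ", mean_flat)
print("posterior mean, informative prior:", mean_inf)
```

With the flat prior the posterior simply follows the likelihood; with the informative prior the posterior mean is pulled toward the prior, which is the sense in which the prior carries information.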

Case #4: Bayesian method

We need to associate prior probabilities with θ0 and θ1, e.g., a prior for one parameter based on a previous measurement, and a prior for the other that reflects ‘prior ignorance’ and is in any case much broader than the likelihood.

Putting this into Bayes’ theorem gives:

posterior ∝ likelihood × prior

Bayesian method (continued)

We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x):

p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1

In this example we can do the integral in closed form (rare).

Ability to marginalize over nuisance parameters is an important feature of Bayesian statistics.
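When the integral cannot be done analytically, the marginalization step can be carried out on a grid. The sketch below assumes a toy model that is not the one from the talk: a measurement x of θ0 + θ1 with Gaussian resolution, a Gaussian prior on the nuisance parameter θ1 and a flat prior on θ0. All numbers are hypothetical.

```python
import math

X_OBS, SIGMA, SIGMA1 = 1.2, 1.0, 0.5  # hypothetical measurement and widths

def joint_posterior_unnorm(theta0, theta1):
    """Unnormalised p(theta0, theta1 | x) = likelihood * prior."""
    likelihood = math.exp(-0.5 * ((X_OBS - theta0 - theta1) / SIGMA) ** 2)
    prior_theta1 = math.exp(-0.5 * (theta1 / SIGMA1) ** 2)
    return likelihood * prior_theta1  # flat prior on theta0 is a constant factor

def marginal_theta0(lo0=-10.0, hi0=12.0, lo1=-5.0, hi1=5.0, step=0.05):
    """Integrate the nuisance parameter theta1 out of the joint posterior."""
    n0 = round((hi0 - lo0) / step)
    n1 = round((hi1 - lo1) / step)
    grid0 = [lo0 + (i + 0.5) * step for i in range(n0)]
    grid1 = [lo1 + (j + 0.5) * step for j in range(n1)]
    marg = [sum(joint_posterior_unnorm(t0, t1) for t1 in grid1) * step
            for t0 in grid0]
    norm = sum(marg) * step
    return grid0, [m / norm for m in marg], step

grid0, p_theta0, step = marginal_theta0()
mean0 = sum(t * p for t, p in zip(grid0, p_theta0)) * step
var0 = sum((t - mean0) ** 2 * p for t, p in zip(grid0, p_theta0)) * step
print("marginal mean:", mean0, "marginal variance:", var0)
```

For this toy model the marginal is Gaussian with variance SIGMA² + SIGMA1², so the uncertainty on the nuisance parameter is propagated into the result for θ0.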

Bayesian Statistics at work: The Troublesome Extraction of the angle α

Stéphane T’JAMPENS

LAPP (CNRS/IN2P3 & Université de Savoie)

J. Charles, A. Höcker, H. Lacker, F.R. Le Diberder, S. T’Jampens, hep-ph/0607246

Digression: Statistics

Statistics tries to answer a wide variety of questions → two main (and different!) frameworks:

Frequentist: probability about the data (randomness of measurements), given the model

P(data|model)

Hypothesis testing: given a model, assess the consistency of the data with a particular parameter value → 1-CL curve (by varying the parameter value)

[only repeatable events (Sampling Theory)]

Bayesian: probability about the model (degree of belief), given the data

P(model|data) ∝ Likelihood(data, model) × Prior(model)

D.R. Cox, Principles of Statistical Inference, CUP (2006)

W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971)

www.phystat.org

Bayesian Statistics in 1 slide

The Bayesian approach is based on the use of inverse probability (“posterior”):

Bayesian: probability about the model (degree of belief), given the data

P(model|data) ∝ Likelihood(data; model) × Prior(model)   (Bayes’ rule)

“it treats information derived from data (“likelihood”) as on exactly equal footing with probabilities derived from vague and unspecified sources (“prior”). The assumption that all aspects of uncertainties are directly comparable is often unacceptable.”

“nothing guarantees that my uncertainty assessment is any good for you - I'm just expressing an opinion (degree of belief). To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist.” (e.g., MC simulations)

Cox – Principles of Statistical Inference (2006)

Uniform prior: model of ignorance?

A central problem: specifying a prior distribution for a parameter about which nothing is known → flat prior

Problems:

Not re-parametrization invariant (metric dependent): uniform in θ is not uniform in z = cos θ

Favors large values too much [the prior probability for the range 0.1 to 1 is 10 times less than for 1 to 10]

Flat priors in several dimensions may produce clearly unacceptable answers.

In simple problems, appropriate* flat priors yield essentially the same answer as non-Bayesian sampling theory. However, in other situations, particularly those involving more than two parameters, ignorance priors lead to different and entirely unacceptable answers.

* (uniform prior for scalar location parameter, Jeffreys’ prior for scalar scale parameter)

Cox – Principles of Statistical Inference (2006)
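The two complaints about flat priors on this slide can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the flat prior is taken over the hypothetical range [0, 10], the angle is called theta and taken uniform on [0, π], and the cut |z| > 0.9 is an arbitrary choice for illustration.

```python
import math
import random

def flat_prior_mass(lo, hi, total_lo=0.0, total_hi=10.0):
    """Prior probability of [lo, hi] under a prior uniform on [total_lo, total_hi]."""
    return (hi - lo) / (total_hi - total_lo)

# Range 1 to 10 versus range 0.1 to 1 under the flat prior
ratio = flat_prior_mass(1.0, 10.0) / flat_prior_mass(0.1, 1.0)

# A prior uniform in theta is not uniform in z = cos(theta)
rng = random.Random(12345)
n_samples = 200_000
z_samples = [math.cos(rng.uniform(0.0, math.pi)) for _ in range(n_samples)]
frac_edge = sum(1 for z in z_samples if abs(z) > 0.9) / n_samples

exact_edge = 2.0 * math.acos(0.9) / math.pi  # P(|z| > 0.9) for uniform theta
uniform_z_edge = 0.2 / 2.0                   # same region if z were uniform on [-1, 1]

print("mass ratio (1..10) / (0.1..1):", ratio)
print("P(|z| > 0.9), uniform in theta:", frac_edge, "exact:", exact_edge)
print("P(|z| > 0.9), uniform in z:    ", uniform_z_edge)
```

The same statement of "ignorance" therefore assigns noticeably different probabilities to the same region, depending on which variable is declared flat.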

Uniform Prior in Multidimensional Parameter Space

Hypersphere:

One knows nothing about the individual Cartesian coordinates x, y, z, …

What do we know about the radius r = √(x² + y² + …)?

(Figure: 6D space)

One has achieved the remarkable feat of learning something about the radius of the hypersphere, whereas one knew nothing about the Cartesian coordinates and without making any experiment.
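The hypersphere argument can be reproduced with a small simulation. The sketch below assumes each Cartesian coordinate has a flat prior on [-1, 1] (the range is a hypothetical choice) and looks at the prior this induces on the radius in six dimensions.

```python
import math
import random

def fraction_inside_radius(r, dim):
    """Prior probability that the radius is below r (valid for r <= 1) when each
    of the dim coordinates is uniform on [-1, 1]: ball volume / cube volume."""
    ball_volume = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * r ** dim
    return ball_volume / 2.0 ** dim

def sample_radii(dim, n_samples, seed=2005):
    """Draw points with independent uniform coordinates and return their radii."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim)))
            for _ in range(n_samples)]

radii = sample_radii(dim=6, n_samples=50_000)
mean_r = sum(radii) / len(radii)
small_r_prob = fraction_inside_radius(0.5, dim=6)

print("mean radius in 6D:", mean_r)
print("prior probability of r < 0.5:", small_r_prob)
```

Although each coordinate is individually "unknown", the induced prior on r is sharply peaked away from zero, so small radii are effectively excluded before any data are taken.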

Isospin Analysis: B→hh

J. Charles et al. – hep-ph/0607246

Gronau/London (1990)

MA: Modulus & Argument
RI: Real & Imaginary

Improper posterior

Isospin Analysis: removing information from B0→π0π0

No model-independent constraint on α can be inferred in this case

Information is extracted on α, which is introduced by the priors (where else?)

Conclusion

Statistics is not a science, it is mathematics (Nature will not decide for us) [You will not learn it in Physics books → go to the professional literature!]

Many attempts to define an “ignorance” prior to “let the data speak by themselves”, but none convincing. Priors are informative.

Quite generally a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters.

In a multiparameter space, credible Bayesian intervals generally under-cover.

If the problem has some invariance properties, then the prior should have the corresponding structure. → Specification of priors is fraught with pitfalls (especially in high dimensions).

Examine the consequences of your assumptions (metric, priors, etc.).
Check for robustness: vary your assumptions.
Exploring the frequentist properties of the result should be strongly encouraged.

PHYSTAT Conferences:

http://www.phystat.org

α[ππ] : B-factories status LP07

Isospin analysis: reminder

• Neglecting EW penguins, the amplitudes of the SU(2)-related B→ππ modes are:

√2 A+0 = √2 A(Bu → π+π0) = e^(-iα) (T+- + T00)      √2 Ā+0 = e^(+iα) (T+- + T00)

A+- = A(Bd → π+π-) = e^(-iα) T+- + P+-      Ā+- = e^(+iα) T+- + P+-

√2 A00 = √2 A(Bd → π0π0) = e^(-iα) T00 - P+-      √2 Ā00 = e^(+iα) T00 - P+-

• SU(2) triangular relation: A+0 = A+-/√2 + A00

• Same for B→ρρ decays dominated by longitudinally polarized ρ (CP-even final state)

• B+0 → |A+0| = |Ā+0|

• B+-, C+- → |A+-|, |Ā+-|

• B00, C00 → |A00|, |Ā00|

• S+- → sin(2αeff) → 2-fold αeff in [0,π]

• S00 → relative phase between A00 and Ā00

Closing the SU(2) triangles → 8-fold α

(Figure: the B and Bbar triangles in the complex plane, axes Re and Im, with sides A+0, A+-/√2 and A00 and their conjugates; angle labels ΔΦ = 2α and ΔΦ = 2αeff.)
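The amplitude decomposition and the SU(2) triangular relation on this slide can be checked numerically. This is a minimal sketch: the function name and the values of α, T+-, T00 and P+- are hypothetical and carry no physical meaning, and the second amplitude of each pair is taken to be the CP-conjugate one (the one with e^(+iα)).

```python
import cmath
import math

def pipi_amplitudes(alpha, T_pm, T_00, P_pm):
    """Amplitudes from the decomposition above (EW penguins neglected).
    Returns (A, Abar), dicts keyed by '+0', '+-' and '00'."""
    phase, phase_bar = cmath.exp(-1j * alpha), cmath.exp(+1j * alpha)
    s2 = math.sqrt(2.0)
    A = {"+0": phase * (T_pm + T_00) / s2,
         "+-": phase * T_pm + P_pm,
         "00": (phase * T_00 - P_pm) / s2}
    Abar = {"+0": phase_bar * (T_pm + T_00) / s2,
            "+-": phase_bar * T_pm + P_pm,
            "00": (phase_bar * T_00 - P_pm) / s2}
    return A, Abar

# Hypothetical inputs: alpha = 1.0 rad, arbitrary complex tree and penguin terms
A, Abar = pipi_amplitudes(1.0, T_pm=1.0 + 0.0j, T_00=0.3 - 0.2j, P_pm=0.2 + 0.25j)

# Both triangles close: A+0 = A+-/sqrt(2) + A00, and likewise for the conjugates
closure = abs(A["+0"] - (A["+-"] / math.sqrt(2.0) + A["00"]))
closure_bar = abs(Abar["+0"] - (Abar["+-"] / math.sqrt(2.0) + Abar["00"]))

# Relative phases between conjugate amplitudes
delta_phi = cmath.phase(Abar["+0"] / A["+0"])      # equals 2*alpha (no penguin in +0)
delta_phi_eff = cmath.phase(Abar["+-"] / A["+-"])  # shifted away from 2*alpha by P+-

print("triangle closure:", closure, closure_bar)
print("phase(+0):", delta_phi, " phase(+-):", delta_phi_eff)
```

The +0 mode has no penguin term, so its two amplitudes have equal modulus and a relative phase of exactly 2α, while the penguin contribution shifts the phase seen in the +- mode.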

Isospin analysis: reminder

• sin(2αeff) from B → (π/ρ)+ (π/ρ)- → 2 solutions for αeff in [0,π]

• Δα = α - αeff from SU(2) B/Bbar triangles → 1, 2 or 4 solutions for Δα (depending on triangle closure)

→ 2, 4 or 8 solutions for α = αeff + Δα

(Figure: B and Bbar triangles with normalized sides A00/A+0 and A+-/√2/A+0. Panel labels: PiPi, RhoRho; “C00 but no S00”, “no C00/S00”, “C00 AND S00”; 4-fold Δα, 2-fold Δα, 1-fold Δα (‘plateau’), 1-fold Δα (peak).)

Developments in Bayesian Priors

Roger Barlow

Manchester IoP meeting

November 16th 2005

Plan

• Probability
  – Frequentist
  – Bayesian
• Bayes Theorem
  – Priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
  – Fisher Information
• Reference Priors: Demortier

Probability

Probability as limit of frequency

P(A) = limit of NA/Ntotal as Ntotal → ∞

Usual definition taught to students

Makes sense

Works well most of the time, but not all
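The frequency definition can be illustrated with a simulated repeatable experiment. A minimal sketch, assuming a fair die and taking A to be "the roll shows a six" (a hypothetical choice of event):

```python
import random

def relative_frequency(n_trials, seed=1):
    """Fraction N_A / N_total of simulated fair-die rolls that show a six."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return hits / n_trials

for n_trials in (100, 10_000, 200_000):
    print(n_trials, relative_frequency(n_trials))
```

The ratio settles toward 1/6 as the number of trials grows. The definition offers no such procedure for one-off statements such as the value of a physical constant, which is where it stops working.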

Frequentist probability

“It will probably rain tomorrow.”

“Mt=174.3±5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.”

“The statement ‘It will rain tomorrow.’ is probably true.”

“Mt=174.3±5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”

Bayesian Probability

P(A) expresses my belief that A is true

Limits 0 (impossible) and 1 (certain)

Calibrated off clear-cut instances (coins, dice, urns)

Frequentist versus Bayesian?

Two sorts of probability – totally different. (Bayesian probability also known as Inverse Probability.)

Rivals? Religious differences? Particle Physicists tend to be frequentists. Cosmologists tend to be Bayesians.

No. Two different tools for practitioners.

Important to:
• Be aware of the limits and pitfalls of both
• Always be aware which you’re using

Bayes Theorem (1763)

P(A|B) P(B) = P(A and B) = P(B|A) P(A)

P(A|B) = P(B|A) P(A) / P(B)

Frequentist use, e.g. Čerenkov counter:

P(π | signal) = P(signal | π) P(π) / P(signal)

Bayesian use:

P(theory | data) = P(data | theory) P(theory) / P(data)
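The frequentist use of Bayes' theorem is plain arithmetic on measurable frequencies. The sketch below takes the identified particle to be a pion and uses hypothetical numbers for the counter efficiency, the fake rate and the beam composition; none of them come from the talk.

```python
def posterior_probability(p_signal_given_pi, p_signal_given_other, p_pi):
    """P(pi | signal) = P(signal | pi) P(pi) / P(signal), with P(signal)
    expanded over the two cases 'pion' and 'not a pion'."""
    p_signal = p_signal_given_pi * p_pi + p_signal_given_other * (1.0 - p_pi)
    return p_signal_given_pi * p_pi / p_signal

# Hypothetical numbers: 95% efficiency for pions, 5% fake rate for other
# particles, 90% of the particles in the beam are pions
p_pi_given_signal = posterior_probability(0.95, 0.05, 0.90)
print("P(pi | signal) =", p_pi_given_signal)
```

Every input here is a frequency that could be measured in a test beam, which is why this use is uncontroversial. The Bayesian use applies the same formula with P(theory), which is not a frequency.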

Bayesian Prior

P(theory) is the Prior

Expresses prior belief that the theory is true

Can be a function of a parameter:

P(Mtop), P(MH), P(α,β,γ)

Bayes’ Theorem describes the way prior belief is modified by experimental data

But what do you take as initial prior?

Uniform Prior

General usage: choose P(a) uniform in a (principle of insufficient reason)

Often ‘improper’: ∫P(a)da = ∞, though the posterior P(a|x) comes out sensible

BUT! If P(a) is uniform, P(a²), P(ln a), P(√a), ... are not

Insufficient reason not valid (unless a is ‘most fundamental’ – whatever that means)

Statisticians handle this: check results for ‘robustness’ under different priors
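A robustness check of this kind is easy to carry out numerically. The following is a minimal sketch, assuming a Poisson counting experiment with rate a and 3 observed events (both the model and the count are hypothetical, not from the slide). Each prior that is "uniform in some function of a" is written as the corresponding density in a.

```python
import math

def posterior_mean(n_obs, prior, a_max=60.0, step=1e-3):
    """Posterior mean of a Poisson rate a for n_obs observed events, by
    midpoint-rule integration of likelihood * prior over (0, a_max)."""
    num = den = 0.0
    for i in range(round(a_max / step)):
        a = (i + 0.5) * step
        weight = a ** n_obs * math.exp(-a) * prior(a)  # the 1/n! cancels in the ratio
        num += a * weight
        den += weight
    return num / den

# Density in a implied by a prior that is flat in a, a^2, ln a or sqrt(a)
priors = {
    "uniform in a": lambda a: 1.0,
    "uniform in a^2": lambda a: a,
    "uniform in ln a": lambda a: 1.0 / a,
    "uniform in sqrt(a)": lambda a: 1.0 / math.sqrt(a),
}

means = {name: posterior_mean(3, prior) for name, prior in priors.items()}
for name, value in means.items():
    print(f"{name:20s} posterior mean = {value:.3f}")
```

With only three events the four "uninformative" choices give visibly different posterior means, so the result is not robust. With many more events the likelihood dominates and the spread becomes small relative to the width of the posterior.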