
Page 1:

GEOGG121: Bayes’ Theorem & Monte Carlo
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592
Email: [email protected]
www.geog.ucl.ac.uk/~mdisney

Page 2:

• Intro to Bayes’ Theorem
  – Science and scientific thinking
  – Probability & Bayes’ Theorem – why is it important?
  – Frequentist v Bayesian
  – Background, rationale
  – Methods
  – Advantages / disadvantages

• Applications:
  – parameter estimation, uncertainty
  – Practical – basic Bayesian estimation

Lecture outline

Page 3:

Reading and browsing: Bayesian methods, data analysis
• Gauch, H., 2002, Scientific Method in Practice, CUP.
• Sivia, D. S., with Skilling, J. (2008) Data Analysis, 2nd ed., OUP, Oxford.

• Shih and Kochanski (2006) Bayes’ Theorem teaching notes – a very nice short intro to Bayes’ Theorem: http://kochanski.org/gpk/teaching/0401Oxford/Bayes.pdf

Computational
• Press et al. (1992) Numerical Recipes in C, 2nd ed. – see http://apps.nrbook.com/c/index.html
• Flake, W. G. (2000) Computational Beauty of Nature, MIT Press.
• Gershenfeld, N. (2002) The Nature of Mathematical Modelling, CUP.
• Wainwright, J. and Mulligan, M. (2004) (eds) Environmental Modelling: Finding Simplicity in Complexity, John Wiley and Sons.

Page 4:

Reading and browsing: Papers, articles, links

Reproducibility, problems with P-values etc.
• Ioannidis, J. P. A. (2005) Why most published research findings are false, PLoS Medicine, 0101-0106.
• Siegfried, T. (2010) “Odds are it’s wrong”, Science News, 107(7), http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are,_Its_Wrong
• 5 ways to fix statistics (2017) https://www.nature.com/articles/d41586-017-07522-z?utm_source=TWT_NatureNews&sf174718406=1

Bayes
• Hill, R. (2004) Multiple sudden infant deaths – coincidence or beyond coincidence, Pediatric and Perinatal Epidemiology, 18, 320-326 (http://www.cse.salford.ac.uk/staff/RHill/ppe_5601.pdf)
• http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
• http://yudkowsky.net/rational/bayes
• http://kochanski.org/gpk/teaching/0401Oxford/Bayes.pdf
• The false positive risk calculator: http://fpr-calc.ucl.ac.uk/

Page 5:

• Carry out experiments?
• Collect observations?
• Test hypotheses (models)?
• Generate “understanding”?
• Objective knowledge??
• Induction? Deduction?

So how do we do science?

Page 6:

• Deduction
  – Inference, by reasoning, from general to particular
  – E.g. premises: i) every mammal has a heart; ii) every horse is a mammal.
  – Conclusion: every horse has a heart.
  – Valid if the truth of the premises guarantees the truth of the conclusion; invalid otherwise.
  – Conclusion is either true or false

Induction and deduction

Page 7:

• Induction
  – Process of inferring general principles from observation of particular cases
  – E.g. premise: every horse that has ever been observed has a heart
  – Conclusion: every horse has a heart.
  – Conclusion goes beyond information present, even implicitly, in the premises
  – Conclusions have a degree of strength (weak -> near certain).

Induction and deduction

Page 8:

Induction and deduction

• Example from Gauch (2003: 219) which we will return to:
  – Q1: Given a fair coin (P(H) = 0.5), what is the P that 100 tosses will produce 45 heads and 55 tails?
  – Q2: Given that 100 tosses yield 45 heads and 55 tails, what is the P that it is a fair coin?
• Q1 is deductive: definitive answer – probability
• Q2 is inductive: no definitive answer – statistics
  – Oh dear: this is what we usually get in science
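Q1 really is definitive: it can be computed exactly from the binomial distribution. A minimal sketch (assuming scipy is available; not part of the original slides):

from scipy.stats import binom

# Q1: P(45 heads in 100 tosses | fair coin) – deductive, one exact answer
p_q1 = binom.pmf(45, n=100, p=0.5)
print(f"P(45 H, 55 T | fair coin) = {p_q1:.4f}")  # ~0.0485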

Page 9:

• Informally, the Bayesian Q is:
  – “What is the probability (P) that a hypothesis (H) is true, given the data and any prior knowledge?”
  – Weighs different hypotheses (models) in the light of the data
• The frequentist Q is:
  – “How reliable is an inference procedure, by virtue of not rejecting a true hypothesis or accepting a false hypothesis?”
  – Weighs procedures (different sets of data) in the light of a hypothesis

Bayes: see Gauch (2003) ch 5

Page 10:

• To Bayes, Laplace, Bernoulli…:
  – P represents a ‘degree-of-belief’ or plausibility
  – i.e. degree of truth, based on the evidence at hand
• BUT this appears to be subjective, so P was redefined (Fisher, Neyman, Pearson etc.):
  – P is the ‘long-run relative frequency’ with which an event occurs, given (infinite) repeated expts.
  – We can measure frequencies, so P is now an objective tool for dealing with random phenomena
• BUT we do NOT have infinite repeated expts…?

Probability? see S&S (2006) p9

Page 11:

• The “chief rule involved in the process of learning from experience” (Jefferys, 1983)

• Formally:

• P(H|D) = Posterior, i.e. probability of hypothesis (model) H being true, given data D

• P(D|H) = Likelihood, i.e. probability of data D being observed if H is true

• P(H) = Prior, i.e. probability of hypothesis being true before measurement of D

Bayes’ Theorem

P(H|D) ∝ P(D|H) × P(H)

Page 12:

• Prior?
  – What is known beyond the particular experiment at hand, which may be substantial or negligible
• We all have priors: assumptions, experience, other pieces of evidence
• The Bayes approach explicitly requires you to assign a probability to your prior (somehow)
• Bayesian view
  – probability as degree of belief rather than a frequency of occurrence (in the long run…)

Bayes: see Gauch (2003) ch 5

Page 13:

• Importance? P(H|D) appears on the left of BT
• i.e. BT solves the inverse (inductive) problem – the probability of a hypothesis given some data
• This is how we do science in practice
• We don’t have access to infinite repetitions of expts (the ‘long run frequency’ view)

Bayes’ Theorem

Page 14:

• I is background (or conditioning) information, as there is ‘no such thing as absolute probability’ (see S & S p 5)
• P(rain today) will depend on clouds this morning, whether we saw the forecast etc. etc. – I is usually left out but ….
• Power of Bayes’ Theorem
  – Relates the quantity of interest, i.e. the P of H being true given D, to that which we might estimate in practice, i.e. the P of observing D, given H is correct

Bayes’ Theorem

P(Hypoth.|Data, I) ∝ P(Data|Hypoth., I) × P(Hypoth.|I)

Page 15:

• To go from ∝ to = we need to divide by P(D|I)
• Where P(D|I) is known as the ‘Evidence’
• A normalisation constant, which can be left out for parameter estimation as it is independent of H
• But it is required in model selection for e.g., where the amount of data may be critical

Bayes’ Theorem & marginalisation

P(H|D, I) = P(D|H, I) × P(H|I) / P(D|I)

Page 16:

• Suppose a drug test is 99% accurate for true positives, and 99% accurate for true negatives, and that 0.5% of the population use the drug.

• What is the probability that someone who tests positive is in fact a user i.e. what is P(User|+ve)?

• So

• P(D) on the bottom, the evidence, is the sum over all possible models (2 in this case) in the light of the data we observe

Bayes’ Theorem: example

http://kochanski.org/gpk/teaching/0401Oxford/Bayes.pdf
http://en.wikipedia.org/wiki/Bayes'_theorem

P(User|+ve) = P(+ve|User) × P(User) / [ P(+ve|User) P(User) + P(+ve|Non-user) P(Non-user) ]

= (0.99 × 0.005) / (0.99 × 0.005 + 0.01 × 0.995) = 0.332

(in the denominator, 0.99 × 0.005 is the true +ve term and 0.01 × 0.995 the false +ve term)
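A quick numerical check of this example (a sketch; the numbers are from the slide):

# 99% accurate for true positives and for true negatives; 0.5% prevalence
p_pos_given_user = 0.99
p_pos_given_nonuser = 0.01
p_user = 0.005

# P(D): the evidence, summing over both models (User, Non-user)
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)
print(f"P(User|+ve) = {p_pos_given_user * p_user / p_pos:.3f}")  # 0.332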

Page 17:

• So, for a +ve test, P(User) is only 33%, i.e. there is a 67% chance they are NOT a user
• This is NOT an effective test – why not?
• Number of non-users v. large compared to users (99.5% to 0.5%)
• So false positives (0.01 × 0.995 = 0.995%) >> true positives (0.99 × 0.005 = 0.495%)
• Twice the rate (67% to 33%)
• So need to be very careful when considering large numbers / small results
• See Sally Clark example at end….

Bayes’ Theorem: example

http://kochanski.org/gpk/teaching/0401Oxford/Bayes.pdf
http://en.wikipedia.org/wiki/Bayes'_theorem

Page 18:

• Laplace (1749-1827) estimated M_Saturn from orbital data
• i.e. the posterior prob(M|{data},I), where I is background knowledge of orbital mechanics etc.
• Shaded area under the posterior pdf shows the degree of belief that m1 ≤ M_Saturn < m2 (he was right to within < 0.7%)
• How do we interpret this pdf in terms of frequencies?
  – Some ensemble of universes all constant other than M_Saturn? A distribution of M_Saturn in repeated experiments?
  – But the data consist of orbital periods, and these multiple expts. didn’t happen

E.g. Laplace and the mass of Saturn

[Figure: posterior pdf for M_Saturn – the peak gives the best estimate of M, the width the degree of certainty of M]

The posterior pdf expresses ALL our best understanding of the problem

Page 19:

• H? HT? HTTTTHTHHTT?? What do we mean by fair?
• Consider a range of contiguous propositions (hypotheses) about the range in which the coin bias-weighting H might lie
• If H = 0, double tail; if H = 1, double head; H = 0.5 is fair
• E.g. 0.0 ≤ H1 < 0.01; 0.01 ≤ H2 < 0.02; 0.02 ≤ H3 < 0.03 etc.

Example: is this a fair coin?

Heads I win, tails you lose?

Page 20:

• If we assign high P to a given H (or range of Hs), relative to all others, we are confident of our estimate of ‘fairness’
• If all H are equally likely, then we are ignorant
• This is summarised by the conditional (posterior) pdf prob(H|{data},I)
• So, we need the prior prob(H|I) – if we know nothing, let’s use a flat (uniform) prior, i.e.

Example: is this a fair coin?

prob(H|I) = 1 for 0 ≤ H ≤ 1; 0 otherwise

P(Hypoth.|Data, I) ∝ P(Data|Hypoth., I) × P(Hypoth.|I)

Page 21:

• Now we need the likelihood, i.e. prob({data}|H,I)
• Measure of the chance of obtaining the {data} we have actually observed if the bias-weighting H was known
• Assume that each toss is an independent event (part of I)
• Then prob(R heads in N tosses) is given by the binomial theorem, i.e.
  – H is the chance of a head and there are R of them; then there must be N-R tails (chance 1-H).

Example: is this a fair coin?

prob({data}|H, I) ∝ H^R (1-H)^(N-R)

P(Hypoth.|Data, I) ∝ P(Data|Hypoth., I) × P(Hypoth.|I)

Page 22:

• How does prob(H|{data},I) evolve?

Example: is this a fair coin?

prob({data}|H, I) ∝ H^R (1-H)^(N-R)

prob(H|I) = 1 for 0 ≤ H ≤ 1; 0 otherwise

HHTTTTTTTH
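A minimal numerical sketch of this update on a grid of H values, using the flat prior and binomial likelihood above and this toss sequence (R = 3 heads in N = 10 tosses):

import numpy as np

H = np.linspace(0, 1, 1001)           # grid of bias-weightings
R, N = 3, 10                          # HHTTTTTTTH: 3 heads in 10 tosses
prior = np.ones_like(H)               # flat prior: prob(H|I) = 1 on [0, 1]
posterior = prior * H**R * (1 - H)**(N - R)
posterior /= np.trapz(posterior, H)   # normalise so it integrates to 1
print(f"posterior peak at H = {H[np.argmax(posterior)]:.2f}")  # 0.30 = R/N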

Page 23:

• How does prob(H|{data},I) evolve?

Gaussian prior μ = 0.5, σ = 0.05

prob({data}|H, I) ∝ H^R (1-H)^(N-R)

prob(H|I) = 1 for 0 ≤ H ≤ 1; 0 otherwise

H0 (mean) not always at the peak, particularly when N is small

Page 24:

• The posterior pdf summarises our knowledge, based on {data} and the prior
  – Note: {data} in this case is actually np.random.binomial(N, p)
• Weak prior shifted easily
• Stronger Gaussian prior (rightly) requires a lot more data to be convinced
• See S & S for other priors….
• Bayes’ Theorem encapsulates the learning process

Summary

prob(H|I) = 1 for 0 ≤ H ≤ 1; 0 otherwise

P(Hypoth.|Data, I) ∝ P(Data|Hypoth., I) × P(Hypoth.|I)

prob(H|I) ∝ e^(−(H−μ)² / 2σ²)
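A short sketch of the weak v strong prior point, comparing the flat prior with the Gaussian prior (μ = 0.5, σ = 0.05) as data accumulate; the use of np.random.binomial for {data} follows the slide, while the ‘true’ bias of 0.3 is an assumed value for the demo:

import numpy as np

rng = np.random.default_rng(42)
H = np.linspace(0, 1, 1001)
flat = np.ones_like(H)                           # flat prior from above
gauss = np.exp(-(H - 0.5)**2 / (2 * 0.05**2))    # Gaussian prior from above

for N in (10, 100, 1000):
    R = rng.binomial(N, 0.3)                     # {data}: R heads in N tosses
    like = H**R * (1 - H)**(N - R)
    for name, prior in (("flat", flat), ("Gaussian", gauss)):
        post = prior * like
        print(f"N={N:4d}  {name:8s} prior: peak at H = {H[np.argmax(post)]:.3f}")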

Page 25:

• Takes a lot of coin tosses to estimate H to within 0.2-0.3
• If we toss 10 times and get 10 T, this might be strong evidence for bias
• But if we toss 100 times and get 45 H 55 T, the difference is still 10 BUT much more uncertain
• Gaussian prior: although prob(H=0.5|I) ~ 250000 × prob(H=0.25|I), 1000 tosses gets the posterior to within 0.02
• HOW TO SAMPLE A BAYESIAN POSTERIOR DIST??
• MONTE CARLO! (Markov Chain, Metropolis-Hastings)

Summary

P(Hypoth.|Data, I) ∝ P(Data|Hypoth., I) × P(Hypoth.|I)

Page 26:


The tragic case of Sally Clark: how Bayesian odds ratios would have avoided ….

• Two cot-deaths (SIDS), 1 year apart, aged 11 weeks and 8 weeks. Mother Sally Clark charged with double murder, tried and convicted in 1999
  – Statistical evidence was misunderstood, “expert” testimony was wrong, and a fundamental logical fallacy was introduced
• What happened?
• We can use Bayes’ Theorem to decide between 2 hypotheses
  – H1 = Sally Clark committed double murder
  – H2 = Two children DID die of SIDS

• http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

• http://yudkowsky.net/rational/bayes

Page 27:


The tragic case of Sally Clark

• Data? We observe there are 2 dead children
• We need to decide which of H1 or H2 is more plausible, given D (and prior expectations)
• i.e. we want the ratio P(H1|D) / P(H2|D), i.e. the odds of H1 being true compared to H2, GIVEN the data and prior

P(H1|D) / P(H2|D) = [ P(D|H1) / P(D|H2) ] × [ P(H1) / P(H2) ]

• Left-hand side: prob. of H1 or H2 given data D
• P(D|H1) / P(D|H2): likelihoods, i.e. prob. of getting data D IF H1 is true, or if H2 is true
• P(H1) / P(H2): very important – the PRIOR probability, i.e. previous best guess

Page 28:


The tragic case of Sally Clark
• ERROR 1: events NOT independent
• P(1 child dying of SIDS)? ~ 1:1300, but for an affluent non-smoking family, mother > 26 yrs, ~ 1:8500.
• Prof. Sir Roy Meadows (expert witness)
  – P(2 deaths)? 1:8500 × 8500 ~ 1:73 million.
  – This was KEY to her conviction & is demonstrably wrong
  – ~650000 births a year in the UK, so at 1:73M a double cot death is a 1 in 100 year event. BUT 1 or 2 occur every year – how come?? No one checked …
  – NOT independent: P(2nd death | 1st death) is 5-10 × higher, i.e. 1:100 to 200, so P(H2) is actually ~ (1/1300) × (5/1300) ~ 1:300000

Page 29:


The tragic case of Sally Clark

• ERROR 2: “Prosecutor’s Fallacy”
  – 1:300000 is still VERY rare, so she’s unlikely to be innocent, right??
• Meadows’ “Law”: ‘one cot death is a tragedy, two cot deaths is suspicious and, until the contrary is proved, three cot deaths is murder’
  – WRONG: it is a fallacy to mistake the chance of a rare event for the chance that the defendant is innocent
• In large samples, even rare events occur quite frequently – someone wins the lottery (1:14M) nearly every week
• 650000 births a year, so expect 2-3 double cot deaths…..
• AND we are ignoring the rarity of double murder (H1)

Page 30:


The tragic case of Sally Clark
• ERROR 3: ignoring the odds of the alternative (also very rare)
  – Single child murder v. rare (~30 cases a year) BUT generally significant family/social problems, i.e. NOT like the Clarks.
  – P(1 murder) ~ 30:650000, i.e. 1:21700
  – Double murder MUCH rarer, BUT P(2nd | 1st murder) ~ 200 × more likely given the first, so P(H1|D) ~ (1/21700 × 200/21700) ~ 1:2.4M
• So, two very rare events, but double murder ~ 10 × rarer than double SIDS
• So P(H1|D) / P(H2|D)?
  – P(murder) : P(cot death) ~ 1:10, i.e. 10 × more likely to be double SIDS
  – Says nothing about guilt & innocence, just relative probability
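A sketch of the odds arithmetic with the slide’s round numbers (with these figures the prior odds come out at roughly 1:7, in line with the ~1:10 quoted):

# P(H1): double murder; P(H2): double SIDS (numbers from the slides)
p_sids = (1 / 1300) * (5 / 1300)           # ~1:300000; 2nd death ~5x more likely
p_murder = (30 / 650000) * (200 / 21700)   # ~1:2.4M; 2nd murder ~200x more likely

print(f"P(H2, double SIDS)   ~ 1:{1 / p_sids:,.0f}")
print(f"P(H1, double murder) ~ 1:{1 / p_murder:,.0f}")
print(f"prior odds H1:H2     ~ 1:{p_sids / p_murder:.0f}")  # ~1:7 with these round numbers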

Page 31:


The tragic case of Sally Clark
• Sally Clark acquitted in 2003 after a 2nd appeal (but not on the statistical fallacies), after 3 yrs in prison; died of alcohol poisoning in 2007
  – Meadows’ “Law” redux: triple murder v triple SIDS?
• In fact, P(triple murder | 2 previous) : P(triple SIDS | 2 previous) ~ ((21700 × 123) × 10) / ((1300 × 228) × 50) = 1.8:1
• So P(triple murder) > P(triple SIDS), but not by much
• Meadows’ ‘Law’ should be:
  – ‘when three sudden deaths have occurred in the same family, statistics give no strong indication one way or the other as to whether the deaths are more or less likely to be SIDS than homicides’

From: Hill, R. (2004) Multiple sudden infant deaths – coincidence or beyond coincidence, Pediatric and Perinatal Epidemiology, 18, 320-326 (http://www.cse.salford.ac.uk/staff/RHill/ppe_5601.pdf)

Page 32:

• Brute force method(s) for integration / parameter estimation / sampling
  – Powerful BUT essentially a last resort, as it involves random sampling of parameter space
  – Time consuming – more samples give a better approximation
  – Errors tend to reduce as 1/√N
    • N = 100 -> error down by 10; N = 1000000 -> error down by 1000
  – Fast computers can solve complex problems
• Applications:
  – Numerical integration (e.g. the radiative transfer eqn), Bayesian inference (posterior), computational physics, sensitivity analysis etc. etc.

Very brief intro to Monte Carlo

Numerical Recipes in C, ch. 7, p304: http://apps.nrbook.com/c/index.html
http://en.wikipedia.org/wiki/Monte_Carlo_method

http://en.wikipedia.org/wiki/Monte_Carlo_integration

Page 33:

• Pick N random points x1, x2, …, xN in a multidimensional volume V
• MC integration approximates the integral of a function f over the volume V as shown below, where ⟨f⟩ and ⟨f²⟩ are the sample means of f and f²
• The +/- term is the 1 SD error – it falls off as 1/√N

Basics: MC integration

From http://apps.nrbook.com/c/index.html

Choose random points in A; the integral is the fraction of points under the curve × A

∫ f dV ≈ V⟨f⟩ ± V √( (⟨f²⟩ − ⟨f⟩²) / N )

where ⟨f⟩ ≡ (1/N) Σᵢ f(xᵢ) and ⟨f²⟩ ≡ (1/N) Σᵢ f²(xᵢ), summing over i = 1 … N
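A minimal sketch of this estimator in practice; the integrand f(x, y) = xy over the unit square is an assumed example (exact answer 0.25):

import numpy as np

rng = np.random.default_rng(0)
N, V = 100_000, 1.0                  # N random points in V = [0,1]^2
x = rng.random((N, 2))
f = x[:, 0] * x[:, 1]                # f evaluated at each random point

estimate = V * f.mean()
error = V * np.sqrt((np.mean(f**2) - f.mean()**2) / N)   # the 1 SD error term
print(f"integral ~ {estimate:.4f} +/- {error:.4f} (exact 0.25)")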

Page 34:

• Why not choose a grid? Error falls as N⁻¹ (quadrature approach)
• BUT we need to choose the grid spacing. For random sampling we sample until we have a ‘good enough’ approximation
• Is there a middle ground? Pick points sort of at random, BUT in such a way as to fill space more quickly (avoid local clustering)?
• Yes – quasi-random sampling:
  – Space filling: i.e. “maximally avoiding of each other”

Basics: MC integration

Sobol method v pseudorandom: 1000 points. From: http://en.wikipedia.org/wiki/Low-discrepancy_sequence
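A sketch of the comparison, assuming scipy >= 1.7 for the scipy.stats.qmc module (not used in the original slides); discrepancy is a standard measure of how evenly points fill the space (lower = more even):

import numpy as np
from scipy.stats import qmc

n = 1024                                           # a power of 2 suits Sobol
pseudo = np.random.default_rng(0).random((n, 2))   # pseudorandom points
sobol = qmc.Sobol(d=2, scramble=False).random(n)   # quasi-random (low-discrepancy) points

print("pseudorandom discrepancy:", qmc.discrepancy(pseudo))
print("Sobol discrepancy:       ", qmc.discrepancy(sobol))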

Page 35:

• A simple example of MC methods in practice

MC approximation of Pi?

Page 36:

• A simple example of MC methods in practice
• In Python?

import numpy as np
a = np.random.rand(10, 2)            # 10 random points in the unit square
inside = np.sum(a * a, axis=1) < 1   # True where a point lies inside the quarter circle
# e.g. array([ True,  True, False, False,  True, False,  True, False,  True,  True])
print(4 * np.mean(inside))           # 4 x fraction inside ~ pi; e.g. 2.4 for these 10 points

MC approximation of Pi?
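The 10-point estimate above is rough; repeating with larger N shows the 1/√N improvement (a sketch):

import numpy as np

rng = np.random.default_rng(1)
for N in (100, 10_000, 1_000_000):
    pts = rng.random((N, 2))                        # N points in the unit square
    pi_est = 4 * np.mean(np.sum(pts**2, axis=1) < 1)
    print(f"N={N:>9,}  pi ~ {pi_est:.4f}")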

Page 37:

• Integration / parameter estimation / sampling
  – From the 80s: “It was rapidly realised that most Bayesian inference could be done by MCMC, whereas very little could be done without MCMC” (Geyer, 2010)
• Formally
  – MCMC methods sample from a probability distribution (e.g. a posterior) by constructing a Markov Chain with the desired distribution as its equilibrium distribution (which the chain tends to)
  – Markov Chain: a system of random transitions where the next state depends only on the current one, not on the preceding chain (i.e. no “memory” of how we got here)
• Many implementations of MCMC, including Metropolis-Hastings, the Gibbs Sampler etc.

Markov Chain Monte Carlo (MCMC)

From: http://homepages.inf.ed.ac.uk/imurray2/teaching/09mlss/slides.pdf
See also: http://www.mcmchandbook.net/HandbookChapter1.pdf

Page 38:

• Initialise: pick a state x at random
• Pick a new candidate state x′ at random. Accept based on the criterion below
• Where A is the acceptance distribution and g(x → x′) is the proposal distribution (the conditional prob. of proposing state x′, given x)
• Transition probability P of x -> x′
• If not accepted then x′ = x (no change), OR the state transits to x′
• Repeat N times, saving the new state x′ each time
• Repeat the whole process

A(x → x′) = min( 1, [ P(x′) g(x′ → x) ] / [ P(x) g(x → x′) ] )

P(x → x′) = g(x → x′) A(x → x′)

MCMC: Metropolis-Hastings

From: http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
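A minimal Metropolis-Hastings sketch (not from the slides), sampling the coin-bias posterior H^R (1-H)^(N-R) from earlier with a flat prior; the proposal g is a symmetric Gaussian, so the g terms in A cancel (plain Metropolis):

import numpy as np

def log_post(h, R=45, N=100):                 # 45 H, 55 T from the earlier example
    if h <= 0.0 or h >= 1.0:
        return -np.inf                        # flat prior is zero outside [0, 1]
    return R * np.log(h) + (N - R) * np.log(1.0 - h)

rng = np.random.default_rng(0)
x, samples = 0.5, []
for _ in range(50_000):
    x_new = x + rng.normal(0, 0.05)           # candidate state x' ~ g(x -> x')
    # accept with probability A = min(1, P(x')/P(x)), done in log space
    if np.log(rng.random()) < log_post(x_new) - log_post(x):
        x = x_new
    samples.append(x)                         # save x' if accepted, else x

chain = np.array(samples[5000:])              # discard burn-in
print(f"posterior mean H = {chain.mean():.3f} +/- {chain.std():.3f}")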

Page 39:

Don’t forget: Course feedback
• Short Bayes & MC practical (now)
• Thanks! And have a great Christmas and New Year

Page 40:

Revision: key topics, points
• Model inversion – why?
  – Forward model: model predicts system behaviour based on a given set of parameter values (system state vector): f(x)
  – BUT we usually want to observe the system and INFER parameter values
  – Inversion: f⁻¹(x) – estimate the parameter values (system state) that give rise to the observed values
  – Forward modelling useful for understanding the system, sensitivity analysis etc.
  – Inverse model allows us to estimate the system state

Page 41:

Revision: key topics, points
• Model inversion – how?
  – Linear: pros and cons?
    • Can be done using linear algebra (matrices); v. fast but …
  – Non-linear: pros and cons?
    • Many approaches, all based around minimising some cost function, e.g. RMSE – the difference between MODEL & OBS for a given parameter set
    • Iterative – based on getting to the minimum as quickly as possible OR as robustly as possible OR with the fewest function evaluations
    • Gradient descent (L-BFGS); simplex, Powell (no gradient needed); LUT (brute force); simulated annealing; genetic algorithms; artificial neural networks etc. etc.

Page 42:

Revision: key topics, points
• Analytical v Numerical
  – Analytical
    • Can write down equations for f⁻¹(x)
    • Can do fast
  – Numerical
    • No written expression for f⁻¹(x), or perhaps even f(x)
    • Need to approximate parts of it numerically
    • Hard to differentiate (for inversion, gradient descent)