Transcript of Bayesian Astronomy with R - California Institute of Technology

Page 1: Bayesian Astronomy with R - California Institute of Technology (web.ipac.caltech.edu/staff/fmasci/home/astro_refs/Bayes-AstroR-Lec1.pdf)

Bayesian Astronomy with R
Aaron Robotham, ICRAR, UWA

Lecture 1

Get this lecture at: https://dl.dropboxusercontent.com/u/10495241/AstroR-Bayes-Lec1.pdf

Page 2:

Bayesian Astronomy with R

> install.packages('magicaxis')
> install.packages('ellipse')
> install.packages('mvtnorm')
> install.packages('PATH/?/?/LaplacesDemon_13.07.16.tar.gz', repos=NULL, type='source')
> library('magicaxis')
> library('ellipse')
> library('mvtnorm')
> library('LaplacesDemon')

•  Get R at http://cran.rstudio.com/

•  Download Laplace's Demon from (not on CRAN): http://www.bayesian-inference.com/LaplacesDemon_13.07.16.tar.gz

•  Type 'R' on the terminal to begin.

•  Done that? Good, now you've got everything you need to recreate the data and plots in this lecture.

Aaron Robotham, ICRAR, UWA

Page 3:

Summary of lecture 0

•  R is free (as in lunch and speech).

•  R is extensively documented: packages aren't allowed to be hosted on CRAN until all documentation is complete.

•  R is a fully featured vectorised language, so fast where it needs to be.

•  It supports many data structures and data types, including vectors, matrices, data frames and lists.

•  It has a huge wealth of statistics modules, including all of the useful statistical distributions that we regularly need for data analysis (even in astronomy).

•  It has a large suite of plotting built in, and a huge amount of extended support offered by external packages. There is no real limit on the kind of plots you can generate.

•  For further help try www.rseek.org, and consider joining the 'Astronomy with R' Facebook group.

Page 4:

Bayes versus Frequentist Theory (spoiler: both are correct)

•  The frequentist view is that an experiment is a single sampling of the 'true' distribution. p-values are used to determine how likely it is we should observe such an extreme experiment, and whether to accept or reject the null hypothesis (i.e. the model is real and the data mutable).

•  The Bayesian view is that we have an opinion on the experiment already (say the distribution of the population mean), and the data is a modification of your current world view (i.e. the data is real and the model an abstraction).

•  When you have *a lot* of data they behave in much the same way.

•  They differ when the experiment is small (as we'll see).

Page 5:

You already think Bayes

•  No doubt you've heard a bit about Bayes, and you're aware that its use of 'priors' encourages some disquiet.

•  Consider these cases:
  •  A musician correctly identifies between two violinists 10 times in a row.
  •  A "psychic" correctly identifies the gender of a person behind a screen 10 times in a row.
  •  A tea drinker correctly identifies that the water used in their tea has been boiled on a hob or in a kettle 10 times in a row.

•  The observations say the same thing in all 3 cases, but most people will be unsurprised by the musician, skeptical of the psychic and impressed by the tea drinker. Why?

Page 6:

Prior Belief

•  Naturally, you are applying prior knowledge.

•  You're pretty sure a musician should be able to do this already, so your prior assumption is strengthened slightly.

•  You don't believe psychics have any ability beyond chance, so this result will have little impact on your assumptions.

•  You think it's possible that a tea drinker might be able to do this, but you went in assuming a 50/50 was most likely. In this case your prior assumption has been modified a lot by the data.

•  Now we will state this concept more formally. [Plots show our prior assumed long-term outcome for many samples.]

Page 7:

Bayes is pretty obvious

•  Many independent minds have formulated a Bayes-like theory:
  •  Thomas Bayes (provider of the name)
  •  James Bernoulli
  •  Pierre Simon Laplace (father of much of modern statistics)

•  It formulates ideas that are elementary in origin:
  •  The probability that A occurs after B has occurred is P(A|B), i.e. the likelihood we are in A∩B given we are in B.
  •  The outright probability of A∩B is therefore the probability that B occurs times P(A|B):
    •  P(A∩B) = P(A|B) P(B), and similarly:
    •  P(A∩B) = P(B|A) P(A)

Page 8:

•  The case above is the simplest, where we know what the possible B probabilities and intersections with A are.

•  So we can see that P(A) = P(A∩B1) + P(A∩B2) + … + P(A∩Bk)

•  A real example is a bag with 5 coins, 4 are fair and 1 has two heads. If we take a coin out and toss it three times and get three heads, what is the probability the coin is fair?

•  You might be able to answer this “intuitively”, in which case you’re a natural Bayesian, for the rest we can work through it.

Page 9:

•  Above is a Venn schematic of the problem. This is much simpler than the complicated example on the previous slide.

•  We aren't interested in the chance of picking a biased coin, or of observing 3 heads; rather we want to know the likelihood the coin is biased *given that we've already observed* 3 heads.

•  This method of thinking is key to Bayes. We don't care how probable an observation is, just what the most probable cause of the observation was.

•  (Aside: It's an idea that juries find famously hard to grasp: given 5 premature deaths on a ward, the most likely explanation is usually a random sampling of a parent (nationwide) statistical distribution, rather than murder. 5-sigma ('one in a million') events have to occur somewhere, after all.)

Page 10:

Whose coin is it anyway?

•  The answer we want is the probability of our coin being fair {P(fair)} given that we already know it has been tossed 3 times and come up heads each time. i.e.:

•  P(fair|3head), which we know equals P(fair ∩ 3head)/P(3head)

•  We now use P(3head) = P(3head ∩ fair) + P(3head ∩ biased)

•  So, P(3head) = P(3head|fair)P(fair) + P(3head|biased)P(biased)

•  P(3head) = (1/2)^3 (4/5) + (1)^3 (1/5) = 1/10 + 2/10 = 3/10, so 7/10 times we would expect to pick a coin, toss it 3 times, and *not* get 3 heads in a row.

•  P(fair ∩ 3head) = P(3head|fair)P(fair) = 1/10 (as above)

•  P(fair|3head) = (1/10) / (3/10) = 1/3, so a one third chance, i.e. it is more likely the coin is biased (twice as likely, in fact).

•  For the keen student: how many heads in a row do we need to toss until we are 99% confident the coin is biased?

Page 11:

The Monty Hall Problem

•  "Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice? In this version of the problem the host *always* opens one losing door at random; if there is only one he chooses that." as asked of Marilyn vos Savant in Parade Magazine (1990).

•  This is a famous puzzle for being non-intuitive (for most people at least), but this is the Bayes approach:

•  P(D2car|D3goat) = P(D2car ∩ D3goat) / P(D3goat)

•  P(D2car ∩ D3goat) = P(D3goat|D2car)P(D2car) = (1)(1/3) = 1/3

•  P(D3goat) = P(D3goat|D1car)P(D1car) + P(D3goat|D2car)P(D2car) + P(D3goat|D3car)P(D3car)

•  P(D3goat) = (1/2)(1/3) {randomly chooses D3} + (1)(1/3) {no choice of door} + (0)(1/3) {N/A} = 1/2

•  P(D2car|D3goat) = (1/3) / (1/2) = 2/3

•  We currently have P(D1car) = 1/3, so yes, we should definitely swap: we are twice as likely to win!

•  The simplest way to arrive at an 'intuitive' answer is to realise that you only lose by swapping when the car is behind your original door, and this only happens 1/3 of the time, so 2/3 of the time you win by swapping.

Page 12:

And now for the magic…

•  You might have noticed that when assessing the better of two options we end up doing a calculation that looks like:

•  (X1/Y)/(X2/Y), where Y = X1+X2+X3+…+Xk

•  In cases where we only care about calculating the relative likelihood of two models, calculating Y is clearly unnecessary; we can just skip to calculating X1/X2. This is equivalent to saying:

•  P(A|B) = P(B|A) P(A) / P(B), so

•  P(A|B) ∝ P(B|A) P(A)

•  This means we can avoid the expensive, difficult, and in practice usually impossible calculation of P(B). The ratio of different P(A|B) is known as the Bayes factor (BF from here).

Page 13:

Monty Hall Again…

•  Using this insight we can simplify the problem to:

•  BF = P(D2car|D3goat) / P(D1car|D3goat)

•  BF = P(D3goat|D2car)P(D2car) / P(D3goat|D1car)P(D1car)

•  Being smart, we know that P(D1car) = P(D2car), so…

•  BF = P(D3goat|D2car) / P(D3goat|D1car) = (1) / (1/2) = 2

•  This means door 2 is twice as *likely* as door 1 (our current door) to have the car behind it. We don't know what the exact probability is because we've made some shortcuts, but we've arrived at the correct insight that we should swap, and we know the correct degree of increased likelihood of winning (a factor of 2). If we are comparing models, relative likelihood is all we really care about.

Page 14:

So, finally we get to the useful Bayesian equations

•  So, now we get to the famous Bayes formula:

•  P(Θ|y) ∝ P(y|Θ) × P(Θ)

•  We can recast our arguments into the following form, where instead of probabilities P we will now use probability densities p:

•  p(Θ|y) ∝ p(y|Θ) p(Θ)

The probability of a particular model given the data is proportional to the probability of the data given the model times the probability of the model.

The posterior probability distribution is proportional to the likelihood probability distribution times the prior probability distribution.
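As a concrete illustration of posterior ∝ likelihood × prior, here is a minimal R sketch that evaluates p(Θ|y) on a grid for a coin with unknown heads probability Θ (the example data, 7 heads in 10 tosses, is ours, not from the lecture):

```r
theta = seq(0, 1, by=0.001)                  # grid over the model parameter
prior = rep(1, length(theta))                # flat prior p(theta)
likelihood = dbinom(7, size=10, prob=theta)  # p(y|theta): 7 heads in 10 tosses
posterior = likelihood * prior               # posterior is proportional to likelihood x prior
posterior = posterior / sum(posterior)       # normalise over the grid
theta[which.max(posterior)]                  # posterior mode: 0.7
```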

Page 15:

A new killer

•  There's a new disease called Frequentitus that is entirely fatal to anyone who contracts it. 0.1% of people are affected. Luckily there is a test administered by Perth Bayes Health. This test is 99.9%* accurate at correctly determining you are positive, and 99%* accurate at correctly determining you are negative. Unfortunately, you test positive. How worried should you be? *Numbers fixed!

•  We want to know:
  •  P(ArePos|TestPos) = P(TestPos|ArePos)P(ArePos)/P(TestPos)
  •  P(TestPos|ArePos)P(ArePos) = 0.999 × 0.001 = 0.000999
  •  P(TestPos) = P(TestPos|ArePos)P(ArePos) + P(TestPos|AreNeg)P(AreNeg)
  •  P(TestPos) = (0.999 × 0.001) + (0.01 × 0.999) = 0.010989
  •  P(ArePos|TestPos) = 0.000999/0.010989 = 0.09090909 (i.e. 1/11)

•  So only 1/11 times when we test positive will we actually have the disease (1:10 ArePos:AreNeg).

•  This helps demonstrate a recognised problem with testing for *rare* disorders: most of the time when we test positive we are actually negative!

Page 16:

Lecture 1 Problems

A new killer and rubbish clinic

Page 17:

A new killer and rubbish clinic

•  Alas, in Sydney they only have access to Sydney Bayes Health. The nominal test is the same as at Perth Bayes Health, so the test is 99.9% accurate at correctly determining you are positive, and 99% accurate at correctly determining you are negative. However, their practices are much sloppier than Perth Bayes Health's: 5% of samples are mixed up, and 2% of the time the computer will return a random (50/50 positive/negative) result. Alas, you once again test positive. Whilst annoyed at the sloppiness of the testing procedure (no doubt you'll move to Perth some time soon), you still want to know how worried you should be. What's the answer?

•  Hint: This is a much more complicated question, and is an example of a case where calculating P(TestPos) is difficult and computationally expensive. Think of a way to simplify the problem.

•  If you crack this then you're thinking like a Bayesian.

Page 18:

1001 German tanks

•  This problem is based on a real scenario that occurred during WWII.

•  German tanks had sequentially numbered gear boxes, so in principle if 5 tanks were randomly captured with the numbers 20, 40, 60, 80 and 100 you might reasonably be able to estimate how many tanks the German army had in total.

•  With this situation in mind, imagine if one (and only one) tank was captured, and it had gear box ID #1001. What would a statistician working for the British army estimate as the mean, median and mode of the total number of tanks?

•  Assume no priors on the tank distribution.

•  This is an example of a situation where an instinctive frequentist response gives you a *very* different answer to a Bayesian answer.

Page 19:

Summary of Lecture 1

•  Bayes states some fairly obvious (intuitive) ideas about probability theory. Simply: P(A|B) = P(B|A)P(A)/P(B)

•  Most important:

•  P(Θ|y) ∝ P(y|Θ) × P(Θ) (for probabilities)

•  p(Θ|y) ∝ p(y|Θ) p(Θ) (for probability densities)

•  It is applicable in almost every situation you can think of, but as we showed, it is not always trivial to calculate the Bayes factor by hand.

•  In the next lecture we will see how computers are used to attack Bayesian problems.
