ASTR633 Astrophysical Techniques

BAYESIAN STATISTICS (original notes by Eric Nielsen, edited by Mike Liu and Jonathan Williams)

Frequentist (a.k.a. "classical") statistics is what we have done so far:

1. There is a Platonic ideal ("parent population") for the parameter you are measuring. This is a fixed value with no uncertainty, and thus probability statements about it are meaningless.

2. Probability distributions refer to the chance that the true value is within a given confidence interval of the mean from our measurements.

3. Repeated measurements get you ever closer to finding the true value.

Bayesian statistics (which actually came before "classical") instead believes the following:

1. The Platonic mean is not a useful concept. Only the data are real. ("The world is messy.")

2. Probability is used to define our confidence that a parameter we measure from the data accurately describes the universe.

3. Probabilities can be assigned to things other than data, including model parameters and models themselves.

4. We can incorporate our prior knowledge, since we know more about the universe than just this one dataset.

The choice between the two is in some sense philosophical but has profound consequences, as we will see. To illustrate the difference:

Deductive logic – given a cause, we can determine its outcome; e.g. given a fair coin, what is the probability that 10 tosses will produce 10 heads, 9 heads + 1 tail, etc.? This is typical for pure mathematics: given some core axioms, derive the outcomes.

Inductive logic – given that certain effects are observed, what is (are) the underlying cause(s)? E.g. if 10 flips yielded 7 heads, is the coin fair or biased? This is typical for observational science (and indeed everyday life).

Both schools of thought have value, and indeed often produce the same results. Maximum likelihood is a major concept in both paradigms.



Brief History

The problem of inferring causes from effects was first addressed by Reverend Thomas Bayes (1701-1761), published posthumously in 1763.

Initial belief + New data → Improved belief
("prior") ("likelihood") ("posterior", or probability distribution function)

The most recent posterior then becomes the prior for the next round of estimating.

Pierre-Simon Laplace (1749-1827) independently discovered Bayes' work (1774) and greatly clarified it. He also applied it to important problems, e.g. estimated the mass of Saturn to <1% accuracy of the modern value. Some say it should be called "Laplacian statistics".

Bayesian methods were then largely ignored until the middle of the 20th century. They are now widely used, since the long-standing challenge of Bayesian calculations being more difficult than frequentist ones has finally been overcome thanks to modern computers. Also, Markov Chain Monte Carlo (MCMC) has allowed Bayesians to do a lot more than frequentists can.

Bayes Theorem

Bayes' Theorem is just a statement about conditional probability and follows directly from the rules of probability. Consider event X and event Y.

Probability that both X & Y will happen: P(X,Y) = P(X|Y) × P(Y)

where "|" means "given" and "," means "and".

Similarly, P(Y,X) = P(Y|X) × P(X). But we know P(X,Y) = P(Y,X), so then P(X|Y) × P(Y) = P(Y|X) × P(X), i.e.

P(Y|X) = P(X|Y) × P(Y) / P(X)

This is often written to explicitly acknowledge the existence of background information ("I"):

P(Y|X,I) = P(X|Y,I) × P(Y|I) / P(X|I)
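A quick numerical check of the derivation, using two dice (my own example in the spirit of the notes, not from the original):

```python
from fractions import Fraction as F

# Two fair dice: X = "sum is 4", Y = "first die shows 1"
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(pred):
    """Exact probability of an event over the 36 equally likely outcomes."""
    return F(sum(1 for o in outcomes if pred(o)), len(outcomes))

p_x  = prob(lambda o: o[0] + o[1] == 4)                 # P(X)  = 3/36
p_y  = prob(lambda o: o[0] == 1)                        # P(Y)  = 6/36
p_xy = prob(lambda o: o[0] + o[1] == 4 and o[0] == 1)   # P(X,Y) = 1/36

p_x_given_y = p_xy / p_y                    # definition of conditional prob
p_y_given_x = p_x_given_y * p_y / p_x       # Bayes' theorem
assert p_y_given_x == p_xy / p_x            # matches the direct computation
```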


In-class example: the Monty Hall problem.

To see the relevance for us, replace X = data, Y = model (your hypothesis):

P(model|data) = P(data|model) × P(model) / P(data)
 "posterior"      "likelihood"     "prior"     "evidence"

"posterior" = what you get after you examine the data, i.e. the probability distribution function (PDF) for your model parameters, e.g. y = A + Bx.

"likelihood" = how likely is it that your model can produce your data? This is where you do actual statistics, but it is often very straightforward.

"prior" = what you knew before you examined the data, e.g. what did you think of your model before you went to the telescope? This is the most subjective/controversial part of Bayesian analysis, though oftentimes the choice does not impact the basic outcome.

"evidence" = this can be ignored for parameter estimation, since it just provides the overall normalization. (It is important in model selection, which we will not discuss.)
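The Monty Hall example mentioned above can be verified by simulation. This sketch (not part of the original notes; function and variable names are mine) plays the game many times with and without switching:

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Simulate the Monty Hall game; return the fraction of wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)           # door hiding the car
        pick = rng.randrange(3)          # contestant's first pick
        # Host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Move to the one remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

p_stay = monty_hall(switch=False)
p_switch = monty_hall(switch=True)
print(p_stay, p_switch)   # staying wins ~1/3 of the time, switching ~2/3
```

The posterior probability that the car is behind the other door, given the host's action, is 2/3 — exactly what Bayes' theorem predicts.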

Bayesian statistics is then just a straightforward way to constrain models, based on both your data and prior knowledge. While the derivation is non-controversial, note that the interpretation of P(model|data) is only meaningful in Bayesian statistics: it corresponds to our state of knowledge (i.e. belief) about a model and its parameters given the data (e.g. Laplace's estimate of the mass of Saturn, given the orbital data). It is not meaningful in frequentist statistics – there is only one mass for Saturn, not a distribution of masses.

Of course, the sum of the final PDF (the posterior) must be 1, i.e.

Σ P(model|data) = 1

which means you can use an unnormalized expression for the likelihood and/or the prior.

For the prior, you can use any info relevant to the parameters of your model. You can also just start from scratch with no prior (an "uninformed prior"). Often that is Prior = 1, which computationally is the same as maximum likelihood. But sometimes you can use physical intuition to inform your understanding of the result.


Simple example: xkcd cartoon

We want to derive the PDF for B ("boom" for the Sun), with two possible values for B:

B = 0 → Sun has not exploded
B = 1 → Sun has exploded

And of course P(B=0) + P(B=1) = 1.

Frequentist approach: our only knowledge is the results from the machine.

Wehave36possiblediceoutcomes(1+1,1+2,2+1,etc.)sotheprobabilityofany1eventis1/36~3%.

So the machine lies 3% of the time and tells the truth the other 97%. So if the machine says the Sun just exploded, there's a 97% chance it did and a 3% chance it didn't:

P(B=0) = 3%
P(B=1) = 97%

Bayesian approach: P(model|data) ∝ P(data|model) × P(model)

So we want to evaluate this for the two values of B using Bayes' Theorem:

P(B=0|data) ∝ P(data|B=0) × P(B=0)
P(B=1|data) ∝ P(data|B=1) × P(B=1)

Step 1: likelihood. Just like the frequentist approach:

P(data|B=0) = 3%
P(data|B=1) = 97%

Step 2: prior. What is the probability that our basic understanding of stellar evolution is wrong? Say 1 in a million:

P(B=0) = 0.999999
P(B=1) = 0.000001

Put them together:

P(B=0|data) ∝ 3% × 0.999999 ~ 3%
P(B=1|data) ∝ 97% × 0.000001 ~ 0.0001%

Normalize so that the total probability is 1, P(B=0|data) + P(B=1|data) = 1:

P(B=0|data) = 3% / (3% + 0.0001%) ~ 100%
P(B=1|data) = 0.0001% / (3% + 0.0001%) ~ 0.003%

So the Sun exploding got ~30× more likely than our prior, but the conclusion is still the same → it did not explode.
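These steps can be checked numerically. A minimal sketch (the dictionary representation and names are mine; the probabilities are the ones in the notes):

```python
# Bayesian update for the xkcd "Has the Sun exploded?" detector.
# The machine lies only when it rolls double sixes: probability 1/36.
p_lie = 1 / 36                      # ~3%

# The machine said "yes, the Sun exploded":
likelihood = {0: p_lie,             # P(data | B=0): machine lied
              1: 1 - p_lie}         # P(data | B=1): machine told the truth
prior = {0: 0.999999, 1: 1e-6}      # P(B): an exploding Sun is very unlikely

# Multiply likelihood by prior, then normalize so the posterior sums to 1
unnorm = {b: likelihood[b] * prior[b] for b in (0, 1)}
norm = sum(unnorm.values())
posterior = {b: unnorm[b] / norm for b in (0, 1)}
print(posterior)   # B=0 is still overwhelmingly favored
```

The posterior for B=1 rises by a factor of a few tens relative to the prior, but remains negligible — the data are not strong enough to overcome our prior knowledge of stellar evolution.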


Model parameter estimation using Bayesian inference

A very common and powerful application of Bayes' Theorem is parameter estimation:

P(model|data) ∝ P(data|model) × P(model)
 "posterior"      "likelihood"     "prior"

Don't think of this as an equation you compute once. Instead, compute it lots (and lots!) of times for all possible values of your model parameters. The result is the posterior (a.k.a. PDF) for your parameters.

Going backwards is hard: finding the best-fitting parameters for a complicated model is hard. E.g. if y = A + B x^C sin(Dx + E), it would be hard to find the best {A, B, C, D, E}.

Going forward is easy: given some choice of {A, B, C, D, E}, you (i.e. the computer) can easily compute the value of y. So just do lots of calculations over a grid of {A, B, C, D, E} to extract the PDF (a.k.a. "brute force"). This can also be done for non-analytic models.

What's the likelihood? Any equation that describes a probability. In astronomy, the two most common are:

1. Counting things: Poisson statistics
2. Generic data with uncertainties: chi-square

Poisson likelihood:

1. Count how many "things" fell into a single bin (M = # of observed things).
2. Use the model parameters to predict the number in each bin (E = expected #).
3. Calculate the probability:

P(data|model) = E^M e^(−E) / M!

4. Multiply by the prior.
5. Repeat for each bin, then multiply all the probabilities (one for each bin) together.
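A minimal sketch of steps 1–5 for a toy one-parameter model, where every bin shares the same expected rate (the data and grid here are invented for illustration; working in log-probability avoids underflow when multiplying many bins):

```python
import math

# Step 1: observed counts in each bin (invented data)
observed = [3, 7, 5, 5]

def poisson_log_like(expected):
    """Sum of log Poisson probabilities, P = E^M e^(-E) / M!, over all bins.
    Summing logs is the same as multiplying the per-bin probabilities."""
    return sum(m * math.log(e) - e - math.lgamma(m + 1)   # lgamma(m+1) = log(m!)
               for m, e in zip(observed, expected))

# Steps 2-5: evaluate over a grid of trial rates (flat prior, so the
# posterior is proportional to the likelihood)
rates = [0.5 + 0.1 * i for i in range(100)]
log_like = [poisson_log_like([r] * len(observed)) for r in rates]
best = rates[log_like.index(max(log_like))]
print(best)   # maximum-likelihood rate = mean of the observed counts
```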

Chi-square likelihood (you've already seen this in maximum likelihood):

The generic chi-square formula is all you need:

χ² = Σ_{i=1}^{N} (M_i − E_i)² / σ_i²

which is readily generalized to multi-dimensional data, e.g. for (x, y) points:

χ² = (M_x1 − E_x1)²/σ_x1² + (M_y1 − E_y1)²/σ_y1² + (M_x2 − E_x2)²/σ_x2² + ...

The probability is then just given by:

P(data|model) ∝ exp(−χ²/2)
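A brute-force grid sketch for the straight-line model y = A + Bx (the data points and grid ranges are invented for illustration), ending with a marginalized 1-d PDF for A:

```python
import math

# Toy data roughly following y = 2 + 3x, with sigma = 0.5 (values invented)
xs    = [0.0, 1.0, 2.0, 3.0, 4.0]
ys    = [2.1, 4.8, 8.2, 11.1, 13.9]
sigma = 0.5

def chi2(a, b):
    """Generic chi-square between data and the model y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 / sigma ** 2 for x, y in zip(xs, ys))

# Brute-force grid over {A, B}; P(data|model) proportional to exp(-chi2/2)
a_grid = [1.0 + 0.05 * i for i in range(41)]   # A in [1, 3]
b_grid = [2.0 + 0.05 * j for j in range(41)]   # B in [2, 4]
post = [[math.exp(-0.5 * chi2(a, b)) for b in b_grid] for a in a_grid]

# Marginalize: sum over B to get the 1-d PDF for A (flat prior assumed)
pdf_a = [sum(row) for row in post]
total = sum(pdf_a)
pdf_a = [p / total for p in pdf_a]
best_a = a_grid[pdf_a.index(max(pdf_a))]
print(best_a)   # peak of the marginalized PDF, near A = 2
```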


Astrophysical example: age-dating of field B & A stars (Nielsen et al. 2013, ApJ, 776, 4)

B & A stars are prime targets for direct imaging of planets (e.g. HR 8799, beta Pic). When directly imaging planets, younger is better because the planets are brighter. So we're highly motivated to determine the ages of B & A stars & their uncertainties.

P(model|data) ∝ P(data|model) × P(model)
 "posterior"      "likelihood"     "prior"

data = M(V), (B−V) + their uncertainties

model = predictions for {M(V), B−V} from stellar evolutionary models as a function of {[Fe/H], age, stellar mass}

likelihood = chi-square for the data (M(V), B−V) with respect to the model predictions for (M(V), B−V) as a function of {[Fe/H], age, mass}

where "O" = observed and "E" = expected from the models; the likelihood is then P(data|model) ∝ exp(−χ²/2),

which produces a 3-d grid of likelihoods, with axes {age, mass, [Fe/H]}.

Whatisthemostlikelyageofthestar?


priors:
- metallicity distribution of the solar neighborhood (e.g. low-Z stars are rare): adopt a Gaussian for [Fe/H] with mean = 0, sigma = 0.1 dex
- flat age distribution, i.e. constant star formation rate over time
- Salpeter IMF (this turns out to have little effect)

Multiply the likelihood by the priors, normalize the overall PDF sum to 1.0, marginalize to show 1-d PDFs, and see covariances from the 2-d PDFs.

see Nielsen et al. 2013, ApJ, 776, 4
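A schematic version of the prior-multiplication and marginalization steps (the grids, numbers, and likelihood shape here are invented placeholders, not the Nielsen et al. calculation; for clarity only two of the three axes are kept):

```python
import math

# Grids for the model parameters (invented ranges)
feh_grid = [-0.3 + 0.05 * i for i in range(13)]   # [Fe/H], -0.3 to +0.3 dex
age_grid = [50 * (j + 1) for j in range(10)]      # age in Myr, 50 to 500

def gauss(x, mu, sig):
    """Unnormalized Gaussian (unnormalized is fine per the notes)."""
    return math.exp(-0.5 * ((x - mu) / sig) ** 2)

# Stand-in likelihood grid, peaked at [Fe/H] = 0.1, age = 200 Myr
like = [[gauss(f, 0.1, 0.15) * gauss(a, 200.0, 80.0) for a in age_grid]
        for f in feh_grid]

# Multiply by the priors: Gaussian in [Fe/H] (mean 0, sigma 0.1 dex),
# flat in age (so no age factor is needed)
post = [[like[i][j] * gauss(feh_grid[i], 0.0, 0.1)
         for j in range(len(age_grid))] for i in range(len(feh_grid))]

# Marginalize over [Fe/H], then normalize the 1-d age PDF to sum to 1
age_pdf = [sum(post[i][j] for i in range(len(feh_grid)))
           for j in range(len(age_grid))]
total = sum(age_pdf)
age_pdf = [p / total for p in age_pdf]
```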


Markov Chain Monte Carlo parameter estimation

References:
- Ford 2005, AJ, 129, 1706
- Foreman-Mackey et al. 2013, PASP, 125, 306 — http://dfm.io/emcee/current/
- Sharma 2017, ARAA, 55, 213 — https://github.com/sanjibs/bmcmc

The stellar age example is brute force, namely computing the likelihood over a wide grid of model values. Often this is, at minimum, wasteful. For many problems, it is also infeasible: for N model parameters with R grid steps per parameter, the number of calculations is ~R^N, e.g.

50^2 = 2500-element array
50^7 = 781-billion-element array (several TB of memory)

We want a process that quickly finds the peak of the PDF and then spends most of its time near the peak mapping its shape (i.e. doing calculations), where the probability is highest. Avoid regions with low probability to save computing time → Markov Chain Monte Carlo (MCMC).

A Markov chain is a sequence of random variables in which the probability of stepping to the next state depends only on the current state of the system (it has no memory of the past or prediction for the future).

MCMC produces a "chain" (a series of calculations) that asymptotically approaches the PDF, e.g. 68% of the steps will occur within the 68% confidence limit of the PDF, 95% of the steps within the 95% CL, etc. It will tell us the peak & the important part of the PDF, namely the part with non-negligible probability.

Recipe for a Metropolis-Hastings MCMC (the simplest MCMC implementation):

• Start with an initial guess of model parameters {A0, B0, C0, ...} and compute the probability of getting our data given those initial values (i.e. the likelihood) = P0.

• Determine the next step in the chain:
1. Randomly vary each of the parameters to get a new set {Ai, Bi, Ci, ...} ("trial").
2. Compute the probability for these new parameters: Pi.
3. Is Pi > Pi-1?

i. If yes, adopt the new parameters ("take the step", move from a low-probability location to a high one).

ii. If no, generate a uniform random number Ui from 0 to 1.


iii. If Ui < Pi/Pi-1, then adopt the new parameters. Otherwise keep the current ones. This allows exploration of lower-probability regions, but not too-low-probability ones.

4. Return to Step (1). Repeat many times (e.g. >10^6).

For Step 1: the simplest case is to use a Gaussian proposal for each parameter, i.e. generate new values by drawing from a Gaussian with a fixed standard deviation, centered on the value of the current step:

Ai = Ai-1 + G·σA
Bi = Bi-1 + G·σB

where G is a normally distributed random variable with mean = 0 and variance = 1. Note: the sigmas must remain constant throughout the chain, or else the final results are not a valid measure of the PDF.

How to choose the sigmas? We want a "reasonable" acceptance rate of ~25%, i.e. the fraction of proposed steps in Step 3 that is taken.

If the rate is too large → inefficient because the step size is too small. If the rate is too small → inefficient because the steps are too large, so the chain never moves.

You can do an initial test run with various step sizes and adjust until the acceptance rate is reasonable.
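The full recipe can be sketched in a few lines of Python (a minimal toy implementation, assuming invented straight-line data y = A + Bx with known sigma and a flat prior; step sizes and chain length are illustrative, not tuned):

```python
import math
import random

def log_post(a, b):
    """Unnormalized log-posterior for y = A + Bx: log P = -chi2/2 (flat prior)."""
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 4.8, 8.2, 11.1, 13.9]   # toy data, roughly y = 2 + 3x
    sigma = 0.5
    return -0.5 * sum((y - (a + b * x)) ** 2 / sigma ** 2
                      for x, y in zip(xs, ys))

rng = random.Random(42)
a, b = 0.0, 0.0                       # initial guess {A0, B0}
lp = log_post(a, b)                   # log of P0
step_a, step_b = 0.1, 0.1             # proposal sigmas, held constant
chain = []
for _ in range(20000):
    # Trial: Gaussian proposal centered on the current values
    a_new = a + rng.gauss(0, step_a)
    b_new = b + rng.gauss(0, step_b)
    lp_new = log_post(a_new, b_new)
    # Accept if more probable; else accept with probability P_new/P_old
    if lp_new > lp or rng.random() < math.exp(lp_new - lp):
        a, b, lp = a_new, b_new, lp_new
    chain.append((a, b))              # record the current (possibly repeated) state

burn = chain[5000:]                   # discard burn-in steps
a_mean = sum(s[0] for s in burn) / len(burn)
b_mean = sum(s[1] for s in burn) / len(burn)
print(a_mean, b_mean)                 # near the least-squares values
```

Note that rejected trials still append the current state to the chain; those repeats are what make the density of chain samples trace the PDF.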

• The result: the MCMC chain is an array of values for each model parameter. E.g. 1000 steps of y = A + Bx gives 1000 values of A & 1000 values of B → these arrays are our model parameter PDFs! In fact, the result also provides the joint PDF for all our parameters.

• This MCMC PDF is the likelihood in Bayes' Theorem. If we have a prior, multiply it by the PDF to get the posterior.

• Side benefit: the PDF for any quantity derived from the model parameters is just the chain of that quantity calculated from the model chains.
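For example (the short stand-in chains below are illustrative, not real MCMC output): the PDF for the model prediction y(x0) = A + B·x0 at some x0 is just that expression evaluated element-by-element along the chains.

```python
# Stand-in MCMC chains for A and B (real chains would have thousands of steps)
a_chain = [2.00, 2.10, 1.90, 2.05, 2.00]
b_chain = [3.00, 2.90, 3.10, 3.00, 2.95]

x0 = 2.5
# The chain of the derived quantity y(x0) IS its PDF, sampled like A and B
y_chain = [a + b * x0 for a, b in zip(a_chain, b_chain)]
```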

Now you get to do it yourself in the computationally intensive Problem Set #6.