Markov Chain Monte Carlo modelsmcvean/DTC/STAT/Lectures/Tues_wk2/...1/30/12 1 1/29/12 Markov Chain...
1/30/12
1/29/12
Markov Chain Monte Carlo
Zamin Iqbal
Models and methods
• We want to be able to study biological systems (populations, inheritance, disease susceptibility, genome evolution, selection).
• In this endeavour, we make models.
• We have to choose which parameters/concepts to include and how to build the model.
• Typically there is a trade-off between “model realism” and computational/statistical applicability.
• MCMC is a method that allows you to explore more realistic models that have no simple analytical solutions.
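Bayesian Inference
• In Bayesian statistics, we want to learn about the probability distribution of the parameter of interest given the data = the posterior.
• In simple cases we can derive an analytical expression for the posterior:
\[
P(\theta \mid D) = \frac{P(\theta)\, P(D \mid \theta)}{P(D)}
\]
where P(θ | D) is the posterior, P(θ) the prior, P(D | θ) the likelihood, and P(D) the normalising constant.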
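As an illustration of such a simple case, here is the standard Beta-Binomial conjugate pair. It is not taken from the lecture; the prior parameters a, b and the data (k successes in n trials) are assumed purely for the sake of the example.

```latex
\[
\theta \sim \mathrm{Beta}(a, b), \qquad
P(D \mid \theta) = \binom{n}{k}\, \theta^{k} (1-\theta)^{n-k}
\]
\[
P(\theta \mid D) \;\propto\; \theta^{a-1}(1-\theta)^{b-1} \times \theta^{k}(1-\theta)^{n-k}
 \;=\; \theta^{a+k-1}(1-\theta)^{b+n-k-1}
\]
\[
\Rightarrow\quad \theta \mid D \sim \mathrm{Beta}(a+k,\; b+n-k), \qquad
P(D) = \binom{n}{k}\, \frac{B(a+k,\; b+n-k)}{B(a, b)}
\]
```

Here the normalising constant P(D) is available in closed form through the Beta function B, which is exactly what fails in more complex models.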
Bayes Cartoon
Belief before = P(θ)
Likelihood = P(D | θ)
Belief after = P(θ | D) (posterior)
Monte Carlo (no Markov Chains yet)
• In many situations the normalising constant cannot be calculated analytically:
\[
P(D) = \int P(\theta)\, P(D \mid \theta)\, d\theta
\]
• Typically true if you have multiple parameters, multidimensional parameters, complex model structures or complex likelihood functions (i.e. most of modern statistics).
• Monte Carlo methods allow you to sample from (and therefore estimate functions of) the posterior (see next slide).
Sampling to estimate an integral
• You have already met the idea that sampling can be used to estimate an expectation (= integral). If we have a set of iid random variables X_i, then
\[
\frac{1}{n} \sum_{i=1}^{n} X_i \to \mu
\]
• The mean of the sample converges to the mean of the distribution.
Sampling to estimate an integral
• This is also true for more general expectations:
\[
\frac{1}{n} \sum_{i=1}^{n} g(X_i) \to \int g(x)\, f_X(x)\, dx
\]
• So if we are interested in some expectation of the posterior, we could use an iid sequence to approximate it.
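The convergence above can be sketched numerically. This is a minimal illustration, assuming an Exp(1) distribution (chosen only because E[X] = 1 and E[X^2] = 2 are known exactly); the function name `monte_carlo_expectation` and the seed are hypothetical choices, not part of the lecture.

```python
import random

def monte_carlo_expectation(g, sampler, n):
    """Estimate E[g(X)] as the average of g over n iid draws from sampler()."""
    return sum(g(sampler()) for _ in range(n)) / n

rng = random.Random(1)  # fixed, arbitrary seed so the sketch is reproducible

def draw():
    """One iid draw from Exp(1) (an assumed example distribution)."""
    return rng.expovariate(1.0)

# g(x) = x estimates the mean; g(x) = x^2 estimates the second moment.
mean_estimate = monte_carlo_expectation(lambda x: x, draw, 100_000)
second_moment_estimate = monte_carlo_expectation(lambda x: x * x, draw, 100_000)

print(f"E[X]   ~ {mean_estimate:.3f} (exact value 1)")
print(f"E[X^2] ~ {second_moment_estimate:.3f} (exact value 2)")
```

The same averaging works for any function g, which is why a sample from the posterior is enough to estimate posterior means, variances, or tail probabilities.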
Markov Chain Monte Carlo
• Great idea – we don’t need our random variables to be iid: any Markov chain whose stationary distribution is the posterior will do.
• Used by Metropolis and Ulam as part of the Manhattan project to solve the problem of initiating fusion in a bomb.
• Brought to prominence in statistics by Gelfand and Smith in 1990.
Markov Chains
• A Markov chain is a stochastic process that generates random variables X_i such that
\[
P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})
\]
i.e. the distribution of the next variable depends only on the current one.
• We talk about transition probabilities:
\[
p_{ij} = \Pr(X_{n+1} = j \mid X_n = i)
\]
• Note: the X_i are typically highly correlated, so each sample is not an independent draw from the posterior. (Thinning of the chain can lead to effectively independent samples.)
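To make the transition probabilities concrete, the sketch below simulates a small discrete chain. The 3-state transition matrix `P` is hypothetical (invented for this illustration), as are the function name and seed. For this particular matrix the stationary distribution works out to (10/41, 19/41, 12/41), and the long-run state frequencies should approach it even though successive states are correlated.

```python
import random

# Hypothetical 3-state chain; row i holds p_ij = Pr(X_{n+1} = j | X_n = i).
P = [
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

def simulate_chain(P, start, n_steps, rng):
    """Generate X_0, ..., X_n, where each step depends only on the current state."""
    state, path = start, [start]
    for _ in range(n_steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path

rng = random.Random(0)  # fixed, arbitrary seed
path = simulate_chain(P, start=0, n_steps=50_000, rng=rng)

# Long-run fraction of time spent in each state.
frequencies = [path.count(s) / len(path) for s in range(len(P))]
print(frequencies)
```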
Notation
• I shall refer to the posterior distribution as the target distribution.
• I’ll call the transition probabilities in the Markov chain the proposal distribution, or the transition kernel, q(X|Y).
• When talking about multiple parameters I talk about the joint posterior
\[
\pi(\theta_1, \theta_2, \ldots, \theta_r)
\]
and the conditional distributions, e.g.
\[
\pi(\theta_2 \mid \theta_1, \theta_3, \ldots, \theta_r)
\]
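The Metropolis Algorithm
• How can we construct a Markov chain that converges to the posterior π we want?
• Suppose we are in state X, and we want to move. Draw a new state Y from the proposal distribution q(Y|X), where q can be anything you like, provided it is symmetric, i.e. q(Y|X) = q(X|Y).
• Accept this proposal with probability α given by
\[
\alpha = \min\left(1, \frac{\pi(Y)}{\pi(X)}\right) = \min\left(1, \frac{P(Y)\, L(Y)}{P(X)\, L(X)}\right)
\]
where P is the prior and L the likelihood. If the proposal is rejected, the chain stays at X. Note that the normalising constant cancels in the ratio.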
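The algorithm can be sketched in a few lines. This is a minimal illustration rather than the lecturer's code: the target (an unnormalised N(2, 1) density), the normal random-walk proposal, the starting point, the seed, and the burn-in length are all assumptions made for the example. The acceptance test is done on the log scale to avoid numerical underflow.

```python
import math
import random

def metropolis(log_target, x0, proposal_sd, n_steps, rng):
    """Random-walk Metropolis with a symmetric normal proposal q(Y|X) = N(X, proposal_sd^2)."""
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)                       # draw a proposal Y
        log_alpha = min(0.0, log_target(y) - log_target(x))  # log of min(1, pi(Y)/pi(X))
        if math.log(1.0 - rng.random()) < log_alpha:        # accept with probability alpha
            x, accepted = y, accepted + 1
        samples.append(x)                                   # on rejection the chain stays at X
    return samples, accepted / n_steps

def log_target(x):
    """Unnormalised log density of N(2, 1) (assumed target); no normalising constant needed."""
    return -0.5 * (x - 2.0) ** 2

rng = random.Random(42)  # fixed, arbitrary seed
samples, acceptance_rate = metropolis(log_target, x0=0.0, proposal_sd=1.0,
                                      n_steps=50_000, rng=rng)
kept = samples[1_000:]   # discard early samples (hypothetical burn-in length)
posterior_mean = sum(kept) / len(kept)
print(f"mean ~ {posterior_mean:.2f}, acceptance rate ~ {acceptance_rate:.2f}")
```

Only the ratio π(Y)/π(X) is ever evaluated, so any function proportional to prior times likelihood can be passed as the target.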
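What happens when you “reach” the limiting distribution?
• Consider a system in which a single parameter can take k possible values. My proposal is to select at random from the other k-1 possible values.
• Suppose the system has reached its stationary state. What does this mean? The probability of being in a state is proportional to the prior times the likelihood.
• Consider two states i and j, where π(X_i) > π(X_j). The rates of flow in the two directions are:
Flow i->j:
\[
\pi(X_i)\, q(X_j \mid X_i)\, \alpha_{ij} = \pi(X_i) \times \frac{1}{k-1} \times \frac{\pi(X_j)}{\pi(X_i)} = \frac{\pi(X_j)}{k-1}
\]
Flow j->i (here α_ji = 1, because the move is towards the more probable state):
\[
\pi(X_j)\, q(X_i \mid X_j)\, \alpha_{ji} = \pi(X_j) \times \frac{1}{k-1}
\]
The two flows are equal: these are the “detailed balance” equations.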
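The detailed balance argument can be checked numerically for a small system. The sketch below assumes k = 4 states with hypothetical unnormalised weights; it builds the Metropolis transition probabilities for the "pick one of the other k-1 states" proposal and verifies that the flows match for every pair and that one step of the chain leaves π unchanged.

```python
# Hypothetical unnormalised target over k = 4 states (stands in for prior times likelihood).
weights = [1.0, 2.0, 3.0, 4.0]
k = len(weights)
pi = [w / sum(weights) for w in weights]

def transition(i, j):
    """Metropolis transition probability: propose uniformly among the other k-1 states, then accept."""
    if i != j:
        return (1.0 / (k - 1)) * min(1.0, pi[j] / pi[i])
    # Probability of staying put: all rejected proposals.
    return 1.0 - sum(transition(i, m) for m in range(k) if m != i)

# Flow i->j is pi_i * p_ij; detailed balance says it equals flow j->i.
flows = {(i, j): pi[i] * transition(i, j) for i in range(k) for j in range(k)}
max_imbalance = max(abs(flows[i, j] - flows[j, i]) for i in range(k) for j in range(k))

# Detailed balance implies stationarity: applying one step to pi returns pi.
pi_next = [sum(pi[i] * transition(i, j) for i in range(k)) for j in range(k)]

print("largest flow imbalance:", max_imbalance)
print("pi after one step:", pi_next)
```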
The Hastings Ratio
• A simple change to the acceptance formula allows you to use asymmetric proposals:
\[
\alpha = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right)
\]
• Moves that multiply or divide parameters need to apply the change of variables rule.
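A multiplicative move is a common case where the Hastings correction matters. The sketch below is an assumed example, not the lecture's: the proposal is Y = X·exp(ε) with ε ~ N(0, s²), for which the change of variables rule gives q(X|Y)/q(Y|X) = Y/X, and the target is Exp(1) on x > 0. The function name, seed, and tuning values are hypothetical.

```python
import math
import random

def mh_multiplicative(log_target, x0, scale_sd, n_steps, rng):
    """Metropolis-Hastings on x > 0 with the asymmetric proposal Y = X * exp(N(0, scale_sd^2))."""
    x, samples = x0, []
    for _ in range(n_steps):
        y = x * math.exp(rng.gauss(0.0, scale_sd))
        # Hastings correction for this proposal: q(X|Y) / q(Y|X) = Y / X.
        log_hastings = math.log(y) - math.log(x)
        log_alpha = min(0.0, log_target(y) - log_target(x) + log_hastings)
        if math.log(1.0 - rng.random()) < log_alpha:
            x = y
        samples.append(x)
    return samples

def log_exp1(x):
    """Log density of Exp(1) for x > 0 (assumed target with known mean 1)."""
    return -x

rng = random.Random(7)  # fixed, arbitrary seed
samples = mh_multiplicative(log_exp1, x0=1.0, scale_sd=1.0, n_steps=100_000, rng=rng)
kept = samples[1_000:]
estimated_mean = sum(kept) / len(kept)
print(f"estimated mean ~ {estimated_mean:.3f} (exact value 1)")
```

Dropping the `log_hastings` term would make the chain target a different distribution, which is the mistake the change of variables rule guards against.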
The small print
• If detailed balance holds for every pair of states, then if the system reaches the stationary distribution, it will stay there
 - it’s up to you to ensure it reaches it
• There are three conditions for the chain X_i to converge to the stationary distribution:
1) X must be irreducible (every state must be reachable from every other state)
2) X must be aperiodic (stops the system from getting stuck oscillating between states)
3) X must be positive recurrent (the expected waiting time to return to a state is finite)
Proposal distribution choice
• MCMC allows you to explore the behaviour of models that are too complex for you to solve analytically.
• However there is no guarantee it will work; bad choices of proposal distribution can prevent you from converging to the distribution you want.
Let’s look at another example
First attempt, using MCMC
• Uniform prior on [0,1]
• Proposal distribution is normally distributed around current position, with sd=1
[Figure: “Compare 3 runs of the chain”. Trace plot of z1 against iteration index (0 to 1000), with density histograms of z1, z2 and z3.]
Multiple runs get the same (right) answer
Second attempt, small modification
• Uniform prior on [0,1]
• Proposal distribution is normally distributed around current position, with sd=0.1
[Figure: “Compare 3 runs of the chain”. Trace plot of z1 against iteration index (0 to 1000), with density histograms of z1, z2 and z3.]
We are stuck in one side of the target distribution
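The effect of proposal width can be reproduced in spirit with a toy target. The slides do not state the target used in the figures, so the sketch below assumes a hypothetical bimodal density (an equal mixture of N(1, 0.25²) and N(4, 0.25²)); the run length, start point, and seed are likewise assumptions. Only the two proposal standard deviations, 1 and 0.1, come from the slides.

```python
import math
import random

def log_target(x):
    """Hypothetical bimodal target: equal mixture of N(1, 0.25^2) and N(4, 0.25^2), unnormalised."""
    a = -0.5 * ((x - 1.0) / 0.25) ** 2
    b = -0.5 * ((x - 4.0) / 0.25) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def run_chain(proposal_sd, n_steps, x0, seed):
    """Random-walk Metropolis with a normal proposal centred on the current position."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        samples.append(x)
    return samples

wide = run_chain(proposal_sd=1.0, n_steps=20_000, x0=1.0, seed=1)
narrow = run_chain(proposal_sd=0.1, n_steps=20_000, x0=1.0, seed=1)

# Fraction of time spent in the right-hand mode; the target puts half its mass there.
fraction_right_wide = sum(x > 2.5 for x in wide) / len(wide)
fraction_right_narrow = sum(x > 2.5 for x in narrow) / len(narrow)
print(f"sd=1.0: {fraction_right_wide:.2f} of samples in the right-hand mode")
print(f"sd=0.1: {fraction_right_narrow:.2f} of samples in the right-hand mode")
```

With the narrow proposal the chain would have to cross a region of negligible density in many small steps, so in practice it stays in the mode where it started.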
Gibbs sampling
• In Gibbs sampling, we want to find the posterior for a set of parameters.
• Each parameter is updated in turn by sampling from the conditional distribution given the data and the current values of all the other parameters.
• Consider the case of a single parameter updated using the Metropolis-Hastings algorithm, where the proposal density is the conditional distribution, so that q(Y|X) = π(Y):
\[
\alpha_{XY} = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right) = \min\left(1, \frac{\pi(Y)\, \pi(X)}{\pi(X)\, \pi(Y)}\right) = 1
\]
• i.e. the Gibbs sampler is an MCMC where every proposal is accepted.
• With multiple parameters, you need to be careful about update order to ensure reversibility.
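A two-parameter Gibbs sampler can be sketched as follows. The target here is an assumed example, a standard bivariate normal with correlation ρ, chosen because both conditionals are known: x | y ~ N(ρy, 1 - ρ²) and y | x ~ N(ρx, 1 - ρ²). The function name, ρ = 0.8, the seed, and the burn-in length are hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, rng, x0=0.0, y0=0.0):
    """Gibbs sampler for a standard bivariate normal with correlation rho (assumed target)."""
    x, y, samples = x0, y0, []
    cond_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n_steps):
        x = rng.gauss(rho * y, cond_sd)  # update x from its conditional given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y from its conditional given the new x
        samples.append((x, y))           # every draw is kept: there is no accept/reject step
    return samples

rng = random.Random(11)  # fixed, arbitrary seed
samples = gibbs_bivariate_normal(rho=0.8, n_steps=50_000, rng=rng)
kept = samples[500:]

mean_x = sum(x for x, _ in kept) / len(kept)
mean_xy = sum(x * y for x, y in kept) / len(kept)  # estimates E[XY] = rho for this target
print(f"mean of x ~ {mean_x:.3f}, E[xy] ~ {mean_xy:.3f}")
```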
Convergence
• If well constructed, the Markov chain is guaranteed to have the posterior as its stationary distribution.
• BUT this does not tell you how long you have to run it to reach stationarity:
 - The initial position may have a big influence
 - The proposal distribution may lead to low acceptance rates
 - The chain may get caught in a local maximum in the likelihood surface
• Multiple runs from different initial conditions, and graphical checks, can be used to check convergence.
• The efficiency of the chain can be measured in terms of the variance of estimates obtained by running the chain for a short time.
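One way to compare multiple runs numerically is the Gelman-Rubin potential scale reduction factor. The lecture does not name this diagnostic; it is swapped in here as one common formalisation of "multiple runs from different initial conditions". The N(0, 1) target, starting points, run lengths, and proposal widths below are all assumptions for the sketch.

```python
import math
import random

def run_chain(x0, n_steps, proposal_sd, seed):
    """Random-walk Metropolis on an assumed N(0, 1) target, started at x0."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        if math.log(1.0 - rng.random()) < 0.5 * (x * x - y * y):
            x = y
        samples.append(x)
    return samples

def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains; values near 1 suggest agreement."""
    n = len(chains[0])
    chain_means = [sum(c) / n for c in chains]
    within = sum(variance(c) for c in chains) / len(chains)
    between = n * variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

starts = [-10.0, 0.0, 10.0, 20.0]  # deliberately dispersed starting points
well_mixed = [run_chain(x0, 6_000, 1.0, seed)[1_000:] for seed, x0 in enumerate(starts)]
poorly_mixed = [run_chain(x0, 6_000, 0.01, seed)[1_000:] for seed, x0 in enumerate(starts)]

r_hat_good = gelman_rubin(well_mixed)
r_hat_bad = gelman_rubin(poorly_mixed)
print(f"proposal sd=1:    R-hat ~ {r_hat_good:.3f}")
print(f"proposal sd=0.01: R-hat ~ {r_hat_bad:.3f}")
```

A value close to 1 is necessary but not sufficient evidence of convergence: chains that are all stuck in the same mode would also agree with one another.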
Watch your chain
Two chains, sampling from an exp(1) distribution. Proposal distribution is normal with sd=0.001 (red) and sd=1 (black).
[Figure: “Watch chains (sd=0.001 in red, sd=1 in black)”. Trace plots against iteration index (0 to 10000), with density histograms for sd=0.001 and sd=1.]
Acceptance rate 99.97% (sd=0.001)
Acceptance rate 51.9% (sd=1)
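The experiment on this slide can be approximated with the sketch below. The target (exp(1)), the two proposal standard deviations, and the 10000 iterations follow the slide; the starting point x0 = 1 and the seed are assumptions, so the acceptance rates will be close to, but not identical with, the figures quoted.

```python
import math
import random

def metropolis_exp1(proposal_sd, n_steps, x0, seed):
    """Random-walk Metropolis targeting Exp(1): density exp(-x) for x > 0, zero otherwise."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        # Proposals at or below zero have zero target density and are always rejected.
        if y > 0 and math.log(1.0 - rng.random()) < x - y:
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps

small_steps, rate_small = metropolis_exp1(0.001, 10_000, x0=1.0, seed=0)
large_steps, rate_large = metropolis_exp1(1.0, 10_000, x0=1.0, seed=0)

range_small = max(small_steps) - min(small_steps)
range_large = max(large_steps) - min(large_steps)
print(f"sd=0.001: acceptance {rate_small:.4f}, range explored {range_small:.3f}")
print(f"sd=1:     acceptance {rate_large:.4f}, range explored {range_large:.3f}")
```

A near-100% acceptance rate is therefore not a sign of health: the tiny steps are almost always accepted, yet the chain explores only a sliver of the target.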
Burn-in
• Often you start the chain far away from the target distribution
 - Truth unknown
 - Check for convergence
• The first “few” samples from the chain are a poor representation of the stationary distribution.
• These are usually thrown away as “burn-in”.
• There is no theory telling you how much to throw away, but better to err on the side of more than less.
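The bias from early samples is easy to see in a toy run. The sketch below assumes an N(0, 1) target and a chain deliberately started far away at x = 50; the burn-in length of 1000 is a hypothetical choice, in line with the advice that there is no theory fixing it.

```python
import math
import random

def metropolis_normal(x0, n_steps, proposal_sd, rng):
    """Random-walk Metropolis on an assumed N(0, 1) target."""
    x, samples = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        if math.log(1.0 - rng.random()) < 0.5 * (x * x - y * y):
            x = y
        samples.append(x)
    return samples

rng = random.Random(3)  # fixed, arbitrary seed
chain = metropolis_normal(x0=50.0, n_steps=5_000, proposal_sd=1.0, rng=rng)

burn_in = 1_000  # hypothetical choice; erring on the generous side
mean_all = sum(chain) / len(chain)
mean_kept = sum(chain[burn_in:]) / len(chain[burn_in:])
print(f"mean using every sample:      {mean_all:.3f}")
print(f"mean after discarding burn-in: {mean_kept:.3f} (true mean 0)")
```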
Other uses
• Marginal effects – suppose we have a multidimensional parameter, we may only be interested in some subset.
• Prediction: Given our posterior distribution on parameters, we can predict the distribution of future data by sampling parameters from the posterior, and simulating data given those parameters.
• The posterior predictive distribution is a useful source of goodness-of-fit testing: if the data we simulate does not look like the data we originally collected, the model is poor.
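The prediction recipe (sample a parameter from the posterior, then simulate data given it) can be sketched with a model simple enough that the posterior is known. Everything specific here is assumed for illustration: 14 successes in 20 Bernoulli trials, a Beta(1, 1) prior, and hence a Beta(15, 7) posterior that is sampled directly. With an MCMC run, the `theta` draws would instead be taken from the chain after burn-in.

```python
import random

rng = random.Random(5)  # fixed, arbitrary seed

# Hypothetical data: k successes in n Bernoulli trials, with a Beta(1, 1) prior on theta.
n, k = 20, 14
a_post, b_post = 1 + k, 1 + n - k  # conjugate Beta posterior parameters

replicated = []
for _ in range(10_000):
    theta = rng.betavariate(a_post, b_post)                  # parameter draw from the posterior
    successes = sum(rng.random() < theta for _ in range(n))  # simulate data given that parameter
    replicated.append(successes)

predictive_mean = sum(replicated) / len(replicated)
# Fraction of simulated data sets with at least as many successes as observed.
tail_fraction = sum(r >= k for r in replicated) / len(replicated)
print(f"posterior predictive mean ~ {predictive_mean:.2f}")
print(f"fraction of replicates >= observed count: {tail_fraction:.2f}")
```

A tail fraction very close to 0 or 1 would indicate that the observed data look unlike data simulated from the fitted model.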
Modifications
• An important development has been to allow trans-dimensional moves (Green 1995), also known as reversible-jump MCMC.
 - useful when looking for change points (e.g. in rate) along a sequence
 - e.g. when looking for periods of elevated accident rate or elevated recombination rate
• There are many subtle variations of basic MCMC that allow you to increase the efficiency in complex situations (see Liu 2001 for many).
Further reading
• For basic Markov Chain background: Probability and Random Processes, Grimmett and Stirzaker (OUP, 2001).
• Markov Chain Monte Carlo in Practice, 1996, eds Gilks, Richardson, Spiegelhalter (Chapman and Hall/CRC).
• Bayesian Data Analysis, 2004. Gelman, Carlin, Stern and Rubin (Chapman and Hall/CRC).
• Monte Carlo Strategies in Scientific Computing, 2001, Liu (Springer-Verlag).
• Chris Holmes’ short course on Bayesian Statistics: http://www.stats.ox.ac.uk/~cholmes/Courses/BDA/bda_mcmc.pdf