
1/30/12


1/29/12

Markov Chain Monte Carlo

Zamin Iqbal

Models and methods

•  We want to be able to study biological systems (populations, inheritance, disease susceptibility, genome evolution, selection)

•  In this endeavour, we make models.

•  We have to choose which parameters/concepts to include and how to build the model.

•  Typically there is a trade-off between "model realism" and computational/statistical applicability

•  MCMC is a method that allows you to explore more realistic models that have no simple analytical solutions.

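Bayesian Inference

•  In Bayesian statistics, we want to learn about the probability distribution of the parameter of interest given the data = the posterior

•  In simple cases we can derive an analytical expression for the posterior

$$P(\theta \mid D) = \frac{P(\theta)\, P(D \mid \theta)}{P(D)}$$

where $P(\theta \mid D)$ is the posterior, $P(\theta)$ the prior, $P(D \mid \theta)$ the likelihood and $P(D)$ the normalising constant.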

Bayes Cartoon

Belief before = P(theta)

Likelihood(D|theta)

Belief after = P(theta|D) (Posterior)
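The update from belief before to belief after can be sketched numerically on a grid of parameter values. This is only an illustration: the coin-tossing data (7 heads in 10 tosses), the flat prior and the name `grid_posterior` are assumptions made for the example, not part of the lecture.

```python
def grid_posterior(prior, likelihood):
    """Apply Bayes' rule on a discrete grid: posterior is proportional to prior x likelihood."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalised)  # discrete analogue of the normalising constant P(D)
    return [u / evidence for u in unnormalised]

# Hypothetical example: theta is the probability of heads, D = 7 heads in 10 tosses.
grid = [i / 100 for i in range(101)]              # candidate values of theta
prior = [1 / len(grid)] * len(grid)               # flat belief before seeing the data
likelihood = [t**7 * (1 - t)**3 for t in grid]    # binomial likelihood, up to a constant
posterior = grid_posterior(prior, likelihood)
posterior_mode = grid[posterior.index(max(posterior))]
```

With a flat prior the posterior mode sits at the maximum-likelihood value, 0.7. A grid like this is only practical for one or two parameters, which is part of the motivation for sampling methods.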


Monte Carlo (no Markov Chains yet)

•  In many situations the normalising constant cannot be calculated analytically

$$P(D) = \int P(\theta)\, P(D \mid \theta)\, d\theta$$

•  Typically true if you have multiple parameters, multidimensional parameters, complex model structures or complex likelihood functions (i.e. most of modern statistics).

•  Monte Carlo methods allow you to sample from (and therefore estimate functions of) the posterior (see next slide).

Sampling to estimate an integral

•  You have already met the idea that sampling can be used to estimate an expectation (= integral). If we have a set of iid random variables $X_i$ with mean $\mu$, then

$$\frac{1}{n} \sum_{i=1}^{n} X_i \to \mu$$

•  The mean of the sample converges to the mean of the distribution

Sampling to estimate an integral

•  This is also true for more general expectations

$$\frac{1}{n} \sum_{i=1}^{n} g(X_i) \to \int g(x)\, f_X(x)\, dx$$

•  So if we are interested in some expectation of the posterior, we could use an iid sequence to approximate it.
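A minimal sketch of this idea, assuming we can draw iid samples directly. The function name `monte_carlo_expectation` and the choice of a standard normal with g(x) = x^2 are illustrative assumptions.

```python
import random

def monte_carlo_expectation(g, sampler, n, seed=0):
    """Estimate E[g(X)] by averaging g over n iid draws from sampler(rng)."""
    rng = random.Random(seed)
    return sum(g(sampler(rng)) for _ in range(n)) / n

# Hypothetical check: for X ~ N(0, 1), E[X^2] is exactly 1.
estimate = monte_carlo_expectation(
    lambda x: x * x,
    lambda rng: rng.gauss(0.0, 1.0),
    n=100_000,
)
```

The estimate should be close to 1, with an error that shrinks like 1/sqrt(n).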

Markov Chain Monte Carlo

•  Great idea: we don't need our random variables to be iid. Any Markov chain whose stationary distribution is the posterior will do.

•  Used by Metropolis and Ulam as part of the Manhattan project to solve the problem of initiating fusion in a bomb

•  Brought to prominence in statistics by Gelfand and Smith in 1990.


Markov Chains

•  A Markov Chain is a stochastic process that generates random variables $X_i$ such that

$$P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})$$

i.e. the distribution of the next variable depends only on the current one

•  We talk about transition probabilities:

$$p_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$$

•  Note: the $X_i$ are typically highly correlated, so each sample is not an independent draw from the posterior. (Thinning of the chain can lead to effectively independent samples).
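To make the transition probabilities concrete, here is a small simulation of a finite-state chain. The two-state transition matrix `P` and the function `simulate_chain` are made up for illustration. For this particular matrix the stationary distribution is (0.75, 0.25), because 0.75 x 0.1 = 0.25 x 0.3.

```python
import random

def simulate_chain(transition, start, n_steps, seed=0):
    """Simulate a finite-state Markov chain; transition[i][j] = Pr(X_{n+1} = j | X_n = i)."""
    rng = random.Random(seed)
    state, path = start, [start]
    states = range(len(transition))
    for _ in range(n_steps):
        # The next state depends only on the current state (the Markov property).
        state = rng.choices(states, weights=transition[state])[0]
        path.append(state)
    return path

# Hypothetical two-state chain.
P = [[0.9, 0.1],
     [0.3, 0.7]]
path = simulate_chain(P, start=0, n_steps=50_000)
fraction_in_state_0 = path.count(0) / len(path)
```

Successive states are correlated (the chain tends to stay where it is), yet the long-run fraction of time spent in state 0 still approaches 0.75.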

Notation

•  I shall refer to the posterior distribution as the target distribution.

•  I'll call the transition probabilities in the Markov chain the proposal distribution, or the transition kernel, q(X|Y)

•  When talking about multiple parameters I talk about the joint posterior

$$\pi(\theta_1, \theta_2, \ldots, \theta_r)$$

and the conditional distributions

$$\pi(\theta_2 \mid \theta_1, \theta_3, \ldots, \theta_r)$$

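The Metropolis Algorithm

•  How can we construct a Markov chain that converges to the posterior we want?

•  Suppose we are in state X, and we want to move. Draw a new state Y from the proposal distribution q(Y|X), where q can be anything you like, provided it is symmetric

i.e. q(Y|X) = q(X|Y)

Accept this proposal with probability $\alpha$ given by

$$\alpha = \min\left(1, \frac{\pi(Y)}{\pi(X)}\right) = \min\left(1, \frac{P(Y)\, L(Y)}{P(X)\, L(X)}\right)$$

where $\pi$ is the target distribution, $P$ the prior and $L$ the likelihood. If the proposal is rejected, the chain stays at X.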
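The steps above can be sketched as a short random-walk sampler. This is an illustration under assumptions, not the lecture's code: the function `metropolis`, the normal proposal and the standard normal target are choices made for the example. Working with the log of the target avoids numerical underflow, and the normalising constant cancels in the ratio.

```python
import math
import random

def metropolis(log_target, start, proposal_sd, n_steps, seed=0):
    """Random-walk Metropolis with the symmetric proposal q(Y|X) = N(X, proposal_sd^2)."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)                # draw Y from q(Y|X)
        log_ratio = log_target(y) - log_target(x)    # log of pi(Y) / pi(X)
        # Accept with probability alpha = min(1, pi(Y)/pi(X)); otherwise stay at X.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
        samples.append(x)
    return samples

# Hypothetical target: standard normal, known only up to its normalising constant.
samples = metropolis(lambda t: -0.5 * t * t, start=0.0, proposal_sd=1.0, n_steps=50_000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance should come out close to 0 and 1, even though the sampler never used the constant 1/sqrt(2 pi).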

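What happens when you "reach" the limiting distribution?

•  Consider a system in which a single parameter can take k possible values. My proposal is to select at random from the other k-1 possible values

•  Suppose the system has reached its stationary state. What does this mean? The probability of being in a state is proportional to the prior times the likelihood

•  Consider two states i and j, where $\pi(X_i) > \pi(X_j)$. The rates of flow in the two directions are:

Flow i->j:

$$\pi(X_i)\, q(X_j \mid X_i)\, \alpha_{ij} = \pi(X_i) \times \frac{1}{k-1} \times \frac{\pi(X_j)}{\pi(X_i)} = \frac{\pi(X_j)}{k-1}$$

Flow j->i (here $\alpha_{ji} = 1$, because the move is to a state of higher probability):

$$\pi(X_j)\, q(X_i \mid X_j)\, \alpha_{ji} = \pi(X_j) \times \frac{1}{k-1} \times 1 = \frac{\pi(X_j)}{k-1}$$

The two flows are equal: these are the "detailed balance" equations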
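The equality of the two flows can be checked numerically. In this sketch the four-state target `pi` is an arbitrary illustrative choice and `metropolis_flows` is a hypothetical helper name.

```python
def metropolis_flows(pi, i, j):
    """Stationary flows i->j and j->i for a k-state Metropolis chain with uniform proposals."""
    k = len(pi)
    q = 1.0 / (k - 1)                     # propose one of the other k-1 states at random
    alpha_ij = min(1.0, pi[j] / pi[i])    # Metropolis acceptance probability for i -> j
    alpha_ji = min(1.0, pi[i] / pi[j])    # and for j -> i
    return pi[i] * q * alpha_ij, pi[j] * q * alpha_ji

# Hypothetical target over k = 4 states (values chosen only for illustration).
pi = [0.4, 0.3, 0.2, 0.1]
flow_01, flow_10 = metropolis_flows(pi, 0, 1)
```

Both flows equal pi[1] / (k - 1) = 0.1 up to floating-point rounding, and the same holds for every other pair of states.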


The Hastings Ratio

•  A simple change to the acceptance formula allows you to use asymmetric proposals:

$$\alpha = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right)$$

•  Moves that multiply or divide parameters need to apply the change of variables rule
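As an illustration of a multiplicative move, consider the proposal Y = X * exp(scale * Z) with Z standard normal. This is a log-normal proposal, and for it q(X|Y) / q(Y|X) = Y / X, which is the change-of-variables factor. The sketch below assumes an exponential target with rate 1. The function name `mh_multiplicative` and all parameter values are illustrative.

```python
import math
import random

def mh_multiplicative(log_target, start, scale, n_steps, seed=0):
    """Metropolis-Hastings on x > 0 with the asymmetric proposal Y = X * exp(scale * Z)."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_steps):
        y = x * math.exp(scale * rng.gauss(0.0, 1.0))
        # Hastings ratio for this log-normal proposal: q(X|Y) / q(Y|X) = Y / X.
        log_alpha = log_target(y) - log_target(x) + math.log(y) - math.log(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = y
        samples.append(x)
    return samples

# Hypothetical target: pi(x) proportional to exp(-x) for x > 0, which has mean 1.
samples = mh_multiplicative(lambda t: -t, start=1.0, scale=1.0, n_steps=100_000)
sample_mean = sum(samples) / len(samples)
```

Dropping the Y / X factor would make the chain converge to the wrong distribution, which is the practical point of the second bullet.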

The small print

•  If detailed balance holds for every pair of states, then if the system reaches the stationary distribution, it will stay there

- it's up to you to ensure it reaches it

•  There are three conditions for the chain $X_i$ to converge to the stationary distribution

1)  X must be irreducible (every state must be reachable from every other state)

2)  X must be aperiodic (stops the system from getting stuck oscillating between states)

3)  X must be positive recurrent (the expected waiting time to return to a state is finite)

Proposal distribution choice

•  MCMC allows you to explore the behaviour of models that are too complex for you to solve analytically

•  However there is no guarantee it will work; bad choices of proposal distribution can prevent you from converging to the distribution you want

Let's look at another example

First attempt, using MCMC

•  Uniform prior on [0,1]

•  Proposal distribution is normally distributed around current position, with sd=1


[Figure: "Compare 3 runs of the chain". Trace plot of z1 against Index (0 to 1000), with density histograms of z1, z2 and z3.]

Multiple runs get the same (right) answer

Second attempt, small modification

•  Uniform prior on [0,1]

•  Proposal distribution is normally distributed around current position, with sd=0.1

[Figure: "Compare 3 runs of the chain". Trace plot of z1 against Index (0 to 1000), with density histograms of z1, z2 and z3.]

We are stuck in one side of the target distribution
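The transcript does not preserve the target used in this example, so the sketch below substitutes a hypothetical two-peaked target (an equal mixture of two narrow normals centred at 1.5 and 3.5) to reproduce the qualitative behaviour. Only the two proposal standard deviations, 1 and 0.1, come from the slides. Everything else, including the function names, is an assumption.

```python
import math
import random

def bimodal_density(x):
    """Hypothetical unnormalised target: equal mixture of N(1.5, 0.15^2) and N(3.5, 0.15^2)."""
    return (math.exp(-0.5 * ((x - 1.5) / 0.15) ** 2)
            + math.exp(-0.5 * ((x - 3.5) / 0.15) ** 2))

def run_chain(proposal_sd, start=1.5, n_steps=10_000, seed=0):
    """Random-walk Metropolis with a normal proposal centred on the current position."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        # Accept with probability min(1, pi(y) / pi(x)); otherwise stay put.
        if rng.random() < bimodal_density(y) / bimodal_density(x):
            x = y
        samples.append(x)
    return samples

wide = run_chain(proposal_sd=1.0)
narrow = run_chain(proposal_sd=0.1)
# Fraction of time spent in the right-hand peak; the target puts half its mass there.
frac_right_wide = sum(s > 2.5 for s in wide) / len(wide)
frac_right_narrow = sum(s > 2.5 for s in narrow) / len(narrow)
```

With sd=1 the chain jumps between the peaks and spends roughly half its time in each. With sd=0.1 it would have to cross a region of almost zero probability in small steps, so in a run of this length it stays in the peak where it started.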

Gibbs sampling

•  In Gibbs sampling, we want to find the posterior for a set of parameters

•  Each parameter is updated in turn by sampling from the conditional distribution given the data and the current values of all the other parameters

•  Consider the case of a single parameter updated using the Metropolis algorithm (with the Hastings ratio), where the proposal density is the conditional distribution, so that q(Y|X) = π(Y) and q(X|Y) = π(X)

$$\alpha_{XY} = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right) = \min\left(1, \frac{\pi(Y)\, \pi(X)}{\pi(X)\, \pi(Y)}\right) = 1$$

•  i.e. the Gibbs sampler is an MCMC where every proposal is accepted

•  With multiple parameters, you need to be careful about update order to ensure reversibility
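A minimal sketch of a Gibbs sampler, assuming a target whose conditional distributions are known in closed form. The example is a standard bivariate normal with correlation rho, for which each full conditional is N(rho * other, 1 - rho^2). The function name and the value rho = 0.8 are illustrative.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    theta1, theta2, samples = 0.0, 0.0, []
    for _ in range(n_steps):
        # Update each parameter in turn from its full conditional; every draw is accepted.
        theta1 = rng.gauss(rho * theta2, cond_sd)
        theta2 = rng.gauss(rho * theta1, cond_sd)
        samples.append((theta1, theta2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_steps=50_000)
n = len(samples)
mean1 = sum(a for a, _ in samples) / n
mean2 = sum(b for _, b in samples) / n
cov = sum((a - mean1) * (b - mean2) for a, b in samples) / n
var1 = sum((a - mean1) ** 2 for a, _ in samples) / n
var2 = sum((b - mean2) ** 2 for _, b in samples) / n
sample_corr = cov / math.sqrt(var1 * var2)
```

The sample correlation should recover rho. Note that no accept/reject step appears anywhere in the loop.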


Convergence

•  If well constructed, the Markov chain is guaranteed to have the posterior as its stationary distribution

•  BUT this does not tell you how long you have to run it to reach stationarity

- The initial position may have a big influence

- The proposal distribution may lead to low acceptance rates

- The chain may get caught in a local maximum in the likelihood surface

•  Multiple runs from different initial conditions, and graphical checks can be used to check convergence

- The efficiency of the chain can be measured in terms of the variance of estimates obtained by running the chain for a short time
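One way to read the last point is to run many short chains from different starting values and look at how much their estimates disagree. The sketch below does this for a hypothetical exp(1) target. The function names, run lengths, starting range and the two proposal standard deviations are all assumptions made for illustration.

```python
import math
import random

def short_run_mean(proposal_sd, start, n_steps, rng):
    """Mean of one short Metropolis run targeting pi(x) proportional to exp(-x) on x > 0."""
    x, total = start, 0.0
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        # pi(y) = 0 for y <= 0, so such proposals are always rejected.
        if y > 0 and (y <= x or rng.random() < math.exp(x - y)):
            x = y
        total += x
    return total / n_steps

def variance_of_estimates(proposal_sd, n_runs=50, n_steps=2_000, seed=0):
    """Variance across runs of the estimated mean, each run from a different start."""
    rng = random.Random(seed)
    estimates = [short_run_mean(proposal_sd, rng.uniform(0.1, 5.0), n_steps, rng)
                 for _ in range(n_runs)]
    centre = sum(estimates) / n_runs
    return sum((e - centre) ** 2 for e in estimates) / n_runs

var_good = variance_of_estimates(proposal_sd=1.0)
var_poor = variance_of_estimates(proposal_sd=0.01)
```

A well-mixing chain forgets its starting point quickly, so its short-run estimates agree with one another. A chain taking tiny steps barely moves from where it started, and its estimates vary almost as much as the starting values do.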

Watch your chain

Two chains, sampling from an exp(1) distribution. Proposal distribution is normal with sd=0.001 (red) and sd=1 (black)

[Figure: "Watch chains (sd=0.001 in red, sd=1 in black)". Trace plot against Index (0 to 10000), with density histograms for sd=0.001 (z1) and sd=1 (z2).]

Acceptance rate 99.97% (sd=0.001)

Acceptance rate 51.9% (sd=1)
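The acceptance rates quoted above can be approximated with a few lines of code. This sketch assumes a random-walk Metropolis sampler started at x = 1. The starting value, seed and function name are assumptions, so the figures will not match the slide exactly.

```python
import math
import random

def acceptance_rate(proposal_sd, n_steps=10_000, start=1.0, seed=0):
    """Fraction of accepted proposals for random-walk Metropolis targeting exp(1)."""
    rng = random.Random(seed)
    x, accepted = start, 0
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        # Target density is exp(-x) for x > 0 and zero otherwise.
        if y > 0 and (y <= x or rng.random() < math.exp(x - y)):
            x = y
            accepted += 1
    return accepted / n_steps

rate_small_steps = acceptance_rate(proposal_sd=0.001)  # tiny moves: almost always accepted
rate_large_steps = acceptance_rate(proposal_sd=1.0)    # larger moves: roughly half accepted
```

A very high acceptance rate is not a sign of a good sampler. With sd=0.001 nearly every move is accepted because the moves are too small to change the target density, and the chain explores very slowly.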

Burn-in

•  Often you start the chain far away from the target distribution

- Truth unknown

- Check for convergence

•  The first "few" samples from the chain are a poor representation of the stationary distribution

•  These are usually thrown away as "burn-in"

•  There is no theory telling you how much to throw away, but better to err on the side of more than less
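A small illustration of why the early samples are discarded, assuming a standard normal target and a deliberately poor starting value of 100. The burn-in length of 2000 is an arbitrary choice for the example, in keeping with the advice that no theory fixes it.

```python
import math
import random

def random_walk_metropolis(start, n_steps, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis targeting a standard normal (hypothetical target)."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_steps):
        y = rng.gauss(x, proposal_sd)
        log_ratio = 0.5 * (x * x - y * y)   # log pi(y) - log pi(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
        samples.append(x)
    return samples

chain = random_walk_metropolis(start=100.0, n_steps=20_000)  # deliberately poor start
burn_in = 2_000                                              # arbitrary, generous choice
kept = chain[burn_in:]
mean_all = sum(chain) / len(chain)
mean_kept = sum(kept) / len(kept)
```

The mean over the whole chain is pulled upwards by the initial descent from 100, while the mean of the retained samples is close to the true value of 0.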

Other uses

•  Marginal effects: suppose we have a multidimensional parameter, we may only be interested in some subset

•  Prediction: Given our posterior distribution on parameters, we can predict the distribution of future data by sampling parameters from the posterior, and simulating data given those parameters

- The posterior predictive distribution is a useful source of goodness-of-fit testing: if the data we simulate does not look like the data we originally collected, the model is poor.
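The prediction recipe can be sketched as follows. To keep the example short, direct draws from a Beta(8, 4) distribution stand in for MCMC output. That is the posterior for a coin's heads probability after 7 heads in 10 tosses under a flat prior, and the data, names and sample sizes are all hypothetical.

```python
import random

def posterior_predictive(posterior_draws, simulate, rng):
    """For each posterior draw of the parameter, simulate one replicate data set."""
    return [simulate(theta, rng) for theta in posterior_draws]

def simulate_heads(theta, rng, n_tosses=10):
    """Number of heads in n_tosses tosses of a coin with heads probability theta."""
    return sum(rng.random() < theta for _ in range(n_tosses))

rng = random.Random(0)
# Stand-in for MCMC samples: exact draws from the Beta(8, 4) posterior.
posterior_draws = [rng.betavariate(8, 4) for _ in range(20_000)]
replicates = posterior_predictive(posterior_draws, simulate_heads, rng)
# Posterior predictive check: how often do replicates show at least the observed 7 heads?
p_at_least_observed = sum(rep >= 7 for rep in replicates) / len(replicates)
predictive_mean = sum(replicates) / len(replicates)
```

If `p_at_least_observed` were very close to 0 or 1, the observed data would look unlike data simulated from the fitted model, which is the goodness-of-fit warning described above.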


Modifications

•  An important development has been to allow trans-dimensional moves (Green 1995), also known as reversible-jump MCMC.

- useful when looking for change points (e.g. in rate) along a sequence

- e.g. when looking for periods of elevated accident rate or elevated recombination rate

There are many subtle variations of basic MCMC that allow you to increase the efficiency in complex situations (see Liu 2001 for many)

Further reading

•  For basic Markov Chain background: Probability and Random Processes, Grimmett and Stirzaker (OUP, 2001).

•  Markov Chain Monte Carlo in Practice, 1996, eds Gilks, Richardson, Spiegelhalter (Chapman and Hall/CRC).

•  Bayesian Data Analysis, 2004, Gelman, Carlin, Stern and Rubin (Chapman and Hall/CRC).

•  Monte Carlo Strategies in Scientific Computing, 2001, Liu (Springer-Verlag).

•  Chris Holmes' short course on Bayesian Statistics: http://www.stats.ox.ac.uk/~cholmes/Courses/BDA/bda_mcmc.pdf