MultitreeGeneralized SteppingstoneSampling–ANew...
Transcript of MultitreeGeneralized SteppingstoneSampling–ANew...
![Page 1: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/1.jpg)
Multitree GeneralizedSteppingstone Sampling – A NewMCMC Method for Estimatingthe Marginal Likelihood of a
Models
Mark T. Holder, Paul O. Lewis, David L. Swofford, andDavid Bryant
KU, UConn, Duke, U. Otago (NZ)
Feb 16, 2013 – Austin, TX
![Page 2: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/2.jpg)
Context
• We rarely know the “true” model to use for analyses.• Model-averaging methods can be difficult to implementand use.
• We can use the marginal likelihood to choose betweenmodels.
![Page 3: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/3.jpg)
The marginal likelihood is thedenominator of Bayes’ Rule
p(θ | D,M) =P(D | θ,M)p(θ |M)
P(D |M)
θ is a set of parameter values in the model M
D is the data
P(D | θ,M) is the likelihood of θ.
p(θ,M) is the prior of θ.
![Page 4: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/4.jpg)
The marginal likelihood assess the fit ofthe model to the data by considering allparameter values
P(D |M) =
∫P(D | θ,M)p(θ |M)dθ
![Page 5: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/5.jpg)
MCMC avoids the marginal like. calc.
p(θ∗|D,M)
p(θ | D,M)=
P(D|θ∗,M)p(θ∗|M)P(D|M)
P(D|θ,M)p(θ|M)P(D|M)
p(θ∗|D,M)
p(θ | D,M)=
P(D | θ∗,M)p(θ∗ |M)
P(D | θ,M)p(θ |M)
![Page 6: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/6.jpg)
Bayesian model selection
Bayes Factor between two models:
B10 =P(D |M1)
P(D |M0)
We could estimate P(D|M1) by drawingpoints from the prior on θ and calculatingthe mean likelihood.
![Page 7: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/7.jpg)
−2 −1 0 1 2
010
2030
40
Sharp posterior (black) and prior (red)
x
dens
ity
![Page 8: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/8.jpg)
Drawing from the prior will often miss any of the trees withhigh posterior density – massive sample sizes would beneeded.
Perhaps we can use samples from the posterior?
![Page 9: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/9.jpg)
We can use the harmonic mean of the likelihoods of MCMCsamples to estimate P(D|M1).
However. . .
![Page 10: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/10.jpg)
“The Harmonic Mean of the Likelihood:Worst Monte Carlo Method Ever”
A post on Dr. Radford Neal’s bloghttp://radfordneal.wordpress.com/2008/08/17/
the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever
“The total unsuitability of the harmonic mean estimatorshould have been apparent within an hour of its discovery.”
![Page 11: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/11.jpg)
Harmonic mean estimator of themarginal likelihood
• appealing because it comes “for free” after we havesampled the posterior using MCMC,
• unfortunately the estimator can have a huge varianceassociated with it in some (very common) cases. Forexample if:• the vast majority of parameter space has very low
likelihood, and• a very small region has high likelihoods.
![Page 12: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/12.jpg)
Importance sampling to approximate adifficult target distribution
1 Simulate points from an easy distribution.2 Reweight the points by the ratio of densities betweenthe easy and target distribution.
3 Treat the reweighted samples as draws from the targetdistribution.
![Page 13: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/13.jpg)
−3 −2 −1 0 1 2 3
0.0
0.5
1.0
1.5
2.0
2.5
Importance and target densities
x
dens
ity
![Page 14: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/14.jpg)
−3 −2 −1 0 1 2 3
0.0
0.5
1.0
1.5
2.0
2.5
Importance and target densities
x
dens
ity
−3 −2 −1 0 1 2 3
02
46
810
12
Importance weights
x
wei
ght
![Page 15: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/15.jpg)
−3 −2 −1 0 1 2 3
0.0
0.5
1.0
1.5
2.0
2.5
Importance and target densities
x
dens
ity
−3 −2 −1 0 1 2 3
02
46
810
12
Importance weights
x
wei
ght
−3 −2 −1 0 1 2 3
02
46
810
12
Samples from importance distribution
x
−3 −2 −1 0 1 2 3
02
46
810
12
Weighted samples
x
wei
ght
![Page 16: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/16.jpg)
Importance sampling
The method works well if the importance distribution is:• fairly similar to the target distribution, and• not “too tight” to allow sampling the full range of thetarget distribution
![Page 17: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/17.jpg)
In phylogenetics our posterior distribution is too peakedand our prior is too vague to allow us to use them inimportance sampling:
−2 −1 0 1 2
010
2030
40
Sharp posterior (black) and prior (red)
x
dens
ity
![Page 18: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/18.jpg)
Steppingstone sampling uses a series ofimportance sampling runs
−2 −1 0 1 2
010
2030
40
Steppingstone densities
x
dens
ity
![Page 19: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/19.jpg)
Steppingstone sampling (Xie et al. 2011, Fan et al. 2011)blends two distributions:• the posterior, P(D | θ,M1)P(θ,M1)
• a tractable reference distribution, π(θ)
pβ(θ | D,M1) =[P(D | θ,M1)P(θ,M1)]
β [π(θ)](1−β)
cβ
p1(θ | D,M1) is the posterior.c1 is the marginal likelihood of the model.
p0(θ | D,M1) is the reference distribution.c0 is 1.
![Page 20: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/20.jpg)
Steppingstone sampling (Xie et al. 2011, Fan et al. 2011)blends two distributions:• the posterior, P(D | θ,M1)P(θ,M1)
• a tractable reference distribution, π(θ)
pβ(θ | D,M1) =[P(D | θ,M1)P(θ,M1)]
β [π(θ)](1−β)
cβ
P(D |M1) =c1c0
=
(c1c0.38
)(c0.38
c0.1
)(c0.1c0.01
)(c0.01
c0
)=
(c1
���c0.38
)(���c0.38
��c0.1
)(��c0.1
���c0.01
)(���c0.01
c0
)
![Page 21: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/21.jpg)
Run MCMC with different β values
−2 −1 0 1 2
010
2030
40
Steppingstone densities
x
dens
ity
![Page 22: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/22.jpg)
P(D |M) =(
P(D|M)c0.38
)(c0.38c0.1
)(c0.1c0.01
) (c0.01
1
)
Photo by Johan Nobel http://www.flickr.com/photos/43147325@N08/4326713557/ downloaded from Wikimedia
![Page 23: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/23.jpg)
Reference distributions in Steppingstonesampling
In the original steppingstone (Xie et al 2011):
reference = prior
In generalized steppingstone (Fan et al 2011) it can be anydistribution:• Should be centered around the areas with highprobability,
• Must be a probability distribution with a knownnormalizing constant,
• Should be easy to draw sample from.
![Page 24: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/24.jpg)
The log Bayes Factor for a complex model compared to a simple(true) model. Estimated twice (by Fan et al, 2011)
Harmonic mean: Original Steppingstone:
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
![Page 25: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/25.jpg)
Original Steppingstone: Generalized Steppingstone:
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20
−60
−40
−20
020
Seed 1
Seed
2
Figure from Fan et al., 2011
![Page 26: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/26.jpg)
Steppingstone sampling when the tree isnot known
• generalized steppingstone assumes afixed tree,
• the original steppingstone can very slowwhen the tree is not known.
![Page 27: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/27.jpg)
Tree-CenteredIndependent-Split-Probability (TCISP)distribution
Input: a tree with probabilities for eachsplit (from a pilot MCMC run).
Output: a probability distribution over alltree topologies.
![Page 28: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/28.jpg)
A
G
D
E
J L
HF
C
KI
0.9
0.8 0.6 0.5
0.4 0.8
0.3
0.9
Input: a focal treewith split probabilitiesto center the distribution
![Page 29: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/29.jpg)
A
G
D
E
J L
HF
C
KI
The tree we will drawwill display the blue splitsand will not display the red splits
![Page 30: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/30.jpg)
A G
D
EJ
LHF
CKI
![Page 31: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/31.jpg)
A
G
D
E
J LH
F
C
KI
One of the many resolutionswhich avoid the red splits
![Page 32: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/32.jpg)
A
G
D
E
J L
HF
C
KI
A
G
D
E
J LH
F
C
KI
![Page 33: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/33.jpg)
Calculating a tree’s probability
A
G
D
E
J L
HF
C
KI
0.9
0.8 0.6 0.5
0.4 0.8
0.3
0.9
A
G
D
E
J LH
F
C
KI
focal tree tree to score
![Page 34: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/34.jpg)
Calculating a tree’s probability
A
G
D
E
J L
HF
C
KI
A
G
D
E
J LH
F
C
KI
focal tree tree to score
![Page 35: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/35.jpg)
Calculating a tree’s probability
A
G
D
E
J L
HF
C
KI
0.9
0.2 0.4 0.5
0.6 0.2
0.7
0.9
A
G
D
E
J LH
F
C
KI
focal tree tree to score
![Page 36: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/36.jpg)
Calculating a tree’s probability
A
G
D
E
J L
HF
C
KI
0.9
0.2 0.4 0.5
0.6 0.2
0.7
0.9
A G
D
EJ
LHF
CKI
focal tree tree with shared splits
P = 0.2× 0.9× 0.4× 0.5× 0.6× 0.2× 0.7× 0.9
![Page 37: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/37.jpg)
Counting trees the max. distance from atree
• Bryant and Steel(2009) provide anO(n5) algorithm.
• Bryant contributed an O(n2) algorithm.
![Page 38: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/38.jpg)
Multitree steppingstone sampling• Has been validated on small data sets.• Minimum running time is unknown.• Appears to be a practical way to approximatea model’s marginal likelihood.
• The TCISP distribution could be useful inother context.
• The method assumes unrooted, fully resolvedtrees,
• implemented in phycas(http://www.phycas.org)
Holder, Lewis, Swofford, Bryant (in prep)
![Page 39: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/39.jpg)
Marginal likelihoods• Harmonic mean estimator is not reliable• AIC is preferable to the harmonic meanestimator (Guy Baele et al MBE 2012)
• Thermodynamic Integration (aka PathSampling) is an accurate method (Lartillot andPhilippe, 2006; Rodrigue and Aris-Brousou)
• Model-jumping approaches avoid the need tocalculate marginal likelihoods.
• Steppingstone and/or Path Sampling areavailable in MrBayes, Beast, ∗Beast, Migrate,and other Bayesian software for evolutionaryanalyses
![Page 40: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/40.jpg)
Thanks!
Thanks to the organizers, NSF, and to you(for listening to stats on a Saturdaymorning)
![Page 41: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/41.jpg)
This slide shows the math to demonstrate that we can usethe harmonic mean to estimate the marginal likelihood,P (D). Consider, a model that has a discrete parameter vthat can take two values (r and b). h in the harmonic meanof the likelihoods for parameters values sampled from theposterior distribution
h =1
P(v = r|D)[
1P(D|v=r)
]+ P(v = b|D)
[1
P(D|v=b)
]=
1[P(v=r)P(D|v=r)
P(D)
] [1
P(D|v=r)
]+[
P(v=b)P(D|v=b)P(D)
] [1
P(D|v=b)
]=
1[P(v=r)P(D)
]+[
P(v=b)P(D)
]=
P(D)
P(v = r) + P(v = b)
= P(D)
![Page 42: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/42.jpg)
The harmonic mean as an importancesampling estimator
The marginal likelihood is the expected value of thelikelihood over points drawn from the prior:
P(D) = Ep(θ)[P(D|θ)] =
∫P(D|θ)P(θ)dθ
In importance sampling, we draw parameter values θ froma different density, g(θ), but multiple the points by aweight, w:
Ep(θ)[P(D|θ)] =1n
∑ni=1 P(D|θi)wi
b
where b is a normalization constant.
![Page 43: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/43.jpg)
The appropriate weight is the ratio of the target densityand our importance density:
wi =P(θi)
g(θi)
And it turns out that the normalization constant is simplya sum of the importance weights:
b =1
n
n∑i=1
P(θi)
g(θi)
Ep(θ)[P(D|θ)] =
1n
∑ni=1
P(D|θi)P(θi)g(θi)
1n
∑ni=1
P(θi)g(θi)
![Page 44: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/44.jpg)
Ep(θ)[P(D|θ)] =
1n
∑ni=1
P(D|θi)P(θi)g(θi)
1n
∑ni=1
P(θi)g(θi)
If the importance distribution, g(θ), is the prior then theimportance weights are all 1:
Ep(θ)[P(D|θ)] =
1n
∑ni=1
P(D|θi)P(θi)P(θi)
1n
∑ni=1
P(θi)P(θi)
=1
n
n∑i=1
P(D|θi)
Recall that the points, θ, are draw by sampling from theimportance distribution, so this sum of likelihoods isalready weighted by the prior.
![Page 45: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/45.jpg)
Ep(θ)[P(D|θ)] =
1n
∑ni=1
P(D|θi)P(θi)g(θi)
1n
∑ni=1
P(θi)g(θi)
If the importance distribution, g(θ) is the posterior then:
Ep(θ)[P(D|θ)] =
1n
∑ni=1
P(D|θi)P(θi)P(D|θi)P(θi)
1n
∑ni=1
P(θi)P(D|θi)P(θi)
=1
1n
∑ni=1
P(θi)P(D|θi)P(θi)
=n∑n
i=11
P(D|θi)
This is the justification of the harmonic mean estimator ofthe marginal likelihood.
![Page 46: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/46.jpg)
Lartillot and Philippe’s thermodynamicintegration
Like the steppingstone sampler, the thermodyanmicintegration method (or path sampling) use power posteriordensities with 0 ≤ β ≤ 1:
pβ(θ|D) =P(D|θ)βP(θ)
cβ
with c0 = 0 and c1 = P(D).Lartillot and Philippe showed that
∂ ln cβ∂β
= Epβ
[∂ ln
[P(D|θ)βP(θ)
]∂β
]Note that by definition of a definite integral:∫ 1
0
∂ ln cβ∂β
dβ = ln c1 − ln c0 = ln P(D)
![Page 47: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/47.jpg)
Thus,
ln P(D) =
∫ 1
0
Epβ
[∂ ln
[P(D|θ)βP(θ)
]∂β
]dβ
By differentiating we see that:
∂ ln[P(D|θ)βP(θ)
]∂β
=∂[ln P(θ) + β ln P(D|θ)]
∂β
= ln P(D|θ)
The integration is not analytically tractable, but we cancalculate Epβ [ln P(D|θ)] by conducting MCMC at the apower posterior distribution pβ and taking an average ofthe log-likelihoods.Then we can use standard numerical integration techniquesto estimate the integral.
![Page 48: MultitreeGeneralized SteppingstoneSampling–ANew …phylo.bio.ku.edu/slides/HolderAustinSteppingstone.pdf · 2018-01-25 · MultitreeGeneralized SteppingstoneSampling–ANew MCMCMethodforEstimating](https://reader034.fdocuments.in/reader034/viewer/2022042419/5f355f5214ddc76330074f21/html5/thumbnails/48.jpg)
This integration is not analytically tractable, but we cancalculate Epβ [ln P(D|θ)] by conducting MCMC at the apower posterior distribution pβ and taking an average ofthe log-likelihoods.Then we can use standard numerical integration techniquesto estimate the integral.