Non-Parametric Bayesian Population Dynamics Inference
Philippe Lemey and Marc A. Suchard
Department of Microbiology and Immunology, K.U. Leuven, Belgium, and
Departments of Biomathematics, Biostatistics and Human Genetics, University of California, Los Angeles
SISMID
Lemey and Suchard (UCLA) SISMID 1 / 16
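Review: Continuous-Time Coalescent
[Figure: genealogy drawn against a time axis, with times $t_4, t_3, t_2, t_1$ and inter-coalescent intervals $u_4, u_3, u_2$; two example genealogies, one under constant population size and one under exponential growth.]
Time is measured in units of N generations.
N = const $\Rightarrow$ $u_k \sim \mathrm{Exp}\left(\binom{k}{2}\right)$
N = N(t) $\Rightarrow$ $\Pr(u_k > t \mid t_{k+1}) = \exp\left(-\binom{k}{2} \int_{t_{k+1}}^{t + t_{k+1}} \frac{N}{N(u)}\, du\right)$
The $u_k$ are no longer independent.
Constant population size: $N(t) = N$
Exponential growth: $N(t) = N e^{-100t}$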
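A minimal Python sketch of the two cases on this slide; it is not from the slides themselves. The function names are hypothetical, time runs backwards from the present in units of N generations, and the default growth rate of 100 is taken from the example $N(t) = N e^{-100t}$. For that trajectory the integral of $N / N(u)$ has a closed form, which the code uses.

```python
import math
import random


def coalescent_rate(k):
    """Number of lineage pairs, C(k, 2), i.e. the rate of Exp for u_k."""
    return k * (k - 1) / 2.0


def sample_constant(n, rng):
    """Draw independent u_n, ..., u_2 under constant population size."""
    return {k: rng.expovariate(coalescent_rate(k)) for k in range(n, 1, -1)}


def survival_exp_growth(k, t, t_start, rate=100.0):
    """Pr(u_k > t | t_{k+1} = t_start) when N(t) = N * exp(-rate * t).

    The integral of N / N(u) = exp(rate * u) over [t_start, t_start + t]
    equals (exp(rate * (t_start + t)) - exp(rate * t_start)) / rate.
    """
    integral = (math.exp(rate * (t_start + t)) - math.exp(rate * t_start)) / rate
    return math.exp(-coalescent_rate(k) * integral)


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_constant(5, rng))
    # Same waiting time t, different starting points: the probabilities
    # differ, which is why the u_k are no longer independent.
    print(survival_exp_growth(3, 0.01, 0.0))
    print(survival_exp_growth(3, 0.01, 0.01))
```

The further back an interval starts, the smaller the population and the faster lineages coalesce, so the survival probability for a fixed t drops as t_start grows.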
Sequence Data $\rightarrow$ Population Model Parameters
[Figure: aligned nucleotide sequences (Sequence Data) lead to a Genealogy, which leads to Pop. Dynamics, shown as N(t) plotted against Time.]
More formally (Bayesian approach):
$\Pr(G, Q, \theta \mid D) \propto \Pr(D \mid G, Q)\, \Pr(Q)\, \Pr(G \mid \theta)\, \Pr(\theta)$
G - genealogy with branch lengths
Q - substitution matrix
$\theta$ - population genetics parameters
D - sequence data
$\Pr(G \mid \theta)$ - Coalescent prior
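Piecewise Constant Demographic Model
[Figure: genealogy with times $t_1, t_2, t_3, t_4$ and $t_5 = 0$, inter-coalescent intervals $u_2, \dots, u_5$, and the number of lineages (2, 3, 4, 5) in each interval.]
Isochronous Data
$N_e(t) = \theta_k$ for $t_k < t \le t_{k-1}$.
$u_2, \dots, u_n$ are independent.
$\Pr(u_k \mid \theta_k) = \frac{k(k-1)}{2\theta_k} \exp\left(-\frac{k(k-1)\, u_k}{2\theta_k}\right)$
$\Pr(F \mid \theta) \propto \prod_{k=2}^{n} \Pr(u_k \mid \theta_k)$
Equivalent to estimating an exponential mean from one observation.
Need further restrictions to estimate $\theta$!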
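To make the "one observation per parameter" point concrete, here is a small Python sketch (function names are hypothetical). Maximizing each factor $\Pr(u_k \mid \theta_k)$ separately gives $\hat\theta_k = k(k-1)u_k/2$, so every interval's estimate rests on a single exponential draw.

```python
import math


def classical_skyline(inter_coalescent_times):
    """Map {k: u_k} to the per-interval maximizer {k: k(k-1)u_k / 2}."""
    return {k: k * (k - 1) * u / 2.0 for k, u in inter_coalescent_times.items()}


def log_likelihood(inter_coalescent_times, theta):
    """Sum over k of log Pr(u_k | theta_k) for isochronous data."""
    total = 0.0
    for k, u in inter_coalescent_times.items():
        rate = k * (k - 1) / (2.0 * theta[k])  # exponential rate for u_k
        total += math.log(rate) - rate * u
    return total


if __name__ == "__main__":
    u = {2: 0.5, 3: 0.1, 4: 0.05}
    theta_hat = classical_skyline(u)
    print(theta_hat)
    print(log_likelihood(u, theta_hat))
```

Because each estimate uses one observation, the resulting trajectory is very noisy, which motivates the grouping and smoothing approaches on the following slides.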
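Piecewise Constant Demographic Model
[Figure: genealogy with serially sampled tips; inter-coalescent intervals $u_2, \dots, u_5$ are split into sub-intervals $w_{20}$, $w_{30}$, $w_{40}$, $w_{50}, w_{51}, w_{52}$, with the number of lineages shown for each sub-interval.]
Heterochronous Data
$w_{20}, \dots, w_{n j_n}$ are independent.
$\Pr(w_{k0} \mid \theta_k) = \frac{n_{k0}(n_{k0}-1)}{2\theta_k} \exp\left(-\frac{n_{k0}(n_{k0}-1)\, w_{k0}}{2\theta_k}\right)$
$\Pr(w_{kj} \mid \theta_k) = \exp\left(-\frac{n_{kj}(n_{kj}-1)\, w_{kj}}{2\theta_k}\right)$, $j > 0$
$\Pr(F \mid \theta) \propto \prod_{k=2}^{n} \prod_{j=0}^{j_k} \Pr(w_{kj} \mid \theta_k)$
Equivalent to estimating an exponential mean from one observation.
Need further restrictions to estimate $\theta$!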
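The heterochronous likelihood can be sketched in Python as follows. This is an illustration under assumptions: $n_{kj}$ is read as the number of lineages present during sub-interval $w_{kj}$, the sub-interval with $j = 0$ is the one that ends in a coalescent event, and the input layout and function name are hypothetical.

```python
import math


def hetero_log_likelihood(intervals, theta):
    """Log of prod_k prod_j Pr(w_kj | theta_k), up to an additive constant.

    intervals: {k: [(n_k0, w_k0), (n_k1, w_k1), ...]}, where the first
    pair of each list is the sub-interval ending in a coalescent event.
    theta: {k: theta_k}.
    """
    total = 0.0
    for k, subintervals in intervals.items():
        for j, (n, w) in enumerate(subintervals):
            rate = n * (n - 1) / (2.0 * theta[k])
            total -= rate * w            # survival term, present for every j
            if j == 0:
                total += math.log(rate)  # density term for the coalescent event
    return total


if __name__ == "__main__":
    intervals = {2: [(2, 0.5)], 3: [(3, 0.2), (2, 0.1)]}
    theta = {2: 1.0, 3: 1.5}
    print(hetero_log_likelihood(intervals, theta))
```

With a single sub-interval per k, the expression reduces to the isochronous likelihood.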
Current Approaches
Strimmer and Pybus (2001)
  Make $N_e(t)$ constant across some inter-Coalescent times
  Group inter-Coalescent intervals with AIC
Drummond et al. (2005)
  Multiple change-point model with fixed number of change-points
  Change-points allowed only at Coalescent events
  Joint estimation of phylogenies and population dynamics
Opgen-Rhein et al. (2005)
  Multiple change-point model with random number of change-points
  Change-points allowed anywhere in interval $(0, t_1]$
  Posterior is approximated with rjMCMC
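Smoothing Prior (GMRF approach)
Go to the log scale: $x_k = \log \theta_k$
$\Pr(x \mid \tau) \propto \tau^{(n-2)/2} \exp\left(-\frac{\tau}{2} \sum_{k=1}^{n-2} \frac{1}{d_k} (x_{k+1} - x_k)^2\right)$
[Figure: chain of nodes $x_1, x_2, x_3, x_4, \dots, x_{n-2}, x_{n-1}$, with neighbouring nodes linked by weights $d_1, d_2, d_3, \dots, d_{n-2}$.]
Weighting Schemes
  1. Uniform: $d_k = 1$
  2. Time-Aware: $d_k = \frac{u_{k+1} + u_k}{2}$
$\Pr(x, \tau) = \Pr(x \mid \tau)\, \Pr(\tau)$
$\Pr(\tau) \propto \tau^{\alpha - 1} e^{-\beta \tau}$, diffuse prior with $\alpha = 0.01$, $\beta = 0.01$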
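A short Python sketch of the smoothing prior, written for illustration only. It evaluates the unnormalized log densities from this slide; the function names and the list-based layout (x and u stored in matching order) are assumptions, not part of the talk.

```python
import math


def time_aware_weights(u):
    """d_k = (u_{k+1} + u_k) / 2 for consecutive inter-coalescent times."""
    return [(u[k + 1] + u[k]) / 2.0 for k in range(len(u) - 1)]


def gmrf_log_prior(x, tau, d=None):
    """Unnormalized log Pr(x | tau) for log population sizes x.

    len(x) - 1 increments play the role of n - 2 on the slide.
    d defaults to the uniform weighting d_k = 1.
    """
    m = len(x) - 1
    if d is None:
        d = [1.0] * m
    quad = sum((x[k + 1] - x[k]) ** 2 / d[k] for k in range(m))
    return 0.5 * m * math.log(tau) - 0.5 * tau * quad


def log_gamma_prior(tau, alpha=0.01, beta=0.01):
    """Unnormalized log Pr(tau) with the diffuse alpha = beta = 0.01."""
    return (alpha - 1.0) * math.log(tau) - beta * tau


if __name__ == "__main__":
    x = [0.0, 0.2, 0.1, 0.4]
    u = [0.05, 0.10, 0.30, 0.50]
    d = time_aware_weights(u)
    print(gmrf_log_prior(x, 2.0))      # uniform weights
    print(gmrf_log_prior(x, 2.0, d))   # time-aware weights
    print(log_gamma_prior(2.0))
```

Larger weights $d_k$ penalize a jump between neighbouring intervals less, so the time-aware scheme lets the trajectory change more across long intervals than across short ones.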
MCMC Algorithm
$\Pr(G, Q, x \mid D) \propto \Pr(D \mid G, Q)\, \Pr(Q)\, \Pr(G \mid x)\, \Pr(x)$
Updating Population Size Trajectory
  Use fast GMRF sampling (Rue et al., 2001, 2004)
  Draw $\tau^*$ from an arbitrary univariate proposal distribution
  Use Gaussian approximation of $\Pr(x \mid \tau^*, G)$ to propose $x^*$
  Jointly accept/reject $(\tau^*, x^*)$ in Metropolis-Hastings step
Object-Oriented Reality?
  BEAST = Bayesian Evolutionary Analysis Sampling Trees
  $\Pr(G \mid x, D, Q)$ - sampled by BEAST
  $\Pr(Q \mid G, D)$ - sampled by BEAST
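The joint accept/reject step can be sketched generically in Python. This is not the BEAST implementation: the log-normal random walk for $\tau^*$ is an assumed stand-in for the "arbitrary univariate proposal", the Gaussian-approximation proposal for $x^*$ is left to the caller, and all names are hypothetical. The caller is expected to add the log proposal-density ratio for $x^*$ to `log_hastings`.

```python
import math
import random


def propose_tau(tau, step, rng):
    """Log-normal random-walk proposal for the precision tau (assumed choice).

    Returns tau* and log q(tau | tau*) - log q(tau* | tau), which for
    this proposal equals log(tau* / tau).
    """
    tau_star = tau * math.exp(step * rng.gauss(0.0, 1.0))
    return tau_star, math.log(tau_star / tau)


def accept_jointly(log_post_current, log_post_proposed, log_hastings, rng):
    """Metropolis-Hastings decision for the pair (tau*, x*).

    log_post_*: log of the unnormalized posterior at the current and
    proposed states. log_hastings: log of reverse over forward proposal
    densities, summed over the tau and x proposals.
    """
    log_ratio = log_post_proposed - log_post_current + log_hastings
    return rng.random() < math.exp(min(0.0, log_ratio))


if __name__ == "__main__":
    rng = random.Random(0)
    tau_star, log_h = propose_tau(2.0, 0.5, rng)
    # Placeholder posterior values, for illustration only.
    print(tau_star, accept_jointly(-10.0, -9.5, log_h, rng))
```

Accepting or rejecting $\tau^*$ and $x^*$ together means the field never has to mix under a fixed precision, which is the motivation for a block update on this slide.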
Simulation: Constant Population Size
[Figure: six panels (Classical Skyline Plot, ORMCP Model, Beast MCP, Uniform GMRF, Time-Aware GMRF, Beast GMRF); x-axis: Time (Past to Present); y-axis: Effective Population Size.]
Simulation: Exponential Growth
[Figure: six panels (Classical Skyline Plot, ORMCP Model, Beast MCP, Uniform GMRF, Time-Aware GMRF, Beast GMRF); x-axis: Time (Past to Present); y-axis: Effective Population Size.]
Simulation: Exponential Growth with Bottleneck
[Figure: six panels (Classical Skyline Plot, ORMCP Model, Beast MCP, Uniform GMRF, Time-Aware GMRF, Beast GMRF); x-axis: Time (Past to Present); y-axis: Effective Population Size.]
Accuracy in Simulations
$\text{Percent Error} = \int_0^{\mathrm{TMRCA}} \frac{|\hat{N}_e(t) - N_e(t)|}{N_e(t)}\, dt \times 100$, (1)
Table: Percent error in simulations. We compare percent errors, defined in equation (1), for the Opgen-Rhein multiple change-point (ORMCP), uniform and fixed-tree time-aware Gaussian Markov random field (GMRF) smoothing, BEAST multiple change-point (MCP) model, and BEAST GMRF smoothing.
Model              Constant   Exponential   Bottleneck
ORMCP                  14.0           1.7          7.4
Uniform GMRF           32.8           1.5          5.9
Time-Aware GMRF         2.8           1.2          4.8
BEAST MCP              38.2           1.6          5.2
BEAST GMRF              1.7           1.0          5.4
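Equation (1) can be approximated numerically when the estimated and true trajectories are available on a common time grid. The Python sketch below uses the trapezoid rule; the function name and the grid-based inputs are assumptions for illustration, and it does not reproduce how the table values were computed.

```python
def percent_error(times, estimate, truth):
    """Trapezoid-rule approximation of equation (1).

    times: increasing grid from 0 to the TMRCA.
    estimate, truth: estimated and true N_e(t) on that grid.
    """
    total = 0.0
    for i in range(len(times) - 1):
        left = abs(estimate[i] - truth[i]) / truth[i]
        right = abs(estimate[i + 1] - truth[i + 1]) / truth[i + 1]
        total += 0.5 * (left + right) * (times[i + 1] - times[i])
    return 100.0 * total


if __name__ == "__main__":
    grid = [0.0, 0.5, 1.0]
    print(percent_error(grid, [1.1, 1.1, 1.1], [1.0, 1.0, 1.0]))
```

A constant 10% overestimate across a unit-length time span yields a value of about 10, which gives a rough sense of scale for the table entries.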
GMRF Precision Prior Sensitivity
$\tau$ - GMRF precision, controls smoothness
Usually $\Pr(\tau \mid D)$ is sensitive to perturbations of $\Pr(\tau)$
Not in our Coalescent model!
[Figure: GMRF Precision Prior and Posterior; prior and posterior density curves; x-axis: log τ; y-axis: Density.]
[Figure: GMRF Precision Sensitivity to Prior; x-axis: Prior Mean of τ; y-axis: Posterior of log τ.]
HCV Epidemics in Egypt
[Figure: four panels (Estimated Genealogy, BEAST GMRF, Unconstrained Fixed-Tree GMRF, Constrained Fixed-Tree GMRF); x-axis: Time (Years Before 1993); y-axis: Scaled Effective Pop. Size; annotation: PAT Starts.]
Random population sample
No sign of population sub-structure
Parenteral antischistosomal therapy (PAT) was practiced from 1920s to 1980s
Bayes Factor 12,880 in favor of constant population size prior to 1920
Influenza Intra-Season Population Dynamics
[Figure: panels for the 1999-2000 Season (x-axis: Time (Weeks before 03/01/00)) and the 2001-2002 Season (x-axis: Time (Weeks before 03/21/02)); y-axis: Scaled Effective Pop. Size; October 1 is marked on the time axis.]
New York state hemagglutinin sequences serially sampled (Ghedin et al., 2005)
Influenza Intra-Season Population Dynamics
[Figure: panels for the 2001-2002 Season (x-axis: Time (Weeks before 03/21/02)) and the 2003-2004 Season (x-axis: Time (Weeks before 02/05/04)); y-axis: Scaled Effective Pop. Size; October 1 is marked on the time axis.]
Summary
Genealogies inform us about population size trajectories
Prior restrictions are necessary for non(semi)-parametric estimation of $N_e(t)$
Smoothing can be imposed by GMRF priors
Software: The Skyride
Implemented as a Coalescent prior in BEAST
Exploits approximate Gibbs sampling
Faster convergence? Better mixing?
Reference: Minin, Bloomquist and Suchard (2008) Molecular Biology & Evolution, 25, 1459–1471.
Active Ideas: GMRFs are Highly Generalizable
Hierarchical Modeling
  Flu genes display similar (not equal) dynamics
  Incorporate multiple loci simultaneously
  Pool information for statistical power
  No need for strict equality
Introducing Covariates
  Augment field at fixed observation times
  Formal statistical testing for:
    External factors (environment, drug treatment)
    Population dynamics (bottlenecks, growth)