Divergence-time estimation in RevBayes
-
Upload
tracy-heath -
Category
Science
-
view
178 -
download
3
Transcript of Divergence-time estimation in RevBayes
E S D T
Tracy A. HeathUniversity of Kansas
University of California, BerkeleyIowa State University (in 2015)
Tanja StadlerETH Zürich
2014 Phylogenetic Analysis in RevBayesNESCent Academy
A T-S E
Phylogenetic trees can provide both topological informationand temporal information
0.2 expected
substitutions/site
Primates
Carnivora
Cetacea
Simiiformes
Artiodactyla
Mic
roce
bu
s
Homininae
Hippo
Fo
ssil
Ca
libra
tio
ns
Ph
ylo
ge
ne
tic R
ela
tio
nsh
ips
Se
qu
en
ce
Da
ta
Did Simiiformes experience
accelerated rates of molecular
evolution?
What is the age of the MRCA of
mouse lemurs (Microcebus)?
100 0.020.040.060.080.0
EquusRhinocerosBosHippopotamusBalaenopteraPhyseterUrsusCanisFelisHomoPanGorillaPongoMacacaCallithrixLorisGalagoDaubentoniaVareciaEulemurLemurHapalemurPropithecusLepilemur
MirzaM. murinusM. griseorufus
M. myoxinusM. berthaeM. rufus1M. tavaratraM. rufus2M. sambiranensisM. ravelobensis
Cheirogaleus
Sim
iiform
es
Mic
rocebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003)
A T-S EPhylogenetic divergence-timeestimation
• What was the spacial andclimatic environment of ancientangiosperms?
• Did the uplift of the PatagonianAndes drive the diversity ofPeruvian lilies?
• How has mammalian body-sizechanged over time?
• Is diversification in Caribbeananoles correlated with ecologicalopportunity?
• How has the rate of molecularevolution changed across theTree of Life?
(Antonelli & Sanmartin. Syst. Biol. 2011)
(Lartillot & Delsuc. Evolution 2012) (Mahler, Revell, Glor, & Losos. Evolution 2010)
(Nabholz, Glemin, Galtier. MBE 2008)
Historical biogeography Molecular evolution
Trait evolution
Diversi�cation
Anolis fowleri (image by L. Mahler)
Understanding Evolutionary Processes
T D I D
Divergence time estimationof rapidly evolvingpathogens provideinformation about spatialand temporal dynamics ofinfectious diseases
Sequences sampled atdifferent time horizonsimpose a temporal structureon the tree by providingages for non-contemporaneous tips
(Pybus & Rambaut. 2009. Nature Reviews Genetics.)
D T E
Goal: Estimate the ages of interior nodes to understand thetiming and rates of evolutionary processes
Model how rates aredistributed across the tree
Describe the distribution ofspeciation events over time
External calibrationinformation for estimates ofabsolute node times
calibrated node
100 0.020.040.060.080.0
EquusRhinocerosBosHippopotamusBalaenopteraPhyseterUrsusCanisFelisHomoPanGorillaPongoMacacaCallithrixLorisGalagoDaubentoniaVareciaEulemurLemurHapalemurPropithecusLepilemur
MirzaM. murinusM. griseorufus
M. myoxinusM. berthaeM. rufus1M. tavaratraM. rufus2M. sambiranensisM. ravelobensis
Cheirogaleus
Sim
iiform
es
Mic
rocebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
A T-S E
Phylogenetic trees can provide both topological informationand temporal information
100 0.020.040.060.080.0
EquusRhinocerosBosHippopotamusBalaenopteraPhyseterUrsusCanisFelisHomoPanGorillaPongoMacacaCallithrixLorisGalagoDaubentoniaVareciaEulemurLemurHapalemurPropithecusLepilemur
MirzaM. murinusM. griseorufus
M. myoxinusM. berthaeM. rufus1M. tavaratraM. rufus2M. sambiranensisM. ravelobensis
Cheirogaleus
Sim
iiform
es
Mic
roce
bu
s
Cretaceous Paleogene Neogene Q
Time (Millions of years)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)
T G M C
Assume that the rate ofevolutionary change isconstant over time
(branch lengths equalpercent sequencedivergence) 10%
400 My
200 My
A B C
20%
10%10%
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
T G M C
We can date the tree if weknow the rate of change is1% divergence per 10 My N
A B C
20%
10%10%
10%200 My
400 My
200 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
T G M C
If we found a fossil of theMRCA of B and C, we canuse it to calculate the rateof change & date the rootof the tree
N
A B C
20%
10%10%
10%200 My
400 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
R G M C
Rates of evolution vary across lineages and over time
Mutation rate:Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate:Variation in
• strength and targets ofselection
• population sizes
10%
400 My
200 My
A B C
20%
10%10%
U A
Sequence data provideinformation about branchlengths
In units of the expected # ofsubstitutions per site
branch length = rate × time0.2 expected
substitutions/site
Ph
ylo
ge
ne
tic R
ela
tio
nsh
ips
Se
qu
en
ce
Da
ta
P L
f (D | V , θs,Ψ)
V Vector of branch lengths
θs Sequence model parameters
D Sequence data
Ψ Tree topology
R T
The expected # of substitutions/site occurring along abranch is the product of the substitution rate and time
length = rate × time length = rate length = time
Methods for dating species divergences estimate thesubstitution rate and time separately
R T
The sequence dataprovide informationabout branch length
for any possible rate,there’s a time that fitsthe branch lengthperfectly
0
1
2
3
4
5
0 1 2 3 4 5
Bra
nch
Rat
e
Branch Time
time = 0.8rate = 0.625
branch length = 0.5
(based on Thorne & Kishino, 2005)
B D T E
length = rate length = time
R = (r, r, r, . . . , rN−)
A = (a, a, a, . . . , aN−)
N = number of tips
B D T E
length = rate length = time
R = (r, r, r, . . . , rN−)
A = (a, a, a, . . . , aN−)
N = number of tips
B D T E
Posterior probability
f (R,A, θR, θA, θs | D,Ψ)
R Vector of rates on branches
A Vector of internal node ages
θR, θA, θs Model parameters
D Sequence data
Ψ Tree topology
B D T E
f(R,A, θR, θA, θs | D) =
f (D | R,A, θs) f(R | θR) f(A | θA) f(θs)
f(D)
f(D | R,A, θR, θA, θs) Likelihood
f(R | θR) Prior on rates
f(A | θA) Prior on node ages
f(θs) Prior on substitution parameters
f(D) Marginal probability of the data
B D T E
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
M R VSome models describing lineage-specific substitution ratevariation:
• Global molecular clock (Zuckerkandl & Pauling, 1962)
• Local molecular clocks (Hasegawa, Kishino & Yano 1989;Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder
2003, Drummond and Suchard 2010)
• Punctuated rate change model (Huelsenbeck, Larget andSwofford 2000)
• Log-normally distributed autocorrelated rates (Thorne,Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne &
Kishino 2002)
• Uncorrelated/independent rates models (Drummond et al.2006; Rannala & Yang 2007; Lepage et al. 2007)
• Mixture models on branch rates (Heath, Holder, Huelsenbeck2012)
Models of Lineage-specific Rate Variation
G M C
The substitution rate isconstant over time
All lineages share the samerate
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
G M C
rate
de
nsity
r
rate prior distribution
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
G M C
The sampled rate is appliedto every branch in the tree
rate
de
nsity
r
rate prior distribution
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
R G M C
Rates of evolution vary across lineages and over time
Mutation rate:Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate:Variation in
• strength and targets ofselection
• population sizes
10%
400 My
200 My
A B C
20%
10%10%
R-C M
To accommodate variation in substitution rates‘relaxed-clock’ models estimate lineage-specific substitutionrates
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Mixture models on branch rates
L M C
Rate shifts occurinfrequently over the tree
Closely related lineageshave equivalent rates(clustered by sub-clades)
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
L M C
Most methods forestimating local clocksrequired specifying thenumber and locations ofrate changes a priori
Drummond and Suchard(2010) introduced aBayesian method thatsamples over a broad rangeof possible random localclocks
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
A R
Substitution rates evolvegradually over time –closely related lineages havesimilar rates
The rate at a node isdrawn from a lognormaldistribution with a meanequal to the parent rate
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
A R
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
P R C
Rate changes occur alonglineages according to apoint process
At rate-change events, thenew rate is a product ofthe parent’s rate and aΓ-distributed multiplier
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)
I/U R
Lineage-specific rates areuncorrelated when the rateassigned to each branch isindependently drawn froman underlying distribution
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
I/U R
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
I M M
Dirichlet process prior:Branches are partitionedinto distinct rate categories
branch length = substitution rate
c5
c4
c3
c2
substitution rate classes
c1
Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)
T D P P (DPP)
A stochastic process that models data as a mixture ofdistributions and can identify latent classes present in thedata
Branches are assumed toform distinct substitutionrate clusters
Efficient Markov chainMonte Carlo (MCMC)implementations allow forinference under this model
branch length = substitution rate
c5
c4
c3
c2
substitution rate classes
c1
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
T D P P (DPP)
A stochastic process that models data as a mixture ofdistributions and can identify latent classes present in thedata
Random variables under theDPP informed by the data:
• the number of rateclasses
• the assignment ofbranches to classes
• the rate value for eachclass
branch length = substitution rate
c5
c4
c3
c2
substitution rate classes
c1
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
T D P P (DPP)
A stochastic process that models data as a mixture ofdistributions and can identify latent classes present in thedata
Random variables under theDPP informed by the data:
• the number of rateclasses
• the assignment ofbranches to classes
• the rate value for eachclass
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Global molecular
clock
rate classes
branch length = substitution rate
18c1
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Independent
rates
1c1
1c18
1c3
1c2
rate classes
branch length = substitution rate
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Local molecular
clock
5c3
5c2
rate classes
branch length = substitution rate
8c1
(262,142 different local molecular clocks)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Global molecular
clock
4c3
6c2
rate classes
branch length = substitution rate
8c1
Each of the
682,076,806,159
configurations
has a prior weight
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
M R V
These are only a subset of the available models forbranch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Models of Lineage-specific Rate Variation
M R V
Are our models appropriate across all data sets?
cave bear
American
black bear
sloth bear
Asian
black bear
brown bear
polar bear
American giant
short-faced bear
giant panda
sun bear
harbor seal
spectacled
bear
4.08
5.39
5.66
12.86
2.75
5.05
19.09
35.7
0.88
4.58
[3.11–5.27]
[4.26–7.34]
[9.77–16.58]
[3.9–6.48]
[0.66–1.17]
[4.2–6.86]
[2.1–3.57]
[14.38–24.79]
[3.51–5.89]14.32
[9.77–16.58]
95% CI
mean age (Ma)
t 2
t 3
t 4
t 6
t 7
t 5
t 8
t 9
t 10
t x
node
MP•MLu•MLp•Bayesian
100•100•100•1.00
100•100•100•1.00
85•93•93•1.00
76•94•97•1.00
99•97•94•1.00
100•100•100•1.00
100•100•100•1.00
100•100•100•1.00
t 1
Eocene Oligocene Miocene Plio Plei Hol
34 5.3 1.823.8 0.01
Epochs
Ma
Global expansion of C4 biomassMajor temperature drop and increasing seasonality
Faunal turnover
Krause et al., 2008. Mitochondrial genomes reveal anexplosive radiation of extinct and extant bears near theMiocene-Pliocene boundary. BMC Evol. Biol. 8.
Taxa
1
5
10
50
100
500
1000
5000
10000
20000
0100200300MYA
Ophidiiformes
Percomorpha
Beryciformes
Lampriformes
Zeiforms
Polymixiiformes
Percopsif. + Gadiif.
Aulopiformes
Myctophiformes
Argentiniformes
Stomiiformes
Osmeriformes
Galaxiiformes
Salmoniformes
Esociformes
Characiformes
Siluriformes
Gymnotiformes
Cypriniformes
Gonorynchiformes
Denticipidae
Clupeomorpha
Osteoglossomorpha
Elopomorpha
Holostei
Chondrostei
Polypteriformes
Clade r ε ΔAIC
1. 0.041 0.0017 25.32. 0.081 * 25.53. 0.067 0.37 45.1 4. 0 * 3.1Bg. 0.011 0.0011
Ostariophysi
Acanthomorpha
Teleo
stei
Santini et al., 2009. Did genome duplication drive the originof teleosts? A comparative study of diversification inray-finned fishes. BMC Evol. Biol. 9.
M R V
These are only a subset of the available models forbranch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Model selection and model uncertainty are very importantfor Bayesian divergence time analysis
Models of Lineage-specific Rate Variation
B D T E
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html
P N T
Sequence data are only informative on relative rates & times
Node-time priors cannot give precise estimates of absolutenode ages
We need external information (like fossils) to calibrate orscale the tree to absolute time
Node Age Priors
C D T
Fossils (or other data) are necessary to estimate absolutenode ages
There is no information inthe sequence data forabsolute time
Uncertainty in theplacement of fossils
N
A B C
20%
10%10%
10%200 My
400 My
C D
Bayesian inference is well suited to accommodatinguncertainty in the age of the calibration node
Divergence times arecalibrated by placingparametric densities oninternal nodes offset by ageestimates from the fossilrecord
N
A B C
200 My
De
nsity
Age
F C
Fossil and geological datacan be used to estimate theabsolute ages of ancientdivergences
Time (My)
Calibrating Divergence Times
F C
The ages of extant taxaare known
Time (My)
Calibrating Divergence Times
F C
Fossil taxa are assigned tomonophyletic clades
Time (My)Minimum age
Calibrating Divergence Times Notogoneus osculus (Grande & Grande J. Paleont. 2008)
F C
Fossil taxa are assigned tomonophyletic clades andconstrain the age of theMRCA
Minimum age Time (My)
Calibrating Divergence Times
A F C
Misplaced fossils can affect node age estimates throughoutthe tree – if the fossil is older than its presumed MRCA
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Crown clade: allliving species andtheir most-recentcommon ancestor(MRCA)
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Stem lineages:purely fossil formsthat are closer totheir descendantcrown clade thanany other crownclade
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Fossiliferoushorizons: thesources in therock record forrelevant fossils
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
F C
Age estimates from fossilscan provide minimum timeconstraints for internalnodes
Reliable maximum boundsare typically unavailable
Minimum age Time (My)
Calibrating Divergence Times
P D C N
Parametric distributions aretypically off-set by the ageof the oldest fossil assignedto a clade
These prior densities do not(necessarily) requirespecification of maximumbounds
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ2)
Time (My)Minimum age
Calibrating Divergence Times
P D C N
Describe the waiting timebetween the divergenceevent and the age of theoldest fossil
Minimum age Time (My)
Calibrating Divergence Times
P D C N
Overly informative priorscan bias node ageestimates to be too young
Minimum age
Exponential (λ)
Time (My)
Calibrating Divergence Times
P D C N
Uncertainty in the age ofthe MRCA of the claderelative to the age of thefossil may be bettercaptured by vague priordensities
Minimum age
Exponential (λ)
Time (My)
Calibrating Divergence Times
P P
Hyperprior: place an higher-order prior on the parameter ofa prior distribution
Sample the time fromthe MRCA to thefossil from a mixtureof differentexponentialdistributions
Account foruncertainty in valuesof λ
De
nsity
Hyperparameter
Hyperprior
De
nsity
Parameter
Prior
C P RB
Calibrating Divergence Times
P D C N
Common practice in Bayesian divergence-time estimation:
Estimates of absolute nodeages are driven primarily bythe calibration density
Specifying appropriatedensities is a challenge formost molecular biologists
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ2)
Time (My)Minimum age
Calibration Density Approach
I F C
We would prefer toeliminate the need forad hoc calibrationprior densities
Calibration densitiesdo not account fordiversification of fossils
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
We want to use allof the available fossils
Example: Bears12 fossils are reducedto 4 calibration ageswith calibration densitymethods
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
We want to use allof the available fossils
Example: Bears12 fossils are reducedto 4 calibration ageswith calibration densitymethods
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
Because fossils arepart of thediversification process,we can combine fossilcalibration withbirth-death models
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
This relies on abranching model thataccounts forspeciation, extinction,and rates offossilization,preservation, andrecovery
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
T F B-D P (FBD)
Improving statistical inference of absolute node ages
Eliminates the need to specify arbitrarycalibration densities
Better capture our statisticaluncertainty in species divergence dates
All reliable fossils associated with aclade are used 150 100 50 0
Time
Heath, Huelsenbeck, & Stadler. 2014. The fossilized birth-death process forcoherent calibration of divergence time estimates. PNAS 111(29):E2957–E2966.
http://www.pnas.org/content/111/29/E2957.abstract
T F B-D P (FBD)
Recovered fossil specimensprovide historicalobservations of thediversification process thatgenerated the tree ofextant species
150 100 50 0
Time
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The probability of the treeand fossil observationsunder a birth-death modelwith rate parameters:
λ = speciation
μ = extinction
ψ = fossilization/recovery
150 100 50 0
Time
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
We assume that the fossilis a descendant of aspecified calibrated node
The time of the fossil: Aindicates an observation ofthe birth-death process afterthe age of the node N
0250 50100150200
Time (My)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The fossil must attach tothe tree at some time: ⋆
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If it is the descendant ofan unobserved lineage, thenthere is a speciation eventat time ⋆ on one of the 2branches
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If it is the descendant ofan unobserved lineage, thenthere is a speciation eventat time ⋆ on one of the 2branches
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If ⋆ = A, the fossil is anobservation of a lineageancestral to the extantspecies
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If ⋆ = A, the fossil is anobservation of a lineageancestral to the extantspecies
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The probability of thisrealization of thediversification process isconditional on:
λ, μ, and ψ
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Using MCMC, we cansample the age of thecalibrated node • whileconditioning on
λ, μ, and ψother node agesA and ⋆
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
MCMC allows us toconsider all possible valuesof ⋆ (marginalization)
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
MCMC allows us toconsider all possible valuesof ⋆ (marginalization)
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
MCMC allows us toconsider all possible valuesof ⋆ (marginalization)
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
MCMC allows us toconsider all possible valuesof ⋆ (marginalization)
0250 50100150200
Time (My)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The posterior samples ofthe calibrated node age areinformed by the fossilattachment times
0250 50100150200
Time (My)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The FBD model allowsmultiple fossils to calibratea single node
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Given ⋆ and A, the newfossil can attach to the treevia speciation along eitherbranch in the extant tree
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Given ⋆ and A, the newfossil can attach to the treevia speciation along eitherbranch in the extant tree
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Given ⋆ and A, the newfossil can attach to the treevia speciation along eitherbranch in the extant tree
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Or the unobserved branchleading to the othercalibrating fossil
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If ⋆ = A, then the newfossil lies directly on abranch in the extant tree
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
If ⋆ = A, then the newfossil lies directly on abranch in the extant tree
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Or it is an ancestor of theother calibrating fossil
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The probability of thisrealization of thediversification process isconditional on:
λ, μ, and ψ
N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Using MCMC, we cansample the age of thecalibrated node whileconditioning on
λ, μ, and ψother node agesA and ⋆
A and ⋆N
0250 50100150200
Time (My)
N
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
MCMC allows us toconsider all possible valuesof ⋆ (marginalization)
0250 50100150200
Time (My)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
B I U FBD
Implemented in:
• DPPDiv:github.com/trayc7/FDPPDIV
• BEAST2:beast2.cs.auckland.ac.nz/
Available soon:
• RevBayes:github.com/revbayes/revbayes
150 100 50 0
Time
0250 50100150200
Time (My)
FBD Implementation (Heath, Huelsenbeck, Stadler. PNAS 2014)
L T T
The FBD model provides anestimate of the number oflineages over time
050100150200
Time (My)
3.0
2.0
1.0
1.0
50100150 0200
Time (My)
Nu
mb
er
of lin
ea
ge
s
L T T
Lineage diversity over time with fossils
MCMC samples thetimes of lineages inthe reconstructed treeand the times of thefossil lineages
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
L T T
Lineage diversity over time with fossils
Visualize extant andsampled fossil lineagediversification whenusing all availablefossils (21 total)
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
L T T
Lineage diversity over time with fossils
Choosing only theoldest (calibration)fossils reduced the setof sampled fossilsfrom 21 to 12,giving less informationabout diversificationover time
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
B: D T
Sequence data for extantspecies:
• 8 Ursidae• 1 Canidae (gray wolf)• 1 Phocidae (spotted seal)
Fossil ages:
• 12 Ursidae• 5 Canidae• 5 Pinnipedimorpha
DPP relaxed clock model
(Heath, Holder, Huelsenbeck MBE 2012)
Fixed tree topology
Se
qu
en
ce
Da
ta
Fossil Data Phylogenetic Relationships
fossil canids
fossil pinnepeds
stem fossil ursids
fossil Ailuropodinae
giant short-faced bear
cave bear
Empirical Analysis (Wang 1994, 1999; Krause et al. 2008; Fulton & Strobeck 2010; Abella et al. 2012)
B: D TGray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
1. fossil canids
2. fossil pinnepeds
2. stem fossil ursids
3. fossil Ailuropodinae
4. giant short-faced bear
5. cave bear
1
3
2
4
5
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
B: D TGray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Urs
ida
e
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Ple
is
Paleocene
95% CI
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
B: D T
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
fossil pinnipeds
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Urs
ida
e
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Ple
is
Paleocene
stem fossil Ursidae
fossil canids
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
T F B-D P (FBD)
Improved statistical inference of absolute node ages
Biologically motivatedmodels can bettercapture statisticaluncertainty
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
fossil pinnipeds
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Urs
ida
e
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Ple
is
Paleocene
stem fossil Ursidae
fossil canids
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Improved statistical inference of absolute node ages
Use all available fossils
Eliminates arbitrarychoice of calibrationpriors
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
fossil pinnipeds
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Urs
ida
e
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Ple
is
Paleocene
stem fossil Ursidae
fossil canids
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
Improved statistical inference of absolute node ages
Extensions of the FBDcan account forstratigraphic samplingof fossils and shifts inrates of speciationand extinction
0175 255075100125150
Time
0.2 1.05
Preservation Rate
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
F T E D
Combining extant and fossilspecies
Fo
ssil
Ag
e D
ata
Se
qu
en
ce
Da
ta
© AntWeb.org
Mo
rph
olo
gic
al
Da
ta © AntWeb.org
0%
20%
40%
60%
80%
100%
OrthopteraParaneoptera
NeuropteraRaphidiopteraColeo PolyphagaColeoptera AdephagaLepidopteraMecoptera
XyelaMacroxyela
RunariaParemphytus
Blasticotoma
TenthredoAglaostigmaDolerus
SelandriaStrongylogasterMonophadnoidesMetallus
Athalia
Taxonus
HoplocampaNematinusNematusCladiusMonoctenusGilpiniaDiprion
CimbicinaeAbiaCorynis
ArgeSterictiphoraPergaPhylacteophagaLophyrotoma
AcorduleceraDecameria
Neurotoma
OnycholydaPamphiliusCephalciaAcantholyda
Megalodontes cephalotesMegalodontes skorniakowii
CephusCalameutaHartigiaSyntexisSirexXerisUrocerusTremexXiphydria
Orussus
Stephanidae AStephanidae B
Megalyridae
Trigonalidae
Chalcidoidea
Evanioidea
Ichneumonidae
Cynipoidea
Apoidea AApoidea BApoidea CVespidae
Grimmaratavites
Ghilarella
AulidontesProtosirexAuliscaKaratavitesSepulca
Onokhoius
TrematothoraxThoracotremaProsyntexis
FerganolydaRudisiriciusSogutia
XyelulaBrigittepterus
MesolydaBrachysyntexis
DahuratomaPseudoxyelocerus
PalaeathaliaAnaxyela
SyntexyelaKulbastavia
Undatoma
Abrotoxyela
Mesoxyela mesozoicaSpathoxyela
TriassoxyelaLeioxyela
NigrimonticolaChaetoxyelaAnagaridyela
EoxyelaLiadoxyela
Xyelotoma
Pamphiliidae undescribed
Turgidontes
PraeoryssusParoryssus
Mesorussus
SymphytopterusCleistogasterLeptephialtitesStephanogaster
97
100
100
100100
100100
100
100
100100
100
100100
100
100100
100
100100
100
100
100100
100
100
100
100
100
100
97
68
55
77
75
91
84
6157
62
9896
71
71
85
93
52
95
64
80
64
99
77
99
98
97
82
52
58
9490
60
70
57
8361
7060
350 300 250 200 150 100 50 0 million years before present
% of morphological
characters scored
for each terminal
(Ronquist, Klopfstein, et al. Syst. Biol. 2012.)