Inference under the Wright-Fisher model using an accurate beta approximation
Paula Tataru, Thomas Bataillon, Asger Hobolth
Aarhus University, Bioinformatics Research Centre
CSHL, April 15th 2015
Theoretical population genetics
› Mathematical models formalize the evolution of genetic variation within and between populations
› Provide a framework for inferring evolutionary paths from observed data to
Inference problems
› Inference of population history from DNA data
› (Variable) population size
› Migration / admixture
› Divergence times
› Selection coefficients
Inference problems: population size
PSMC
H. Li and R. Durbin. Inference of human population history from individual whole-genome sequences. Nature, 475:493–496, 2011.
Inference problems: population divergence
KimTree
M. Gautier and R. Vitalis. Inferring population histories using genome-wide allele frequency data. Molecular Biology and Evolution, 30(3):654–668, 2013.
Inference problems: population admixture
TreeMix
J. K. Pickrell and J. K. Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genetics, 8(11):e1002967, 2012.
Inference problems: population admixture
G-PhoCS
I. Gronau, M. J. Hubisz, B. Gulko, C. G. Danko, and A. Siepel. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43(10):1031–1034, 2011.
Inference problems: loci under selection
spectralHMM
M. Steinrücken, A. Bhaskar, and Y. S. Song. A novel spectral method for inferring general selection from time series genetic data. The Annals of Applied Statistics, 8(4):2203–2222, 2014.
Population genetics: the Wright-Fisher model
› Evolution of a population forward in time
› Follow one locus (region in the DNA)
› Different variants at the locus are called alleles
[Figure: individuals across generations (time)]
Population genetics: the Wright-Fisher model
› Basic model: only two alleles per locus
› Follow the frequency of one of the alleles
[Figure: individuals across generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]
Allele frequency distribution
Population genetics: the coalescent model
› Trace the genealogy of sampled individuals backward in time
› The coalescent process terminates when it reaches the most recent common ancestor (MRCA)
[Figure: genealogy of sampled individuals traced backward across generations (time) to the MRCA]
Two dual models
› The Wright-Fisher model
  › Forward in time
  › Follow allele frequency
  › Selection
  › Scalability
    › Larger samples decrease uncertainty
› The coalescent
  › Backward in time
  › Follow genealogy
  › Recombination
  › Scalability
    › Larger samples increase complexity
Approximations to the Wright-Fisher model
› Diffusion
  › Large population size
  › Infinitesimal change
  › No closed-form solution
  › Cumbersome to evaluate
› Moment-based
  › Convenient distributions
    › Normal distribution
    › Beta distribution
  › Closed analytical forms
  › Fast to evaluate
  › Problematic at the boundaries (allele frequencies 0 and 1)
Behavior at the boundaries
› Normal distribution
  › Support: real line
  › Truncation to [0, 1]
    › Incorrect variance
  › Intermediate frequencies
› Beta distribution
  › Support: [0, 1]
  › Intermediate frequencies
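A quick numerical check of the truncation problem: for an allele at low frequency, a Normal with the correct mean and variance places a sizable share of its mass below 0. The sketch assumes pure drift, for which $E[X_t] = x_0$ and $\operatorname{Var}(X_t) = x_0(1-x_0)[1-(1-1/2N)^t]$; the starting frequency, population size and number of generations are hypothetical values chosen for illustration, and the function names are not from the talk.

```python
import math
from typing import Tuple

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def drift_moments(x0: float, two_n: int, t: int) -> Tuple[float, float]:
    """Exact mean and variance of X_t under neutral Wright-Fisher drift."""
    mean = x0
    var = x0 * (1.0 - x0) * (1.0 - (1.0 - 1.0 / two_n) ** t)
    return mean, var

def normal_mass_outside_unit_interval(mean: float, var: float) -> float:
    """Probability that Normal(mean, var) assigns to (-inf, 0) and (1, inf)."""
    sd = math.sqrt(var)
    below = normal_cdf((0.0 - mean) / sd)
    above = 1.0 - normal_cdf((1.0 - mean) / sd)
    return below + above

# Hypothetical example: a rare allele (x0 = 0.05), 2N = 100 gene copies, 20 generations.
mean, var = drift_moments(x0=0.05, two_n=100, t=20)
print(f"mean={mean:.3f}, sd={math.sqrt(var):.3f}")
print(f"Normal mass outside [0, 1]: {normal_mass_outside_unit_interval(mean, var):.3f}")
```

With these values roughly 30% of the Normal's mass lies below 0; truncating to [0, 1] and renormalizing then shifts both the mean and the variance away from their exact values.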
The Beta with spikes
› Use of the Wright-Fisher model
  › Scalable
› Use of moments
  › Simple mathematical calculations
  › Improve behavior at the boundaries
  › Preserve mean and variance
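The Wright-Fisher model
› $Z_t$: allele count in generation $t$
› $X_t = Z_t / 2N$: allele frequency, with $2N$ gene copies in the population
› $Z_{t+1}$ follows a binomial distribution: $Z_{t+1} \mid Z_t \sim \mathrm{Binomial}\bigl(2N, g(X_t)\bigr)$
› $g$ encodes the evolutionary pressures
[Figure: individuals across generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]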
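The binomial step can be simulated directly. A minimal Python sketch, assuming $2N$ gene copies and a user-supplied function $g$; the names `wf_trajectory` and `binomial_draw` are hypothetical. With the default $g(x) = x$ it simulates drift only, and the counts 0 and $2N$ are absorbing.

```python
import random
from typing import Callable, List, Optional

def binomial_draw(rng: random.Random, n: int, p: float) -> int:
    """Draw from Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def wf_trajectory(z0: int, two_n: int, generations: int,
                  g: Callable[[float], float] = lambda x: x,
                  seed: Optional[int] = None) -> List[int]:
    """Simulate allele counts Z_0, ..., Z_T under a Wright-Fisher model.

    Each generation, Z_{t+1} ~ Binomial(2N, g(X_t)) with X_t = Z_t / 2N.
    """
    rng = random.Random(seed)
    counts = [z0]
    z = z0
    for _ in range(generations):
        x = z / two_n                       # current allele frequency X_t
        z = binomial_draw(rng, two_n, g(x))  # resample 2N gene copies
        counts.append(z)
    return counts

# Drift only (g is the identity) in a toy population of 2N = 10 gene copies.
print(wf_trajectory(z0=3, two_n=10, generations=6, seed=1))
```

Passing a different `g`, for example one that includes mutation, only changes the success probability of the binomial draw; the sampling step itself is unchanged.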
The Wright-Fisher model: Drift only
[Figure: individuals across generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]
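Under drift only, no force other than random sampling acts, so $g(x) = x$. The first two moments then follow from the laws of total expectation and total variance applied to the binomial step, starting from a fixed $X_0 = x_0$:

```latex
\begin{aligned}
E[X_{t+1} \mid X_t] &= X_t, \qquad
\operatorname{Var}(X_{t+1} \mid X_t) = \frac{X_t(1 - X_t)}{2N},\\
E[X_{t+1}] &= E[X_t] = x_0,\\
\operatorname{Var}(X_{t+1}) &= \operatorname{Var}(X_t) + \frac{x_0(1 - x_0) - \operatorname{Var}(X_t)}{2N}
= \frac{x_0(1-x_0)}{2N} + \Bigl(1 - \frac{1}{2N}\Bigr)\operatorname{Var}(X_t),\\
\operatorname{Var}(X_t) &= x_0(1-x_0)\Bigl[1 - \Bigl(1 - \frac{1}{2N}\Bigr)^{t}\Bigr].
\end{aligned}
```

The mean stays constant, and the variance approaches $x_0(1-x_0)$ geometrically as the allele drifts towards loss or fixation.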
The Wright-Fisher model: Mutations
[Figure: individuals across generations (time) with mutation rates u and v; allele count per generation: 3, 2, 4, 5, 4, 3, 2]
The Wright-Fisher model: Migration
[Figure: individuals across generations (time) with migration rates m1 and m2; allele count per generation: 3, 2, 3, 5, 4, 2, 3]
The Wright-Fisher model: Linear forces
› Mutations
› Migration
› Mutations & Migration
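Mutation and migration both make $g$ an affine function of the allele frequency, which is what keeps the recursions for the mean and variance in closed form. The parametrizations below are assumptions for illustration, not necessarily those on the slides: $u$ is the mutation rate away from the focal allele, $v$ the rate towards it, and $m$ the migration rate from a source population with fixed frequency $x_M$.

```latex
\begin{aligned}
\text{Mutation:}\quad & g(x) = (1-u)\,x + v\,(1-x) = v + (1-u-v)\,x,\\
\text{Migration:}\quad & g(x) = (1-m)\,x + m\,x_M,\\
\text{General linear form:}\quad & g(x) = a + b\,x,\qquad 0 \le a \le 1,\; 0 \le a + b \le 1.
\end{aligned}
```

Combining both forces again gives an affine $g$, so the same moment recursions apply; the constraints on $a$ and $b$ simply ensure that $g$ maps $[0, 1]$ into $[0, 1]$.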
The Beta approximation: Main idea
› The density of $X_t$ is approximated by a Beta distribution
› Use a recursive approach to calculate
  › Mean and variance
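For a linear $g(x) = a + bx$ the recursion closes: $E[X_{t+1}] = a + b\,E[X_t]$ and, by the law of total variance, $\operatorname{Var}(X_{t+1}) = m_{t+1}(1-m_{t+1})/2N + (1 - 1/2N)\,b^2 \operatorname{Var}(X_t)$ with $m_{t+1} = E[X_{t+1}]$. A minimal Python sketch of this recursion followed by a method-of-moments Beta fit; it assumes the linear form of $g$, and the function names are hypothetical.

```python
from typing import List, Tuple

def linear_wf_moments(x0: float, two_n: int, generations: int,
                      a: float = 0.0, b: float = 1.0) -> List[Tuple[float, float]]:
    """(mean, variance) of X_t for t = 0..T when g(x) = a + b*x and X_0 = x0."""
    mean, var = x0, 0.0
    moments = [(mean, var)]
    for _ in range(generations):
        mean = a + b * mean                           # E[X_{t+1}] = E[g(X_t)]
        var = (mean * (1.0 - mean) / two_n            # binomial sampling noise
               + (1.0 - 1.0 / two_n) * b * b * var)   # uses the previous variance
        moments.append((mean, var))
    return moments

def beta_from_moments(mean: float, var: float) -> Tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) with the given mean and variance."""
    if not (0.0 < mean < 1.0 and 0.0 < var < mean * (1.0 - mean)):
        raise ValueError("no Beta distribution has these moments")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

# Drift only (a = 0, b = 1): 2N = 100 gene copies, starting frequency 0.3.
mean, var = linear_wf_moments(0.3, 100, 50)[-1]
alpha, beta = beta_from_moments(mean, var)
print(f"t=50: mean={mean:.3f}, var={var:.4f}, Beta({alpha:.2f}, {beta:.2f})")
```

At $t = 0$ the variance is 0 and no Beta fits, so the approximation is only used from the first generation on.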
The Beta approximation: Drift only
The Beta with spikes: Main idea
› The density of $X_t$ is approximated by point masses (spikes) at 0 and 1 plus a Beta density on (0, 1)
› Use a recursive approach to calculate
  › Loss and fixation probabilities
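Written out, one way to state the Beta with spikes, assuming point masses $P_0 = P(X_t = 0)$ and $P_1 = P(X_t = 1)$ and a Beta part fitted so that the overall mean and variance are preserved:

```latex
\begin{aligned}
f_{X_t}(x) &\approx P_0\,\delta_0(x) + P_1\,\delta_1(x)
  + (1 - P_0 - P_1)\,\frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)},\\
E[X_t \mid 0 < X_t < 1] &= \frac{E[X_t] - P_1}{1 - P_0 - P_1},\qquad
E[X_t^2 \mid 0 < X_t < 1] = \frac{\operatorname{Var}(X_t) + E[X_t]^2 - P_1}{1 - P_0 - P_1}.
\end{aligned}
```

$\alpha$ and $\beta$ then follow from the conditional mean $m$ and conditional variance $v$ by the usual method of moments: $\alpha = m\,c$ and $\beta = (1-m)\,c$ with $c = m(1-m)/v - 1$.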
The Beta with spikes: loss probability
The Beta with spikes: fixation probability
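Under drift only, a segregating allele at frequency $x$ is lost in the next generation with probability $(1-x)^{2N}$ and fixed with probability $x^{2N}$, while 0 and 1 are absorbing. Averaging over the Beta part of a Beta with spikes (with $P_0$, $P_1$, $\alpha$, $\beta$ at generation $t$) gives one possible recursion for both spikes; this is a sketch of the idea, not necessarily the exact update used in the talk:

```latex
\begin{aligned}
P(X_{t+1} = 0) &= P_0 + (1 - P_0 - P_1)\int_0^1 (1 - x)^{2N}\,
  \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,dx
  = P_0 + (1 - P_0 - P_1)\,\frac{B(\alpha, \beta + 2N)}{B(\alpha, \beta)},\\
P(X_{t+1} = 1) &= P_1 + (1 - P_0 - P_1)\,\frac{B(\alpha + 2N, \beta)}{B(\alpha, \beta)}.
\end{aligned}
```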
The Beta with spikes: Drift only
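A compact sketch of a full drift-only iteration: the mean stays at $x_0$ and the variance follows its exact recursion, the spikes are updated with the Beta-function ratios $B(\alpha, \beta + 2N)/B(\alpha, \beta)$ and $B(\alpha + 2N, \beta)/B(\alpha, \beta)$, and the Beta part is refitted each generation so that the overall mean and variance are preserved. Generation 1 is computed exactly from $\mathrm{Binomial}(2N, x_0)$. This is an illustration under those assumptions, not the authors' implementation; the function names are hypothetical, and the refit raises an error if the spikes absorb so much mass that no Beta matches the remaining moments.

```python
import math
from typing import List, Tuple

def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def fit_beta_part(mean: float, var: float, p0: float, p1: float) -> Tuple[float, float]:
    """Beta(alpha, beta) for the continuous part, preserving the overall mean and variance."""
    w = 1.0 - p0 - p1                                         # mass of the Beta part
    cond_mean = (mean - p1) / w                               # E[X | 0 < X < 1]
    cond_var = (var + mean * mean - p1) / w - cond_mean ** 2  # Var(X | 0 < X < 1)
    if not (0.0 < cond_mean < 1.0 and 0.0 < cond_var < cond_mean * (1.0 - cond_mean)):
        raise ValueError("moments are not compatible with a Beta on (0, 1)")
    common = cond_mean * (1.0 - cond_mean) / cond_var - 1.0
    return cond_mean * common, (1.0 - cond_mean) * common

def beta_with_spikes_drift(x0: float, two_n: int, generations: int
                           ) -> List[Tuple[float, float, float, float]]:
    """(p0, p1, alpha, beta) for t = 1..T under neutral drift, with X_0 = x0 fixed."""
    # Generation 1 is exact: Z_1 ~ Binomial(2N, x0).
    p0, p1 = (1.0 - x0) ** two_n, x0 ** two_n
    var = x0 * (1.0 - x0) / two_n
    alpha, beta = fit_beta_part(x0, var, p0, p1)
    out = [(p0, p1, alpha, beta)]
    for _ in range(1, generations):
        w = 1.0 - p0 - p1
        lb = log_beta(alpha, beta)
        # Segregating mass lost / fixed this generation: E[(1-X)^2N] and E[X^2N] under the Beta part.
        p0 += w * math.exp(log_beta(alpha, beta + two_n) - lb)
        p1 += w * math.exp(log_beta(alpha + two_n, beta) - lb)
        # Exact drift recursion for the variance; the mean stays at x0.
        var = x0 * (1.0 - x0) / two_n + (1.0 - 1.0 / two_n) * var
        alpha, beta = fit_beta_part(x0, var, p0, p1)
        out.append((p0, p1, alpha, beta))
    return out

p0, p1, alpha, beta = beta_with_spikes_drift(x0=0.3, two_n=100, generations=20)[-1]
print(f"t=20: P(loss)={p0:.4f}, P(fixation)={p1:.2e}, Beta({alpha:.2f}, {beta:.2f})")
```

Each generation costs a constant number of operations independent of $2N$, whereas iterating the exact $(2N+1) \times (2N+1)$ transition matrix costs on the order of $(2N)^2$ operations per generation.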
Numerical accuracy: Drift only
[Figure panels: Beta; Beta with spikes]
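For small populations, the exact Wright-Fisher distribution, a natural reference for judging approximations like these, can be computed by iterating the binomial transition probabilities. A sketch for drift only; the function names are hypothetical, and the slides do not state how their reference distribution was computed.

```python
import math
from typing import List

def binomial_pmf_row(two_n: int, p: float) -> List[float]:
    """P(Z' = j) for j = 0..2N when Z' ~ Binomial(2N, p)."""
    if p <= 0.0:
        return [1.0] + [0.0] * two_n
    if p >= 1.0:
        return [0.0] * two_n + [1.0]
    log_p, log_q = math.log(p), math.log1p(-p)
    return [math.exp(math.lgamma(two_n + 1) - math.lgamma(j + 1) - math.lgamma(two_n - j + 1)
                     + j * log_p + (two_n - j) * log_q)
            for j in range(two_n + 1)]

def exact_wf_distribution(z0: int, two_n: int, generations: int) -> List[float]:
    """Exact distribution of Z_t under neutral drift (g(x) = x), by forward iteration."""
    rows = [binomial_pmf_row(two_n, i / two_n) for i in range(two_n + 1)]  # transition matrix
    dist = [0.0] * (two_n + 1)
    dist[z0] = 1.0
    for _ in range(generations):
        new = [0.0] * (two_n + 1)
        for i, mass in enumerate(dist):
            if mass > 0.0:
                row = rows[i]
                for j in range(two_n + 1):
                    new[j] += mass * row[j]
        dist = new
    return dist

dist = exact_wf_distribution(z0=30, two_n=100, generations=20)
print(f"P(loss) = {dist[0]:.4f}, P(fixation) = {dist[-1]:.2e}")
```

The cost grows with the square of $2N$ per generation, which is why this exact computation is only practical for small populations.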
Inference of divergence times: Drift only
› Simulated data
  › 5000 independent loci
  › 100 samples in each population
  › 50 data sets (replicates)
› The allele frequency distribution is used to calculate the likelihood of the data
  › The likelihood is numerically optimized
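Once the allele frequency distribution at sampling time is available as a Beta with spikes, the probability of observing $k$ copies of the allele among $n$ sampled gene copies has a closed form: the spikes contribute only to $k = 0$ and $k = n$, and the Beta part gives a beta-binomial term. A sketch for a single population, assuming independent loci and binomial sampling; the function names are hypothetical. Inferring divergence times would wrap such a log-likelihood in a numerical optimizer over the time parameters that determine $P_0$, $P_1$, $\alpha$ and $\beta$.

```python
import math
from typing import Sequence

def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def sample_pmf(k: int, n: int, p0: float, p1: float, alpha: float, beta: float) -> float:
    """P(K = k) when K | X ~ Binomial(n, X) and X is a Beta with spikes at 0 and 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    beta_binomial = math.exp(log_choose + log_beta(k + alpha, n - k + beta)
                             - log_beta(alpha, beta))
    spikes = (p0 if k == 0 else 0.0) + (p1 if k == n else 0.0)
    return spikes + (1.0 - p0 - p1) * beta_binomial

def log_likelihood(counts: Sequence[int], n: int, p0: float, p1: float,
                   alpha: float, beta: float) -> float:
    """Log-likelihood of allele counts at independent loci, n sampled gene copies each."""
    return sum(math.log(sample_pmf(k, n, p0, p1, alpha, beta)) for k in counts)

# Hypothetical counts at four loci, 10 sampled gene copies per locus.
print(log_likelihood([0, 3, 7, 10], n=10, p0=0.1, p1=0.05, alpha=2.0, beta=3.0))
```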
Inference of divergence times: Drift only
Conclusions
› Beta with spikes
  › An extension built on the beta approximation
  › Improves the quality of the approximation
  › Simple mathematical formulation
  › Works under linear evolutionary forces
  › Comparable to state-of-the-art methods for inference of divergence times
  › Recursive formulation enables incorporation of variable population size
Future work
› Incorporate selection
  › Non-linear evolutionary force
  › Positive selection increases the probability of fixation
  › Mean and variance are no longer available in closed form
  › Extend the approximation for loss/fixation probabilities to the mean and variance
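Why selection breaks the closed-form recursion can be seen numerically. Under a commonly used genic (haploid) selection model, assumed here for illustration since the slides do not give a parametrization, $g(x) = (1+s)x/(1+sx)$, which is non-linear in $x$. Then $E[g(X_t)]$ depends on the whole distribution of $X_t$, not just on its mean; for $s > 0$, $g$ is concave, so $E[g(X)] < g(E[X])$. The sketch integrates $g$ against a Beta density with the midpoint rule; the function names and parameter values are hypothetical.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))

def g_selection(x: float, s: float) -> float:
    """Allele frequency after genic selection with coefficient s (assumed model)."""
    return (1.0 + s) * x / (1.0 + s * x)

def expected_g(a: float, b: float, s: float, steps: int = 20000) -> float:
    """E[g(X)] for X ~ Beta(a, b) by the midpoint rule (assumes a, b >= 1, bounded density)."""
    h = 1.0 / steps
    return h * sum(g_selection((i + 0.5) * h, s) * beta_pdf((i + 0.5) * h, a, b)
                   for i in range(steps))

a, b, s = 3.0, 7.0, 0.1
mean = a / (a + b)
print(f"E[g(X)] = {expected_g(a, b, s):.5f}, g(E[X]) = {g_selection(mean, s):.5f}")
```

The gap between the two numbers is what a moment-based recursion would have to approximate at every generation once selection is included.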