Post on 30-Dec-2015
Recovering Migration Rates Using a Deterministic Approach.
Applied to Human Data.
Yosef E. Maruvka1*, Nadav M. Shnerb1, Yaneer Bar-Yam2, John Wakeley3
1. Department of Physics, Bar-Ilan University.
* http://yosi.maruvka.googlepages.com/
2. New England Complex Systems Institute, Boston
3. Department of Organismic & Evolutionary Biology, Harvard University
Coalescence Theory
• One deme (well mixed) model.
• Fixed mutation rate determines the time scale
• Genetics has no effect on fitness (use non-coding DNA)
• Population is well-mixed – no spatial structure
• The real history yields a phylogenetic tree
• Dashed lines – lineages with no current descendants
• Full lines – lineages expressed in current polymorphism data
• “backward in time”: coalescence model
• Haploid mitochondrial DNA.
• Wright Fisher Model.
Master Equation
$P_n(t)$: The probability to have $n$ lineages at time $t$ in the past.
The time dependence of $P_n(t)$ is given by the equation:
$$\frac{dP_n(t)}{dt} = \frac{(n+1)\,n}{2N_0}\,P_{n+1}(t) - \frac{n(n-1)}{2N_0}\,P_n(t)$$
$n \ll N_0$
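As a rough illustration of the master equation $dP_n/dt = \frac{(n+1)n}{2N_0}P_{n+1} - \frac{n(n-1)}{2N_0}P_n$, the sketch below integrates it with a plain forward-Euler step. The function names, the step size and the example values `n0=10`, `N0=100` are assumptions made for this example and do not come from the slides.

```python
def master_equation_step(P, N0, dt):
    """One forward-Euler step of the coalescent master equation.

    P[n] is the probability of having n lineages (index 0 is unused).
    """
    n_max = len(P) - 1
    new_P = list(P)
    for n in range(1, n_max + 1):
        # Gain: a coalescence among n+1 lineages leaves n lineages.
        gain = (n + 1) * n / (2.0 * N0) * P[n + 1] if n < n_max else 0.0
        # Loss: a coalescence among the n current lineages.
        loss = n * (n - 1) / (2.0 * N0) * P[n]
        new_P[n] = P[n] + dt * (gain - loss)
    return new_P


def evolve(n0, N0, t_end, dt=0.01):
    """Start with exactly n0 lineages at t = 0 and integrate to t_end."""
    P = [0.0] * (n0 + 1)
    P[n0] = 1.0
    for _ in range(int(round(t_end / dt))):
        P = master_equation_step(P, N0, dt)
    return P


def mean_lineages(P):
    """Expected number of lineages, sum over n of n * P[n]."""
    return sum(n * p for n, p in enumerate(P))


P = evolve(n0=10, N0=100, t_end=50.0)
print(sum(P), mean_lineages(P))
```

The gain and loss terms cancel pairwise when summed over n, so total probability is conserved up to floating-point error.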
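Replacing Stochastic Description With Deterministic Description
$$\langle n(t)\rangle = \sum_n n\,P_n(t)$$
$$\frac{d\langle n(t)\rangle}{dt} = -\frac{\langle n(n-1)\rangle}{2N_0} = -\frac{\langle n^2\rangle - \langle n\rangle}{2N_0} \approx -\frac{\langle n^2\rangle}{2N_0}$$
This replacement is valid when $\langle n^2(t)\rangle \approx \langle n(t)\rangle^2$
Deterministic ODE:
$$\frac{dn(t)}{dt} = -\frac{n(t)^2}{2N_0}$$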
Time Dependence of Number of Lineages
Number of lineages as function of time:
$$n(t) = \frac{2N_0}{t + 2N_0/n_0}$$
Time to most recent common ancestor:
$$T_{MRCA} = 2N_0\left(1 - \frac{1}{n_0}\right)$$
[Figure: average n(t) for 50 realizations, n(t) versus t on logarithmic axes]
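The comparison between the averaged realizations and the deterministic curve $n(t) = 2N_0/(t + 2N_0/n_0)$ can be sketched as follows. This is a minimal illustration, assuming exponential waiting times with rate $n(n-1)/(2N_0)$ between coalescence events; the function names, the seed and the example values are hypothetical and are not the authors' simulation code.

```python
import random


def n_deterministic(t, n0, N0):
    """Deterministic number of lineages at time t."""
    return 2.0 * N0 / (t + 2.0 * N0 / n0)


def t_mrca(n0, N0):
    """Time at which the deterministic curve reaches a single lineage."""
    return 2.0 * N0 * (1.0 - 1.0 / n0)


def simulate_lineages(t, n0, N0, rng):
    """Number of lineages left at time t in one coalescent realization."""
    n, clock = n0, 0.0
    while n > 1:
        rate = n * (n - 1) / (2.0 * N0)
        clock += rng.expovariate(rate)  # waiting time to the next coalescence
        if clock > t:
            break
        n -= 1
    return n


def average_lineages(t, n0, N0, realizations=50, seed=1):
    """Average of simulate_lineages over independent realizations."""
    rng = random.Random(seed)
    total = sum(simulate_lineages(t, n0, N0, rng) for _ in range(realizations))
    return total / realizations


print(average_lineages(100.0, 200, 1000), n_deterministic(100.0, 200, 1000))
```

The deterministic curve drops the $-\langle n\rangle$ term, so for small n it is expected to sit slightly below the simulated average.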
Population Size Estimation
Fitting the simulation to the formula gives a very good estimate of the population size.
$$n(t) = \frac{2N}{t + 2N/n_0}$$
For 40 realizations we get this estimate of the population size:
In the common notation: $2N_0\mu = 0.0201 \pm 0.0029$
$N = 100450 \pm 1446$
$N = 100500 \pm 350$
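One simple way to fit $n(t) = 2N/(t + 2N/n_0)$ is to note that $1/n(t) = 1/n_0 + t/(2N)$ is a straight line in $t$ with slope $1/(2N)$. The sketch below uses an ordinary least-squares line for that. The linearization and the function name are choices made for this example; the slides do not say which fitting procedure was actually used.

```python
def estimate_population_size(times, lineages):
    """Least-squares line through (t, 1/n); returns the estimates (N, n0)."""
    y = [1.0 / n for n in lineages]
    k = len(times)
    t_mean = sum(times) / k
    y_mean = sum(y) / k
    cov = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    var = sum((t - t_mean) ** 2 for t in times)
    slope = cov / var                  # equals 1 / (2N)
    intercept = y_mean - slope * t_mean  # equals 1 / n0
    return 1.0 / (2.0 * slope), 1.0 / intercept


# Synthetic, noise-free data generated from the formula itself.
true_N, true_n0 = 100500.0, 1000.0
times = [100.0 * k for k in range(1, 51)]
lineages = [2.0 * true_N / (t + 2.0 * true_N / true_n0) for t in times]
print(estimate_population_size(times, lineages))
```

With simulated rather than synthetic data, the same call would be applied to the averaged n(t) of the realizations.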
Mean Field Approximation for Two Demes
Number of lineages as function of time:
$$\frac{dn_1}{dt} = -\frac{n_1^2}{2N_1} - m_1 n_1 + m_2 n_2$$
$$\frac{dn_2}{dt} = -\frac{n_2^2}{2N_2} + m_1 n_1 - m_2 n_2$$
Backward in time process
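The two-deme equations $dn_1/dt = -n_1^2/(2N_1) - m_1 n_1 + m_2 n_2$ and $dn_2/dt = -n_2^2/(2N_2) + m_1 n_1 - m_2 n_2$ have no simple closed form, but they are cheap to integrate numerically. The following is a minimal sketch using a midpoint (second-order Runge-Kutta) step; the integrator, step size and function names are assumptions of this example.

```python
def two_deme_rhs(n1, n2, N1, N2, m1, m2):
    """Time derivatives (dn1/dt, dn2/dt) of the lineage counts."""
    dn1 = -n1 * n1 / (2.0 * N1) - m1 * n1 + m2 * n2
    dn2 = -n2 * n2 / (2.0 * N2) + m1 * n1 - m2 * n2
    return dn1, dn2


def integrate_two_demes(n1, n2, N1, N2, m1, m2, t_end, dt=0.01):
    """Midpoint integration of the lineage counts from t = 0 to t_end."""
    for _ in range(int(round(t_end / dt))):
        k1 = two_deme_rhs(n1, n2, N1, N2, m1, m2)
        mid1 = n1 + 0.5 * dt * k1[0]
        mid2 = n2 + 0.5 * dt * k1[1]
        k2 = two_deme_rhs(mid1, mid2, N1, N2, m1, m2)
        n1 += dt * k2[0]
        n2 += dt * k2[1]
    return n1, n2


# All 100 sampled lineages start in deme 1; some migrate (backward) to deme 2.
print(integrate_two_demes(100.0, 0.0, 2500, 2500, 0.005, 0.01, 100.0))
```

Setting both migration rates to zero decouples the demes, and each one then follows the single-deme curve, which gives a quick consistency check.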
Uniqueness – External Branch Length
Probability to not coalesce:
$$\dot{Q}(t) = -\frac{Q(t)\,n(t)}{N_0}$$
Uniqueness, probability to coalesce at time t:
$$U(t) = -\frac{dQ}{dt} = \frac{8\,n_0 N_0^2}{(n_0 t + 2N_0)^3}$$
[Figure: panels labeled U=1, U=2, U=4]
Erik M. Rauch and Yaneer Bar-Yam. Nature 431, 449-452 (2004)
Caliebe, A., et al. On the Length Distribution of External Branches in Coalescence Trees: Genetic Diversity within Species. TPB, 72 (2007), 245-252
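Integrating $dQ/dt = -Q\,n(t)/N_0$ with $n(t) = 2N_0/(t + 2N_0/n_0)$ and $Q(0) = 1$ gives $Q(t) = \left(2N_0/(n_0 t + 2N_0)\right)^2$, and differentiating yields $U(t) = 8 n_0 N_0^2/(n_0 t + 2N_0)^3$. The short sketch below evaluates both and checks numerically that $U = -dQ/dt$; the function names and example values are hypothetical.

```python
def survival_probability(t, n0, N0):
    """Q(t): probability that a sampled lineage has not coalesced by time t."""
    return (2.0 * N0 / (n0 * t + 2.0 * N0)) ** 2


def uniqueness_density(t, n0, N0):
    """U(t) = -dQ/dt: probability density of coalescing at time t."""
    return 8.0 * n0 * N0 ** 2 / (n0 * t + 2.0 * N0) ** 3


# Central finite difference of Q compared with the closed form of U.
n0, N0, t, h = 100, 1000, 50.0, 1e-3
finite_difference = -(survival_probability(t + h, n0, N0)
                      - survival_probability(t - h, n0, N0)) / (2.0 * h)
print(finite_difference, uniqueness_density(t, n0, N0))
```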
Rate equation for the probability not to coalesce until time t:
$$\frac{dQ_1(t)}{dt} = -m_1 Q_1(t) + m_2 Q_2(t) - \frac{n_1(t)\,Q_1(t)}{N_1}$$
$$\frac{dQ_2(t)}{dt} = m_1 Q_1(t) - m_2 Q_2(t) - \frac{n_2(t)\,Q_2(t)}{N_2}$$
$Q_i(t)$: The probability to not coalesce (survive) with any of the individuals from the other deme until time t.
Relative Uniqueness
$$Q(t) = Q_1(t) + Q_2(t)$$
$$P_R(t) = -\frac{dQ(t)}{dt}$$
Relative Uniqueness: The probability that an individual sampled from deme i coalesced at time t in the past with any of the n individuals sampled at deme j.
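A possible way to evaluate the relative uniqueness numerically is to integrate the lineage counts $n_1, n_2$ together with the survival probabilities $Q_1, Q_2$ and record $P_R(t) = -d(Q_1 + Q_2)/dt$ along the way. The sketch assumes the focal individual is sampled in deme 1 ($Q_1(0)=1$, $Q_2(0)=0$) and the $n$ other individuals in deme 2 ($n_1(0)=0$, $n_2(0)=n$); these initial conditions, the forward-Euler scheme and the function name are assumptions of this example, not statements from the slides.

```python
def relative_uniqueness(n_sample, N1, N2, m1, m2, t_end, dt=0.05):
    """Return (P_R values on the time grid, surviving probability at t_end)."""
    n1, n2 = 0.0, float(n_sample)   # sampled lineages start in deme 2
    q1, q2 = 1.0, 0.0               # focal lineage starts in deme 1
    p_r = []
    for _ in range(int(round(t_end / dt))):
        # P_R = -d(Q1 + Q2)/dt; the migration terms cancel in the sum.
        p_r.append(n1 * q1 / N1 + n2 * q2 / N2)
        dn1 = -n1 * n1 / (2.0 * N1) - m1 * n1 + m2 * n2
        dn2 = -n2 * n2 / (2.0 * N2) + m1 * n1 - m2 * n2
        dq1 = -m1 * q1 + m2 * q2 - n1 * q1 / N1
        dq2 = m1 * q1 - m2 * q2 - n2 * q2 / N2
        n1, n2 = n1 + dt * dn1, n2 + dt * dn2
        q1, q2 = q1 + dt * dq1, q2 + dt * dq2
    return p_r, q1 + q2


p_r, q_left = relative_uniqueness(50, 2500, 2500, 0.005, 0.01, 2000.0)
print(max(p_r), q_left)
```

Under these initial conditions $P_R(0) = 0$, because the focal lineage and the sampled lineages start in different demes and must first be brought together by migration.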
Recovering Demographic Parameters
Using the previous equation for the relative uniqueness, we recovered the parameters of the simulation using the best fit.
Real parameters: $N_1 = N_2 = 2500$, $m_1 = 0.005$, $m_2 = 0.01$
Best fit results: $N_1 = 2400 \pm 100$, $N_2 = 2700 \pm 250$, $m_1 = 0.0038 \pm 0.005$, $m_2 = 0.007 \pm 0.002$
Recovering Growth Rate of a Population
The time dependence of the number of lineages for a population with growth rate $\gamma$:
$$\frac{dn(t)}{dt} = -\frac{n(t)^2}{2N_0 e^{-\gamma t}}$$
The number of lineages as a function of time (backward):
$$n(t) = \frac{2\gamma N_0 n_0}{n_0 e^{\gamma t} - n_0 + 2\gamma N_0}$$
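For a population whose size shrinks backward in time as $N_0 e^{-\gamma t}$, the lineage equation $dn/dt = -n^2 e^{\gamma t}/(2N_0)$ integrates to $n(t) = 2\gamma N_0 n_0 / (n_0(e^{\gamma t} - 1) + 2\gamma N_0)$. The sketch below evaluates this closed form and compares it with the constant-size curve in the limit of a very small growth rate. The symbol `gamma` for the growth rate and the function names are labels chosen for this example.

```python
import math


def n_growing(t, n0, N0, gamma):
    """Number of lineages at time t for a population growing at rate gamma."""
    # math.expm1 keeps exp(gamma * t) - 1 accurate when gamma * t is tiny.
    return 2.0 * gamma * N0 * n0 / (n0 * math.expm1(gamma * t) + 2.0 * gamma * N0)


def n_constant(t, n0, N0):
    """Constant-size result, the gamma -> 0 limit of n_growing."""
    return 2.0 * N0 / (t + 2.0 * N0 / n0)


print(n_growing(200.0, 100, 10000, 0.0056), n_constant(200.0, 100, 10000))
```

Growth forward in time means a smaller population in the past, so lineages coalesce faster and `n_growing` lies below `n_constant` for t > 0.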
Recovering Growth Rate of a Population
[Figure: Recovering growth rate.]
Growth estimation: 0.0054 ± 0.0003
Real growth rate: 0.0056
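One way such a growth-rate estimate could be obtained is sketched below. It relies on the identity $1/n(t) - 1/n_0 = (e^{\gamma t} - 1)/(2\gamma N_0)$, which is linear in $1/N_0$ once $\gamma$ is fixed, so a scan over candidate growth rates with a one-parameter least-squares fit at each candidate is enough. The grid scan, the function name and the synthetic data are assumptions of this example; the slides do not describe the fitting procedure that produced the numbers above.

```python
import math


def fit_growth_rate(times, lineages, n0, gammas):
    """Scan candidate growth rates; return the best (gamma, N0) pair."""
    y = [1.0 / n - 1.0 / n0 for n in lineages]
    best = None
    for gamma in gammas:
        x = [math.expm1(gamma * t) / (2.0 * gamma) for t in times]
        # Least-squares slope through the origin: y ~ (1 / N0) * x.
        inv_N0 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        sse = sum((b - inv_N0 * a) ** 2 for a, b in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, gamma, 1.0 / inv_N0)
    return best[1], best[2]


# Synthetic, noise-free data from the closed-form n(t) with gamma = 0.0056.
n0, N0, gamma_true = 100.0, 10000.0, 0.0056
times = [10.0 * k for k in range(1, 51)]
lineages = [2.0 * gamma_true * N0 * n0
            / (n0 * math.expm1(gamma_true * t) + 2.0 * gamma_true * N0)
            for t in times]
candidates = [0.0040 + 0.0001 * k for k in range(31)]
print(fit_growth_rate(times, lineages, n0, candidates))
```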
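Migration Rate Between Two Growing Populations
Number of lineages as a function of time:
$$\frac{dn_1}{dt} = -\frac{n_1^2}{2N_1 e^{-\gamma_1 t}} - m_1 n_1 + m_2 n_2$$
$$\frac{dn_2}{dt} = -\frac{n_2^2}{2N_2 e^{-\gamma_2 t}} + m_1 n_1 - m_2 n_2$$
The probability to not coalesce, as function of time:
$$\frac{dQ_1(t)}{dt} = -m_1 Q_1(t) + m_2 Q_2(t) - \frac{n_1(t)\,Q_1(t)}{N_1 e^{-\gamma_1 t}}$$
$$\frac{dQ_2(t)}{dt} = m_1 Q_1(t) - m_2 Q_2(t) - \frac{n_2(t)\,Q_2(t)}{N_2 e^{-\gamma_2 t}}$$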
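The growing-population system can be integrated in the same way as the fixed-size one, with each deme size replaced by a size that shrinks exponentially backward in time. The sketch below assumes deme sizes of the form $N_i e^{-g_i t}$ with per-deme growth rates `g1`, `g2`, a forward-Euler scheme, and a focal lineage starting in deme 1 with the other sampled lineages in deme 2; all of these are assumptions of this example rather than details given on the slides.

```python
import math


def growing_rhs(t, state, N1, N2, m1, m2, g1, g2):
    """Derivatives of (n1, n2, Q1, Q2) with deme sizes shrinking backward in time."""
    n1, n2, q1, q2 = state
    size1 = N1 * math.exp(-g1 * t)
    size2 = N2 * math.exp(-g2 * t)
    dn1 = -n1 * n1 / (2.0 * size1) - m1 * n1 + m2 * n2
    dn2 = -n2 * n2 / (2.0 * size2) + m1 * n1 - m2 * n2
    dq1 = -m1 * q1 + m2 * q2 - n1 * q1 / size1
    dq2 = m1 * q1 - m2 * q2 - n2 * q2 / size2
    return dn1, dn2, dq1, dq2


def integrate_growing(state, N1, N2, m1, m2, g1, g2, t_end, dt=0.01):
    """Forward-Euler integration from t = 0 to t_end."""
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        deriv = growing_rhs(t, state, N1, N2, m1, m2, g1, g2)
        state = tuple(s + dt * d for s, d in zip(state, deriv))
        t += dt
    return state


start = (0.0, 50.0, 1.0, 0.0)  # (n1, n2, Q1, Q2) at t = 0
print(integrate_growing(start, 2500, 2500, 0.005, 0.01, 0.005, 0.005, 200.0))
```

Setting `g1 = g2 = 0` recovers the fixed-population equations, which makes it easy to compare the two cases with the same code.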
Migration between India & China
The migration from India to China was much stronger than the migration from China to India during the years [-50 ky, -2 ky].
Estimated migration rates:
• India -> China 0.01 i/g
• China -> India 0.001 i/g
Data obtained from www.hvrbase.org. Nucleic Acids Res. 1998 Jan 1;26(1):126-9. Distance calculated using DNAdist, Kimura 2-parameter model.
Advantages
• Easy calculations.
• Very fast estimations.
• Can handle large amounts of data.
• No need for a prior distribution.
Summary
• Moved from a stochastic process to a deterministic process
• Demonstrated this method for one deme
• Used the Relative Uniqueness to estimate migration rates for fixed populations
• Expanded to include growing populations
• Applied this method to estimate migration rates between China and India