Coalescent Theory & Population Genetics applications
Frantz Depaulis
[email protected]
www.biologie.ens.fr/eceem/frantz_depaulis
Pwd: M1ENSCOAL
Laboratoire Ecologie et Evolution, CNRS-UMR 7625, Université Paris 6 - ENS, room 426
Outline
• Reminder
• Coalescence
• Mutations/models
• Perturbations, applications
Neutral theory of molecular evolution (Kimura 1969):
the neutral model as a reference
Mutations       Occurrence   Fate
advantageous    rare         fixed
deleterious     frequent     eliminated
neutral         frequent     drift until fixation or loss
-reminder-
Wright-Fisher neutral model
Assumptions
• Selective neutrality (Ne s << 1)
• Demography
  - Isolated panmictic population
  - Constant size N
  - Poisson distribution of offspring, P(1)
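The assumptions above can be illustrated with a minimal forward simulation, offered as a sketch rather than as the course's own material. It follows one neutral allele in a population of constant size until it is lost or fixed. The function name `wright_fisher_fate`, the value of `n_copies` and the number of replicates are assumptions chosen for illustration.

```python
import random

def wright_fisher_fate(n_copies, start_count, rng):
    """Follow one neutral allele until loss (0) or fixation (n_copies)."""
    count = start_count
    generations = 0
    while 0 < count < n_copies:
        freq = count / n_copies
        # Each gene copy of the next generation picks its parent at random
        # (panmixia, constant size, no selection).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        generations += 1
    return count, generations

rng = random.Random(1)
n_copies = 40        # 2N gene copies (hypothetical value)
replicates = 2000    # hypothetical number of replicates
fixed = sum(
    1 for _ in range(replicates)
    if wright_fisher_fate(n_copies, 1, rng)[0] == n_copies
)
print("fraction fixed:", fixed / replicates, "neutral expectation:", 1 / n_copies)
```

A new neutral mutation starts as one copy among 2N, so it should reach fixation in roughly a fraction 1/2N of the replicates and be lost by drift in the others.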
-reminder-
Effective size: definition
• The effective size (written Ne) of a population is defined as the size of an « ideal » Wright-Fisher population in which genetic drift would have the same intensity (**) as in the population (or the population model) of interest
** same rate of drift, same increase in inbreeding, same increase in the variance of allele frequencies among populations, etc.
-reminder-
Relationships among 10 individuals over 15 generations
[figure: individuals (# genes) against generations]
-Coalescence-
The same, but permuted
[figure: individuals (# genes) against generations]
Descent and extinction of a lineage
[figure: individuals (# genes) against generations]
Genealogy of 3 individuals: look at the drift process « going backward in time » to the common ancestor of a sample of genes
[figure: individuals (# genes) against generations]
Genealogy of a gene sample
[figure: gene sample; ancestral lineages; coalescence = common ancestor; most recent common ancestor (MRCA)]
Kingman (1980, 1982)
Coalescent tree
[figure: sample of "genes" / of individuals (a b c d e f); common ancestors (CA); most recent common ancestor of the sample (MRCA); neutral mutations placed on the branches]
Coalescent times
2 genes: $p_2 = P(\text{common ancestry at } t-1) = \frac{1}{2N}$
n genes: $p_n = P(\text{common ancestry at } t-1) = \frac{n(n-1)}{2} \times \frac{1}{2N}$
$P(\text{common ancestry } t \text{ generations ago}) = (1-p)^{t-1}\, p$: geometric distribution (discrete generations)
$\approx p\, e^{-pt}$: exponential distribution (continuous time), for p small and t large
[figure: a gene at generation t picks its parent among the 2N genes (1 2 3 4 5 ... 2N) of generation t-1, each with probability 1/2N]
Diploid (2n) => 2N genes
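As a quick numerical check of the geometric and exponential forms above, here is a small sketch. The population size `N`, the sample size `n` and the function names are assumptions made for the example.

```python
import math

def geometric_pmf(t, p):
    """P(common ancestry exactly t generations ago), discrete generations."""
    return (1 - p) ** (t - 1) * p

def exponential_density(t, p):
    """Continuous-time approximation, valid for p small and t large."""
    return p * math.exp(-p * t)

N = 1000                     # diploid population size (hypothetical value)
p2 = 1 / (2 * N)             # two genes
n = 6                        # sample size (hypothetical value), n << N
pn = n * (n - 1) / 2 * p2    # any pair among n genes

for t in (1, 100, 2000, 10000):
    print(t, geometric_pmf(t, p2), exponential_density(t, p2))

mean_two = 1 / p2   # mean coalescence time of two genes: 2N generations
mean_n = 1 / pn     # mean time to the first coalescence among n genes
print(mean_two, mean_n)
```

With N = 1000 the two curves are already nearly identical at t = 2000, and the mean waiting time for two genes is 2N generations.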
Inbreeding effective size
• In the Wright-Fisher model, the probability of coalescence in one generation equals 1/N (for N gene copies, i.e. 1/2N for N diploid individuals). An effective size can therefore be defined as the inverse of the probability of coalescence in one generation (Ewens, 1982). This is an inbreeding effective size, and it is instantaneous.
• In contrast, an asymptotic effective size can be defined, which describes the rate of coalescence of gene lineages in the distant past (equal to 1/N in the Wright-Fisher model)
This asymptotic effective size (of inbreeding, of coalescence) is equivalent to the eigenvalue effective size, which rests on a complete description of allele frequencies in a population (the eigenvalue of the matrix describing the changes in frequency determines the approach to equilibrium)
-Coalescence-
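Constructing coalescents
1°) Ages of the nodes
[figure: tree over the sample a b c d e f with successive coalescence intervals t1, t2, t3, t4, t5; each interval is drawn from Exp(p), with p = 1/2N per pair of genes]
additional assumption: n << N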
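Step 1°) can be sketched as follows, assuming time is measured in units of 2N generations, so that each pair of lineages coalesces at rate 1 and k lineages coalesce at rate k(k-1)/2. The function names are illustrative, not from the slides.

```python
import random
from itertools import accumulate

def coalescence_intervals(n, rng):
    """Waiting times while there are n, n-1, ..., 2 lineages (units of 2N generations)."""
    intervals = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2      # number of pairs among k lineages
        intervals.append(rng.expovariate(rate))
    return intervals

def expected_tmrca(n):
    """Sum of the expected intervals, which equals 2 * (1 - 1/n)."""
    return sum(2 / (k * (k - 1)) for k in range(2, n + 1))

rng = random.Random(7)
intervals = coalescence_intervals(6, rng)     # five intervals for six genes
node_ages = list(accumulate(intervals))       # ages of the five internal nodes
print(node_ages)
print("expected age of the MRCA:", expected_tmrca(6))
```

The last interval, when only two lineages remain, is on average the longest: it accounts for about half of the expected age of the MRCA.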
-Coalescence-
Constructing-deconstructing coalescents
2°) Topology of the tree
[figure: gene sample a b c d e f joined through common ancestors (CA) up to the MRCA, over the intervals t1, t2, t3, t4, t5]
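Step 2°) can be sketched by joining two lineages chosen at random at each coalescence, since under neutrality every pair is equally likely to coalesce. The nested-tuple representation and the function names are assumptions made for the example.

```python
import random

def random_topology(labels, rng):
    """Join random pairs of lineages until a single ancestor (the MRCA) remains."""
    nodes = list(labels)
    while len(nodes) > 1:
        i, j = rng.sample(range(len(nodes)), 2)   # every pair is equally likely
        left, right = nodes[i], nodes[j]
        nodes = [x for idx, x in enumerate(nodes) if idx not in (i, j)]
        nodes.append((left, right))               # their common ancestor (CA)
    return nodes[0]

def leaves(tree):
    """List the sampled genes found below a node."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

tree = random_topology("abcdef", random.Random(3))
print(tree)
```

Combining such a topology with independently drawn node ages gives one realisation of the coalescent tree.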
-Coalescence-
Hudson’s animator: http://www.coalescent.dk/
Mutations and associated models
-Mutations-
Constructing-deconstructing coalescents
2°) Topology of the tree
3°) uniform distribution of mutations
[figure: gene sample a b c d e f, common ancestors (CA) and MRCA over the intervals t1 to t5, with neutral mutations placed on the branches]
S ~ P(T_Tot × θ), with θ = 4Neµ
Repeated 100 000 times: neutral distribution of sequence polymorphism
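Step 3°) can be sketched as below, under a stated scaling assumption: time is counted in units of 2N generations, so the number of mutations on the whole tree is Poisson with mean (θ/2) × T_Tot (the product T_Tot × θ corresponds to time counted in units of 4N generations). The helper names, the value of θ and the number of replicates are illustrative choices.

```python
import random

def poisson(lam, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_s(n, theta, rng):
    """Number of segregating sites S on one simulated coalescent tree."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # k lineages each contribute the waiting time to the total branch length.
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    return poisson(theta / 2 * total_length, rng)

def expected_s(n, theta):
    return theta * sum(1 / i for i in range(1, n))

rng = random.Random(42)
n, theta = 6, 5.0          # hypothetical sample size and theta
replicates = 20000         # fewer than the 100 000 of the slide, for speed
mean_s = sum(simulate_s(n, theta, rng) for _ in range(replicates)) / replicates
print("simulated mean S:", mean_s, "expected:", expected_s(n, theta))
```

The simulated mean should approach θ × (1 + 1/2 + ... + 1/(n-1)), the quantity that Watterson's estimator inverts.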
• Mutational, sequence data: infinite site model (ISM)
  - No recombination
  - Independent mutations
  - Constant mutation rate µ, along the sequence and across time
  - Each mutation affects a new nucleotide site
Infinite(ly many) site model: sequences, SNPs
-Mutations-
Alignment of polymorphic sites:
GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT
f: 1 2 1 5 3 1 4 1 6 1 2 1 4 2 3
n = 7
S = 15
-Mutations-
Frequency spectrum of mutations
[figure: number of polymorphic sites against fi, the number of occurrences in a sample; neutral predictions]
-Mutations-
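$\theta = 4N_e\mu$
Tajima (1983) *: $\hat{\theta}_\pi = \sum_{i=1}^{S} \frac{2 f_i (n - f_i)}{n(n-1)} = \frac{2}{n(n-1)} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} d_{jk}$ (heterozygosity; $d_{jk}$ is the number of differences between sequences j and k)
Watterson’s (1975) estimator: $\hat{\theta}_S = S \Big/ \sum_{i=1}^{n-1} \frac{1}{i}$
Neutral prediction: the expected number of polymorphic sites with i occurrences in the sample is proportional to 1/i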
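Both estimators can be computed directly from the seven-sequence alignment of polymorphic sites shown earlier (n = 7, S = 15). This is a minimal sketch; the function names are illustrative.

```python
from itertools import combinations

ALIGNMENT = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]

def segregating_sites(seqs):
    """S: number of columns showing more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson_theta(seqs):
    """S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n

def pairwise_theta(seqs):
    """Mean number of differences d_jk over all pairs of sequences."""
    n = len(seqs)
    total = sum(
        sum(1 for x, y in zip(a, b) if x != y)
        for a, b in combinations(seqs, 2)
    )
    return 2 * total / (n * (n - 1))

print("S =", segregating_sites(ALIGNMENT))
print("Watterson:", round(watterson_theta(ALIGNMENT), 3))
print("Tajima:", round(pairwise_theta(ALIGNMENT), 3))
```

On this alignment the two estimates are close to each other (about 6.12 and 6.19), which is what the neutral model predicts on average.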
Alignment of polymorphic sites: non oriented mutations
[alignment of the same seven sequences; n = 7, S = 15]
Without orientation, a C/T difference can be read as C→T or as T→C, and the count at that site changes accordingly:
f: 1 2 1 5 3 1 4 1 6 6 2 1 4 2 3
f: 1 2 1 5 3 1 4 1 6 1 2 1 4 2 3
-Mutations-
Frequency spectrum of mutations
[figure: number of polymorphic sites against fi, the number of occurrences in a sample]
-Mutations-
Alignment of polymorphic sites: orienting mutations with an outgroup
[alignment of the same seven sequences, with the outgroup sequence added]
o.g. GCGCGCGAACCCATT
C→T: f: 1 2 1 5 3 1 4 1 6 1 2 1 4 2 3
T→C: f: 1 2 1 5 3 1 4 1 6 6 2 1 4 2 3
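Orientation with an outgroup can be sketched as follows: at each polymorphic site the outgroup base is taken as ancestral, and f counts the sampled sequences that carry a different (derived) base. The function name is illustrative; the sequences are those of the alignment of polymorphic sites (n = 7, S = 15) and the outgroup of the slide.

```python
from collections import Counter

ALIGNMENT = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]
OUTGROUP = "GCGCGCGAACCCATT"

def derived_counts(seqs, outgroup):
    """For each polymorphic site, count sequences differing from the outgroup base."""
    counts = []
    for column, ancestral in zip(zip(*seqs), outgroup):
        if len(set(column)) > 1:
            counts.append(sum(1 for base in column if base != ancestral))
    return counts

f = derived_counts(ALIGNMENT, OUTGROUP)
print("f:", f)
print("spectrum:", sorted(Counter(f).items()))
```

Replacing the outgroup base at the tenth site (C) by T turns the count at that site from 1 into 6, which is the effect of orienting the same mutation the other way.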
-Mutations-
Infinite(ly many) allele model: allozymes, one site (locus) (microsatellites)
• Each mutation gives rise to a new type/allele on a locus
[figure: genealogy with mutations, each one creating a new allele]
-Mutations-
Stepwise mutation model
The “stepwise mutation” model (SMM) is appropriate for microsatellites. When a mutation occurs, the length of the new allele depends on the existing length. In the simplest case, the “single-step” SMM illustrated on the next slide, new length = old length +/- 1.
-Mutations-
Microsatellites
GAGGCGTAGTAGTAGTAGTAGTAGTAGGCTCTA
mutates to
GAGGCGTAGTAGTAGTAGTAGTAGGCTCTA
or
GAGGCGTAGTAGTAGTAGTAGTAGTAGTAGGCTCTA
• Microsatellites mutate very fast (~1 change every 500 generations)
• Mutation events usually involve a gain or a loss of a single repeat unit
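A minimal sketch of the single-step SMM along one lineage, using the rate of about one change every 500 generations quoted above. The function name, the starting length, the symmetric choice between gain and loss and the lower bound of one repeat are assumptions made for the example.

```python
import random

def stepwise_mutation(length, generations, mu, rng, min_length=1):
    """Follow the repeat number of one lineage under the single-step SMM."""
    history = [length]
    for _ in range(generations):
        if rng.random() < mu:
            step = rng.choice((-1, 1))                 # gain or loss of one repeat
            length = max(min_length, length + step)    # assumed floor: one repeat
        history.append(length)
    return history

rng = random.Random(5)
history = stepwise_mutation(7, 10000, 1 / 500, rng)    # hypothetical start: 7 repeats
print("final repeat number:", history[-1])
print("number of changes:", sum(1 for a, b in zip(history, history[1:]) if a != b))
```

Because successive steps can cancel out, two alleles of the same length are not necessarily identical by descent, which is the main contrast with the infinite allele model.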
Recombination
« simple » case: 2 haplotypes with 2 loci
One possible genealogy outcome
[figure: going into the past, a recombination event followed by two coalescences, giving a separate MRCA for each locus (MRCA1, MRCA2)]
-Perturbations-
Ancestral Recombination Graph: example
coalescence rate: n(n-1)/2
recombination rate: R n
[figure: graph going into the past (time axis) up to the MRCA, with recombination and coalescence events; the number of lineages is marked along the way: n = 2, n = 3, n = 2, n = 2, n = 2 ?]
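The competition between the two rates above can be sketched by following only the number of lineages back in time: each event is a coalescence (one lineage fewer) or a recombination (one lineage more), chosen in proportion to its rate. The function name and the value of R are assumptions, and this sketch does not track which loci each lineage carries.

```python
import random

def arg_lineage_counts(n, R, rng):
    """Return the list of (event, waiting time, lineages after the event)."""
    k = n
    events = []
    while k > 1:
        coal_rate = k * (k - 1) / 2
        rec_rate = R * k
        wait = rng.expovariate(coal_rate + rec_rate)
        if rng.random() < coal_rate / (coal_rate + rec_rate):
            k -= 1
            events.append(("coalescence", wait, k))
        else:
            k += 1
            events.append(("recombination", wait, k))
    return events

events = arg_lineage_counts(2, 1.0, random.Random(11))   # hypothetical R = 1
for name, wait, k in events:
    print(f"{name:13s} after {wait:.3f}  ->  n = {k}")
```

The process always ends, because the coalescence rate grows with the square of the number of lineages while the recombination rate grows only linearly.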
-Perturbations-
Recombination histories: non-ancestral bridges
-Perturbations-
systematic effects
Genealogy | Demographic | Selective
[tree figure] | Extinction-recolonisation, severe bottlenecks, population expansion | severe hitchhiking
[tree figure] | Migration, population structure, moderate bottlenecks | balanced polymorphism, moderate hitchhiking
-Perturbations-
The neutral model, application = inference
Applications: molecular phylogeography, molecular ecology
• Selective effects
  - balancing selection
  - directional selection, hitchhiking
• Demographic effects
  - Dispersal
  - bottlenecks/expansion
  - Distribution of the number of offspring
Tool = the coalescent
Demographic change
[figure: two simulated cases, each shown as a population-size history (population size against time before present, in mutational units) and as the distribution of pairwise differences in number of repeat units (frequency, 0 to 0.4, against 0 to 15 differences; simulation vs. model estimate)]
(a) Constant population size (simulation case 3)
(b) Population expansion, = 3 (simulation case 2)
[fit statistics (SSD and p-values) for each panel and a growth equation in N_t, r and K appear on the slide but are not recoverable from the extraction]
-Perturbations-
Coalescent and bottlenecks
[figure: population size N against time t]
$P_t = \frac{1}{2N_t}$
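With a size that changes through time, the probability that two genes coalesce exactly t generations ago is the product of not coalescing in the more recent generations and of coalescing at t. A minimal sketch, with hypothetical population sizes and an illustrative function name:

```python
def coalescence_time_distribution(sizes):
    """sizes[t-1] is N_t, the diploid size t generations ago.

    Returns the probability of coalescing at each generation, and the
    probability of not having coalesced within the period covered.
    """
    dist = []
    not_yet = 1.0
    for n_t in sizes:
        p_t = 1 / (2 * n_t)
        dist.append(not_yet * p_t)
        not_yet *= 1 - p_t
    return dist, not_yet

constant = [1000] * 50                              # hypothetical sizes
bottleneck = [1000] * 20 + [10] * 5 + [1000] * 25   # 5 generations at N = 10

for name, sizes in (("constant", constant), ("bottleneck", bottleneck)):
    dist, not_yet = coalescence_time_distribution(sizes)
    print(name, "P(coalescence within 50 generations) =", round(1 - not_yet, 4))
```

Coalescences pile up in the generations where N_t is small, which is why a bottleneck shortens the genealogy.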
-Perturbations-
Population structure
[figure: genealogy through time in two populations (Pop 1, Pop 2); lineages coalesce within a population, and a migration event, labelled 1/m, moves a lineage from one population to the other before the final coalescence]
-Perturbations-
Selection: hitch-hiking without recombination
Reference chromosome: ACGTTTATGCAACGTCGAC
1°) an advantageous mutation (*) appears in a population of chromosomes carrying neutral mutations:
_G______A________G_
_G______A________G_
_G_______________G_
_________________G_
______T____________
______T_________C__
______T_____*___C__
2°) Selection
3°) Hitch-hiking effect: the advantageous mutation is fixed and variability is swept; all chromosomes become
______T_____*___C__
-Perturbations-
Hitch-hiking without recombination: genealogy
[figure: a neutral coalescent, with the neutral distribution of mutations in the sample, compared with the genealogy after hitch-hiking, where the common ancestor lies close to the appearance of the advantageous mutation (*): few mutations, at low frequency]
-Perturbations-
The effect of a selective sweep on the shape of the
coalescent tree
-Perturbations-
Hitchhiking with recombination
Reference chromosome: ACGTTTATGCAACGTCGAC
1°) an advantageous mutation (*) appears in a population of chromosomes carrying neutral mutations:
_G______A________G_
_G______A________G_
_G_______________G_
_________________G_
______T____________
______T_________C__
______T_____*___C__
2°) Selection
3°) Hitchhiking effect: several haplotypes remain
______T_____*___C__
_G______A___*___C__
-Perturbations-
Hitchhiking with recombination: genealogy
[figure: a neutral coalescent, with the neutral distribution of mutations in the sample, compared with the genealogy after hitchhiking, where a recombination event lets a lineage escape the sweep started by the advantageous mutation (*): substantial number of high frequency mutations]
-Perturbations-
Frequency spectrum of mutations
[figure: number of polymorphic sites against fi, the number of occurrences in a sample; neutral predictions vs. selective predictions]
-Perturbations-
Alternative hypotheses, overview
perturbation | number of mutations S | frequency class in excess
Neutral | 15 | None
Moderate bottleneck / population structure / Balanced selection | 9 | intermediate
hitchhiking with recombination | 9 | high
Severe bottleneck, population expansion / local hitchhiking / Background selection (Charlesworth et al. 1993) | 2 | low
Gene genealogies / Coalescent theory
• Based on the standard Wright-Fisher neutral model
• Genealogical trees
  – backward
  – Intuitive
  – Economic
    * Sampling theory
    * Only generations where « events » occurred are considered
• The coalescent = a simple and efficient framework to make inferences about the selective and demographic history of populations
Acknowledgements
– R. Vitalis
– V. Castric
– L. Chikki
– S. Billiard
– M. Schierup
Oxf. Surv Evol Biol 1990. 7:1-44
http://home.uchicago.edu/~rhudson1/popgen356/OxfordSurveysEvolBiol7_1-44.pdf
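Frequency spectrum of mutations (Fay and Wu, Genetics 2000)
[figure: number of polymorphic sites (0 to 7) against fi, the number of occurrences in a sample (1 to 6); series: Observed, Neutral, Selection]
$\theta = 4N_e\mu$
$H = \hat{\theta}_\pi - \hat{\theta}_H$, where $\hat{\theta}_\pi$ is the heterozygosity and $\hat{\theta}_H$ the homozygosity of the derived state
H values shown on the slide: 0, 0.05, -3.01 (**)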
-Misorientation-
$\hat{\theta}_\pi = \sum_{i=1}^{S} \frac{2 f_i (n - f_i)}{n(n-1)}$
$\hat{\theta}_H = \sum_{i=1}^{S} \frac{2 f_i^2}{n(n-1)}$
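The two estimators and their difference H can be evaluated on the oriented counts f of the seven-sequence alignment (n = 7, S = 15), and again after misorienting the tenth site (count 6 instead of 1). A minimal sketch with illustrative function names:

```python
def theta_pi(f, n):
    """Heterozygosity: sum over sites of 2 f_i (n - f_i) / (n (n - 1))."""
    return sum(2 * fi * (n - fi) for fi in f) / (n * (n - 1))

def theta_h(f, n):
    """Homozygosity of the derived state: sum over sites of 2 f_i^2 / (n (n - 1))."""
    return sum(2 * fi * fi for fi in f) / (n * (n - 1))

def fay_wu_h(f, n):
    return theta_pi(f, n) - theta_h(f, n)

n = 7
f_oriented = [1, 2, 1, 5, 3, 1, 4, 1, 6, 1, 2, 1, 4, 2, 3]
f_misoriented = [1, 2, 1, 5, 3, 1, 4, 1, 6, 6, 2, 1, 4, 2, 3]

print("H, oriented:   ", round(fay_wu_h(f_oriented, n), 3))
print("H, misoriented:", round(fay_wu_h(f_misoriented, n), 3))
```

Misorienting a single site leaves the heterozygosity term unchanged, since f(n - f) is the same for 1 and 6, but inflates the derived-state homozygosity, so H drops from about 0.05 to about -1.62.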
Alignment of polymorphic sites: multiple hits and misorientation
[alignment of the same seven sequences, shown with alternative outgroup sequences; label: pM]
o.g. GCGCGCGAACCCATT
o.g. GCGCGCGAATCCATT
With a multiple hit in the outgroup, the mutation at that site is oriented the wrong way (C→T read as T→C):
f: 1 2 1 5 3 1 4 1 6 1 2 1 4 2 3
f: 1 2 1 5 3 1 4 1 6 6 2 1 4 2 3
Neutrality tests: simulations
parameters‡: S = 8, n = 6
[figure: simulated genealogies with their mutations, e.g. H = 2.13, H = 2.13, H = -1.06, ... 10 000 simulations]
[figure: distribution of simulated H (density against H, from -3 to 4); observed H: P = 0.03 *]
‡ Hudson 1993
-tests-
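The simulation test can be sketched as follows, under stated assumptions: genealogies are drawn from the neutral coalescent, exactly S mutations are dropped on each tree with probability proportional to branch length (one common way to condition on S), and H is computed from the derived counts. The function names, the observed value `observed_h` and the one-sided lower-tail P value are illustrative choices, not taken from the slides.

```python
import random

def derived_counts_fixed_s(n, s, rng):
    """Simulate one neutral genealogy and place s mutations on its branches."""
    sizes = [1] * n        # sampled descendants of each current lineage
    lengths = [0.0] * n    # branch length accumulated by each lineage
    branch_len, branch_size = [], []
    while len(sizes) > 1:
        k = len(sizes)
        wait = rng.expovariate(k * (k - 1) / 2)
        lengths = [x + wait for x in lengths]
        i, j = sorted(rng.sample(range(k), 2))
        merged = sizes[i] + sizes[j]
        for idx in (j, i):                       # pop the larger index first
            branch_len.append(lengths.pop(idx))
            branch_size.append(sizes.pop(idx))
        sizes.append(merged)
        lengths.append(0.0)
    # A mutation on a branch is carried by all sampled descendants of that branch.
    return rng.choices(branch_size, weights=branch_len, k=s)

def fay_wu_h(f, n):
    return sum(2 * fi * (n - fi) - 2 * fi * fi for fi in f) / (n * (n - 1))

def lower_tail_p(observed_h, n, s, replicates, rng):
    """Fraction of simulated H values at or below the observed one."""
    hits = sum(
        1 for _ in range(replicates)
        if fay_wu_h(derived_counts_fixed_s(n, s, rng), n) <= observed_h
    )
    return hits / replicates

rng = random.Random(2000)
observed_h = -3.0          # hypothetical observed value, for illustration only
p = lower_tail_p(observed_h, n=6, s=8, replicates=10000, rng=rng)
print("P(H <= observed) =", p)
```

Strongly negative H values, produced by an excess of high-frequency derived mutations, fall in the lower tail of this distribution, so a small P value points away from the neutral model.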