Coalescent Theory & Population Genetics applications, Frantz Depaulis, depaulis@ens.fr

Post on 14-Dec-2015


Coalescent Theory &

Population Genetics

applications

Frantz Depaulis
depaulis@ens.fr
http://www.biologie.ens.fr/eceem/frantz_depaulis
Pwd: M1ENSCOAL
Laboratoire Ecologie et Evolution, CNRS-UMR 7625
Université Paris 6 - ENS, room 426

Outline

• Reminder
• Coalescence
• Mutations/models
• Perturbations, applications

Neutral theory of molecular evolution (Kimura 1969):

the neutral model as a reference

Mutations: occurrence, fate
• advantageous: rare, fixed
• deleterious: frequent, eliminated
• neutral: frequent, drift until fixation or loss

-reminder-

Wright Fisher Neutral model

Assumptions
• Selective neutrality (Ne s << 1)
• Demography
- Isolated panmictic population
- Constant size N
- Poisson distribution of offspring number, P(1)

-reminder-

Effective size: definition
• The effective size (denoted Ne) of a population is defined as the size of an "ideal" Wright-Fisher population in which genetic drift would have the same intensity (**) as in the population (or population model) of interest.

** same rate of drift, same increase in inbreeding, same increase in the variance of allele frequencies among populations, etc.

-reminder-

Relationships among 10 individuals over 15 generations

[Figure: individuals (# genes) against generations]

-Coalescence-

The same, but permuted

[Figure: individuals (# genes) against generations]

-Coalescence-

Descent and extinction of a lineage

[Figure: individuals (# genes) against generations]

-Coalescence-

Genealogy of 3 individuals: looking at the drift process "going back in time" to the common ancestor of a sample of genes

[Figure: individuals (# genes) against generations]

-Coalescence-

Genealogy of a gene sample

gene sample

ancestral lineage

coalescence= common ancestor

Most recent common ancestor (MRCA)

-Coalescence-


Kingman (1980, 1982)

-Coalescence-


Coalescent Tree

[Figure: coalescent tree of a sample of "genes" / of individuals (a, b, c, d, e, f), showing the common ancestors (CA), the most recent common ancestor of the sample (MRCA), and neutral mutations (nucleotide changes) placed on the branches]

-Coalescence-


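Coalescent times

2 genes: $p_2 = P(\text{common ancestry at } t-1) = \frac{1}{2N}$

n genes: $p_n = P(\text{common ancestry at } t-1) = \frac{n(n-1)}{2} \times \frac{1}{2N}$

$P(\text{common ancestry } t \text{ generations ago}) = (1-p)^{t-1} \times p$: geometric distribution (discrete generations)

For p small and t large, this is approximately $p\,e^{-pt}$: exponential distribution (continuous time)

[Figure: two genes at generation t choosing a parent among the 2N genes (1, 2, 3, 4, 5, ... 2N) of generation t-1, each with probability 1/2N]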
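The geometric and exponential forms above can be compared numerically. The sketch below is only an illustration: the population size N = 1000 and the sample size n = 5 are assumed values, not taken from the slides.

```python
import math

def geometric_pmf(t, p):
    """P(common ancestry exactly t generations ago), discrete generations."""
    return (1.0 - p) ** (t - 1) * p

def exponential_density(t, p):
    """Continuous-time approximation p * exp(-p t), for small p and large t."""
    return p * math.exp(-p * t)

N = 1000                      # assumed diploid population size (illustration only)
p2 = 1.0 / (2 * N)            # coalescence probability per generation, 2 genes
n = 5                         # assumed sample size, with n << N
pn = n * (n - 1) / 2 * p2     # probability that some pair among n genes coalesces

for t in (1, 100, 2000, 10000):
    print(t, geometric_pmf(t, p2), exponential_density(t, p2))

mean_t2 = 1.0 / p2            # mean of the geometric distribution: 2N generations
```

With these values the two columns agree closely at every t printed, and the mean waiting time for two genes is 2N generations.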

-Coalescence-

Diploid (2n) => 2N genes

-Coalescence-


Inbreeding effective size

• In the Wright-Fisher model, the probability of coalescence in one generation equals 1/N (for N gene copies, i.e. 1/2N for N diploid individuals). An effective size can therefore be defined as the inverse of the probability of coalescence in one generation (Ewens, 1982). This is an inbreeding effective size, and it is instantaneous.

• In contrast, an asymptotic effective size can be defined, which describes the rate of coalescence of gene lineages in the distant past (a rate equal to 1/N in the Wright-Fisher model).

This asymptotic (inbreeding, coalescence) effective size is equivalent to the eigenvalue effective size, which rests on a complete description of allele frequencies in a population (the eigenvalue of the matrix describing the changes in frequency determines the approach to equilibrium).

-Coalescence-

Constructing coalescents, 1°) Ages of the nodes

[Figure: tree of the sample a, b, c, d, e, f with node ages t1, t2, t3, t4, t5; coalescence times follow Exp(p), with p = 1/2N]

additional assumption: n << N
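A minimal sketch of step 1°), assuming the pairwise rate 1/2N from the coalescent-times slide, so that with k lineages the waiting time is exponential with rate k(k-1)/2 × 1/2N. The values n = 6 and N = 1000 and the function name are illustrative assumptions.

```python
import random

def coalescent_times(n, N, rng):
    """Waiting times (in generations) while k = n, n-1, ..., 2 lineages remain.

    Each time is exponential with rate k(k-1)/2 * 1/(2N), assuming n << N.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / (2 * N)
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(1)
n, N = 6, 1000                 # assumed sample and population sizes
reps = 20000
mean_tmrca = sum(sum(coalescent_times(n, N, rng)) for _ in range(reps)) / reps

# Summing the expected waiting times 2N * 2/(k(k-1)) gives 4N(1 - 1/n).
expected_tmrca = 4 * N * (1 - 1 / n)
print(mean_tmrca, expected_tmrca)
```

The simulated mean age of the MRCA should fall close to 4N(1 - 1/n) generations.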

-Coalescence-

Constructing-deconstructing coalescents, 2°) Topology of the tree

[Figure: tree of the gene sample a, b, c, d, e, f with node ages t1 to t5, common ancestors (CA) and the MRCA]
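Step 2°) can be sketched as repeatedly joining two lineages chosen at random until a single ancestor remains. This assumes that every pair is equally likely to coalesce, as in the neutral model; the nested-tuple representation and the function names are choices made for this example.

```python
import random

def random_topology(labels, rng):
    """Join two randomly chosen lineages until one remains; return nested tuples."""
    lineages = list(labels)
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)   # every pair equally likely
        merged = (lineages[i], lineages[j])
        lineages = [x for k, x in enumerate(lineages) if k not in (i, j)]
        lineages.append(merged)
    return lineages[0]

def leaves(tree):
    """List the sampled genes found below a node."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

rng = random.Random(7)
tree = random_topology("abcdef", rng)
print(tree)
```

Each call returns one possible topology for the sample a to f; the root of the nested tuple is the MRCA.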

-Coalescence-

Hudson's animator: http://www.coalescent.dk/

Mutations

and associated models

-Mutations-


Constructing-deconstructing coalescents

2°) Topology of the tree

3°) Uniform distribution of mutations: $S \sim \mathcal{P}(T_{Tot} \times \theta)$, with $\theta = 4N_e\mu$

[Figure: neutral mutations (nucleotide changes) placed on the tree of the gene sample a, b, c, d, e, f, with node ages t1 to t5, common ancestors (CA) and the MRCA]

Repeated 100 000 times: neutral distribution of sequence polymorphism
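Step 3°) can be sketched by drawing the number of mutations S from a Poisson distribution whose mean is proportional to the total tree length. Here time is counted in generations and µ is a per-sequence, per-generation rate, which is one possible scaling and an assumption of this example; n, N and µ are illustrative values, and the number of replicates is reduced from the 100 000 of the slide.

```python
import random

def total_tree_length(n, N, rng):
    """Sum of branch lengths (generations): k lineages persist for time T_k."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / (2 * N)
        total += k * rng.expovariate(rate)
    return total

def poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals falling before lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_S(n, N, mu, rng):
    """Number of segregating sites on one simulated genealogy."""
    return poisson(mu * total_tree_length(n, N, rng), rng)

rng = random.Random(2)
n, N, mu = 7, 1000, 0.001          # assumed values, so theta = 4*N*mu = 4
reps = 20000
mean_S = sum(simulate_S(n, N, mu, rng) for _ in range(reps)) / reps

theta = 4 * N * mu
expected_S = theta * sum(1 / i for i in range(1, n))   # theta * (1 + 1/2 + ... + 1/(n-1))
print(mean_S, expected_S)
```

Under this scaling the mean of S over replicates approaches θ × Σ 1/i, the quantity that Watterson's estimator inverts.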

-Mutations-

Infinite(ly many) site model (ISM): sequences, SNPs

• Mutational, sequence data: infinite site model (ISM)
- No recombination
- Independent mutations
- Constant mutation rate µ, along the sequence and across time
- Each mutation affects a new nucleotide site

-Mutations-

Alignment of polymorphic sites:

GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT

f 121531416121423

n = 7

S = 15
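As an illustration, n and S can be recomputed from the seven sequences of the alignment. The sketch also counts the rarer allele at each site, which is the non-oriented ("folded") count; this is a choice made for the example and is not the same as the f row of the slide, which appears to count one particular allele per site.

```python
from collections import Counter

alignment = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]

n = len(alignment)                                    # sample size
columns = list(zip(*alignment))                       # one tuple per site
polymorphic = [col for col in columns if len(set(col)) > 1]
S = len(polymorphic)                                  # number of polymorphic sites

# Non-oriented count: occurrences of the rarer of the two alleles at each site.
minor_counts = [min(Counter(col).values()) for col in polymorphic]
print(n, S, minor_counts)
```

Without an outgroup, only these minor-allele counts are defined; orienting the mutations requires knowing which state is ancestral.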

-Mutations-

Frequency spectrum of mutations

[Figure: number of polymorphic sites against fi, the number of occurrences in a sample; neutral predictions]

-Mutations-

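Estimators of $\theta = 4N_e\mu$

Heterozygosity (*): $\hat{\theta}_\pi = \sum_{i=1}^{S} \frac{2 f_i (n - f_i)}{n(n-1)}$

Equivalently, from the pairwise differences $d_{jk}$: $\hat{\theta}_\pi = \frac{2}{n(n-1)} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} d_{jk}$

Watterson's (1975) estimator: $\hat{\theta}_S = \frac{S}{\sum_{i=1}^{n-1} 1/i}$

Neutral expectation for the frequency spectrum: $E(f_i) \propto 1/i$

*: Tajima (1983)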
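The two estimators can be evaluated on the example alignment (n = 7, S = 15) using the per-site counts f = 121531416121423 shown with it. This is a sketch; the function names are made up for the example.

```python
def pi_estimate(counts, n):
    """Heterozygosity estimator: sum over sites of 2 f (n - f) / (n (n - 1))."""
    return sum(2 * f * (n - f) for f in counts) / (n * (n - 1))

def watterson_estimate(S, n):
    """Watterson's estimator: S divided by 1 + 1/2 + ... + 1/(n-1)."""
    return S / sum(1 / i for i in range(1, n))

counts = [int(c) for c in "121531416121423"]   # f row of the example alignment
n = 7
S = len(counts)

pi_hat = pi_estimate(counts, n)        # 130/21, about 6.19
theta_w = watterson_estimate(S, n)     # 15/2.45, about 6.12
print(pi_hat, theta_w)
```

The two values are close to each other here, which is what the neutral model predicts on average.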

Alignment of polymorphic sites: non oriented mutations

GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT

f 121531416621423
f 121531416121423

n = 7

S = 15

-Mutations-

Frequency spectrum of mutations

[Figure: number of polymorphic sites against fi, the number of occurrences in a sample]

-Mutations-

Alignment of polymorphic sites: orienting mutations with an outgroup

GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT

o.g. GCGCGCGAACCCATT

f 121531416121423
f 121531416621423

-Mutations-

Infinite(ly many) allele model: allozymes, one site (locus) (microsatellites)

• Each mutation gives rise to a new type/allele on a locus

[Figure: genealogy with mutations on the branches, each producing a new allele]

-Mutations-

Stepwise mutation model

The "stepwise mutation" model (SMM) is appropriate for microsatellites. When a mutation occurs, the length of the new allele depends on the existing length. In the simplest case, the "single-step" SMM illustrated in the next slide, the new length = old length +/- 1.

-Mutations-

Microsatellites

GAGGCGTAGTAGTAGTAGTAGTAGTAGGCTCTA

GAGGCGTAGTAGTAGTAGTAGTAGGCTCTA

or

GAGGCGTAGTAGTAGTAGTAGTAGTAGTAGGCTCTA

• Microsatellites mutate very fast (~1 change every 500 generations)

• Mutation events usually involve a gain or a loss of a single repeat unit
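A minimal sketch of the single-step model applied to a repeat count, using the order of magnitude quoted above (about 1 change every 500 generations). The lower bound of one repeat and the function names are assumptions made for this example.

```python
import random

def stepwise_mutation(length, rng):
    """Single-step SMM: new length = old length +/- 1.

    The floor at 1 repeat is an assumption of this sketch, not part of the model.
    """
    return max(1, length + rng.choice((-1, 1)))

def evolve(length, generations, mu, rng):
    """Follow one lineage forward, mutating with probability mu per generation."""
    for _ in range(generations):
        if rng.random() < mu:
            length = stepwise_mutation(length, rng)
    return length

rng = random.Random(3)
start = 7                      # e.g. seven TAG repeats, as in the first sequence above
mu = 1 / 500                   # about 1 change every 500 generations
final = evolve(start, 10000, mu, rng)
print(start, final)
```

Because gains and losses cancel out, two alleles of the same length are not necessarily identical by descent under this model, unlike under the infinite allele model.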

-Mutations-


Recombination, "simple" case: 2 haplotypes with 2 loci

One possible Genealogy outcome

Recombination

Coalescence

Coalescence

MRCA1

MRCA2

Past

-Perturbations-

Ancestral Recombination Graph: example

coalescence rate: n(n-1)/2

recombination rate: R n

[Figure: going back in time towards the past, successive recombination and coalescence events change the number of lineages (n = 2, n = 3, n = 2, n = 2, n = 2 ?) until the MRCA]
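The two rates above act as competing exponential clocks. The sketch below draws the next event going back in time; it takes the rates exactly as written on the slide, n(n-1)/2 and R n, and the function name and return values are choices made for the example.

```python
import random

def next_event(n, R, rng):
    """Draw the waiting time and type of the next event for n lineages.

    Coalescence (rate n(n-1)/2) removes a lineage; recombination (rate R*n) adds one.
    """
    coal_rate = n * (n - 1) / 2
    rec_rate = R * n
    total = coal_rate + rec_rate
    wait = rng.expovariate(total)
    if rng.random() < coal_rate / total:
        return wait, "coalescence", n - 1
    return wait, "recombination", n + 1

rng = random.Random(5)
wait, kind, new_n = next_event(2, 1.0, rng)   # R = 1.0 is an assumed value
print(wait, kind, new_n)
```

For n = 2 and R = 1, recombination (rate 2) is twice as likely as coalescence (rate 1), so the number of lineages often rises to 3 before the MRCA is reached, as in the example graph.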

-Perturbations-

Recombination Histories: Non-ancestral bridges

-Perturbations-

Systematic effects on the genealogy (one genealogy shape per row)

• Demographic: extinction-recolonisation, severe bottlenecks, population expansion. Selective: severe hitchhiking.
• Demographic: migration, population structure, moderate bottlenecks. Selective: balanced polymorphism, moderate hitchhiking.

-Perturbations-

The neutral model, application = inference

Applications: molecular phylogeography, molecular ecology

• Selective effects
- balancing selection
- directional selection, hitchhiking
• Demographic effects
- Dispersion
- bottlenecks/expansion
- Distribution of the number of offspring

Tool = the coalescent

-Coalescence-

-Perturbations-

Demographic change

-Perturbations-


[Figure: population size against time before present (in mutational units), with the matching frequency distributions of pairwise differences in number of repeat units (0 to 15), comparing simulation and model estimate. Panels: (a) Constant population size (simulation case 3); (b) Population expansion, = 3 (simulation case 2). Statistics printed on the panels, in order of appearance: estimate 1.837, p(SSD) = 0.551, p(FS) = 0.001; estimate 0.351, p(SSD) = 0.007, p(FS) = 0.308.]

[Equation: population size N at time t as a function of the growth rate r and the carrying capacity K]

-Perturbations-

Coalescent and bottlenecks

[Figure: population size N against time t]

$P_t = \frac{1}{2N_t}$
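A sketch of how a bottleneck concentrates coalescence events, following two genes back in time with a per-generation coalescence probability of 1/2N_t. The population history used here (N = 1000, reduced to N = 10 between 50 and 59 generations ago) is invented for the illustration.

```python
import random

def size_at(t):
    """Assumed history: N = 1000, except N = 10 from 50 to 59 generations ago."""
    return 10 if 50 <= t <= 59 else 1000

def pair_coalescence_time(size_at, rng):
    """Go back one generation at a time; coalesce with probability 1/(2 N_t)."""
    t = 1
    while rng.random() >= 1.0 / (2 * size_at(t)):
        t += 1
    return t

rng = random.Random(4)
times = [pair_coalescence_time(size_at, rng) for _ in range(2000)]

# Share of pairs whose common ancestor is no older than the end of the bottleneck.
frac_by_bottleneck = sum(t <= 59 for t in times) / len(times)
print(frac_by_bottleneck)
```

With this history, roughly four pairs out of ten find their common ancestor within the ten bottleneck generations, against fewer than 3% over the same span at constant N = 1000.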

-Perturbations-

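[Figure: genealogy through time across two populations (Pop 1, Pop 2), with coalescence events within each population and a migration event between them]

-Perturbations-

Population structure

[Figure: coalescence and migration, with 1/m as the time scale of migration]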

-Perturbations-


Selection: hitch-hiking without recombination

[Figure: a population of chromosomes drawn against the reference chromosome ACGTTTATGCAACGTCGAC, with the neutral mutations carried by each haplotype and the advantageous mutation marked *]

1°) an advantageous mutation appears

2°) Selection

3°) Hitch-hiking effect: the advantageous mutation is fixed and variability is swept

-Perturbations-

Hitch-hiking without recombination: genealogy

[Figure: the neutral coalescent, with a neutral distribution of mutations in the sample, compared with the genealogy after hitch-hiking, from the common ancestor through the appearance of an advantageous mutation (*); after hitch-hiking there are few mutations, at low frequency]

-Perturbations-

The effect of a selective sweep on the shape of the coalescent tree

-Perturbations-

Hitchhiking with recombination

[Figure: a population of chromosomes drawn against the reference chromosome ACGTTTATGCAACGTCGAC, with the neutral mutations carried by each haplotype and the advantageous mutation marked *; a recombinant haplotype carries the advantageous mutation on a different background]

1°) an advantageous mutation appears

2°) Selection

3°) Hitchhiking effect: several haplotypes remain

-Perturbations-

Hitchhiking with recombination: genealogy

[Figure: the neutral coalescent, with a neutral distribution of mutations in the sample, compared with the genealogy after hitchhiking, from the common ancestor through the appearance of an advantageous mutation (*) and a recombination event; after hitchhiking there is a substantial number of high frequency mutations]

-Perturbations-

Frequency spectrum of mutations

[Figure: number of polymorphic sites against fi, the number of occurrences in a sample; neutral predictions and selective predictions]

-Perturbations-


/Background selection

Charlesworth et al. 1993

Alternative hypotheses, overview

Perturbation: number of mutations (S), frequency class in excess
• Neutral: S = 15, none
• Moderate bottleneck / population structure, balanced selection: S = 9, intermediate
• Hitchhiking with recombination: S = 9, high
• Severe bottleneck, population expansion / local hitchhiking: S = 2, low

-Perturbations-

Gene genealogies / Coalescent theory

• Based on the standard Wright-Fisher neutral model
• Genealogical trees
- backward
- intuitive
- economic

* Sampling theory
* Only generations where "events" occurred are considered

-Coalescence-

• The coalescent = a simple and efficient framework to make inferences about the selective and demographic history of populations

Acknowledgements

- R. Vitalis
- V. Castric
- L. Chikki
- S. Billiard
- M. Schierup

Oxf. Surv Evol Biol 1990. 7:1-44

http://home.uchicago.edu/~rhudson1/popgen356/OxfordSurveysEvolBiol7_1-44.pdf

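Frequency spectrum of mutations (Fay and Wu, Genetics 2000)

[Figure: number of polymorphic sites (0 to 7) against fi, the number of occurrences in a sample (1 to 6), for the observed, neutral and selection spectra]

$\theta = 4N_e\mu$

$H = \hat{\theta}_\pi - \hat{\theta}_H$, where $\hat{\theta}_\pi$ is the heterozygosity and $\hat{\theta}_H$ the homozygosity of the derived state

Values shown on the slide: 0, 0.05 and -3.01 **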

-Misorientation-

Alignment of polymorphic sites: multiple hits and misorientation

$\hat{\theta}_\pi = \sum_{i=1}^{S} \frac{2 f_i (n - f_i)}{n(n-1)}$

$\hat{\theta}_H = \sum_{i=1}^{S} \frac{2 f_i^2}{n(n-1)}$

GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT

o.g. GCGCGCGAACCCATT
o.g. GCGCGCGAATCCATT

f 121531416621423
f 121531416121423

pM
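To see how orientation changes the statistic, H can be computed for the two f rows above, which differ only at the tenth site (1 versus 6 occurrences). The sketch uses the per-site formulas for the heterozygosity and for the homozygosity of the derived state; the function name is made up for the example.

```python
def fay_wu_H(counts, n):
    """H = pi - theta_H, both summed over sites from derived-allele counts f."""
    pi = sum(2 * f * (n - f) for f in counts) / (n * (n - 1))
    theta_h = sum(2 * f * f for f in counts) / (n * (n - 1))
    return pi - theta_h

n = 7
f_a = [int(c) for c in "121531416121423"]
f_b = [int(c) for c in "121531416621423"]   # tenth site counted 6 times instead of 1

H_a = fay_wu_H(f_a, n)   # 1/21, about 0.05
H_b = fay_wu_H(f_b, n)   # -34/21, about -1.62
print(H_a, H_b)
```

The heterozygosity is the same in both cases because f(n - f) is unchanged when f becomes n - f; only the homozygosity of the derived state moves, so a single misoriented site is enough to push H well below zero.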

-Misorientation-

Neutrality tests: simulations

parameters‡: S = 8, n = 6

[Figure: simulated genealogies with mutations placed on the branches, each giving one value of H (H = 2.13, H = 2.13, H = -1.06, ...), 10 000 simulations]

-tests-

[Figure: distribution of simulated H (density against H, from -3 to 4); observed H: P = 0.03 *]

‡ Hudson 1993
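The simulation procedure can be sketched as follows: build a neutral genealogy for n = 6, drop S = 8 mutations on its branches with probability proportional to branch length, compute H, and repeat 10 000 times. This is only an outline under stated assumptions: placing a fixed number S of mutations is one common way to condition on S, the observed value used for the P value is hypothetical, and the function names are invented for the example.

```python
import random

def simulate_branches(n, rng):
    """Return (length, number of descendant leaves) for each branch of a neutral tree."""
    lineages = [(1, 0.0)] * n              # (descendant leaves, time the branch starts)
    branches, t = [], 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)      # time in units of 2N generations
        i, j = rng.sample(range(k), 2)
        (si, bi), (sj, bj) = lineages[i], lineages[j]
        branches += [(t - bi, si), (t - bj, sj)]
        lineages = [x for m, x in enumerate(lineages) if m not in (i, j)]
        lineages.append((si + sj, t))
    return branches

def fay_wu_H(counts, n):
    """H = pi - theta_H from the derived-allele counts of the polymorphic sites."""
    pi = sum(2 * f * (n - f) for f in counts) / (n * (n - 1))
    theta_h = sum(2 * f * f for f in counts) / (n * (n - 1))
    return pi - theta_h

def simulate_H(n, S, rng):
    """Place S mutations on branches proportionally to length, then compute H."""
    branches = simulate_branches(n, rng)
    lengths = [b[0] for b in branches]
    sizes = [b[1] for b in branches]
    counts = rng.choices(sizes, weights=lengths, k=S)
    return fay_wu_H(counts, n)

rng = random.Random(3)
n, S = 6, 8
sims = [simulate_H(n, S, rng) for _ in range(10000)]

observed_H = -1.06                          # hypothetical observed value
p_value = sum(h <= observed_H for h in sims) / len(sims)
print(p_value)
```

The P value is the share of simulated H values at least as low as the observed one. With S = 8 and n = 6 the largest possible H is 8 × 8/30, about 2.13, reached when every mutation occurs once or twice in the sample.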