Bayesian models for molecular evolution
Rates, rates and traits

Nicolas Lartillot (Université de Montréal)
BCM2004: Molecular evolution
March 20, 2012

1 Molecular dating

2 Phylogenetic covariance model

3 Results
   Rates, dates and traits
   Testing the nearly-neutral model
   GC landscapes and biased gene conversion

4 Conclusions

The molecular clock

Zuckerkandl and Pauling; Kimura

[Figure: vertebrate globin]

$\rho = \mu \times f_0$
substitution rate = mutation rate × fraction of neutral mutations

Variation of the substitution rate among lineages

Possible causes
- generation-time effect
- metabolic rate effects
- selection for longevity
(reviewed in Lanfear et al, 2010)

[Figure: phylogram of 38 placentals, concatenation of 13 nuclear genes; scale bar: 0.1 subs per site. Tip labels: treeshrew, lemur, human, flying lemur, rabbit, pika, sciurid, rat, mouse, caviomorph, mole, shrew, hedgehog, llama, pig, hippo, whale, delphinoid, cow, tapir, rhino, horse, phyllostomid, flying fox, pangolin, dog, cat, armadillo, sloth, anteater, sirenian, hyrax, elephant, Macroscelides, Elephantulus, tenrecid, golden mole, aardvark]


Estimating divergence times: the relaxed clock model

[Figure: sequence alignment at the tips of a time-calibrated tree, with rates r and times t]

Data and constraints
- multiple alignment D (here, nuclear coding genes in mammals)
- tree topology T, and fossil calibrations Φ

Estimating divergence times: the relaxed clock model

Hierarchical Bayesian model
- diversification process (e.g. birth-death, parameters λ, μ, ρ)
- substitution rate: Brownian log-normal process (variance σ²)
- substitution process (4x4 substitution matrix Q)
- parameters: θ = (λ, μ, ρ, σ, t, r)

Estimating divergence times: the relaxed clock model

Diversification process
- diversification process (e.g. birth-death, parameters λ, μ, ρ)
- tree topology T, fossil constraints Φ
- t: vector of divergence times

$p(t \mid \lambda, \mu, \rho, \Phi, T) = p(t \mid \lambda, \mu, \rho)$

Kendall 1948, Nee et al 1994, Yang and Rannala, 2006

Molecular dating

Tree shape and diversification patterns

Nee et al, 1994

- modeling speciation-extinction processes
- using time-calibrated phylogenies to fit and compare models

Brownian process

[Figure: branch j of the tree, with rates $r_{j_{up}}$ and $r_j$ at its two ends; branch duration $\Delta t = t_2 - t_1$, branch length $l = r \Delta t$]

$x_t = \ln r_t$

$dx_t = \sigma\, dB_t$

$x_t \sim N(x_0, \sigma^2 t)$

so that, along branch j, $\ln r_j \sim N(\ln r_{j_{up}}, \sigma^2 \Delta t_j)$

P taxa, j = 1..2P − 2 branches

$p(r \mid r_0, \sigma^2, t) = \prod_j p(r_j \mid r_{j_{up}}, \Delta t_j, \sigma^2)$
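The Brownian log-rate model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-branch tree, its branch durations and the helper names (`simulate_log_rates`, `log_prior_rates`) are hypothetical and not taken from the slides.

```python
import math
import random

# Hypothetical toy tree: branch name -> (parent branch, duration dt_j).
# A parent of None means the branch hangs directly off the root.
BRANCHES = {
    "A": (None, 1.0),
    "B": (None, 0.4),
    "C": ("B", 0.6),
    "D": ("B", 0.6),
}

def simulate_log_rates(x0, sigma2, rng):
    """Draw x_j = ln r_j ~ N(ln r_jup, sigma2 * dt_j), from the root downwards."""
    x = {}
    for j, (parent, dt) in BRANCHES.items():  # parents are listed before children
        x_up = x0 if parent is None else x[parent]
        x[j] = rng.gauss(x_up, math.sqrt(sigma2 * dt))
    return x

def log_normal_pdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)

def log_prior_rates(x, x0, sigma2):
    """Log of the product over branches, written as a density on x_j = ln r_j.

    The density of the rates r_j themselves would add the Jacobian term -sum(x_j).
    """
    total = 0.0
    for j, (parent, dt) in BRANCHES.items():
        x_up = x0 if parent is None else x[parent]
        total += log_normal_pdf(x[j], x_up, sigma2 * dt)
    return total

rng = random.Random(1)
x = simulate_log_rates(x0=0.0, sigma2=0.5, rng=rng)
print({j: round(math.exp(v), 3) for j, v in x.items()})  # simulated rates r_j
print(round(log_prior_rates(x, 0.0, 0.5), 3))
```

The product over branches becomes a sum of log densities, which is the form an MCMC sampler would typically evaluate.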

Substitution process

[Figure: branch j with rates $r_{j_{up}}$ and $r_j$ at its two ends, $\Delta t = t_2 - t_1$, $l = r \Delta t$, and the sequence alignment at the tips]

mean rate along the branch:

$\bar{r} = \frac{1}{\Delta t} \int_{t_1}^{t_2} r_t\, dt \simeq \frac{r_j + r_{j_{up}}}{2}$

probability of transition between nucleotides a and b over time Δt:

$p(a \to b \mid \bar{r}, \Delta t, Q) = \left[\exp(\bar{r}\, \Delta t\, Q)\right]_{ab}$
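As a rough illustration of the transition probability above, the following sketch computes exp(r Δt Q) for one branch. Two choices are assumptions made for the example only: Q is taken to be the Jukes-Cantor matrix (the slides only say "4x4 substitution matrix"), and the matrix exponential is a plain scaling-and-squaring Taylor series written by hand to keep the block dependency-free.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[v / scale for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, small)]  # small^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Assumed example: Jukes-Cantor Q, scaled to one expected substitution per unit length.
Q = [[-1.0 if a == b else 1.0 / 3.0 for b in range(4)] for a in range(4)]

def transition_matrix(r_up, r_down, dt, q=Q):
    """P = exp(r_bar * dt * Q), with r_bar the mean of the rates at both ends."""
    r_bar = 0.5 * (r_up + r_down)
    length = r_bar * dt  # branch length l = r * dt
    return mat_exp([[length * v for v in row] for row in q])

P = transition_matrix(r_up=0.8, r_down=1.2, dt=0.5)
print([round(p, 4) for p in P[0]])  # first row of the transition matrix
```

For the Jukes-Cantor case the result can be checked against the closed form 1/4 + 3/4 exp(−4l/3) on the diagonal.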

Relaxed clock model

[Figure: rates $r_{j_{up}}$ and $r_j$ at the two ends of a branch of duration $\Delta t = t_2 - t_1$ and length $l = r \Delta t$. Figure annotations: $\ln r_j \sim N(\ln r_{j_{up}}, \sigma^2 \Delta t)$; joint probability $p(D \mid T, l, Q)\, p(r \mid t, \sigma)\, p(t)\, p(\sigma)\, p(Q)$]

joint probability:

$p(t \mid \lambda, \mu, \rho)\; p(r \mid t, \sigma^2)\; p(D \mid r, t, Q) = p(D \mid \theta)$

parameters: θ = (λ, μ, ρ, σ, t, r)

Bayes theorem: general formulation

$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$

$p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$

- θ: model parameters
- D: data
- p(θ): prior distribution
- p(D | θ): likelihood
- p(D): evidence, or marginal likelihood
- p(θ | D): posterior distribution

Sampling from the posterior: the Metropolis algorithm

[Figure: trace plot of the random walk ("out2", iterations 0 to 10000, values between 0 and 1) and histogram of out2[[1]] (x axis: 0.0 to 1.0; y axis: density, 0.0 to 2.5)]

Metropolis algorithm
- draw U ∼ Uniform[0,1]
- set $\theta^*_k = \theta_k + \delta (U - 0.5)$, possibly reflecting back into [0,1]
- accept with probability $q = \min\left\{1, \frac{p(\theta^*_k \mid D)}{p(\theta_k \mid D)}\right\}$
- iterate

$(\theta_k)_{k \in \mathbb{N}}$ is a Markov chain with stationary distribution $p(\theta \mid D)$
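The steps of the Metropolis algorithm above can be sketched as follows. The target is an assumption made for the example: a frequency θ in [0,1] with binomial data (7 successes, 3 failures) and a uniform prior, so the posterior is known to be Beta(8, 4) and the sampler can be checked against its mean 8/12.

```python
import math
import random

def log_posterior(theta, heads=7, tails=3):
    """Unnormalised log posterior of theta (assumed toy target, uniform prior)."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(n_iter, delta=0.3, theta0=0.5, seed=0):
    """Random-walk Metropolis on [0, 1] with reflection at the boundaries."""
    rng = random.Random(seed)
    theta = theta0
    chain = []
    for _ in range(n_iter):
        # propose theta* = theta + delta * (U - 0.5), U ~ Uniform[0, 1]
        proposal = theta + delta * (rng.random() - 0.5)
        # reflect back into [0, 1]; the proposal stays symmetric
        if proposal < 0.0:
            proposal = -proposal
        elif proposal > 1.0:
            proposal = 2.0 - proposal
        # accept with probability min(1, p(theta* | D) / p(theta | D))
        log_q = log_posterior(proposal) - log_posterior(theta)
        if log_q >= 0.0 or rng.random() < math.exp(log_q):
            theta = proposal
        chain.append(theta)
    return chain

chain = metropolis(20000)
burn_in = 2000
post_mean = sum(chain[burn_in:]) / len(chain[burn_in:])
print(round(post_mean, 2))  # should be close to the Beta(8, 4) mean, 8/12
```

Working with the log of the ratio avoids numerical underflow, and the normalising constant p(D) cancels, so it never has to be computed.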

Relaxed clock model

posterior proportional to joint probability:

$p(\lambda, \mu, \rho)\; p(\sigma^2)\; p(t \mid \lambda, \mu, \rho)\; p(r \mid t, \sigma^2)\; p(D \mid r, t, Q)$

Relaxed clock model

Metropolis Hastings on rates ($r_j \to r'_j$)

$\alpha = \frac{p(D \mid r', t, Q)\; p(r' \mid t, \sigma^2)}{p(D \mid r, t, Q)\; p(r \mid t, \sigma^2)}$

Metropolis Hastings on divergence times ($t_j \to t'_j$)

$\alpha = \frac{p(D \mid r, t', Q)\; p(r \mid t', \sigma^2)\; p(t' \mid \lambda, \mu, \rho)}{p(D \mid r, t, Q)\; p(r \mid t, \sigma^2)\; p(t \mid \lambda, \mu, \rho)}$

Metropolis Hastings on σ²

$\alpha = \frac{p(r \mid t, \sigma'^2)\; p(\sigma'^2)}{p(r \mid t, \sigma^2)\; p(\sigma^2)}$

Metropolis Hastings on λ, μ, ρ

$\alpha = \frac{p(t \mid \lambda', \mu', \rho')\; p(\lambda', \mu', \rho')}{p(t \mid \lambda, \mu, \rho)\; p(\lambda, \mu, \rho)}$

For each move:
- α > 1: accept move
- α < 1: accept with prob. α
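To make one of these updates concrete, here is a minimal sketch of the move on σ², with rates and times held fixed. Several ingredients are assumptions for the example and not taken from the slides: the synthetic log-rate increments, the exponential prior of mean 1 on σ², and the additive proposal reflected at zero (which keeps the proposal symmetric, so α reduces to the ratio shown for σ²).

```python
import math
import random

def log_p_rates(increments, durations, sigma2):
    """ln p(r | t, sigma2) up to a constant: Brownian increments of ln r."""
    total = 0.0
    for d, dt in zip(increments, durations):
        var = sigma2 * dt
        total += -0.5 * math.log(var) - d * d / (2.0 * var)
    return total

def log_prior_sigma2(sigma2):
    """Assumed exponential prior of mean 1 on sigma2 (log density)."""
    return -sigma2

def update_sigma2(sigma2, increments, durations, rng, tuning=0.2):
    """One Metropolis move on sigma2, rates and times being held fixed."""
    proposal = abs(sigma2 + tuning * (rng.random() - 0.5))  # reflect at 0
    if proposal == 0.0:
        return sigma2
    log_alpha = (log_p_rates(increments, durations, proposal)
                 + log_prior_sigma2(proposal)
                 - log_p_rates(increments, durations, sigma2)
                 - log_prior_sigma2(sigma2))
    # alpha > 1: accept; alpha < 1: accept with probability alpha
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return proposal
    return sigma2

# Synthetic data: 200 branches with known durations and a known true sigma2.
rng = random.Random(42)
true_sigma2 = 0.5
durations = [rng.uniform(0.1, 1.0) for _ in range(200)]
increments = [rng.gauss(0.0, math.sqrt(true_sigma2 * dt)) for dt in durations]

sigma2, samples = 1.0, []
for it in range(6000):
    sigma2 = update_sigma2(sigma2, increments, durations, rng)
    if it >= 1000:  # discard burn-in
        samples.append(sigma2)
print(round(sum(samples) / len(samples), 2))  # posterior mean of sigma2
```

In the full sampler the same pattern would be cycled over rates, times, σ² and (λ, μ, ρ), each move using only the factors of the joint probability that involve the updated component.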

Posterior mean times and rates

[Figure: MCMC trace plots of rate and of time]

Divergence times and substitution rates

[Figure: time-calibrated tree of mammals (time axis: 100 Myrs to 0, KT boundary marked), with clades labelled: carnivores, chiropterans, perissodactyls, cetartiodactyls, eulipotyphlans, rodents, lagomorphs, primates, afrotherians, xenarthrans, marsupials, monotremes]

(Thorne et al 1998, Lepage et al 2007, Rannala and Yang 2007)
Multidivtime, PhyloBayes, MCMCtree

Substitution-calibrated versus time-calibrated trees

[Figure: phylogram of the 38 placental mammals; scale bar: 0.1 subs per site]

Correlating rates and life-history traits

[Figure: rate / mass regression; x axis: log mass (5 to 15); y axis: log subs. rate (−3.0 to 0.0)]

- correcting for phylogenetic inertia (independent contrasts)
- sequential method: error propagation
- no feedback of rate variations on life-history evolution

Coupling substitution process with life-history evolution

[Figure: body mass (kg) and subst. rate evolving jointly along the branches of a tree, with covariance matrix $\Sigma = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$, branch rates $r_1$, $r_2$, $r_3$, branch length $l_2 = r_2 \Delta t$, and the sequence alignment at the tips]

(Lartillot and Poujol, 2011, Molecular Biology and Evolution)

Data and constraints
- multiple alignment D (nuclear coding genes in mammals)
- matrix of quantitative characters C (life-history traits)
- tree topology T, and fossil calibrations Φ

Coupling substitution process with life-history evolution

Hierarchical Bayesian model (parameter estimation by MCMC)
- diversification process t (birth-death, parameters λ, μ, ρ)
- Brownian multivariate process X (covariance matrix Σ)
- time-dependent codon model Q

Codon model

Mutation matrix U (4 x 4)

      A     C     G     T
A     −     γ     κγ    1
C     1     −     γ     κ
G     κ     γ     −     1
T     1     κγ    γ     −

κ: transition-transversion ratio
γ = GC∗/(1−GC∗) (GC∗: equilibrium GC)

Codon substitution matrix Q (61 x 61), for codons c1 and c2 differing at a single position (nucleotide n1 in c1, n2 in c2):

$Q_{c_1 c_2} = U_{n_1 n_2}$ if c1, c2 synonymous,
$Q_{c_1 c_2} = U_{n_1 n_2}\, \omega$ if c1, c2 non-synonymous.

Muse and Gaut 1994
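The construction of Q from U can be sketched in Python as below. This is an illustration rather than the implementation used for the results: it assumes the standard genetic code, sets rates between codons differing at more than one position to zero, and uses hypothetical function names (`mutation_matrix`, `codon_matrix`) and parameter values.

```python
import itertools

NUCS = "ACGT"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product("TCAG", repeat=3), AA)}

def mutation_matrix(kappa, gamma):
    """Mutation rates U[(n1, n2)] following the 4 x 4 table (diagonal omitted)."""
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    u = {}
    for n1 in NUCS:
        for n2 in NUCS:
            if n1 == n2:
                continue
            rate = kappa if (n1, n2) in transitions else 1.0
            if n2 in "GC":
                rate *= gamma  # mutations towards G or C are scaled by gamma
            u[(n1, n2)] = rate
    return u

def codon_matrix(kappa, gamma, omega):
    """61 x 61 codon rate matrix, stored as a dict of dicts."""
    u = mutation_matrix(kappa, gamma)
    codons = sorted(c for c, aa in CODE.items() if aa != "*")  # sense codons only
    q = {c1: {} for c1 in codons}
    for c1 in codons:
        for c2 in codons:
            diffs = [(a, b) for a, b in zip(c1, c2) if a != b]
            if len(diffs) != 1:
                continue  # only single-nucleotide changes have a non-zero rate
            rate = u[diffs[0]]
            if CODE[c1] != CODE[c2]:
                rate *= omega  # non-synonymous change
            q[c1][c2] = rate
        q[c1][c1] = -sum(q[c1].values())  # each row sums to zero
    return codons, q

codons, Q = codon_matrix(kappa=2.0, gamma=1.5, omega=0.2)
print(len(codons))
print(Q["AAA"]["AAG"], Q["AAA"]["AGA"])  # synonymous vs non-synonymous A -> G
```

In the time-dependent version of the model, the synonymous rate, ω and γ would each vary along the tree, so a matrix of this kind would be rebuilt for every branch.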

Phylogenetic covariance model

Generalization

time-dependent substitution parameters
- rate of synonymous substitution (r)
- non-synonymous / synonymous ratio (ω)
- equilibrium GC (γ)

time-dependent quantitative traits
- sexual maturity (proxy of generation time)
- adult body mass
- maximum recorded lifespan (proxy of longevity)
- metabolic rate
- genome size
- karyotypic number (number of chromosomes 2n)

Coupling substitution process with life-history evolution

posterior proportional to joint probability:

$p(\lambda, \mu, \rho)\; p(t \mid \lambda, \mu, \rho)\; p(\Sigma)\; p(X \mid t, \Sigma)\; p(D \mid X, t)$

Results: Rates, dates and traits

1. Nuclear data: correlates of synonymous rate

[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass and metabolic rate. Red: positive; blue: negative; light shade: not significant]

- strong correlations between life-history traits
- dS correlates negatively with body mass, generation time and longevity
- R²: life-history variations explain ∼35% of the variation in synonymous rate
- dN/dS positively correlated with body size (nearly neutral effect?)

Inferring divergence times and body size evolution

[Figure: time-calibrated tree of placental mammal genera (Manis to Tamandua), with reconstructed body size on a logarithmic scale from 1 to 10,000,000; time axis: 100 Myrs to 0, KT boundary marked]

The evolution of body size

[Figure: time-calibrated tree of mammals with reconstructed body mass (legend: 1 kg, 10 kg, 100 kg, 1000 kg); time axis: 100 Myrs to 0, KT boundary marked]

Pakicetids (Thewissen et al, 2001)

[Figure: posterior density of log10 mass (g) for the hippo-whale ancestor; legend: coupled, < KT, uncoupled]

Systematic trends

[Figure: two panels, substitution rate and body mass, each plotted against time (−100 to 0)]

- Cope's or Stanley's rule
- intra-lineage drive towards larger body size
- more frequent extinction of large-bodied mammals
- needs to be explicitly modeled (directed Brownian motion)
- possible impact on estimated divergence times (Welch 2008)
- connections with mass-dependent extinction (FitzJohn, 2010)

Results: Testing the nearly-neutral model

2. Mitochondrial data: correlates of dN/dS

[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity and mass]

[Figure: schematic with labels: unfit, fit, coding sequence, 1/N1, 1/N2]

- positive correlation between dN/dS and body size
- compatible with a nearly-neutral interpretation
- via a negative correlation between body size and population size (N)
(Ohta, 1972, Kimura, 1979, Popadin, 2007)

Radical-conservative amino-acid replacement model

(adapted from Livingstone and Barton, 1993)

$\omega = K_r / K_c$

$Q_{ab} = R_{ab}$ if a → b conservative,
$Q_{ab} = R_{ab}\, \omega$ if a → b radical.

- $R_{ab}$: a general time-reversible 20x20 process
- conservative = conserving volume and/or polarity (and/or charge)

Results: Testing the nearly-neutral model

Mitochondrial data: Kr/Kc (volume + polarity)

[Figure: matrix of pairwise correlations between Kc, Kr/Kc, maturity, longevity and mass. Red: positive; blue: negative; light shade: not significant]

- positive correlation between Kr/Kc and body size
- similar to that observed for dN/dS (but higher R²)
- charge: no significant effect
- polarity + volume: strongest correlation (highest R²)

Reconstructed variations of Kr/Kc

[Figure: phylogeny of placental mammal species (Procavia capensis to Acinonyx jubatus) with reconstructed Kr/Kc (scale: 0.66, 1.2, 1.7); clades labelled Carnivora, Cetartiodactyla, Chiroptera, Eulipotyphla, Perissodactyla, Rodentia, Lagomorpha, Primates, Xenarthra, Afrotheria]

Results: GC landscapes and biased gene conversion

3. Equilibrium GC (GC∗) in nuclear genomes

[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass, GC∗ and number of chromosomes. Red: positive; blue: negative; light shade: not significant]

- negative correlation between GC∗ and body size
- positive correlation between GC∗ and number of chromosomes

Biased gene conversion (BGC) during meiosis

[Figure: GC-biased mismatch repair (adapted from Duret and Galtier 2009)]

=⇒ GC is overtransmitted

$x_{GC} = \frac{1 + b}{2}, \qquad x_{AT} = \frac{1 - b}{2}$

- meiotic distortion bias b ⇐⇒ positive selection for GC
- b proportional to local recombination rate

Mutation selection model

Kimura 1982

Substitution process (low mutation approx.)
Substitution rate = mutation rate x fixation probability

$r = 2Nu \times p = u \times 2Np = uP$

- u: mutation rate
- p: fixation probability
- N: effective population size
- P = 2Np: scaled fixation probability (relative to neutral)

Fixation probability of a mutation

[Figure: (a) ancestral lineage; (b) ancestral lineages relating three species (Figure 1 of Cartwright, Lartillot and Thorne)]

neutral case

$p_0 = \frac{1}{2N}$

general case

$p = \frac{2s}{1 - e^{-4Ns}}$

Scaled fixation probability

$P = 2Np = \frac{4Ns}{1 - e^{-4Ns}} = \frac{S}{1 - e^{-S}}$

S = 4Ns: scaled selection coefficient

Scaled fixation probability as a function of S = 4Ns

$P = 2Np = \frac{S}{1 - e^{-S}}$

[Figure: P as a function of S; x axis: S from −6 to 6; y axis: P from 0 to 6]

- neutral case S = 0: P = 1
- deleterious S < 0: P < 1
- advantageous S > 0: P > 1
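The three cases above can be checked numerically with a short function. This is a minimal sketch; the function names, the grid of S values and the handling of S close to 0 (a first-order expansion) are choices made for the example.

```python
import math

def scaled_fixation_probability(S):
    """P = 2Np = S / (1 - exp(-S)), with the neutral limit P = 1 at S = 0."""
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0  # first-order expansion around S = 0
    return S / (-math.expm1(-S))  # -expm1(-S) equals 1 - exp(-S)

def substitution_rate(u, S):
    """r = u * P: mutation rate times scaled fixation probability."""
    return u * scaled_fixation_probability(S)

for S in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(S, round(scaled_fixation_probability(S), 3))
```

A useful identity for what follows is P(S) / P(−S) = exp(S), which holds for any S.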

Neutral fixation probability in the presence of BGC

[Figure: (a) ancestral lineage; (b) ancestral lineages relating three species (Figure 1 of Cartwright, Lartillot and Thorne)]

GC overtransmission

$x_{GC} = \frac{1 + b}{2}, \qquad x_{AT} = \frac{1 - b}{2}$

Scaled conversion coefficient B = 4Nb

mutation from AT to GC

$P = \frac{B}{1 - e^{-B}} > 1$

mutation from GC to AT

$P = \frac{-B}{1 - e^{B}} < 1$
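One consequence of the two expressions above can be worked out directly. The derivation below is a sketch under an added assumption that is not on the slide: a two-state (AT versus GC) model with mutation rates $u_{AT \to GC}$ and $u_{GC \to AT}$, taken at the equilibrium between the two substitution fluxes.

```latex
\frac{P_{AT \to GC}}{P_{GC \to AT}}
  = \frac{B / (1 - e^{-B})}{-B / (1 - e^{B})}
  = \frac{e^{B} - 1}{1 - e^{-B}}
  = e^{B}

% flux balance at equilibrium (assumed two-state model):
(1 - GC^*)\, u_{AT \to GC}\, P_{AT \to GC} = GC^*\, u_{GC \to AT}\, P_{GC \to AT}

\Rightarrow \quad \frac{GC^*}{1 - GC^*} = \frac{u_{AT \to GC}}{u_{GC \to AT}}\, e^{B},
\qquad B = 4Nb
```

Under these assumptions, a smaller effective population size N gives a smaller B and pulls GC∗ down towards its purely mutational value.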

Biased gene conversion explains variations of GC∗

[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass, GC∗ and number of chromosomes. Red: positive; blue: negative; light shade: not significant]

Negative correlation between GC∗ and body size
- larger animals = smaller population = less efficient selection
- also less efficient BGC (lower GC∗)

Page 43: Bayesian models for molecular evolution Rates, rates and ...megasun.bch.umontreal.ca/People/lartillot/trash/... · Bayesian models for molecular evolution Rates, rates and traits

Biased gene conversion explains variations of GC∗

Nicolas Lartillot (Universite de Montréal) BIN6009 10/05/2009 1 / 1

dS

dN/dS

maturity

longevity

mass

dS dN/dS mat. long. mass gc

gc

# chrom.

# chromosomes

linkage maps are provisional, they are in good agreement with theregression line.

Both the regression line for all species in Fig. 1a (NC !0.85(NF/2) + 6.2; R2 ! 0.83; F < 2.91 ! 10"17, ! in Fig. 1a) andthe average NC per chromosome arm (1.06 and 1.16 in Dutrillaux(1986) and Burt and Bell (1987), respectively) suggest that there isa required minimum of one crossover per chromosome arm. If thisconclusion is true, then the minimum genetic size for chromosomearms is expected to be 50 cM. This prediction is supported by thedistribution of the genetic size of 286 chromosome arms shown inFig. 2. Few of the chromosome arms appear to be less than 50 cM,and this limit applies equally to arms from uni-armed and bi-armedchromosomes. We suggest that those chromosome arms with anapparent size of less than 50 cM result from failure to observerecombination distal to the most telomeric marker employed inmap construction.

We conclude that genome-wide recombination is directly proportional to the number of chromosome arms, as proposed by Dutrillaux (1986). Our analysis indicates that there is a general requirement for at least one crossover per chromosome arm, rather than the classical expectation of one crossover per chromosome. Chiasmata (the cytological manifestation of crossovers at meiosis I) are required to establish the physical attachment between homologs that, in turn, is required to ensure that one member of each pair of homologs segregates to opposite poles of the meiotic spindle. The proportionality between recombination and the number of chromosome arms and, most importantly, the observed distribution of arm size in bi-armed chromosomes (Fig. 2a) indicate that the effect of crossovers on maintaining proper distributive segregation is effectively suppressed at the centromere. Interestingly, the centromere shows distinct patterns of repression of recombination and interference in Drosophila, Neurospora, yeast, and human (Dobzhansky 1930; Hulten 1974; Davies et al. 1994; Mahtani and Willard 1998; Kaback et al. 1999). This requirement establishes important constraints on the minimum number of crossovers for any chromosome.

Perhaps the most extreme example of a requirement for one crossover per arm is the pseudoautosomal region (PAR) of mammalian X and Y Chromosomes (Chrs). The PAR undergoes pairing and recombination during male meiosis. Crossing-over within the PAR is critical for the fidelity of chromosome segregation (Rouyer et al. 1986; Soriano et al. 1987). This requirement exists despite the limited physical size of this region and represents a striking example of independence between physical and genetic distances. In humans the location of the PARs is remarkable. The X Chr is submetacentric with two PARs located in the terminal regions of each arm, and both regions undergo pairing and recombination during male meiosis.

Although we conclude that in mammals the majority of the interspecific variation in recombination rate is owing to changes in FN, this conclusion does not affect the significance of the observed correlation between EC and age to maturity reported by Burt and Bell (1987). We note, however, that their analysis was based on two assumptions. First, that only one crossover in each bivalent is required to ensure distributive segregation. Second, crossovers in excess of one per bivalent (EC) are not required for fidelity of chromosome segregation but are a mechanism to increase variability in the progeny. Under this model, EC is expected to represent the amount of recombination in an organism that is unrelated to the fidelity of chromosome segregation. In

Fig. 1. Plot of mammalian recombination frequency as a function of haploid number of chromosome arms (FN/2). a) Recombination estimated as number of chiasmata (NC). Each circle represents a species: 1, Dasyuroides byrnei; 2, Dasyurus viverrinus; 3, Sarcophilus harrisii; 4, Sminthopsis crassicaudata; 5, Perameles gunnii; 6, Isoodon macrourus; 7, Dasypus novemcinctus; 8, Oryctolagus cuniculus; 9, Cricetus cricetus; 10, Lagurus lagurus; 11, Meriones unguiculatus; 12, Apodemus sylvaticus; 13, Rattus norvegicus; 14, Mus musculus; 15, Cebuella pygmaea; 16, Saguinus oedipus; 17, Macaca fuscata; 18, Macaca mulatta; 19, Macaca nemestrina; 20, Pan troglodytes; 21, Homo sapiens; 22, Homo sapiens; 23, Mandrillus sphinx; 24, Cebus capucinus; 25, Lemur fulvus; 26, Lemur fulvus collaris; 27, Lemur fulvus albocollaris; 28, Akodon arviculoides; 29, Akodon arviculoides; 30, Akodon sp; 31, Akodon nigris; 32, Zygodontomys lasiurus; 33, Clyomys laticeps; 34, Nectomys squamipes; 35, Nectomys squamipes; 36, Oxymicterus sp; 37, Euryzygomalomys guiara; 38, Proechimys iheringi; 39, Mus musculus; 40, Mesocricetus auratus; 41, Cricetulus griseus; 42, Gerbillus aureus; and 43, Taterillus gracillis. NC data for species shown as black circles were calculated from the EC data reported by Burt and Bell (1987) (NC = EC + n, where n is the number of haploid autosomes); NC data for species shown as gray circles are from Dutrillaux (1986). Separate regression lines are drawn for each of the two data sets and for the combined data. b) Recombination estimated as the size of the linkage map. Each filled circle represents a species used for the regression analysis: 1, cattle; 2, dog; 3, pig; 4, mouse; 5, rat; and 6, human. Open circles are species not included in the regression analysis (see text): 7, goat; 8, sheep; 9, cat; 10, horse; and 11, baboon. The estimated size of the linkage map of each species is as given in the references in the text, except for the baboon, which we estimate to be 28.3 M instead of 23.7 M (Rogers et al. 2000) after correcting for the fraction of the human genome covered by the markers used in their study.

F. Pardo-Manuel de Villena, C. Sapienza: Recombination and chromosome arms 319

de Villena and Sapienza, 2001

Positive correlation GC∗ / chromosome number
conversion bias proportional to recombination rate
∼ 1 recombination event per chromosome arm per meiosis
recombination rate inversely proportional to chromosome size
stronger gene conversion bias in more fragmented karyotypes
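The chain of reasoning in the points above can be illustrated with a small calculation. This is only a sketch under stated assumptions: one crossover per chromosome arm per meiosis (50 cM per arm, following de Villena and Sapienza, 2001), a fixed genome size, and arms of equal length. The function name, the genome size and the arm counts are hypothetical values chosen for illustration, not figures from the talk.

```python
def mean_recombination_rate(n_arms, genome_size_mb, cm_per_arm=50.0):
    """Average recombination rate in cM/Mb.

    Assumes one crossover per chromosome arm per meiosis, i.e. a genetic
    length of cm_per_arm for every arm regardless of its physical size.
    """
    if n_arms <= 0 or genome_size_mb <= 0:
        raise ValueError("n_arms and genome_size_mb must be positive")
    return n_arms * cm_per_arm / genome_size_mb


# Hypothetical genome of fixed physical size, cut into more and more arms.
GENOME_MB = 3000.0
for n_arms in (20, 40, 80):
    rate = mean_recombination_rate(n_arms, GENOME_MB)
    print(f"{n_arms:3d} arms -> {rate:.3f} cM/Mb")
```

With genome size held constant, doubling the number of arms doubles the average recombination rate per Mb. If the conversion bias scales with recombination rate, a more fragmented karyotype is then expected to experience a stronger bias.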


A history of biased gene conversion in placentals

reconstruction of B = 4Ne b

B < 1: effectively neutral

B > 1: selective regime

B > 10: likely deleterious
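As a rough illustration of these thresholds, the sketch below computes B = 4Ne b and assigns it to one of the three regimes listed above. It also reports the fixation rate relative to a neutral mutation, B / (1 − e^(−B)), on the assumption that biased gene conversion behaves like directional selection with coefficient b in the standard diffusion approximation. The function names, the treatment of the boundary values B = 1 and B = 10, and the example values of Ne and b are all hypothetical.

```python
import math


def scaled_bgc(ne, b):
    """Population-scaled conversion bias B = 4 * Ne * b."""
    return 4.0 * ne * b


def regime(big_b):
    """Map B onto the three regimes named on the slide.

    The slide gives only strict inequalities; assigning B == 1 and B == 10
    to the lower regime is an arbitrary choice made here.
    """
    if big_b > 10.0:
        return "likely deleterious"
    if big_b > 1.0:
        return "selective regime"
    return "effectively neutral"


def relative_fixation_rate(big_b):
    """Fixation rate of a favoured allele relative to a neutral one.

    Uses B / (1 - exp(-B)), treating BGC like directional selection
    (an assumption of this sketch); the limit at B = 0 is 1.
    """
    if big_b == 0.0:
        return 1.0
    return big_b / -math.expm1(-big_b)


# Hypothetical parameter values, for illustration only.
for ne, b in ((1e4, 5e-6), (1e5, 1e-5), (1e6, 5e-6)):
    big_b = scaled_bgc(ne, b)
    print(f"B = {big_b:5.1f}  {regime(big_b):20s}  "
          f"relative fixation rate = {relative_fixation_rate(big_b):.2f}")
```

The same bias b can therefore fall in any of the three regimes depending on effective population size, which is why reconstructing B along the tree requires joint information about Ne and b.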

[Figure: six panels of reconstructed B, each over the same set of placental taxa: Vicugna, Sus, Bos, Tursiops, Myotis, Pteropus, Equus, Canis, Felis, Erinaceus, Sorex, Tupaia, Tarsius, Callithrix, Macaca, Pongo, Gorilla, Homo, Pan, Microcebus, Otolemur, Spermophilus, Cavia, Dipodomys, Mus, Rattus, Ochotona, Oryctolagus, Echinops, Loxodonta, Procavia, Choloepus, Dasypus. Scale values per panel: 0.14, 3, 5.8; 0.064, 1.9, 3.7; 0.06, 2, 3.9; 0.13, 2.6, 5.1; 0.071, 2.4, 4.7; 0.14, 2.8, 5.4]

BGC very weak (B ∼ 0.2) in anthropoids

BGC above the nearly neutral threshold (B > 1) in many taxa

in some lineages (microbats, insectivores) > 5% exons under B > 10


GC and genome rearrangements

tion rates of all types. Our use of only high-quality chimp base calls should reduce this effect, but some residual overestimation may exist in our data. Nonetheless, we observe in the substitutions specific to the chimpanzee lineage the same pattern of UBCS as found in the human-specific substitutions in the human genome, indicating that our findings are not specific to the human lineage. In fact, there is a striking similarity between the human and chimp UBCS profiles along each chromosome, despite the fact that (by definition) no substitutions are shared between the two genomes (Fig. 2). The human and chimp smoothed UBCS values are highly correlated genome-wide (Spearman correlation: ρ = 0.70, permutation P ≈ 0) and for each chromosome (Supplemental Table S3). This correlation is much lower (ρ = 0.44 genome-wide), though still significant, if the distal 16 Mb of each chromosome arm are dropped from the analysis. Analyzing only every 100th window along the genome in order to reduce possible effects of autocorrelation between adjacent regions does not change the correlation between human and chimp UBCS nor its statistical significance. Rough (unsmoothed) UBCS values are less highly correlated but are still significantly similar between the two species (Supplemental Table S3). This suggests that the correlation between chimp and human UBCS is regional rather than fine-scale. Thus, the force that is responsible for creating biased clusters of substitutions is clearly location-dependent and evolutionarily conserved.

Bias absent from chromosomes X and Y

The human sex chromosomes do not share the consistent pattern of bias found on all the autosomes (Fig. 3). On human chromosome Y, UBCS varies randomly and does not reach significant levels. This observation is consistent with the BGC model, which predicts no bias in the absence of recombination. Data for chimpanzee chromosome Y was insufficient for analysis of UBCS. Surprisingly, despite the presence of substantial recombination (albeit reduced relative to the autosomes) both the human and chimpanzee X chromosomes show almost no UBCS signal (Fig. 3). Overall, clusters of substitutions on chromosome X are in fact biased strong-to-weak, as would be expected in the absence of selective pressures, due to AT mutation bias. In fact, correlation between the smoothed UBCS curves in human and chimp is much weaker on chromosome X (ρ = 0.24) compared to the autosomes (ρ = 0.39 to 0.96; Supplemental Table S3).

Figure 2. Patterns of substitution bias are nearly identical in human and chimp. (A) Unexpected biased clustered substitutions (UBCS) (faint line) for human chromosome 7 is above zero, indicating GC bias, along most (91.1%) of the chromosome and rises significantly at the distal ends. Smoothed UBCS (dark line) and 95% confidence band (yellow) are shown. The 95% confidence region is above the null expectation (zero) for more than half of the chromosome (61.8%). UBCS only exceeds the genome-wide 95% confidence interval (gray) near the telomeres. (B) Chimpanzee chromosome 7 has a remarkably similar profile (Spearman correlation ρ = 0.87). (C) The pattern of bias on chromosome 7 is mirrored on all autosomes (ordered sequentially; red, human; blue, chimp). Elevated UBCS near telomeres exceeds the human genome-wide 95% confidence interval (gray) on almost all autosomes. Here the chimpanzee sequence has been aligned to the human genome.

1422 Genome Research www.genome.org

Dreszer et al, 2007

Since the X and Y chromosomes recombine with each other in the pseudo-autosomal regions (PAR), we specifically examined these regions for evidence of UBCS. In humans there are two PAR regions. One may be too small (~320 kb) to detect any significant UBCS, and we have found none. The only significant level of UBCS on chromosome X is found along the distal end of the p arm peaking in the larger (2.6 Mb) PAR, but extending ~5 Mb beyond its current boundary (Fig. 3). If UBCS is a product of recombination, we might expect a dramatic peak in the PAR region. We detect a mild elevation, but only in comparison with the levels of UBCS in other regions of X, and not from a genome-wide perspective. This milder elevation may be due to the same processes that render X and Y much less polymorphic (International SNP Map Working Group 2001) and diverse (Hellborg and Ellegren 2004; Baines and Harr 2007) than autosomes. Further investigation is needed to determine why the UBCS effect is not as dramatic in the PAR regions as it is in autosomal telomeres.

Bias correlated with male recombination rate

In order to further investigate genome-wide bias trends and to better understand this “X exception,” we computed correlations between human UBCS and a variety of genome characteristics using a window size of 1 Mb. Male recombination rate (ρ = 0.38), recombination hot spots (ρ = 0.37), and G+C% (ρ = 0.33) show the strongest associations with UBCS, all of which are statistically significant (P ≈ 0). This level of correlation is high, especially given that UBCS has accumulated for 6 million years under changing patterns of recombination, which should be only partially reflected in current rates (Kong et al. 2002). UBCS, G+C%, and recombination rates all rise substantially near the telomeres of each autosome. Computing the correlations without the last 16 Mb of sequence from each chromosome arm reduces the associations by ~50%. Compared to the male rate, the female recombination rate has a much weaker, though statistically significant, correlation with UBCS (ρ = 0.25). Both transcription density (ρ = 0.009) and conservation (ρ = −0.069) show little relation to UBCS. We also examined genome-wide correlations over a range of window sizes from 10 kb to 1 Mb. Results at finer scales were similar to those for 1 Mb windows, and we observed that the stronger correlations all increase with window size (Fig. 4). Repeating this analysis with smoothed UBCS values increases the magnitude of the observed correlations but does not qualitatively alter our findings.

To further investigate these correlations, we fit linear regression models on the data for 1-Mb windows. A multiple linear regression analysis that adjusts for the effects of other variables indicates that recombination hot spots, G+C%, and male recombination rate all have significant positive associations with UBCS. These associations are reduced in magnitude, but remain significant, when distal 16-Mb regions are dropped from the analysis. Conservation and female recombination rate show weak negative associations with UBCS in the model adjusting for other genomic variables. Furthermore, the female recombination rate is not correlated with the residuals from a simple linear regression of the male recombination rate on UBCS, suggesting that the female rate offers no additional explanatory power. If clusters of biased substitutions are a consequence of some process associated primarily with recombination in the male germline,

Figure 3. Exceptions to genome-wide patterns of bias. (A) There is little evidence of UBCS on either human or chimp sex chromosomes. There is also poor agreement between the human and chimp smoothed UBCS curves (ρ = 0.25 on chrX). The pseudo-autosomal regions (PAR) are shown in green. (B) Human chromosome 2 shows an atypical central peak, which is likely due to the fusion of two ancestral chromosomes (red, human chr2; blue, chimp chr2a and chr2b). The two smoothed curves are in remarkable agreement (ρ = 0.84). The UBCS signal for hypothetical telomeres (if no fusion had occurred) was predicted (green) by using 16 Mb of distal sequence (yellow).

Biased clustered substitutions in the human genome

Genome Research 1423 www.genome.org

chromosome fusion dated at 0.74 Myr in the human lineage.

Dreszer et al, 2007


Conclusions


Molecular dating
molecular clock: sequence divergence proportional to divergence times
violation of the molecular clock (variation of substitution rate)
complex interplay between rates and times
requires probabilistic model for teasing out rates and times
life-history correlates of variation of rate
also variation of other aspects of the substitution process (dN/dS, GC, etc)
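The interplay between rates and times can be seen in a toy calculation. The sketch below assumes a strict clock along two lineages, so that the expected divergence between two sequences is 2 × rate × time. The function name and the numerical values are hypothetical and serve only to show that sequence divergence alone cannot separate a rate from a time.

```python
def expected_divergence(rate, time):
    """Expected substitutions per site between two lineages that split
    `time` years ago, each evolving at `rate` substitutions/site/year."""
    return 2.0 * rate * time


# Two hypothetical scenarios: a slow rate over a long time,
# and a rate twice as fast over half the time.
slow_and_old = expected_divergence(rate=1e-9, time=100e6)
fast_and_young = expected_divergence(rate=2e-9, time=50e6)

print(f"slow and old:   {slow_and_old:.3f} substitutions/site")
print(f"fast and young: {fast_and_young:.3f} substitutions/site")
```

Both scenarios predict the same divergence, so extra information (fossil calibrations, a prior on how rates vary along the tree) is needed to tease rates and times apart. That is the role of the probabilistic model mentioned above.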
