Pennell defense-talk

Posted on 16-Jul-2015


Matthew Wesley Pennell
PhD Candidate, Bioinformatics and Computational Biology
Institute for Bioinformatics and Evolutionary Studies
University of Idaho

MODELS, MEANINGS, AND MACROEVOLUTION

How can statistical models help us understand the drivers of long-term evolutionary change?

What we talk about when we talk about MACROEVOLUTION

We know the ingredients of evolutionary change within populations:

Mutation

Selection

Drift

Gene flow


But how do these work together to SHAPE DIVERSITY?

Simone Des Roches

MACROEVOLUTION: the long-term dynamics of evolutionary processes

[Figure: two panels plotted against time (images: Peter Park; Daniel Berner), summarizing macroevolution as F(time)]

Models for continuous traits

Brownian motion: random walk

Ornstein-Uhlenbeck: random walk with a central tendency (θ)

Early Burst: evolution is rapid early & slows down over time
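To make the three models concrete, here is a minimal sketch of the trait variance each one predicts for a single lineage after time t, starting from a fixed root value. It uses the standard textbook forms. The parameter names are our assumptions: `sigma2` is the rate, `alpha` is the OU pull toward θ, and `r` is the EB rate-change parameter, with r < 0 meaning evolution slows down.

```python
import math

def var_bm(sigma2: float, t: float) -> float:
    """Brownian motion: variance grows linearly with time."""
    return sigma2 * t

def var_ou(sigma2: float, alpha: float, t: float) -> float:
    """Ornstein-Uhlenbeck: variance plateaus at sigma2 / (2 * alpha)."""
    if alpha == 0:
        return var_bm(sigma2, t)  # no pull toward theta reduces to BM
    return sigma2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))

def var_eb(sigma2_0: float, r: float, t: float) -> float:
    """Early burst: the rate sigma2_0 * exp(r * s) decays over time when r < 0."""
    if r == 0:
        return var_bm(sigma2_0, t)  # constant rate reduces to BM
    return sigma2_0 * (math.exp(r * t) - 1) / r
```

With equal starting rates, BM variance keeps growing, OU variance levels off at σ²/(2α), and EB variance levels off at σ0²/|r| because nearly all change happens early.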

Models for discrete traits

Mk: transitions are instantaneous & occur at some constant rate (for two states, 0 → 1 at rate q01 and 1 → 0 at rate q10)

Threshold: character states are determined by a continuous “liability”
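For the two-state Mk model, the probability of moving between states over a branch of length t has a closed form. The sketch below computes that transition matrix from the rates q01 and q10. The function name is hypothetical. This is the standard result for a two-state continuous-time Markov chain, not code from GEIGER.

```python
import math

def mk2_transition_probs(q01: float, q10: float, t: float) -> list[list[float]]:
    """Transition probability matrix P(t) for a two-state Mk model.

    P[i][j] is the probability of being in state j after time t,
    given a start in state i.
    """
    total = q01 + q10
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no transitions can happen
    decay = math.exp(-total * t)
    p01 = q01 / total * (1 - decay)  # 0 -> 1 over the branch
    p10 = q10 / total * (1 - decay)  # 1 -> 0 over the branch
    return [[1 - p01, p01], [p10, 1 - p10]]
```

On long branches every row converges to the stationary frequencies (q10 / (q01 + q10), q01 / (q01 + q10)). This is one reason deep transitions are hard to reconstruct.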

GEIGER

Pennell et al. 2014 Bioinformatics

https://github.com/mwpennell/geiger-v2

What can we learn about evolution FROM TRAIT MODELS?

Macroevolutionary dynamics

Population processes

Statistical descriptors

Before we make ANY interpretation of a model, we need to know whether it actually explains our data

1. Is the model capturing the variation in the data we have observed?

2. What about the data we haven’t?

[Figure: four scatterplots, each fit with R² = 0.67, p = 0.002]
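Identical fit statistics can hide very different data. A classic illustration is Anscombe's (1973) quartet. We assume the slide makes the same point, but the numbers below are Anscombe's published values, not data from the talk. The sketch computes R² for each of the four datasets in plain Python.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in QUARTET.items():
    print(name, round(r_squared(x, y), 2))  # each prints 0.67
```

The summary statistic alone cannot distinguish a clean linear trend from a curve or a single influential point. That is why adequacy checks look beyond the fit.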

Is the model APPROPRIATE, and if not... WHAT ARE WE MISSING?

Linear regression models

[Figure: regression diagnostics, Cook’s distance vs. observation and residuals vs. fitted values]
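The two diagnostic plots come from quantities that are easy to compute for a simple regression. Here is a minimal sketch of fitted values, residuals, and Cook's distance (D_i = e_i² / (p s²) · h_ii / (1 - h_ii)²). It runs on Anscombe's third dataset as stand-in input. The function name and the data choice are ours.

```python
def regression_diagnostics(x, y):
    """Fitted values, residuals and Cook's distances for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    p = 2  # number of fitted coefficients (intercept and slope)
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]  # hat-matrix diagonal
    cooks = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return fitted, resid, cooks

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
fitted, resid, cooks = regression_diagnostics(x, y)
worst = max(range(len(cooks)), key=cooks.__getitem__)
print(worst, x[worst], y[worst])  # the single outlying point
```

Plotting `cooks` against observation index and `resid` against `fitted` reproduces the two standard panels.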

Assessing the adequacy of PHYLOGENETIC TRAIT MODELS

Establishing scope

Univariate, quantitative traits

Models that predict multivariate normal data

Fit a model to comparative data

Use fitted parameters to simulate data

Compare observed to simulated data
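The three steps are generic. Here is a toy, non-phylogenetic sketch of the same loop as a parametric bootstrap. It assumes a normal model as the fitted model and sample skewness as the test statistic. Both choices, and the function names, are hypothetical and serve only to show the fit, simulate, and compare steps.

```python
import random
import statistics

def skewness(values):
    """Sample skewness (population-moment version)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def parametric_bootstrap_p(data, n_sims=2000, seed=1):
    """Fit a normal model, simulate from it, and compare skewness.

    Returns a two-tailed p-value: how often data simulated from the
    fitted model are at least as extreme as the observed data.
    """
    rng = random.Random(seed)
    mu, sd = statistics.fmean(data), statistics.stdev(data)  # 1. fit model
    observed = skewness(data)
    simulated = [
        skewness([rng.gauss(mu, sd) for _ in data])  # 2. simulate from the fit
        for _ in range(n_sims)
    ]
    upper = sum(s >= observed for s in simulated) / n_sims  # 3. compare
    lower = sum(s <= observed for s in simulated) / n_sims
    return min(1.0, 2 * min(upper, lower))

data = [1.0] * 19 + [100.0]  # one extreme value a normal model cannot produce
print(parametric_bootstrap_p(data))
```

A small p-value says the fitted model rarely generates data like ours, so the model is likely inadequate for this feature of the data.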

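Old idea in statistics

Parametric bootstrapping: simulate new data from Pr(D | θ), with θ fixed at its estimate

Posterior predictive simulation: draw θ from the posterior Pr(θ | D), then simulate new data from Pr(D | θ)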

But new in comparative biology

Parametric bootstrapping (Boettiger et al. 2012 Evolution)

Posterior predictive simulation (Slater and Pennell 2014 Sys Bio)

If we re-ran evolution, how likely are we to see a data set like ours?

SIMILAR: model is likely adequate

DIFFERENT: model is likely inadequate

How similar is similar?

Problem: No two datasets are exactly alike
Solution: Use test statistics to summarize data in meaningful ways

Problem: Species are not independent data points
Solution: Calculate test statistics on contrasts rather than the data

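Independent contrasts

[Figure: three-taxon tree (A, B, C) with contrasts Ci and Cj computed at the internal nodes]

n - 1 contrasts for n tips

Under Brownian motion, standardized contrasts C ~ Normal(0, σ²), where σ² is the Brownian rate

Felsenstein 1985 Am Nat; Felsenstein 1973 Am J Hum Gen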
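Felsenstein's contrasts are computed in a single post-order pass. Each internal node contributes one standardized contrast. It then passes a weighted ancestral estimate and a lengthened branch to its parent. Here is a minimal sketch for bifurcating trees. The `Node` class, the function name, and the example tree and trait values are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    length: float                      # branch length leading to this node
    value: Optional[float] = None      # trait value (tips only)
    children: List["Node"] = field(default_factory=list)

def independent_contrasts(node: Node, out: List[float]) -> Tuple[float, float]:
    """Felsenstein's algorithm for a bifurcating tree.

    Appends standardized contrasts to `out` (post-order) and returns the
    node's estimated value and its branch length, lengthened to account
    for uncertainty in that estimate.
    """
    if not node.children:
        return node.value, node.length
    left, right = node.children
    x1, v1 = independent_contrasts(left, out)
    x2, v2 = independent_contrasts(right, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))      # standardized contrast
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)      # weighted ancestral estimate
    return x, node.length + v1 * v2 / (v1 + v2)      # extra length for estimation error

# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=4
tree = Node(0.0, children=[
    Node(1.0, children=[Node(1.0, 1.0), Node(1.0, 3.0)]),
    Node(2.0, 4.0),
])
contrasts: List[float] = []
independent_contrasts(tree, contrasts)
print(contrasts)  # n - 1 = 2 contrasts for 3 tips
```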

For non-Brownian models

Problem: Contrasts will no longer be normally distributed
Solution: Use model parameters to standardize branch lengths by the expected (co)variance that will accumulate along them

Refer to the rescaled tree as a unit tree
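One concrete case is the Early Burst model. We assume its rate decays as σ0² e^{r s}, with s the time since the root. A branch running from time t_start to t_end then accumulates variance equal to the integral of that rate. Using this integral as the branch's new length gives the unit tree, on which BM with a rate of 1 reproduces the model's covariances. The sketch covers only EB (with BM as the r = 0 case); OU rescaling is more involved and is not shown. The function names are hypothetical.

```python
import math

def eb_unit_length(sigma2_0: float, r: float, t_start: float, t_end: float) -> float:
    """Expected variance accumulated along a branch under Early Burst.

    t_start and t_end are the branch's start and end times measured from
    the root; r < 0 means evolution slows down over time.
    """
    if r == 0:  # r = 0 is plain Brownian motion
        return sigma2_0 * (t_end - t_start)
    return sigma2_0 * (math.exp(r * t_end) - math.exp(r * t_start)) / r

def unit_tree(branches, sigma2_0, r):
    """Rescale (t_start, t_end) pairs into unit-tree branch lengths."""
    return [eb_unit_length(sigma2_0, r, a, b) for a, b in branches]

# Hypothetical early and late branches of equal length on a tree of depth 2
print(unit_tree([(0.0, 1.0), (1.0, 2.0)], sigma2_0=1.0, r=-1.0))
```

Under EB the early branch ends up much longer than the late one on the unit tree, because most of the variance accumulates near the root.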

Test statistics

Slope of contrasts vs. ancestral state

Slope of contrasts vs. expected variances

Slope of contrasts vs. node height

Mean of squared contrasts

Coefficient of variation of contrasts

KS test for normality of contrasts
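The statistics in this list are simple summaries of the contrasts from the unit tree. Here is a minimal sketch of them in plain Python. Two choices are our assumptions: the coefficient of variation and the slopes use absolute contrasts, and the x-values for the slopes (ancestral states, expected variances, node heights) come from the contrast calculation. The function names are hypothetical and do not mirror any package API.

```python
import math
import statistics

def slope(x, y):
    """Ordinary least-squares slope of y on x (e.g. |contrast| vs. node height)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def mean_squared(contrasts):
    """Mean of squared contrasts; about 1 on a unit tree if the model fits."""
    return statistics.fmean([c * c for c in contrasts])

def coef_variation(contrasts):
    """Coefficient of variation of the absolute contrasts."""
    abs_c = [abs(c) for c in contrasts]
    return statistics.stdev(abs_c) / statistics.fmean(abs_c)

def _std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ks_normal(contrasts):
    """Kolmogorov-Smirnov D statistic against a standard normal."""
    xs = sorted(contrasts)
    n = len(xs)
    return max(
        max((i + 1) / n - _std_normal_cdf(v), _std_normal_cdf(v) - i / n)
        for i, v in enumerate(xs)
    )
```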

Simulating datasets for comparison

Simulate many new datasets on the unit tree

Use a BM model with a rate of 1

Calculate test statistics on each simulated dataset

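Putting it all together...

1. Fit model

2. Build the unit tree from the fitted parameters

3. Calculate test statistics on the observed data

4. Simulate m datasets on the unit tree under BM with a rate of 1

5. Calculate test statistics on each simulated dataset

6. Compare observed and simulated test statistics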
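The final compare step can be sketched in a few lines. On a unit tree, contrasts from data simulated under BM with a rate of 1 are independent standard normals. This sketch therefore draws each simulated dataset's n - 1 contrasts directly rather than simulating tip values. It is a hypothetical Python illustration using the mean of squared contrasts as the test statistic. It is not the arbutus API, which is an R package.

```python
import random
import statistics

def mean_squared(contrasts):
    """Test statistic: mean of squared contrasts (about 1 under BM, rate 1)."""
    return statistics.fmean([c * c for c in contrasts])

def adequacy_p_value(observed_contrasts, n_sims=1000, seed=42):
    """Two-tailed p-value for the observed test statistic.

    Each simulated dataset is represented by n - 1 independent standard
    normal contrasts, which is what BM with a rate of 1 on the unit tree
    produces.
    """
    rng = random.Random(seed)
    k = len(observed_contrasts)
    t_obs = mean_squared(observed_contrasts)
    t_sim = [mean_squared([rng.gauss(0.0, 1.0) for _ in range(k)])
             for _ in range(n_sims)]
    upper = sum(t >= t_obs for t in t_sim) / n_sims
    lower = sum(t <= t_obs for t in t_sim) / n_sims
    return min(1.0, 2 * min(upper, lower))

# Hypothetical contrasts from a unit tree with 10 tips (9 contrasts)
print(adequacy_p_value([5.0] * 9))  # far more variance than the model expects
```

A p-value near 0 flags a deviation in that aspect of the data. Repeating the comparison for each test statistic shows *how* the model fails, not just *whether* it fails.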

ARBUTUS

Pennell et al. 2015 Am Nat

https://github.com/mwpennell/arbutus

Cornwell et al. 2014 J Ecology

[Figure: phylogeny of vascular plant orders, from monilophytes and gymnosperms through Lamiales, with major clades labeled, shown alongside leaf N, SLA, max height, leaf size, and seed mass]

Specific leaf area: 72 datasets (20 - 2,200 species)

Seed mass: 226 datasets (20 - 22,817 species)

Leaf nitrogen content: 39 datasets (20 - 936 species)

Data: Kleyer et al. 2008 J Ecology; Kew Seed Information Database 2014; Wright et al. 2004 Nature; Zanne et al. 2014 Nature

Empirical analyses

1. Fit Brownian motion, Ornstein-Uhlenbeck, and Early Burst models to each dataset

2. Calculate relative support using AIC

3. Assess adequacy of best-fitting model
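Step 2 converts each model's AIC into a relative weight. Here is a minimal sketch of that conversion. The log-likelihoods are hypothetical. The parameter counts are assumptions: rate and root state for BM, plus α for OU or the rate-change parameter for EB.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def aic_weights(aics: dict) -> dict:
    """Relative support for each model; the weights sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical fits: (log-likelihood, number of parameters)
fits = {"BM": (-20.0, 2), "OU": (-19.5, 3), "EB": (-20.0, 3)}
weights = aic_weights({m: aic(ll, k) for m, (ll, k) in fits.items()})
print(weights)
```

Note that a high AIC weight only ranks the candidates against each other. It says nothing about whether the best model is adequate, which is why step 3 is needed.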

[Figure: model support (AIC weight) for Brownian motion, Ornstein-Uhlenbeck, and Early burst across datasets 1 - 337]

Pennell et al. 2015 Am Nat

Specific leaf area: model deviations detected in 32/72 datasets

Seed mass: model deviations detected in 153/226 datasets

Leaf nitrogen content: model deviations detected in 19/39 datasets

Simple, commonly used models are often (woefully) inadequate

But we already knew that...

We are (often) here

This is how we learn about biology

Learn about our data

Phylogenetic error (topology and branch lengths)

Measurement error

Biologically interesting “outlier” species

Learn about evolutionary processes

Time-heterogeneous models

Different models for different parts of the tree

Biologically motivated models

Understanding how and why a model fails can provide new biological insights

1. Is the model capturing the variation in the data we have observed?

2. What about the data we haven’t?

True diversity

Sampled diversity

If missing data are non-random, model parameters will be biased

HOW MANY SPECIES ARE WOODY?

Willem van Aken

True diversity?

Known species: 316,000

Trait data: 49,000

Genetic data: 55,000

Sampling bias is EVERYWHERE

Hinchliff and Smith 2014 PLoS ONE

Sampling bias in...

The groups we choose to study (Pennell, Sarver, and Harmon 2012 PLoS ONE)

And the traits we choose to measure (Uyeda, Caetano, and Pennell 2015 Sys Bio)

MISSING DATA HAS STRUCTURE

100% HERBACEOUS

100% WOODY

Images: Gnangarra; Willem van Aken

Microcoelia (Orchid family)

Example genus: 30 species, with 0 recorded as woody (W), 12 recorded as herbaceous (H), and 18 unknown (?)

Strong prior: Pr(all unknown species are H) = 1

Weak prior: Pr(all unknown species are H) = 0.42; Pr(at least 15 are H) = 0.90

Sampling with replacement (Binomial)

Sampling without replacement (Hypergeometric)
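The weak-prior numbers can be reproduced under one set of assumptions. We place a uniform prior on the genus's total number of herbaceous species and treat the 12 recorded species as drawn without replacement, which gives a hypergeometric likelihood. This is a sketch of that reasoning, not necessarily the exact calculation behind the slide. The function name is hypothetical.

```python
from math import comb

def posterior_herb_count(n_total: int, n_herb_sampled: int, n_sampled: int):
    """Posterior over the total number of herbaceous species in a genus.

    Assumes a uniform prior on that total (0..n_total) and that the
    recorded species were sampled without replacement (hypergeometric).
    """
    weights = {
        k: comb(k, n_herb_sampled) * comb(n_total - k, n_sampled - n_herb_sampled)
        for k in range(n_total + 1)
    }
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items() if w}

# Genus from the slide: 30 species, 12 recorded (all herbaceous), 18 unknown
post = posterior_herb_count(n_total=30, n_herb_sampled=12, n_sampled=12)
p_all_herb = post[30]
p_at_least_15 = sum(p for k, p in post.items() if k - 12 >= 15)
print(round(p_all_herb, 2), round(p_at_least_15, 2))  # 0.42 0.9
```

The strong prior instead puts all probability on the 18 unknowns matching the 12 known species. The two priors bracket how much we trust the recorded species to represent their genus.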

Distribution of woodiness is bimodal

791 genera with records for >10 species

[Figure: probability density of the percentage of woody species per genus (0 to 100), with counts of 411, 271, and 58 marked]

[Figure: probability density of the global proportion of woody species under the strong and weak priors, with x-axis ticks at 44, 46, and 48]

Taking the dataset at face value: 59% woody
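The face-value figure counts only recorded species, whereas the modeled estimate also imputes the unrecorded ones. Here is one hypothetical way to do that imputation by Monte Carlo; it is not necessarily the procedure of FitzJohn, Pennell, et al. 2014. Each genus gets a uniform prior on its woody proportion, updated by its records (binomial sampling with replacement). Its unknown species are then drawn from that distribution. The genus counts are made up.

```python
import random

def impute_global_woody(genera, n_draws=1000, seed=7):
    """Monte Carlo distribution of the global proportion of woody species.

    `genera` holds (n_woody, n_herb, n_unknown) per genus. In each draw,
    a genus's woody proportion is sampled from Beta(W + 1, H + 1) (a
    uniform prior updated by its records) and each unknown species is
    woody with that probability.
    """
    rng = random.Random(seed)
    total = sum(w + h + u for w, h, u in genera)
    draws = []
    for _ in range(n_draws):
        woody = 0
        for w, h, u in genera:
            p = rng.betavariate(w + 1, h + 1)
            woody += w + sum(rng.random() < p for _ in range(u))
        draws.append(woody / total)
    return draws

# Hypothetical genera: (woody, herbaceous, unknown) counts
genera = [(0, 12, 18), (25, 0, 5), (3, 2, 40), (10, 10, 0)]
draws = impute_global_woody(genera)
face_value = sum(w for w, _, _ in genera) / sum(w + h for w, h, _ in genera)
print(round(face_value, 2), round(sum(draws) / len(draws), 2))
```

Poorly sampled genera, such as the third one here, dominate the imputed estimate. This is exactly where non-random sampling can shift the answer away from the face-value number.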

[Figure: proportion of woody vs. herbaceous species in Monilophytes, Gymnosperms, Basal Angiosperms, Monocots, and Eudicots]

FitzJohn, Pennell, et al. 2014 J Ecology

Can use estimated sampling proportions in model-based analyses

So we have a good model and have incorporated sampling error...

WHAT CAN WE SAY?

Macroevolutionary dynamics

Population processes

Statistical descriptors

Strict pop gen interpretation

Brownian motion: $dz = \sigma\,dW$

Mutation-drift equilibrium: $\sigma^2 = 2V_M$

Hansen and Martins 1996 Evolution

Quantitative genetics interpretation

Brownian motion is a diffusion process: the change in trait mean, $dz = \sigma\,dW$, accumulates at rate $\sigma^2$

At mutation-drift equilibrium that rate equals $2V_M$, where $V_M$ is the mutational variance

Hansen and Martins 1996 Evolution; Lynch and Hill 1986 Evolution
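The step from mutational variance to the Brownian rate can be sketched in two lines. We assume a neutral, additive trait in a population of effective size N_e at mutation-drift equilibrium, with time measured in generations; the notation is ours.

```latex
\begin{align}
  \sigma^2_G &= 2 N_e V_M
    && \text{additive genetic variance at mutation--drift equilibrium} \\
  \operatorname{Var}(\Delta z) &= \frac{\sigma^2_G}{N_e} = 2 V_M
    && \text{drift in the trait mean per generation} \\
  \Rightarrow\quad \sigma^2 &= 2 V_M
    && \text{so } \operatorname{Var}[z(t)] = 2 V_M\, t
\end{align}
```

The effective population size cancels. This is what makes the neutral expectation a clean benchmark against which phylogenetic rate estimates can be compared.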

By fitting alternative models we can evaluate the effects of microevolutionary processes over long time scales

But such intuitive interpretations are LIKELY NAÏVE

Micro to macro: use population estimates to predict divergence over long time scales (Hansen 2012 Book Chapter; Estes and Arnold 2007 Am Nat)

Macro to micro: use phylogenetic models to estimate population-level parameters (Lynch 1990 Am Nat)

THE NUMBERS DON’T ADD UP!

Macroevolutionary dynamics

Population processes

Statistical descriptors

Macroevolutionary models may reflect dynamics of adaptive landscapes rather than evolution along an adaptive landscape

Pennell et al. 2014 TREE; Pennell and Harmon 2013 NYAS; Pennell 2015 Sys Bio

Simpson 1944 Tempo and Mode

Dynamics of adaptive landscapes

Adaptive radiation

Adaptive zones

Red Queen (Van Valen)

Escape and radiate

Punctuated equilibrium

Diversity dependence

Key innovations

Ephemeral divergence

What about punctuated equilibrium?

[Figure: morphology vs. time]

Eldredge and Gould 1972; Gould and Eldredge 1977 Paleobiology

This is confusing (to everyone)

Is evolution gradual or pulsed?

Is trait evolution (mainly) associated with speciation?

Is evolution during speciation adaptive or neutral?

Does species selection drive evolutionary trends?

Each can be tested with a specific macroevolutionary model

Macroevolutionary dynamics

Population processes

Statistical descriptors

Nothing.

Models are purely phenomenological: they capture patterns, not processes

A case study: EVOLUTION OF KARYOTYPES

To a geneticist many of the comparisons (i.e., between karyotypes of different species) will seem of little significance, because to him [sic] it is not the shapes and sizes of chromosomes which are important, but the genes contained in them.

T. H. Morgan et al. 1925

Physical linkage keeps genes together

Genetic material lost/gained when mutations change chromosome number/form

Structural changes may be involved in adaptation and speciation

New chromosomes arise from

Duplications (including polyploidy)

Fissions - chromosome breaks into two

Fusions - two chromosomes come together

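[Figure: XX/XY and ZZ/ZW sex chromosome systems]

Sex chromosomes are natural EVOLUTION EXPERIMENTS

[Figure: males are hemizygous in XY systems and homozygous in ZW systems; a Y-autosome (Y-A) fusion yields an X1X2Y system and an X-autosome (X-A) fusion yields an XY1Y2 system]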

Pennell et al. in press PLoS Genetics

FISHES

SQUAMATE REPTILES

[Table: numbers of XY and ZW species and of Y-A, X-A, W-A, and Z-A fusions in fishes and squamate reptiles; data from Tree of Sex Consortium 2014]

Both highly significant (Fisher’s exact test)
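Fisher's exact test compares proportions in a 2 x 2 table without large-sample approximations. One example is fused vs. unfused sex chromosomes in two groups. Here is a minimal stdlib sketch of the two-sided version. The demonstration table is Fisher's classic tea-tasting example, not the counts from the slide, and the function name is ours.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:  # probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # small relative tolerance so ties are not lost to floating-point error
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Fisher's tea-tasting table: 3 of 4 cups identified correctly
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because the test conditions on the table margins, it stays valid for the small fusion counts typical of karyotype databases.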

[Figure: phylogeny of fish genera (Acipenser to Xiphophorus) showing XY and ZW systems and Y-A/W-A and X-A/Z-A fusions]

[Figure: phylogeny of squamate reptile genera (snakes and lizards) showing XY and ZW systems and fusions]

[Figure: model of transitions among XY, fused XY, ZW, and fused ZW states]

[Figure: probability density of the difference in fusion rate between Y and other sex chromosomes (x-axis 0 to 0.15)]

Y chromosomes fuse with autosomes more frequently than X, W, or Z chromosomes

How can this help us understand EVOLUTIONARY PROCESSES?

Scenarios compared for Y, W, and Z: all else equal, male-biased mutation, Bateman gradient

[Figure: expected Y, W, and Z fusions relative to X fusions (≈X, >X, or <X) under each scenario]

Neutral case: consistent with an excess of Y-A fusions?

All else equal: NO
Male-biased mutation: NO
Bateman gradient: NO

Direct fitness effects

[Figure: a Y-autosome fusion turning an XY system into an X1X2Y system]

Fusions cause expression changes near breakpoints

Direct fitness effects (deleterious): consistent with an excess of Y-A fusions?

All else equal: NO
Male-biased mutation: YES
Bateman gradient: YES

Sexually antagonistic selection

[Figure: fitness effects of an autosomal allele A in males and females]

If A fuses with Y, the allele will only be found in males (assuming no recombination between X and Y)

Sexually antagonistic selection: consistent with an excess of Y-A fusions?

All else equal: NO
Male-biased mutation: YES
Bateman gradient: NO

Most scenarios are inconsistent with an excess of Y-A fusions. The exceptions:

Fusions deleterious + male-biased mutation

Fusions deleterious + Bateman gradient

Fusions driven by sexually antagonistic selection + male-biased mutation (requires a very high male-biased mutation rate)

The phylogenetic models used are not mechanistic

But model fits can ground-truth theoretical analyses

Macroevolutionary dynamics

Population processes

Statistical descriptors

Software for fitting models

Assessing model adequacy

Incorporating sampling bias

Punctuated equilibrium

Chromosome fusions

Because I can’t eat phylogenies: Institute for Bioinformatics & Evolutionary Studies

Luke Harmon

Jack Sullivan, Scott Nuismer, Paul Joyce, Arne Mooers

David Tank, Larry Forney

Rich FitzJohn, Josef Uyeda, Jon Eastman, David Bapst

Michael Alfaro, Steve Arnold, Frank Burbrink, Will Cornwell, Bernie Crespi, Joe Felsenstein, David Green, Paul Harnik

Mark Kirkpatrick, Craig Miller, Brian O’Meara, Erica Bree Rosenblum, Carl Simpson, Graham Slater, David Swofford, Amy Zanne

Joseph Brown, Daniel Caetano, Simone Des Roches, Travis Hagey, Kayla Hardwick, Denim Jochimsen

Suzanne Joneson, Rafael Maia, Eliot Miller, Tom Poorten, James Rosindell, Jamie Voyles

Simon Uribe-Convers, Tyler Hether, Brice Sarver

All y’all

Institute for Bioinformatics & Evolutionary Studies: Lisha Abendroth, Eva Top

Roxana Hickey

My family