Taming the Beast Workshop

Priors and starting values

Veronika Bošková & Chi Zhang

June 28, 2016

Outline: Priors, Prior distribution, Tree prior, Substitution model prior, Clock prior, Parameter prior, Think twice, Starting values, References


What is a prior?

- The distribution of a parameter before the data are collected and analysed

- In contrast, the POSTERIOR distribution combines the information from the prior and the data


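What is a prior?

- Using Bayes' theorem, we can decompose the posterior:

$$
P(\text{genealogy}, \text{demographic model}, \text{substitution model}, \text{molecular clock model} \mid \text{genetic sequences})
= \frac{P(\text{genetic sequences} \mid \text{genealogy}, \text{substitution model}, \text{molecular clock model}) \; P(\text{genealogy} \mid \text{demographic model}) \; P(\text{demographic model}) \; P(\text{substitution model}) \; P(\text{molecular clock model})}{P(\text{genetic sequences})}
$$

The slide labels P(genealogy | demographic model) as the tree prior and the remaining model terms as prior information. The genetic sequences are illustrated by the alignment ACAC..., TCAC..., ACAG....

Figure adapted from [du Plessis and Stadler, 2015]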


Prior

- Allows us to include any information we have on the process before looking at the data

- Do not be afraid of using it in the inference

- The prior distribution does not have to be, and is not expected to be, exactly the same as the posterior


Prior

- Should not be, and is not, universal for all the analyses you will ever do in your research

- Should incorporate prior knowledge (obtained before looking at the data) about the parameter or the underlying process
  - use results of previous independent experiments
  - use other independent evidence

- Should not be too restrictive if prior knowledge or assumptions are weak
  - one can use diffuse priors

- Must not be adjusted after the run in order to obtain higher and higher posterior support


Prior

- Is a choice of
  - a model
    - tree-generating models, nucleotide/AA/codon substitution models, ...
  - and a distribution of plausible values for a parameter of interest
    - Uniform, Normal, Beta, ...


Tree prior (tree-generating model)

- Have to pick one from the Coalescent or the Birth-death process framework

- Have to put priors on the parameters of the chosen model
  - e.g. growth rate of the population, R0, extinction rate, ...


Substitution model prior

- The selection is big: JC69, HKY85, ..., GTR

- Use a model that has previously been identified as the best for your type of data
  - e.g. HKY85
    - Prior for the transition/transversion rate ratio (κ)
    - Prior for the base frequencies

- To choose the best model
  - Use model comparison to choose the one that best fits the data
  - Use rjMCMC directly in BEAST2 to sample from the posterior distribution across different substitution models. The model in which rjMCMC spends the most time (the one it samples most often) is the best-fitting model.
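
The rjMCMC idea above can be sketched in a few lines: if the sampler records which substitution model it is in at each logged step, the fraction of steps spent in a model estimates that model's posterior probability. This is a minimal sketch, not BEAST2 output processing; the trace below is made up for illustration and `model_posterior` is a hypothetical helper name.

```python
from collections import Counter


def model_posterior(samples):
    """Estimate posterior model probabilities as visit frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return {model: n / total for model, n in counts.items()}


# Hypothetical post-burn-in trace of the model indicator (not real data).
trace = ["HKY85"] * 70 + ["GTR"] * 25 + ["JC69"] * 5

probs = model_posterior(trace)
best = max(probs, key=probs.get)
print(best, probs[best])
```

In a real analysis the trace would come from the sampled model indicator after discarding burn-in, and the frequencies are only as reliable as the mixing between models.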


Clock prior (molecular clock model)

- Strict clock: all branches have the same clock rate

- Relaxed clock
  - Uncorrelated: branches have independent clock rate distributions
  - Correlated: the clock rate distribution of a child branch is correlated with that of its parent branch
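
To make the three clock models concrete, here is a toy simulation of branch rates. It is a sketch under stated assumptions, not how BEAST2 implements its clocks: the four-branch tree, the function names and the use of lognormal distributions are all choices made for illustration. In the correlated case a child's rate is drawn from a lognormal whose median is the parent's rate, which is one common formulation.

```python
import math
import random

# Hypothetical toy tree: each branch is named by its child node.
PARENT = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}


def strict_rates(rate):
    """Strict clock: every branch shares one rate."""
    return {branch: rate for branch in PARENT}


def uncorrelated_rates(mean_rate, sigma, rng):
    """Uncorrelated relaxed clock: an independent lognormal draw per branch."""
    mu = math.log(mean_rate) - sigma ** 2 / 2  # makes the mean equal mean_rate
    return {branch: rng.lognormvariate(mu, sigma) for branch in PARENT}


def correlated_rates(root_rate, sigma, rng):
    """Correlated relaxed clock: child rate is lognormal around the parent rate."""
    rates = {}

    def rate_of(node):
        if node == "root":
            return root_rate
        if node not in rates:
            parent_rate = rate_of(PARENT[node])
            rates[node] = rng.lognormvariate(math.log(parent_rate), sigma)
        return rates[node]

    for branch in PARENT:
        rate_of(branch)
    return rates


rng = random.Random(1)
print(strict_rates(1e-3))
print(uncorrelated_rates(1e-3, 0.5, rng))
print(correlated_rates(1e-3, 0.5, rng))
```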


Parameter prior

- Can be fixed to a given value (though this is generally not recommended)

- Can have upper and lower limits
  - If we know that any infected individual recovers after 5-10 days, we can bound the distribution of the infectious period, e.g. min 4 days and max 11 days

- If specified by a parametric distribution, the parameters of this distribution can also be assigned a prior (hyperprior)

- You can visualise the distribution in BEAUti
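
The bullets above can be illustrated with a small log-prior calculation. This is a sketch only: the infectious period uses the 4 to 11 day limits from the example, while the second parameter (`growth_rate`), its Exponential prior and the Uniform(0.1, 10) hyperprior on the prior mean are hypothetical choices added to show what a hyperprior looks like.

```python
import math


def log_uniform(x, lower, upper):
    """Log density of Uniform(lower, upper); -inf outside the limits."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf


def log_exponential(x, mean):
    """Log density of an Exponential distribution with the given mean."""
    if x < 0:
        return -math.inf
    return -math.log(mean) - x / mean


def log_prior(infectious_days, growth_rate, growth_rate_mean):
    """Joint log prior: bounded parameter, plus a parameter with a hyperprior."""
    return (
        log_uniform(infectious_days, 4.0, 11.0)           # hard lower/upper limits
        + log_exponential(growth_rate, growth_rate_mean)  # prior on the parameter
        + log_uniform(growth_rate_mean, 0.1, 10.0)        # hyperprior on its mean
    )


print(log_prior(7.0, 1.0, 1.0))   # finite: all values are inside the limits
print(log_prior(12.0, 1.0, 1.0))  # -inf: 12 days is above the upper limit
```

A value outside the limits gets zero prior density, so no amount of data can move the posterior there; this is why the limits in the example are set slightly wider than the believed 5-10 day range.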


Examples - Normal distribution

[Figure: Normal probability density functions (PDF) on the interval -0.4 to 0.4 for µ=0, σ=0.5; µ=0.2, σ=0.2; µ=0, σ=0.1; µ=0, σ=0.2]

- Parameters: mean µ ∈ R, standard deviation σ > 0

- Range of values: (-∞, ∞)
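
For reference, the density behind these curves is the standard Normal PDF, written here with the slide's symbols µ and σ:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad x \in (-\infty, \infty)
```

The peak height is $1/(\sigma\sqrt{2\pi})$, so a smaller σ gives a taller, narrower curve; for σ = 0.1 the peak is about 3.99.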


Examples - LogNormal distribution

[Figure: LogNormal probability density functions (PDF) on the interval 0 to 3 for M=0, S=1; M=0, S=0.5; M=2, S=1; M=1, S=0.75]

- Parameters: mean M ∈ R, standard deviation S > 0

- Range of values: (0, ∞)

- Long tail, always positive
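
As a worked formula, and assuming that M and S are the mean and standard deviation of the logarithm of the variable (the usual log-space parameterisation; the slide does not say which one is meant), the LogNormal density is:

```latex
f(x \mid M, S) = \frac{1}{x\,S\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - M)^2}{2S^2}\right),
\qquad x > 0
```

Under this assumption the median is $e^{M}$ and the mean is $e^{M + S^2/2}$, which is why the mean sits to the right of the median and the tail is long.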


Examples - Beta distribution

[Figure: Beta probability density functions (PDF) on the interval 0 to 1 for α=0.5, β=0.5; α=2, β=2; α=2, β=5; α=5, β=1]

- Parameters: shape α > 0, shape β > 0

- Range of values: [0, 1]

- Good for e.g. a sampling probability prior
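
A short sketch of the Beta density and its mean, useful for checking what a given (α, β) pair implies for a probability such as the sampling probability. The function names are illustrative, and the endpoints 0 and 1 are simply skipped to avoid taking log(0).

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x; evaluated only for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        return 0.0  # simplification: endpoints and outside values return 0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x)
    )


def beta_mean(alpha, beta):
    """Prior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


for a, b in [(0.5, 0.5), (2, 2), (2, 5), (5, 1)]:
    print(a, b, beta_mean(a, b), beta_pdf(0.5, a, b))
```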


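Examples - Uniform distribution

[Figure: Uniform probability density functions (PDF) on the interval -2 to 2 for l=-0.5, u=0.5; l=0, u=1.7; l=-1, u=1; l=-1.5, u=1.5]

- Parameters: lower bound l, upper bound u

- Range of values: [l, u], where the bounds can lie anywhere in (-∞, ∞)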


Is a uniform distribution a non-informative prior?

- Not really
  - Imagine setting a Uniform(0, 100) prior for the transition/transversion rate ratio (κ), even though you know that the most likely values for κ are between 0 and 10. You have now put 9/10 of the prior weight on values > 10.

[Figure: density f(κ) of the Uniform(0, 100) prior, with the region κ > 10 marked as holding 9/10 of all weight]

- In fact, there is no such thing as a non-informative prior

- If little or no information on the parameter is available, use diffuse priors

- Try to avoid Uniform(-∞, ∞) or Uniform(0, ∞)
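
The 9/10 figure can be checked directly, and compared with a diffuse alternative. In this sketch the LogNormal parameters (M = 1, S = 1.25, taken to be in log space) are an assumed example of a diffuse prior for κ, not a recommendation from the slides.

```python
import math


def uniform_mass_above(threshold, lower, upper):
    """P(X > threshold) for X ~ Uniform(lower, upper)."""
    clipped = min(max(threshold, lower), upper)
    return (upper - clipped) / (upper - lower)


def lognormal_mass_above(threshold, m, s):
    """P(X > threshold) when log X ~ Normal(m, s)."""
    z = (math.log(threshold) - m) / s
    return 0.5 * math.erfc(z / math.sqrt(2))


print(uniform_mass_above(10, 0, 100))        # 0.9
print(lognormal_mass_above(10, 1.0, 1.25))   # much smaller than 0.9
```

The diffuse LogNormal still allows κ > 10 but places most of its weight where κ is expected to be, which the Uniform(0, 100) prior does not.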


Proper vs improper priors

- Sometimes the prior is such that the sum or the integral of its values does not converge, so it cannot be normalised to 1; this is called an IMPROPER prior

- Examples
  - 1/x
  - Uniform(-∞, ∞)
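
Both examples can be verified with a one-line integral each. The symbols ε, U, L and the constant height c are introduced here only for the calculation:

```latex
\int_{\epsilon}^{U} \frac{1}{x}\,dx = \ln U - \ln \epsilon \;\longrightarrow\; \infty
\quad \text{as } \epsilon \to 0 \text{ or } U \to \infty,
\qquad
\int_{-L}^{L} c\,dx = 2cL \;\longrightarrow\; \infty
\quad \text{as } L \to \infty \text{ for any } c > 0 .
```

In both cases no finite normalising constant exists, so the prior does not integrate to 1.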


Are my priors what I set them to be?

- Not always
  - Induced priors may change the picture, i.e. if the parameters interact, the marginal prior distribution for each individual parameter may differ from the originally specified prior

- Sample from the prior to see what your 'real' prior is

[Figure: density against node age in Myears for the calibrated nodes] Figure adapted from [Heled and Drummond, 2012]. The marginal prior distributions that result from the multiplicative construction (gray) versus the calibration densities (black line) specified for the calibrated nodes.
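
A toy example shows how interaction alone changes the marginal priors. Two node ages are each given a Uniform(0, 10) prior, but a tree requires the child to be younger than its parent. This sketch uses simple rejection sampling on that constraint; it is not the multiplicative construction from the figure, and the setup is hypothetical.

```python
import random


def sample_joint_prior(n, rng):
    """Rejection-sample (child_age, parent_age) under the constraint child < parent."""
    samples = []
    while len(samples) < n:
        child = rng.uniform(0.0, 10.0)   # specified prior: Uniform(0, 10)
        parent = rng.uniform(0.0, 10.0)  # specified prior: Uniform(0, 10)
        if child < parent:               # topology constraint
            samples.append((child, parent))
    return samples


rng = random.Random(42)
draws = sample_joint_prior(20000, rng)
child_mean = sum(c for c, _ in draws) / len(draws)
parent_mean = sum(p for _, p in draws) / len(draws)
print(child_mean, parent_mean)
```

Both specified priors have mean 5, but the induced marginal means are close to 10/3 for the child and 20/3 for the parent. Sampling from the prior is what reveals this.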


How to choose priors?

- Use all the prior knowledge you have to choose models and set appropriate parameter priors

- Sample from the prior distribution before using your data, to check that you really have the priors you want

- Check your posterior distribution against the prior


Word of caution

- In practice, it is important to evaluate the impact of the prior on the posterior in a Bayesian robustness analysis

- Ideally, the posterior should be dominated by your data, such that the choice of the prior has little influence on the result

- If this is not the case, the choice of prior is very important and should be reported
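
A minimal robustness check can be sketched with a model that has a closed-form posterior. Here a probability gets a Beta prior and binomial data, so the posterior mean is available directly; the three priors and the data sets are made up for illustration and have nothing to do with a specific BEAST2 analysis.

```python
def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean under a Beta(alpha, beta) prior and binomial data."""
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical set of alternative priors to compare.
PRIORS = {"Beta(1,1)": (1.0, 1.0), "Beta(2,5)": (2.0, 5.0), "Beta(5,1)": (5.0, 1.0)}


def prior_sensitivity(successes, trials):
    """Spread of posterior means across the alternative priors."""
    means = [posterior_mean(a, b, successes, trials) for a, b in PRIORS.values()]
    return max(means) - min(means)


small = prior_sensitivity(3, 10)       # little data: the prior matters
large = prior_sensitivity(300, 1000)   # much data: the posterior is data-dominated
print(small, large)
```

When the spread stays large, as in the small data set, the result depends on the prior and the prior should be reported alongside it.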


Starting values

- Are just starting values

- Have to lie within the prior distribution you chose for the parameter, including its upper and lower limits

- Use your best guess
  - BEAST2 attempts at most 10 times (this can be changed) to initialize the run, but if the starting values are unreasonable, the runs may keep failing

- Start from different starting values to make sure the chains converge to the same distribution
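
The last point can be illustrated with a toy Metropolis sampler, run from three different starting values on the same target. This is a sketch, not BEAST2: the target is a standard normal chosen for simplicity, and the step size, chain length and burn-in fraction are arbitrary assumptions.

```python
import math
import random


def log_target(x):
    """Hypothetical target: standard normal density, up to a constant."""
    return -0.5 * x * x


def metropolis(start, n_steps, rng, step=1.0):
    """Random-walk Metropolis chain started at `start`."""
    x = start
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain


def post_burnin_mean(chain, burnin_fraction=0.5):
    """Mean of the chain after discarding the initial burn-in part."""
    kept = chain[int(len(chain) * burnin_fraction):]
    return sum(kept) / len(kept)


rng = random.Random(0)
starts = (-10.0, 0.0, 10.0)
means = [post_burnin_mean(metropolis(s, 20000, rng)) for s in starts]
print(means)
```

If the chains have converged, the post-burn-in summaries agree regardless of the starting value; a chain whose summary stays far from the others points to a convergence problem.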


References

- du Plessis, L. and Stadler, T. (2015). Getting to the root of epidemic spread with phylodynamic analysis of genomic data. Trends in Microbiology, 23(7):383–386.

- Heled, J. and Drummond, A. J. (2012). Calibrated tree priors for relaxed phylogenetics and divergence time estimation. Systematic Biology, 61(1):138–149.
