Bayesian Model Selection for Multiple QTL
pages.stat.wisc.edu/~yandell/talk/jax/2005/jax2005.pdf


October 2005 Jax Workshop © Brian S. Yandell 1

Bayesian Model Selection for Multiple QTL

Brian S. Yandell
University of Wisconsin-Madison

www.stat.wisc.edu/~yandell/statgen
Jackson Laboratory, October 2005

outline
1. what is the goal of QTL study?
2. Bayesian priors & posteriors
3. model search using MCMC
• Gibbs sampler and Metropolis-Hastings
4. model assessment
• Bayes factors & model averaging
5. data examples in detail
• plants & animals

1. what is the goal of QTL study?

[Figure: Pareto diagram of QTL effects. x-axis: rank order of QTL (0 to 30); y-axis: additive effect (0 to 3). Effects grade from major QTL through minor QTL to polygenes; the major QTL (numbered 1 to 5) are shown on the linkage map.]

interval mapping basics
• observed measurements
– y = phenotypic trait
– m = markers & linkage map
– i = individual index (1,…,n)
• missing data
– missing marker data
– q = QT genotypes
• alleles QQ, Qq, or qq at locus
• unknown quantities
– λ = QT locus (or loci)
– µ = phenotype model parameters
– H = QTL model/genetic architecture
• pr(q|m,λ,H) genotype model
– grounded by linkage map, experimental cross
– recombination yields multinomial for q given m
• pr(y|q,µ,H) phenotype model
– distribution shape (assumed normal here)
– unknown parameters µ (could be non-parametric)

[Diagram with nodes y, λ, q, µ, m, after Sen Churchill (2001), whose notation is: observed X, Y; missing Q; unknown λ, θ.]

2. Bayesian priors & posteriors
• augment data (y,m) with missing genotypes q
• study model parameters (µ,λ) given data (y,m,q)
– properties of posterior pr(µ,λ,q | y,m)
– average posterior over missing genotypes q
• pr(µ,λ | y,m) = sum_q posterior
• sample from posterior in some clever way
– multiple imputation (Sen Churchill 2002)
– Markov chain Monte Carlo (MCMC)

posterior = likelihood * prior / constant

$$\mathrm{pr}(q,\mu,\lambda \mid y,m) = \frac{\mathrm{pr}(y \mid q,\mu)\,\mathrm{pr}(q \mid m,\lambda)\,\mathrm{pr}(\mu)\,\mathrm{pr}(\lambda \mid m)}{\mathrm{pr}(y \mid m)}$$

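Bayes posterior vs. maximum likelihood

• classical approach maximizes likelihood
• Bayesian posterior averages over other parameters

$$\mathrm{LOD}(\lambda) = \log_{10}\{\max_{\mu}\,\mathrm{pr}(y \mid m,\mu,\lambda)\} + c$$

$$\mathrm{LPD}(\lambda) = \log_{10}\Big\{\mathrm{pr}(\lambda \mid m)\int \mathrm{pr}(y \mid m,\mu,\lambda)\,\mathrm{pr}(\mu)\,d\mu\Big\} + C$$

$$\mathrm{pr}(y \mid m,\mu,\lambda) = \sum_{q}\mathrm{pr}(y \mid q,\mu)\,\mathrm{pr}(q \mid m,\lambda)$$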

BC with 1 QTL: IM vs. BIM

[Figure, two panels, x-axis: Map position (cM), 0 to 200. Left panel: LOD and LPD, QTL at 100 (y-axis: lod). Right panel: substitution effect (y-axis: substitution effect). Legend: black = BIM, purple = IM, blue = ideal. The annotation "2nd QTL?" appears twice.]

how does phenotype y improve posterior for genotype q?

[Figure: phenotype (bp, 90 to 120) against genotype at flanking markers D4Mit41 and D4Mit214; genotype classes AA/AA, AB/AA, AA/AB, AB/AB.]

what are probabilities for genotype q between markers?

recombinants AA:AB
– all 1:1 if ignore y
– and if we use y?

posterior on QTL genotypes
• full conditional of q given data, parameters
– proportional to prior pr(q | m, λ)
• weight toward q that agrees with flanking markers
– proportional to likelihood pr(y|q,µ)
• weight toward q with similar phenotype values
• phenotype and flanking markers may conflict
– posterior recombination balances these two weights
• this is the E-step of EM computations

$$\mathrm{pr}(q \mid y,m,\mu,\lambda) = \frac{\mathrm{pr}(q \mid m,\lambda)\,\mathrm{pr}(y \mid q,\mu)}{\mathrm{pr}(y \mid m,\mu,\lambda)}$$
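The full conditional for a QTL genotype can be illustrated with a minimal Python sketch for a backcross. Everything specific here is an assumption made for illustration and is not taken from the slides: the Haldane map function, no crossover interference, genotype coding 0 = AA and 1 = AB, and the hypothetical function names. The prior weight comes from the flanking markers, the likelihood weight from a normal phenotype model, and their normalized product is the posterior.

```python
import math

def haldane_recomb(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def normal_pdf(y, mean, sd):
    """Normal density, used as the phenotype model pr(y | q, mu)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def genotype_prior(left, right, d_left, d_right):
    """pr(q | m, lambda) for a backcross QTL between two flanking markers.

    left, right: marker genotypes coded 0 (AA) or 1 (AB).
    d_left, d_right: distances (cM) from the QTL to each marker.
    Returns [pr(q=0), pr(q=1)], assuming no interference.
    """
    r1, r2 = haldane_recomb(d_left), haldane_recomb(d_right)
    weights = []
    for q in (0, 1):
        p_left = r1 if q != left else 1.0 - r1
        p_right = r2 if q != right else 1.0 - r2
        weights.append(p_left * p_right)
    total = sum(weights)
    return [w / total for w in weights]

def genotype_posterior(y, prior, means, sd):
    """pr(q | y, m, mu, lambda): prior times likelihood, normalized over q."""
    weights = [prior[q] * normal_pdf(y, means[q], sd) for q in (0, 1)]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    # Recombinant flanking markers (AA on the left, AB on the right), QTL midway.
    prior = genotype_prior(0, 1, 5.0, 5.0)
    post = genotype_posterior(11.0, prior, means=(9.0, 11.0), sd=1.0)
    print("prior:", prior)
    print("posterior:", post)
```

With recombinant flanking markers and the QTL midway, the prior is 1:1, so any departure from 1:1 in the posterior comes from the phenotype alone.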

prior & posteriors: genotypic means µq

[Figure: x-axis y = phenotype values (6 to 16); curves labeled prior, n small, and n large for genotypes qq, Qq, QQ; marked points: data mean, data means, prior mean.]

prior & posteriors: genotypic means µq

shrink posterior from sample genotypic mean toward overall mean
partition individuals into sets Sq by genotype q
hyperparameter κ is related to heritability

prior:

$$\mu_q \sim N(\bar{y}_{\bullet},\ \kappa\sigma^2)$$

posterior given data:

$$\mu_q \sim N\!\left(b_q\bar{y}_q + (1-b_q)\bar{y}_{\bullet},\ b_q\sigma^2/n_q\right), \qquad n_q = \mathrm{count}\{S_q\},\quad \bar{y}_q = \mathrm{sum}_{S_q}\,y_i/n_q$$

shrinkage factor:

$$b_q = \frac{\kappa n_q}{\kappa n_q + 1} \to 1$$
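The shrinkage of genotypic means can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the returned dictionary layout are hypothetical, and the formulas assume the normal prior centered at the overall mean with variance κσ², so that the shrinkage factor is b_q = κn_q/(κn_q + 1).

```python
def shrink_genotypic_means(y, q, kappa):
    """Posterior mean of each genotypic mean under a N(overall mean, kappa*sigma^2) prior.

    y: phenotype values; q: genotype label of each individual; kappa: hyperparameter.
    Returns {genotype: {"b": shrinkage factor, "mean": posterior mean,
                        "var_over_sigma2": posterior variance divided by sigma^2}}.
    """
    overall_mean = sum(y) / len(y)
    groups = {}
    for yi, qi in zip(y, q):
        groups.setdefault(qi, []).append(yi)

    out = {}
    for genotype, values in groups.items():
        n_q = len(values)
        ybar_q = sum(values) / n_q
        b_q = kappa * n_q / (kappa * n_q + 1.0)  # tends to 1 as n_q grows
        out[genotype] = {
            "b": b_q,
            "mean": b_q * ybar_q + (1.0 - b_q) * overall_mean,
            "var_over_sigma2": b_q / n_q,
        }
    return out

if __name__ == "__main__":
    y = [8.0, 8.0, 12.0, 12.0]
    q = ["Qq", "Qq", "QQ", "QQ"]
    for genotype, summary in shrink_genotypic_means(y, q, kappa=1.0).items():
        print(genotype, summary)
```

Larger groups (or larger κ) give b_q closer to 1, so the posterior mean stays near the sample genotypic mean; small groups are pulled toward the overall mean.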

what if variance σ² is unknown?
• sample variance is proportional to chi-square
– ns² / σ² ~ χ²(n)
– likelihood of sample variance s² given n, σ²
• conjugate prior is inverse chi-square
– ντ² / σ² ~ χ²(ν)
– prior of population variance σ² given ν, τ²
• posterior is weighted average of likelihood and prior
– (ντ² + ns²) / σ² ~ χ²(ν+n)
– posterior of population variance σ² given n, s², ν, τ²
• empirical choice of hyper-parameters
– τ² = s²/3, ν = 6
– E(σ² | ν,τ²) = s²/2, Var(σ² | ν,τ²) = s⁴/4
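The conjugate update and the empirical hyper-parameter choice can be checked with a short Python sketch. The function names are hypothetical, and the moment formulas are the standard ones for a scaled inverse chi-square distribution (mean ντ²/(ν−2), variance 2ν²τ⁴/((ν−2)²(ν−4))), which is an assumption about the parameterization intended on the slide.

```python
import random

def scaled_inv_chisq_moments(nu, tau2):
    """Mean and variance of sigma^2 when nu*tau2/sigma^2 ~ chi-square(nu); needs nu > 4."""
    if nu <= 4:
        raise ValueError("variance is finite only for nu > 4")
    mean = nu * tau2 / (nu - 2.0)
    var = 2.0 * nu ** 2 * tau2 ** 2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    return mean, var

def posterior_hyperparameters(nu, tau2, n, s2):
    """Conjugate update: (nu*tau2 + n*s2)/sigma^2 ~ chi-square(nu + n)."""
    nu_post = nu + n
    tau2_post = (nu * tau2 + n * s2) / nu_post
    return nu_post, tau2_post

def draw_sigma2(nu, tau2, rng):
    """One draw of sigma^2; a chi-square(nu) variate is Gamma(shape=nu/2, scale=2)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return nu * tau2 / chi2

if __name__ == "__main__":
    s2 = 3.0
    print("prior moments:", scaled_inv_chisq_moments(6, s2 / 3.0))
    nu_post, tau2_post = posterior_hyperparameters(6, s2 / 3.0, n=100, s2=s2)
    rng = random.Random(0)
    print("posterior draw:", draw_sigma2(nu_post, tau2_post, rng))
```

With τ² = s²/3 and ν = 6, the two moment formulas reduce to s²/2 and s⁴/4, matching the values quoted on the slide.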

multiple QTL phenotype model

• phenotype affected by genotype & environment
pr(y|q,µ) ~ N(µq, σ²)
y = µq + environment
µq = β0 + sum{j in H} βj(q)
number of terms in QTL model H ≤ 2^nqtl (3^nqtl for F2)
• partition genotypic mean into QTL effects
µq = β0 + β1(q1) + β2(q2) + β12(q1,q2)
µq = mean + main effects + epistatic interactions
• partition prior and posterior (details omitted)

prior & posterior on QTL model H
• index model H by number of QTL
• what prior on number of QTL?
– uniform over some range
– Poisson with prior mean
– geometric with prior mean
• prior influences posterior
– good: reflects prior belief
• push data in discovery process
– bad: skeptic revolts!
• “answer” depends on “guess”

[Figure: prior probability (0.00 to 0.30) against m = number of QTL (0 to 10); plotting symbols e = exponential, p = Poisson, u = uniform.]
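The three candidate priors on the number of QTL can be tabulated with a small Python sketch. The function names are hypothetical, and the geometric prior is parameterized by its mean on m = 0, 1, 2, … (an assumption; the slide does not give the parameterization).

```python
import math

def poisson_prior(m, mean):
    """Poisson prior probability of m QTL."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

def geometric_prior(m, mean):
    """Geometric prior on m = 0, 1, 2, ... with the given mean (assumed parameterization)."""
    theta = 1.0 / (1.0 + mean)
    return theta * (1.0 - theta) ** m

def uniform_prior(m, max_m):
    """Uniform prior over m = 0, ..., max_m."""
    return 1.0 / (max_m + 1) if 0 <= m <= max_m else 0.0

if __name__ == "__main__":
    print(" m  geometric  Poisson  uniform")
    for m in range(11):
        print(f"{m:2d}  {geometric_prior(m, 3.0):9.4f}  "
              f"{poisson_prior(m, 3.0):7.4f}  {uniform_prior(m, 10):7.4f}")
```

Printing the three columns side by side makes the slide's point concrete: the priors put very different mass on small versus large m, so the posterior on the number of QTL should be read with the prior in mind.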

epistatic interactions
• model space issues
– 2-QTL interactions only?
– Fisher-Cockerham partition vs. tree-structured?
– general interactions among multiple QTL
• model search issues
– epistasis between significant QTL
• check all possible pairs when QTL included?
• allow higher order epistasis?
– epistasis with non-significant QTL
• whole genome paired with each significant QTL?
• pairs of non-significant QTL?
• Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004)

3. QTL Model Search using MCMC
• construct Markov chain around posterior
– want posterior as stable distribution of Markov chain
– in practice, the chain tends toward stable distribution
• initial values may have low posterior probability
• burn-in period to get chain mixing well
• update QTL model components from full conditionals
– update locus λ given q,H (using Metropolis-Hastings step)
– update genotypes q given λ,µ,y,H (using Gibbs sampler)
– update effects µ given q,y,H (using Gibbs sampler)
– update QTL model H given λ,µ,y,q (using Gibbs or M-H)

$$(\lambda,q,\mu,H) \sim \mathrm{pr}(\lambda,q,\mu,H \mid Y,X)$$

$$(\lambda,q,\mu,H)_1 \to (\lambda,q,\mu,H)_2 \to \cdots \to (\lambda,q,\mu,H)_N$$

Gibbs sampler idea
• two correlated normals (genotypic means in BC)
– could draw samples from both together
– but easier to sample one at a time
• Gibbs sampler:
– sample each from its full conditional
– pick order of sampling at random
– repeat N times

$$\mu_{QQ},\ \mu_{Qq} \sim N(0,1) \quad \text{but} \quad \mathrm{cor}(\mu_{QQ},\mu_{Qq}) = \rho$$

$$\mu_{QQ} \ \text{given}\ \mu_{Qq} \sim N\!\left(\rho\mu_{Qq},\ 1-\rho^2\right)$$

$$\mu_{Qq} \ \text{given}\ \mu_{QQ} \sim N\!\left(\rho\mu_{QQ},\ 1-\rho^2\right)$$
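The Gibbs sampler for two correlated standard normals can be written as a minimal Python sketch. The function name, the starting value (0, 0), and the seed argument are illustrative choices, not part of the slides; each coordinate is drawn from its full conditional N(ρ × other, 1 − ρ²), with the update order picked at random on every sweep.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs samples of two standard normals with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # sd of each full conditional
    mu = [0.0, 0.0]                    # arbitrary starting value
    samples = []
    for _ in range(n_samples):
        order = [0, 1]
        rng.shuffle(order)             # pick order of sampling at random
        for k in order:
            other = mu[1 - k]
            mu[k] = rng.gauss(rho * other, cond_sd)
        samples.append((mu[0], mu[1]))
    return samples

if __name__ == "__main__":
    for n in (50, 200):
        draws = gibbs_bivariate_normal(0.6, n)
        mean1 = sum(a for a, _ in draws) / n
        mean2 = sum(b for _, b in draws) / n
        print(f"N = {n}: sample means ({mean1:.2f}, {mean2:.2f})")
```

Short runs such as N = 50 can give visibly rough summaries; longer runs settle toward the target distribution.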

Gibbs sampler samples: ρ = 0.6

[Figure: for N = 50 samples and N = 200 samples, trace plots of Gibbs: mean 1 and Gibbs: mean 2 against Markov chain index, and scatterplots of Gibbs: mean 2 against Gibbs: mean 1.]

How to sample a locus λ?
• cannot easily sample from locus full conditional
pr(λ | m,q) = pr(λ) pr(q | m,λ) / constant
• constant determined by averaging
– over all possible genotypes
– over entire map
• Gibbs sampler will not work in general
– but can use method based on ratios of probabilities
– Metropolis-Hastings is extension of Gibbs sampler

Metropolis-Hastings idea
• want to study distribution f(λ)
• take Monte Carlo samples
– unless too complicated
• pick an arbitrary distribution g(λ, λ*)
• draw Metropolis-Hastings samples:
– current sample value λ
– propose new value λ* from g(λ, λ*)
– accept new value with prob A
• Gibbs sampler is special case
– g(λ, λ*) = f(λ*)
– always accept new proposal: A = 1

$$A = \min\!\left(1,\ \frac{f(\lambda^*)\,g(\lambda^*,\lambda)}{f(\lambda)\,g(\lambda,\lambda^*)}\right)$$

[Figure: target density f(λ) on 0 to 10 and proposal density g(λ–λ*) on -4 to 4.]
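The Metropolis-Hastings recipe can be sketched in Python for a one-dimensional locus. The assumptions here are illustrative and not from the slides: a symmetric normal random-walk proposal (so the g terms cancel and A = min(1, f(λ*)/f(λ))), a made-up unnormalized target on a map of length 10, and hypothetical function names.

```python
import math
import random

def target(lam):
    """Unnormalized example density f on the interval [0, 10] (made up for illustration)."""
    if lam < 0.0 or lam > 10.0:
        return 0.0
    return math.exp(-0.5 * (lam - 5.0) ** 2)

def metropolis_hastings(f, start, step_sd, n_samples, seed=0):
    """Random-walk Metropolis-Hastings; only ratios of f are needed."""
    rng = random.Random(seed)
    current = start                     # must satisfy f(start) > 0
    samples = []
    for _ in range(n_samples):
        proposal = rng.gauss(current, step_sd)    # symmetric g(lam, lam*)
        ratio = f(proposal) / f(current)          # normalizing constant cancels
        if rng.random() < min(1.0, ratio):
            current = proposal                    # accept
        samples.append(current)                   # on rejection, repeat current value
    return samples

if __name__ == "__main__":
    for step_sd, label in ((0.2, "narrow g"), (2.0, "wide g")):
        draws = metropolis_hastings(target, start=2.0, step_sd=step_sd, n_samples=1000)
        moves = sum(1 for a, b in zip(draws, draws[1:]) if a != b)
        print(f"{label}: fraction of moves accepted = {moves / (len(draws) - 1):.2f}")
```

A narrow proposal accepts often but moves slowly; a wide proposal moves far but is rejected more, which is the trade-off the narrow g versus wide g comparison is about.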

MCMC realization

[Figure, two panels, with the θ axes relabeled λ on the slide. Left: mcmc sequence (0 to 10000) against θ (0 to 10). Right: pr(θ|Y) against θ (2 to 8).]

added twist: occasionally propose from whole domain

Metropolis-Hastings samples

[Figure: four settings, N = 200 samples (narrow g, wide g) and N = 1000 samples (narrow g, wide g). For each setting, one panel of mcmc sequence against θ and one panel of pr(θ|Y) against θ (0 to 8); the θ axes are relabeled λ on the slide.]

sampling across QTL models H

action steps: draw one of three choices
• update QTL model H with probability 1-b(H)-d(H)
– update current model using full conditionals
– sample QTL loci, effects, and genotypes
• add a locus with probability b(H)
– propose a new locus along genome
– innovate new genotypes at locus and phenotype effect
– decide whether to accept the “birth” of new locus
• drop a locus with probability d(H)
– propose dropping one of existing loci
– decide whether to accept the “death” of locus

[Diagram: genome from 0 to L with loci λ1, λ2, …, λm and a proposed new locus λm+1.]

reversible jump MCMC

• consider known genotypes q at 2 known loci λ
– models with 1 or 2 QTL
• M-H step between 1-QTL and 2-QTL models
– model changes dimension (via careful bookkeeping)
– consider mixture over QTL models H

$$nqtl = 1:\quad Y = \beta_0 + \beta_1(q_1) + e$$

$$nqtl = 2:\quad Y = \beta_0 + \beta_1(q_1) + \beta_2(q_2) + e$$

Gibbs sampler with loci indicators
• partition genome into intervals
– at most one QTL per interval
– interval = 1 cM in length
– assume QTL in middle of interval
• use loci to indicate presence/absence of QTL in each interval
– γ = 1 if QTL in interval
– γ = 0 if no QTL
• Gibbs sampler on loci indicators
– see work of Nengjun Yi (and earlier work of Ina Hoeschele)

$$Y = \beta_0 + \gamma_1\beta_1(q_1) + \gamma_2\beta_2(q_2) + e$$

Bayesian shrinkage estimation

• soft loci indicators
– strength of evidence for λj depends on variance of βj
– similar to γ > 0 on grey scale
• include all possible loci in model
– pseudo-markers at 1cM intervals
• Wang et al. (2005 Genetics)
– Shizhong Xu group at U CA Riverside

$$Y = \beta_0 + \beta_1(q_1) + \beta_2(q_2) + \dots + e$$

$$\beta_j(q_j) \sim N(0,\ \sigma_j^2), \qquad \sigma_j^2 \sim \text{inverse-chisquare}$$

geometry of effect samples
(collinearity of QTL yields correlated estimates)

[Figure: scatterplots of sampled effects β2 (b2) against β1 (b1). Panels: a short sequence; first 1000 with m<3.]

4. Model Assessment
• balance model fit against model complexity

                 smaller model        bigger model
model fit        miss key features    fits better
prediction       may be biased        no bias
interpretation   easier               more complicated
parameters       low variance         high variance

• information criteria: penalize L by model size |H|
– compare IC = – 2 log L( H | y ) + penalty( H )
• Bayes factors: balance posterior by prior choice
– compare pr( data y | model H )

QTL Bayes factors
• BF = posterior odds / prior odds
• BF equivalent to BIC
– simple comparison: 1 vs 2 QTL
• same as LOD test
– general comparison of models
– want Bayes factor >> 1
• nqtl = number of QTL
– indexes model complexity
– genetic architecture also important

$$BF_{nqtl,\,nqtl+1} = \frac{\mathrm{pr}(nqtl \mid \mathrm{data})\,/\,\mathrm{pr}(nqtl)}{\mathrm{pr}(nqtl+1 \mid \mathrm{data})\,/\,\mathrm{pr}(nqtl+1)}$$

[Figure: prior probability (0.00 to 0.30) against m = number of QTL (0 to 10); plotting symbols e = exponential, p = Poisson, u = uniform.]
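The Bayes factor between adjacent numbers of QTL can be estimated from MCMC output as a ratio of posterior-to-prior ratios. The Python sketch below is illustrative only: the function names are hypothetical, the posterior pr(nqtl | data) is estimated by the sampled frequency of each nqtl, and the example chain is made up.

```python
import math
from collections import Counter

def poisson_prior(m, mean):
    """Poisson prior on the number of QTL (one possible choice of prior)."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

def bayes_factors(nqtl_samples, prior):
    """BF[m] compares m QTL against m + 1 QTL.

    nqtl_samples: number of QTL at each saved MCMC iteration.
    prior: function giving the prior probability of m QTL.
    """
    counts = Counter(nqtl_samples)
    total = len(nqtl_samples)
    # posterior / prior for every number of QTL that was visited
    ratio = {m: (c / total) / prior(m) for m, c in counts.items()}
    return {m: ratio[m] / ratio[m + 1] for m in sorted(ratio) if m + 1 in ratio}

if __name__ == "__main__":
    # Made-up chain: mostly 2 QTL, sometimes 1 or 3.
    chain = [1] * 20 + [2] * 60 + [3] * 20
    for m, bf in bayes_factors(chain, lambda k: poisson_prior(k, 2.0)).items():
        print(f"BF({m} vs {m + 1}) = {bf:.3f}")
```

Dividing by the prior is what removes the prior's direct influence from the comparison; numbers of QTL that are rarely visited give noisy ratios, which is why error bars on Bayes factors matter.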

Bayes factors to assess models
• Bayes factor: which model best supports the data?
– ratio of posterior odds to prior odds
– ratio of model likelihoods
• equivalent to LR statistic when
– comparing two nested models
– simple hypotheses (e.g. 1 vs 2 QTL)
• Bayes Information Criterion (BIC)
– Schwarz introduced for model selection in general settings
– penalty to balance model size (p = number of parameters)

$$B_{12} = \frac{\mathrm{pr}(\mathrm{model}_1 \mid Y)\,/\,\mathrm{pr}(\mathrm{model}_2 \mid Y)}{\mathrm{pr}(\mathrm{model}_1)\,/\,\mathrm{pr}(\mathrm{model}_2)} = \frac{\mathrm{pr}(Y \mid \mathrm{model}_1)}{\mathrm{pr}(Y \mid \mathrm{model}_2)}$$

$$-2\log(B_{12}) = -2\log(LR) - (p_2 - p_1)\log(n)$$
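The BIC approximation to the Bayes factor can be written as a difference of two BIC values. This Python sketch assumes natural logarithms and the usual form BIC = −2 log L + p log(n); the function names and the example log-likelihoods are hypothetical.

```python
import math

def bic(loglik, p, n):
    """BIC for a model with maximized log-likelihood loglik, p parameters, n observations."""
    return -2.0 * loglik + p * math.log(n)

def approx_minus2_log_bf12(loglik1, p1, loglik2, p2, n):
    """Approximate -2 log(B12) as BIC(model 1) - BIC(model 2).

    This equals -2 log(LR) - (p2 - p1) log(n), with LR = L1 / L2.
    """
    return bic(loglik1, p1, n) - bic(loglik2, p2, n)

if __name__ == "__main__":
    # Made-up example: model 2 has one extra parameter and fits somewhat better.
    value = approx_minus2_log_bf12(loglik1=-150.0, p1=3, loglik2=-146.0, p2=4, n=108)
    print("approximate -2 log(B12):", round(value, 3))
    print("positive values favor model 2, negative values favor model 1")
```

The extra parameter only pays off when the gain in log-likelihood exceeds half of log(n), which is how the penalty balances fit against model size.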

simulations and data studies
• simulated F2 intercross, 8 QTL
– (Stephens, Fisch 1998)
– n=200, heritability = 50%
– detected 3 QTL
• increase to detect all 8
– n=500, heritability to 97%

QTL  chr  loci  effect
1    1    11    –3
2    1    50    –5
3    3    62    +2
4    6    107   –3
5    6    152   +3
6    8    32    –4
7    8    54    +1
8    9    195   +2

[Figure: posterior frequency in % against number of QTL (7 to 13); posterior of loci along the genetic map (ch1 to ch10, 0 to 200).]

loci pattern across genome
• notice which chromosomes have persistent loci
• best pattern found 42% of the time

     Chromosome
m    1  2  3  4  5  6  7  8  9  10   Count of 8000
8    2  0  1  0  0  2  0  2  1  0    3371
9    3  0  1  0  0  2  0  2  1  0    751
7    2  0  1  0  0  2  0  1  1  0    377
9    2  0  1  0  0  2  0  2  1  0    218
9    2  0  1  0  0  3  0  2  1  0    218
9    2  0  1  0  0  2  0  2  2  0    198

B. napus 8-week vernalization whole genome study

• 108 plants from double haploid
– similar genetics to backcross: follow 1 gamete
– parents are Major (biennial) and Stellar (annual)
• 300 markers across genome
– 19 chromosomes
– average 6cM between markers
• median 3.8cM, max 34cM
– 83% markers genotyped
• phenotype is days to flowering
– after 8 weeks of vernalization (cooling)
– Stellar parent requires vernalization to flower
• available in R/bim package
• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)

Markov chain Monte Carlo sequence

burnin (sets up chain)
mcmc sequence

number of QTL
environmental variance
h2 = heritability (genetic/total variance)
LOD = likelihood

[Figure: trace plots of nqtl, envvar, herit, and LOD against burnin sequence (-50000 to 0) and against mcmc sequence (0 to 1e+06).]

MCMC sampled loci

subset of chromosomes N2, N3, N16

points jittered for view
blue lines at markers

note concentration on chromosome N2

includes all models

[Figure: MCMC sampled loci (0 to 120) by chromosome (N2, N3, N16), with marker names labeled along each chromosome.]

Bayesian model assessment

row 1: # QTL
row 2: pattern

col 1: posterior
col 2: Bayes factor
note error bars on bf

evidence suggests 4-5 QTL
N2(2-3), N3, N16

[Figure, four panels. Row 1: QTL posterior against number of QTL (1 to 11); Bayes factor ratios (posterior / prior) against number of QTL, with bands weak, moderate, strong. Row 2: pattern posterior (model posterior against model index, 1 to 13); Bayes factor ratios against model index, with bands weak, moderate, strong.]

Bayesian estimates of loci & effects
model averaging: at least 4 QTL

histogram of loci
blue line is density
red lines at estimates

estimate additive effects (red circles)

grey points sampled from posterior

blue line is cubic spline
dashed line for 2 SD

[Figure: napus8 summaries with pattern 1,1,2,3 and m ≥ 4. Top: loci histogram across N2, N3, N16 (0 to 250). Bottom: additive effects across N2, N3, N16.]

Bayesian model diagnostics
pattern: N2(2),N3,N16
col 1: density
col 2: boxplots by m

environmental variance
σ² = .008, σ = .09

heritability
h² = 52%

LOD = 16 (highly significant)

but note change with m

[Figure, six panels: marginal envvar, m ≥ 4; envvar conditional on number of QTL; marginal heritability, m ≥ 4; heritability conditional on number of QTL; marginal LOD, m ≥ 4; LOD conditional on number of QTL.]

studying diabetes in an F2
• segregating cross of inbred lines
– B6.ob x BTBR.ob → F1 → F2
– selected mice with ob/ob alleles at leptin gene (chr 6)
– measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 2000 Diabetes)
– sacrificed at 14 weeks, tissues preserved
• gene expression data
– Affymetrix microarrays on parental strains, F1
• key tissues: adipose, liver, muscle, β-cells
• novel discoveries of differential expression (Nadler et al. 2000 PNAS; Lan et al. 2002 in review; Ntambi et al. 2002 PNAS)
– RT-PCR on 108 F2 mice liver tissues
• 15 genes, selected as important in diabetes pathways
• SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI,…

Multiple Interval Mapping (QTLCart)
SCD1: multiple QTL plus epistasis!

[Figure, two panels across chr2, chr5, chr9 (0 to 300): LOD (0 to 8); effect (add=blue, dom=red).]

Bayesian model assessment: number of QTL for SCD1

[Figure, two panels against number of QTL (1 to 13): QTL posterior (0.00 to 0.25); Bayes factor ratios (posterior / prior), with bands weak, moderate, strong.]

Bayesian LOD and h2 for SCD1

[Figure, four panels: marginal LOD, m ≥ 1; LOD conditional on number of QTL; marginal heritability, m ≥ 1; heritability conditional on number of QTL.]

Bayesian model assessment: chromosome QTL pattern for SCD1

[Figure, two panels against model index (1 to 15): pattern posterior (model posterior, 0.00 to 0.15); Bayes factor ratios (posterior / prior), with bands weak, moderate. Pattern labels in order of model index: 4:1,2,2*3; 4:1,2*2,3; 5:3*1,2,3; 5:2*1,2,2*3; 5:2*1,2*2,3; 6:3*1,2,2*3; 6:3*1,2*2,3; 5:1,2*2,2*3; 6:4*1,2,3; 6:2*1,2*2,2*3; 2:1,3; 3:2*1,2; 2:1,2; 3:1,2,3; 4:2*1,2,3.]

trans-acting QTL for SCD1
(no epistasis yet: see Yi, Xu, Allison 2003)

dominance?

2-D scan: assumes only 2 QTL!

[Figure: 2-D scan with epistasis LOD peaks and joint LOD peaks.]

sub-peaks can be easily overlooked!

epistatic model fit

Cockerham epistatic effects

marginal heritability by locus

black = total
blue = additive
red = dominant
purple = add x add

marginal LOD by locus

black = total
blue = additive
red = dominant
purple = add x add
green = scanone

Bayesian & classical (HK)

sex-adjusted anova for SCD1
(only 3-way sex interactions significant)

Summary for fit QTL
Method is: imp
Number of observations: 107

Full model result
----------------------------------
Model formula is: y ~ sex * (Q1 + Q2 + Q3 + Q1:Q3 + Q2:Q3)

        df         SS         MS       LOD      %var  Pvalue(Chi2)     Pvalue(F)
Model   29  109.02204  3.7593806  25.79574  67.05143  7.869261e-13  1.592058e-09
Error   77   53.57261  0.6957482
Total  106  162.59465

Drop one QTL at a time ANOVA table:
----------------------------------
                      df  Type III SS     LOD    %var  F value    Pvalue
sex                   15       36.369  12.039  22.368    3.485  0.000154
Chr2@105              12       41.248  13.266  25.369    4.940  5.38e-06
Chr5@67               12       39.771  12.901  24.460    4.764  8.93e-06
Chr9@15               20       59.542  17.365  36.620    4.279  1.87e-06
Chr2@105:Chr9@15       8       33.637  11.322  20.687    6.043  5.11e-06
Chr5@67:Chr9@15        8       30.042  10.344  18.476    5.397  2.12e-05
sex:Chr2@105           6       18.499   6.892  11.377    4.431  0.000669
sex:Chr5@67            6       18.130   6.773  11.151    4.343  0.000793
sex:Chr9@15           10       25.945   9.176  15.957    3.729  0.000413
sex:Chr2@105:Chr9@15   4       17.444   6.549  10.729    6.268  0.000202
sex:Chr5@67:Chr9@15    4       15.728   5.981   9.673    5.652  0.000483

anova for SCD1
(sex terms n.s.)

Model:
y ~ Chr2@105 + Chr5@67 + Chr9@67 + Chr2@80 + Chr9@67^2 + Chr2@105:Chr9@67 + Chr5@67:Chr9@67^2

        Df   Sum Sq  Mean Sq  F value     Pr(>F)
Model    7   69.060   69.060   73.095  < 2.2e-16 ***
Error   99   93.535    0.945
Total  106  162.595   70.005

Single term deletions
                   Df  Sum of Sq      RSS     AIC  F value     Pr(F)
<none>                             93.535  22.992
Chr2@80             1      9.073  102.608  28.225   9.6036  0.002528 **
Chr2@105            1      0.073   93.607  18.402   0.0769  0.782140
Chr5@67             1      0.218   93.753  18.568   0.2307  0.632084
Chr9@67             1      7.156  100.691  26.207   7.5746  0.007042 **
Chr9@67^2           1      0.106   93.641  18.440   0.1123  0.738200
Chr2@105:Chr9@67    1     15.612  109.147  34.836  16.5246  9.64e-05 ***
Chr5@67:Chr9@67^2   1      7.211  100.745  26.265   7.6318  0.006838 **

epistatic interaction plots
(chr 5x9 p=0.007; chr 2x9 p<0.0001)

[Figure: two interaction plots, labeled mostly axd and mostly axa.]

SCD1 on chr2x9 by sex

obesity in CAST/Ei BC onto M16i

• 421 mice (Daniel Pomp)
– (213 male, 208 female)
• 92 microsatellites on 19 chromosomes
– 1214 cM map
• subcutaneous fat pads
– pre-adjusted for sex and dam effects
• Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005) Genetics (in press)

non-epistatic analysis

[Figure, two panels: single QTL LOD profile (LOD score, 0 to 20); multiple QTL Bayes factor profile (Bayes factor, 0 to 300).]

posterior profile of main effects in epistatic analysis

[Figure, two panels: main effects & heritability profile (Main effect, -0.8 to 0.4; Heritability, 0.00 to 0.20); Bayes factor profile (Bayes factor, 0 to 50).]

posterior profile of main effects in epistatic analysis

model selection via Bayes factors for epistatic model

[Figure panels: number of QTL; QTL pattern.]

posterior probability of effects

[Figure: posterior probability (0.0 to 1.0) for each effect, listed in the order shown on the slide.]

main effects:
Chr2(72,85)
Chr13(20,42)
Chr15(1,31)
Chr18(43,71)
Chr1(26,54)
Chr19(15,45)
Chr7(50,75)
Chr14(12,41)

epistatic pairs:
Chr1(26,54)*Chr18(43,71)
Chr2(72,85)*Chr13(20,42)
Chr15(1,31)*Chr19(15,45)
Chr2(72,85)*Chr14(12,41)
Chr7(50,75)*Chr19(15,45)
Chr13(20,42)*Chr15(1,31)

scatterplot estimates of epistatic loci

stronger epistatic effects

Bayesian software for QTLs
• R/bim: Bayesian IM (Satagopan Yandell 1996; Gaffney 2001)*
– www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
– www.r-project.org contributed package to CRAN
– version available within WinQTLCart (statgen.ncsu.edu/qtlcart)
• R/bmqtl: Bayesian Multiple QTL (N Yi, T Mehta & BS Yandell)*
– epistasis, new MCMC algorithms, extensive graphics
– extension of R/bim; due out Fall 2005
• R/qtl (Broman et al. 2003 Bioinformatics)*
– biosun01.biostat.jhsph.edu/~kbroman/software
– www.r-project.org contributed package
• Pseudomarker (Sen, Churchill 2002 Genetics)
– www.jax.org/staff/churchill/labsite/software
• Bayesian QTL / Multimapper
– Sillanpää Arjas (1998)
– www.rni.helsinki.fi/~mjs
• Stephens & Fisch (1998 Biometrics)
• R/bqtl (C Berry, hacuna.ucsd.edu/bqtl)

* Hao Wu, Jackson Labs, provides crucial computing support

R/bim: our software
• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
– R contributed library (www.r-project.org)
• library(bim) is cross-compatible with library(qtl)
– Bayesian module within WinQTLCart
• WinQTLCart output can be processed using R library
• Software history
– initially designed by JM Satagopan (1996)
– major revision and extension by PJ Gaffney (2001)
• whole genome
• multivariate update of effects; long range position updates
• substantial improvements in speed, efficiency
• pre-burnin: initial prior number of QTL very large
– upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003)
– epistasis in progress (H Wu, BS Yandell, N Yi 2004)

many thanks

Jackson Labs
Gary Churchill
Hao Wu

U AL Birmingham
David Allison
Nengjun Yi
Tapan Mehta

Michael Newton
Daniel Sorensen
Daniel Gianola

Liang Li

my students
Jaya Satagopan
Fei Zou
Patrick Gaffney
Chunfang Jin
Elias Chaibub Neto

USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi)

Tom Osborn
David Butruille
Marcio Ferrera
Josh Udahl
Pablo Quijada

Alan Attie
Jonathan Stoehr
Hong Lan
Susie Clee
Jessica Byers