
Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting Quantitative Trait Loci

1. M. Bogdan, J. K. Ghosh and R. W. Doerge, Genetics 2004, 167: 989-999.

2. M. Bogdan and R. W. Doerge, "Mapping multiple interacting QTL by multidimensional genome searches"

Xia- genotype of i-th individual at locus a

Xia = 1/2 - individual is heterozygous at locus a

Xia = -1/2 - individual is homozygous at locus a

For two loci a and b at distance dab = 10 cM: ρ(Xia, Xib) = 0.81
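The correlation between genotypes at two linked loci can be sketched as follows. This assumes Haldane's map function and the ±1/2 backcross coding, for which the correlation equals 1 - 2r with r the recombination fraction. The map function is an assumption (the slides do not say which one was used), and the function names are illustrative only. Under these assumptions 10 cM gives roughly 0.82, close to the 0.81 quoted above.

```python
import math

def haldane_recombination(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane, assumed)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def backcross_genotype_correlation(distance_cm: float) -> float:
    """Correlation of +/-1/2 coded backcross genotypes at two loci.

    With both genotypes equally likely, Cov = (1 - 2r) / 4 and Var = 1/4,
    so the correlation is 1 - 2r.
    """
    return 1.0 - 2.0 * haldane_recombination(distance_cm)

rho_10cm = backcross_genotype_correlation(10.0)
print(round(rho_10cm, 3))
```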

Data for QTL mapping

Y1,...,Yn - vector of trait values for n backcross individuals

X=[Xij], 1 ≤ i ≤ n, 1 ≤ j ≤ m - genotypes of m markers

Standard methods of QTL mapping

One QTL model

(1)  $Y_i = \mu + \beta Q_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

$Q_i \in \{-1/2, 1/2\}$ - QTL genotype

1. Search over markers - fit model (1) at each marker and choose markers for which the likelihood exceeds a preestablished threshold value as candidate QTL locations.
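A minimal sketch of such a marker search, assuming the one-QTL regression model with normal errors and ±1/2 genotype coding. The statistic here is the log-likelihood ratio against the no-QTL model, which for a Gaussian model equals (n/2) log(RSS0/RSS1). The simulated data (unlinked markers, effect size, seed) and all function names are hypothetical and serve only to make the example runnable.

```python
import math
import random

def marker_scan(y, X):
    """Log-likelihood ratio of the one-QTL model vs. the null model, per marker."""
    n = len(y)
    mean_y = sum(y) / n
    rss0 = sum((v - mean_y) ** 2 for v in y)  # null model: intercept only
    stats = []
    for j in range(len(X[0])):
        x = [row[j] for row in X]
        mean_x = sum(x) / n
        sxx = sum((a - mean_x) ** 2 for a in x)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        beta = sxy / sxx if sxx > 0 else 0.0
        rss1 = rss0 - beta * sxy  # RSS after regressing y on marker j
        stats.append(0.5 * n * math.log(rss0 / rss1))
    return stats

def simulate(n=200, m=5, qtl_marker=2, effect=1.5, seed=0):
    """Hypothetical data: independent markers, one marker carries the QTL."""
    rng = random.Random(seed)
    X = [[rng.choice((-0.5, 0.5)) for _ in range(m)] for _ in range(n)]
    y = [effect * row[qtl_marker] + rng.gauss(0.0, 1.0) for row in X]
    return y, X

y, X = simulate()
scan = marker_scan(y, X)
best_marker = max(range(len(scan)), key=scan.__getitem__)
print(best_marker, [round(s, 2) for s in scan])
```

Markers whose statistic exceeds a chosen threshold would be reported as candidate locations; the threshold itself is not specified here.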

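Interval mapping (Lander and Botstein, 1989)

• Consider a fixed position between markers

$I_i$ - state of flanking markers, $I_i \in \{(-\tfrac12, -\tfrac12), (-\tfrac12, \tfrac12), (\tfrac12, -\tfrac12), (\tfrac12, \tfrac12)\}$

$p_i = P(Q_i = \tfrac12 \mid I_i)$ - easy to compute

$Y_i = \mu + \beta Q_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

$f(Y_i \mid I_i) = p_i \, N(\mu + \tfrac12 \beta, \sigma^2) + (1 - p_i) \, N(\mu - \tfrac12 \beta, \sigma^2)$

$L(Y \mid I) = \prod_{i=1}^{n} f(Y_i \mid I_i)$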

1. Estimate μ, β, and σ by the EM algorithm and compute the corresponding likelihood.

2. Repeat this procedure for a new possible QTL location.

3. Plot the resulting likelihoods as a function of the assumed QTL position.
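Step 1 can be sketched for a single fixed position as below. The sketch assumes the trait follows a two-component normal mixture with means μ ± β/2, common σ, and known mixing probabilities p_i = P(Q_i = 1/2 | flanking markers). The function names, starting values, iteration count and simulated data are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def normal_pdf(y, mean, sigma):
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_interval_mapping(y, p, n_iter=200):
    """EM estimates of (mu, beta, sigma) and the log-likelihood at one position."""
    n = len(y)
    mu = sum(y) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / n)
    beta = 0.0
    for _ in range(n_iter):
        # E-step: posterior probability that Q_i = 1/2
        w = []
        for yi, pi in zip(y, p):
            hi = pi * normal_pdf(yi, mu + beta / 2, sigma)
            lo = (1 - pi) * normal_pdf(yi, mu - beta / 2, sigma)
            w.append(hi / (hi + lo))
        # M-step: least squares with E[Q_i] = w_i - 1/2 and E[Q_i^2] = 1/4
        q = [wi - 0.5 for wi in w]
        sq, sy = sum(q), sum(y)
        syq = sum(yi * qi for yi, qi in zip(y, q))
        det = n * n / 4.0 - sq * sq
        mu = (sy * n / 4.0 - sq * syq) / det
        beta = (n * syq - sq * sy) / det
        sigma = math.sqrt(sum(
            wi * (yi - mu - beta / 2) ** 2 + (1 - wi) * (yi - mu + beta / 2) ** 2
            for yi, wi in zip(y, w)) / n)
    loglik = sum(
        math.log(pi * normal_pdf(yi, mu + beta / 2, sigma)
                 + (1 - pi) * normal_pdf(yi, mu - beta / 2, sigma))
        for yi, pi in zip(y, p))
    return mu, beta, sigma, loglik

# Hypothetical data: informative flanking markers, true mu=2, beta=1.5, sigma=0.5
rng = random.Random(1)
p = [rng.choice((0.1, 0.9)) for _ in range(300)]
y = [2.0 + 1.5 * (0.5 if rng.random() < pi else -0.5) + rng.gauss(0.0, 0.5)
     for pi in p]
mu_hat, beta_hat, sigma_hat, loglik = em_interval_mapping(y, p)
print(round(mu_hat, 2), round(beta_hat, 2), round(sigma_hat, 2))
```

Repeating the call with the p_i recomputed for each candidate position gives the likelihood profile described in steps 2 and 3.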

• Problems with interval mapping

a) Not able to distinguish closely linked QTL

b) Not able to detect epistatic QTL (involved only in interactions)

• Solution

Estimate the location of several QTL at once using multiple regression model (Kao et al. 1999)

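$Y_i = \mu + \sum_{j=1}^{p} \beta_j Q_{ij} + \sum_{1 \le j < l \le m}^{r} \gamma_{jl} Q_{ij} Q_{il} + \varepsilon_i$

Problem: estimation of the number of additive and interaction terms

$Y_i = \mu + \sum_{j=1}^{p} \beta_j X_{i k_j} + \sum_{j=1}^{r} \gamma_j X_{i u_j} X_{i h_j} + \varepsilon_i$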

Xij - genotype of j-th marker

average number of markers - (200,400)

Bayesian Information Criterion

• Choose the model which maximizes

log L - (1/2) k log n

L – likelihood of the data for a given model

k – number of parameters in the model

n – sample size
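As a small illustration of the criterion, the sketch below assumes a linear regression with normal errors, for which the maximized log-likelihood can be written in terms of the residual sum of squares (RSS). The function names and the RSS values in the example are hypothetical.

```python
import math

def gaussian_log_likelihood(rss: float, n: int) -> float:
    """Maximized log-likelihood of a normal linear model with residual sum rss."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)

def bic_score(rss: float, n: int, k: int) -> float:
    """log L - (1/2) k log n; the model with the larger score is preferred."""
    return gaussian_log_likelihood(rss, n) - 0.5 * k * math.log(n)

n = 200
# Hypothetical fits: the extra regressor barely reduces RSS in the first case
small_model = bic_score(rss=100.0, n=n, k=3)
weak_extra_term = bic_score(rss=99.0, n=n, k=4)
strong_extra_term = bic_score(rss=90.0, n=n, k=4)
print(small_model > weak_extra_term, strong_extra_term > small_model)
```

The extra term is kept only when the gain in log-likelihood exceeds (1/2) log n.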

Broman (1997) and Broman and Speed (2002) – BIC overestimates QTL number

How to modify BIC?

Mi – i-th linear model (specifies which markers are included in regression)

θ = (μ, β1,..., βp, γ1,..., γr, σ) – vector of parameters for Mi

fi(θ) – density of the prior distribution for θ

π(i) – prior probability of Mi

L(Y|θ) – likelihood of the data given the vector of parameters θ

mi(Y) – likelihood of the data given the model Mi

P(Mi|Y) ∝ π(i) mi(Y)

BIC neglects π(i) and uses an asymptotic approximation of mi(Y):

$m_i(Y) = \int L(Y \mid \theta) f_i(\theta) \, d\theta$

$\log m_i(Y) \approx \log L(Y \mid \hat\theta) - \tfrac12 (p + r + 2) \log n$

Neglecting π(i) = assigning the same prior probability to all models = assigning high prior probability to the event that there are many regressors

Example: 200 markers

200 models with one additive term

$\binom{200}{2} = 19\,900$ models with one interaction or with two additive terms

$\binom{200}{100} \approx 9.05 \times 10^{58}$ models with 100 additive terms
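The model counts in this example are binomial coefficients and can be checked directly. The snippet below uses only the standard library; the variable names are illustrative.

```python
import math

n_markers = 200

one_term_models = math.comb(n_markers, 1)       # one additive term
two_term_models = math.comb(n_markers, 2)       # one interaction, or two additive terms
hundred_term_models = math.comb(n_markers, 100) # 100 additive terms

print(one_term_models, two_term_models, f"{hundred_term_models:.3e}")
```

Under a uniform prior over models, almost all of the prior mass therefore sits on models with many regressors.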

Idea: supplement BIC with a more realistic prior distribution π

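$S(i) = \log L(Y \mid \hat\theta) - \tfrac12 (p(i) + r(i)) \log n + \log \pi(i)$

$\log L(Y \mid \hat\theta) = C(n) - \tfrac{n}{2} \log RSS$, where RSS is the residual sum of squares from regression

$\tilde S(i) = -n \log RSS - (p(i) + r(i)) \log n + 2 \log \pi(i)$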

Choice of π (George and McCulloch, 1993)

M – number of markers

$N = \frac{M(M-1)}{2}$ - number of potential interactions

α - the probability that i-th additive term appears in the model

ν - the probability that j-th interaction term appears in the model

$\pi(M) = \alpha^{p} \nu^{r} (1-\alpha)^{M-p} (1-\nu)^{N-r}$

where, in π(M), M denotes a model with p additive terms and r interactions

We choose $\alpha = 1/l$ and $\nu = 1/u$

$\log \pi(M) = C(M, N, l, u) - p \log(l-1) - r \log(u-1)$

$\tilde S(i) = -n \log RSS - (p + r) \log n - 2p \log(l-1) - 2r \log(u-1)$

Prior distribution on the number of additive terms: p ~ Binomial(M, α)

Prior distribution on the number of interactions: r ~ Binomial(N, ν)

Choice of l and u should depend on the prior knowledge on the number of QTL.

$E(p) = \frac{M}{l}, \quad E(r) = \frac{N}{u}$

Our choice: for sample size 200, the probability of wrongly detecting a QTL (when there are none) ≈ 0.05

We keep E(p) and E(r) equal to 2.2

The choice is supported by a theoretical bound on the type I error based on the Bonferroni inequality.

$\tilde S(i) = -n \log RSS - (p + r) \log n - 2p \log(M/2.2) - 2r \log(N/2.2)$
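A sketch of how the modified criterion could be evaluated for a fitted model. It assumes the form -n log RSS - (p + r) log n - 2p log(M/2.2) - 2r log(N/2.2) with N = M(M-1)/2 and the convention that larger values are better; the sign convention, the function name and the example numbers are assumptions made for illustration.

```python
import math

def modified_bic(rss: float, n: int, p: int, r: int,
                 n_markers: int, expected_terms: float = 2.2) -> float:
    """Modified BIC score (larger is better, by the convention assumed here).

    p: number of additive terms, r: number of interaction terms,
    n_markers: M, the number of available markers.
    """
    n_pairs = n_markers * (n_markers - 1) // 2  # N, potential interactions
    return (-n * math.log(rss)
            - (p + r) * math.log(n)
            - 2 * p * math.log(n_markers / expected_terms)
            - 2 * r * math.log(n_pairs / expected_terms))

# Penalty paid for one extra additive term or one extra interaction,
# with n = 200 individuals and M = 200 markers (RSS held fixed)
base = modified_bic(rss=100.0, n=200, p=0, r=0, n_markers=200)
additive_penalty = base - modified_bic(rss=100.0, n=200, p=1, r=0, n_markers=200)
interaction_penalty = base - modified_bic(rss=100.0, n=200, p=0, r=1, n_markers=200)
print(round(additive_penalty, 2), round(interaction_penalty, 2))
```

Because N is much larger than M, an interaction term has to improve the fit more than an additive term to be accepted.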

Additional penalty similar to the Risk Inflation Criterion of Foster and George (2k log t, where t is the total number of available regressors) and to the modification of BIC proposed by Siegmund (2004).

Search over 12 chromosomes, markers spaced every 10 cM

n     h²      p    corr   extr   r    corr   extr
200   0       0    0.95   0.03   0    -      0.02
500   0       0    0.99   0.01   0    -      0
200   0.2     1    1      0.03   0    0      0.02
200   0.195   0    -      0.01   1    0.95   0.04

n     h²      p    corr   extr   r    corr   extr
200   0.55    0    -      0.02   3    2.88   0.08
200   0.5     7    5.06   0.26   0    -      0.09
500   0.5     7    6.99   0.14   0    -      0.03
200   0.43    12   2.39   0.31   0    -      0.03
500   0.43    12   9.68   0.47   0    -      0.02
200   0.71    12   9.53   0.75   0    -      0.02
200   0.53    2    1.95   0.04   5    2.11   0.11
500   0.53    2    2      0.03   5    3.47   0.08

• The criterion adjusts well to the number of available markers

• For n = 200 the criterion detects almost all additive QTL with individual h² = 0.13 and interactions with h² = 0.2.

• For n = 500 the criterion detects almost all additive QTL with individual h² = 0.06 and interactions with h² = 0.12.

Bound for the type I error

$S_1$ - the maximum of the criterion over all one-dimensional models

$S_0 = \log L(Y \mid \hat\mu, \hat\sigma)$ - the value of the criterion for the null model

$D$ - the number of terms chosen by our criterion

$P(D > 0) \approx P(S_1 > S_0)$

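$S_{M_i}$ - the value of the criterion for a given one-dimensional model $M_i$

$S_{M_i} > S_0$ if

$2 \log \frac{L(Y \mid \hat\theta_{M_i})}{L(Y \mid \hat\theta_0)} > \log n + 2 \, (\log(l-1) \text{ or } \log(u-1))$

(log(l-1) for an additive term, log(u-1) for an interaction)

$P(S_{M_i} > S_0) \approx 2 \, P\left(Z > \sqrt{\log n + 2 \, (\log(l-1) \text{ or } \log(u-1))}\right)$, where $Z \sim N(0, 1)$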

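By the Bonferroni inequality and the bound

$P(Z > x) \le \frac{1}{x \sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$

$P(S_1 > S_0) \le \frac{M}{l-1} C(l, n) + \frac{N}{u-1} C(u, n)$, where $C(l, n) = \frac{2}{\sqrt{2\pi n \, (\log n + 2 \log(l-1))}}$

For $l = M/2.2$, $u = N/2.2$:

$P(S_1 > S_0) \approx \frac{4.4}{\sqrt{2\pi n}} \left( \frac{1}{\sqrt{\log n + 2 \log(l-1)}} + \frac{1}{\sqrt{\log n + 2 \log(u-1)}} \right)$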

For n=200 and typical values of M this yields values in the range between 0.057 and 0.08.
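The quoted range can be reproduced approximately with the sketch below. It assumes the bound has the form 4.4/sqrt(2πn) · (1/sqrt(log n + 2 log(l-1)) + 1/sqrt(log n + 2 log(u-1))) with l = M/2.2, u = N/2.2 and N = M(M-1)/2; this form is a reading of the slide, and the function name and the marker counts tried are illustrative.

```python
import math

def type_one_error_bound(n: int, n_markers: int,
                         expected_terms: float = 2.2) -> float:
    """Approximate Bonferroni-type bound on P(S_1 > S_0), as assumed above."""
    n_pairs = n_markers * (n_markers - 1) / 2  # N, potential interactions
    l = n_markers / expected_terms
    u = n_pairs / expected_terms
    additive = 1.0 / math.sqrt(math.log(n) + 2.0 * math.log(l - 1.0))
    interaction = 1.0 / math.sqrt(math.log(n) + 2.0 * math.log(u - 1.0))
    return (2.0 * expected_terms / math.sqrt(2.0 * math.pi * n)
            * (additive + interaction))

bound_200_markers = type_one_error_bound(n=200, n_markers=200)
for m in (12, 24, 100, 200):
    print(m, round(type_one_error_bound(200, m), 4))
```

The bound decreases slowly as the number of markers grows, since the penalty terms grow with log M and log N.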
