Marker Based Infinitesimal Model for Quantitative Trait Analysis Shizhong Xu Department of Botany and Plant Sciences University of California Riverside, CA 92521

Page 1

Marker Based Infinitesimal Model for

Quantitative Trait Analysis

Shizhong Xu

Department of Botany and Plant Sciences

University of California

Riverside, CA 92521

Page 2

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses


Page 4

Quantitative Trait

Page 5

Quantitative Genetics Model

Phenotype = Genotype + Environment

Page 6

Infinitesimal Model

• Infinite number of genes

• Infinitely small effect of each gene

• Effect of an individual gene is not recognizable

• Collective effect of all genes is studied using pedigree information (genetic relationship)

• Best linear unbiased prediction (BLUP)

Page 7

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses

Page 8

Marker Based Infinitesimal Model

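$$y_j = \beta + \sum_{k=1}^{p} Z_{jk}\gamma_k + \varepsilon_j$$

$$y_j = \beta + \sum_{k=1}^{\infty} Z_{jk}\gamma_k + \varepsilon_j$$

$$y_j = \beta + \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$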

Page 9

Different from Longitudinal Data Analysis

$$y_j = \beta + \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$

$$y_j(t) = \beta(t) + \varepsilon_j(t)$$

Page 10

Numerical Integration

Page 11

Bin Effect Model

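$$y_j = \beta + \sum_{k=1}^{\infty} Z_{jk}\gamma_k + \varepsilon_j$$

$$y_j = \beta + \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$

$$y_j \approx \beta + \sum_{k=1}^{m} Z_j(\lambda_k)\,\gamma(\lambda_k)\,\Delta_k + \varepsilon_j$$

$$y_j = \beta + \sum_{k=1}^{m} Z_{jk}\gamma_k + \varepsilon_j$$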
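The step from the integral to the bin sum is ordinary numerical integration. The sketch below is only an illustration under stated assumptions: the effect function `gamma(lam) = lam`, the genotype function `z_j` with a single breakpoint at 0.8, and the unit genome length are all hypothetical and not taken from the slides. It compares the midpoint bin sum with the exact integral.

```python
def z_j(lam):
    """Hypothetical genotype function: 1 before a breakpoint at 0.8, 0 after."""
    return 1.0 if lam < 0.8 else 0.0


def gamma(lam):
    """Hypothetical effect function, chosen only for illustration."""
    return lam


def bin_sum(m, length=1.0):
    """Midpoint rule: sum over m equal bins of Z_j(lam_k) * gamma(lam_k) * Delta_k."""
    delta = length / m
    total = 0.0
    for k in range(m):
        lam_k = (k + 0.5) * delta  # midpoint of bin k
        total += z_j(lam_k) * gamma(lam_k) * delta
    return total


exact = 0.8 ** 2 / 2       # integral of lam from 0 to 0.8
fine = bin_sum(10)         # breakpoint falls on a bin boundary
coarse = bin_sum(4)        # breakpoint falls inside a bin
print(exact, fine, coarse)
```

With ten bins the breakpoint sits on a bin boundary and the sum matches the integral; with four bins the breakpoint falls inside a bin and the approximation degrades, which is why bin size matters.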

Page 12

Bin Effects

[Figure: dense markers along the genome grouped into consecutive bins]

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$
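A minimal sketch of the marker-average bin genotype, assuming equal-sized bins and a toy 0/1 genotype vector (the helper name `bin_genotypes` and the data are hypothetical):

```python
def bin_genotypes(markers, markers_per_bin):
    """Average consecutive marker genotypes Z_j(h) into bin genotypes Z_jk."""
    bins = []
    for start in range(0, len(markers), markers_per_bin):
        chunk = markers[start:start + markers_per_bin]
        bins.append(sum(chunk) / len(chunk))  # (1 / p_k) * sum_h Z_j(h)
    return bins


# One individual, eight dense markers (toy data), two bins of p_k = 4 markers.
z_j = [1, 1, 1, 1, 1, 0, 0, 0]
z_bins = bin_genotypes(z_j, 4)
print(z_bins)  # [1.0, 0.25]
```

The second bin contains a recombination breakpoint, so its genotype is fractional rather than 0 or 1.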

Page 13

Recombination Breakpoint Data

Marker:

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$

Breakpoint:

$$Z_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$

Page 14

$$Z_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$

$$Z_{jk} = \frac{8}{10}\times 1 + \frac{2}{10}\times 0 = 0.8$$
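The worked value 0.8 can be reproduced from breakpoint data. This is a sketch under assumptions: the segment representation `(start, end, genotype)` and the function name are hypothetical, and positions are in arbitrary units.

```python
def bin_genotype_from_breakpoints(segments, bin_start, bin_end):
    """Length-weighted average genotype of one bin.

    segments: list of (start, end, genotype) tuples describing where the
    genotype is constant between recombination breakpoints.
    """
    total = 0.0
    for seg_start, seg_end, genotype in segments:
        overlap = min(seg_end, bin_end) - max(seg_start, bin_start)
        if overlap > 0:
            total += overlap * genotype
    return total / (bin_end - bin_start)


# A bin of length 10 with a breakpoint at 8: genotype 1 on [0, 8), 0 on [8, 10).
segments = [(0.0, 8.0, 1), (8.0, 10.0, 0)]
z_jk = bin_genotype_from_breakpoints(segments, 0.0, 10.0)
print(z_jk)
```

Only the breakpoint positions are needed, so the integral reduces to a short sum over segments regardless of marker density.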

Page 16

What Does a Bin Effect Represent?

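$$y_j = \beta + \sum_{k=1}^{m} Z_{jk}\gamma_k + \varepsilon_j$$

$$Z_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$

$$\gamma_k = \int_0^{\Delta_k}\gamma(\lambda)\,d\lambda = \Delta_k\int_0^{\Delta_k}\frac{1}{\Delta_k}\,\gamma(\lambda)\,d\lambda$$

where $\Delta_k$ is the size of bin $k$ and $\lambda$ is a uniform variable over the bin, so the bin effect is the bin size times the average effect within the bin.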

Page 17

Assumptions of the Infinitesimal Model

• High linkage disequilibrium within a bin

• Homogeneous genetic effect within a bin

Page 18

High Linkage Disequilibrium

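$$Z_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$

$\Delta_k$: number of crossovers, inversely related to linkage disequilibrium

$$\lim_{\Delta_k \to 0}\operatorname{var}(Z_{jk}) = \frac{1}{2}, \text{ high linkage disequilibrium } (F_2)$$

$$\lim_{\Delta_k \to \infty}\operatorname{var}(Z_{jk}) = 0, \text{ low linkage disequilibrium}$$

Larger $\operatorname{var}(Z_{jk})$ means higher power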

Page 19

Range of Var(Z)

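$$\lim_{\Delta_k \to 0}\operatorname{var}(Z_{jk}) = \lim_{\Delta_k \to 0}\frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to 0}\frac{1}{2}e^{-2\Delta_k} = \frac{1}{2}$$

$$\lim_{\Delta_k \to \infty}\operatorname{var}(Z_{jk}) = \lim_{\Delta_k \to \infty}\frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to \infty}\frac{1}{2\Delta_k} = 0$$

$$0 \le \operatorname{var}(Z_{jk}) \le 0.5$$

Choose $\Delta_k$ so that $\operatorname{var}(Z_{jk})$ is as close to 0.5 as possible, but with the number of bins small enough to be handled by a program for a given sample size.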
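To see how quickly the variance of a bin genotype falls as the bin grows, the closed form can be evaluated directly. This sketch assumes the bin size is expressed in Morgans (so 1 cM is 0.01) and that the expression $(2\Delta_k - 1 + e^{-2\Delta_k})/(4\Delta_k^2)$ applies as written; the function name is hypothetical.

```python
import math


def var_bin_genotype(delta_morgans):
    """var(Z_jk) for a bin of the given size (assumed to be in Morgans)."""
    d = delta_morgans
    return (2.0 * d - 1.0 + math.exp(-2.0 * d)) / (4.0 * d ** 2)


# Bin sizes in cM, converted to Morgans before evaluation.
for delta_cm in (1, 5, 20, 100, 1000):
    print(delta_cm, round(var_bin_genotype(delta_cm / 100.0), 4))
```

The values start near 0.5 for small bins and decay toward 0 for very large bins, matching the two limits.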

Page 20

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses

Page 21

Adaptive Model Relaxes the Two

Assumptions

• High linkage disequilibrium within a bin

- prevent var(Z) from being zero

• Homogeneous genetic effect within a bin

- make all effects positive

Page 22

Redefine the Bin Size by the Number

of Markers Within a Bin

$$y_j = \beta + \sum_{k=1}^{m} Z_{jk}\gamma_k + \varepsilon_j$$

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$

$$\gamma_k = \sum_{h=1}^{p_k}\gamma_k(h)$$

$p_k$: number of markers in bin $k$

Page 23

Weighted Average Effect of a Bin

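Unweighted:

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h); \qquad \gamma_k = \sum_{h=1}^{p_k}\gamma_k(h)$$

Weighted:

$$Z^*_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} w(h)\,Z_j(h); \qquad \gamma^*_k = \sum_{h=1}^{p_k} w^{-1}(h)\,\gamma_k(h)$$

$$y_j = \beta + \sum_{k=1}^{m} Z^*_{jk}\gamma^*_k + \varepsilon_j$$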

Page 24

Weight System

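Define

$$c_k = \frac{1}{p_k}\sum_{h=1}^{p_k}|\hat{b}_h| = \operatorname{mean}(|\hat{b}_h|)$$

where $\hat{b}_h$ is the least squares estimate of marker $h$ within bin $k$.

The weight for marker $h$ is defined as

$$w_h = c_k^{-1}\hat{b}_h = \frac{p_k\,\hat{b}_h}{\sum_{h'=1}^{p_k}|\hat{b}_{h'}|} = \frac{\hat{b}_h}{\operatorname{mean}(|\hat{b}_h|)}$$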
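The weight system can be sketched in a few lines. The least squares estimates below are made-up numbers and `marker_weights` is a hypothetical helper name; only the formula $w_h = \hat{b}_h / \operatorname{mean}(|\hat{b}_h|)$ comes from the slide.

```python
def marker_weights(b_hat):
    """Weights w_h = b_hat_h / c_k, with c_k the mean absolute estimate in the bin."""
    c_k = sum(abs(b) for b in b_hat) / len(b_hat)
    if c_k == 0:
        raise ValueError("all estimates are zero; weights are undefined")
    return [b / c_k for b in b_hat]


# Toy least squares estimates for the p_k = 3 markers of one bin.
b_hat = [0.5, -1.0, 1.5]
w = marker_weights(b_hat)
print(w)  # [0.5, -1.0, 1.5] because c_k = 1.0 for these numbers
```

By construction the weights keep the sign of each estimate and their mean absolute value is 1 within every bin.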

Page 25

Weighted Var(Z*) > 0

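$$\operatorname{var}(Z^*_{jk}) = \frac{1}{p_k^2}\left[\sum_{h=1}^{p_k}\operatorname{var}\big(Z^*_j(h)\big) + 2\sum_{h<l}\operatorname{cov}\big(Z^*_j(h), Z^*_j(l)\big)\right]$$

$$= \frac{1}{p_k^2}\left[\frac{1}{2}\sum_{h=1}^{p_k} w_h^2 + 2\sum_{h<l}\frac{1}{2}\,w_h w_l\,(1 - 2r_{hl})\right]$$

$$= \frac{1}{2p_k^2}\sum_{h=1}^{p_k} w_h^2 > 0, \text{ when there is no linkage disequilibrium } (1 - 2r_{hl} = 0)$$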
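The no-linkage-disequilibrium case can be checked by brute force. The sketch assumes independent markers with F2 genotypes coded -1, 0, 1 at frequencies 1/4, 1/2, 1/4 (a coding that gives a per-marker variance of 1/2 but is not stated on the slides), and compares full enumeration with the closed form.

```python
from itertools import product

# Assumed F2 coding and genotype frequencies (variance 1/2 per marker).
F2_GENOTYPES = ((-1, 0.25), (0, 0.5), (1, 0.25))


def var_weighted_bin_formula(w):
    """Closed form without linkage disequilibrium: sum(w_h^2) / (2 * p_k^2)."""
    p_k = len(w)
    return sum(wh ** 2 for wh in w) / (2 * p_k ** 2)


def var_weighted_bin_enumerated(w):
    """Exact variance of Z*_jk by enumerating all genotype combinations."""
    p_k = len(w)
    mean = 0.0
    second_moment = 0.0
    for combo in product(F2_GENOTYPES, repeat=p_k):
        prob = 1.0
        z_star = 0.0
        for (z, pr), wh in zip(combo, w):
            prob *= pr          # independence means no linkage disequilibrium
            z_star += wh * z
        z_star /= p_k
        mean += prob * z_star
        second_moment += prob * z_star ** 2
    return second_moment - mean ** 2


w = [0.5, -1.0, 1.5]
print(var_weighted_bin_formula(w), var_weighted_bin_enumerated(w))
```

Both numbers agree and are strictly positive whenever at least one weight is non-zero.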

Page 26

Homogenization of Marker

Effects Within Bin

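$$\gamma^*_k = \sum_{h=1}^{p_k} w^{-1}(h)\,\gamma_k(h) = c_k\sum_{h=1}^{p_k}\frac{\gamma_k(h)}{\hat{b}_h} \approx c_k\,p_k = \sum_{h=1}^{p_k}|\hat{b}_h|$$

where $\gamma_k(h)/\hat{b}_h \approx 1$ (a constant).

$$\gamma^*_k = \sum_{h=1}^{p_k}|\hat{b}_h| > 0 \text{ as long as one } \hat{b}_h \ne 0$$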
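A toy bin with two markers of opposite effects shows why homogenization matters. The sketch assumes the least squares estimates equal the true effects exactly and that no estimate is zero (a zero estimate would make the inverse weight undefined); all names and numbers are illustrative.

```python
def marker_weights(b_hat):
    """w_h = b_hat_h / mean(|b_hat|) within one bin."""
    c_k = sum(abs(b) for b in b_hat) / len(b_hat)
    return [b / c_k for b in b_hat]


def weighted_bin_genotype(z, w):
    """Z*_jk = (1 / p_k) * sum_h w_h * Z_j(h)."""
    return sum(wh * zh for wh, zh in zip(w, z)) / len(z)


def weighted_bin_effect(gamma, w):
    """gamma*_k = sum_h gamma(h) / w_h (requires every w_h to be non-zero)."""
    return sum(g / wh for g, wh in zip(gamma, w))


gamma = [1.0, -1.0]          # hypothetical true marker effects, opposite signs
b_hat = list(gamma)          # assumption: estimates equal the true effects
w = marker_weights(b_hat)

unweighted_effect = sum(gamma)                  # effects cancel
gamma_star = weighted_bin_effect(gamma, w)      # effects add up

z = [1, -1]                                     # one individual's genotypes
true_value = sum(zh * g for zh, g in zip(z, gamma))
predicted = weighted_bin_genotype(z, w) * gamma_star
print(unweighted_effect, gamma_star, true_value, predicted)
```

In the unweighted bin the two effects cancel to zero and the bin carries no signal, while the weighted bin reproduces the individual's genetic value in this idealized case.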

Page 27

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses

Page 28

Measurement of Prediction (Cross Validation)

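$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}(y_j - \hat{y}_j)^2, \text{ Mean Squared Error}$$

$$\mathrm{MSY} = \frac{1}{n}\sum_{j=1}^{n}(y_j - \bar{y})^2, \text{ Phenotypic Variance}$$

$$R^2 = \frac{\mathrm{MSY} - \mathrm{MSE}}{\mathrm{MSY}}, \text{ Squared Correlation}$$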
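The three prediction measures are straightforward to compute once held-out predictions are available. In this sketch the phenotypes and predictions are toy numbers, and `y_hat` is assumed to hold each individual's prediction from the fold in which that individual was left out.

```python
def prediction_summary(y, y_hat):
    """Return (MSE, MSY, R-square) for observed y and cross-validated predictions."""
    n = len(y)
    y_bar = sum(y) / n
    mse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat)) / n
    msy = sum((obs - y_bar) ** 2 for obs in y) / n
    return mse, msy, (msy - mse) / msy


y = [1.0, 2.0, 3.0, 4.0]          # toy phenotypes
y_hat = [1.5, 2.0, 2.5, 4.0]      # toy held-out predictions
mse, msy, r_square = prediction_summary(y, y_hat)
print(mse, msy, r_square)
```

An R-square at or below zero means the model predicts no better than the phenotypic mean.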

Page 29

Simulation Experiment

• Genome size = 2,500 cM

• Number of markers = 120,000

• Marker interval = 0.02 cM

• Cross validation (MSE)

• Design I = 20 QTL

• Design II = Clustered polygenic model

• Design III = Polygenic model

• Design IV = Design I with genome size 2,500 × 100 cM (low linkage disequilibrium)

Page 30

True QTL Effect

[Figure: parameter setting (true values) of the QTL effects; x-axis: Position (cM), 0 to 2500; y-axis: Effect]

Page 31

Estimated Bin Effects

[Figure: estimated bin effects (y-axis: Effect) plotted against position (x-axis: Position (cM), 0 to 2500) in twelve panels, where Δ is the bin size, m the number of bins, and p the number of markers per bin]

(a) Δ = 1 cM, m = 2400, p = 50

(b) Δ = 2 cM, m = 1200, p = 100

(c) Δ = 5 cM, m = 480, p = 250

(d) Δ = 10 cM, m = 240, p = 500

(e) Δ = 20 cM, m = 120, p = 1000

(f) Δ = 40 cM, m = 60, p = 2000

(g) Δ = 100 cM, m = 24, p = 5000

(h) Δ = 150 cM, m = 16, p = 7500

(i) Δ = 300 cM, m = 8, p = 15000

(j) Δ = 600 cM, m = 4, p = 30000

(k) Δ = 1200 cM, m = 2, p = 60000

(l) Δ = 2400 cM, m = 1, p = 120000
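The panel settings follow from simple arithmetic. The sketch assumes a total map length of 2,400 cM, which is the value implied by Δ × m in every panel (120,000 markers at 0.02 cM) rather than the 2,500 cM quoted for the genome size; the function name is hypothetical.

```python
TOTAL_CM = 2400        # assumption: implied by bin size times number of bins
N_MARKERS = 120_000    # number of markers from the simulation experiment


def bin_layout(delta_cm):
    """Return (m, p): number of bins and markers per bin for a bin size in cM."""
    m = TOTAL_CM // delta_cm
    p = N_MARKERS // m
    return m, p


for delta_cm in (1, 2, 5, 10, 20, 40, 100, 150, 300, 600, 1200, 2400):
    m, p = bin_layout(delta_cm)
    print(f"delta = {delta_cm} cM, m = {m}, p = {p}")
```

Each printed line should reproduce the m and p values listed for the corresponding panel.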

Page 32

True and Estimated QTL Effect

[Figure: two panels with Position (cM), 0 to 2500, on the x-axis and Effect on the y-axis. One panel shows the parameter setting (true values); the other shows the estimate for Δ = 20 cM, m = 120, p = 1000]

Page 33

[Figure: four panels of mean squared error (y-axis) against bin size in cM (x-axis, 0 to 100), one curve per sample size: n = 200, 300, 400, 500, 1000. Panels: (a) $\sigma^2 = 10$, $h^2 = 0.638$; (b) $\sigma^2 = 20$, $h^2 = 0.581$; (c) $\sigma^2 = 50$, $h^2 = 0.457$; (d) $\sigma^2 = 100$, $h^2 = 0.337$]

Figure 1. Mean squared error expressed as a function of bin size for Design I. The mean squared errors were obtained from 100 replicated simulations. The overall proportion of the phenotypic variance contributed by the 20 simulated QTL was calculated using $h^2 = 64.41/(64.41 + 26.53 + \sigma^2)$. Each panel contains the results for five different sample sizes ($n$). The phenotypic variance of the simulated trait is indicated by the light horizontal line in each panel (each panel represents one of the four different scenarios).
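The panel labels can be reproduced from the formula in the caption. In this sketch the constant names are hypothetical, and the 26.53 term is taken from the caption as given since the slides do not explain it.

```python
QTL_VARIANCE = 64.41   # numerator of the caption formula
OTHER_TERM = 26.53     # second denominator term, taken from the caption as given


def proportion_explained(residual_variance):
    """h^2 = 64.41 / (64.41 + 26.53 + sigma^2) for a given residual variance."""
    return QTL_VARIANCE / (QTL_VARIANCE + OTHER_TERM + residual_variance)


for sigma2 in (10, 20, 50, 100):
    print(sigma2, round(proportion_explained(sigma2), 3))
```

The four residual variances give 0.638, 0.581, 0.457 and 0.337, in agreement with panels (a) to (d).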

Page 34

[Figure: mean squared error (y-axis, 0 to 150) against bin size in log10 cM (x-axis, 4.0 to 5.5)]

Figure 6. Mean squared error for the simulated data under design IV (low linkage disequilibrium) plotted against the bin size. The sample size of the simulated population was $n = 500$. The residual error variance was $\sigma^2 = 20$, corresponding to $h^2 = 0.777$. The filled circles indicate the MSE under the infinitesimal model while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the simulated trait (89.71).

Page 35

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses

Page 36

Rice Tiller Number (Yu et al. 2011)

• Number of recombinant inbred lines: 210

• Number of SNP: 270,820

• Number of natural bins: 1619

• Number of artificial bins: vary from small to large

• Method: Empirical Bayes (eBayes)

• Cross validation: MSE and R-square

Page 37

Yu et al. 2011, PLoS One 6(3) e17595

Page 38

Figure 5. The MSE (curve in the left panel) and the R-square (curve in the right panel) of the rice tiller number trait analysis, expressed as a function of bin size (artificial bins). The black dashed horizontal line in the left panel is the phenotypic variance. The red dashed horizontal line in the left panel is the MSE of the natural bin (without breakpoints within a bin) analysis. The red dashed horizontal line in the right panel is the R-square of the natural bin analysis. R-square increased from 0.42 to 0.55.

Page 39

Beef Cattle Data Analysis

• Trait = carcass weight

• Number of beef = 922

• Number of SNP markers = 40809

• Number of chromosomes = 29

• Methods = unweighted and weighted

Page 40

[Figure: mean squared error (y-axis, 400 to 700) against bin size in log10 bp (x-axis, 5.0 to 8.5)]

Figure 7. Mean squared error for the carcass trait of beef cattle plotted against the bin size. The filled circles indicate the MSE under the infinitesimal model while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the trait (670.36). The blue horizontal line along with the two dotted lines represents the MSE and the standard deviation of the MSE in the situation where the bin size was one (one marker per bin). The sample size was $n = 921$ and the number of SNP markers was $p = 40809$. The bin size was defined as log10 bp. For example, the largest bin size $\log_{10}\text{bp} = 8.5$ means that the bin contains $10^{8.5} \approx 3.16 \times 10^{8}$ base pairs.

Page 44

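[Figure: MSE (y-axis, 400 to 700) against bin size in log10 bp (x-axis, 5.0 to 8.5), with numeric values annotated on the plot]

Annotated values, first series: 0.294, 0.3, 0.298, 0.281, 0.305, 0.326, 0.324, 0.324, 0.333, 0.306, 0.204, 0.122

Annotated values, second series: -0.002, 0.001, 0.001, -0.001, 0.026, 0.017, 0.024, 0.026, 0.039, 0.085, 0.075, 0.087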

Marker Analysis p = 40809

MSE = 600

R^2 = (670-600)/670 = 0.09

0.09

Page 45

Table 1. Mean squared error (MSE) and R-square values obtained from the 10-fold cross validation analysis for the beef carcass trait using five competing models and the proposed bin model.

Model          MSE [2]    R-square
eBayes         648.11     0.0332
G-Blup         632.46     0.0565
BayesB-1       655.59     0.0220
BayesB-2 [1]   658.19     0.0182
Lasso          603.75     0.0994
Bin model      447.10     0.3330

[1] The Pi value for BayesB-2 is set at 0.95.

[2] The phenotypic variance of the beef carcass trait is 670.36. An MSE smaller than 670.36 indicates that the model has predictive ability.
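The R-square column follows from the MSE column and the phenotypic variance through $R^2 = (\mathrm{MSY} - \mathrm{MSE})/\mathrm{MSY}$. A short check, using only the numbers printed in Table 1 (the variable names are hypothetical):

```python
PHENOTYPIC_VARIANCE = 670.36  # MSY of the beef carcass trait, from the table footnote

# (MSE, reported R-square) per model, copied from Table 1.
TABLE_1 = {
    "eBayes": (648.11, 0.0332),
    "G-Blup": (632.46, 0.0565),
    "BayesB-1": (655.59, 0.0220),
    "BayesB-2": (658.19, 0.0182),
    "Lasso": (603.75, 0.0994),
    "Bin model": (447.10, 0.3330),
}

recomputed = {
    model: (PHENOTYPIC_VARIANCE - mse) / PHENOTYPIC_VARIANCE
    for model, (mse, _) in TABLE_1.items()
}

for model, (mse, reported) in TABLE_1.items():
    print(f"{model:10s} MSE = {mse:7.2f}  R-square = {recomputed[model]:.4f} (reported {reported:.4f})")
```

All six recomputed values agree with the reported column to four decimal places.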

Page 46

Outline

• Quantitative trait and the infinitesimal model

• Infinitesimal model using marker information

• Adaptive infinitesimal model

• Simulation studies

• Rice and beef cattle data analyses

Page 47

Acknowledgements

• Zhiqiu Hu (postdoc)

• Qifa Zhang (rice data)

• Zhiquan Wang (beef data)

• USDA Grant 2007-02784

Page 48

Thank You !