GEE & GLMM in GWAS

Page 1: GEE & GLMM in GWAS

Association Study: Binomial Case

GEE & GLMM

Jinseob Kim

GSPH, SNU

July 2, 2014

Jinseob Kim (GSPH, SNU) Association Study: Binomial Case July 2, 2014 1 / 45

Page 2: GEE & GLMM in GWAS

Contents

1 Correlated = Not Independent
   Concept
   Example

2 GEE & GLMM Basic
   Basic Linear Regression
   GEE
   GLMM
   Comparison

3 GEE & GLMM in GWAS
   Concepts of GWAS
   Genetic Correlation
   Use GEE & GLMM

4 Conclusion

Page 3: GEE & GLMM in GWAS

Objective

1 Understand correlated data structures.

2 Understand the concepts of GEE and GLMM, what they have in common, and how they differ.

3 Understand how GEE and GLMM are applied in practice in GWAS.

4 Recognize that GEE and GLMM cannot be used in the binomial case.

Page 4: GEE & GLMM in GWAS

Correlated = Not Independent

Page 5: GEE & GLMM in GWAS

Correlated = Not Independent Concept

iid?

$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, or equivalently $\varepsilon \sim N(0, \sigma^2 I_n)$

Independent

Identically distributed

$\varepsilon_i \sim N(0, \sigma_i^2)$

Independent

Not identically distributed

The observations do not come from the same population!

(To be covered next time.)

Page 6: GEE & GLMM in GWAS

Correlated = Not Independent Concept

Variance-covariance matrix

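$$
\mathrm{var}(\varepsilon) =
\begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
= \sigma^2 I_n
$$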

That is, if even one of the covariances (off-diagonal entries) is nonzero, the data are correlated.

Equivalently, if even one of the correlation coefficients is nonzero, the data are correlated.

Page 7: GEE & GLMM in GWAS

Correlated = Not Independent Example

Repeated Measure

Page 8: GEE & GLMM in GWAS

Correlated = Not Independent Example

Clustered/Multilevel study

Page 9: GEE & GLMM in GWAS

Correlated = Not Independent Example

Serial Correlation

Page 10: GEE & GLMM in GWAS

Correlated = Not Independent Example

Familial structure in Genetic Study

Page 11: GEE & GLMM in GWAS

Correlated = Not Independent Example

Genetic correlation

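$$
\begin{pmatrix}
1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \rho_{n3} & \cdots & 1
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & 0.5 & 0.25 & \cdots & 0 \\
0.5 & 1 & 1 & \cdots & 0.5 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0.5 & 0 & \cdots & 1
\end{pmatrix}
$$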

Page 12: GEE & GLMM in GWAS

GEE & GLMM Basic

Page 13: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

Reminder

β estimation in linear regression

1 Ordinary Least Squares (OLS): semi-parametric

2 Maximum Likelihood Estimator (MLE): parametric

Page 14: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

Least Squares

Minimize the sum of squared residuals: no normality assumption on y is needed.

Figure. OLS Fitting

Page 15: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

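Likelihood?

Likelihood vs. probability

Discrete: likelihood = probability. E.g., the probability of rolling a 1 with a die is 1/6.

Continuous: likelihood ≠ probability. E.g., when one number is drawn from the interval 0 to 1, the probability that it is exactly 0.7 is 0.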
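
The distinction can be sketched in a few lines. This assumes a fair six-sided die for the discrete case and a Uniform(0, 1) draw for the continuous case; the uniform distribution and the function names are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def die_pmf(face: int) -> Fraction:
    """Probability mass of a fair six-sided die (discrete case)."""
    return Fraction(1, 6) if 1 <= face <= 6 else Fraction(0)

def uniform_pdf(x: float, low: float = 0.0, high: float = 1.0) -> float:
    """Density of Uniform(low, high): a likelihood value, not a probability."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def uniform_prob(a: float, b: float, low: float = 0.0, high: float = 1.0) -> float:
    """P(a <= X <= b) for X ~ Uniform(low, high): the area under the density."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)

print(die_pmf(1))              # discrete: likelihood and probability coincide
print(uniform_pdf(0.7))        # continuous: density (likelihood) at 0.7
print(uniform_prob(0.7, 0.7))  # continuous: probability of exactly 0.7
```

In the continuous case the density at 0.7 is positive while the probability of the single point is zero, which is why the likelihood, not the probability, is what gets maximized.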

Figure. Likelihood

Page 16: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

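Maximum likelihood estimator (MLE)

Maximum likelihood estimator: suppose $\varepsilon_1, \dots, \varepsilon_n$ are mutually independent.

1 Write down the likelihood function of each observation.

2 Multiply all the likelihoods to get the likelihood of the whole data (valid because they are independent).

3 Find the β that maximizes the likelihood.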
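
The three steps can be sketched for the simplest possible case. This assumes a normal model with known standard deviation in which β is just the mean (an intercept-only regression); the data values, the grid search, and the function names are made up for illustration.

```python
import math

def normal_loglik(y, mu, sigma=1.0):
    """Steps 1 and 2: per-observation log-likelihoods, summed.

    Summing logs is the same as taking the log of the product of likelihoods.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (yi - mu) ** 2 / (2 * sigma ** 2)
        for yi in y
    )

def mle_mean_grid(y, lo, hi, steps=10001):
    """Step 3: pick the candidate mean with the largest log-likelihood."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda mu: normal_loglik(y, mu))

y = [4.1, 5.3, 4.8, 6.0, 5.2]          # made-up observations
beta_hat = mle_mean_grid(y, 0.0, 10.0)
print(beta_hat, sum(y) / len(y))        # grid maximizer vs. sample mean
```

For this model the maximizer coincides with the sample mean, which is also the least-squares answer; real software uses numerical optimization rather than a grid.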

Page 17: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

MLE: maximum likelihood estimator

Maximize the likelihood of the observed data: a distributional assumption on y or ε is required.

Page 18: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

Logistic function: MLE

Figure. Fitting Logistic Function

Page 19: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

LRT? Wald? Score?

Likelihood ratio test vs. Wald test vs. score test

1 All three are ways of judging statistical significance.

2 Comparing likelihoods vs. comparing β values vs. comparing slopes.

Page 20: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

Comparison

Figure. Comparison

Page 21: GEE & GLMM in GWAS

GEE & GLMM Basic Basic Linear Regression

AIC

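Let L be the likelihood of the fitted model.

1 AIC = −2 × log(L) + 2 × k

2 k: number of estimated parameters, in practice the number of explanatory variables (sex, age, salary, ...)

3 The smaller the AIC, the better the model.

We prefer a model with high likelihood, but too many explanatory variables incur a penalty.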
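
A minimal sketch of the trade-off, using made-up log-likelihood values for two hypothetical models (the numbers and variable names are illustrative only):

```python
def aic(log_likelihood: float, k: int) -> float:
    """AIC = -2 * log(L) + 2 * k; smaller is better."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical fits: the larger model gains little likelihood for three extra variables.
small = aic(log_likelihood=-120.0, k=3)
large = aic(log_likelihood=-119.5, k=6)
print(small, large)
```

Here the larger model has the higher likelihood but the worse (larger) AIC, because the penalty for three extra variables outweighs the small gain in fit.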

Page 22: GEE & GLMM in GWAS

GEE & GLMM Basic GEE

OLS, GLS, GEE

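$Y = X\beta + \varepsilon$ (1)

$\mathrm{var}(\varepsilon) = \sigma^2 I_n$: the errors are independent, so plain OLS applies.

$\mathrm{var}(\varepsilon) = \sigma^2 \Phi$: what if the errors are not independent?

$GY = GX\beta + G\varepsilon$ (2)

Multiply both sides by a suitable matrix $G$, chosen so that

$\mathrm{var}(G\varepsilon) = \sigma^2 I_n$

Apply OLS to the transformed model (2), then undo the transformation with the inverse of $G$: Generalized Least Squares (GLS).

The binomial/Poisson version of GLS is the Generalized Estimating Equation (GEE).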
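
The choice of $G$ can be made explicit as follows. This is a sketch assuming $\Phi$ is known and positive definite and $G$ is invertible; the symbol $\hat\beta_{GLS}$ is notation added here, not from the slides.

```latex
\begin{aligned}
\operatorname{var}(G\varepsilon)
  &= G\,\operatorname{var}(\varepsilon)\,G^{\top}
   = \sigma^2\, G \Phi G^{\top}
   = \sigma^2 I_n
  \quad\Longrightarrow\quad G^{\top} G = \Phi^{-1}, \\
\hat\beta_{GLS}
  &= \left[(GX)^{\top}(GX)\right]^{-1} (GX)^{\top} (GY)
   = \left(X^{\top}\Phi^{-1}X\right)^{-1} X^{\top}\Phi^{-1} Y .
\end{aligned}
```

The final expression depends on $G$ only through $\Phi^{-1}$, so any $G$ satisfying the first line gives the same estimate.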

Page 23: GEE & GLMM in GWAS

GEE & GLMM Basic GEE

Ex: Repeated Measure

Cluster= individual, Option= exchangeable

Page 24: GEE & GLMM in GWAS

GEE & GLMM Basic GEE

Serial or Unstructured

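$$
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \rho_{n3} & \cdots & 1
\end{pmatrix}
$$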
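
The working correlation structures differ mainly in how many parameters they need. A small sketch that builds the exchangeable and serial (AR(1)) matrices and counts the free parameters of the unstructured one; the function names are illustrative, not from any package.

```python
def exchangeable(n: int, rho: float):
    """Exchangeable structure: every off-diagonal entry equals rho (1 parameter)."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n: int, rho: float):
    """Serial (AR(1)) structure: entry (i, j) is rho ** |i - j| (1 parameter)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def n_unstructured_params(n: int) -> int:
    """Unstructured: one free correlation per pair of positions."""
    return n * (n - 1) // 2

for row in ar1(4, 0.5):
    print(row)
print(n_unstructured_params(4))
```

The serial structure lets correlation decay with distance using a single parameter, while the unstructured matrix needs n(n−1)/2 parameters.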

Page 25: GEE & GLMM in GWAS

GEE & GLMM Basic GLMM

Fixed effect VS Random effect

Fixed effect

Estimate β.

Test: β = 0?

Random effect

Give up on estimating each β (e.g., 50 hospitals, 3,461 people).

Treat β as uncertain: it cannot be known exactly. (The effect of each individual hospital cannot be known; each person's polygenic effect cannot be known exactly.)

Test: Var(β) = 0? (How much do the hospital effects differ from one another?)

49 variables → 1.

Page 26: GEE & GLMM in GWAS

GEE & GLMM Basic GLMM

Linear Mixed Model

$Y = X\beta + Z\gamma + \varepsilon$ (3)

Z: dummy variables for cluster.

$\mathrm{var}(\varepsilon) = \sigma_e^2 I_n$: independence is assumed!

$\mathrm{var}(\beta) = 0$, $\mathrm{var}(\gamma) = \sigma_u^2 A$

$\sigma^2 = \sigma_u^2 + \sigma_e^2$ (4)

The binomial version of this is the GLMM.

Page 27: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

Comparison

In common

1 Both are used when the independence assumption is violated.

Differences

1 GEE: semi-parametric, GLMM: parametric

2 Inference: population vs. individual

Page 28: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

Inference

Page 29: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

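Difference in philosophy

GEE: it is enough to adjust for clustering; the cluster effects themselves are of no interest.

GLMM: give up on estimating a β for each cluster, but still ask how much the clusters matter. This is summarized by a single number ($\sigma_u^2$), and the per-cluster β values can be recovered approximately (BLUP).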

Page 30: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

GEE example: Continuous

running glm to get initial regression estimate
(Intercept)         age         sex         BMI
-64.2956645   0.1811694 -42.3958662   8.5256257

gee(formula = TG ~ age + sex + BMI, id = FID, data = a, corstr = "exchangeable")

               Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept) -67.2665582 35.8624272 -1.8756834  35.9094269 -1.8732284
age           0.1751885  0.3340099  0.5245007   0.3996143  0.4383938
sex         -42.2905294 11.3716707 -3.7189372   8.3038131 -5.0929048
BMI           8.6744524  1.2930220  6.7086657   1.4041520  6.1777161

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.2582559 0.2582559 0.2582559
[2,] 0.2582559 1.0000000 0.2582559 0.2582559
[3,] 0.2582559 0.2582559 1.0000000 0.2582559
[4,] 0.2582559 0.2582559 0.2582559 1.0000000
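
The z columns in the output above are Wald statistics: the estimate divided by its standard error. The sketch below reproduces them for the sex row and converts them to two-sided p-values. The p-values assume a standard normal reference distribution and are not printed in the original output; the function names are illustrative.

```python
import math

def wald_z(estimate: float, se: float) -> float:
    """Wald statistic: estimate divided by its standard error."""
    return estimate / se

def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values copied from the sex row of the gee output above.
estimate, naive_se, robust_se = -42.2905294, 11.3716707, 8.3038131
z_naive = wald_z(estimate, naive_se)
z_robust = wald_z(estimate, robust_se)
print(round(z_naive, 4), round(z_robust, 4))
print(two_sided_p(z_naive), two_sided_p(z_robust))
```

The coefficient is the same in both columns; only the standard error changes, so the naive and robust analyses can disagree on significance even when the point estimate is identical.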

Page 31: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

GLMM example: Continuous

lmer(formula = TG ~ age + sex + BMI + (1 | FID), data = a)

              Estimate Std. Error    t value
(Intercept) -65.222107 35.8720093 -1.8181894
age           0.109564  0.3318413  0.3301699
sex         -41.942137 11.3684264 -3.6893529
BMI           8.648601  1.2917159  6.6954362

Groups   Name        Std.Dev.
FID      (Intercept) 39.356
Residual             72.007

39.356^2/(39.356^2+72.007^2)=0.23
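
The ratio above is the intraclass correlation: the between-family variance $\sigma_u^2$ divided by the total variance $\sigma_u^2 + \sigma_e^2$ from equation (4). A minimal sketch of the same arithmetic (the function name is illustrative):

```python
def icc(sd_between: float, sd_within: float) -> float:
    """Intraclass correlation: between-cluster variance over total variance."""
    var_u = sd_between ** 2
    var_e = sd_within ** 2
    return var_u / (var_u + var_e)

# Standard deviations copied from the lmer output above.
rho = icc(sd_between=39.356, sd_within=72.007)
print(round(rho, 2))   # 0.23
```

This value is close to the exchangeable working correlation of about 0.26 estimated by GEE for the same trait, as expected when both models describe the same within-family correlation.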

Page 32: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

GEE example: Binomial

running glm to get initial regression estimate
 (Intercept)          age          sex          BMI
-5.457458529  0.009749659 -1.385819506  0.157734298

gee(formula = hyperTG ~ age + sex + BMI, id = FID, data = a,
    family = binomial, corstr = "exchangeable")

                Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept) -5.453486897 1.10811194 -4.9214224  1.14198243 -4.7754561
age          0.008754136 0.00997040  0.8780125  0.01087413  0.8050421
sex         -1.337114934 0.53428456 -2.5026270  0.52621253 -2.5410169
BMI          0.158988089 0.03867076  4.1113256  0.04248749  3.7419975

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.1942491 0.1942491 0.1942491
[2,] 0.1942491 1.0000000 0.1942491 0.1942491
[3,] 0.1942491 0.1942491 1.0000000 0.1942491
[4,] 0.1942491 0.1942491 0.1942491 1.0000000
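
With family = binomial the estimates are on the log-odds scale, so they are usually reported as odds ratios. The sketch below converts two coefficients from the output above and builds a Wald-type 95% interval from the robust standard error. The 1.96 multiplier and the function names are assumptions made for illustration; the slides do not show this step.

```python
import math

def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in the covariate."""
    return math.exp(beta)

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Wald-type confidence interval, computed on the log-odds scale."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

# Values copied from the gee output above (BMI and sex rows).
bmi_beta, bmi_robust_se = 0.158988089, 0.04248749
sex_beta = -1.337114934

or_bmi = odds_ratio(bmi_beta)
or_sex = odds_ratio(sex_beta)
ci_low, ci_high = odds_ratio_ci(bmi_beta, bmi_robust_se)
print(round(or_bmi, 3), round(or_sex, 3))
print(round(ci_low, 3), round(ci_high, 3))
```

Because these come from GEE, they are population-averaged odds ratios; the corresponding GLMM coefficients are cluster-specific and are not directly comparable in size.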

Page 33: GEE & GLMM in GWAS

GEE & GLMM Basic Comparison

GLMM example: Binomial

glmer(formula = hyperTG ~ age + sex + BMI + (1 | FID), data = a,
    family = binomial)

               Estimate Std. Error    z value     Pr(>|z|)
(Intercept) -6.65451749 1.48227814 -4.4893852 7.142904e-06
age          0.01052907 0.01206682  0.8725635 3.829010e-01
sex         -1.48506920 0.60773433 -2.4436158 1.454090e-02
BMI          0.19131619 0.05022612  3.8090977 1.394749e-04

Groups Name        Std.Dev.
FID    (Intercept) 1.1163

Page 34: GEE & GLMM in GWAS

GEE & GLMM in GWAS

Page 35: GEE & GLMM in GWAS

GEE & GLMM in GWAS Concepts of GWAS

Issues

Concepts

Samples < SNPs (3,461 vs. 500,000)

The regression is repeated more than 500,000 times!

Strict p-value threshold ($\le 5 \times 10^{-8}$)

Issues

Computational burden: speed!

Complex correlation structure

Approximation techniques
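
The strict threshold follows from the number of tests. A minimal sketch assuming a Bonferroni correction at a family-wise level of 0.05; the slides do not say how the threshold was derived, and the conventional $5 \times 10^{-8}$ corresponds to roughly one million independent tests rather than to the 500,000 SNPs counted here.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

per_snp = bonferroni_threshold(0.05, 500_000)         # 500,000 SNPs as on the slide
genome_wide = bonferroni_threshold(0.05, 1_000_000)   # about one million independent tests
print(per_snp, genome_wide)
```

Either way, each SNP must clear a threshold several orders of magnitude below 0.05, which is why the speed of each individual regression matters so much.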

Page 36: GEE & GLMM in GWAS

GEE & GLMM in GWAS Genetic Correlation

GCM

Genetic Correlation Matrix

Correlation structure: already known (from the family structure, or estimated from the data).

It is complex and follows no regular pattern.

Computation...

Page 37: GEE & GLMM in GWAS

GEE & GLMM in GWAS Genetic Correlation

Genetic Correlation Matrix: Example

           R1E1I00051 R1E1I00241 R1E1I00251 R1E1I00040 R1E1I00230 R1R1I00251
R1E1I00051       1.00        0.5        0.0       0.25       0.25        0.5
R1E1I00241       0.50        1.0        0.0       0.50       0.50        0.0
R1E1I00251       0.00        0.0        1.0       0.50       0.50        0.0
R1E1I00040       0.25        0.5        0.5       1.00       0.50        0.0
R1E1I00230       0.25        0.5        0.5       0.50       1.00        0.0
R1R1I00251       0.50        0.0        0.0       0.00       0.00        1.0
R1E1I00060       0.00        0.0        0.0       0.00       0.00        0.0
R1E1I00070       0.00        0.0        0.0       0.00       0.00        0.0
R1E1I00081       0.00        0.0        0.0       0.00       0.00        0.0
R1E1I00091       0.00        0.0        0.0       0.00       0.00        0.0

           R1E1I00060 R1E1I00070 R1E1I00081 R1E1I00091
R1E1I00051        0.0        0.0        0.0        0.0
R1E1I00241        0.0        0.0        0.0        0.0
R1E1I00251        0.0        0.0        0.0        0.0
R1E1I00040        0.0        0.0        0.0        0.0
R1E1I00230        0.0        0.0        0.0        0.0
R1R1I00251        0.0        0.0        0.0        0.0
R1E1I00060        1.0        0.5        0.5        0.5
R1E1I00070        0.5        1.0        0.5        0.5
R1E1I00081        0.5        0.5        1.0        0.5
R1E1I00091        0.5        0.5        0.5        1.0

Page 38: GEE & GLMM in GWAS

GEE & GLMM in GWAS Use GEE & GLMM

Cautions

There are no clusters: each individual is its own cluster.

Compute and store the GCM in advance.

Page 39: GEE & GLMM in GWAS

GEE & GLMM in GWAS Use GEE & GLMM

GWAS example: GEE-continuous

running glm to get initial regression estimate
(Intercept)         age         sex         BMI   genecount
-63.0665181   0.1441694 -39.0676606   7.8280011  19.8533844

gee(formula = TG ~ age + sex + BMI + genecount, id = ID, data = a,
    R = kin, corstr = "fixed")

               Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept) -63.0665181 35.4400639 -1.7795261  31.4650444 -2.0043359
age           0.1441694  0.3376881  0.4269307   0.3558302  0.4051635
sex         -39.0676606 11.2797186 -3.4635315   7.2549380 -5.3849751
BMI           7.8280011  1.2914399  6.0614519   1.3054881  5.9962258
genecount    19.8533844  6.2315166  3.1859635   5.8534124  3.3917624

Working Correlation
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.5  0.5
[2,]  0.5  1.0  0.5  0.5
[3,]  0.5  0.5  1.0  0.0
[4,]  0.5  0.5  0.0  1.0

Page 40: GEE & GLMM in GWAS

GEE & GLMM in GWAS Use GEE & GLMM

GWAS example: GEE-binomial

running glm to get initial regression estimate
 (Intercept)          age          sex          BMI    genecount
-5.482288956  0.009646267 -1.348154797  0.151819412  0.192508455

gee(formula = hyperTG ~ age + sex + BMI + genecount, id = ID,
    data = a, R = kin, family = binomial, corstr = "fixed")

                Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept) -5.482288957 1.10060632 -4.9811535  1.07919392 -5.0799850
age          0.009646267 0.01004073  0.9607134  0.01027862  0.9384789
sex         -1.348154801 0.53873048 -2.5024662  0.52100579 -2.5876004
BMI          0.151819412 0.03861585  3.9315312  0.04199752  3.6149615
genecount    0.192508455 0.18683677  1.0303564  0.19281252  0.9984230

Working Correlation
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.5  0.5
[2,]  0.5  1.0  0.5  0.5
[3,]  0.5  0.5  1.0  0.0
[4,]  0.5  0.5  0.0  1.0

Page 41: GEE & GLMM in GWAS

GEE & GLMM in GWAS Use GEE & GLMM

GWAS example: GLMM

Cannot be implemented with the lme4 package.

Possible with the hglm package.

Implemented in GenABEL as the polygenic_hglm function.

Page 42: GEE & GLMM in GWAS

GEE & GLMM in GWAS Use GEE & GLMM

Limitation

Both GEE & GLMM

Slow. Family structure combined with a binomial trait is the worst case.

Continuous: overcome thanks to advances in approximation methods: FASTA, GRAMMAR, GEMMA, ...

Binomial: no good approximation is available, so the speed problem cannot be overcome.

Page 43: GEE & GLMM in GWAS

Conclusion

Page 44: GEE & GLMM in GWAS

Conclusion

Summary

1 Both are used when the independence assumption is violated.

2 GEE and GLMM differ in interpretation.

3 GLMM carries the heavier computational burden.

4 In GWAS, the correlation structure is computed in advance: the kinship matrix.

5 Binomial traits in GWAS: solving this would be Nature-level work.

At present, only TDT-based statistics are available for binomial traits, and they suffer from sample-size issues.

Page 45: GEE & GLMM in GWAS

Conclusion

END

Email: [email protected]

(02)880-2473

H.P: 010-9192-5385