Associating Genomic Variations with Phenotypes: Model comparison, rare variants, and analysis...

Posted on 18-Dec-2015


1

Associating Genomic Variations with Phenotypes

Model comparison, rare variants, and analysis pipeline

Qunyuan Zhang

Division of Statistical Genomics & Genome Institute

Washington University School of Medicine

2

Data & Question

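Relationship between X and Y?

$$
\begin{array}{c|c|cccc}
i & Y & X_1 & X_2 & \cdots & X_m \\
\hline
1 & y_1 & x_{11} & x_{12} & \cdots & x_{1m} \\
2 & y_2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
n & y_n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
$$

Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …

Phenotypes (Y): quantitative, categorical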

3

Linkage & Association

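Association: (Y, X)

Linkage: (Y, Q), where Q is unobservable

$$
\begin{array}{c|c|cccc}
i & Y & X_1 & Q & X_2 & \cdots \\
\hline
1 & y_1 & x_{11} & q_1 & x_{12} & \cdots \\
2 & y_2 & x_{21} & q_2 & x_{22} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \\
n & y_n & x_{n1} & q_n & x_{n2} & \cdots
\end{array}
$$

Y: phenotype; X: genotypes; Q: putative QTL, located between markers X_1 and X_2, with recombination fraction r_1 between X_1 and Q and r_2 between Q and X_2.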

4

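A Fixed-effect Mixture Model for Linkage

Commonly used in plant genetics

Cross: P1 × P2 → F1 → F2

Markers: SNP A, Q, SNP B, with recombination fraction r_1 between SNP A and Q and r_2 between Q and SNP B

$$
f(y_i) = \sum_{j=1}^{3} P(Q_j \mid X_i, r)\,\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\left[-\frac{1}{2}\left(\frac{y_i-\mu_j}{\sigma_j}\right)^{2}\right]
$$

$$
L(Y) = \prod_{i=1}^{n} f(y_i)
$$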
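The sketch below makes the mixture likelihood concrete. It is a minimal, standard-library-only Python example that evaluates log L(Y) for given QTL genotype probabilities. It assumes that the probabilities P(Q_j | X_i, r) have already been derived from the flanking markers (SNP A, SNP B) and the recombination fractions. That derivation, and the maximization over μ_j and σ_j, are not shown. The function names are illustrative only.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_loglik(y, q_probs, mu, sigma):
    """log L(Y) = sum_i log f(y_i), with
    f(y_i) = sum_j P(Q_j | X_i, r) * N(y_i; mu_j, sigma_j^2).

    y       : phenotypes, one per subject
    q_probs : per subject, [P(Q_1|X_i,r), P(Q_2|X_i,r), P(Q_3|X_i,r)] (assumed given)
    mu      : means of the three QTL genotype classes
    sigma   : standard deviations of the three classes
    """
    total = 0.0
    for yi, probs in zip(y, q_probs):
        fi = sum(p * normal_pdf(yi, m, s) for p, m, s in zip(probs, mu, sigma))
        total += math.log(fi)
    return total
```

This gives a quick sanity check. When the markers are fully informative, each subject has one probability equal to 1, and the mixture reduces to an ordinary normal likelihood.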

5

A Variance-component Model For Linkage

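Commonly used in human genetics

Markers: SNP A, Q, SNP B, with recombination fraction r_1 between SNP A and Q and r_2 between Q and SNP B

$$
L(Y) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\exp\!\left[-\frac{1}{2}(Y-\mu)^{T}V^{-1}(Y-\mu)\right]
$$

$$
V = \mathrm{Cov}(Y) = \Delta_Q\sigma_Q^{2} + \Delta_g\sigma_g^{2} + I\sigma_e^{2}
$$

where $\Delta_g$ is the background IBD matrix, $\Delta_Q$ is the QTL IBD matrix, and $I$ is the diagonal unit (identity) matrix.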
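The next sketch evaluates the variance-component log-likelihood for fixed parameter values. It uses a hand-written Cholesky factorization so that only the standard library is needed. It assumes that Δ_Q and Δ_g are supplied as plain nested lists and that μ is a scalar mean. Estimating σ_Q², σ_g² and σ_e² (for example, by maximizing this function) is not shown.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def vc_loglik(y, mu, delta_q, delta_g, s2_q, s2_g, s2_e):
    """log L(Y) for Y ~ MVN(mu, V), with V = Delta_Q*s2_q + Delta_g*s2_g + I*s2_e."""
    n = len(y)
    V = [[delta_q[i][j] * s2_q + delta_g[i][j] * s2_g + (s2_e if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(V)
    # Forward substitution: solve L z = (y - mu); then (y-mu)^T V^{-1} (y-mu) = z^T z
    r = [yi - mu for yi in y]
    z = []
    for i in range(n):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    # log|V| = 2 * sum of log diagonal entries of L
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)
```

Factorizing V once gives both |V| and the quadratic form. This avoids forming V^{-1} explicitly.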

6

Variance-component Model = Random-effect Linear Model

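$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Random effects:

$$
\gamma_Q \sim \mathrm{MVN}(0, \Delta_Q\sigma_Q^{2}), \qquad \gamma_g \sim \mathrm{MVN}(0, \Delta_g\sigma_g^{2}), \qquad e \sim N(0, \sigma_e^{2})
$$

$$
V = \Delta_Q\sigma_Q^{2} + \Delta_g\sigma_g^{2} + I\sigma_e^{2}
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\exp\!\left[-\frac{1}{2}(Y-\mu)^{T}V^{-1}(Y-\mu)\right]
$$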

7

From Linkage to Association

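Linkage model, with QTL effect(s) as random effects:

$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Family-based association model, with marker effect(s) as fixed effect(s):

$$
Y = \mu + X\beta + Z_g\gamma_g + e, \qquad V = \Delta_g\sigma_g^{2} + I\sigma_e^{2}
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\exp\!\left[-\frac{1}{2}(Y-\mu-X\beta)^{T}V^{-1}(Y-\mu-X\beta)\right]
$$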
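If V = Δ_gσ_g² + Iσ_e² is treated as known, the fixed effects in the family-based model can be estimated by generalized least squares (GLS), which is the maximum-likelihood estimate of μ and β for that V. The sketch below is a minimal version under these assumptions:

- V is given.
- There is a single marker, coded numerically.
- The design has an intercept column for μ.

It whitens the data with the Cholesky factor of V and then solves the 2×2 normal equations. Estimating the variance components is outside its scope.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gls_marker_effect(y, x, V):
    """GLS estimates (mu_hat, beta_hat) for Y = mu + X beta + Z_g gamma_g + e,
    treating V = Cov(Y) as known."""
    L = cholesky(V)
    # Whitening: L^{-1} Y has identity covariance when L L^T = V
    ones_w = forward_solve(L, [1.0] * len(y))
    x_w = forward_solve(L, x)
    y_w = forward_solve(L, y)
    # Normal equations for the whitened design [1, x]
    a11 = sum(u * u for u in ones_w)
    a12 = sum(u * v for u, v in zip(ones_w, x_w))
    a22 = sum(v * v for v in x_w)
    b1 = sum(u * w for u, w in zip(ones_w, y_w))
    b2 = sum(v * w for v, w in zip(x_w, y_w))
    det = a11 * a22 - a12 * a12
    mu_hat = (a22 * b1 - a12 * b2) / det
    beta_hat = (a11 * b2 - a12 * b1) / det
    return mu_hat, beta_hat
```

With V = Iσ_e², the same code reduces to the ordinary least-squares fit used for unrelated subjects.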

8

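A Simple Association Model for Unrelated Subjects

$$
Y = \mu + X\beta + e, \qquad V = I\sigma_e^{2}
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\exp\!\left[-\frac{1}{2}(Y-\mu-X\beta)^{T}V^{-1}(Y-\mu-X\beta)\right] = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\!\left[-\frac{(y_i-\mu-X_i\beta)^{2}}{2\sigma_e^{2}}\right]
$$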
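For unrelated subjects and one marker, maximizing this likelihood gives ordinary least squares. The minimal sketch below assumes additive 0/1/2 genotype coding, which is an assumption not stated on the slide. To stay within the standard library, the p-value uses a large-sample normal approximation to the Wald statistic instead of the exact t distribution with n - 2 degrees of freedom.

```python
import math


def snp_association(y, x):
    """OLS fit of y_i = mu + x_i * beta + e_i.

    x: genotype per subject, assumed coded 0/1/2 (additive model).
    Returns (mu_hat, beta_hat, se_beta, two_sided_p).
    """
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = ybar - beta * xbar
    rss = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)            # unbiased estimate of sigma_e^2
    se = math.sqrt(sigma2 / sxx)
    z = beta / se
    # Normal approximation to the Wald test (adequate for large n)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return mu, beta, se, p
```

In a GWAS, this function would simply be called once per marker, giving one p-value per variant.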

9

Covariate(s): Adjusting for Confounder(s)

$$
Y = \mu + X_C\beta_C + X\beta + e
$$

Observed confounders: age, sex, etc.

Hidden confounders: population structure

Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry
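Adjusting for covariates just adds columns to the design matrix. The sketch below is a general least-squares fit through the normal equations. Its columns could be an intercept, age, sex, and (as an assumption here) a few genotype principal components used as population-structure proxies, followed by the marker. The coefficient on the marker column is then the covariate-adjusted β. The column layout is illustrative only.

```python
def ols(columns, y):
    """Least-squares coefficients b for y = sum_c b_c * columns[c].

    Include a column of 1s for mu; put the marker last to read off beta.
    """
    p = len(columns)
    # Normal equations A b = c, with A = X^T X and c = X^T y
    A = [[sum(u * v for u, v in zip(columns[r], columns[s])) for s in range(p)]
         for r in range(p)]
    c = [sum(u * w for u, w in zip(columns[r], y)) for r in range(p)]
    # Gaussian elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for s in range(k, p):
                A[r][s] -= f * A[k][s]
            c[r] -= f * c[k]
    # Back substitution
    b = [0.0] * p
    for k in range(p - 1, -1, -1):
        b[k] = (c[k] - sum(A[k][s] * b[s] for s in range(k + 1, p))) / A[k][k]
    return b
```

The normal equations are fine for a handful of covariates. Production tools typically use more numerically stable decompositions.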

10

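Modeling Hidden Genetic Correlation Between Subjects

$$
Y = \mu + X_C\beta_C + X\beta + Z_g\gamma_g + e, \qquad V = \Delta_g\sigma_g^{2} + I\sigma_e^{2}
$$

$X_C\beta_C$: covariate fixed effect(s); $X\beta$: marker fixed effect(s); $Z_g\gamma_g$: genetic background random effects.

Family data: pedigree ⇒ IBD matrix

Population data (relatedness hidden): marker data ⇒ IBS matrix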
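For population data, one simple way to build an IBS matrix from markers is to average the proportion of alleles that two subjects share across markers. The sketch below assumes 0/1/2 genotype coding and no missing genotypes. At one marker, two subjects share 2 - |g_i - g_j| alleles. The resulting matrix could play the role of Δ_g above, although some tools use other (for example, standardized) relationship matrices instead.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS allele-sharing proportions.

    genotypes[i][m]: copies of the coded allele (0, 1 or 2) for subject i
    at marker m; missing data are not handled in this sketch.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Alleles shared identical-by-state, summed over markers
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2.0 * m)
    return K
```

The diagonal is always 1. Unrelated subjects still share some alleles by chance, so off-diagonal values are typically well above 0.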

11

Modeling Rare Variants

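$$
Y = \mu + X_C\beta_C + X\beta + Z_g\gamma_g + e
$$

Common variants: tested individually, one p-value per variant.

$$
Y = \mu + X_1\beta_1 + \cdots, \qquad H_0: \beta_1 = 0
$$

Rare variants: tested as an entire group (burden test), usually by gene, one p-value per group of variants.

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots, \qquad H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
$$

Can be combined with variable selection, using loose criteria.

Alternatively, the β's can be treated as random effects and tested with a variance-component test; variants can be weighted by prior information.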
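The split between single-variant tests and per-gene group tests can be planned directly from the genotype data. The minimal sketch below assumes 0/1/2 allele counts with no missing data. The MAF cut-off of 0.01 is a hypothetical default, not a value from the slide. Variant IDs and gene labels in the test are made up.

```python
def maf(column):
    """Minor allele frequency from 0/1/2 allele counts (complete data assumed)."""
    freq = sum(column) / (2.0 * len(column))
    return min(freq, 1.0 - freq)


def plan_tests(variants, threshold=0.01):
    """Split variants into single-variant tests and per-gene group tests.

    variants : list of (variant_id, gene, genotype_column) tuples
    threshold: hypothetical MAF cut-off separating rare from common variants
    """
    single = []   # common: H0 beta_j = 0, one p-value per variant
    groups = {}   # rare: H0 beta_1 = ... = beta_k = 0, one p-value per gene
    for vid, gene, column in variants:
        if maf(column) >= threshold:
            single.append(vid)
        else:
            groups.setdefault(gene, []).append(vid)
    return single, groups
```

Grouping by transcript, exon or pathway only changes the key used in `groups`.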

12

Collapsing Model

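$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots \quad\Longrightarrow\quad Y = \mu + X\beta + \cdots
$$

subject  X1  X2  X3  X
1        0   0   0   0
2        0   1   1   1
3        1   0   0   1

Collapsing multiple variables into one: X = 1 if the subject carries at least one of the rare variants, otherwise X = 0.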
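The collapsing step amounts to an "any carrier" indicator per subject. The minimal sketch below assumes that each row holds one subject's genotypes at the grouped rare variants. The resulting X would then enter the association model as a single predictor.

```python
def collapse(rows):
    """rows[i] = genotypes X_1..X_k of subject i at the grouped rare variants.

    Returns the collapsed indicator X_i: 1 if any variant is carried, else 0.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in rows]


# The three subjects from the table above
collapsed = collapse([[0, 0, 0], [0, 1, 1], [1, 0, 0]])
```

Here `collapsed` is `[0, 1, 1]`, matching the X column of the table.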

13

Weighted Sum Model

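$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots \quad\Longrightarrow\quad Y = \mu + \Big(\sum_{j=1}^{k} w_jX_j\Big)\beta + \cdots
$$

weight   w1 = 0.2  w2 = 0.5  w3 = 0.3
subject  X1        X2        X3        S
1        0         0         0         0.0
2        0         1         1         0.8
3        1         0         0         0.2

Weighted sum score: $S = \sum_{j=1}^{k} w_jX_j$

$$
Y = \mu + S\beta + \cdots
$$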
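The weighted sum score is a dot product of each subject's genotypes with the variant weights. The sketch below reproduces the table using the weights 0.2, 0.5 and 0.3. S then replaces the k separate predictors with a single one.

```python
def weighted_sum_scores(rows, weights):
    """S_i = sum_j w_j * X_ij for each subject i."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


rows = [[0, 0, 0], [0, 1, 1], [1, 0, 0]]   # subjects 1-3 from the table
weights = [0.2, 0.5, 0.3]                  # w1, w2, w3
scores = weighted_sum_scores(rows, weights)
```

The scores come out as 0.0, 0.8 and 0.2, up to floating-point rounding. Collapsing is the special case where the score is capped at 1 and all weights are equal.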

14

Weighting Variants

Based on allele frequency: continuous or binary (0/1) weights, variable threshold;

Based on functional annotation/prediction;

Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);

Data-driven: using both genotype and phenotype data to learn the weights (including effect directions) from the data, which requires a permutation test;

Any combination …

Grouping Variants

By gene, by transcript, by exon, by gene set/pathway, by protein domain, …
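The sketch below covers two allele-frequency-based weights. The first is a binary threshold weight. The second is a continuous weight equal to the Beta(1, 25) density evaluated at the MAF, a choice commonly used for rare variants (it is the default weighting in the SKAT R package). Both the threshold value and the Beta parameters are assumptions here, and the MAF must be strictly between 0 and 1.

```python
import math


def beta_weight(maf, a=1.0, b=25.0):
    """Continuous weight: Beta(a, b) density at the MAF (0 < maf < 1).

    With a = 1, b = 25 this is 25 * (1 - maf)^24, which up-weights rarer variants.
    """
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(maf)
                    + (b - 1.0) * math.log1p(-maf)
                    - log_beta_fn)


def threshold_weight(maf, threshold=0.01):
    """Binary (0/1) weight: keep only variants below a hypothetical MAF threshold."""
    return 1.0 if maf < threshold else 0.0
```

A variable-threshold approach would repeat the analysis over several thresholds and, like other data-driven schemes, would need a permutation test for the final p-value.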

15

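Modeling More Data Types: Generalized Linear (Mixed) Model

$$
g(Y) = \mu + X\beta + \cdots + e
$$

where $g(\cdot)$ is the link function.

For binary Y, the logistic model:

$$
g(Y) = \mathrm{logit}(Y) = \log\frac{P(Y=1)}{P(Y=0)} = \log\frac{P(Y=1)}{1-P(Y=1)}
$$

$$
P(Y=1) = \frac{\exp(\mu + X\beta + \cdots + e)}{1+\exp(\mu + X\beta + \cdots + e)}
$$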
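The logistic model for one marker can be fitted by Newton-Raphson (equivalently, iteratively reweighted least squares). The sketch below is a minimal version of a standard fixed-effect logistic regression. It drops the e term, so random effects (the GLMM case) are not covered. It also assumes the data are not perfectly separable and runs a fixed number of iterations in place of a convergence check.

```python
import math


def inv_logit(eta):
    """P(Y=1) = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_fit(y, x, iterations=25):
    """Newton-Raphson fit of logit P(Y=1) = mu + x * beta, for y in {0, 1}."""
    mu, beta = 0.0, 0.0
    for _ in range(iterations):
        p = [inv_logit(mu + beta * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix X^T W X, with W = diag(p(1-p))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (mu, beta) += H^{-1} g
        mu += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return mu, beta
```

For a binary predictor, the fitted values should reproduce the observed log-odds in each group. The test below uses this property as a check.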

16

Longitudinal Data (quantitative)

Fixed effect: time as a covariate

Repeated measures: random effect(s) modeling the correlation within subjects

[Figure: axis labeled Time]
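A random subject effect is what induces the within-subject correlation. Consider the random-intercept model y_it = μ + t·β_t + b_i + e_it, with b_i ~ N(0, σ_b²) and e_it ~ N(0, σ_e²). This specification is an assumption: it is the simplest repeated-measures model, not necessarily the one used in the talk. Under it, each subject's measurements have the covariance matrix sketched below.

```python
def random_intercept_cov(n_times, s2_b, s2_e):
    """Covariance of one subject's n_times repeated measures:
    s2_b on every entry (shared random intercept) plus s2_e on the diagonal."""
    return [[s2_b + (s2_e if s == t else 0.0) for t in range(n_times)]
            for s in range(n_times)]


def within_subject_corr(s2_b, s2_e):
    """Correlation between any two measurements on the same subject."""
    return s2_b / (s2_b + s2_e)
```

Measurements from different subjects stay independent, so the full V is block-diagonal with one such block per subject.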

17

Longitudinal Data (binary)

Linear model: time as a covariate

Survival analysis: Cox proportional hazards (CoxPH) model, etc.

[Figure: axis labeled Time]
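The Cox model is fitted by maximizing a partial likelihood built from risk sets. The sketch below evaluates the log partial likelihood for a single covariate such as a marker genotype. Tied event times use the Breslow approach, where each tied event gets the full risk set. Maximizing over β and computing standard errors are left out.

```python
import math


def cox_partial_loglik(times, events, x, beta):
    """Log partial likelihood for one covariate (Breslow handling of ties).

    events[i] = 1 if subject i had the event at times[i], 0 if censored.
    """
    ll = 0.0
    for i in range(len(times)):
        if events[i] == 1:
            # Risk set: subjects still under observation at times[i]
            risk = sum(math.exp(beta * x[j])
                       for j in range(len(times)) if times[j] >= times[i])
            ll += beta * x[i] - math.log(risk)
    return ll
```

At β = 0, each event simply contributes -log of its risk-set size, which is a convenient check.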

18

Tools

SAS procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST

R functions/packages: lm(), glm(), gee, nlme, kinship2/coxme, lme4, survival

Other programs: SOLAR, MMAP, EMMA, EMMAX, SKAT

19

Pipeline

Input (data + options)

→ Job generating/submitting module (LSF bsub): job 1, job 2, …, job N
    options.job_i ⇒ self-programmed modules (SAS, R, …)
    options.job_i ⇒ external program modules (MMAP, SKAT, …)

→ Job number controlling module

→ Job status monitoring module: all done?
    no ⇒ wait …
    yes ⇒ Result summarizing module (Result 1, Result 2, …, Result N)
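The control flow of this pipeline (generate jobs, cap how many run at once, wait until all are done, then summarize) can be sketched locally with a thread pool. This is a hypothetical stand-in: LSF `bsub` submission is replaced by `concurrent.futures`, `run_job` is a placeholder for a SAS/R module or an external program, and `maxjobn` mirrors the option of that name in the options file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(job):
    """Hypothetical placeholder for one job (e.g. one phenotype x one marker chunk).
    A real pipeline would call a SAS/R module or an external program here."""
    name, markers = job
    return name, len(markers)


def run_pipeline(jobs, maxjobn):
    results = {}
    # max_workers caps the number of jobs running at once (job number control)
    with ThreadPoolExecutor(max_workers=maxjobn) as pool:
        futures = [pool.submit(run_job, job) for job in jobs]  # generate/submit
        for fut in as_completed(futures):                      # status monitoring
            name, value = fut.result()
            results[name] = value
    # Result summarizing runs only after every job has finished
    return dict(sorted(results.items()))
```

On a cluster, the same structure would hold, but monitoring would poll the scheduler for job status instead of waiting on local futures.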

20

gwas.sh options.gwa

#!/bin/sh
OPFILE=$1
...…

[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
markerinfo_file=mapall
marker_selection=MAF>0.01
pedigree_file=pediall
subjectID=subject
pedgreeID=famid
markername=snp
…
[ANALYSIS]
phenolist_file=
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[OUTPUT]
output_dir=/dsguser/qunyuan/fhs/bmi
output_file=
output_replace=no
[RUN]
cluster
jobname=bmimixed
memsize=1000M
maxjobn=300
…
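The options file is INI-like, with `[SECTION]` headers and `key=value` lines. The sketch below reads it with Python's `configparser`. This is an assumption for illustration only: `gwas.sh` itself is a shell script, and its actual parsing is not shown in the slides. `allow_no_value=True` accepts bare keys such as `cluster`. Setting `optionxform = str` keeps key case such as `subjectID`, and `interpolation=None` leaves values like `MAF>0.01` untouched.

```python
import configparser


def read_options(text):
    """Parse a gwas.sh-style options file into {section: {key: value}}.
    Bare keys (e.g. 'cluster') map to None; 'key=' maps to ''."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.optionxform = str   # keep key case, e.g. subjectID
    parser.read_string(text)
    return {sec: dict(parser[sec]) for sec in parser.sections()}
```

The elided `…` lines from the slide would need to be removed or completed before a file like this could be parsed.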

Pheno  type  covar    program  analysis  run
Bmi    qt    age,sex  SASGLM   mixed     YES
Obes   ql    NA       SASGLM   gee       YES
HD     ql    age      SASGLM   gee       NO
Age    …
Sex    …
…

Program  language  location                Maintainer
SASGLM   SAS       /dsg1/code/sas/glm.sas  Q.Zhang
GSTAT    R         /dsg1/code/R/gstat.R    Q.Zhang
MMAP     C         /dsg1/code/sas/mmap.sh  J. Czajkowski
…
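The two tables can be combined into a job list: keep the phenotypes marked YES in the `run` column, split their covariates, and look up the code location of the requested program. This is a hypothetical sketch, not the pipeline's actual job generator. It assumes whitespace-delimited columns with no spaces inside a cell, so a maintainer name such as "J. Czajkowski" would need quoting or a different delimiter.

```python
def parse_table(text):
    """Whitespace-delimited table with a header row -> list of row dicts."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header = lines[0]
    return [dict(zip(header, row)) for row in lines[1:]]


def build_jobs(pheno_text, program_text):
    """One job per phenotype whose run flag is YES, with its program's code location."""
    programs = {row["Program"]: row for row in parse_table(program_text)}
    jobs = []
    for row in parse_table(pheno_text):
        if row["run"].upper() != "YES":
            continue
        covars = [] if row["covar"] == "NA" else row["covar"].split(",")
        jobs.append({
            "pheno": row["Pheno"],
            "type": row["type"],
            "covariates": covars,
            "analysis": row["analysis"],
            "code": programs[row["program"]]["location"],
        })
    return jobs
```

Each resulting dict corresponds to one job handed to the job generating/submitting module.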

21

Thanks!