Pathway talk for IGES 2009 Hawaii
Using pathways to discover complex disease models
Gary Chen, Duncan Thomas
Department of Preventive Medicine
USC
October 20, 2009
An outline
1. Motivation
2. A stochastic search variable selection algorithm
3. Example using candidate genes
4. Ideas for GWAS
Common diseases have complex etiology
- GWAS have been highly successful in finding genetic variants for common diseases
- Recent successes: age-related macular degeneration (AMD), BMI/obesity, type 2 diabetes, breast cancer, prostate cancer
- Marginal effects from single-SNP analyses do not explain all of the heritability. Can we move beyond the low-hanging fruit?
Use biological knowledge to help search for disease models
- Hierarchical modeling
  - Stabilizes effect estimates β from an association test by assuming they come from a prior distribution derived from biological data
- Examples in genetic epidemiology
  - Model selection: Conti et al. (Hum Hered 2003), Baurley et al. (Stat Med, in review)
  - GWAS: Lewinger et al. (Genet Epidemiol 2007), Chen and Witte (AJHG 2007)
  - Review: Thomas et al. (Hum Genomics 2009)
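To illustrate what "stabilizing" an estimate means, here is a minimal sketch of the standard normal-normal update: a first-stage estimate b with sampling variance v is combined with a prior N(prior_mean, prior_var) built from external data. The function name and the numbers are hypothetical, and the models in the talk are fitted by MCMC rather than with this closed form.

```python
def shrink(b, v, prior_mean, prior_var):
    """Posterior mean and variance of beta when b ~ N(beta, v) and beta ~ N(prior_mean, prior_var)."""
    w = prior_var / (prior_var + v)            # weight placed on the observed estimate
    post_mean = w * b + (1.0 - w) * prior_mean
    post_var = 1.0 / (1.0 / v + 1.0 / prior_var)
    return post_mean, post_var

# A noisy log odds ratio of 0.8 (variance 0.16) pulled toward a prior mean of 0.2 (variance 0.04)
print(shrink(0.8, 0.16, 0.2, 0.04))
```

Because the prior here is four times more precise than the data, the estimate moves most of the way toward the prior mean (to about 0.32) and its variance drops to about 0.032.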
Searching for independent main effects and their interactions
- Ideally, fit all predictors in a single model if N > P
- Model selection, e.g. stepwise regression
  - P-values can be anti-conservative: they don't adjust for the number of tests performed
  - Can be computationally intractable
- An alternative: Bayesian model averaging
  - Probabilistically propose sub-models from a posterior distribution
  - Average summary statistics of parameters across all proposed models
  - Appears to control better for multiple comparisons
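The averaging step can be sketched as follows, assuming (hypothetically) that the sampler stores each visited sub-model as a dict mapping variable name to its fitted coefficient. A variable's posterior inclusion probability is the fraction of sampled models that contain it, and its model-averaged coefficient counts it as 0 in models that exclude it.

```python
from collections import defaultdict

def model_average(samples):
    """samples: list of dicts {variable: coefficient}, one per MCMC iteration."""
    n = len(samples)
    counts = defaultdict(int)
    sums = defaultdict(float)
    for model in samples:
        for var, coef in model.items():
            counts[var] += 1
            sums[var] += coef
    # (inclusion probability, model-averaged coefficient with 0 when excluded)
    return {var: (counts[var] / n, sums[var] / n) for var in counts}

# Hypothetical draws over two gene effects and one environmental effect
draws = [{"G1": 0.4}, {"G1": 0.5, "E1": -0.3}, {"E1": -0.2}, {"G1": 0.3}]
print(model_average(draws))
```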
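The model form: a two-level hierarchical model
- First level: a linear model on the logit scale
  - $\mathrm{logit}\,P(Y = 1 \mid \beta, X) = \beta_0 + \sum_{k=1}^{K} \beta_k X_k$
  - X can be G, E, G×G, G×E, etc.
- Second level: a prior on each $\beta_k$ combining two univariate Gaussian components, weighted by φ:
  - $\beta_k \sim \mathcal{N}\big(\phi\,\bar\beta_k + (1-\phi)\,\pi^T Z_k,\; \phi\,\tau^2/\mathrm{adj}_k + (1-\phi)\,\sigma^2\big)$
  - 1st component: the neighborhood of gene k
  - 2nd component: pathway information on gene k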
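A sketch of the second-level prior's mean and variance for a single effect k follows. It rests on assumptions not stated on the slide: $\bar\beta_k$ is taken to be the average of the current β_j over effects j adjacent to k in A, and adj_k is the number of such neighbors. The function name, argument names, and the fallback for isolated effects are all hypothetical.

```python
import numpy as np

def prior_moments(k, beta, A, Z, pi, phi, tau2, sigma2):
    """Mean and variance of the second-level prior on beta_k (sketch).

    Assumes beta_bar_k is the mean of beta_j over neighbours j of k (A[k, j] != 0, j != k)
    and adj_k is the number of those neighbours.
    """
    nbrs = [j for j in range(len(beta)) if j != k and A[k, j] != 0]
    if nbrs:
        beta_bar, adj_k = float(np.mean(beta[nbrs])), len(nbrs)
    else:
        # no neighbours: fall back to the pathway component alone (assumption)
        beta_bar, adj_k, phi = 0.0, 1, 0.0
    mean = phi * beta_bar + (1.0 - phi) * float(pi @ Z[k])
    var = phi * tau2 / adj_k + (1.0 - phi) * sigma2
    return mean, var

beta = np.array([0.5, 0.1, -0.2])
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
Z = np.array([[1.0], [0.0], [0.5]])
print(prior_moments(0, beta, A, Z, np.array([0.3]), phi=0.5, tau2=0.2, sigma2=0.1))
```

Here effect 0 has two neighbors with mean −0.05 and pathway score 0.3, so the prior mean is 0.125 and the variance is 0.1.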
How the parameters fit together
$\beta_k \sim \mathcal{N}\big(\phi\,\bar\beta_k + (1-\phi)\,\pi^T Z_k,\; \phi\,\tau^2/\mathrm{adj}_k + (1-\phi)\,\sigma^2\big)$
Stochastic search variable selection
- Propose a swap, addition, or deletion of a variable
- Perform a reversible-jump Metropolis–Hastings step comparing posterior probabilities:
  - $H = \dfrac{P(Y \mid \beta', X)\,P(\beta' \mid Z, A, \pi, \sigma, \tau, \phi)}{P(Y \mid \beta, X)\,P(\beta \mid Z, A, \pi, \sigma, \tau, \phi)}$
- Accept the move with probability min(1, H)
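The search loop can be sketched as below. This is a simplified Metropolis-style search over sub-models, not the authors' full reversible-jump sampler, and it makes several labeled assumptions: each sub-model's β is plugged in at its Fisher-scoring MLE, the add/delete/swap proposals are treated as symmetric (so no proposal ratio or dimension-matching terms appear), and the prior mean and variance of each effect are fixed vectors rather than functions of updated π, σ, τ, φ. All function names are hypothetical.

```python
import numpy as np

def fit_fisher_scoring(X, y, n_iter=25):
    """Logistic-regression MLE by Fisher scoring; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # expected information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta

def log_posterior(G, y, model, prior_mean, prior_var):
    """log P(Y | beta, X) + log P(beta) for a sub-model, beta plugged in at its MLE (assumption)."""
    idx = np.asarray(model, dtype=int)
    X = np.column_stack([np.ones(len(y))] + [G[:, j] for j in idx])
    beta = fit_fisher_scoring(X, y)
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    b, m, v = beta[1:], prior_mean[idx], prior_var[idx]   # intercept left unpenalized
    logprior = np.sum(-0.5 * np.log(2 * np.pi * v) - 0.5 * (b - m) ** 2 / v)
    return float(loglik + logprior)

def propose(model, n_vars, rng):
    """Add, delete, or swap one variable; the proposal is treated as symmetric (assumption)."""
    model = set(model)
    outside = [j for j in range(n_vars) if j not in model]
    move = rng.choice(["add", "delete", "swap"])
    if move == "add" and outside:
        model.add(int(rng.choice(outside)))
    elif move == "delete" and model:
        model.remove(int(rng.choice(sorted(model))))
    elif move == "swap" and model and outside:
        model.remove(int(rng.choice(sorted(model))))
        model.add(int(rng.choice(outside)))
    return sorted(model)

def ssvs(G, y, prior_mean, prior_var, n_iter=1000, seed=0):
    """Return the sequence of sub-models visited by the search."""
    rng = np.random.default_rng(seed)
    model = []
    current = log_posterior(G, y, model, prior_mean, prior_var)
    visited = []
    for _ in range(n_iter):
        candidate = propose(model, G.shape[1], rng)
        proposed = log_posterior(G, y, candidate, prior_mean, prior_var)
        # accept with probability min(1, H), where log H = proposed - current
        if np.log(rng.uniform()) < proposed - current:
            model, current = candidate, proposed
        visited.append(tuple(model))
    return visited
```

The fraction of visited models that contain a given variable then estimates its posterior inclusion probability.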
Folate pathway
Reed et al., J Nutr. 2006 Oct;136(10):2653-61
Simulated data set
- Simulated data for 4,000 individuals
  - 14 genes, 2 environmental variables
  - Pathway enzymes: genotype-specific rates
- Simulating disease status
  - Assign homocysteine as the causal mechanism
  - 'Run' the pathway until steady state
  - Probabilistically assign disease status conditional on metabolite concentrations
- Priors
  - Deposit half the genotypes into a prior database
  - Z matrix, causal metabolite(s): correlation of the prior genotypes with the candidate metabolite
  - A matrix, network information: correlation of the correlation profiles of two effects
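A minimal sketch of how the two prior matrices could be built from the prior database, under stated assumptions: `G_prior` (individuals × effects) and `M_prior` (individuals × metabolites) are hypothetical arrays for the half of the sample set aside as priors; Z is the genotype-metabolite correlation for the candidate metabolite(s); and each effect's "correlation profile" is assumed to be its vector of correlations with all metabolites, so A is the correlation between those profiles.

```python
import numpy as np

def corr_cols(X, Y):
    """Pearson correlation between every column of X and every column of Y."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xc.T @ Yc / X.shape[0]

def build_priors(G_prior, M_prior, candidate):
    """Return Z (effects x candidate metabolites) and A (effects x effects).

    candidate: column indices of the candidate causal metabolite(s) in M_prior.
    """
    profiles = corr_cols(G_prior, M_prior)   # effects x metabolites
    Z = profiles[:, candidate]               # correlation with the candidate metabolite(s)
    A = np.corrcoef(profiles)                # correlation of correlation profiles (rows)
    np.fill_diagonal(A, 0.0)                 # an effect is not its own neighbour
    return Z, A
```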
Setting up the priors
Comparison
The same interactions were detected; the Z matrix provides additional support.
Sensitivity analysis
- How does our prior on β affect posterior inference?
- Compare four special cases of the prior density
  $\beta_k^{\mathrm{prior}} \sim \mathcal{N}\big(\phi\,\bar\beta_k + (1-\phi)\,\pi^T Z_k,\; \phi\,\tau^2/n_k + (1-\phi)\,\sigma^2\big)$:
  1. Non-informative: constrain φ = 0, π = 0
  2. Z matrix: constrain φ = 0
  3. Adjacency info: constrain π = 0
  4. Z matrix and adjacency info: no constraints
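Substituting each constraint into the prior density shows what each scenario reduces to (a direct algebraic consequence of the formula above). With φ = 0 the neighborhood term vanishes, which is why τ̂² is reported as N/A for scenarios 1 and 2 in the hyperparameter results.

```latex
\begin{aligned}
&\text{1. } \phi = 0,\ \pi = 0: && \beta_k \sim \mathcal{N}(0,\ \sigma^2)\\
&\text{2. } \phi = 0: && \beta_k \sim \mathcal{N}(\pi^T Z_k,\ \sigma^2)\\
&\text{3. } \pi = 0: && \beta_k \sim \mathcal{N}\big(\phi\,\bar\beta_k,\ \phi\,\tau^2/n_k + (1-\phi)\,\sigma^2\big)\\
&\text{4. unconstrained:} && \beta_k \sim \mathcal{N}\big(\phi\,\bar\beta_k + (1-\phi)\,\pi^T Z_k,\ \phi\,\tau^2/n_k + (1-\phi)\,\sigma^2\big)
\end{aligned}
```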
Model-averaged estimates of hyperparameters
- Results
  - The prior incorporating only the Z matrix appeared to explain residual variation better than the adjacency-only prior
  - π was estimated at 1.86, consistent with the simulated effect size

Scenario               σ̂²       τ̂²    φ̂
Non-informative        .48      N/A   0
Z matrix               .00459   N/A   0
Adjacency              .48      .22   .56
Z matrix + adjacency   .00731   .23   .05
Comparison among several priors
Summary of simulated example
- Biomarker data incorporated as priors
  - Intermediate phenotypes believed to be causal are encoded in the Z (mean) matrix
  - Global pathway information is encoded in the A (adjacency) matrix
- The influence of the prior is estimated from the observed data through π, τ, σ, φ
- Informative priors provided additional support for causal genes
Can be applied in genome-wide association studies
- Proof of concept: GWAS of breast cancer
  - 2,000 cases, 2,000 controls, ~1M SNPs
  - Top SNP from each of 2,755 genes (p < .05 in the GWAS)
- Gene Ontology used to define the adjacency matrix and the proposal kernel
  - Considered the 22 GO terms under Biological Process (level 3)
  - A pair of SNPs are considered neighbors if they share at least one GO term
- Define a proposal density for a new variable $V'_i$ as:
  - $Q(V'_i) = I(A_{ij} \neq 0,\ i \neq j)$
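A sketch of the GO-based neighborhood and proposal follows. It assumes that j in the indicator ranges over variables already in the current sub-model and that eligible candidates are drawn uniformly; the SNP IDs, GO labels, and function names are placeholders, not real annotations.

```python
import random

def go_adjacency(go_terms):
    """go_terms: dict SNP -> set of GO terms. Returns dict SNP -> set of neighbouring SNPs."""
    snps = list(go_terms)
    nbrs = {s: set() for s in snps}
    for i, a in enumerate(snps):
        for b in snps[i + 1:]:
            if go_terms[a] & go_terms[b]:   # neighbours if they share at least one GO term
                nbrs[a].add(b)
                nbrs[b].add(a)
    return nbrs

def propose_new_variable(model, nbrs, rng=random):
    """Draw a new variable uniformly among SNPs adjacent to any SNP in the model.

    Assumes Q(V'_i) = I(A_ij != 0, i != j) with j ranging over the current model.
    Returns None when no eligible neighbour exists.
    """
    if not model:
        return None
    candidates = sorted(set().union(*(nbrs[j] for j in model)) - set(model))
    return rng.choice(candidates) if candidates else None

go = {"rs1": {"GO:A"}, "rs2": {"GO:A", "GO:B"}, "rs3": {"GO:B"}, "rs4": {"GO:C"}}
nbrs = go_adjacency(go)
print(sorted(nbrs["rs2"]))                                # ['rs1', 'rs3']
print(propose_new_variable(["rs1"], nbrs, random.Random(0)))  # rs2
```

Restricting proposals to annotated neighbors shrinks the search space, at the cost of never proposing SNPs that share no GO term with the current model.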
Analysis
- Stepwise regression:
  - Considered only the first 100 SNPs
  - Retained 83 of the 100 SNPs
  - Intractable for second-order interactions
- Our proposed algorithm:
  - Low posterior probability for interactions
  - Most sub-models contained variables with shared annotation
Sensitivity analysis
- Compare a non-informative prior to one using GO terms in A
  1. Non-informative: constrain φ = 0
  2. Adjacency info: no constraint on φ

Scenario          σ̂²    τ̂²      φ̂
Non-informative   .01   N/A     0
Adjacency         .01   .0004   .86
Posterior inference
Scaling up to larger sub-models
- Need to test larger sub-models in GWAS settings
- Partition models into sub-models using ontology information
- Parallel processing: nodes fit sub-models
- A parallelized MCMC algorithm: Poster 190
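The partition-and-distribute idea can be sketched on a single machine, with a thread pool standing in for compute nodes (an assumption made only to keep the example self-contained; the parallel MCMC of Poster 190 is not reproduced here). Columns are grouped by a hypothetical ontology label, and each group's sub-model is fitted independently by logistic regression via Fisher scoring.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Fisher scoring; an intercept column is added here."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta

def fit_partitioned(G, y, labels, workers=4):
    """labels[j] is the (hypothetical) ontology group of column j; one sub-model per group."""
    groups = defaultdict(list)
    for j, lab in enumerate(labels):
        groups[lab].append(j)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {lab: pool.submit(fit_logistic, G[:, cols], y) for lab, cols in groups.items()}
        # each result: (column indices, [intercept, coefficients...])
        return {lab: (groups[lab], f.result()) for lab, f in futures.items()}
```

On a cluster, the same pattern would send each group to a separate node; the sub-model fits do not communicate, so they parallelize trivially.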
Logical topology of sub-models
Hierarchical model
Summary for GWAS example
- External knowledge can be informative
- MLEs of β are smoothed toward pathway means
- Ontologies are useful: WECARE study in breast cancer, Poster 189
- For GWAS: genome-wide expression data are potentially more biologically informative in the Z matrix
- Priors can guide the search toward biologically relevant interactions
- Computational efficiency is essential:
  - Defining the proposal kernel: e.g. expit(πᵀZ)
  - More parsimonious sub-models are desirable (e.g. fused LASSO)
  - Fisher scoring can be sped up using parallel code (e.g. GPUs)
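One possible reading of an expit(πᵀZ) proposal kernel is sketched below: each candidate variable k is proposed with probability proportional to expit(πᵀZ_k), so variables with stronger external support are tried more often. The normalization over all candidates and the function names are assumptions for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def proposal_probs(Z, pi):
    """Proposal probabilities proportional to expit(pi^T Z_k) for each row k of Z."""
    w = expit(Z @ pi)
    return w / w.sum()

def propose_index(Z, pi, rng):
    """Draw the index of the next variable to propose."""
    return int(rng.choice(len(Z), p=proposal_probs(Z, pi)))

Z = np.array([[2.0], [0.0], [-2.0]])
print(proposal_probs(Z, np.array([1.0])))
print(propose_index(Z, np.array([1.0]), np.random.default_rng(0)))
```

Because expit is bounded away from zero, every variable keeps a nonzero chance of being proposed, which preserves the ability to find signals the prior does not anticipate.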
Acknowledgements
- James Baurley
- David Conti
- Dataset: African American Breast Cancer GWAS Collaborators
- Funding: R01 ES016813