A model for interpretable high dimensional interactions

44
A Model for Interpretable High Dimensional Interactions Sahir Rai Bhatnagar Joint work with Yi Yang, Mathieu Blanchette and Celia Greenwood Poster Number 67

Transcript of A model for interpretable high dimensional interactions

Page 1: A model for interpretable high dimensional interactions

A Model for Interpretable High Dimensional

Interactions

Sahir Rai Bhatnagar

Joint work with Yi Yang, Mathieu Blanchette and Celia Greenwood

Poster Number 67

Page 2: A model for interpretable high dimensional interactions

Motivation

Page 3: A model for interpretable high dimensional interactions

one predictor variable at a time

Predictor Variable Phenotype

Test 1

Test 2

Test 3

Test 4

Test 5

1

Page 4: A model for interpretable high dimensional interactions

one predictor variable at a time

Predictor Variable Phenotype

Test 1

Test 2

Test 3

Test 4

Test 5

1

Page 5: A model for interpretable high dimensional interactions

a network based view

Predictor Variable Phenotype

Test 1

2

Page 6: A model for interpretable high dimensional interactions

a network based view

Predictor Variable Phenotype

Test 1

2

Page 7: A model for interpretable high dimensional interactions

a network based view

Predictor Variable Phenotype

Test 1

2

Page 8: A model for interpretable high dimensional interactions

system level changes due to environment

Predictor Variable PhenotypeEnvironment

A

B

Test 1

3

Page 9: A model for interpretable high dimensional interactions

system level changes due to environment

Predictor Variable PhenotypeEnvironment

A

B

Test 1

3

Page 10: A model for interpretable high dimensional interactions

Motivating Dataset: Newborn epigenetic adaptations to gesta-

tional diabetes exposure (Luigi Bouchard, Sherbrooke)

Environment

Gestational

Diabetes

Large Data

Child’s epigenome

(p ≈ 450k)

Phenotype

Obesity measures

4

Page 11: A model for interpretable high dimensional interactions

Differential Correlation between environments

(a) Gestational diabetes affected pregnancy (b) Controls

5

Page 12: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 13: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 14: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 15: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 16: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 17: A model for interpretable high dimensional interactions

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

6

Page 18: A model for interpretable high dimensional interactions

Methods

Page 19: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 20: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 21: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 22: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 23: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 24: A model for interpretable high dimensional interactions

ECLUST - our proposed method: 3 phases

Original Data

E = 0

1) Gene Similarity

E = 1

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

7

Page 25: A model for interpretable high dimensional interactions

the objective of statistical

methods is the reduction of data.

A quantity of data . . . is to be

replaced by relatively few quantities

which shall adequately represent

. . . the relevant information

contained in the original data.

- Sir R. A. Fisher, 1922

7

Page 26: A model for interpretable high dimensional interactions

Model

g(µ) =β0 + β1X1 + · · ·+ βpXp + βEE︸ ︷︷ ︸main effects

+α1E (X1E ) + · · ·+ αpE (XpE )︸ ︷︷ ︸interactions

Reparametrization1: αjE = γjEβjβE .

Strong heredity principle2:

αjE 6= 0 ⇒ βj 6= 0 and βE 6= 0

1Choi et al. 2010, JASA2Chipman 1996, Canadian Journal of Statistics

8

Page 27: A model for interpretable high dimensional interactions

Model

g(µ) =β0 + β1X1 + · · ·+ βpXp + βEE︸ ︷︷ ︸main effects

+α1E (X1E ) + · · ·+ αpE (XpE )︸ ︷︷ ︸interactions

Reparametrization1: αjE = γjEβjβE .

Strong heredity principle2:

αjE 6= 0 ⇒ βj 6= 0 and βE 6= 0

1Choi et al. 2010, JASA2Chipman 1996, Canadian Journal of Statistics

8

Page 28: A model for interpretable high dimensional interactions

Model

g(µ) =β0 + β1X1 + · · ·+ βpXp + βEE︸ ︷︷ ︸main effects

+α1E (X1E ) + · · ·+ αpE (XpE )︸ ︷︷ ︸interactions

Reparametrization1: αjE = γjEβjβE .

Strong heredity principle2:

αjE 6= 0 ⇒ βj 6= 0 and βE 6= 0

1Choi et al. 2010, JASA2Chipman 1996, Canadian Journal of Statistics

8

Page 29: A model for interpretable high dimensional interactions

Strong Heredity Model with Penalization

arg minβ0,β,γ

1

2‖Y − g(µ)‖2 +

λβ (w1β1 + · · ·+ wqβq + wEβE ) +

λγ (w1Eγ1E + · · ·+ wqEγqE )

wj =

∣∣∣∣∣ 1

βj

∣∣∣∣∣ , wjE =

∣∣∣∣∣ βj βEαjE

∣∣∣∣∣

9

Page 30: A model for interpretable high dimensional interactions

Results

Page 31: A model for interpretable high dimensional interactions

Simulation Study: Jaccard Index and test set MSE

10

Page 32: A model for interpretable high dimensional interactions

Open source software

• Software implementation in R: http://sahirbhatnagar.com/eclust/

• Allows user specified interaction terms

• Automatically determines the optimal tuning parameters through

cross validation

• Can also be applied to genetic data

11

Page 33: A model for interpretable high dimensional interactions

Conclusions

Page 34: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 35: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 36: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 37: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 38: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 39: A model for interpretable high dimensional interactions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

data

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

12

Page 40: A model for interpretable high dimensional interactions

Limitations

• There must be a high-dimensional signature of the exposure

• Clustering is unsupervised

• Two tuning parameters

• Need more samples . . . Got data? (Poster 67)

13

Page 41: A model for interpretable high dimensional interactions

Limitations

• There must be a high-dimensional signature of the exposure

• Clustering is unsupervised

• Two tuning parameters

• Need more samples . . . Got data? (Poster 67)

13

Page 42: A model for interpretable high dimensional interactions

Limitations

• There must be a high-dimensional signature of the exposure

• Clustering is unsupervised

• Two tuning parameters

• Need more samples . . . Got data? (Poster 67)

13

Page 43: A model for interpretable high dimensional interactions

Limitations

• There must be a high-dimensional signature of the exposure

• Clustering is unsupervised

• Two tuning parameters

• Need more samples . . . Got data? (Poster 67)

13

Page 44: A model for interpretable high dimensional interactions

acknowledgements

• Dr. Celia Greenwood

• Dr. Blanchette and Dr. Yang

• Dr. Luigi Bouchard, Andre Anne

Houde

• Dr. Steele, Dr. Kramer,

Dr. Abrahamowicz

• Maxime Turgeon, Kevin

McGregor, Lauren Mokry,

Dr. Forest

• Greg Voisin, Dr. Forgetta,

Dr. Klein

• Mothers and children from the

study

14