FINDING CONSISTENT SUBNETWORKS ACROSS MICROARRAY DATASETS
Fan Qi, GS5002 Journal Club



OUTLINE

Introduction

Methodology

Results & Discussions

Conclusions


INTRODUCTION

Identify Differential Gene Expression
• Identify significant genes w.r.t. a phenotype

Importance:
• Testing effectiveness of treatment
• Biological insights into diseases
• Developing new treatments
• Disease prophylaxis
• Any others?


CURRENT METHODS

Individual Genes
• Search for individual differentially expressed genes
• Fold-change, t-test, SAM

Gene Pathway Detection
• Looking at a set of genes instead of individual genes
• Bayesian learning and Boolean network learning

Gene Classes
• Adding existing biological insights
• Over-representation analysis (ORA), Functional Class Scoring (FCS), GSEA, NEA, ErmineJ


CHALLENGE

Different results from different datasets of the SAME disease!

Zhang M et al. [1] demonstrated inconsistency in SAM:

Datasets         DEGs     POG   nPOG
Prostate cancer  Top 10   0.30  0.30
                 Top 50   0.14  0.14
                 Top 100  0.15  0.15
Lung cancer      Top 10   0.00  0.00
                 Top 50   0.20  0.19
                 Top 100  0.31  0.30
DMD              Top 10   0.20  0.20
                 Top 50   0.42  0.42
                 Top 100  0.54  0.54

Reconstructed from Table 1 in [1]

Inconsistency among datasets
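
A minimal sketch of how an overlap score like POG could be computed, assuming POG is the fraction of genes shared by two equally sized top-k DEG lists. The exact definition in [1] (for example, whether the regulation direction must agree) and the normalisation behind nPOG are not reproduced here. The gene names and the function name `pog` are hypothetical.

```python
def pog(ranked_a, ranked_b, top_k):
    """Fraction of the top_k genes of list A that also appear in the top_k of list B."""
    top_a = set(ranked_a[:top_k])
    top_b = set(ranked_b[:top_k])
    return len(top_a & top_b) / top_k


# Hypothetical ranked DEG lists from two datasets of the same disease.
dataset_1 = ["g1", "g2", "g3", "g4", "g5"]
dataset_2 = ["g3", "g9", "g1", "g7", "g8"]

# Only g1 and g3 are shared between the two top-5 lists.
print(pog(dataset_1, dataset_2, top_k=5))
```

A low value for small k, as in the table above, means the two datasets barely agree on their top-ranked genes.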


NEW APPROACH

SNet [2]
• Proposed in 2011
• Utilizes gene-gene relationships in analysis

Gene-gene relationship
• Activates vs. inhibits
(From Fig 1 in [2])

Gene Subnetwork
• A gene is a vertex, a relationship is an edge

[Figure: example subnetwork with genes RHOA, VAV, PIK3R2, ARHGEF1, RAC1, IQGAP1; partially adapted from Fig 2 in [2]]


METHODOLOGY

Input: Genes labeled with phenotype
• Obtained from microarray experiments

Third-party Info:
• Gene pathway info
• Gene reaction info

Attributes of a Subnetwork: size, score

Output: A set of significant subnetworks

Steps: Subnetwork Extraction → Subnetwork Scoring → Subnetwork Significance


METHODOLOGY – STEP 1

[Figure: Phenotypes P1, P2, P3, ...; Patient's Gene Ranked List]


METHODOLOGY – STEP 1

[Figure: ranked gene lists of patients in phenotype P1]

• Only the top-ranked genes are kept for each patient
• Repeat for every phenotype group
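
A minimal sketch of the per-patient filtering, assuming each patient is a mapping from gene name to expression value and that "top" means the highest-expressed fraction of that patient's genes. The cut-off `top_fraction`, the function name and the data are hypothetical; the slides do not preserve the actual threshold.

```python
def top_genes(expression, top_fraction):
    """Return the genes in the top `top_fraction` of one patient's ranked list."""
    # Rank genes from highest to lowest expression for this patient.
    ranked = sorted(expression, key=expression.get, reverse=True)
    # Keep at least one gene so very short lists do not end up empty.
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep])


# Hypothetical expression values for one patient.
patient = {"a1": 9.1, "a2": 0.4, "a3": 7.5, "a4": 2.2, "a5": 6.3,
           "a6": 1.0, "a7": 3.8, "a8": 0.9, "a9": 5.1, "a10": 0.2}

print(sorted(top_genes(patient, top_fraction=0.3)))
```

Ranking within a patient, rather than using raw values, is what makes the lists comparable across patients.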


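METHODOLOGY – STEP 1

• Select one phenotype as d (here P1), the others as ¬d
• Select the genes that occur in at least β% of the patients of d (β = 50) to form the gene list GL

[Figure: top-gene lists of the P1 (d) patients combined into GL]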
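
A minimal sketch of building the gene list GL, assuming each patient of phenotype d is represented by the set of genes kept for that patient, and that a gene enters GL when it appears in at least β percent of those sets. The function name and data are hypothetical.

```python
from collections import Counter


def gene_list(top_sets, beta=50):
    """Genes present in at least `beta` percent of the patients' top-gene sets."""
    counts = Counter(gene for genes in top_sets for gene in genes)
    n_patients = len(top_sets)
    # Compare in integer arithmetic to avoid rounding issues at the threshold.
    return {g for g, c in counts.items() if 100 * c >= beta * n_patients}


# Hypothetical top-gene sets for four patients of phenotype d.
patients_d = [{"a1", "a2", "a3"}, {"a1", "a3", "a4"},
              {"a1", "a5", "a6"}, {"a2", "a3", "a7"}]

GL = gene_list(patients_d, beta=50)
print(sorted(GL))
```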


METHODOLOGY – STEP 1

• Partition GL into multiple pathways
• Generate Subnetworks

[Figure: gene list GL; genes a1 ... a7 shown before and after grouping into subnetworks]

Result: a list of Subnetworks w.r.t. phenotype d
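
A minimal sketch of subnetwork generation, assuming a subnetwork is a connected component of a pathway after restricting the pathway to the genes in GL. Edges are treated as undirected here, and genes left without any neighbour are dropped; both are simplifying assumptions. The pathway name, edges and function name are hypothetical.

```python
def subnetworks(pathways, gene_list):
    """Split each pathway into connected components made of genes in gene_list."""
    result = []
    for name, edges in pathways.items():
        # Keep only edges whose two endpoints are both in the gene list.
        adj = {}
        for u, v in edges:
            if u in gene_list and v in gene_list:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        seen = set()
        for start in sorted(adj):
            if start in seen:
                continue
            # Depth-first search collects one connected component.
            component, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node not in component:
                    component.add(node)
                    stack.extend(adj[node] - component)
            seen |= component
            result.append((name, frozenset(component)))
    return result


# Hypothetical pathway: a chain a1-a3-a5-a4-a7-a6-a2.
pathways = {"pathway_X": [("a1", "a3"), ("a3", "a5"), ("a5", "a4"),
                          ("a4", "a7"), ("a7", "a6"), ("a6", "a2")]}
GL = {"a1", "a2", "a3", "a5", "a6", "a7"}  # a4 did not make it into GL

for pathway, genes in subnetworks(pathways, GL):
    print(pathway, sorted(genes))
```

Because a4 is missing from GL, the chain breaks into two subnetworks.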


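METHODOLOGY – STEP 2

For each Subnetwork $sp$ in the list and each patient $p_k$, compute the overall expression level:

$SNet_{sp,k} = \sum_{i} \frac{n_i}{N}$, where
• $g_i$: a gene in $sp$ that is highly expressed in $p_k$
• $n_i$: # patients in $d$ who have $g_i$ highly expressed
• $N$: total # patients in $d$

For patients in $d$ and in $\neg d$, collect the scores and compute a t-test:

$S_{sp,d} = \langle SNet_{sp,1}, SNet_{sp,2}, \ldots, SNet_{sp,n} \rangle$
$S_{sp,\neg d} = \langle SNet_{sp,n+1}, SNet_{sp,n+2}, \ldots, SNet_{sp,m} \rangle$

Assign the resulting t-test score to each Subnetwork.

[Figure: subnetwork of genes a1 ... a7 scored for phenotype P1 (d)]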
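
A minimal sketch of the scoring step, under two assumptions: a patient's subnetwork score is the sum, over the subnetwork genes highly expressed in that patient, of the fraction of phenotype-d patients in which the gene is highly expressed; and the two score vectors are compared with Welch's t statistic, since the slides do not say which t-test variant is used. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gene_weights(subnet, top_sets_d):
    """Weight of a gene = (# patients in d with it highly expressed) / (# patients in d)."""
    n = len(top_sets_d)
    return {g: sum(g in top for top in top_sets_d) / n for g in subnet}


def subnet_score(subnet, patient_top, weights):
    """Sum the weights of the subnetwork genes highly expressed in one patient."""
    return sum(weights[g] for g in subnet if g in patient_top)


def welch_t(xs, ys):
    """Welch's t statistic for two samples (fails if both variances are zero)."""
    return (mean(xs) - mean(ys)) / sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))


# Hypothetical subnetwork and per-patient sets of highly expressed genes.
subnet = {"a1", "a3", "a5"}
patients_d = [{"a1", "a3"}, {"a1", "a3", "a5"}, {"a3", "a5"}]
patients_not_d = [{"a1"}, set(), {"a5"}]

weights = gene_weights(subnet, patients_d)
scores_d = [subnet_score(subnet, p, weights) for p in patients_d]
scores_not_d = [subnet_score(subnet, p, weights) for p in patients_not_d]

t_score = welch_t(scores_d, scores_not_d)
print(round(t_score, 3))
```

A large positive t-score means the subnetwork is consistently more active in d than in ¬d.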


METHODOLOGY – STEP 3

A. Randomly swap the phenotype labels of the patients, then recreate the subnetworks and t-test scores (steps 1-2)

B. Repeat [A] for 1,000 permutations
   • Forms a 2-D histogram

C. Estimate the nominal p-value of each Subnetwork

D. Select the significant Subnetworks (null hypothesis: the subnetwork is not significant)

(Fig 5 in the original paper)
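
A simplified sketch of the permutation idea, not the full procedure: it shuffles the phenotype labels over a fixed vector of per-patient scores for one subnetwork, whereas the steps above recreate the subnetworks for every permutation. The one-sided comparison, the Welch statistic, the seed and the data are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(xs, ys):
    """Welch's t statistic; returns 0.0 when both samples have zero variance."""
    denom = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return 0.0 if denom == 0 else (mean(xs) - mean(ys)) / denom


def permutation_p_value(scores_d, scores_not_d, n_perm=1000, seed=0):
    """Share of label permutations whose t statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = t_stat(scores_d, scores_not_d)
    pooled = list(scores_d) + list(scores_not_d)
    n = len(scores_d)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of phenotype labels
        if t_stat(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical per-patient scores of one subnetwork.
scores_d = [1.7, 2.3, 1.9, 2.0]
scores_not_d = [0.7, 0.1, 0.6, 0.3]

p_value = permutation_p_value(scores_d, scores_not_d)
print(p_value)
```

Subnetworks whose p-value falls below a chosen threshold would then be kept as significant.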


RESULTS AND DISCUSSIONS

Datasets:
• Leukemia: Golub vs. Armstrong
• ALL: Ross vs. Yeoh
• DMD: Haslett vs. Pescatori
• Lung: Bhattacharjee vs. Garber

Performance Comparison:
• Subnetwork overlap (with GSEA)
• Gene overlap (GSEA, SAM, t-test)

Other Comparisons:
• Network size
• Gene validity with t-test


RESULTS AND DISCUSSIONS

Subnetwork Overlap

Disease   Dataset 1      Dataset 2   SNet (%)  GSEA (%)  SNet (count)  GSEA (count)
Leukemia  Golub          Armstrong   83.33%    0%        20            0
ALL       Ross           Yeoh        47.63%    23.1%     10            6
DMD       Haslett        Pescatori   58.33%    55.6%     7             10
Lung      Bhattacharjee  Garber      90.90%    0%        9             0

Synthesized from Tables 1 and 2 in [2]. Higher is better.


RESULTS AND DISCUSSIONS

Gene Overlap

Disease   SNet    GSEA    T-Test (p<0.05)  T-Test (top)  SAM (p<0.05)  SAM (top)
Leukemia  91.30%  2.38%   73.01%           14.29%        49.96%        22.62%
ALL       93.01%  4.0%    60.20%           57.33%        81.25%        49.33%
DMD       69.23%  28.9%   49.60%           20.00%        76.98%        42.22%
Lung      51.18%  4.0%    65.61%           26.16%        65.61%        24.62%

Synthesized from Tables 3, 4 and 5 in [2]. Higher is better.


RESULTS AND DISCUSSIONS

Size of subnetworks

Disease T-Test SNet

Size of Network 2 3 4 5 5 6 7 >8

Leukemia 84 8 1 0 0 2 3 2 1

Subtype 75 5 1 1 1 1 0 1 6

DMD 45 3 1 0 0 1 0 0 5

Lung 65 3 2 1 0 5 3 0 1

Reconstructed from Table 6 in [2]


RESULTS AND DISCUSSIONS

Validity
• Compare the genes in EACH Subnetwork with those in t-test
• Around 70%-100% of the genes in each Subnetwork also appear in the t-test results
• Selected results (too large to present in full):

Subnetwork Name        Percentage   Subnetwork Name    Percentage
Leukaemia_B Cell-VAV1  81.82%       SNET_CTNNB1        100%
Leukaemia_UBC          100%         SNET_TNFSF10       60%
Leukaemia_RAC1         57.15%       SNET_PYGM          60%
DMD_RHOA               75%          DMD_ACTB           83.33%
DMD_SDC3               88.89%       Leukaemia_POU2F2   75.00%
MLLBCR_ACAA1           28.67%       BCR_T_RASA1        44.44%
MLLBCR_BLNK            72.73%       BCR_ABL1           75.00%
SNET_NOTCH3            100%         DMD_CALM1          80%

Selected from Tables 7, 8, 9 and 10 in [2]


CONCLUSIONS

• Traditional methods suffer from inconsistency across different datasets of the same disease

• SNet utilizes biological insights to mitigate the gap
   • Gene-to-gene relationships
   • Gene pathway knowledge

• SNet shows better results than established algorithms
   • More consistent


REFERENCES

[1] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.

[2] Soh D, Dong D, Guo Y, Wong L: Finding consistent disease subnetworks across microarray datasets.


THANK YOU!!