4th IPM-NUS workshop · bs.ipm.ac.ir/workshop/IPM-NUS2015/khosravi.pdf · From gene expression profile to...

From gene expression profile to network analysis · 4th IPM-NUS workshop · Pegah Khosravi · 03/10/2015

Page 1

From gene expression profile to network analysis

4th IPM-NUS workshop

Pegah Khosravi

03/10/2015

Page 2

We put math and biology into a blender and drink the resulting smoothie

Page 3

Outline

• Introduction to gene expression and network analysis

• Network-based approach reveals Y chromosome influences prostate cancer susceptibility

• Computational hands-on

Page 4

Expression profile

• Gene expression profiling is the measurement of the expression of thousands of genes at once, to create a global picture of cellular function.

• Quantitative PCR (qPCR)

• Next-generation sequencing (NGS)

• Microarrays: by far the most common of these, accounting for 65,858 PubMed articles by March 2015

Page 5

Microarray

• The core principle behind microarrays is hybridization between two DNA strands: the property of complementary nucleic acid sequences to pair specifically with each other by forming hydrogen bonds between complementary nucleotide bases.

• Single-channel: Affymetrix, Illumina

• Two-channel: Eppendorf, TeleChem
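
Hybridization relies on complementary base pairing (A with T, C with G) between antiparallel strands. A minimal Python sketch, assuming short made-up sequences and exact matching only (real arrays must also cope with partial mismatches), finds where a probe could pair perfectly with a target sequence:

```python
# Watson-Crick complements for DNA bases
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def perfect_match_sites(probe: str, target: str) -> list[int]:
    """Return 0-based positions in `target` where `probe` pairs with no mismatch.

    Strands pair antiparallel, so the probe binds wherever the target
    contains the probe's reverse complement.
    """
    site = reverse_complement(probe)
    target = target.upper()
    k = len(site)
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == site]


# Hypothetical 4-base probe for illustration; Affymetrix probes are 25-mers.
print(reverse_complement("AAGC"))               # GCTT
print(perfect_match_sites("AAGC", "TTGCTTAA"))  # [2]
```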

Page 6

Affymetrix versus Illumina

• Affymetrix
  • 25-mer probes
  • Probes synthesized in situ on the chip
  • Multiple probes per probe set
  • May have multiple probes per transcript
  • .dat, .cel, .cdf, and .chp file types
  • Normalization methods such as quantile normalization
  • Text output can be used for downstream data analysis
  • Annotations can be updated

• Illumina
  • Longer oligonucleotide probes
  • Bead technology
  • Single probe (rather than a probe set)
  • May have multiple probes per transcript
  • Image files processed by BeadStudio
  • Several normalization methods
  • Text output can be used for downstream data analysis
  • Annotations can be updated

Page 7

From experiment to data

[Figure: microarray chips → images scanned by laser → datasets → data mining and analysis → prediction for a new sample]

Dataset:

Class  Sno  D26528  D63874  D63880  …
ALL    2    193     4157    556
ALL    3    129     11557   476
ALL    4    44      12125   498
ALL    5    218     8484    1211
AML    51   109     3537    131
AML    52   106     4578    94
AML    53   211     2431    209
…

New sample:

Gene            Value
D26528_at       193
D26561_cds1_at  -70
D26561_cds2_at  144
D26561_cds3_at  33
D26579_at       318
D26598_at       1764
D26599_at       1537
D26600_at       1204
D28114_at       707

Page 8

GEO-Affymetrix data

[Figure: GEO record of Affymetrix data, with labels for the probe set ID, signal value, total number of probe sets, and raw files]

Page 9

Gene expression data

[Figure: expression data plotted on a raw scale ("Data") and on a log scale ("Data (log scale)")]

Always log-transform your data.
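
Raw intensities span several orders of magnitude and are right-skewed; on a log2 scale a two-fold increase and a two-fold decrease become +1 and -1. A minimal sketch, assuming a pseudocount of 1 to avoid log(0) and clipping of negative summarized signals (such as the -70 in the example table) to zero; both are choices made for this sketch, not steps taken from the slides:

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Log2-transform raw expression values.

    Negative or zero signals are clipped to 0 and a pseudocount is added so
    that the logarithm is always defined (both are assumptions of this sketch).
    """
    return [math.log2(max(v, 0.0) + pseudocount) for v in values]


raw = [15.0, 255.0, 1023.0, -70.0]
print(log2_transform(raw))  # [4.0, 8.0, 10.0, 0.0]
```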

Page 10

Quantile normalization (quantilenorm)

Normalize your data to avoid systematic (non-biological) effects.

Quantile normalization is a technique for making two or more distributions identical in statistical properties.

Is that already a result? No! It’s just data, not knowledge.

We need to use this data to answer a scientific question.

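• boxplot(data)   % distribution of each array (column) before normalization

• NormData = quantilenorm(data);   % give every column the same distribution

• boxplot(NormData)   % after normalization the boxes line up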
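
The idea behind quantilenorm can be sketched in plain Python: sort each array, average the sorted values rank by rank, and give every value the average for its rank. This illustrates the standard quantile-normalization algorithm, not MATLAB's implementation; ties are broken by position here, and the input is assumed to be a list of columns (one per array) with made-up values:

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    # Mean of the k-th smallest value across all columns, for each rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Row indices ordered from smallest to largest value in this column.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both columns contain exactly the values 1.5, 3.5, and 5.5, so their distributions are identical while each gene keeps its rank within its array.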

Page 11

Fold change and hierarchical clustering

[Figure: two scatter plots of 72 (raw) against 72 (control), both axes on a log scale]

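• pvalues = mattest(dependentData, independentData);   % two-sample t-test p-value for each gene (row)

• mavolcanoplot(dependentData, independentData, pvalues, 'Labels', probesetIDs)   % volcano plot: significance vs. fold change

• A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.

• prostate = clustergram(data(1:40,:), 'Standardize', 'Row')   % hierarchical clustering heat map of the first 40 rows, standardized per row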
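
For log2-scale data, the log2 fold change between two groups is the difference of their means, and a t statistic measures that difference relative to the within-group spread. A minimal sketch, assuming log2-transformed input and Welch's unequal-variance t statistic (a choice made here; the slides do not say which variant of the test was used):

```python
import math
import statistics


def log2_fold_change(control, treated):
    """log2 fold change of `treated` over `control`; inputs are log2 values."""
    return statistics.mean(treated) - statistics.mean(control)


def welch_t(control, treated):
    """Welch's two-sample t statistic for one gene (unequal variances)."""
    se2 = (statistics.variance(control) / len(control)
           + statistics.variance(treated) / len(treated))
    return log2_fold_change(control, treated) / math.sqrt(se2)


control = [1.0, 2.0, 3.0]   # hypothetical log2 expression, 3 replicates
treated = [4.0, 5.0, 6.0]
print(log2_fold_change(control, treated))   # 3.0, i.e. an 8-fold increase
print(round(welch_t(control, treated), 4))  # 3.6742
```

A volcano plot places each gene at (log2 fold change, -log10 p-value). In Python, a p-value for the same Welch test is available from `scipy.stats.ttest_ind(control, treated, equal_var=False)`.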

Page 12

Microarray applications and limitations

• Biological discovery
  • New and better molecular diagnostics
  • New molecular targets for therapy
  • Finding and refining biological pathways

• Mutation and polymorphism detection

• Limitations
  • Chip-to-chip variation
  • What fold change has biological relevance?
  • Expensive: not every lab can afford to repeat an experiment.
  • The real limitation is bioinformatics.

Page 13

Why do we need network analysis?

• Data normalization

• Selection of significantly changed genes (criteria: FC ≥ 2, p-value ≤ 0.05)

• 1157 probe IDs for 978 unique genes (FC ≥ 2 in at least one stage)

• Identification of up- and down-regulated genes for each stage
  • Adjacent: 65 genes up-regulated, 13 genes down-regulated
  • Tumor: 178 genes up-regulated, 137 genes down-regulated
  • Metastatic: 418 genes up-regulated, 634 genes down-regulated
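
The selection criteria above can be written as a simple filter. This sketch assumes each gene carries a fold change stored as a ratio (cancer over normal) and a precomputed p-value, and it applies "FC ≥ 2" symmetrically, so a down-regulated gene needs a ratio of 0.5 or less; the gene names and numbers are hypothetical:

```python
import math


def classify_genes(stats, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (up, down) gene lists from {gene: (fold_change_ratio, p_value)}."""
    lfc_cutoff = math.log2(fc_cutoff)
    up, down = [], []
    for gene, (ratio, p_value) in stats.items():
        if p_value > p_cutoff:
            continue                      # not significant
        lfc = math.log2(ratio)
        if lfc >= lfc_cutoff:
            up.append(gene)               # at least fc_cutoff-fold higher
        elif lfc <= -lfc_cutoff:
            down.append(gene)             # at least fc_cutoff-fold lower
    return sorted(up), sorted(down)


stats = {  # hypothetical values, not results from the study
    "GENE_A": (3.1, 0.010),
    "GENE_B": (0.4, 0.002),
    "GENE_C": (2.5, 0.200),  # fails the p-value cutoff
    "GENE_D": (1.3, 0.001),  # fails the fold-change cutoff
}
print(classify_genes(stats))  # (['GENE_A'], ['GENE_B'])
```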

Page 14

Biological networks

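• Biological networks
  • PPI (protein–protein interaction), GRN (gene regulatory network), metabolic networks, etc.

• Essential nodes
  • Hubs
  • Bottlenecks
  • Driver genes

• Association measures
  • Correlation coefficient
  • MI (mutual information)
  • MIC (maximal information coefficient)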
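
A correlation-based co-expression network, the simplest of the association measures listed, links genes whose expression profiles move together; hubs can then be read off as the highest-degree nodes. A minimal sketch with made-up data; the absolute-correlation cutoff of 0.8 is an arbitrary assumption, and bottleneck scores (usually betweenness centrality) are not computed here:

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, r_cutoff=0.8):
    """Link two genes when |r| >= r_cutoff (positive or negative co-expression)."""
    return {
        (g1, g2)
        for g1, g2 in itertools.combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= r_cutoff
    }


def degree(edges):
    """Number of neighbors per node; the highest-degree nodes are hub candidates."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


expr = {  # hypothetical log2 expression values across four samples
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.1],
    "G3": [4.0, 3.0, 2.0, 1.0],
    "G4": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(expr)
print(sorted(edges))  # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
print(degree(edges))
```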

Page 15

DREAM competition

http://dreamchallenges.org/

Page 16

MI-based algorithms

• $\mathrm{MI}(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}$

• RELNET algorithm (RELevance NETworks)

• ARACNe (Algorithm for Reverse engineering of Accurate Cellular Networks)

• CLR (Context Likelihood of Relatedness)
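
The MI formula above can be estimated from expression data by discretizing each gene's values and counting. A minimal plug-in estimator, assuming equal-width bins and log base 2 (so MI is in bits). RELNET-style reconstruction keeps gene pairs whose MI exceeds a threshold; ARACNe and CLR post-process the same MI values (ARACNe prunes edges with the data processing inequality, CLR scores each value against the background MI distribution of both genes). The data and threshold are hypothetical:

```python
import itertools
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in MI estimate, in bits, between two equal-length discrete sequences.

    Computes sum over (a, b) of P(a, b) * log2(P(a, b) / (P(a) * P(b))),
    with every probability estimated from counts.
    """
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def discretize(values, bins=3):
    """Equal-width binning into 0..bins-1 (the bin count is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # constant profile: everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def relevance_network(expr, threshold, bins=3):
    """RELNET-style edges: gene pairs whose MI exceeds `threshold`."""
    disc = {gene: discretize(vals, bins) for gene, vals in expr.items()}
    return {
        (g1, g2)
        for g1, g2 in itertools.combinations(sorted(disc), 2)
        if mutual_information(disc[g1], disc[g2]) > threshold
    }


expr = {  # hypothetical expression profiles over six samples
    "G1": [1, 2, 3, 4, 5, 6],
    "G2": [2, 4, 6, 8, 10, 12],
    "G3": [1, 6, 2, 5, 3, 4],
}
print(relevance_network(expr, threshold=1.0))  # {('G1', 'G2')}
```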

Page 17

ROC and PR

• Recall that PR_AUC is based on precision and recall (recall = TPR = sensitivity):
  • Precision = TP / (TP + FP)
  • Recall = Sensitivity = TPR = TP / (TP + FN)

• ROC_AUC is based on TPR (= recall = sensitivity) and FPR:
  • TPR = TP / (TP + FN)
  • FPR = FP / (FP + TN)

• Receiver operating characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, precision-recall (PR) curves give a more informative picture of an algorithm's performance. Gene network inference is such a case: true interactions are a small fraction of all possible gene pairs.
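
In network inference a method typically outputs a ranked list of candidate edges, and both curves are traced by moving a cutoff down that list and applying the formulas above at each step. A minimal sketch, assuming binary gold-standard labels, no tied scores, and at least one true and one false edge. ROC AUC uses the trapezoidal rule; for PR the sketch computes average precision instead, because linear interpolation between PR points tends to overestimate the area:

```python
def ranked_curves(scores, labels):
    """Trace ROC and PR points down a ranked list of predicted edges.

    scores: confidence of each predicted edge; labels: 1 if the edge is in
    the gold standard, else 0. Assumes no ties and both classes present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)               # TP + FN
    n_neg = len(labels) - n_pos       # FP + TN
    tp = fp = 0
    roc = [(0.0, 0.0)]  # (FPR, TPR)
    pr = []             # (recall, precision)
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def average_precision(pr):
    """Sum of precision weighted by the recall gained at each rank."""
    total, prev_recall = 0.0, 0.0
    for recall, precision in pr:
        total += (recall - prev_recall) * precision
        prev_recall = recall
    return total


roc, pr = ranked_curves([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(trapezoid_auc(roc))      # 0.75
print(average_precision(pr))
```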

Page 18

Network reconstruction

Page 19

Other databases

Page 20

Prostate metastasis network

[Figure: prostate metastasis network with nodes STAT1, AR, HLF, TCF21, ISL1, GATA3, KLF6, SMAD3, NHLH2, EGR3, FOS, NKX2-2, FOXF1, ATF6, PBX1, HOXC6, FOXA1, ELK4, and VDR; legend: up-regulated, not changed, down-regulated]

Page 21

Cytoscape

• Cytoscape supports many use cases in molecular and systems biology, genomics, and proteomics:
  • Load molecular and genetic interaction data sets in many standard formats
  • Project and integrate global datasets and functional annotations
  • Establish powerful visual mappings across these data
  • Perform advanced analysis and modeling using Cytoscape Apps
  • Visualize and analyze human-curated pathway datasets such as WikiPathways, Reactome, and KEGG

Page 22

Programming

Page 23

Google is my best friend

Page 24

Network-based approach

Page 25

Y-Chromosome

• 60 genes (loci) on the human Y chromosome

• All genes that interact with Y-chromosome genes, taken from the iHOP database (471 genes)

• GDS2545 from GEO (310 genes with FC ≥ 1.5 and p-value ≤ 0.05)

• Grouped normal and adjacent samples as normal prostate tissue, and tumor and metastasis samples as cancerous prostate tissue

• The resulting normal and cancerous networks contain 1973 and 1831 interactions, respectively

• 80 genes with high bottleneck (BN) and hub scores

• Extracted a sub-network from the normal and cancer co-expression networks using the Y-chromosome gene list

• We detected 22 genes related to the Y chromosome in our sub-network
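
Extracting a gene-list sub-network from a co-expression network is a set operation on edges. A minimal sketch with hypothetical placeholder genes and edges (not data from the study); here an edge is kept when at least one endpoint is on the Y-chromosome list, which is an assumption, since the slide does not state the rule:

```python
def extract_subnetwork(edges, seed_genes):
    """Keep the edges that touch at least one seed gene (assumed rule)."""
    seeds = set(seed_genes)
    return {(a, b) for a, b in edges if a in seeds or b in seeds}


def seeds_present(edges, seed_genes):
    """Seed genes that appear as nodes in the given edge set."""
    nodes = {gene for edge in edges for gene in edge}
    return nodes & set(seed_genes)


# Hypothetical placeholder genes: Y* stand for Y-chromosome genes, G* for others.
cancer_edges = {("Y1", "G1"), ("G1", "G2"), ("Y2", "Y3"), ("G2", "G3")}
y_genes = ["Y1", "Y2", "Y3", "Y4"]
sub = extract_subnetwork(cancer_edges, y_genes)
print(sorted(sub))                          # [('Y1', 'G1'), ('Y2', 'Y3')]
print(sorted(seeds_present(sub, y_genes)))  # ['Y1', 'Y2', 'Y3']
```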

Page 26

Expression alteration

• Among the detected genes, PRKY, PCDH11Y, PRY2, USP9Y, EIF1AY, NLGN4Y, ZFY, DDX3Y, BPY2, SRY, UTY, KDM5D, and TMSB4Y have the best-known roles in prostate cancer.

• AMELY, DAZ4, RBMY1J, RBMY1E, VCY1B, RPS4Y1, CDY1B, XKRY2, and CYORF15B may have as-yet-unknown roles in prostate cancer.

• CYORF15B, RPS4Y1, PRY2, RBMY1E, and DAZ4 are up-regulated in the cancerous stage.

• KDM5D, USP9Y, RBMY1J, and DDX3Y are down-regulated in the cancerous stage.

Page 27

New modulation score

Page 28

Pathways and GOs

• We identified 18 distinct biological processes (BPs) and pathways whose constituent genes are significantly co-expressed with each other in either the normal or the cancerous state.

• Y-chromosome genes such as PRKY, RPS4Y1, and USP9Y are involved in protein phosphorylation, cellular protein metabolic process, and the transforming growth factor beta receptor signaling pathway, respectively.
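
The slide does not say which statistic was used to call a process or pathway significant. One common choice for GO and pathway over-representation, and one of the tests offered by the BiNGO app used in the hands-on part, is the hypergeometric test. A minimal sketch of its upper-tail p-value, with made-up counts:

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background; K: background genes annotated to the term;
    n: genes in the study set (e.g. one co-expressed module);
    k: study-set genes annotated to the term.
    """
    total = comb(N, n)
    upper_tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper_tail / total


# Hypothetical counts: 3 of 3 study genes fall in a term covering 4 of 10 genes.
print(hypergeom_enrichment_p(10, 4, 3, 3))  # 4/120, about 0.033
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction such as Benjamini-Hochberg.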

Page 29

Network rewiring

• The positive regulation of the phosphatidylinositol 3-kinase cascade is activated only in the cancerous stage.

• Genes collaborating in the TNF signaling pathway are intra-connected in the normal stage.

• This novel network-based analysis suggests that significant biases exist between the two stage-specific co-expression networks when their constituent genes are classified by Gene Ontology (GO) terms or KEGG pathways.
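
Rewiring statements like these compare the edge sets of the two stage-specific networks. A minimal sketch with hypothetical networks: edges are treated as undirected, and a pathway's intra-connectivity is measured simply as the number of edges among its genes (a simplification chosen for illustration):

```python
def rewiring(normal_edges, cancer_edges):
    """Classify undirected edges as shared, lost, or gained between two networks."""
    normal = {frozenset(e) for e in normal_edges}
    cancer = {frozenset(e) for e in cancer_edges}
    return {
        "shared": normal & cancer,
        "normal_only": normal - cancer,   # co-expression lost in cancer
        "cancer_only": cancer - normal,   # co-expression gained in cancer
    }


def intra_edges(edges, gene_set):
    """Count edges with both endpoints inside a gene set (e.g. one pathway)."""
    genes = set(gene_set)
    return sum(1 for a, b in edges if a in genes and b in genes)


normal = [("A", "B"), ("B", "C"), ("C", "D")]   # hypothetical networks
cancer = [("C", "B"), ("D", "E")]
changes = rewiring(normal, cancer)
print(intra_edges(normal, {"A", "B", "C"}), intra_edges(cancer, {"A", "B", "C"}))  # 2 1
```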

Page 30

Computational hands-on

• Go to http://cytoscape.org/ and download version 2.8.3

• Import a network

• Analyze it with apps such as cytoHubba, MCODE, and BiNGO
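
One simple way to get a network into Cytoscape is a SIF (simple interaction format) text file, where each line reads source, interaction type, target. A minimal sketch that writes an edge list in that format; the `co` interaction label, the placeholder gene names, and the `write_sif` helper are hypothetical choices for illustration:

```python
def to_sif(edges, interaction="co"):
    """Serialize (source, target) pairs as tab-separated SIF lines."""
    return "".join(f"{a}\t{interaction}\t{b}\n" for a, b in sorted(edges))


def write_sif(path, edges, interaction="co"):
    """Hypothetical helper: save the SIF text so it can be imported as a network file."""
    with open(path, "w") as handle:
        handle.write(to_sif(edges, interaction))


edges = {("G2", "G3"), ("G1", "G2")}  # hypothetical edges
print(to_sif(edges), end="")
```

Tab separation keeps node names that contain spaces intact when the file is imported.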

Page 31

What we have covered today

• Analysis of gene expression

• Reconstruction of gene networks

• Recommending candidate genes, processes, and pathways for future research in cancer studies

• Computational hands-on with Cytoscape

Page 32

Thanks