Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou...

67
Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California IMS, NUS, Singapore, July 18, 2007

Transcript of Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou...

Page 1: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Functions, Networks, and Phenotypes by Integrative

Genomics Analysis

Xianghong Jasmine Zhou

Molecular and Computational Biology

University of Southern California

IMS, NUS, Singapore, July 18, 2007

Page 2: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

DNA

RNA

Protein

Gene Expression Datasets: the Transcriptome•Oligonucleotide Array

•cDNA Array

Protein abundance measurement (Mass Spec)

Protein interactions (yeast 2-hybrid system, protein arrays)

Protein complexes (Mass Spec)

Information FlowCellular systems

1995 Bacteria~1.6K genes

1997Eukaryote~6K genes

1998Animal~20K genes

2001Human30~100K genes

Objective:$1000 human genome

Page 3: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Rapid accumulation of microarray data in public repositories

• NCBI Gene Expression Omnibus

• EBI Array Express

137231 experiments

55228 experiments

The public microarray data increases by 3 folds per year

Page 4: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Multiple Microarray Technology Platforms

MicroarrayPlatforms

Page 5: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

--------------------

--------------------

--------------------

--------------------

a

b

d

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

Recurrent Patterns

a

b

c

d

e

f

g

h

i

j

k

a

b

d

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

Coexpression Networks

a

b

c

d

e

f

g

h

i

j

k

Datasets

Graph-based Approach for the Integrative Microarray Analysis

gene

experiments

e

g

h

i

Annotation

a

b

d

f

Functional Annotation

Gene Ontology

e

g

h

i

Transcriptional Annotation

TF

Page 6: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Frequent Subgraph Mining Problem is hard!

Problem formulation: Given n graphs, identify

subgraphs which occur in at least m graphs (m n)

Our graphs are massive! (>10,000 nodes and >1 million edges)

The traditional pattern growth approach (expand frequent

subgraph of k edges to k+1 edges) would not work, since

the time and memory requirements increase exponentially

with increasing size of patterns and increasing number of

networks.

Page 7: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Novel Algorithms to identify diverse frequent network patterns

• CoDense (Hu et al. ISMB 2005)

– identify frequent coherent dense subgraphs across many massive graphs

• Network Biclustering (Huang et al, ISMB 2007)

– identify frequent subgraphs across many massive graphs

• Network Modules (NeMo) (Yan et al. ISMB 2007)

– identify frequent dense vertex sets across many massive graphs

Page 8: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

CODENSE: identify frequent coherent dense subgraphs across

massive graphs

Hu et al, ISMB 2005

Page 9: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Identify frequent co-expression clusters across multiple microarray data sets

c1 c2… cm

g1 .1 .2… .2

g2 .4 .3… .4

c1 c2… cm

g1 .8 .6… .2

g2 .2 .3… .4

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

...a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

...a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

...

Page 10: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

The common pattern growth approach

Find a frequent subgraph of k edges, and

expand it to k+1 edge to check occurrence

frequency – Koyuturk M., Grama A. & Szpankowski W. An

efficient algorithm for detecting frequent subgraphs in biological networks. ISMB 2004

– Yan, Zhou, and Han. Mining Closed Relational Graphs with Connectivity Constraints. ICDE 2005

Page 11: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

The time and memory requirements increase

exponentially with increasing size of patterns

and increasing number of networks. The

number of frequent dense subgraphs is

explosive when there are very large frequent

dense subgraphs, e.g., subgraphs with

hundreds of edges.

Problem of the Pattern-growth approach

Page 12: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Problem of the Pattern-growth approach

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Pattern Expansionk k+1

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Page 13: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Our solutionWe develop a novel algorithm, called CODENSE, to mine

frequent coherent dense subgraphs. The target subgraphs

have three characteristics:

(1) All edges occur in >= k graphs (frequency)

(2) All edges should exhibit correlated occurrences in the given graph set. (coherency)

(3) The subgraph is dense, where density d is higher than a threshold and d=2m/(n(n-1)) (density)

m: #edges, n: #nodes

Page 14: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

CODENSE: Mine coherent dense subgraph

a

b

d

e

g

h

i

c

f

summary graph Ĝ

f

a

b

d

e

g

h

i

c

G1

f

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

g

h

i

G3G2

G6G5G4

(1) (1) Builds a summary graph by eliminating infrequent edgesBuilds a summary graph by eliminating infrequent edges

Page 15: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

(2) Identify dense subgraphs of the summary graph(2) Identify dense subgraphs of the summary graph

a

bd

e

g

h

i

cf

summary graph Ĝ

e

g

h

i

cf

Sub(Ĝ)

Step 2

MODES

Observation: If a frequent subgraph is dense, it must be a dense

subgraph in the summary graph. However, the reverse conclusion

is not true.

CODENSE: Mine coherent dense subgraph

Page 16: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

(3) (3) Construct the edge occurrence profiles for each Construct the edge occurrence profiles for each dense summary subgraphdense summary subgraph

e

g

h

i

cf

Sub(Ĝ)

Step 3

…………………

111000e-f

011100c-i

111000c-h

111010c-f

101100c-e

G6G5G4G3G2G1E

edge occurrence profiles

CODENSE: Mine coherent dense subgraph

Page 17: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

…………………

111000e-f

011100c-i

111000c-h

111010c-f

111100c-e

G6G5G4G3G2G1E

edge occurrence profiles

Step 4

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

(4) (4) builds a second-order graph for each dense summary builds a second-order graph for each dense summary subgraphsubgraph

CODENSE: Mine coherent dense subgraph

Page 18: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

Step 4

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

(5) Identify dense subgraphs of the second-order graph(5) Identify dense subgraphs of the second-order graph

Observation: if a subgraph is coherent (its edges show high correlation

in their occurrences across a graph set), then its 2nd-order graph must

be dense.

CODENSE: Mine coherent dense subgraph

Page 19: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

(6) Identify the coherent dense subgraphs(6) Identify the coherent dense subgraphs

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

Step 5

c

e

fh

e

g

h

i

Sub(G)

CODENSE: Mine coherent dense subgraph

Page 20: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Our solutionThe identified subgraphs by definition satisfy the three

criteria:

(1) All edges occur in >= k graphs (frequency)

(2) All edges should exhibit correlated occurrences in the given graph set. (coherency)

(3) The subgraph is dense, where density d is higher than a threshold and d=2m/(n(n-1)) (density)

m: #edges, n: #nodes

Page 21: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

…………………

111000e-f

011100c-i

111000c-h

111010c-f

111100c-e

G6G5G4G3G2G1E

edge occurrence profiles

c

e

fh

e

g

h

i Step 4Step 5

Sub(G)

a

bd

e

g

h

i

cf

a

bc

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

d

e

f

g

h

i

c a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

G1 G3G2

G6G5G4

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

Step 1

Step 3

summary graph Ĝ

e

g

h

i

cf

Sub(Ĝ)

Step 2

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

Step 6

MODESAdd/Cut

MODESRestore G and MODES

CODENSE: Mine coherent dense subgraph

Page 22: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

CODENSE

The design of CODENSE can solve the scalability issue.

Instead of mining each biological network individually,

CODENSE compresses the networks into two meta-graphs

(the summary graph and the second-order graph) and

performs clustering in these two graphs only. Thus,

CODENSE can handle any large number of networks.

Page 23: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

V

g

j

h

i

g

f

e

a

b

c

d

h

i

j

g

f

e

a

b

c

d

h

i

j

V

h

i

f

e

a

b

c

d

h

i

f

e

h

i

Step 1 Step 2

Step 3Step 4

G Sub(G)

Sub(G’)

G’

HCS’ condense

HCS’

restoreHCS’

MODES: Mine overlapped dense subgraph

Hu et al. ISMB 2005

Page 24: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

c1 c2… cm

g1 .1 .2… .2

g2 .4 .3… .4

c1 c2… cm

g1 .8 .6… .2

g2 .2 .3… .4

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Applying CoDense to 39 yeast microarray data sets

Page 25: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Functional annotation

Annotation

Page 26: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Functional Annotation (Validation)

Method: leave-one-out approach - masking a known gene to be unknown, and assign its function based on the other genes in the subgraph pattern.

Functional categories: 166 functional categories at GO level at least 6

Results: 448 predictions with accuracy of 50%

Page 27: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Functional Annotation (Prediction)

We made functional predictions for 169

genes, covering a wide range of functional

categories, e.g. amino acid biosynthesis, ATP

biosynthesis, ribosome biogenesis, vitamin

biosynthesis, etc. A significant number of our

predictions can be supported by literature.

Page 28: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

However…

• How about frequent non-dense graphs?– Many biological modules may form paths

• How about subgraphs which are coherent across only a subset of the graphs?– Not all modules are activated across all

conditions, and genes may form modules with diff. other genes under diff. conditions

Page 29: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Network Biclustering:Identify frequent subgraphs across

massive graphs

Huang et al, ISMB 2007

Page 30: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Using 65 human co-expression network as an illustration example

• 65 co-expression networks generated from 65 microarray data sets

• each graph contains 8297 genes, and 1%-10% edges of a complete graph

Page 31: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Basically, it is a biclustering problem

edges

graphs

Page 32: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Network Biclustering

• Objective function

cmn

cf

'

c’: number of 1 in the bicluster c: number of 1 in the whole matrix mn: size of the bicluster: regularization factor

However, the matrix is very large with millions of edges …

We will first identify robust seed to narrow down the search space

1 0 1 0 0 … 1

0 1 0 0 1 … 0

0 1 1 1 1 … 0

0 1 0 1 1 … 1

0 1 1 0 1 … 0

1 1 1 1 1 … 0

0 0 1 0 0 … 1

graphs

edges

Page 33: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Identify Bicluster seed

The property of relation graphs: edge labels are unique.

Hence, each graph can be treated as a collection of items

Thus, Frequent subgraph Mining can be modeled as frequent item set mining

Problem: current frequent item set mining algorithms can only efficiently mine across many small item sets In our problem, we have 65 very large item set…

We use a trick….

Page 34: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

…………………

101011e5

011100e4

111111e3

101011e2

101011e1

G6G5G4G3G2G1E

…………………

111000…

011100...

111000…

111010…

111100...

G65G64G63G62G61G60...

Edge occurrence profiles:

common edges common edges common edges

{e1, e10, e56, e100, e1000,…} {e4, e12, e33, e56, e890,…} …. {e99, e220, e1545, e2629,…}

{G1, G3, G5, G6, G7,…} {G2, G3, G5, G7, G8,…} …. {G8, G9, G15, G26, G29}

Graph set with more than 5 membersand with > 1000 common edgesFrequent pattern tree

Identify Bicluster seed

Very time consuming! It takes more than 2 weeks on 40 Pentium IV nodes

Huang et al. ISMB 2007

Page 35: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Expanding the Biclusters

…………………

101001e5

111111e4

111111e3

011111e2

111011e1

G6G5G4G3G2G1E

…………………

1110001

011100.0

1110000

1110101

1111000

G65G64G63G62G61…G7

Simulated Annealing

…………………

101001e5

111111e4

111111e3

011111e2

111011e1

G6G5G4G3G2G1E

…………………

1110001

011100.0

1110000

1110101

1111000

G65G64G63G62G61…G7

Identify connected components

Page 36: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Systematic identification of functional modules in human genome

• We identified 143,400 network modules with recurrence >= 5. They vary in size from 4 to 180.

• 77.0% of the patterns are functionally homogenous (GO hyper-geometric P-value less than 0.01)

• The figure shows that the functional homogeneity of modules increase with their recurrences.

Page 37: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Results (I): Examples of highly recurring modules

Recurrence 20Defense against oxygen and nitrogen species

Recurrence 18Involved in spermatogenesis

Page 38: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Loosely connected network patterns with high recurrence can

represent functional modules

Page 39: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Functional annotation

Annotation

We made functional predictions for 779 known and 116 unknown genes by random forest classification with 71% accuracy.

Variables for random forest classification: functional enrichment P-value network topology scorenetwork connectivity pattern recurrence numbersaverage node degree unknown gene ratioNetwork size

Page 40: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Network Modules (NeMo)Identify frequent dense vertex sets

across many massive graphs

m ic ro array

...

c o exp res s io ngrap h

...

(neighb o r as s o c iatio n)s um m ary grap h

c lus tering refinem ent

s tep 1 s tep 2 s tep 3 s tep 4

...

p artitio ning

Yan et al. ISMB 2007

Page 41: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

105 human microarray data sets

NeMo

6477 recurrent coexpression clusters(density > 0.7 and support > 10)

Validation based on ChIp-chip data (9176 target genes for 20 TFs)

Validation based on human-mouse Conserved Transfac prediction (7720 target genes for 407 TFs)

15.4% homogenous clusters(vs. 0.2% by randomization test)

12.5% homogenous clusters(vs. 3.3% by randomization test)

Page 42: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Percentage of potential transcription modules validated by ChIP-Chip data increase with cluster density and recurrence

Page 43: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Microarray Data

Expression data

Phenotype Concepts (e.g. diseases, perturbations, tissues )in Unified Medical Language System (UMLS)

Phenotype information

Page 44: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Classifying microarray data based on phenotype

Adenocarcinoma Arthritis Asthma Glaucoma … HIV

For example, the current NCBI GEO database contains >60 cancer datasets, among which 11 leukemia datasets.

Page 45: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Identify phenotype-specific functional or transcriptional modules

• Unsupervised approach

Frequent pattern mining

Module 1, Module 2, Module 3, … Module kModule 1

Cancer Cancer Cancer Cancer

Page 46: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

105 microarray data sets

NeMo

6477 recurrent coexpression clusters(density > 0.7 and support > 10)

A total of 459 of the clusters are statistically significant with a hypergeometric P-value <0.01 in a specific phenotype:

e.g. malignant neoplasms, cardiovascular diseases or nervous system disorders.

Page 47: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

An example

5 out of the 9 support datasets are leukemia datasets (P-value 0.0039). It is potentially regulated by E2F4, and majority genes are involved in cell cycle and DNA repair.

Page 48: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Identify phenotype-specific functional or transcriptional modules

• Supervised approachAdenocarcinoma Arthritis Asthma Glaucoma … HIV Non-Adenocarcinoma

Functional and transcriptional modules which are active ONLY in Adenocarcinoma

Related data sets

Page 49: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

A case study: Identify Network Modules Characterizing Cancer

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

b

d

e

f

g

h

i

j

k

c

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

c1 c2… cm

g1 .9 .4… .1

g2 .7 .3… .5

32 Cancer Datasets

25 Non-cancer Datasets

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

c1 c2… cm

g1 .2 .5… .8

g2 .7 .1… .3

Page 50: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Examples of identified modules

Cell cycle Moduleacross all cancer datasets

Cell adhesionacross all solid tumor datasets

PDGF-signalingin breast cancer datasets

Page 51: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Reconstruct transcriptional cascades by second-order analysis

Zhou et al. Nature Biotech 2005

Page 52: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Frequently occurring tight clusters

Page 53: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Frequently occurring tight clusters

Transcription Factors

Page 54: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 1

Page 55: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 2

Page 56: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 3

Page 57: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 4

Page 58: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 5

Page 59: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Coexpression Networks

Transcription Factors Set 1

Transcription Factor Set 2

Cooperativity

Page 60: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Relevance Networks

Page 61: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Three types of transcription cascades

TF1

TF2

TF3

Type I

Type II

gene 2

gene 1

gene 3

gene 4

gene 5

TF1

TF2 gene 2

gene 1

gene 3gene 4

gene 5

Type III

TF1

TF2

gene 2

gene 1

gene 3

gene 4

gene 5

Transcription Regulation

Protein Interaction

Page 62: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Applying to 39 yeast microarray data sets

• We identified 60 transcription modules. Among them, we found 34 pairs that showed high 2nd-order correlation. A significant portion (29%, p-value<10-5 by Monte Carlo simulation ) of those modules pairs are participants in transcription cascades: 2 pairs in Type I, 8 pairs in Type II, and 3 pairs in type III cascades. In fact, these transcription cascades inter-connect into a partial cellular regulatory network.

Page 63: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

CDC45, RAD27, RNR1, SPT21, YPL267W

YBL113C YRF1-1 YRF1-7

SWI5

YEL077C YRF1-3 YRF1-7

ALK1 BUD4 CDC20 CDC5 CLB2 HST3 SWI5 YIL158W YJL051W YOR315W

MSN4

NDD1

YBL113C YIL177C YJL225C YLR464W YML133C YRF1-1 YRF1-3 YRF1-4 YRF1-5 YRF1-6 YRF1-7

BUD19 RPL13A RPL13B RPL16A RPL24B RPL28 RPL2B RPL33B RPL39 RPL42B RPS16B RPS18A RPS21B RPS22A RPS23B RPS4B RPS6A

MET4

GCN4

MET10 MET14 MET28 SUL2

ALD5 ARG1 ARG2 ARG3 ARG4 ARG5,6 ARO1 ARO3 ARO4 ASN1 BNA1 CPA2 DED81 ECM40 HIS1 HIS4 HOM3 LEU4 LEU9 LYS20 MET22 ORT1 PCL5 TEA1 TRP2 YBR043C YDR341C YHM1 YHR162W YJL200C YJR111C

Leu3

BAT1 ILV2 LEU9

SWI4

MBP1

CLB6 RNR1 SPT21 YPL267W

CDC21 CDC45 CLB5 CLB6 CLN1 GIN4 IRR1 MCD1 MSH6 PDS5 RAD27 RNR1 SPO16 SPT21 SWE1 TOF1 YBR070C

HGH1 LOC1 NOC3 YDR152W YHL013C

YBL113C YRF1-1 YRF1-3 YRF1-7

SWI6

RCS1

GAT3

YAP5

PDR1

GAL4

RGM1

GAS1HTA1HTB1

Regulation of transcription modules by transcription factors, based on ChIP-chip data and supported by recurrent expression clusters

Regulation between transcription factors, based on ChIP-chip data and supported by 2nd-order expression correlation

Protein interactions between two transcription factors, based on experimental data and supported by 2nd-order expression correlation

Two transcription modules with high 2nd-order expression correlations

Page 64: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Integrative Array Analyzer (iArray): a software package for cross-platform and cross-species microarray analysis

Bioinformatics. 22(13):1665-7

Page 65: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Gene Aging Nexus (GAN): An Integrated Genomics Data Mining Platform for Aging

Nucleic Acids Res. 2006 Nov

Page 66: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Acknowledgement

Second-order TF cascade• Ming-Chih Kao (U. Michigan)• Haiyan Huang (UC Berkeley)• Wing H. Wong (Stanford)

CoDense• Haiyan Hu

NeMo• Xifeng Yan (IBM)• Mike Mehan

NetBiclustering• Haifeng Li• Yu Huang

Cancer Network Module• Min Xu

Funding agencies: NIGMS, NCI, NSF, Seaver Foundation, Sloan Foundation, Zumberg Foundation

Consultation• Jiawei Han (UIUC)• Michael Waterman

Page 67: Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern.

Thank you!