From patterns to pathways. Causal analysis of gene expression data Alexander Kel BIOBASE GmbH...


Page 1: From patterns to pathways. Causal analysis of gene expression data Alexander Kel BIOBASE GmbH Halchtersche Strasse 33 D-38304 Wolfenbuettel Germany ake@biobase.de.

From patterns to pathways. Causal analysis of gene expression data

Alexander Kel

BIOBASE GmbH

Halchtersche Strasse 33, D-38304 Wolfenbuettel

Germany

ake@biobase.de www.biobase.de


TRANSCompel

TRANSFAC

TRANSPATH

PathoDB S/MARt DB

- mechanistic
- semantic

Match Patch

Catch

Pathway builder Array analyser

Cytomer TRANSGenome TRANSPLORER

CMFinder


The TRANSFAC® System comprises 7 databases:

TRANSFAC® Professional Suite

- TRANSFAC® Professional: transcription factor database
- TRANSCompel® Professional: composite elements database
- PathoDB® Professional: pathologically altered transcription factors
- TRANSPRO™ Professional: collection of human promoter sequences
- S/MARt DB™ Professional: scaffold or matrix attached regions database
- Cytomer®: ontology of cells, structures, organs
- TRANSPATH® Professional: signal transduction pathways


TRANSFAC® Professional

Transcription factor database


…cis

trans


Human genes: sequences and positions of AP-1 binding sites

Gene: position of the AP-1 site
glutathione P-transferase: enhancer at -2500
hemoglobin, epsilon: -80 bp
Akt-2: -100 bp
IFN-: -89 bp
Apo AII: -792 bp
Melanotransferrin: -2013 bp
Collagenase: -72 bp
proto-oncogene c-myc: -335 bp
porphobilinogen deaminase: -162 bp
GM-CSF: enhancer at -3500

AP-1 site sequences: TGACTTT, TGACATC, TGTCACC, TGACTCA, TGAGTCA, TGAGTCA, TGATTTA, TGACTCA, TGACTCA


Structure of regulatory regions of eukaryotic genes

[Figure: GM-CSF gene, Homo sapiens. T-cell specific inducible enhancer at -3500 bp with composite elements (CE) formed by AP-1 and NFAT sites. Promoter upstream of the transcription start (ST, +1) with a TATTT box, positions -54, -88 and -114 marked, and sites for NF-κB p50/p65, NF-κB c-Rel/p65, HMG Y(I), CBF and the CD28 response element.]


Protein-DNA and protein-protein interactions in gene transcriptional regulation.

[Figure labels: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA pol II, histone acetylase, TF1, TF2, TF3, S1, S2, S3.]


Transcription factors

Sequence-specific DNA binding

Non-DNA binding

TF1 TF2 TF3 TF4

adapter

Co-activator

HAT

DNA

Layer I

Layer III

Layer II


interacting factor

coding region

regulatory region

gene

expression

SITE

FACTOR

GENE

SYNONYMS

FEATURES

CLASS SPECIES

MATRIX

SEQUENCE

METHOD

CELL

Q

FUNCTIONAL ELEMENT

TRANSFAC: relational scheme


Manual annotation of the databases: input client


TRANSFAC: GENE table


TRANSFAC: SITE table


Structure of transcription factors

USF-1, dimer


DNA binding domain

Activation domain

oligomerization domain

Ligand-binding domain

Protein-protein interaction domain

Structure of transcription factors


TRANSFAC: FACTOR table, protein sequence


TRANSFAC: FACTOR table, protein domains


TRANSFAC: FACTOR table, structural and functional features


TRANSFAC: FACTOR table, links to other databases


TRANSFAC: classification of transcription factors


TRANSFAC: CLASS table


TRANSFAC 8.1 (2004-03-31): number of factor entries for different species

[Bar chart, axis from 0 to 1400 entries. Categories: human, mouse, rat, other vertebrates, fruit fly, plants, fungi, other.]


TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5' regions of genes.

[Histogram, axis from 0 to 800.]


TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions


TRANSFAC: MATRIX table


TRANSCompel® Professional

Composite elements database


Mouse Interleukin-2 gene promoter

tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt…

COMPEL:C00050, positions -96 to -79 relative to the transcription start (ST): NF-ATp and AP-1 sites (upper case in the sequence)

AP-1 consensus: TGAGTCA


Composite elements

Minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.

[Figure: factor F1 alone, low level of transcription; factor F2 alone, low level of transcription; F1 and F2 together, synergistic activation of transcription.]


Combinatorial regulation by the composite elements

N. Gene (positions marked in the scheme of the CE)
1. IgH **, Mus musculus
2. IL-2, Homo sapiens (-283, -268)
3. IL-2, Homo sapiens (-167, -142)
4. IL-2, Mus musculus (-167, -142)
5. IgH **, Homo sapiens
6. Serum amyloid A1, Rattus norv. (-117, -73)
7. IRF-1, Mus musculus (-123, -113, -49, -40)

Factor pairs shown in the schemes: AP-1 / Ets, AP-1 / NFAT, AP-1 / NF-κB, Ets / CBF, AP-1 / Oct-2, NF-κB / C/EBP, NF-κB / STAT-1


Ternary complex NFATp - AP1 - DNA


Description of the evidence (experiment, cell type, two individual interactions)

flat files

Link to the TRANSFAC GENE table

Link to EMBL

Link to the TRANSFAC FACTOR table


Cross-coupling of signal transduction pathways

[Figure: a membrane receptor signals through Src (SH3, SH2 domains), adaptors, Ras (GDP/GTP), PLC, PI3-K and PKB/Akt; IP3 and a Ca2+-dependent channel raise Ca2+ and activate calcineurin; ERK, JNK and p38 MAPK phosphorylate their targets. In the cytoplasm and the nucleus, NFATp, c-Fos, c-Jun and ATF-2 converge on a composite element of the IL-2 gene.]


F1 \ F2                  Tissue-specific  Inducible  Cell-cycle dep.  Dev. stage-dependent  Ubiquit. constitut.
Tissue-specific          32
Inducible                44               119
Cell-cycle dependent                      1          2
Dev. stage-dependent                      3
Ubiquitous constitutive  39               60         2                12                    2

Inducible/inducible

19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;

15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;

14 CEs NF-κB / C/EBP: NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6.


Inducible/constitutive

9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway;

5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.


Inducible/tissue-restricted

CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.


Mechanisms of functioning of synergistic composite elements

1) Cooperative binding to DNA and ternary complex formation
2) A new protein surface for DNA recognition could be formed
3) Simultaneous interaction of activation domains with the components of the basal complex
4) Forming a new protein surface for interaction with the basal complex
5) Relief of autoinhibition as a result of protein-protein interactions
6) DNA bending by one of the transcription factors
7) DNA wrapping around a nucleosome allows transcription factors to interact
8) Recruitment of a HAT complex by one of the transcription factors


Mechanisms of functioning of antagonistic composite elements

1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)
2) Binding of F2 (repressor) results in conformational changes of F1 (activator)

[Figure labels: HAT complex, HDAC complex.]


TRANSPATH® Professional

Database on signal transduction pathways


TRANSPATH: map of IFN pathway


TRANSPATH®

TRANSFAC®


Extracellular ligand

Membrane receptor

Adaptor

Second messenger

Kinase(s)

Transcription factor

Target gene

TRANSPATH: molecules


TRANSPATH: molecule hierarchy

family: IL-1/Toll receptor family
family: TLRs
ortholog: TLR4, TLR5
basic: TLR4(h), TLR4(m), TLR5(h)
isoform: TLR4a(h), TLR4b(m)
modified form: TLR4(h)p
complexes: TLR4(h):MyD88(h)


TRANSPATH: reactions

• Binding
• Phosphorylation
• Dephosphorylation
• Degradation
• Acetylation
• Dissociation
• Transregulation
• Expression
• Activation
• ...

Educts Products

Enzyme


The elementary reaction step

Reaction R, catalyzed by catalyst C, converts substance A into substance B.
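The elementary reaction step can be sketched as a small record type. This is only an illustration: the class name `Reaction` and its fields are hypothetical and are not the TRANSPATH schema; the field names simply follow the slide labels (educts, products, enzyme/catalyst).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """One elementary reaction step (hypothetical layout, not the TRANSPATH schema)."""
    name: str
    educts: Tuple[str, ...]
    products: Tuple[str, ...]
    catalyst: Optional[str] = None

    def describe(self) -> str:
        # Build a sentence of the same shape as the one on the slide.
        text = f"Reaction {self.name}"
        if self.catalyst:
            text += f", catalyzed by {self.catalyst},"
        text += f" converts {' + '.join(self.educts)} into {' + '.join(self.products)}."
        return text


step = Reaction("R", educts=("A",), products=("B",), catalyst="C")
print(step.describe())
```

Keeping educts and products as tuples lets the same record describe binding or dissociation steps with several participants.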


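Pathway steps:

Pathway steps depict the signaling in a more biochemical way.

[Figure: TGFβ1 pathway drawn as reactions R1 to R5, with the molecules TGFβ1, TGFβR-II, TGFβR-I, the receptor complexes T:TR2p and T:TR2p:TR1p, Smad2, Smad2p, Smad4, the complex S2P:S4, NTP and NDP, and the target gene.]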


In a semantic reaction, just individual key molecules are given.

Semantic: TGFβ1 -(R1)-> TGFβ-RII -(R2)-> TGFβ-RI -(R3)-> Smad2 -(R4)-> Smad4 -(R5)-> gene


Info about a specific molecule

Parts of a molecule entry

Many synonyms make sure that you find your protein

External database links allow easy identification of proteins


Specific molecule (cont.)

Parts of a molecule entry

Opens data entry of a specific reaction

Disease information and GO terminology

Localization of human APP


Specific reaction of APP(h)

Part of a reaction entry

Evaluation of this reaction is based on experimental evidence


Extracellular ligand

Membrane receptor

Adaptor

Second messenger

Kinase(s)

Transcription factor

Target gene

Signal transduction pathways


Connecting path between two molecules

Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)
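Finding a connecting path can be sketched as a breadth-first search over a directed graph of molecules. This is a minimal illustration and not the TRANSPATH implementation: the edge list below is a toy example modelled on the semantic TGF chain shown earlier, and the names `build_graph`, `shortest_path` and `paths_to_group` are hypothetical.

```python
from collections import deque

# Toy semantic chain (assumption: molecule names spelled out in ASCII).
EDGES = [
    ("TGFbeta1", "TGFbeta-RII"),
    ("TGFbeta-RII", "TGFbeta-RI"),
    ("TGFbeta-RI", "Smad2"),
    ("Smad2", "Smad4"),
    ("Smad4", "gene"),
]


def build_graph(edges):
    """Adjacency lists for a directed graph; every node gets an entry."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of molecules or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def paths_to_group(graph, start, group):
    """Connect one molecule to each member of a group, e.g. transcription factors."""
    return {member: shortest_path(graph, start, member) for member in group}


graph = build_graph(EDGES)
print(shortest_path(graph, "TGFbeta1", "gene"))
print(paths_to_group(graph, "TGFbeta1", ["Smad2", "Smad4"]))
```

Breadth-first search returns a path with the fewest reaction steps; a real pathway database would probably also need to weigh the reliability and direction of each reaction.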


Oncostatin M pathway

B-cell antigen receptor pathway

PDGF pathway

Insulin pathway


Overview of a pathway – hand-drawn map


TRANSPATH: number of entries

[Bar chart, axis from 0 to 12000 entries: molecules, reactions and references for Release Professional 2.1, Release Professional 2.4 and Release Professional 3.1.]


Statistics: TRANSPATH® 5.1 and NetPro 1.1

Main tables + NetPro
– Molecule 18029 + 7333
– Reaction 20199 + 30316
– Reference 8258 + 9582

Molecules of mammalian origin
– Human 2503 3521
– Mouse 1653 2025
– Rat 810 1224

Prediction
26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001)

=> 28% coverage of predicted proteins in TRANSPATH®
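As a rough check of the coverage figure, one reading is that it compares the 2503 human molecules in the main tables with the roughly 9000 predicted signal-transduction-relevant gene products. The slide does not say which numbers were used, so this pairing is an assumption:

```latex
\[
\text{coverage} \approx \frac{2503}{9000} \approx 0.278 \approx 28\%
\]
```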


TRANSFAC® System

From patterns to pathways


The starting point: a set of induced genes from microarray experiments

Array analysis


The conventional analysis: deduce the gene products and map them to the network of metabolic pathways

KEGG

biochemical effects

Array analysis


Extension of conventional analysis:

map the induced gene products to the network of regulatory pathways

biological effects

TRANSPATH

Array analysis


Array analysis

Reasoning of experimental findings:

promoter analysis of induced genes connected to network mapping

KEGG

TRANSPATH

Identification of new targets


Array analysis

promoter model

TRANSGENOME database

additional predicted genes

extended predicted network

Promoter analysis identifies additional target genes and extends the affected network


microarray: set of induced genes

indirect hints on causes

retrieval of upstream sequences

promoter analysis

network analysis

new target

TRANSPATH

TRANSFAC

TRANSGENOME

assignment of gene products

modeling of effects

metabolic network mapping

KEGG

regulatory network mapping

TRANSPATH

Array analysis

Causes

Effects


…cis

trans


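\[
q = \frac{\sum_{i=1}^{l} I(i)\, f(i, b_i) - \sum_{i=1}^{l} I(i)\, f_{\min}(i)}{\sum_{i=1}^{l} I(i)\, f_{\max}(i) - \sum_{i=1}^{l} I(i)\, f_{\min}(i)} \qquad (1)
\]

\[
I(i) = \sum_{b \in \{A, C, G, T\}} f(i, b)\, \ln\bigl(4 f(i, b)\bigr) \qquad (2)
\]

Count matrix (rows A, C, G, T) with its consensus:

A   9  2  1  0  1  0  0  0  0  1 15 13 13  7
C   8  3  1  1 13  3 29  0 22  8  9  1  4  8
G   4  2  2  2 15 26  0 29  7 17  3  7  9  8
T   8 22 25 26  0  0  0  0  0  3  2  8  3  6

    N  T  T  T  S  G  C  G  C  S  M  D  R  N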
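The score in equations (1) and (2) can be sketched in a few lines of Python. This is one reading of the slide rather than the authors' program: it assumes that f(i, b) is the relative frequency of base b at position i of the count matrix, that the score is normalized as (current - min) / (max - min), and that 0 * ln(0) is taken as 0. The function names are hypothetical; the counts are the matrix from the slide.

```python
import math

BASES = "ACGT"

# Count matrix from the slide (14 positions).
COUNTS = {
    "A": [9, 2, 1, 0, 1, 0, 0, 0, 0, 1, 15, 13, 13, 7],
    "C": [8, 3, 1, 1, 13, 3, 29, 0, 22, 8, 9, 1, 4, 8],
    "G": [4, 2, 2, 2, 15, 26, 0, 29, 7, 17, 3, 7, 9, 8],
    "T": [8, 22, 25, 26, 0, 0, 0, 0, 0, 3, 2, 8, 3, 6],
}


def frequencies(counts):
    """Relative base frequencies f(i, b) for every matrix position i."""
    columns = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES)
        columns.append({b: counts[b][i] / total for b in BASES})
    return columns


def information(column):
    """I(i) = sum over b of f(i, b) * ln(4 * f(i, b)); zero frequencies add 0."""
    return sum(f * math.log(4 * f) for f in column.values() if f > 0)


def similarity(site, counts=COUNTS):
    """Normalized score q in [0, 1] for a site as long as the matrix."""
    columns = frequencies(counts)
    if len(site) != len(columns):
        raise ValueError("site length must equal matrix length")
    info = [information(col) for col in columns]
    current = sum(w * col[b] for w, col, b in zip(info, columns, site.upper()))
    lowest = sum(w * min(col.values()) for w, col in zip(info, columns))
    highest = sum(w * max(col.values()) for w, col in zip(info, columns))
    return (current - lowest) / (highest - lowest)


print(round(similarity("TTTTGGCGCGAAAA"), 3))
```

Weighting each position by I(i) means that mismatches at conserved positions (for example the central GCGC core) cost more than mismatches at the variable flanks.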


TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Its site prediction tool currently uses collections of position weight matrices (PWMs). It can draw on several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database, other matrix libraries, and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites, using either all matrices from the libraries or subsets of them.


Search for most probable binding sites regulating gene expression


Search for binding sites coinciding with SNPs


Mouse c-fos promoter (Matrix search for TF binding sites)

1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0.85)
2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96)
3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87)
4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87)
5 <-----------V$AP2_Q6(0.92) <------------V$SP1_Q6(0.88)
6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85)
7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86)
8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90)
9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90)
10 <-------------V$GC_01(0.88)
11 ----------->V$CAAT_01(0.87)
12 <------------V$TCF11_01(0.87)
13 ----------->V$AP2_Q6(0.87)
14 <---------V$USF_Q6(0.93)
16 --------...V$ATF_01(0.94)
17 -------...V$AP1FJ_Q2(0.95)
20 -------...V$CREBP1_Q2(0.93)
21 -------...V$CREB_Q2(0.95)
23 ---...V$IK2_01(0.85)
MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420

1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86)
2-->V$CREB_01(0.96) -------------->V$TATA_01(0.95)
3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95)
4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87)
5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93)
6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89)
7 <--------------V$TATA_01(0.88)
8 --------------------->V$HEN1_02(0.87)
9 <---------------------V$HEN1_02(0.86)
10 <-----------------V$AP4_01(0.88)
11 ----------->V$LMO2COM_01(0.93)
12 <-----------V$LMO2COM_01(0.93)
13 <-----------V$MYOD_01(0.88)
17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99)
20---->V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96)
21---->V$CREB_Q2(0.95) Transcription start
23-------->V$IK2_01(0.85)
24 <=========== E2F (0.80)
MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480

1 <-----------------V$CMYB_01(0.91) -------...V$ER_Q6(0.86)
2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87)
3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93)
4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89)
5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93)
6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85)
7 -------->V$VMYB_02(0.88)
8 -------------->V$EVI1_04(0.86)
9 ------------->V$GATA1_02(0.93)
12 <------------V$ZID_01(0.85)
13 <----------V$CP2_01(0.97)
14 ---------->V$GATA_C(0.92)
15 ----------------->V$CMYB_01(0.86)
16 --------->V$CREL_01(0.91)
24 <=========== E2F (0.82)
MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540

Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161) (Matrix search for TF binding sites)

1------------V$AHRARNT_01(0.90) <-----------------V$NF1_Q6(0.85)
2--------V$NMYC_01(0.89) --------->V$AP4_Q5(0.91)
3------>V$USF_Q6(0.89) --------->V$AP4_Q6(0.85)
4------V$USF_C(0.86) ------------...V$YY1_02(0.86)
5 --------->V$AP4_Q5(0.91)
6 --------->V$AP4_Q6(0.86)
7 --------->V$AP4_Q5(0.92)
8 --------->V$AP4_Q6(0.86)
9 --------->V$AP4_Q5(0.86)
HS198161_1 ACGCGCAGCAGCAGGCGCAGCACCAGGCGCAGGCCGCGCAGGCGGCGGCAGCGGCCATCT 540

1 ----------------->V$NF1_Q6(0.96)
2 <-----------------V$NF1_Q6(0.90)
3 --------->V$USF_Q6(0.87)
4------->V$YY1_02(0.86) ---------->V$CP2_01(0.88)
5 --------->V$AP4_Q5(0.92) ----------->V$CAAT_01(0.85)
6 --------->V$AP4_Q6(0.85) --------->V$AP4_Q5(0.86)
7 ------...V$CP2_01(0.86)
8 ===========> E2F (0.81)
9 ===========> E2F (0.90)
HS198161_1 CCGTGGGCAGCGGTGGCGCCGGCCTTGGCGCACACCCGGGCCACCAGCCAGGCAGCGCAG 600

1 <---------V$CETS1P54_01(0.89) <--------...V$GATA_C(0.86)
2 ----------------->V$NF1_Q6(0.85) <-------...V$GATA1_02(0.90)
3 --------->V$CETS1P54_01(0.90) <-------...V$GATA1_03(0.92)
4 <--------------------V$R_01(0.88) <-----...V$LMO2COM_02(0.90)
5 <---------------V$AHRARNT_01(0.86)
6 ----------->V$AP2_Q6(0.95)
7---->V$CP2_01(0.86) <-------...V$GATA1_04(0.87)
8 <----...V$CETS1P54_01(0.87)
9 ===========> E2F (0.80)
HS198161_1 GCCAGTCTCCGGACCTGGCGCACCACGCCGCCAGCCCCGCGGCGCTGCAGGGCCAGGTAT 660

1--V$GATA_C(0.86) <---------V$CETS1P54_01(0.89)
2------V$GATA1_02(0.90) --------...V$DELTAEF1_01(0.96)
3------V$GATA1_03(0.92) <---...V$CEBPB_01(0.88)
4---V$LMO2COM_02(0.90)
5 <-----------V$IK2_01(0.92)
6 <---------------V$E47_02(0.87)
7-----V$GATA1_04(0.87)
8-----V$CETS1P54_01(0.87)
9 <--------------V$E47_01(0.86)
10 ---------->V$DELTAEF1_01(0.99)
11 <-----------V$LMO2COM_01(0.94)
12 <-----------V$MYOD_01(0.87)
13 --------->V$MYOD_Q6(0.91)
14 ------->V$USF_C(0.93)
HS198161_1 CCAGCCTGTCCCACCTGAACTCCTCGGGCTCGGACTACGGCACCATGTCCTGCTCCACCT 720


Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.

Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

Enhanceosome


..WRGAAAA.. ..TGASTCA..

8-12 bp

5’ 3’

Recognition method for T-cell specific Composite Elements NFAT/AP-1

NFATp

AP-1

[Figure: count matrices (rows A, C, G, T) for the NFATp site, positions 1 to 8, and for the AP-1 site, positions 1 to 9.]

[Figure: scatter plot of NFAT/AP-1 composite elements (training set) and random sequences; axis ticks from 0.7 to 6.7 and from 0.7 to 4.7.]

NFAT = -log(1-scoreNFAT)

AP-1 = -log(1-scoreAP-1)

Composite score

3.50.88

4.71.47w

APNFAT

APNFATCE

1

10,17
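The spacing rule on this slide, an NFATp site (..WRGAAAA..) and an AP-1 site (..TGASTCA..) placed 8-12 bp apart, can be illustrated with a deliberately simplified search. The sketch below is not the recognition method of the talk: it matches the two IUPAC consensus strings exactly instead of scoring weight matrices, looks at the forward strand only, assumes the NFATp site lies upstream of the AP-1 site, and assumes that the 8-12 bp is the gap between the end of the first site and the start of the second. All function names are hypothetical.

```python
import re

# Standard IUPAC codes needed for the two consensus strings.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "S": "[CG]", "N": "[ACGT]"}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus such as WRGAAAA into a regular expression."""
    return "".join(IUPAC[code] for code in consensus.upper())


def find_sites(seq, consensus):
    """0-based start positions of all (also overlapping) consensus matches."""
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def find_composite_elements(seq, nfat="WRGAAAA", ap1="TGASTCA",
                            min_gap=8, max_gap=12):
    """Return (nfat_start, ap1_start, gap) for site pairs within the allowed gap."""
    hits = []
    ap1_starts = find_sites(seq, ap1)
    for nfat_start in find_sites(seq, nfat):
        nfat_end = nfat_start + len(nfat)
        for ap1_start in ap1_starts:
            gap = ap1_start - nfat_end
            if min_gap <= gap <= max_gap:
                hits.append((nfat_start, ap1_start, gap))
    return hits


example = "CC" + "TGGAAAA" + "ACGTACGTAC" + "TGAGTCA" + "GG"
print(find_composite_elements(example))
```

A search closer to the slide would replace the exact consensus match with the two matrix scores and combine them into a composite score, which tolerates weak individual sites.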


TTTGGCGCGAAA

Selection of motifs with high frequencyin a window

WSGmotif:

window: [ ]

Promoters of cell-cycle genes:

Exon 2 sequences:

. . . . . . . . . . . . .}

}Frequencyof the motifsin the window

. . . . . . . . . . . . .

Motifs found in the local context of E2F sites in promoters of cell cycle-related genes

N   Motif ($\mu$)  Window (w)  1) $\hat{f}_Y / \hat{f}_N$    2) Utility  $\alpha_i$

Positive characteristics
1   MGCG   [27,34]  0.0048 / 0.0041 = 1.179   0.80   -0.394
2   TTT    [39,41]  0.0112 / 0.0032 = 3.536   0.75    0.9618
3   CGSK   [17,38]  0.0851 / 0.0341 = 2.499   0.90    0.5353
4   HKCG   [13,16]  0.0675 / 0.0095 = 7.071   0.79    0.5904
5   VDWW   [17,46]  0.1233 / 0.0536 = 2.299   0.72    0.223
6   DWTT   [21,26]  0.0337 / 0.0000           0.80    0.5036
7   GSDM   [3,69]   0.0980 / 0.0559 = 1.754   0.82    0.595

Negative characteristics
8   VWS    [7,66]   0.1258 / 0.1932 = 0.651   0.91   -0.095
9   HSWY   [26,65]  0.0413 / 0.0813 = 0.508   0.79   -0.2297
10  VTV    [19,34]  0.0427 / 0.1354 = 0.315   0.71   -0.261
11  BAY    [7,65]   0.0274 / 0.0614 = 0.447   0.78   -0.566

$\alpha_0 = -5.6767$

Score of context:

$d(X) = \sum_{i=0}^{k} \alpha_i \, f(\mu_i, w_i, X)$

[Figure: Human uracil DNA-glycosylase (E2F sites), positions -1000 to 9000 relative to +1; two tracks, site score alone and site score + score of context]

ttTTTGCCGCGAAAag q=0.92 d=2.8 (known site)
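The score of context combines, for each motif, its frequency inside a window near the site. A rough sketch in Python, assuming that the frequency is the number of motif occurrences starting inside the window divided by the window length, and that the score is an intercept plus a weighted sum of these frequencies. The function names, the 1-based inclusive window convention and the toy weight in the usage line are assumptions for illustration only:

```python
import re

# IUPAC nucleotide codes used in motifs such as MGCG or CGSK.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping hits be counted."""
    body = "".join("[" + IUPAC[c] + "]" for c in motif.upper())
    return re.compile("(?=" + body + ")")


def motif_frequency(seq, motif, window):
    """Motif occurrences starting in window (1-based, inclusive) per position."""
    first, last = window
    starts = [m.start() + 1 for m in motif_regex(motif).finditer(seq.upper())]
    hits = sum(1 for s in starts if first <= s <= last)
    return hits / (last - first + 1)


def context_score(seq, terms, intercept):
    """Intercept plus weighted motif frequencies; terms = (motif, window, weight)."""
    return intercept + sum(
        weight * motif_frequency(seq, motif, window)
        for motif, window, weight in terms
    )


if __name__ == "__main__":
    site = "TTTGGCGCGAAA"
    print(motif_frequency(site, "WSG", (1, 12)))
    print(context_score(site, [("GCG", (1, 12), 6.0)], intercept=-1.0))
```

Counting only the start position of a motif against the window is a design choice of this sketch; requiring the whole motif to lie inside the window would be an equally plausible reading.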

SITEVIDEO system: Building of E2F site recognition program (step 2)

SITEVIDEO system: Building of E2F site recognition program (step 3)

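Composite modules

[Figure: promoter upstream of the start of transcription with a window of width w; sites $s^{(1)}_1$, $s^{(2)}_1$, $s^{(2)}_2$, ..., $s^{(k)}_1$, ..., $s^{(k)}_{n_k}$ of the matrices (1), (2), ..., (k), each matrix with its own cut-off $q^{(1)}_{cut\text{-}off}$, $q^{(2)}_{cut\text{-}off}$, ..., $q^{(k)}_{cut\text{-}off}$]

Parameters of the model to be estimated

K - number of TF matrices

$C = \max_{w} \sum_{k=1}^{K} \varphi^{(k)} \cdot q^{(k)}_{avr}(w)$

$q^{(k)}_{avr}(w) = \sum_{i=1,\,n_k;\; q(s^{(k)}_i) > q^{(k)}_{cut\text{-}off};\; s^{(k)}_i \in w} q(s^{(k)}_i)$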
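A composite module score of this kind can be sketched as follows. This is only an illustration under stated assumptions: each matrix contributes the scores of its sites that exceed the matrix cut-off and fall inside the window, the contributions are weighted per matrix, and the best window along the promoter is taken. The aggregation is written as a plain sum here; the subscript "avr" suggests an average may be intended, in which case the sum would be divided by the number of retained sites. All names are hypothetical:

```python
def matrix_term(sites, cutoff, window):
    """Sum of site scores above the cut-off whose position lies in the window.

    sites  -- list of (position, score) pairs for one matrix
    window -- (start, end) with start inclusive and end exclusive
    """
    start, end = window
    return sum(q for pos, q in sites if q > cutoff and start <= pos < end)


def composite_score(sites_by_matrix, weights, cutoffs, width, length):
    """Best weighted sum of matrix terms over all windows of the given width."""
    best = float("-inf")
    for start in range(0, max(1, length - width + 1)):
        window = (start, start + width)
        score = sum(
            weights[m] * matrix_term(sites_by_matrix.get(m, []), cutoffs[m], window)
            for m in weights
        )
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Toy site predictions; positions and scores are made up for illustration.
    sites = {"E2F": [(10, 0.90), (200, 0.95)], "SP1": [(30, 0.85)]}
    weights = {"E2F": 1.0, "SP1": 0.5}
    cutoffs = {"E2F": 0.84, "SP1": 0.80}
    print(composite_score(sites, weights, cutoffs, width=50, length=300))
```

Raising a cut-off removes weak sites from the sum, so the weights, the cut-offs and the window width all change which promoters score highly.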

Composite modules

[Figure: composite module scheme with window w, start of transcription, sites $s^{(k)}_i$ and cut-offs $q^{(k)}_{cut\text{-}off}$; parameters of the model to be estimated]

Genetic Algorithms
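The slide names genetic algorithms as the way to estimate the model parameters but gives no details. The following is a generic sketch of such a search, not the author's implementation: the tournament selection, uniform crossover, Gaussian mutation, elitism, the parameter range [-1, 1] and the toy fitness function are all assumptions chosen for brevity.

```python
import random


def genetic_search(fitness, n_params, pop_size=30, generations=40,
                   mutation_sd=0.1, seed=0):
    """Maximise fitness(params) over lists of n_params floats in [-1, 1]."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best individual survives
        while len(next_population) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by Gaussian mutation, clipped to range.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [min(1.0, max(-1.0, g + rng.gauss(0, mutation_sd)))
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Toy data: per-promoter scores for two matrices (made-up numbers).
POSITIVES = [[0.9, 0.2], [0.8, 0.1]]
NEGATIVES = [[0.3, 0.6], [0.2, 0.7]]


def separation(weights):
    """Mean weighted score of the positive set minus that of the negative set."""
    def mean_score(rows):
        return sum(sum(w * v for w, v in zip(weights, row)) for row in rows) / len(rows)
    return mean_score(POSITIVES) - mean_score(NEGATIVES)


if __name__ == "__main__":
    print(genetic_search(separation, n_params=2))
```

In a real application the fitness would measure how well the composite score separates the promoter set of interest from the background set, and the parameter vector would hold the weights, cut-offs and window width.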

Composite module in promoters of cell cycle-related genes

Weight     $q_{cut\text{-}off}$   TF matrix
1.000000   0.840072    V$E2F_19
0.954483   0.737637    V$TATA_01
0.888064   0.939687    V$CREB_01
0.816179   0.941583    V$SP1_Q6
0.039746   0.839702    V$TAL1BETAE47_01

$C = \sum_{k=1}^{5} \varphi^{(k)} \cdot q^{(k)}_{cut\text{-}off}$

[Figure: histogram of the score (x-axis -0,5 to 4,0; y-axis No of sequences, 0 to 40) for Exon-2 sequences and Cell cycle-related promoters]


Mouse c-fos promoter

Cell cycle composite module

1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0.85) 2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96) 3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87) 4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87) 5 <-----------V$AP2_Q6(0.92) <------------V$SP1_Q6(0.88) 6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85) 7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86) 8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90) 9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90) 10 <-------------V$GC_01(0.88) 11 ----------->V$CAAT_01(0.87) 12 <------------V$TCF11_01(0.87) 13 ----------->V$AP2_Q6(0.87) 14 <---------V$USF_Q6(0.93) 16 --------...V$ATF_01(0.94) 17 -------...V$AP1FJ_Q2(0.95) 20 -------...V$CREBP1_Q2(0.93) 21 -------...V$CREB_Q2(0.95) 23 ---...V$IK2_01(0.85) MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420 1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86) 2-->V$CREB_01(0.96) -------------->V$TATA_01(0.95) 3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95) 4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87) 5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93) 6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89) 7 <--------------V$TATA_01(0.88) 8 --------------------->V$HEN1_02(0.87) 9 <---------------------V$HEN1_02(0.86) 10 <-----------------V$AP4_01(0.88) 11 ----------->V$LMO2COM_01(0.93) 12 <-----------V$LMO2COM_01(0.93) 13 <-----------V$MYOD_01(0.88) 17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99) 20---->V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96) 21---->V$CREB_Q2(0.95) Transcription start 23-------->V$IK2_01(0.85) 24 <----------- E2F (0.80) MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480 1 <-----------------V$CMYB_01(0.91) -------...V$ER_Q6(0.86) 2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87) 3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93) 4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89) 5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93) 
6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85) 7 -------->V$VMYB_02(0.88) 8 -------------->V$EVI1_04(0.86) 9 ------------->V$GATA1_02(0.93) 12 <------------V$ZID_01(0.85) 13 <----------V$CP2_01(0.97) 14 ---------->V$GATA_C(0.92) 15 ----------------->V$CMYB_01(0.86) 16 --------->V$CREL_01(0.91) 24 <----------- E2F (0.82) MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540 1----------->V$ER_Q6(0.86) 2--------V$TCF11_01(0.87) 3 --------->V$AP4_Q5(0.91) 4 --------->V$AP4_Q6(0.87) 5 ---------->V$AP1FJ_Q2(0.93) 6 ---------->V$AP1_Q2(0.90) 7 ---------->V$AP1_Q4(0.87) 8 <-----------V$IK2_01(0.94) MMCFOS_1 GCAGTGACCGCGCTCCCACCCAGCTCTGCTCTGCAGCTCC 580

Computationally predicted E2F target genes confirmed by in vivo footprint

Columns per site: sequence of the potential site; position rel. start of transcription; score, q; score of context, d (where given). Positions of PCR primers are listed per gene.

c-fos, Hs; EMBL HSFOS; PCR primers: -201 ->, +96 <-
  (-) gcCTTGGCGCGTGTcc    -165 .. -176     0.92
  (-) ggGGTGGCGCGCGGgc    -92 .. -103      0.84   2.92
  (+) ccTCTGGCGCCACCgt    -90 .. -79       0.88
  (-) acGGTGGCGCCAGAgg    -78 .. -89       0.83

JunB, Hs; EMBL HS207341; PCR primers: -27 ->, +313 <-
  (+) gcTATCGCGCCAGAga    79 .. 90         0.89
  (-) tcTCTGGCGCGATAgc    91 .. 80         0.91
  (-) ggGCTGGCGCGGGCgg    169 .. 158       0.82   3.17

tgf-β1, Hs; EMBL HSTGFB1PR; PCR primers: -122 ->, +210 <-
  (+) ctGTTTGCGGGGCGga    -513 .. -502     0.80   2.03
  (+) ccCTTCGCGCCCTGgg    -298 .. -287     0.91
  (+) ctCTTGGCGCGACGct    28 .. 39         0.93
  (-) agCGTCGCGCCAAGag    40 .. 29         0.83
  (+) ccTTTGCCGCCGGGga    85 .. 96         0.85

p14ARF, Hs; EMBL AF082338; PCR primers: -404 ->, -143 <-
  (-) ctCTCCGCGCGCGGga    -1384 .. -1395   0.81   4.11
  (-) gtCTTGGCGACCGTtg    -1009 .. -1020   0.81
  (-) ggCCTGGCGCCGGAct    -739 .. -750     0.81
  (+) tgATTGGCGGATAGag    -589 .. -578     0.83
  (-) acTTTCCCGCCCTGtg    -265 .. -276     0.86

Mcm4 (Cdc21), Hs; EMBL HSU63630; PCR primers: -667 ->, -330 <-
  (-) gtTTTCGCGGGAAAac    -491 .. -502     0.93   3.53
  (-) ctTTCAGCGCCCGTgc    -409 .. -420     0.82
  (+) gcAGTGGCGCCTCCcg    -377 .. -366     0.80
  (+) ggCGTGGCGCGGAGcc    -175 .. -164     0.83   4.39
  (+) ctTGTCGCGCAGGTac    -93 .. -82       0.86

mcm5 (P1-cdc46), Hs; EMBL HS286B10; PCR primers: -211 ->, +88 <-
  (+) agTTTCGCGCCAAAtt    -187 .. -176     0.99   4.91
  (-) aaTTTGGCGCGAAAct    -175 .. -186     1.00
  (+) ttTTTCCCGCGAAAct    8 .. 19          0.89   3.01
  (-) agTTTCGCGGGAAAaa    20 .. 9          0.93   4.21

Von Hippel-Lindau (VHL), Hs; EMBL AF010238; PCR primers: -137 ->, +123 <-
  (+) aaGCTCGCGCCACTgc    -270 .. -259     0.81
  (-) gcAGTGGCGCGAGCtt    -258 .. -269     0.84
  (-) gtCTTCGCGCGCGCtc    -28 .. 39        0.92   2.22

B-myb, Hs; EMBL HSBMYBDNA; PCR primers: -296 ->, +14 <-
  (-) gtCCTGGCGCGCGGgc    -72 .. -83       0.83
  (+) cgCTTGGCGGGAGAta    -53 .. -42       0.87   1.18

nucleolin, Hs; EMBL HSNUCLEO; PCR primers: -407 ->, -41 <-
  (-) ttTTTGGCGCCGGCtg    -297 .. -308     0.97
  (-) ccGTGGGCGCGCGGgt    -256 .. -267     0.81   2.91

nucleolin, Cg; EMBL CSNUCLEO; PCR primers: -538 ->, -198 <-
  (-) cgTTTGGCGCGGCTtg    -296 .. -307     0.97   6.67

nucleolin, Ms; EMBL MMNUCLEO; PCR primers: -531 ->, -232 <-
  (-) agTTTGGCGCGGCTtg    -306 .. -317     0.97   1.76

Chromatin crosslinking

Immunoprecipitation

PCR

[Figure: two panels over the cell cycle phases G1, G1/S, S, G2, labelled G1/S-cycle and G1/S-growth]

Results of selection of a specific combination of sites that distinguishes G1/S cycle and G1/S growth promoters (microarray data)

a)
Relative importance $\varphi^{(k)}$   Cut-off value $q^{(k)}_{cut\text{-}off}$   Matrix AC   Matrix ID
 0.141420    0.923077    M10009    V$E2F_19
 0.389941    0.947434    M00175    V$AP4_Q5
 0.905325    0.838106    M00088    V$IK3_01
-0.595259    0.856055    M00098    V$PAX2_01
-0.982593    0.997639    M00253    V$CAP_01
-0.814943    0.734697    M00137    V$OCT1_03

b) Histogram of G1/S cycle vs. G1/S growth

[Figure: x-axis Site combination score (-1,8 to 1,6); y-axis No of obs (0 to 5)]

E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain is structurally similar to those of E2F and Myc, the main cell cycle regulators. IK3 belongs to Ik-1...Ik-5, a family of zinc finger TFs that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-1 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.

[Figure: Jun and Fos bound as AP-1 to the site TGASTCA]

human TNF promoter

[Figure: promoter region from -107 to -74 with the binding-site labels NFAT, NFAT, AP-1, NF-kB, C/EBP, AP-1 and VDR, and the cell-type labels mast cells, T-cells + ?, dendritic cells and T-cells]

Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: coding of multiple regulatory messages in the same DNA sequence. A, B, C and D, E, F – two sets of TF; 1, 2 – two sites in DNA; BC – basal complex.

[Figure: sites 1 and 2 in the same DNA sequence occupied by factors from the set A, B, C or from the set D, E, F, together with the basal complex BC]

There's More Than One Way To Do It

(Convergent evolution)

RefSeq     LocusLink  symbol    synonyms                        name
NM_002421  4312       MMP1      CLG, CN2                        matrix metalloproteinase 1 (interstitial collagenase)
NM_004530  4313       MMP2      CLG4, CLG4A                     matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase)
NM_000611  966        CD59      MSK21, MIC11, MIN2, MIN1, MIN3  CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
NM_001972  1991       ELA2                                      elastase 2, neutrophil
NM_005317  3004       GZMM      LMET1, MET1                     granzyme M (lymphocyte met-ase 1)
NM_005532  3429       IFI27     P27                             interferon, alpha-inducible protein 27
NM_001548  3434       IFIT1     GARG-16, IFNAI1, G10P1, IFI56   interferon-induced protein with tetratricopeptide repeats 1
NM_000565  3570       IL6R                                      interleukin 6 receptor
NM_001565  3627       SCYB10                                    chemokine (C-X-C motif) ligand 10
NM_001572  3665       IRF7      IRF-7A                          interferon regulatory factor 7
NM_005564  3934       LCN2      NGAL                            lipocalin 2 (oncogene 24p3)
NM_005567  3959       LGALS3BP  90K, MAC-2-BP                   lectin, galactoside-binding, soluble, 3 binding protein
NM_002422  4314       MMP3      STMY, STMY1                     matrix metalloproteinase 3 (stromelysin 1, progelatinase)
NM_002423  4316       MMP7      MPSL1, PUMP-1                   matrix metalloproteinase 7 (matrilysin, uterine)
NM_004994  4318       MMP9      CLG4B                           matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
NM_004995  4323       MMP14     MT1-MMP                         matrix metalloproteinase 14 (membrane-inserted)
NM_002428  4324       MMP15     MT2-MMP                         matrix metalloproteinase 15 (membrane-inserted)
NM_002534  4938       OAS1      IFI-4, OIASI, OIAS              2',5'-oligoadenylate synthetase 1 (40-46 kD)
NM_002787  5683       PSMA2                                     proteasome (prosome, macropain) subunit, alpha type, 2
NM_004586  6197       RPS6KA3                                   ribosomal protein S6 kinase, 90kD, polypeptide 3
NM_007315  6772       STAT1     STAT91                          signal transducer and activator of transcription 1, 91kD
NM_003254  7076       TIMP1     CLGI, EPO, TIMP                 tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor)
NM_003255  7077       TIMP2                                     tissue inhibitor of metalloproteinase 2
NM_000362  7078       TIMP3     SFD                             tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory)
NM_003684  8569       MKNK1     MNK1                            MAP kinase-interacting serine/threonine kinase 1
NM_006417  10561      IFI44     p44, MTAP44                     interferon-induced protein 44

AXX list of genes


>ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200 ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc >MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200 aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata 
tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg >IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200 ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt

Extract promoters using TRANSGENOME

AXX promoter set

Composite module found in the AXX promoters

[Figure: Histogram (tt1.STA 2v*188c) of VAR1; y-axis Percent of obs (0% to 100%); bins <= ,423; (,423;,847]; (,847;1,27]; (1,27;1,694]; (1,694;2,117]; > 2,117; fitted curve y = 13 * 0,42348 * normal (x; 1,503956; 0,895746)]

Importance  Core cut-off  Matr. cut-off  AC      Matrix      Factor
0.917751    0.877000      0.930000       M00062  V$IRF1_01   Interferon regulatory factor 1
0.323077    1.000000      0.948000       M00339  V$ETS1_B    Ets factors
0.640828    0.989000      0.982000       M00199  V$AP1_C     AP-1
0.276923    0.840000      0.853000       M00037  V$NFE2_01   NF-E2 – an erythroid-specific factor
1.000000    0.756000      0.760000       M00481  V$AR_01     Androgen receptor
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6  Interferon Consensus Sequence binding protein

Sites in the AXX promoter set (Yes)

Matrix columns: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
Site scores are listed in column order; empty cells are not shown.

No  Site scores                          Char            Gene
0   0.951000 1.742000                    Char = 0.78964  ELA2 elastase 2, neutrophil
1   1.941000 0.984000 0.876000           Char = 1.50025  MMP3 matrix metalloproteinase 3
2   0.772000                             Char = 0.77200  IL6R interleukin 6 receptor
3   1.681000                             Char = 1.68100  MMP2 matrix metalloproteinase 2
4   0.964000 0.856000 0.764000 1.764000  Char = 1.59327  OAS1 2',5'-oligoadenylate synthetase 1
5   1.000000 0.880000 1.644000           Char = 2.52852  MMP1 matrix metalloproteinase 1
6   0.984000                             Char = 0.63057  TIMP1 tissue inhibitor of metalloproteinase 1
7   1.860000 0.939000                    Char = 1.85648  STAT1 signal transducer and activator of transc
8   1.987000 1.850000 0.812000           Char = 2.59763  MMP9 matrix metalloproteinase 9
9   0.868000 1.548000                    Char = 1.78836  MMP15 matrix metalloproteinase 15
10  0.985000 0.862000 1.575000           Char = 2.44492  MMP7 matrix metalloproteinase 7
11  0.780000                             Char = 0.78000  MMP14 matrix metalloproteinase 14
12  1.966000 0.853000                    Char = 1.49608  CD59 CD59 antigen p18-20
13                                       Char = 0.00000  LCN2 lipocalin 2 (oncogene 24p3)
14  1.921000 1.715000                    Char = 2.33563  GZMM granzyme M (lymphocyte met-ase 1)
15  0.802000                             Char = 0.80200  IFI27 interferon, alpha-inducible protein 27
16  0.975000 1.766000                    Char = 2.08100  TIMP3 tissue inhibitor of metalloproteinase 3
17  1.866000 1.852000                    Char = 2.00731  IFIT1 interferon-induced protein with tetratr
18  1.569000 1.892000                    Char = 1.87015  IFI44 interferon-induced protein 44
19  0.760000                             Char = 0.76000  MKNK1 MAP kinase-interacting serine/threonine
20  1.886000 0.810000                    Char = 2.54087  IRF7 interferon regulatory factor 7
21  0.765000                             Char = 0.76500  TIMP2 tissue inhibitor of metalloproteinase 2
22  0.948000 0.873000                    Char = 0.54803  LGALS3BP lectin, galactoside-binding, soluble
23  1.892000 0.885000                    Char = 1.87725  SCYB10
24                                       Char = 0.00000  PSMA2

Sites in the other human promoters (Not)

Matrix columns: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
Promoters 0 to 15: no sites, Char = 0.00000 in every row.


Insulin pathway

?

InsR


Insulin Part of the insulin signaling network in TRANSPATH

STAT1

Ras

InsR

Signaling network analysis

AhR targets

[Figure: gene expression, Log(Experiment/Control), scale -4 to 10]

[Figure: predicted expression against real expression, log(Experiment/Control), both axes -4 to 10]

Composite model correlates with the expression level

S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 3000.000000

 3.581248  1.000000  0.933000  M00026  V$AHR_Q5
 2.942371  1.000000  0.917000  M00639  V$HNF6_Q6
 0.798865  0.844000  0.900000  M00220  V$SREBP1_01
 0.409376  0.962000  0.926000  M00173  V$AP1_Q2
 0.055716  0.959000  0.989000  M00726  V$USF2_Q6
-1.329975  1.000000  0.959000  M00235  V$AHRARNT_01
-0.713625  1.000000  0.918000  M00156  V$RORA1_01
-0.668375  0.903000  0.854000  M00201  V$CEBP_C

[Figure: positions of V$AHR_Q5 and V$AHRARNT_01 sites from -1000 to +1000 around the TSS]

Composite module found in promoters of differentially expressed genes in liver of growth hormone-deficient mice (Sma1).

0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)

0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)

[Figure: histograms of Sma1Norm (x-axis -0.1 to 0.5) for differentially expressed genes and non-changed genes; y-axes No of obs, one scaled 0 to 450 and one 0 to 40]

Results of the ArrayAnalyzer™ search upstream of the TFs: growth hormone (GH) and receptor tyrosine kinases (RTK) are identified as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).

TRANSPATH and tools, ArrayAnalyzer and PathwayBuilder4

At the next step, one can map the transcription factors found at the previous step onto the TRANSPATH signaling network. If the factors found are parts of the same cascades that were suggested at step 1, the probability increases that those factors are responsible for the coordinated gene regulation.
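Mapping factors onto a signaling network and looking for shared cascades can be pictured as a graph search that walks upstream from each transcription factor. The sketch below is not the ArrayAnalyzer algorithm; it is a plain breadth-first search on a toy network, assuming the network is given as a mapping from each molecule to its direct downstream targets. The edges in the example are invented for illustration and are not taken from TRANSPATH:

```python
from collections import deque


def upstream_molecules(network, start, max_steps):
    """Return {molecule: distance} for everything upstream of start."""
    # Reverse the edges so that the search can walk against the signal flow.
    reverse = {}
    for source, targets in network.items():
        for target in targets:
            reverse.setdefault(target, set()).add(source)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue
        for parent in reverse.get(node, ()):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    del distance[start]
    return distance


def shared_upstream(network, factors, max_steps=5):
    """Molecules that lie upstream of every factor in the list."""
    found = [set(upstream_molecules(network, f, max_steps)) for f in factors]
    return set.intersection(*found) if found else set()


if __name__ == "__main__":
    # Hypothetical toy network: molecule -> direct downstream targets.
    toy = {
        "Insulin": ["InsR"],
        "InsR": ["Ras", "STAT1"],
        "Ras": ["ERK"],
        "ERK": ["Fos"],
    }
    print(shared_upstream(toy, ["STAT1", "Fos"]))
```

Molecules returned for all factors at once are candidates for a common upstream regulator; limiting `max_steps` keeps the search from reaching nodes that are connected to almost everything.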

Feedback loops in activating immune cells through NF-AT/AP-1

[Figure: signaling cascade with the groups: cytokines, chemokines; membrane receptors; adaptor proteins; PI3K; Calcineurin, Ca2+ binding proteins; NF-ATs; Ras, Raf; ERK, JNK, MAPK; Jun, Fos; NF-AT/Jun:Fos]

Legend:
Groups that are statistically enriched in potential target genes for Jun:Fos and NFATs (as shown in the table above).
Other groups that contain potential target genes for Jun:Fos and NFATs.

Network controlling S phase entry in response to a proliferative signal

[Figure: regulatory network. Genes: c-myc, cdc2, cycE, cycD1, cdk4, cycD3, e2f-1, rb1, B-myb, c-fos, c-ets, c-jun, erk-1, c-ras, htf9a. Proteins and complexes: Erk-1, JNK, c-Myc, B-Myb, c-Fos, c-Ets, c-Jun, cycE/cdc2, cycE/cdk2, cycD1/cdk4, cycD3/cdk4, pRB, MEK, Raf, Ras, Ran, RanBP1, E2F-1/DP-1. Edges carry + or _ signs; some are marked p or ?]

Gene groups shown in the figure: ada, odc, ts; Nucleolines; cdc21, cdc46, p1 co-factor; Histones: H1, H2B-143, H3-143; Enzymes of nucleotide metabolism: dhfr, tk, cad; Factors and enzymes of replication: DNA pol, cdc6, ori1

S-phase entry


1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) MMNUCLEO TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA 225 1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) RNNUCIA1 TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA 221 1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) CSNUCLEO CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA 229 1 <===============V$TH1E47_01(0.85) HSNUCLEO TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA 229 ** * **** **** ** **** * * *** * * ============================================================================= 1 <==========V$DELTAEF1_01(0.82) MMNUCLEO CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC 282 1 <======== ==V$DELTAEF1_01(0.87) RNNUCIA1 CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC 276 1 <======== ==V$DELTAEF1_01(0.84) CSNUCLEO TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC 286 1 <========= =V$DELTAEF1_01(0.84) HSNUCLEO TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC 288 ** ** * ***** * * * * * * * * * * * ============================================================================= 1 <=======V$NKX25_02(0.84) 2 =========>V$CETS1P54_01(0.87) MMNUCLEO ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG 326 1 <=======V$NKX25_02(0.84) 2 =========>V$CETS1P54_01(0.87) RNNUCIA1 ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT 332 1 =======>V$NKX25_02(0.82) 2 <==========V$DELTAEF1_01(0.81) 3 =========>V$CETS1P54_01(0.84) CSNUCLEO ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC 336 1 <=======V$NKX25_02(0.83) 2 <==========V$DELTAEF1_01(0.81) 3 =========>V$CETS1P54_01(0.86) HSNUCLEO TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC 337 * ** * * *** ****** **** ******* * ============================================================================= MMNUCLEO GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA 378 RNNUCIA1 
GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA 390 CSNUCLEO GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA 394 HSNUCLEO TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG 391 * * * * * * * * * ** *

============================================================================= MMNUCLEO GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT------- 423 RNNUCIA1 GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT------- 435 CSNUCLEO GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC 450 HSNUCLEO TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A 448 *** * *** * * * * ** ****** * * ============================================================================= MMNUCLEO -TCAGCAGGACCACGCGGCG---------------------------------------- 442 RNNUCIA1 -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG 494 CSNUCLEO CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG 510 HSNUCLEO CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG 487 * * ** * ============================================================================= MMNUCLEO --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG 483 RNNUCIA1 TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG 549 CSNUCLEO TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG 566 HSNUCLEO CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC 542 *** * * * ** * ** ** ============================================================================= MMNUCLEO CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT 535 RNNUCIA1 ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT 601 CSNUCLEO AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT 626 HSNUCLEO GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC- 588 ** * * ** ** *** * * ** ** ============================================================================= 1 <=======V$E2F_02(1.00) MMNUCLEO ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG 593 1 <=======V$E2F_02(1.00) RNNUCIA1 ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA 657 1 <=======V$E2F_02(1.00) CSNUCLEO TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC 681 1 <=======V$E2F_02(1.00) HSNUCLEO 
-CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC----- 640 * * ** ** * * * * ******** * * *** *******

Phylogenetic footprint of promoter regions of nucleolin genes

HSNUCLEO - Homo sapiens; CSNUCLEO - Cricetulus griseus; MMNUCLEO - Mus musculus; RNNUCIA1 - Rattus norvegicus
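The asterisk line under each alignment block marks columns that are identical in all four species, and runs of such columns are the footprints. A small sketch of that step in Python, assuming pre-aligned sequences of equal length with "-" for gaps; the function names and the minimum block length are illustrative. The example uses a short fragment of the nucleolin alignment around the predicted E2F site:

```python
def conserved_columns(aligned):
    """Return a string with '*' under columns identical in all sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        # A column counts as conserved if all rows agree and it is not a gap.
        marks.append("*" if len(column) == 1 and "-" not in column else " ")
    return "".join(marks)


def conserved_blocks(aligned, min_length=6):
    """Return (start, end, sequence) for runs of conserved columns."""
    marks = conserved_columns(aligned) + " "  # sentinel closes a trailing run
    blocks, start = [], None
    for i, mark in enumerate(marks):
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_length:
                blocks.append((start, i, aligned[0][start:i]))
            start = None
    return blocks


if __name__ == "__main__":
    fragment = [
        "AAGCCGCGCCAAAC",  # MMNUCLEO
        "AAGCCGCGCCAAAC",  # RNNUCIA1
        "AAGCCGCGCCAAAC",  # CSNUCLEO
        "AGCCGGCGCCAAAA",  # HSNUCLEO
    ]
    print(conserved_columns(fragment))
    print(conserved_blocks(fragment))
```

For this fragment the only block of six or more conserved columns is GCGCCAAA, which overlaps the site reported for V$E2F_02 in all four sequences.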

TFBS identification via pattern search

[Figure: nucleotide diagrams with rows A, T, G, C; panels 1), 2), 3)]

[Figure: error of four pattern discovery programs, Kernel, MEME, CONSENSUS and GIBBS; x-axis 0,65 to 0,95; y-axis 0,000 to 1,000]

Result of comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix; y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: the probability of the “consensus nucleotide” in each position of the matrix.

Table 1. Comparison of the 3 programs performing best at low values of this probability.

        Kernel   MULTIPROFILER   PROJECTION
0,65    0,205    0,208           0,260
0,7     0,165    0,255           0,304
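The error measure in this comparison can be written down in a few lines. A minimal sketch, assuming that matrices are stored as per-position nucleotide frequencies, that the squared differences are summed over the four nucleotides and averaged over positions, and that the implanted matrix gives probability p to the consensus nucleotide and (1 - p)/3 to each of the others. These details and the function names are assumptions, since the slide does not spell them out:

```python
def consensus_matrix(consensus, p):
    """Frequency matrix with probability p for the consensus nucleotide."""
    rest = (1.0 - p) / 3.0
    return [{base: (p if base == c else rest) for base in "ACGT"}
            for c in consensus]


def matrix_distance(revealed, original):
    """Sum of squared frequency differences, averaged over positions."""
    if len(revealed) != len(original):
        raise ValueError("matrices must have the same number of positions")
    total = 0.0
    for col_r, col_o in zip(revealed, original):
        total += sum((col_r[base] - col_o[base]) ** 2 for base in "ACGT")
    return total / len(original)


if __name__ == "__main__":
    original = consensus_matrix("ACGT", 0.70)
    revealed = consensus_matrix("ACGT", 0.85)
    print(matrix_distance(revealed, original))
```

A program that recovers the implanted matrix exactly scores 0; larger values mean that the revealed matrix drifts further from the original one.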

Gradual evolution by fixation of multiple substitutions (Protein functional centres)

Edited biopolymer by fixation of a small number of substitutions (Protein folding)

Evolution at once by fixation of single substitutions (Regulatory regions of eukaryotic genes)

Three mechanisms of biopolymer evolution


Thank you !

www.biobase.de