Deciphering Gene Regulatory Networks by in silico approaches Sridhar Hannenhalli Penn Center for Bioinformatics Department of Genetics University of Pennsylvania
Date posted: 20-Jan-2016

Page 1: Deciphering Gene Regulatory Networks by in silico approaches Sridhar Hannenhalli Penn Center for Bioinformatics Department of Genetics University of Pennsylvania.

Deciphering Gene Regulatory Networks by in silico approaches

Sridhar Hannenhalli Penn Center for Bioinformatics

Department of Genetics, University of Pennsylvania

Page 2

Transcriptional Regulation

TF-DNA binding

Interactions and

Modules

Transcription Start Site

Page 3

Overview

Core promoter prediction

TF-DNA binding

TF-TF interactions

Transcriptional Modules

Applications

Page 4

Overview

Core promoter prediction

TF-DNA binding: Identification, Representation, Discovery (motif-discovery), Search, Ambiguity/Redundancy

TF-TF interactions

Transcriptional Modules

Applications

Page 5

Binding site identification

SELEX

Deletion/Mutation

ChIP-chip

ATACGGT

ATACCGT

ATCGGCA

AAAGGCT

CONSENSUS

A T A S G S T

WEIGHT MATRIX

A  +1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
G  -1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
T  -1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96

Specificity
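The step from aligned sites to a weight matrix can be sketched in Python. This is a minimal sketch under stated assumptions, not the speaker's code: a natural-log odds score, a pseudocount of 0.25 per base, and a uniform background of 0.25 are assumed because they appear to reproduce the values shown on this slide (1.2, 0.96, 0.59, 0.0, -1.6). The function names `build_weight_matrix` and `score_site` are hypothetical.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=0.25, background=0.25):
    """Return one {base: log-odds weight} dict per aligned position.

    Assumed scoring: ln(((count + pseudocount) / (N + 4 * pseudocount)) / background).
    """
    n_sites = len(sites)
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[pos] == base)
            prob = (count + pseudocount) / (n_sites + 4 * pseudocount)
            column[base] = math.log(prob / background)
        matrix.append(column)
    return matrix

def score_site(matrix, site):
    """Sum the position-specific weights (this assumes positional independence)."""
    return sum(column[base] for column, base in zip(matrix, site))

# The four example sites listed on this slide.
sites = ["ATACGGT", "ATACCGT", "ATCGGCA", "AAAGGCT"]
weights = build_weight_matrix(sites)
for base in BASES:
    print(base, [round(column[base], 2) for column in weights])
print("score of ATACGGT:", round(score_site(weights, "ATACGGT"), 2))
```

Higher scores indicate a closer match to the training sites; choosing a score cutoff is what trades sensitivity against specificity.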

Page 6

Binding site search

TFs often bind to short and degenerate DNA sequences, leading to false positives

Evolutionary conservation (phylogenetic footprinting/shadowing) can help reduce the false positives

About half of the functional binding sites are not conserved

A combination of evolutionary conservation and binding site score can detect ~70% of the experimentally verified binding sites at a “False Positive” rate of 1 per 50 kb per PWM (Levy and Hannenhalli, Mammalian Genome, 2002)

TRANSFAC/JASPAR PWM

Multi-species conservation

Human genome
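The idea of combining a PWM score with conservation can be sketched as a two-stage filter. This is an illustrative sketch only: the toy 3-bp matrix, the per-base `conservation` list, and both thresholds are hypothetical stand-ins for a TRANSFAC/JASPAR PWM and a multi-species conservation track, and it is not the method of Levy and Hannenhalli (2002).

```python
def scan_sequence(matrix, sequence, threshold):
    """Slide the PWM along the sequence and keep windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

def filter_by_conservation(hits, conservation, width, min_conservation):
    """Keep hits whose mean per-base conservation is at least min_conservation."""
    kept = []
    for start, window, score in hits:
        mean_cons = sum(conservation[start:start + width]) / width
        if mean_cons >= min_conservation:
            kept.append((start, window, score))
    return kept

# Toy 3-bp matrix that favours "TAT" (hypothetical weights).
toy_matrix = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]
sequence = "GGTATCCTATGG"
conservation = [0.2] * 6 + [0.9] * 6  # hypothetical per-base conservation scores

hits = scan_sequence(toy_matrix, sequence, threshold=3.0)
conserved_hits = filter_by_conservation(hits, conservation, len(toy_matrix), 0.5)
print(hits)
print(conserved_hits)
```

Because about half of functional sites are reported as not conserved, a hard conservation cutoff like this one trades sensitivity for a lower false positive rate.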

Page 7

Non-Independence of binding site positions

Bacteriophage Mnt prefers binding to C, instead of wild-type A, at position 16 when wild-type C at position 17 is changed to other bases. (Man and Stormo, 2001, NAR)

Barash, Elidan, Friedman, Kaplan, 2003, RECOMB

Osada, Zaslavsky and Singh, 2004, Bioinformatics

Page 8

Binding site representation

ATACGGT

ATACCGT

CGCGGCA

CGAGCCT

WEIGHT MATRIX

A  +1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
G  -1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
T  -1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96

Assumption of positional independence

ATACGGT

ATACCGT

CGCGGCA

CGAGCCT

A PSPA or variable-length Markov model of binding sites is superior to the PWM model

For 95 JASPAR PWMs, PSPAM is better in 48 cases and worse in 6 cases at a significance level of 0.05.

Page 9

Conservation patterns in cis-elements reveal inter-position dependence

Human  ……….ACCGTGT……….ACCTTCT…………..
Chimp  ……….AGCGTGT……….ACCTTGT…………..
Mouse  ……….TCGGTGA……….TGCTTCT…………..
Rat    ……….CCCGTGA……….AGCTTGT…………..
Dog    ……….TCGGTCT……….ACCCTCT…………..

C C G C G G G G G C

Page 10

X Y

1 2 3 N (binding sites)

X Y X Y X Y

Compensatory Mutation

S_XY = fraction of sites for which Pr(X | Y) > Pr(X)

Pr(X) = probability of X using standard tree Markov process

Pr(X | Y) = probability of X dependent on corresponding Y branches

Scope = |X – Y|

Page 11

Control-1: Randomly select i, j pairs.

Control-2: Randomly select i and then select j = i + s.

Control-3: Construct PWM M_r with the same width as M by randomly sampling columns from the 79 vertebrate PWMs in JASPAR.

Control-4: Construct PWM M_r from M by randomly shuffling the compositions at each column (position).

S_{X,X+1} for 79 vertebrate PWMs from JASPAR
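The two PWM-level controls (Control-3 and Control-4) can be sketched as follows. This is a minimal sketch under assumptions: a PWM is represented as a list of `{base: frequency}` columns, Control-3 samples columns with replacement, and the function names are hypothetical. The slide does not specify these details.

```python
import random

BASES = "ACGT"

def control3_random_columns(column_pool, width, seed=None):
    """Control-3 sketch: build a PWM of the given width from randomly sampled columns.

    column_pool is assumed to hold every column of every PWM in the collection;
    sampling with replacement is an assumption.
    """
    rng = random.Random(seed)
    return [dict(rng.choice(column_pool)) for _ in range(width)]

def control4_shuffle_compositions(pwm, seed=None):
    """Control-4 sketch: permute the base frequencies within each column."""
    rng = random.Random(seed)
    shuffled = []
    for column in pwm:
        values = [column[base] for base in BASES]
        rng.shuffle(values)
        shuffled.append(dict(zip(BASES, values)))
    return shuffled

# Hypothetical two-column PWM used only to exercise the functions.
example_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4},
]
print(control4_shuffle_compositions(example_pwm, seed=1))
print(control3_random_columns(example_pwm, width=5, seed=1))
```

Control-4 keeps each column's information content while breaking which base carries which frequency, so any remaining dependence signal cannot be attributed to column composition alone.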

Page 12

S_{X,X+s} decreases with increasing scope s.

However, it remains significantly greater than the respective Control-4 up to scope = 6

Page 13

Functional relevance of positions with compensatory mutation

Page 14

Evans, Donahue, Hannenhalli, RECOMB-Comparative Genomics 2006

Page 15

Binding site Ambiguity/Redundancy

Several transcription factors have distinct PWMs

Several distinct transcription factors have very similar PWMs

ACCGTGTTT
ACCGACTTT
ACCGTGAAT
ACCGTGTTT
TCCGTGTTT
TCAGTGTTT
TCTGTGTTT
TCGGTGTTT

PWM

PWM1

PWM2

Page 16

Enhancing Positional Weight Matrices using Mixture models

A mixture model allowing an arbitrary number of base PWMs

Given mixture

$$\{(\lambda_1, M_1), \ldots, (\lambda_k, M_k)\}$$

the probability of observing sequence Xi = (Xi1,…, Xin) is

$$\Pr(X_i \mid M_1, \ldots, M_k, \lambda_1, \ldots, \lambda_k) = \sum_{j=1}^{k} \lambda_j \prod_{u=1}^{n} M_j[X_{iu}, u]$$

Use EM algorithm to estimate subclasses

We use k=2 base class PWMs (due to lack of data and lack of knowledge of appropriate number of classes)

Hannenhalli and Wang, Bioinformatics, 2005
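A mixture of base PWMs fitted by EM can be sketched in a few lines of Python. This is a generic EM sketch under assumptions, not the published implementation of Hannenhalli and Wang: the random soft initialization, the pseudocount of 0.5, the fixed iteration count, and the names `mixture_prob` and `fit_mixture` are all hypothetical. Each PWM here stores probabilities (not log-odds), so a site's likelihood is the weighted sum over components of the product of per-position probabilities.

```python
import math
import random

BASES = "ACGT"

def mixture_prob(seq, lambdas, pwms):
    """Pr(X | mixture) = sum_j lambda_j * prod_u M_j[X_u, u]."""
    return sum(
        lam * math.prod(pwm[u][base] for u, base in enumerate(seq))
        for lam, pwm in zip(lambdas, pwms)
    )

def fit_mixture(sites, k=2, iters=50, pseudo=0.5, seed=0):
    """Estimate mixing weights and k base PWMs with EM (illustrative settings)."""
    rng = random.Random(seed)
    width = len(sites[0])
    # Random soft assignment of each site to the k components.
    resp = []
    for _ in sites:
        w = [rng.random() + 0.1 for _ in range(k)]
        total = sum(w)
        resp.append([x / total for x in w])
    for _ in range(iters):
        # M-step: re-estimate mixing weights and per-component column frequencies.
        lambdas = [sum(r[j] for r in resp) / len(sites) for j in range(k)]
        pwms = []
        for j in range(k):
            columns = []
            for u in range(width):
                counts = {b: pseudo for b in BASES}
                for site, r in zip(sites, resp):
                    counts[site[u]] += r[j]
                col_total = sum(counts.values())
                columns.append({b: c / col_total for b, c in counts.items()})
            pwms.append(columns)
        # E-step: posterior probability of each component for each site.
        resp = []
        for site in sites:
            joint = [
                lambdas[j] * math.prod(pwms[j][u][b] for u, b in enumerate(site))
                for j in range(k)
            ]
            z = sum(joint)
            resp.append([x / z for x in joint])
    return lambdas, pwms, resp

# Eight 9-bp example sites of the kind used to illustrate PWM subclasses.
sites = [
    "ACCGTGTTT", "ACCGACTTT", "ACCGTGAAT", "ACCGTGTTT",
    "TCCGTGTTT", "TCAGTGTTT", "TCTGTGTTT", "TCGGTGTTT",
]
lambdas, pwms, resp = fit_mixture(sites, k=2)
for site, r in zip(sites, resp):
    print(site, [round(x, 2) for x in r])
```

EM only finds a local optimum, so in practice one would rerun it from several seeds and keep the fit with the highest total likelihood.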

Page 17

Sequence conservation of binding sites using Mixture model

Based on 64 Vertebrate TF entries in JASPAR database

[Figure: bar chart, vertical axis 0% to 80%. Bars: At least one PWM more conserved; Mixture more conserved; Both PWMs more conserved. Data labels as extracted: 48, 39, 23.]

Page 18

Subclass Dissimilarity vs Prediction Improvement

[Figure: horizontal axis runs from "Less dissimilar" to "More dissimilar".]

Page 19

[Figure: stacked bar chart of Better vs Worse, vertical axis 0% to 100%, horizontal axis: Relative entropy between two base PWMs. Data labels recovered from the chart:]

Relative entropy   TFs   Better
>= 0               64    39
>= 0.8             57    36
>= 1               44    30
>= 1.2             32    23
>= 1.4             20    15
>= 1.6             16    13

Page 20

Expression Coherence of target genes using mixture model

PWM1 PWM2

EC of a set of genes is the fraction of gene-pairs whose expressions across several tissues/conditions are “very” similar

Is the intra-class EC higher than inter-class EC?

In 44 of the 55 (80%) cases, the average expression coherence within subclass-PWM targets was higher than the expression coherence across subclass targets.

In all but one case (98%), at least one of the two subclass PWMs had a coherence score higher than the cross coherence score.

Hannenhalli and Wang, Bioinformatics, 2005
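Expression coherence as defined on this slide can be sketched as a pairwise similarity count. The slide only says expression must be "very" similar, so the similarity measure (Pearson correlation) and the cutoff of 0.8 below are assumptions, as are the function names and the toy expression profiles.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def expression_coherence(profiles, min_corr=0.8):
    """Fraction of gene pairs within one set whose profiles are 'very' similar."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if pearson(a, b) >= min_corr) / len(pairs)

def cross_coherence(profiles_a, profiles_b, min_corr=0.8):
    """Fraction of similar pairs with one gene taken from each set."""
    pairs = [(a, b) for a in profiles_a for b in profiles_b]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if pearson(a, b) >= min_corr) / len(pairs)

# Hypothetical target sets of two subclass PWMs, four tissues each.
targets_pwm1 = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5]]
targets_pwm2 = [[4, 3, 2, 1], [8, 6, 4, 2]]
print(expression_coherence(targets_pwm1))
print(expression_coherence(targets_pwm2))
print(cross_coherence(targets_pwm1, targets_pwm2))
```

Comparing the two intra-class values with the cross-class value mirrors the question asked on the slide: whether targets of the same subclass PWM are more coherent than targets of different subclasses.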

Page 21

LEU3 Dataset [Liu et al., 2002]

Free energy of binding available for 46 observed binding sites of LEU3 [Liu et al., 2002]

The two clusters from the EM algorithm have significantly different binding energies.

Page 22

Bi-clustering based modeling

ACCGTCTCAA
ACCGTGTGAA
AGCGTGCCCT
ACGGTGCCCA
TGGCCGCCGA
TCGCACTCTT
TGCCCCTGCT
TGGCCCTCTT

I  II  III  IV  V

Horizontal Partitioning

Vertical Partitioning

ATACGGT
ATACCGT
CGCGGCA
CGAGCCT

ACCGTGTTT
ACCGACTTT
ACCGTGAAT
ACCGTGTTT
TCCGTGTTT
TCAGTGTTT
TCTGTGTTT
TCGGTGTTT

Vertical partitioning

Horizontal partitioning

Page 23

X

YX

Z X

Context-dependent binding specificity

Page 24

Binding site Ambiguity/Redundancy

Several transcription factors have distinct PWMs

Several distinct transcription factors have very similar PWMs

Page 25

TESS

Page 26

A  +1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
G  -1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
T  -1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96

A  +1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
G  -1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
T  -1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96

32 Class

80 Family

117 Subfamily

1034 factors

Page 27

DNA Binding Domain

Interaction Domain

Conserved DBD

Redundant paralogs

Divergent

Promoter

Divergent nDBD

Once upon a time a transcription factor gene was duplicated

Promoter

Divergent Expression

Page 28

Hypothesis: Homologous TF-pairs with similar DBD have diverged in expression.

Control: Homologous nonTF-pairs

Homologous TF-pairs with dissimilar DBD

T1

D(X,Y) = |E_X – E_Y|

T158

TF X

TF Y

Ti

Page 29

Homologous TFs with Similar vs Non-Similar Binding in a Human Thyroid

[Figure: distribution of expression divergence. Horizontal axis: Expression divergence (0 to 2); vertical axis: 0 to 0.9. Series: Homologous TFs with Similar Binding; Homologous TFs with Non-Similar Binding.]

416 homologous TF-pairs (BLAST E-value <= E-10)

125 with similar binding (p-value <= 0.02)

In thyroid tissue the hypothesis holds (Mann-Whitney p-value = 0.00156)

TFs with similar binding are more similar overall. Thus a greater expression divergence is surprising.
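The per-tissue comparison can be sketched as a divergence computation followed by a Mann-Whitney test. This is a simplified sketch under assumptions: divergence is taken as the absolute difference of the two expression values, the test is one-sided (similar-binding pairs more divergent), the p-value uses the normal approximation without tie correction, and all data values are made up for illustration.

```python
import math

def expression_divergence(expr_x, expr_y):
    """Absolute expression difference of a TF pair in one tissue."""
    return abs(expr_x - expr_y)

def average_ranks(values):
    """Ranks starting at 1, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_greater(sample_a, sample_b):
    """Return (U for sample_a, approximate one-sided p-value that a tends to exceed b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return u_a, p_value

# Hypothetical (E_X, E_Y) expression values for TF pairs in one tissue.
similar_binding_pairs = [(2.0, 1.1), (3.0, 1.9), (0.5, 1.8), (4.0, 2.5)]
dissimilar_binding_pairs = [(2.0, 1.9), (3.0, 2.8), (1.5, 1.2), (0.9, 0.5)]

d_similar = [expression_divergence(x, y) for x, y in similar_binding_pairs]
d_dissimilar = [expression_divergence(x, y) for x, y in dissimilar_binding_pairs]
u_stat, p_value = mann_whitney_greater(d_similar, d_dissimilar)
print(u_stat, p_value)
```

Repeating this per tissue or sample and counting how many fall below a p-value cutoff gives the kind of summary reported for the yeast and human data sets.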

Page 30

In Yeast, 219 homologous TFs, 35 with similar binding

In a total of 57 samples (Spellman)

p-value   Number of Yeast Samples
0.1       49.1% (28)
0.05      33.3% (19)
0.01      1.8% (1)

In Human, 416 homologous TFs, 125 with similar binding

In a total of 158 samples (Novartis)

p-value   Number of Human Tissues – MW test
0.1       91.7% (145)
0.05      87.3% (138)
0.01      74.7% (118)

Page 31

Overview

Core promoter prediction

TF-DNA binding

TF-TF interactions

Transcriptional Modules

Applications

Page 32

Transcription Factor cooperation/interaction

Expression Coherence

Pilpel et al. (2001). Nat Genet.

Banerjee and Zhang (2003) NAR

Positional Coherence

Hannenhalli and Levy (2002). NAR.

Interaction-dependent binding

Page 33

Interaction-dependent binding

Transcription Factor F

ChIP-chip: Set of gene promoters bound by F

DNA binding motif M of F

Bound promoters (P)

Unbound promoters (B)

Can M discriminate between P and B?

The answer is NO for a large fraction of transcription factors

Perhaps binding of F depends (synergistic or antagonistic) on other motifs

Page 34

$$Y_{ij} = \mu_j + a_j x_{ij} + \sum_{k \in R_j} b_{jk} x_{ik} + \varepsilon_{ij}$$

Y_ij: Binding probability (ChIP)

x_ij, x_ik: PWM based occupancy probability

b_jk: Interaction coefficient

Wang, Jensen, Hannenhalli RECOMB-Regulation 2005

The ChIP-chip data for a majority of TFs is better explained using interaction-dependent binding.

Almost all of the Yeast cell cycle interactions were detected at 10% prediction rate

When applied to genome-wide CREB binding in rat, 15 of the 18 detected interactions have varying degrees of support.
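To make the interaction-dependent binding model concrete, here is a minimal sketch of its prediction step. It assumes the regression on this slide reads as: binding of TF j at promoter i equals an intercept, plus a coefficient times the PWM-based occupancy of j, plus a sum over partner TFs k of an interaction coefficient times the occupancy of k (noise term omitted). The function name, the dictionary layout, and all numbers are hypothetical, and coefficient fitting is not shown.

```python
def predict_binding(mu_j, a_j, b_j, occupancy_i, tf_j, partners_j):
    """Predicted ChIP binding of tf_j at one promoter.

    mu_j        : intercept for tf_j
    a_j         : weight on tf_j's own PWM-based occupancy
    b_j         : {partner TF: interaction coefficient}
    occupancy_i : {TF: PWM-based occupancy probability at this promoter}
    partners_j  : candidate interacting TFs (the set R_j)
    """
    value = mu_j + a_j * occupancy_i[tf_j]
    for partner in partners_j:
        value += b_j[partner] * occupancy_i[partner]
    return value

# Hypothetical coefficients: TF2 acts synergistically, TF3 antagonistically.
occupancy = {"TF1": 0.8, "TF2": 0.5, "TF3": 1.0}
coefficients = {"TF2": 0.3, "TF3": -0.2}
prediction = predict_binding(0.1, 0.5, coefficients, occupancy, "TF1", ["TF2", "TF3"])
print(round(prediction, 3))
```

A positive interaction coefficient corresponds to synergistic binding and a negative one to antagonistic binding, which is the distinction raised on the preceding slide.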

Page 35

Overview

Core promoter prediction

TF-DNA binding

TF-TF interactions

Transcriptional Modules

Applications

Page 36

Co-regulated genes have common binding sites in their promoters

BCL2-antagonist (BAD)

B-cell CLL/lymphoma 2 (BCL2)

Apoptosis Pathway

AP-2, CREB, E2F, cMyc, NF-Kappa-b, c-ETS, Egr-1 etc.

68 TFs

89 TFs

37 TFs in common

Hypergeometric p-val = E-11

[Venn diagram labels: 374, 89, 37, 68]
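The significance of the shared TFs can be checked with a hypergeometric tail probability. This sketch assumes that 374 is the total number of TFs considered (the slide shows the number without a label) and that the test is the probability of sharing at least 37 TFs by chance between sets of 89 and 68; the function name is hypothetical.

```python
from math import comb

def hypergeom_tail(total, size_a, size_b, overlap):
    """P(X >= overlap) when size_b items are drawn from total, size_a of which are marked."""
    denominator = comb(total, size_b)
    upper = min(size_a, size_b)
    numerator = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator

# Assumed reading of the slide: 374 TFs in total, promoters with 89 and 68 TFs, 37 shared.
p_value = hypergeom_tail(374, 89, 68, 37)
print(p_value)
```

With these numbers the expected overlap by chance is about 16 TFs (68 × 89 / 374), so an observed overlap of 37 lies far in the tail.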

Page 37

Interacting proteins have greater similarity in their promoter regions

Hannenhalli and Levy (2003). Mamm Genome

Page 38

Transcriptional module discovery

Singular Value Decomposition

1 1 1 0 0 1
1 1 0 0 1 0
1 1 1 0 0 0
0 0 0 1 0 1

1 0 0 0 1 1
1 0 1 1 0 1
0 1 0 1 0 0
1 0 1 0 0 0

Distance Matrix K-means Clustering

Genes

TFs

Cluster of genes and discriminating TF

Clique enumeration in bipartite graphs

Genes

TFs

Page 39

Tissue-Specific Transcriptional Module

Gene

Tissue

TF

Tissue

Binding prediction

Tissue specificity by expression level [Schug et al 2005]

Transcriptional-Module specific to a tissue type

Everett, Wang, Hannenhalli, ISMB 2006

Page 40

Overview

Core promoter prediction

TF-DNA binding

TF-TF interactions

Transcriptional Modules

Applications

Page 41

Transcriptional Regulation in Cardiac Myocytes

Frey N, Olson EN. Annu Rev Physiol. 2003;65:45-79.

Page 42

Expression profiling in advanced heart failure

Large tissue bank from Temple and Penn

Failing explanted hearts (n=173)

Non-failing hearts from unused donors (n=16)

Each hybridized with an HU133A (n=189)

Conservative analysis: RMA (Bioconductor), SAM

~3000 dysregulated genes in advanced human HF with FDR < 5%.

Is there any evidence that specific transcription factors are directing these changes?

Page 43

Transcriptional Genomics

Set of transcripts represented on array (~20-40K)

Genomic sequences - 5kb promoter regions

TF binding site annotation for all transcripts

TF targets altered in disease

Refseq

Transfac / Human-Mouse conservation

TF binding sites over-represented in disease

Expression Data (diseased and control)

Annotation

Analysis

Page 44

Differentially expressed Genes (G)

Background Set (B)

Statistical significance is computed using 1000 random samplings of genes from the background set

Score(x) = freq(x) in G / freq(x) in B
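The enrichment score and its sampling-based significance can be sketched as follows. This is a minimal sketch under assumptions: freq(x) is taken as the fraction of genes in a set that carry a predicted site for factor x, each random sample has the same size as G, and the p-value is the fraction of samples whose frequency is at least the observed one. The function names and the toy gene sets are hypothetical.

```python
import random

def site_frequency(genes, targets):
    """Fraction of the given genes that carry a predicted site for the factor."""
    return sum(1 for gene in genes if gene in targets) / len(genes)

def enrichment(diff_genes, background, targets, n_samples=1000, seed=0):
    """Return (fold enrichment, empirical p-value) for one factor.

    Assumes the factor has at least one predicted site in the background set.
    """
    rng = random.Random(seed)
    freq_g = site_frequency(diff_genes, targets)
    freq_b = site_frequency(background, targets)
    score = freq_g / freq_b
    at_least_as_high = 0
    for _ in range(n_samples):
        sample = rng.sample(background, len(diff_genes))
        if site_frequency(sample, targets) >= freq_g:
            at_least_as_high += 1
    return score, at_least_as_high / n_samples

# Hypothetical data: 200 background genes, 40 of which carry a site for factor x.
background = [f"g{i}" for i in range(200)]
targets = {f"g{i}" for i in range(40)}
diff_genes = [f"g{i}" for i in range(20)]  # differentially expressed set G

score, p_value = enrichment(diff_genes, background, targets)
print(score, p_value)
```

With 1000 samples the smallest nonzero p-value is 0.001, which matches the three-decimal resolution one would expect in a results table built this way.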

Page 45

Transcription Factors enriched in differentially up-regulated genes

TRANSFAC ID  Fold enrichment  p-value  Factor
M00471       1.70             0.000    TBP
M00318       1.63             0.001    Lentiviral_Poly_A
M00062       1.52             0.000    IRF-1
M00138       1.50             0.004    Octamer
M00291       1.48             0.000    Freac-3
M00403       1.48             0.001    aMEF-2
M00103       1.48             0.000    Clox
M00216       1.47             0.000    TATA
M01000       1.46             0.001    AIRE
M00109       1.46             0.000    C/EBPbeta
M00405       1.45             0.001    MEF-2
M00451       1.45             0.004    NKX3A
M00972       1.44             0.001    IRF
M00249       1.43             0.002    CHOP:C/EBPalpha
M00102       1.43             0.002    CDP
M00302       1.43             0.000    NF-AT
M00729       1.42             0.003    Cdx-2
M00622       1.41             0.001    C/EBPgamma
M00078       1.41             0.005    Evi-1
M00407       1.40             0.003    RSRFC4
M00616       1.39             0.004    AFP1
M00310       1.35             0.000    APOLYA
M00770       1.35             0.002    C/EBP
M00485       1.34             0.002    Nkx2-2
M00432       1.34             0.004    TTF1
M00346       1.34             0.002    GATA-1
M00478       1.34             0.003    Cdc5
M00724       1.33             0.005    HNF-3alpha
M00699       1.32             0.002    ICSBP
M00394       1.31             0.002    Msx-1
M00088       1.28             0.005    Ik-3
M00238       1.27             0.005    Barbie_Box

Page 46

The differentially upregulated genes have a greater number (32) of enriched TFs compared to downregulated genes (6).

The ischemic and idiopathic cases are consistent

Validation of GATA, MEF2, NKx, NFAT transcription factors in human heart failure

Potential role for FOX factors and IRF

What about early events?

Mice with infarcts and sham operated controls sacrificed at varying times after surgery (1, 4, 8, 24 hrs, 8 wks)

Analysis of differentially co-regulated gene clusters reveals a consistent set of transcription factors.

Page 47

FOX factor Summary

FOX targets change substantially in advanced human HF and in early HF in mice.

FOX factors are present in human heart at physiologic levels: FOXP1, P4, C1, C2, J2

FOXP1 is localized to nuclei of human cardiac myocytes.

Do FOX factors mediate cardiac hypertrophy?

Hannenhalli et al. Circulation, 2006

Page 48

Naïve (N)

Conditioned Stimulus only (CS)

Fear Conditioned (FC)

Gene Regulation in Learning and Memory

Hippocampus

Amygdala

Keeley et al. Memory and Learning, 2006

Page 49

Immediate Early Gene Expression is Regulated by Many Transcription Factors

http://web1.tch.harvard.edu/research/greenberg/oldsite/Pathways.html

Page 50

50 Most Significantly Regulated Genes were Used for Further Analysis

rank  Symbol    Molecular Role                           N (log2)  CS vs N (%)  FC vs N (%)
1     Fos       DNA-binding transcription factor         5.4       134          153
2     Ssty1     unknown                                  3.9       -22          -24
3     Ssty2     unknown                                  4.8       -27          -31
4     Dusp1     Phosphatase                              7.4       33           36
5     Cd84      cell adhesion                            7.6       -30          -33
6     Pura      DNA-binding transcription factor         7.7       41           33
7     Nr4a1     DNA-binding transcription factor         7.2       33           40
8     Egr1      DNA-binding transcription factor         8.4       27           30
9     Cacna2d1  voltage dependent calcium channel        6.7       31           34
10    Junb      DNA-binding transcription factor         7.6       20           24

rank  Symbol    Molecular Role                           N (log2)  CS vs N (%)  FC vs N (%)
1     Junb      DNA-binding transcription factor         7.11      32           55
2     Fos       DNA-binding transcription factor         5.26      123          200
3     Nr4a1     DNA-binding transcription factor         6.44      28           43
4     Ier2      unknown                                  3.97      30           38
5     ly6e      unknown                                  6.47      17           21
6     Stk19     serine/threonine kinase                  6.61      14           14
7     Gadd45g   upstream activator of p38 and JNK MAPKs  6.18      18           26
8     Egr1      DNA-binding transcription factor         7.98      32           45
9     Aaas      nuclear pore/adapter                     5.42      21           25
10    Mlf2      unknown                                  9.54      13           18

Hippocampus

Amygdala

Page 51

Hippocampus- and Amygdala-specific promoter modeling

Hippocampus:

CREB, E2F1, Pax4, Sp1, GATA1, AP2, ZF5, Nrf-1

Amygdala:

CREB, E2F1, Pax4, Sp1, GATA1, AP2, ZF5, Ets1, Elk1, Myc/Max, USF

Page 52

Promoter models were able to predict regulation of less significant genes with some system specificity

[Figure, panel A: Genes Predicted by Hippocampus Promoter Model. Horizontal axis: Tissue Examined (Hippocampus, Amygdala); vertical axis: Average Change FC vs N (0% to 8%).]

[Figure, panel B: Genes Predicted by Amygdala Promoter Model. Horizontal axis: Tissue Examined (Hippocampus, Amygdala); vertical axis: Average Change FC vs N (0% to 8%).]

Page 53

Overview

Core promoter prediction

TF-DNA binding

TF-TF interactions

Transcriptional Modules

Applications

Page 54

Core Promoter: Minimal DNA sequence required for the assembly of the pre-initiation complex (~100 bp flanking the TSS)

Goal: Determine sequence properties responsible for precise Pol-II localization

Page 55

1990 1995 2000 2006

TATA

PromoterScan

Promoter1.0

Autogene

PromFind

TSSG

Claverie

NNPP

CorePromoter

PromoterInspector

Hannenhalli

FirstEF

Dragon

PSPA

CpG island line

Page 56

CpG Islands

Unmethylated GC-rich regions (experimental)

GC-rich regions (≥ 200 bp) on the genome with high CG di-nucleotide frequency (computational)

$$f_{G+C} \ge 0.5 \quad \text{AND} \quad \frac{f_{CG}}{f_C \, f_G} \ge 0.6$$

Gardiner-Garden and Frommer, 1987

About half of all genes have a CpG island overlapping the first exon.

Antequera and Bird, 1993
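The computational CpG island criteria can be sketched as a simple window test. This sketch assumes the commonly cited form of the Gardiner-Garden and Frommer criteria: G+C content of at least 0.5, an observed/expected CpG ratio of at least 0.6 computed as (CpG count × length) / (C count × G count), and a minimum length of 200 bp. The function names and the choice of `>=` for the length cutoff are assumptions.

```python
def cpg_stats(window):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA window."""
    seq = window.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    obs_exp = (cpg_count * length) / (c_count * g_count)
    return gc_fraction, obs_exp

def is_cpg_island(window, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, G+C content, and observed/expected CpG thresholds."""
    if len(window) < min_length:
        return False
    gc_fraction, obs_exp = cpg_stats(window)
    return gc_fraction >= min_gc and obs_exp >= min_obs_exp

print(is_cpg_island("CG" * 100))             # CpG-dense 200-bp window
print(is_cpg_island("C" * 100 + "G" * 100))  # GC-rich but almost no CpG
print(is_cpg_island("AT" * 100))             # AT-rich window
```

The second example shows why both conditions are needed: a window can be entirely G and C yet contain almost no CG dinucleotides.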

Page 57

Categories of DNA sequence “signals” used in promoter prediction

Long-range sequence characteristics (10 kb)

Short genomic sub-regional signal, e.g. CpG island (0.5~2 kb)

Specific cis elements (e.g. TATA)

TSS

Generalization of Markov Models

Wang and Hannenhalli, BMC BI, 2005

Page 58

Position Specific Propensity Analysis (PSPA)

PSPA based Model

Use ±100 bp around TSS as training

Wang and Hannenhalli, BBRC, 2006

Page 59

Overlap between prediction tools

Page 60

Carninci et al. (2006). "Genome-wide analysis of mammalian promoter architecture and evolution." Nat Genet 38(6): 626-635.

Page 61

CpG-poor promoters have greater conservation and fewer aTSS, and are mostly involved in extracellular and stress-response activities.

By including position-specific motifs and their co-occurrence, PSPA improves transcription start site localization.

Many position-specific elements are associated with target gene function.

There is little overlap among various state-of-the-art prediction tools.

Alternative promoters have tissue-specific usage

Page 62

Acknowledgement

Junwen Wang, PCBI, UPenn
Larry Singh, PCBI, UPenn
Li-San Wang, Biology, UPenn
Shane Jensen, Statistics, Wharton, UPenn

Perry Evans and Greg Donahue, Genomics and Comp Bio, UPenn

Tom Cappola, Cardiology, UPenn

Mike Keeley, Biology, UPenn
Ted Abel, Biology, UPenn