Gene regulatory network
Jin Chen
CSE891-001, Fall 2012
Slide 2
Outline
  Transcriptional regulation
  Co-expression & co-regulation
  Bio-techniques
    ChIP-seq
    Bacterial one-hybrid system
  Computational models for gene regulation network construction
    Binding + expression with TF knockout
    Binding + time-series gene expression
Slide 3
David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, 2003
Slide 4
Transcriptional regulation
  Regulation of transcription controls when transcription occurs and how much RNA is produced
  Transcription factors often need to be bound to a regulatory binding site to switch a gene on (activator) or to shut a gene off (repressor)
  Generally, the more sophisticated the organism, the more complicated its cellular protein regulation
Slide 5
Transcription control
  Transcription control is directed primarily by two elements
    Transcription factors (TFs)
    DNA sequences that facilitate the binding of these TFs (cis-regulatory elements)
Slide 6
Transcription control
  First, there needs to be an initiating signal
  This signal activates a TF and recruits other members of the "transcription machine"
  TFs generally simultaneously bind DNA
  TFs and their cofactors can be regulated through reversible structural alterations
  Transcription is initiated at the promoter site as an increased amount of active TF binds a target DNA sequence
  Other proteins, known as "scaffolding proteins", bind other cofactors and hold them in place
  Frequently, extracellular signals induce the expression of immediate early genes
  These genes themselves encode TFs or components thereof, and can further influence gene expression
Slide 7
Gene regulation network
  Node: TF or target gene
  Edge: regulation relation
    Directed
    Activation --->
    Inhibition ---|
  Yeast example: Mata et al. Genome Biol. 2007;8(10):R217
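The node and edge definitions above can be held in a small signed, directed graph. The following is a minimal sketch; the class name, the helper methods, and the gene names are hypothetical placeholders and are not taken from Mata et al.

```python
from collections import defaultdict

# Edge types, written with the same arrows as on the slide.
ACTIVATION, INHIBITION = "--->", "---|"


class RegulatoryNetwork:
    """Directed graph: regulator (TF) -> target gene, each edge signed."""

    def __init__(self):
        # regulator -> {target: edge type}
        self.edges = defaultdict(dict)

    def add_edge(self, regulator, target, kind):
        if kind not in (ACTIVATION, INHIBITION):
            raise ValueError(f"unknown edge type: {kind!r}")
        self.edges[regulator][target] = kind

    def targets(self, regulator):
        """Genes directly regulated by the given TF."""
        return sorted(self.edges.get(regulator, {}))

    def regulators(self, target):
        """TFs that directly regulate the given gene."""
        return sorted(r for r, ts in self.edges.items() if target in ts)


# Toy network with placeholder names (an assumption for illustration).
grn = RegulatoryNetwork()
grn.add_edge("TF_A", "gene_1", ACTIVATION)
grn.add_edge("TF_A", "TF_B", ACTIVATION)   # a TF can itself be a target
grn.add_edge("TF_B", "gene_1", INHIBITION)

for regulator, targets in grn.edges.items():
    for target, kind in targets.items():
        print(f"{regulator} {kind} {target}")
```

Storing the sign on the edge rather than on the node lets the same TF activate one gene and repress another.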
Slide 8
Co-expression & co-regulation
  Genes belonging to the same cluster are often called co-expressed
  Genes with similar expression patterns might share transcription factors and functional regulatory binding sites
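One common way to make "similar expression patterns" concrete is Pearson correlation between expression profiles. The sketch below is an assumption for illustration, not the clustering method used in the lecture; the expression values, gene names, and the 0.9 threshold are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_pairs(expr, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    return [
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if pearson(expr[g], expr[h]) >= threshold
    ]


# Hypothetical expression values across five conditions.
expression = {
    "gene_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_2": [2.1, 3.9, 6.2, 8.0, 9.9],   # rises with gene_1
    "gene_3": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of gene_1
    "gene_4": [1.0, 3.0, 2.0, 3.0, 1.0],   # unrelated pattern
}

print(coexpressed_pairs(expression))
```

Co-expressed pairs found this way are only candidates for co-regulation; shared TFs or binding sites still have to be checked separately.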
Slide 9
Topic 1. GRN Reconstruction
Background
  Algorithms for GRN reconstruction based on gene expression & motif data (2002-2008)
    2 input data: time-series microarray; TF binding motifs
    3 models: time shift matching; mutual information; Granger causality & DBN
    2 limitations: high false positive rate; small scale (# genes ~100)
  Algorithms focusing on integrating binding data with existing data (2007-now)
    3 input data: time-series microarray; TF binding motifs; binding data
    2 models: time shift matching and binding; protein expression approximation from binding data
    2 limitations: lack of binding data at systems level; combinatorial TF studies
  Time axis marks: 2002, 2008, 2007, 2010
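Time shift matching can be illustrated as a search for the delay at which a target gene's profile best follows a TF's profile. This is a minimal sketch under that assumption; the function names, the toy time series, and the use of Pearson correlation as the matching score are illustrative choices, not the specific published algorithms.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(tf, target, max_lag=3):
    """Score each delay by correlating tf[t] with target[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = tf[: len(tf) - lag]     # TF values that have a delayed partner
        y = target[lag:]            # target values 'lag' time points later
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores


# Hypothetical time series: the target repeats the TF pulse two steps later.
tf_profile = [0, 1, 4, 9, 4, 1, 0, 0]
target_profile = [0, 0, 0, 1, 4, 9, 4, 1]

lag, scores = best_lag(tf_profile, target_profile)
print(lag, {k: round(v, 3) for k, v in scores.items()})
```

A high score at some lag is only suggestive; with short series many gene pairs match by chance, which is consistent with the high false positive rate listed as a limitation above.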
Slide 10
ChIP-seq
  ChIP-seq is the sequencing of the genomic DNA fragments that co-precipitate with a DNA-binding protein under study
  The DNA-binding proteins most frequently investigated in this way include transcription factors
  ChIP-seq can identify all DNA segments in the genome physically associated with a specific DNA-binding protein
  It does not rely on prior knowledge of precise DNA binding sites
  Liu et al. BMC Biology 2010, 8:56
Slide 11
Flow scheme of the central steps in the ChIP-seq procedure (figure)
  Liu et al. BMC Biology 2010, 8:56
Slide 12
Example
  A ChIP DNA sample from a stem cell population and the corresponding input DNA sample were both processed without amplification
  Algorithmic analysis for mapping and peak-calling
  http://www.helicosbio.com/Applications/ChIPSeq/tabid/69/Default.aspx
Slide 13
Three key steps in ChIP-seq
  Antibody selection
  Sequencing
  Algorithmic analysis for mapping and peak-calling
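The peak-calling step can be sketched as a comparison of read counts between the ChIP sample and the input control in fixed-size genomic bins. This toy version is an assumption for illustration only: production peak callers use statistical models and library-size normalization, and the bin size, fold threshold, pseudocount, and read positions below are all made up.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count mapped reads per fixed-size bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def call_peaks(chip_reads, input_reads, bin_size=100, min_fold=3.0,
               pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in ChIP over input."""
    chip = bin_counts(chip_reads, bin_size)
    ctrl = bin_counts(input_reads, bin_size)
    peaks = []
    for b in sorted(chip):
        # Pseudocount avoids division by zero in bins with no input reads.
        fold = (chip[b] + pseudocount) / (ctrl[b] + pseudocount)
        if fold >= min_fold:
            peaks.append((b * bin_size, (b + 1) * bin_size, fold))
    return peaks


# Hypothetical mapped read start positions on one chromosome.
chip_reads = [120, 130, 145, 150, 160, 170, 180, 410, 820]
input_reads = [130, 420, 815]

print(call_peaks(chip_reads, input_reads))
```

Using the input sample as a control is what separates true enrichment from regions that simply attract many reads in any library.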
Slide 14
Binding network
  The first action of a TF is to find and bind DNA segments, and ChIP-seq allows the binding sites of TFs to be identified across entire genomes
  Protein-DNA binding network
    Direct downstream targets of any transcription factor can be determined
    The DNA sequence motif recognized by the binding protein can be computed
Slide 15
Example
  ChIP-seq profiling of 13 TFs in embryonic stem (ES) cell development revealed the organization of regulatory elements
  This provided insights into the integration of TF-mediated signaling pathways in ES cell differentiation
  Chen et al. Cell 2008, 133:1106-1117
Slide 16
What ChIP-seq cannot do
  Many observed binding events are neutral and do not regulate transcription
  Regulatory binding events often occur at enhancers that are not proximal to the target gene they control
  Identifying transcriptional targets requires integrating ChIP-seq with evidence from expression data to help associate binding events with target gene regulation
  Honkela et al. PNAS 2010, vol. 107, no. 17, pp 7793-7798
Slide 17
Bacterial one-hybrid system (figure; source: Wikipedia)
Slide 18
Gene expression data
  TF knock-out gene expression
  TF over-expression gene expression
  Differential expression of genes between wild type and mutant/over-expression strains is indicative of a potential regulatory interaction, e.g. yeast GRN
  Reimand et al. Nucleic Acids Research, 2010, Vol. 38, No. 14, pp 4768-4777
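The idea of reading regulatory interactions from differential expression can be sketched with a log2 fold change between knock-out and wild-type replicates. This is a simplified illustration, not the procedure of Reimand et al.: real analyses use statistical tests with multiple-testing correction, and the expression values, gene names, pseudocount, and cutoff here are assumptions.

```python
from math import log2
from statistics import mean


def log2_fold_change(mutant, wild_type, pseudocount=1.0):
    """log2 ratio of mean mutant to mean wild-type expression."""
    return log2((mean(mutant) + pseudocount) / (mean(wild_type) + pseudocount))


def candidate_targets(wt, ko, min_abs_lfc=1.0):
    """Genes whose expression changes at least two-fold in the knock-out."""
    hits = {}
    for gene in wt:
        lfc = log2_fold_change(ko[gene], wt[gene])
        if abs(lfc) >= min_abs_lfc:
            hits[gene] = lfc
    return hits


# Hypothetical replicate measurements for three genes.
wild_type = {"gene_1": [99, 101, 100], "gene_2": [50, 49, 51], "gene_3": [7, 7, 7]}
knock_out = {"gene_1": [20, 25, 22], "gene_2": [52, 48, 50], "gene_3": [30, 33, 27]}

hits = candidate_targets(wild_type, knock_out)
for gene, lfc in hits.items():
    # Lower expression without the TF suggests activation; higher suggests repression.
    role = "activated" if lfc < 0 else "repressed"
    print(f"{gene}: log2FC = {lfc:+.2f} (possibly {role} by the TF)")
```

A change in expression alone cannot distinguish direct targets from indirect downstream effects, which is why the later slides intersect these lists with binding data.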
Slide 19
Comprehensive analysis of TF knockout expression data in Yeast
  269 TF knockout microarrays, covering almost all yeast TFs
  DNA-protein interactions derived from ChIP-chip experiments
  Predicted TF binding sites with position weight matrices
  Hu et al. Nat. Genet., 39, 683-687, 2007; Harbison et al. Nature, 431, 99-104, 2004
Slide 20
Comprehensive analysis of TF knockout expression data in Yeast
  Checked the expression levels of the TFs
  Intuitively, one expects the TF under consideration to have lower expression in the mutant strain than in the wild-type strain; the data confirm this for 155 TFs
  78 TFs display a negative fold change at statistically non-significant levels
  36 TFs are lethal
  Reimand et al. Nucleic Acids Research, 2010, Vol. 38, No. 14, pp 4768-4777
Slide 22
Comprehensive analysis of TF knockout expression data in Yeast
  Examine functional annotations of differentially expressed genes
  As most TFs are considered to regulate distinct cellular processes, their target genes should be associated with a coherent set of molecular and biological functions
  Used g:Profiler to identify GO, KEGG and Reactome pathway annotations
  Reimand et al. Nucleic Acids Research, 2010, Vol. 38, No. 14, pp 4768-4777
Slide 24
Comprehensive analysis of TF knockout expression data in Yeast
  Overlap between TF-binding and TF knockout data
  Collect binding sites for 142 TFs, comprising 5,188 ChIP-chip interactions and 17,091 motif predictions
  Calculate the intersection between the list of differentially expressed genes from the TF knockout and targets identified by ChIP-chip or binding-site predictions
  2,230 regulation relations
  Reimand et al. Nucleic Acids Research, 2010, Vol. 38, No. 14, pp 4768-4777
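The intersection step can be sketched as a set overlap between the knock-out gene list and the binding targets of one TF, with a hypergeometric tail probability as one way to judge whether the overlap exceeds chance. The significance test is an added assumption, not a statement of how the slide's 2,230 relations were obtained, and the gene identifiers and genome size are toy values.

```python
from math import comb


def hypergeom_tail(overlap, n_de, n_bound, n_genes):
    """P(X >= overlap) when n_de genes are drawn from n_genes,
    of which n_bound are binding targets."""
    total = comb(n_genes, n_de)
    upper = min(n_de, n_bound)
    favourable = sum(
        comb(n_bound, k) * comb(n_genes - n_bound, n_de - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical data for one TF in a toy genome of 20 genes.
n_genes = 20
differentially_expressed = {"g1", "g2", "g3", "g4"}     # from the knock-out
bound_targets = {"g2", "g3", "g4", "g9", "g10"}         # ChIP-chip or motif hits

supported = differentially_expressed & bound_targets
p_value = hypergeom_tail(
    len(supported), len(differentially_expressed), len(bound_targets), n_genes
)
print(sorted(supported), round(p_value, 4))
```

Genes in the intersection carry both binding and expression evidence, which makes them stronger candidates for direct regulation than genes from either list alone.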
Slide 25
Comprehensive analysis of TF knockout expression data in Yeast
  Include protein-protein interaction information as an additional perspective on the assessment of the GRN
  TFs that function together may show significant overlap in their target genes
    Out of 115 pairs of physically interacting TFs in the dataset, 92 display such an overlap
  TFs tend to regulate genes that interact with each other
    Out of 110,487 differentially expressed genes, there are 3,846 pairwise interactions between co-regulated genes, covering 2,262 genes in total
    Most TFs target at least one pair of interacting genes
  Reimand et al. Nucleic Acids Research, 2010, Vol. 38, No. 14, pp 4768-4777
Slide 26
Temporal gene expression data
  A problem with the above approach is that creating mutant strains is challenging or impossible for many TFs of interest
  Even when available, mutants may provide very limited information because of redundancy or because the signal is confounded by indirect regulatory feedback
  Temporal dynamics: use time-series wild-type gene expression, e.g. Drosophila GRN
  Honkela et al. PNAS 2010, vol. 107, no. 17, pp 7793-7798
Slide 27
Other models for gene regulation network construction
  Expression-based study
    Dynamic Bayesian Network
    Granger causality
  TF binding motif-based study
    Weeder
    AlignACE
Slide 28
Gene regulation network analysis
  Transcriptional regulation is mediated by the combinatorial interplay between cis-regulatory DNA elements and trans-acting transcription factors, and is perhaps the most important mechanism for controlling gene expression
  A transcriptional regulatory network that integrates such information can lead to a systems-level understanding of regulatory mechanisms
  Kim et al. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, Vol. 3, Iss. 1, pp 21-35, 2011
Slide 29
Discovery of motifs and regulatory modules
  Binding motif prediction based only on the knowledge afforded by PWMs often suffers from high false-positive rates
  To reduce the false-positive rate, a number of biological insights have been used
    Evolutionary pressure placed on these important cis-regulatory elements
    Co-expression with genes that have well-documented functions and expression patterns
    Clustering of cis-regulatory features into cis-regulatory modules
Slide 30
Discovery of motifs and regulatory modules
  Collect the binding sequences for known TFs and identify potential binding sites in unannotated genome sequence
  Figure: modeling of individual transcription factor binding sites and cis-regulatory modules
    Position frequency matrix
    Cluster of individual TF binding sites
    Regulatory relations among 4 genes
  Kim et al. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, Vol. 3, Iss. 1, pp 21-35, 2011
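The step from collected binding sequences to site prediction can be sketched as follows: count bases per position to get a position frequency matrix, convert it to a log-odds position weight matrix, and slide that matrix along a sequence. The binding sites, the scanned sequence, the pseudocount, and the uniform background of 0.25 are assumptions for illustration and are not taken from Kim et al.

```python
from math import log2

BASES = "ACGT"


def position_frequency_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    pfm = [{b: 0 for b in BASES} for _ in range(len(sites[0]))]
    for site in sites:
        for i, base in enumerate(site):
            pfm[i][base] += 1
    return pfm


def position_weight_matrix(pfm, pseudocount=1.0, background=0.25):
    """Log-odds score of each base per position against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            b: log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_hit(sequence, pwm):
    """Highest-scoring window as (score, start position)."""
    width = len(pwm)
    return max(
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )


# Hypothetical aligned binding sites for one TF.
known_sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
pfm = position_frequency_matrix(known_sites)
pwm = position_weight_matrix(pfm)

score, start = best_hit("AAGCTGACTCATTG", pwm)
print(start, round(score, 2))
```

Because short motifs match by chance throughout a genome, hits from a scan like this are usually filtered with additional evidence such as conservation, co-expression, or clustering into modules.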
Slide 31
Co-regulation
  A density-based subspace clustering algorithm for coherent clustering of gene expression data
  The model allows
    Expression profiles of genes in a cluster to follow any shifting-and-scaling patterns in subspace
    Expression value changes across any two conditions of the cluster to be significant
  Experimental results show that the algorithm is able to detect a significant number of highly biologically significant clusters missed by previous models
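A shifting-and-scaling pattern means one profile can be written as scale * other + shift over the chosen conditions. The sketch below only checks that relation for a pair of profiles by least squares; it is not the density-based subspace clustering algorithm itself, it does not test whether the value changes are significant, and the function names, tolerance, and data are assumptions.

```python
def fit_shift_scale(x, y):
    """Least-squares fit of y = scale * x + shift.
    Returns (scale, shift, largest absolute residual)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    scale = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    shift = my - scale * mx
    residual = max(abs(b - (scale * a + shift)) for a, b in zip(x, y))
    return scale, shift, residual


def is_coherent(x, y, tolerance=1e-6):
    """True if y is a shifted and scaled copy of x (negative scale allowed)."""
    scale, _, residual = fit_shift_scale(x, y)
    return scale != 0 and residual <= tolerance


# Hypothetical profiles over the same four conditions.
gene_1 = [1.0, 3.0, 2.0, 5.0]
gene_2 = [3.0, 7.0, 5.0, 11.0]     # 2 * gene_1 + 1
gene_3 = [8.5, 5.5, 7.0, 2.5]      # -1.5 * gene_1 + 10
gene_4 = [4.0, 1.0, 6.0, 2.0]      # no such relation

for name, profile in [("gene_2", gene_2), ("gene_3", gene_3), ("gene_4", gene_4)]:
    print(name, is_coherent(gene_1, profile))
```

Allowing a negative scale groups negatively correlated genes such as gene_3 with gene_1, a relation that a plain distance-based clustering would miss.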