Regulatory Genomics Lecture 2 November 2012 Yitzhak (Tzachi) Pilpel 1.

Posted on 11-Jan-2016




Regulatory Genomics

Lecture 2 November 2012

Yitzhak (Tzachi) Pilpel


Course requirements

• Attendance and participation

• Two reading assignments

• A final take-home exam based on reading papers

• website

No meeting next week on Nov 15th


Regulation of gene expression determines complex spatio-temporal patterns

Monitor expression during the cell cycle

[Figure: mRNA expression level vs. time over two cell cycles (G1, S, G2, M, G1, S, G2, M)]

[Figure: genes plotted in a space whose axes are expression at time-points 1, 2 and 3, alongside three panels of normalized expression vs. time-point]

Genes can be clustered based on time-dependent expression profiles

The K-means algorithm

• Start with random positions of centroids (iteration 0).

• Assign each data point to its nearest centroid (iteration 1).

• Move each centroid to the center of its assigned points.

• Iterate until the cost stops decreasing (here, by iteration 3).
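A minimal sketch of the K-means loop in Python with NumPy. It assumes each gene is a row of normalized expression values (one column per time-point), uses Euclidean distance, and picks the "random positions" by starting the centroids at randomly chosen genes (one common choice, not necessarily the one in the slides). The function name `kmeans` and the toy profiles are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd-style k-means on the rows of X (one expression profile per gene).

    Returns (labels, centroids). Distances are Euclidean.
    """
    rng = np.random.default_rng(seed)
    # Start: centroids at k randomly chosen profiles
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign each profile to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged: assignments stopped changing, so the centroids are a fixed point
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the center (mean) of its assigned profiles
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid in place
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical normalized profiles over three time-points
profiles = np.array([
    [-1.0, 0.0, 1.0], [-1.2, 0.1, 1.1], [-0.9, -0.1, 0.8],   # rising
    [1.0, 0.0, -1.0], [1.1, -0.2, -1.2], [0.9, 0.1, -0.8],   # falling
])
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The result depends on the random start (K-means finds a local minimum of the within-cluster sum of squares), so in practice it is usually run from several seeds and the lowest-cost solution is kept.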

An expression cluster

1D and 2D clustering of gene expression data

Hierarchical clustering

How to join sets?

[Figure: dendrogram joining items a, b, c, d, e and f]
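"How to join sets?" comes down to the linkage rule: the distance between two clusters can be taken as the distance between their closest members (single linkage), their farthest members (complete linkage), or the average over all member pairs (average linkage). The naive sketch below uses hypothetical one-dimensional positions for items a to f and shows that the rule can change the order of merges. It recomputes all pairwise cluster distances at every step, so it is for illustration only.

```python
import itertools

def agglomerative(dist, linkage="average"):
    """Naive agglomerative clustering.

    dist: dict mapping frozenset({a, b}) -> distance between items a and b.
    Returns the merges in order, as (cluster_A, cluster_B, distance) tuples.
    """
    items = sorted({x for pair in dist for x in pair})
    clusters = [frozenset([x]) for x in items]

    def set_distance(A, B):
        d = [dist[frozenset({a, b})] for a in A for b in B]
        if linkage == "single":     # closest pair of members
            return min(d)
        if linkage == "complete":   # farthest pair of members
            return max(d)
        if linkage == "average":    # mean over all member pairs
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage: {linkage}")

    merges = []
    while len(clusters) > 1:
        # Join the two closest sets under the chosen linkage rule
        A, B = min(itertools.combinations(clusters, 2), key=lambda p: set_distance(*p))
        merges.append((A, B, set_distance(A, B)))
        clusters = [c for c in clusters if c not in (A, B)] + [A | B]
    return merges

# Hypothetical 1-D positions for items a-f; distance = absolute difference
pos = {"a": 0.0, "b": 1.2, "c": 5.0, "d": 6.0, "e": 6.5, "f": 12.0}
dist = {frozenset({x, y}): abs(pos[x] - pos[y]) for x, y in itertools.combinations(pos, 2)}

for rule in ("single", "complete", "average"):
    A, B, d = agglomerative(dist, rule)[1]   # second merge
    print(rule, sorted(A | B), d)
```

All three rules first join d and e; single linkage then attaches c to them, while complete and average linkage join a and b instead.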

How to measure a distance between expression profiles?

[Figure: expression profiles of gene x and gene y across time-points t1 to t5]

Clustering the data

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

Try these two applets at home (requires Java)

The common distance metrics
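Two widely used measures for expression profiles are Euclidean distance and correlation distance (1 minus the Pearson correlation). The sketch below, with hypothetical profiles for gene x and gene y over five time-points t1 to t5, shows why the choice matters: a profile with the same shape but twice the amplitude is far away in Euclidean terms yet at correlation distance 0.

```python
import math

def euclidean(x, y):
    """Straight-line distance; sensitive to the absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for the same shape, 2 for mirror images."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

# Hypothetical profiles at t1..t5
gene_x = [1.0, 2.0, 3.0, 2.0, 1.0]
gene_y = [2.0, 4.0, 6.0, 4.0, 2.0]   # same shape as gene_x, twice the amplitude

print(euclidean(gene_x, gene_y), pearson_distance(gene_x, gene_y))
```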


Promoter Motifs and expression profiles

[Figure: example promoter motifs CGGCCCCGCGGA, CTCCTCCCCCCCTTC, TGGCCAATCA and ATGTACGGGTG]

ChIP - chromatin immunoprecipitation

Inside the yeast nucleus: [Figure: a TF of interest, carrying an epitope tag, bound to binding sites on the DNA]

• Formaldehyde crosslinks proteins to DNA in living yeast cells.

• Harvest and sonicate; this yields DNA fragments, some of which are bound to proteins.

• Immunoprecipitate the epitope-tagged TF together with the DNA bound to it.

• Reverse the crosslinks to separate DNA segments from proteins, and fluorescently label each pool (enriched and unenriched DNA) separately.

• Hybridize to a DNA array of all yeast intergenic sequences.

A P-value (confidence level) is computed for each spot on the array

The total number of protein-DNA interactions in the location analysis data set, using a range of P-value thresholds:

P-value 0.05: 35,365 interactions

P-value 0.01: 12,040 interactions

P-value 0.005: 8,190 interactions

P-value 0.001: 3,985 interactions

A threshold of P = 0.001 was selected to minimize false positives, at the expense of more false negatives.
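A rough way to see the trade-off: if there were no true binding at all, a threshold P would pass about P × (number of tests) spots by chance. The sketch assumes one tested intergenic region per gene, i.e. 106 regulators × 6,270 regions (numbers taken from the genome-wide summary of this study); this ignores the array's actual spot count and correlations between tests, and because it treats every test as null it overestimates the false-positive share. The function name `chance_fraction` is hypothetical.

```python
# Interaction counts from the slide: P-value threshold -> protein-DNA interactions called
OBSERVED = {0.05: 35_365, 0.01: 12_040, 0.005: 8_190, 0.001: 3_985}

N_REGULATORS = 106   # transcriptional regulators profiled
N_REGIONS = 6_270    # ASSUMPTION: one tested intergenic region per yeast gene
N_TESTS = N_REGULATORS * N_REGIONS

def chance_fraction(threshold, n_called, n_tests=N_TESTS):
    """Calls expected by chance (threshold * n_tests) as a fraction of calls made.

    Treats every test as null, so it overestimates the false-positive share.
    """
    return threshold * n_tests / n_called

for threshold in sorted(OBSERVED, reverse=True):
    n_called = OBSERVED[threshold]
    print(f"P < {threshold}: {n_called} calls, "
          f"~{threshold * N_TESTS:.0f} expected by chance "
          f"({chance_fraction(threshold, n_called):.0%})")
```

Under these assumptions, chance calls would account for most of the interactions at P = 0.05 but only a minority at P = 0.001, which is the motivation for the stricter cutoff.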

Genome-wide Distribution of Transcriptional Regulators

• Promoter regions of 2343 of 6270 yeast genes (37%) were bound by 1 or more of the 106 transcriptional regulators (P = 0.001)

On average, a regulator binds 38 promoter regions

At P = 0.001, significantly more intergenic regions are bound by 4 or more regulators than expected by chance
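To make "expected by chance" concrete, one simple null model (an assumption; the slides do not say which model was used) lets each of the 106 regulators bind each of the 6,270 promoters independently with probability 38/6,270, matching the average of 38 bound promoters per regulator. The number of regulators per promoter is then binomial, and the sketch computes how many promoters such a model would expect to be bound by 4 or more regulators. The function name is hypothetical.

```python
from math import comb

N_GENES = 6_270       # yeast genes (promoter regions)
N_REGULATORS = 106    # transcriptional regulators profiled
MEAN_TARGETS = 38     # average promoters bound per regulator at P = 0.001

# ASSUMPTION: independent binding, same probability for every regulator/promoter pair
P_BIND = MEAN_TARGETS / N_GENES

def prob_bound_by_at_least(k, n=N_REGULATORS, p=P_BIND):
    """Binomial P(X >= k): a promoter is bound by at least k of n regulators."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

p_4plus = prob_bound_by_at_least(4)
expected_4plus = N_GENES * p_4plus
print(f"P(>= 4 regulators) = {p_4plus:.4f}; expected promoters = {expected_4plus:.1f}")
```

The observed number of promoters with 4 or more regulators would be compared against `expected_4plus`; a randomization of the real binding data is a common, less idealized alternative to this binomial baseline.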

Network Motifs


Network Motifs in the Yeast Regulatory Network

Based on algorithmic analyses performed in MATLAB; http://jura.wi.mit.edu/cgi-bin/young_public/navframe.cgi?s=17&f=networkmotif

[Figure: the network motifs found in the yeast regulatory network, annotated with the numbers 103, 49, 90, 81 and 188]
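The feed-forward loop (regulator X controls a second regulator Y, and both control a target Z) is one of the classic network motifs. A brute-force sketch of how such a motif can be counted in a regulator-to-target graph; the edge list and gene names are hypothetical, and this is not the algorithm used in the MATLAB analyses cited on the slide.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return all (X, Y, Z) with X -> Y, Y -> Z and X -> Z (self-loops ignored)."""
    targets = {}
    for regulator, target in edges:
        if regulator != target:          # autoregulation is a separate motif
            targets.setdefault(regulator, set()).add(target)
    nodes = set(targets) | {t for ts in targets.values() for t in ts}
    loops = []
    for x, y, z in permutations(nodes, 3):   # O(n^3): fine for a small sketch
        x_targets = targets.get(x, set())
        if y in x_targets and z in x_targets and z in targets.get(y, set()):
            loops.append((x, y, z))
    return sorted(loops)

# Hypothetical regulator -> target edges
EDGES = [("TF1", "TF2"), ("TF1", "geneA"), ("TF2", "geneA"), ("TF2", "geneB"), ("TF1", "TF1")]
print(feed_forward_loops(EDGES))
```

Whether a motif is "enriched" is then judged by comparing such counts with counts in randomized networks that preserve each node's number of regulators and targets.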

The Cell Cycle Transcriptional Regulatory Network:

[Figure: regulators (proteins) and the genes they bind, arranged around the various stages of the cell cycle]

• Blue boxes represent sets of genes bound by a common set of regulators.

• Each box is positioned according to the time of peak expression of the genes it represents.

• Ovals represent regulators, connected to the genes they regulate.

• The length of each arc defines the period of activity of that regulator.

Network of Transcriptional Regulators Binding to Genes Encoding Other Transcriptional Regulators


The Central Dogma of Molecular Biology: expressing the genome

[Figure: DNA → mRNA → protein, with inactive DNA and RNA also labeled]

Translation consists of initiation, elongation and termination

[Figure: mRNA from 5’ to 3’ ending in a STOP codon; each codon pairs with a tRNA anti-codon]

The ribosome attachment site determines initiation rate

[Figure: ribosome attachment sites in E. coli and yeast]

A consensus for S. cerevisiae ribosome attachment sites?

[Figure: nucleotide frequency (0% to 100%) at each position relative to ATG]

How good is a given sequence as a “ribosomal attachment site”?

[Figure: ribosomal attachment site score]
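One standard way to score how closely a sequence matches a consensus is a position weight matrix (PWM) of log-odds scores. The sketch builds one from a few hypothetical aligned sequences upstream of the ATG and scores new candidates; the training sequences, pseudocount and uniform background are assumptions, not the data or method behind the score on the slide.

```python
import math

BASES = "ACGT"

def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned sequences (one dict per position)."""
    pwm = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + len(BASES) * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def pwm_score(seq, pwm):
    """Sum of per-position log-odds; higher means closer to the consensus."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and PWM lengths differ")
    return sum(col[base] for col, base in zip(pwm, seq))

# Hypothetical positions -6..-1 relative to the ATG of a few genes
TRAINING = ["AAAAAA", "AAAAAA", "ATAAAA", "ACAAAA", "AAAACA"]
pwm = build_pwm(TRAINING)
print(pwm_score("AAAAAA", pwm), pwm_score("GCGCGC", pwm))
```

Scoring every gene this way and ranking the scores gives the kind of good-to-bad ordering shown on the next slides.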

[Figure: aligned sequences, 5’ to 3’: CTGCGC, GCG, GCGGCG, GCG, GCG, GCGGCG, CAGGCG]

The sequence adaptation score of proteins in yeast

[Figure: ribosomal attachment site score vs. rank, from good score to bad score; CRP marked]

Multiple codons for the same amino acid

Codon: C1 C2 C3 C4 C5 C6

Serine: UCU UCC UCA UCG AGC AGU

Cysteine: UGU UGC

Methionine: AUG

STOP: UAA, UAG, UGA

[Figure: a protein sequence (G T R Y E C Q A S F D) with one of two synonymous codons, C1 or C2, chosen at each position]

For a hypothetical protein of 300 amino acids, each with two possible codons, there are 2^300 (about 2 × 10^90) possible nucleotide sequences

These variants code for the same protein and are thus considered “synonymous”.

Indeed, evolution could easily exchange between them.

But are they all really equivalent?
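The 2^300 count generalizes: the number of synonymous coding sequences for a protein is the product, over its residues, of the number of codons for each amino acid. A sketch using the codon degeneracies of the standard genetic code (stop codon excluded); the function name is hypothetical.

```python
from math import prod

# Synonymous codons per amino acid in the standard genetic code (61 sense codons)
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def n_synonymous_sequences(protein):
    """Number of distinct coding sequences (one-letter amino-acid code input)."""
    return prod(DEGENERACY[aa] for aa in protein)

print(n_synonymous_sequences("GTRYECQASFD"))   # the residues shown on the slide
print(n_synonymous_sequences("K" * 300))       # 300 two-codon residues
```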

Selection of codons might affect:

• Accuracy

• Throughput

• Costs

• Folding

• RNA structure

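$$W_i = \sum_{j=1}^{n_i} \left(1 - s_{ij}\right) \mathrm{tRNA}_{ij}$$

$$w_i = \begin{cases} W_i / W_{\max} & \text{if } W_i \neq 0 \\ w_{\text{mean}} & \text{otherwise} \end{cases}$$

$$\mathrm{tAI}_g = \left( \prod_{k=1}^{l_g} w_{i_{kg}} \right)^{1/l_g}$$

Here $n_i$ is the number of tRNA isoacceptors that recognize codon $i$, $\mathrm{tRNA}_{ij}$ is the gene copy number of the $j$-th of them, $s_{ij}$ is a constraint on the efficiency of the codon-anticodon coupling (wobble), $w_{\text{mean}}$ is the geometric mean of the non-zero $w_i$, $i_{kg}$ is the codon at position $k$ of gene $g$, and $l_g$ is the length of gene $g$ in codons.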

dos Reis et al. NAR 2004

The tRNA Adaptation Index (tAI)

ATC CCA AAA TCG AAT … ……

A simple model for translation efficiency

Wobble interaction
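A sketch of the tAI computation (dos Reis et al. 2004) in Python: $W_i$ as a weighted sum of tRNA gene copy numbers, $w_i = W_i / W_{\max}$ with the geometric mean substituted for codons whose $W_i$ is zero, and tAI as the geometric mean of $w$ over a gene's codons. The copy numbers, the single wobble $s$ value and the codon list are made-up illustrations, not the published parameters; stop and start codons are assumed to have been removed from the gene beforehand.

```python
import math

def absolute_adaptiveness(trnas):
    """W_i = sum over tRNAs j reading codon i of (1 - s_ij) * copy number."""
    return sum((1 - s) * copies for copies, s in trnas)

def tai(codons, W):
    """tRNA adaptation index of a gene: geometric mean of relative adaptiveness w."""
    w_max = max(W.values())
    nonzero = {c: Wi / w_max for c, Wi in W.items() if Wi > 0}
    # Codons read by no tRNA (W_i = 0) get the geometric mean of the non-zero w_i
    w_mean = math.exp(sum(math.log(v) for v in nonzero.values()) / len(nonzero))
    w = {c: nonzero.get(c, w_mean) for c in W}
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

# Hypothetical (copy number, s) pairs; s = 0 for Watson-Crick pairing
W = {
    "AAG": absolute_adaptiveness([(14, 0.0)]),
    "AAA": absolute_adaptiveness([(7, 0.0)]),
    "GAA": absolute_adaptiveness([(14, 0.0)]),
    # read by its own tRNA (2 copies) and, via U:G wobble, by the GAA-reading tRNA
    "GAG": absolute_adaptiveness([(2, 0.0), (14, 0.5)]),
    "CGA": 0.0,   # hypothetically no tRNA reads this codon
}
gene = ["AAG", "GAA", "AAA", "GAG", "CGA"]
print(round(tai(gene, W), 3))
```

A gene made only of codons read by the most abundant tRNAs scores 1; rarer tRNAs pull the geometric mean toward 0.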

Supply, demand and charging

How does RNA structure influence translation?

Conclusions from a synthetic library

• No correlation between CAI (codon adaptation index) and protein expression

• Positive correlation between the folding energy of the mRNA structure and expression

• The 5’ window needs to be unfolded for high expression

[Figure: two panels with protein abundance on the y-axis]


A genome-wide method to measure translation efficiency

(Ingolia Science 2009)
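Ribosome profiling sequences the mRNA fragments protected by ribosomes alongside conventional mRNA-seq; translation efficiency is then commonly computed per gene as footprint density divided by mRNA density. A sketch with RPKM normalization; the read counts, library sizes and gene length are hypothetical, and the function names are illustrative.

```python
def rpkm(reads, length_nt, total_mapped):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads / (length_nt / 1_000) / (total_mapped / 1_000_000)

def translation_efficiency(fp_reads, mrna_reads, length_nt, fp_total, mrna_total):
    """Ribosome footprint density divided by mRNA density for one gene."""
    return rpkm(fp_reads, length_nt, fp_total) / rpkm(mrna_reads, length_nt, mrna_total)

# Hypothetical gene: 1,500 nt coding region, two libraries of 10 million mapped reads
te = translation_efficiency(fp_reads=600, mrna_reads=300, length_nt=1_500,
                            fp_total=10_000_000, mrna_total=10_000_000)
print(te)
```

Because both densities are normalized the same way, TE is a relative measure, most useful for comparing genes or conditions (for example, before and after starvation).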


Translational response to starvation



mRNA abundance

[Figure: four options (Option 1 to Option 4) for how production and degradation combine to set mRNA abundance]
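Under the simplest model (an assumption: constant production and first-order decay), dm/dt = k_prod - k_deg·m, so the steady-state abundance is k_prod/k_deg, while the time needed to approach it is set by the decay rate alone (half-way after ln 2 / k_deg). The sketch integrates the model numerically for two hypothetical genes with the same steady state: the gene with faster production and faster decay gets there much sooner. The function name and rates are illustrative.

```python
def simulate_mrna(k_prod, k_deg, t_end, m0=0.0, dt=0.01):
    """Euler integration of dm/dt = k_prod - k_deg * m from m(0) = m0; returns m(t_end)."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += (k_prod - k_deg * m) * dt
    return m

# Two hypothetical genes with the same steady state k_prod / k_deg = 10 molecules
fast = dict(k_prod=10.0, k_deg=1.0)   # high production, fast decay
slow = dict(k_prod=1.0, k_deg=0.1)    # low production, slow decay

for name, rates in (("fast", fast), ("slow", slow)):
    steady = rates["k_prod"] / rates["k_deg"]
    m5 = simulate_mrna(**rates, t_end=5.0)
    print(f"{name}: steady state {steady:.0f}, m(t=5) = {m5:.2f}")
```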

Relationship between gene expression levels and mRNA decay rates across genes.

A study in a human population examined variation in mRNA decay rates and steady-state mRNA levels across individuals. It found strong negative or positive correlations between mRNA level and decay rate. Fast-responding genes show a “discordant” relationship, suggesting that increased expression is often accompanied by an increased decay rate

The various phases are coupled


At the hardware level (post-transcription: RNA binding proteins)

G1    1    1    1    0
G2    1    0    0    1
G3    0    1    1    1

At the hardware level (post-transcription: microRNA)

      RISC RISC RISC RISC
G1    1    1    1    0
G2    1    0    0    1
G3    0    1    1    1

Yang CGFR 16:397, 2005


Computational approaches to find microRNA genes

• MiRscan (Lim et al. 2003): scans for conserved hairpin structures in both C. elegans and C. briggsae.

– Uses known microRNA genes (50) as a training set.

What is the effect of overexpressing a miR?

Non-coding RNAs are often co-targeted with their own targets for various cellular needs

miR-124 decreases both the abundance and the translation of its mRNA targets to a similar extent

microRNA expression profiles classify human cancers

Lu et al. Nature 435: 834, 2005

[Figure: miR expression across samples (patients)]

Gene expression is noisy


Fluorescence distribution shapes


The cell intrinsic and extrinsic contributions to noise
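A common way to measure the two contributions is the dual-reporter design (Elowitz et al., Science 2002): two identically regulated reporters x and y in the same cell. Intrinsic noise is estimated as ⟨(x - y)²⟩ / (2⟨x⟩⟨y⟩) and extrinsic noise as (⟨xy⟩ - ⟨x⟩⟨y⟩) / (⟨x⟩⟨y⟩). The slide does not say which method it relies on; in the sketch the cells are simulated, with a hypothetical lognormal factor shared by both reporters standing in for extrinsic variation and Poisson sampling standing in for intrinsic variation.

```python
import numpy as np

def noise_decomposition(x, y):
    """Intrinsic and extrinsic noise (squared CV) from two reporters in the same cells."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    intrinsic = np.mean((x - y) ** 2) / (2 * mx * my)   # uncorrelated part
    extrinsic = (np.mean(x * y) - mx * my) / (mx * my)   # covariance shared by both
    return intrinsic, extrinsic

rng = np.random.default_rng(1)

def simulate_cells(shared_sigma, n_cells=100_000, mean_level=100):
    """Hypothetical cells: a lognormal factor shared by both reporters (extrinsic)
    and independent Poisson sampling of each reporter (intrinsic)."""
    if shared_sigma > 0:
        shared = rng.lognormal(mean=0.0, sigma=shared_sigma, size=n_cells)
    else:
        shared = np.ones(n_cells)
    return rng.poisson(mean_level * shared), rng.poisson(mean_level * shared)

quiet = noise_decomposition(*simulate_cells(0.0))
noisy = noise_decomposition(*simulate_cells(0.3))
print("no shared variation:", quiet)
print("shared variation:   ", noisy)
```

With no shared factor the extrinsic term is near zero and the intrinsic term is near 1/mean, as expected for Poisson counts; adding the shared factor raises only the extrinsic term.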


[Figure: DNA → RNA → protein, annotated with regulation by transcription factors, chromatin remodeling, RNA polymerase, the transcription process, the ribosome, the translation process and protein degradation (Φ), with extrinsic and intrinsic sources marked]

The actual intrinsic and extrinsic sources of noise:

• Extrinsic: variation in the copy numbers of molecules among cells

• Intrinsic: stochastic events

A theoretical approach


[Figure: DNA → mRNA → protein]

The ratio of transcription to translation should affect noise
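A standard result for the two-stage (mRNA, then protein) model makes this quantitative: with transcription rate k_m, mRNA decay rate γ_m, translation rate k_p per mRNA and protein decay rate γ_p, mean protein is k_m·k_p/(γ_m·γ_p) and the Fano factor (variance/mean) is 1 + k_p/(γ_m + γ_p). Reaching the same mean with little transcription and much translation therefore gives more noise. The rate values below are hypothetical.

```python
def protein_mean_and_fano(k_m, g_m, k_p, g_p):
    """Stationary mean and Fano factor of protein number in the two-stage model.

    k_m: transcription rate, g_m: mRNA decay rate,
    k_p: translation rate per mRNA, g_p: protein decay rate.
    """
    mean = k_m * k_p / (g_m * g_p)
    fano = 1 + k_p / (g_m + g_p)
    return mean, fano

# Two hypothetical genes with the same mean protein level
transcription_heavy = protein_mean_and_fano(k_m=10.0, g_m=1.0, k_p=1.0, g_p=0.1)
translation_heavy = protein_mean_and_fano(k_m=1.0, g_m=1.0, k_p=10.0, g_p=0.1)
print(transcription_heavy, translation_heavy)
```

Each mRNA yields a burst of roughly k_p/γ_m proteins, so fewer transcripts with more translation per transcript means larger, rarer bursts and a noisier protein level.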


Transcription bursts should affect noise


Can noise be useful?

The native network shows longer competence periods that are more diverse in duration

The native network does better over a wider range of extracellular DNA concentrations

The trade-off: high competence allows finding solutions but reduces the growth rate

Questions about noise

• What are the sources of noise?

• How is noise regulated in cells?

• How is it tolerated by biological systems that need to be noise-free?

• When is noise advantageous, deleterious or neutral?