Many similarly expressed genes are coregulated by the same transcription factor(s) …

Therefore, can search promoters of coregulated genes for binding sites

Genes induced by carbon starvation

[Figure: ORFs and their upstream regions]

Similar sequence found in most upstream regions (here CCAAT, the Hap4p binding site)

Finding sequence motifs common to a group of ‘similar’ sequences


How do you identify motifs in sequence data?

How can you tell if the identified motif is ‘significant’?

How do you find genomic examples of the identified motif?

Site 1   A G A T G G A T G G
Site 2   T G A T T G A T G T
Site 3   T G A T G G A T G G
Site 4   A G A T T G A T C G
Site 5   T G A T G G A T T G
Site 6   T G A T G G A T T G
Site 7   A G A T G G A T T G

IUPAC consensus: W G A T G G A T N G (where W = A or T, N = any base)

First, representation of motifs: Position-specific Weight Matrices (PWMs, aka Position-Specific Scoring Matrix, PSSM)

Position   1    2    3    4    5    6    7    8    9    10
G          0    1.0  0    0    0.7  1.0  0    0    0.4  0.8
A          0.4  0    1.0  0    0    0    1.0  0    0    0
T          0.6  0    0    1.0  0.3  0    0    1.0  0.4  0.2
C          0    0    0    0    0    0    0    0    0.2  0

PWM represents frequencies of each base at each position in the motif *

* These days, PWM/PSSM can correspond to the frequency matrix or a likelihood matrix
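As a minimal sketch of how such a frequency matrix can be built, the Python below counts bases column by column over the seven example sites. The function name `frequency_matrix` and the dictionary layout are illustrative choices, not part of the lecture. Note that the exact frequencies are sevenths (for example 3/7 ≈ 0.43), so the matrix shown above appears to be a rounded version of these values.

```python
from collections import Counter

# The seven aligned example sites (Site 1 to Site 7).
SITES = [
    "AGATGGATGG",
    "TGATTGATGT",
    "TGATGGATGG",
    "AGATTGATCG",
    "TGATGGATTG",
    "TGATGGATTG",
    "AGATGGATTG",
]
BASES = "GATC"  # same row order as the matrix in the slides


def frequency_matrix(sites):
    """Return {base: [frequency at each position]} for equal-length sites."""
    n = len(sites)
    length = len(sites[0])
    matrix = {b: [] for b in BASES}
    for i in range(length):
        # Count how often each base occurs in column i of the alignment.
        counts = Counter(site[i] for site in sites)
        for b in BASES:
            matrix[b].append(counts[b] / n)
    return matrix


pwm = frequency_matrix(SITES)
for b in BASES:
    print(b, " ".join(f"{p:.2f}" for p in pwm[b]))
```

Each column of the result sums to 1, because every site contributes exactly one base per position.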


Web-logo: A graphical representation of PWMs

http://weblogo.berkeley.edu/

Height of each base is proportional to the frequency of that base at that position; the total height of the stack is the information content of the position, measured in “bits” (derived from the entropy)

Information content IC

The least variable positions are likely important for specifying the protein-DNA interaction. Therefore high information content = low sequence variation at that position.

Information Content at position i:

$$IC_i = 2 + \sum_{b \in \{G,A,T,C\}} P_b(i) \, \log_2 P_b(i)$$

Where P_b(i) is the probability of base b at position i

If using log2, the info content is in ‘bits’


Maximum IC if P of some base is 1.0: = 2 + [ (1.0 * 0) + 0 + 0 + 0 ] = 2

Minimum IC if P is 0.25 for all bases: = 2 + [0.25(-2) ] * 4 = 0
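An intermediate case may help. As a worked example, take position 5 of the example matrix (G = 0.7, T = 0.3, A = C = 0), using the usual convention that a base with probability 0 contributes nothing to the sum:

$$IC_5 = 2 + 0.7 \log_2 0.7 + 0.3 \log_2 0.3 \approx 2 - 0.360 - 0.521 \approx 1.12$$

This rounds to 1.1 bits: more informative than a fully random position (0 bits), less than a fully conserved one (2 bits).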

Position   1    2    3    4    5    6    7    8    9    10
G          0    1.0  0    0    0.7  1.0  0    0    0.4  0.8
A          0.4  0    1.0  0    0    0    1.0  0    0    0
T          0.6  0    0    1.0  0.3  0    0    1.0  0.4  0.2
C          0    0    0    0    0    0    0    0    0.2  0

IC         1.0  2.0  2.0  2.0  1.1  2.0  2.0  2.0  0.5  1.3   = bit score of 15.9
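The per-position values can be checked with a short Python sketch. It applies the information content formula to the rounded example matrix; the function name `information_content` and the dictionary layout are illustrative assumptions, and terms with probability 0 are skipped (treated as contributing 0).

```python
import math

# Rounded example PWM (rows G, A, T, C; ten positions).
PWM = {
    "G": [0, 1.0, 0, 0, 0.7, 1.0, 0, 0, 0.4, 0.8],
    "A": [0.4, 0, 1.0, 0, 0, 0, 1.0, 0, 0, 0],
    "T": [0.6, 0, 0, 1.0, 0.3, 0, 0, 1.0, 0.4, 0.2],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 0.2, 0],
}


def information_content(pwm):
    """Return IC_i = 2 + sum_b P_b(i) * log2(P_b(i)) for every position i."""
    length = len(next(iter(pwm.values())))
    ic = []
    for i in range(length):
        total = 2.0
        for row in pwm.values():
            p = row[i]
            if p > 0:  # skip zero-probability bases (0 * log 0 taken as 0)
                total += p * math.log2(p)
        ic.append(total)
    return ic


ic = information_content(PWM)
print([round(x, 1) for x in ic])  # [1.0, 2.0, 2.0, 2.0, 1.1, 2.0, 2.0, 2.0, 0.5, 1.3]
print(round(sum(ic), 1))          # 15.9
```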

Information Profile:

[Figure: information content plotted against motif position]

Often for protein-DNA interactions, IC profile is smooth

[Figure: IC profile by position, real motif vs. randomized data]

One limitation of PWMs: each position is considered independently (does not represent inter-dependencies across motif positions)

Gary Stormo, Nat Biotech 2011

Morris et al., Nat Biotech 2011

Finding matches to (instances of) a PWM

Position   1    2    3    4    5    6    7    8    9    10
G          0    1.0  0    0    0.7  1.0  0    0    0.4  0.8
A          0.4  0    1.0  0    0    0    1.0  0    0    0
T          0.6  0    0    1.0  0.3  0    0    1.0  0.4  0.2
C          0    0    0    0    0    0    0    0    0.2  0

Is the sequence A G A T T G A T C T a match to this matrix?

Joint probability: assuming each position is independent,

$$P(\text{motif}) = \prod_{i} P_b(i)$$

where b ∈ {G,A,T,C} is the base observed at position i

P(sequence | matrix model) = (0.4)(1.0)(1.0)(1.0)(0.3)(1.0)(1.0)(1.0)(0.2)(0.2) = 0.0048

Background model: P(G) = P(A) = P(T) = P(C) = 0.25

P(sequence | background model) = (0.25)^10 = 9.5e-7

Log-likelihood ratio LLR

= log ( P(sequence | matrix model) / P(sequence | background model) )

A measure of how different the likelihood of the sequence is, given the motif model vs. the background model.

In our example:

LLR = log10 ( 0.0048 / 9.5e-7 ) = 3.7 (or 12.3 bits if using log2)

The larger the LLR, the more likely the motif model is the right one. To select motifs in real life, can define a LLR cutoff (often defined by sampling).
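A minimal Python sketch of this calculation, assuming the rounded example matrix and a uniform background of 0.25 per base. The function names are illustrative, and base-10 logarithms are an assumption here (the slides write only "log"). Computed directly, 0.25^10 is about 9.5e-7, which gives a log10 ratio of about 3.7.

```python
import math

# Rounded example PWM (rows G, A, T, C; ten positions).
PWM = {
    "G": [0, 1.0, 0, 0, 0.7, 1.0, 0, 0, 0.4, 0.8],
    "A": [0.4, 0, 1.0, 0, 0, 0, 1.0, 0, 0, 0],
    "T": [0.6, 0, 0, 1.0, 0.3, 0, 0, 1.0, 0.4, 0.2],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 0.2, 0],
}
BACKGROUND = 0.25  # uniform background: every base equally likely


def model_probability(seq, pwm):
    """Joint probability of seq under the PWM, positions assumed independent."""
    p = 1.0
    for i, base in enumerate(seq):
        p *= pwm[base][i]
    return p


def log_likelihood_ratio(seq, pwm, background=BACKGROUND):
    """log10 of P(seq | matrix model) / P(seq | background model)."""
    p_model = model_probability(seq, pwm)
    p_background = background ** len(seq)
    if p_model == 0:
        return float("-inf")  # sequence is impossible under the matrix model
    return math.log10(p_model / p_background)


seq = "AGATTGATCT"
p_model = model_probability(seq, PWM)
p_background = BACKGROUND ** len(seq)
llr = log_likelihood_ratio(seq, PWM)
print(f"{p_model:.4f} {p_background:.2e} {llr:.1f}")  # 0.0048 9.54e-07 3.7
```

A positive LLR means the sequence is more probable under the motif model than under the background; where to set a cutoff is a separate choice that this sketch does not address.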

Is the sequence A A A T T G A T C T a match to this matrix?

P(sequence | matrix model) = (0.4)(0)(1.0)(1.0)(0.3)(1.0)(1.0)(1.0)(0.2)(0.2) = 0

** If your PWM was trained on a small sample set, you might have missed some examples = overfitting of the matrix (i.e. too specific)

Pseudo-counts: protecting against overfitting due to small sample sizes

Without pseudo-counts:

Position   1    2    3    4    5    6    7    8    9    10
G          0    1.0  0    0    0.7  1.0  0    0    0.4  0.8
A          0.4  0    1.0  0    0    0    1.0  0    0    0
T          0.6  0    0    1.0  0.3  0    0    1.0  0.4  0.2
C          0    0    0    0    0    0    0    0    0.2  0

Add 1 count to each base at each position, then divide by n + 4 (where n is the number of aligned sites)
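The add-one rule can be sketched in Python as follows, assuming the matrix is rebuilt from the seven example sites (n = 7, so each count is divided by 11). The function names and the `pseudocount` parameter are illustrative, not from the lecture. With pseudo-counts, the sequence A A A T T G A T C T no longer scores exactly 0.

```python
# The seven aligned example sites (Site 1 to Site 7).
SITES = [
    "AGATGGATGG",
    "TGATTGATGT",
    "TGATGGATGG",
    "AGATTGATCG",
    "TGATGGATTG",
    "TGATGGATTG",
    "AGATGGATTG",
]
BASES = "GATC"


def pseudocount_matrix(sites, pseudocount=1):
    """Frequency matrix with `pseudocount` added to every base at every position."""
    n = len(sites)
    length = len(sites[0])
    matrix = {b: [] for b in BASES}
    for i in range(length):
        column = [site[i] for site in sites]
        for b in BASES:
            # (observed count + 1) / (n + 4) when pseudocount is 1
            matrix[b].append((column.count(b) + pseudocount) / (n + 4 * pseudocount))
    return matrix


def model_probability(seq, pwm):
    """Joint probability of seq under the PWM, positions assumed independent."""
    p = 1.0
    for i, base in enumerate(seq):
        p *= pwm[base][i]
    return p


pwm = pseudocount_matrix(SITES)
p = model_probability("AAATTGATCT", pwm)
print(pwm["A"][1])  # 1/11: A at position 2 is now rare rather than impossible
print(p)            # small but non-zero
```

The columns still sum to 1, since the four added counts are matched by the extra 4 in the denominator.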

Motif finding methods and algorithms

Given a set of n promoters of n coregulated genes, find a motif common to the promoters. Both the PWM and the motif sequences are unknown.

Common methods:

1. Enumeration: Simplest case: look at the frequency of all n-mers
   * Finds the global optimum, since it can search the entire space

2. EM algorithms (MEME): Iteratively home in on the most likely motif model; can simultaneously identify the motif and find examples of the motif

3. Gibbs sampling methods (AlignACE, BioProspector): Iteratively replace (‘sample’) sites to retrain the matrix
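The enumeration approach (method 1) can be sketched in a few lines of Python. This is only an illustration of the counting step: the toy upstream sequences are invented for the example, the function name `kmer_sequence_counts` is hypothetical, and a real analysis would also compare each count against a background expectation before calling a motif significant.

```python
from collections import Counter


def kmer_sequence_counts(sequences, k):
    """Count, for every k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set, so a k-mer repeated within one sequence is counted once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


# Hypothetical toy upstream regions, invented for illustration only.
promoters = [
    "TTGACCAATGA",
    "ACCAATCGTTA",
    "GGTCCAATAAC",
    "TACGGCCAATT",
]

counts = kmer_sequence_counts(promoters, 5)
print(counts.most_common(1))  # [('CCAAT', 4)]
```

Because every k-mer is examined, the search is exhaustive, but the number of possible k-mers grows as 4^k, which is one reason the EM and Gibbs sampling approaches are used for longer or more degenerate motifs.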
