MW  12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano

39
http://cs273a.stanford.edu [BejeranoFall15/16] 1 MW 1:30-2:50pm in Clark S361* (behind Peet’s) Profs: Serafim Batzoglou & Gill Bejerano CAs: Karthik Jagadeesh & Johannes Birgmeier CS273A Lecture 3: Non Coding Genes

description

CS273A. Lecture 3: Protein coding genes. MW  12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos Achlioptas. Announcements. http://cs273a.stanford.edu/ is up Course guidelines, lecture slides, etc. Communications via Piazza - PowerPoint PPT Presentation

Transcript of MW  12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano

Page 1: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 1

MW  1:30-2:50pm in Clark S361* (behind Peet’s)Profs: Serafim Batzoglou & Gill BejeranoCAs: Karthik Jagadeesh & Johannes Birgmeier* Mostly: track on website/piazza

CS273A

Lecture 3: Non Coding Genes

Page 2: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

• http://cs273a.stanford.edu/– Lecture slides, problem sets, etc.

• Course communications via Piazza– Auditors please sign up too

• Second Tutorial this Fri 10/2. UCSC genome browser. We recommend bringing your laptop.

•We are starting to take attendance today.

http://cs273a.stanford.edu [BejeranoFall15/16] 2

Announcements

Page 3: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 3

“non coding” RNAs (ncRNA)

Page 4: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

4

Central Dogma of Biology:

Page 5: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

5

Active forms of “non coding” RNA

reverse transcription(of eg retrogenes)

long non-coding RNA

microRNA

rRNA, snRNA, snoRNA

Page 6: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

6

What is ncRNA?• Non-coding RNA (ncRNA) is an RNA that functions without being

translated to a protein.• Known roles for ncRNAs:

– RNA catalyzes excision/ligation in introns.– RNA catalyzes the maturation of tRNA.– RNA catalyzes peptide bond formation.– RNA is a required subunit in telomerase.– RNA plays roles in immunity and development (RNAi).– RNA plays a role in dosage compensation.– RNA plays a role in carbon storage.– RNA is a major subunit in the SRP, which is important in protein trafficking.– RNA guides RNA modification.

– RNA can do so many different functions, it is thought in the beginning there was an RNA World, where RNA was both the information carrier and active molecule.

Page 7: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 7

“non coding” RNAs (ncRNA) subtype:

Small structural RNAs (ssRNA)

Page 8: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

8

AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCA

ssRNA Folds into Secondary and 3D Structures

P 6b

P 6a

P 6

P 4

P 5P 5a

P 5b

P 5c

120

140

160

180

200

220

240

260

AAU

UGCGGG

AAA

GGGGUCA

ACAGCCG UUCAGU

ACCA

AGUCUCAGGGGAAA

CUUUGAGAU

GGCCUUGCA A A G G G U A U

GGUAA

UA AGC

UGACGGACA

UGGUCC

UAA

CCA CGCA

GC

CAA

GUCC

UAA G

UCAACAG

AU C U

UCUGUUGAU

AU

GGAU

GC

AGU

UC A

Cate, et al. (Cech & Doudna).(1996) Science 273:1678.

Waring & Davies. (1984) Gene 28: 277.

Computational Challenge: predict 2D & 3D structure from sequence (1D).

Page 9: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

For example, tRNA Activity

Complementation is a “brilliant” way for the Genome to “get things done”.

Replication, Transcription, Translation, …Compliment(Compliment(Identity)) = Identity

Page 10: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

tRNA Structure

Page 11: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

ssRNA structure rules• Canonical basepairs:

– Watson-Crick basepairs:• G ≡ C• A = U

– Wobble basepair:• G − U

• Stacks: continuous nested basepairs. (energetically favorable)

• Non-basepaired loops:– Hairpin loop– Bulge– Internal loop– Multiloop

• Pseudo-knots

Page 12: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Ab initio RNA structure prediction: lots of Dynamic Programming

• Objective: Maximizing # of base pairings (Nussinov et al, 1978)

simple model:(i, j) = 1 iff GC|AU|GUfancier model:GC > AU > GU

Page 13: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Dynamic programming Compute S(i,j) recursively (dynamic

programming)– Compares a sequence against itself in a dynamic

programming matrix

Three steps

Page 14: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

1. Initialization

Example:

GGGAAAUCC

G G G A A A U C CG 0G 0 0G 0 0A 0 0A 0 0A 0 0U 0 0C 0 0C 0 0

the main diagonal

the diagonal belowL: the length of input sequence

Page 15: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

2. RecursionG G G A A A U C C

G 0 0 0 0G 0 0 0 0 0G 0 0 0 0 0A 0 0 0 0 ?A 0 0 0 1A 0 0 1 1U 0 0 0 0C 0 0 0C 0 0

Fill up the table (DP matrix) -- diagonal by diagonal

j

i

Page 16: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

3. Traceback

G G G A A A U C CG 0 0 0 0 0 0 1 2 3G 0 0 0 0 0 0 1 2 3G 0 0 0 0 0 1 2 2A 0 0 0 0 1 1 1A 0 0 0 1 1 1A 0 0 1 1 1U 0 0 0 0C 0 0 0C 0 0

The structure is:

What are the other “optimal” structures?

Page 17: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Pseudoknots drastically increase computational complexity

Page 18: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 18

Objective: Minimize Secondary StructureFree Energy at 37 OC:

C G U U U G G GUU

CACAAACG

-2 .0

-2 .1

-0 .9

-0 .9

-1 .8

-1 .6

+ 5 .0

G helix = GC GG C

+ G

G UC A

+ 2G

U UA A

+ G

U GA C

=

-2.0 kcal/mol - 2.1 kcal/mol + 2x(-0.9) kcal/mol - 1.8 kcal/mol = -7.7 kcal/mol

Ghairpin loop = G

initiation (6 nucleotides) + Gmismatch

G GC A

=

5.0 kcal/mol - 1.6 kcal/mol = 3.4 kcal/mol

Gtotal = G

hairpin + Ghelix = 3.4 kcal/mol - 7.7 kcal/mol = -4.3 kcal/mol

Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.

Instead of (i, j), measure and sum energies:

Page 19: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Zuker’s algorithm MFOLD: computing loop dependent energies

Page 20: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Bafna 1

RNA structure• Base-pairing defines a secondary

structure. The base-pairing is usually non-crossing.

• In this example the pseudo-knot makes for an exception.

Page 21: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

S aSu

S cSg

S gSc

S uSa

S a

S c

S g

S u

S SS

1. A CFG

S aSu

acSgu

accSggu

accuSaggu

accuSSaggu

accugScSaggu

accuggSccSaggu

accuggaccSaggu

accuggacccSgaggu

accuggacccuSagaggu

accuggacccuuagaggu

2. A derivation of “accuggacccuuagaggu”3. Corresponding structure

Stochastic context-free grammar

Page 22: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

ssRNA transcription• ssRNAs like tRNAs are usually encoded by short

“non coding” genes, that transcribe independently.• Found in both the UCSC “known genes” track, and

as a subtrack of the RepeatMasker track

http://cs273a.stanford.edu [BejeranoFall15/16] 22

Page 23: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 23

“non coding” RNAs (ncRNA) subtype:

microRNAs (miRNA/miR)

Page 24: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 24

MicroRNA (miR)

mRNA

~70nt ~22nt miR match to target mRNAis quite loose.

a single miR can regulate the expression of hundreds of genes.

Page 25: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 25

MicroRNA Transcription

mRNA

Page 26: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 26

MicroRNA Transcription

mRNA

Page 27: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 27

MicroRNA (miR)

mRNA

~70nt ~22nt miR match to target mRNAis quite loose.

a single miR can regulate the expression of hundreds of genes.

Computational challenges:Predict miRs.Predict miR targets.

Page 28: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 28

MicroRNA Therapeutics

mRNA

~70nt ~22nt miR match to target mRNAis quite loose.

a single miR can regulate the expression of hundreds of genes.

Idea: bolster/inhibit miR production tobroadly modulate protein productionHope: “right” the good guys and/or “wrong” the bad guysChallenge: and not vice versa.

Page 29: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 29

Other Non Coding Transcripts

Page 30: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

lncRNAs (long non coding RNAs)

http://cs273a.stanford.edu [BejeranoFall15/16] 30

Don’t seem to fold into clear structures (or only a sub-region does).Diverse roles only now starting to be understood.Example: bind splice site, cause intron retention.

Challenges: 1) detect and 2) predict function computationally

Page 31: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

X chromosome inactivation in mammals

X X X Y

X

Dosage compensation

Page 32: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Xist – X inactive-specific transcript

Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

Page 33: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

System output measurementsWe can measure non/coding gene expression!(just remember – it is context dependent)1. First generation mRNA (cDNA) and EST sequencing:

http://cs273a.stanford.edu [BejeranoFall15/16] 33

In UCSC Browser:

Page 34: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

2. Gene Expression Microarrays (“chips”)

http://cs273a.stanford.edu [BejeranoFall15/16] 34

Page 35: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

3. RNA-seq“Next” (2nd) generation sequencing.

http://cs273a.stanford.edu [BejeranoFall15/16] 35

Page 36: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 36

Transcripts, transcripts everywhere

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown

Page 37: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

Or are they?

http://cs273a.stanford.edu [BejeranoFall15/16] 37

Page 38: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 38

The million dollar question

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown

Leaky tx?

Functional?

Page 39: MW   12:50-2:05pm  in Beckman B302 Profs: Serafim  Batzoglou  & Gill  Bejerano

http://cs273a.stanford.edu [BejeranoFall15/16] 39

Gene Finding II: technology dependenceChallenge:“Find the genes, the whole genes, and nothing but the genes”

We started out trying to predict genes directly from the genome.When you measure gene expression, the challenge changes:

Now you want to build gene models from your observations.These are both technology dependent challenges.The hybrid: what we measure is a tiny fraction of the space-time state

space for cells in our body. We want to generalize from measured states and improve our predictions for the full compendium of states.