RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction...

Post on 20-Jan-2016

RNA Secondary Structure Prediction

16S rRNA

RNA Secondary Structure

Hairpin loop

Junction (Multiloop)

Bulge

Single-Stranded

Interior Loop

Stem

Image: Wuchty

Pseudoknot

Dangling end

RNA secondary structure

[Figure: a hairpin whose loop (residues G A A A G G) is closed by a stem of base pairs A-U, U-G, C-G, A-U, G-C. Labels: Loop, Stem, wobble pair (U-G), canonical pair.]

Legitimate structure

Pseudoknots

RNA secondary structure representation

Non-canonical interactions of RNA secondary-structure elements

Pseudoknot

Kissing hairpins

Hairpin-bulge contact

These patterns are excluded from the prediction schemes as their computation is too intensive.
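A pseudoknot can be recognized as a pair of base pairs that cross each other. The sketch below is an illustration, not part of the original slides: it assumes a structure is given as a list of 0-based index pairs, and the helper name `has_pseudoknot` is hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort by it.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(norm):
        for k, l in norm[a + 1:]:
            # After sorting, k >= i; the pairs cross when (k, l) opens
            # inside (i, j) but closes outside it.
            if i < k < j < l:
                return True
    return False


nested = [(0, 9), (1, 8), (2, 7)]   # a plain stem: every pair is nested
crossing = [(0, 5), (3, 8)]         # second pair opens inside, closes outside
print(has_pseudoknot(nested))
print(has_pseudoknot(crossing))
```

Structures for which this check returns False are the nested ones that the prediction schemes described here are restricted to.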

“Rules for 2D RNA prediction”

• Base Pairs in stems: GOOD

• Additional possible assumptions: e.g., G:C better than A:U

• Bulges, Loops: BAD

• Canonical interactions (base pairs, stems, bulges, loops): OK

• Non-canonical interactions (pseudoknots, kissing hairpins): Forbidden

• The more interactions, the better
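Taken literally, "the more interactions, the better" with crossing pairs forbidden is a base-pair maximization problem. The sketch below uses a Nussinov-style recursion, which is a technique swapped in here for illustration and not named in the slides. The pairing set (A:U, G:C, G:U) is the one the slides list as allowed; the minimum hairpin loop size of 3 and the function name `max_base_pairs` are assumptions.

```python
from functools import lru_cache

# Allowed pairs: Watson-Crick A:U and G:C, plus the G:U wobble pair.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
           ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style sketch).

    min_loop is the assumed minimum number of unpaired bases in a hairpin loop.
    Uses recursion, so it is only meant for short sequences.
    """
    seq = seq.upper()
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Too short to hold a pair (also covers empty intervals, i > j).
        if j - i <= min_loop:
            return 0
        # Case 1: position j is unpaired.
        score = best(i, j - 1)
        # Case 2: position j pairs with some k, splitting the interval in two.
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED:
                score = max(score, best(i, k - 1) + 1 + best(k + 1, j - 1))
        return score

    return best(0, n - 1) if n else 0


print(max_base_pairs("GGGAAAUCC"))
```

This scores every pair equally. Real predictors replace the pair count with free energies, so that, for example, G:C can count for more than A:U and loops carry a penalty.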

Predicting RNA secondary Structure

• Allowed base pairing rules (Watson-Crick A:U, G:C, and Wobble pair G:U)

• Sequences may form different structures

• A free energy value is associated with each possible structure

• Predict the structure with the minimal free energy (MFE)

Simplifying Assumptions for Structure Prediction

• RNA folds into one minimum free-energy structure.

• There are no non-canonical interactions.

• The energy of a particular base pair in a double-stranded region is sequence independent: neighbors have no influence.

Was solved by dynamic programming (Zuker and Stiegler, 1981)

Sequence-dependent free-energy (the nearest neighbor model)

[Figure: two hairpin structures, each with U U in the loop and a 3’ single-stranded tail; the stems differ at one position (G C versus U A).]

Example values (kcal/mol) for stacked base pairs:

GC    GC    GC    GC
AU    GC    CG    UA
-2.3  -2.9  -3.4  -2.1
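In the nearest-neighbor model the energy of a stem is the sum over adjacent (stacked) base pairs, not over individual pairs. A minimal sketch using only the four example values above; the table orientation (upper pair, lower pair), the pair-string encoding, and the name `stem_energy` are assumptions made for illustration.

```python
# Stacking free energies in kcal/mol, copied from the example values.
# Key: (upper pair, lower pair) as the two rows are read off the slide.
STACK = {
    ("GC", "AU"): -2.3,
    ("GC", "GC"): -2.9,
    ("GC", "CG"): -3.4,
    ("GC", "UA"): -2.1,
}


def stem_energy(pairs):
    """Sum stacking energies over consecutive base pairs of a stem.

    Raises KeyError for a stack that is not in the small example table.
    """
    total = 0.0
    for upper, lower in zip(pairs, pairs[1:]):
        total += STACK[(upper, lower)]
    return total


# Two stems of the same length: only the last pair differs.
print(round(stem_energy(["GC", "GC", "CG"]), 1))
print(round(stem_energy(["GC", "GC", "UA"]), 1))
```

The two stems have the same length and differ in a single pair, yet their energies differ, which is the sequence dependence that the simplified model ignores.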

Free energy computation

[Figure: a hairpin with a 5’ dangling end, a 1 nt bulge and a 4 nt loop, annotated with its free-energy contributions]

-0.3 5’ dangling
-2.1 stacking
-1.8 stacking
-0.9 stacking
-1.8 stacking
-2.9 stacking
+3.3 (1 nt bulge)
-2.9 stacking
-1.1 mismatch of hairpin
+5.9 (4 nt loop)

ΔG = -4.6 kcal/mol
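The total is simply the sum of the individual contributions. The sketch below adds them up, assuming that the -0.3 value belongs to the 5’ dangling end and is counted once (the extracted text shows it twice, but only a single -0.3 reproduces the stated total). The order of the stacking terms is not meant to reflect their position in the stem.

```python
# Free-energy contributions in kcal/mol, as annotated on the figure.
# Assumption: -0.3 is the 5' dangling-end term and appears once.
contributions = [
    ("5' dangling end", -0.3),
    ("stacking", -2.1),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.9),
    ("1 nt bulge", +3.3),
    ("stacking", -2.9),
    ("mismatch of hairpin", -1.1),
    ("4 nt loop", +5.9),
]

total = round(sum(value for _, value in contributions), 1)
stabilizing = round(sum(v for _, v in contributions if v < 0), 1)
destabilizing = round(sum(v for _, v in contributions if v > 0), 1)

print(f"stabilizing:   {stabilizing} kcal/mol")
print(f"destabilizing: +{destabilizing} kcal/mol")
print(f"total:         {total} kcal/mol")
```

Stacking and the other negative terms stabilize the structure, while the bulge and the hairpin loop are penalties.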

Prediction Programs

• Mfold: http://www.bioinfo.rpi.edu/applications/mfold/old/rna/form1.cgi

• Vienna RNA Secondary Structure Prediction: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi

Mfold - Suboptimal Folding

• For any sequence of N nucleotides, the expected number of structures is greater than 1.8^N

• A sequence of 100 nucleotides has ~3 x 10^25 possible folds. If a computer can calculate 1000 folds/second, it would take 10^15 years (age of universe = ~10^10 years)!

• Mfold generates suboptimal folds whose free energies fall within a certain range of values. Many of these structures differ only in trivial ways. These suboptimal folds can still be useful for designing experiments.
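The numbers in the list above can be checked with a few lines of arithmetic: 1.8^100 is roughly 3 x 10^25, and at 1000 folds per second that is on the order of 10^15 years. The same sketch also shows one simple way to keep only folds within an energy window of the minimum. The absolute window in kcal/mol, the function names, and the candidate structures with their energies are hypothetical and are not Mfold output or Mfold's actual selection rule.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def fold_count_lower_bound(n, base=1.8):
    """Lower bound on the number of structures for n nucleotides (1.8^n)."""
    return base ** n


def years_to_enumerate(n, folds_per_second=1000.0):
    """Years needed to evaluate every fold at the given rate."""
    return fold_count_lower_bound(n) / folds_per_second / SECONDS_PER_YEAR


def within_energy_window(candidates, window):
    """Keep (structure, energy) pairs within `window` kcal/mol of the minimum."""
    mfe = min(energy for _, energy in candidates)
    return [(s, e) for s, e in candidates if e <= mfe + window]


print(f"{fold_count_lower_bound(100):.1e} folds")
print(f"{years_to_enumerate(100):.1e} years")

# Hypothetical candidates: dot-bracket strings with made-up energies.
hypothetical = [
    ("((((....))))", -4.6),
    ("(((......)))", -4.1),
    ("............", 0.0),
]
print(within_energy_window(hypothetical, window=1.0))
```

Exhaustive enumeration is clearly out of reach, which is why dynamic programming is used to find the minimum and a bounded set of near-optimal alternatives.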

Example:

Output: