Advanced Course: Shapes
Robert Giegerich
Motivation
Lost in Folding Space
Abstraction comes to the rescue
Abstract shapes
Defining shape abstractions
Properties of the shape space
RNAshapes
Simple shape analysis
Complete probabilistic shape analysis
Shape Probabilities
Application: Shape-based indexing
Application: Shape-based matching
RNA Abstract Shape Analysis
Robert Giegerich
Faculty of Technology & Center of Biotechnology
Bielefeld University
EMBO Practical Course on Computational RNA Biology, Cargèse, April 2010
Robert Giegerich Advanced Course: Shapes
Where do we stand ...
1 Thermodynamic model (X. Flamm)
2 MFE folding, optimal structure, fallacies (G. Steger)
3 representative structural alternatives
4 structure prediction from multiple sequences (D.Mathews)
5 structure comparison (D. Mathews)
6 search by structure (I. Meyer, P. Gardner)
7 . . .
1 Motivation: Lost in Folding Space; Abstraction comes to the rescue
2 Abstract shapes: Defining shape abstractions; Properties of the shape space
3 RNAshapes: Simple shape analysis; Complete probabilistic shape analysis; Shape probabilities
4 Application: Shape-based indexing
5 Application: Shape-based matching
Better than optimal . . . (1)
Can we get better or more information from thermodynamic folding than the MFE structure?
How accurate is the MFE structure anyway?
2004 Mfold evaluation by Gutell Lab
Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR.: Evaluation of the
suitability of free-energy minimization using nearest-neighbor energy
parameters for RNA secondary structure prediction. BMC Bioinformatics.
2004 Aug 5;5:105.
Compares MFE foldings to structures derived by comparative analysis and proven by experimental techniques. Findings:
base pair accuracy between about 20% and 71%
no improvement from recently updated thermodynamic parameters
note: the study did not check for good near-optimal solutions
Base pair accuracy – what does it mean?
[Figure: a reference structure and two structures A and B, each with only 4 out of 20 BP correct.]
reference:   ((((...)))) ((((...))))...((((...))))...((((...))))...((((....))))
structure A: ((((...)))) ..............((((((((((((((((........))))))))))))))))
structure B: ((((...)))) ............((((((((((((((((........))))))))))))))))..
A and B are at the same distance 16 from the reference.
A and B are also at distance 16 from each other, but they have the same "shape".
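To make "4 out of 20 BP correct" concrete, the following Python sketch parses dot-bracket strings into base-pair sets and counts how many pairs of one structure are recovered by another. It rebuilds the figure's three structures with the layout gap after position 11 removed, and it reads "distance" as the number of base pairs of one structure missing from the other; that reading, and the helper names `base_pairs`, `recovered` and `missed`, are our assumptions for illustration.

```python
def base_pairs(db: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, c in enumerate(db):
        if c == "(":
            stack.append(j)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def recovered(reference: str, predicted: str) -> int:
    """Number of reference base pairs that also occur in the prediction."""
    return len(base_pairs(reference) & base_pairs(predicted))


def missed(reference: str, predicted: str) -> int:
    """Number of reference base pairs absent from the prediction."""
    return len(base_pairs(reference) - base_pairs(predicted))


# Rebuild the figure's structures (65 positions each, layout gap removed).
hairpin = "((((...))))"
ref = hairpin * 2 + ("..." + hairpin) * 2 + "..." + "((((....))))"
a = hairpin + "." * 14 + "(" * 16 + "." * 8 + ")" * 16
b = hairpin + "." * 12 + "(" * 16 + "." * 8 + ")" * 16 + ".."

for name, s in (("A", a), ("B", b)):
    print(name, recovered(ref, s), "of", len(base_pairs(ref)), "correct,",
          missed(ref, s), "missed")
print("A vs B:", missed(a, b), "pairs of A missing from B")
```

Only the first hairpin is shared by all three structures, so each comparison leaves 16 pairs unmatched, even though A and B look almost identical by eye.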
Accuracy of MFE folding . . .
RNA folding struggles with
adequacy of thermodynamic parameters . . . ?
uncovered structural motifs – pseudoknots, kissing hairpins!
dynamics of interaction with other molecules . . . ?
RNA transcript processing . . . ?
folding kinetics (co-transcriptional folding) . . . ?
...
physical properties of the folding space . . . !
The problem to be solved
We want more comprehensive information about an RNA molecule’s foldings than just its MFE structure.
Lost in folding space (1)
The folding space of a given sequence is LARGE:
number of foldings is exponential in sequence length
number of near-optimal foldings is exponential in the energy window
Structure asymptotics:
$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^n$
S(n) is the number of secondary structures for ALL sequences of length n. A typical tRNA of 74 nt has about 4 million feasible structures. Consider the 111 “best” structures, each with 27 to 28 bp:
gggcccauagcucagugguagagugccuccuuugcaaggaggaugcccuggguucgaaucccagugggucca
((((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))).
((((((((.((...)).))((.((((((((((...))))))).))).))))))))((.(((....)))))..
((((((((.((...)).))((.((((((((((...))))))).))).))))))))((..(((...)))))..
((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..
.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..
((((((((.......((((((.((((((((((...))))))).))).)))(((....)))))))))))))).
(((((((((((((((.(((...((.(((((((...))))))))))))))))))).........)))))))).
((((((((.((...))(((...((.(((((((...)))))))))))).(((((....)).))))))))))).
((((((((.((...))(((...((.(((((((...)))))))))))).(((((....))).)))))))))).
((((((((.((...))((....((((((((((...))))))).)))))(((((....)).))))))))))).
((((((((.((...))((....((((((((((...))))))).)))))(((((....))).)))))))))).
((((((((.((...))(((.((((((((((((...))))))).))((...)))))..)))...)))))))).
((((((((.((...))(((.((((.(((((((...)))))))))(((...)))))..)))...)))))))).
((((((((.((...))(((((.((((((((((...))))))).))).))).((....))))..)))))))).
((((((((.((...))((.((.((((((((((...))))))).))).)).(((....))))).)))))))).
((((((((.((...))(((((.((((((((((...))))))).))).)))((......)))).)))))))).
((((((((.((...))(((((.((((((((((...))))))).))).))).((....)).)).)))))))).
((((((((.((...))(((((.((.(((((((...)))))))))...)))(((....))))).)))))))).
((((((((.((...))(((((..(((((((((...))))))).))..)))(((....))))).)))))))).
((((((((.((...))(((((.(((..((((((....))))))))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((..((((((...)).))))))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((..((((((...))).)))))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((((((.....)))))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((((.((....)))))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((((..((...)))))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((((.((...)).)))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.((((((.(((....)))))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.((((((..(((...)))))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.((((((.(((...))).))).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((((.((...)))).)).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((((.((((...)))).)).))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((.((((((...))))))..))).)))(((....))))).)))))))).
((((((((.((...))(((((.(((.((((((...)))).)).))).)))(((....))))).)))))))).
((((((((.((...))(((((.((.(((((((...)))))))..)).)))(((....))))).)))))))).
((((((((.((...))((((..((((((((((...))))))).)))..))(((....))))).)))))))).
((((((((.(((....)))((.((((((((((...))))))).))).))((((....))))..)))))))).
((((((((.(((....)))((.((((((((((...))))))).))).))((((....)).)).)))))))).
((((((((.((((.((.((...))))((((((...)))))))).))..(((((....)).))))))))))).
((((((((.((((.((.((...))))((((((...)))))))).))..(((((....))).)))))))))).
(((((((...((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).
(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).
(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).
(((((((...(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).
(((((((..((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).
(((((((..((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).
(((((((..((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).
(((((((..((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).
(((((((((.((....))))..((((((((((...))))))).))).((((((....)).))))))))))).
(((((((((.((....))))..((((((((((...))))))).))).((((((....))).)))))))))).
(((((((((.((....))))..((((((((((...))))))).))).((((((....)))).))))))))).
(((((((((..((...))))..((((((((((...))))))).))).((((((....)).))))))))))).
(((((((((..((...))))..((((((((((...))))))).))).((((((....))).)))))))))).
(((((((((..((...))))..((((((((((...))))))).))).((((((....)))).))))))))).
(((((((((((...))..))..((((((((((...))))))).))).((((((....)).))))))))))).
(((((((((((...))..))..((((((((((...))))))).))).((((((....))).)))))))))).
(((((((((((...))..))..((((((((((...))))))).))).((((((....)))).))))))))).
(((((((((..((.((.((...))))((((((...))))))))))..((((((....)).))))))))))).
(((((((((..((.((.((...))))((((((...))))))))))..((((((....))).)))))))))).
(((((((((..((.((.((...))))((((((...))))))))))..((((((....)))).))))))))).
(((((((((((...))((((...))))((((((....))))))))..((((((....)).))))))))))).
(((((((((((...))((((...))))((((((....))))))))..((((((....))).)))))))))).
(((((((((((...))((((...))))((((((....))))))))..((((((....)))).))))))))).
(((((((((((...))((((...))))((((((...)).))))))..((((((....)).))))))))))).
(((((((((((...))((((...))))((((((...)).))))))..((((((....))).)))))))))).
(((((((((((...))((((...))))((((((...)).))))))..((((((....)))).))))))))).
(((((((((((...))((((...))))((((((...))).)))))..((((((....)).))))))))))).
(((((((((((...))((((...))))((((((...))).)))))..((((((....))).)))))))))).
(((((((((((...))((((...))))((((((...))).)))))..((((((....)))).))))))))).
(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).
(((((((..((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).
(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).
((((((...(((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).
((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).
((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).
((((((...((((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).
((((((...((....((((((.((((((((((...))))))).))).)))(((....)))))))))))))). [ [][]]
(((((((((((((((.(((...((.(((((((...)))))))))))))))))))...))......)))))). [ ]
((((((..(((((((.(((...((.(((((((...)))))))))))))))))))...((....)))))))).
(((((..((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).
(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).
(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).
(((((..((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).
(((((..((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))). [[][][]]
(((((..((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).
(((((..((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).
(((((..((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).
(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).
(((((..((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).
(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).
(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).
((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...))((....)))))).
((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...)).((...)))))).
((((.(((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))).)))).
(((.((.((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).
(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).
(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).
(((.((.((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).
(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).
(((.((.((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).
(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).
(((.((.((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).
(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).
(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).
(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).
(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).
(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).
((((((((.((...)).))((.((((((((((...))))))).))).)))))(((..((....)))))))).
(((...(((((((((.(((...((.(((((((...)))))))))))))))))))...))(((...)))))).
(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).
((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).
.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).
Lost in folding space (2)
What we observe from the simple tRNA example:
LARGE number of close-to-optimal foldings
FEW structural classes holding many similar foldings
Can we condense the folding space to good representatives of these classes?
Better than optimal . . . (2)
Alternatives to a single MFE structure prediction:
BP probabilities and dotplots (McCaskill)
sampling of near-optimal structures (Mfold)
complete enumeration within a threshold (RNAsubopt)
stochastic sampling and clustering a posteriori (Sfold)
classified folding by abstract shape (RNAshapes)
Classification by abstract shape
[Figure: secondary structure drawings annotated with loop types: multiple loop, stacking region, hairpin loop, internal loop, bulge loop (left), bulge loop (right).]
What is a shape? LIKE this ... or NOT like this ...?
Levels of abstraction
Level 0: full structure
Level 1: all types of loops
Level 3: all helix interruptions
Level 4: multi- and internal loops, no bulges
Level 5: stem arrangement only
String representation of shapes
Example sequence (56 nt) and structure:
CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
((((((...(((..(((...))))))...(((..((.....))..)))))))))..
Shape Type 5: [[][]]
Shape Type 4: [[][[]]]
Shape Type 3: [[[]][[]]]
Shape Type 2: [[_[]][_[]_]]
Shape Type 1: [_[_[_]]_[_[_]_]]
[Figure: secondary structure drawing of the example, positions 1 to 56.]
Formalizing the notion of (abstract) shape
Shape abstraction retains nesting and adjacency of stems
Shape abstraction disregards all sizes (of stems, loops, . . . )
Shape abstraction may retain or disregard presence and type of bulges and internal loops, i.e. helix interruptions
RNAshapes provides shape abstraction levels 1 through 5
Shape abstraction mathematics
General:
tree-like domains of structures F and shapes P
tree homomorphism $\pi : F \to P$
For each sequence s:
folding space of sequence s: $F(s)$
shape space of sequence s: $P(s) = \pi(F(s))$
shape class of p in F(s): $F(s, p) = \{x \mid x \in F(s),\ \pi(x) = p\}$
shape representative structure: shrep = class member of minimal free energy, formally
$\mathrm{shrep}(s, p) = \arg\min_{x \in F(s, p)} E_x$
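These definitions translate almost literally into code. The sketch below assumes the folding space is given as an explicit list of (dot-bracket, free energy) pairs, which is only practical for toy inputs; the energies are made-up numbers, and `shape5`, `shape_classes` and `shreps` are hypothetical helpers written for this illustration (a compact level-5 abstraction that merges any helix enclosing exactly one other helix).

```python
from collections import defaultdict


def shape5(db: str) -> str:
    """Compact level-5 abstraction pi(x): stem arrangement only."""
    pt, stack = [-1] * len(db), []
    for j, c in enumerate(db):
        if c == "(":
            stack.append(j)
        elif c == ")":
            i = stack.pop()
            pt[i], pt[j] = j, i

    def outer(lo, hi):
        k, out = lo, []
        while k <= hi:
            if pt[k] > k:
                out.append((k, pt[k]))
                k = pt[k] + 1
            else:
                k += 1
        return out

    def helix(i, j):
        inner = outer(i + 1, j - 1)
        while len(inner) == 1:               # stacks, bulges, internal loops: same stem
            i, j = inner[0]
            inner = outer(i + 1, j - 1)
        return "[" + "".join(helix(k, l) for k, l in inner) + "]"

    return "".join(helix(k, l) for k, l in outer(0, len(db) - 1))


def shape_classes(folding_space):
    """Partition F(s) into the disjoint shape classes F(s, p)."""
    classes = defaultdict(list)
    for structure, energy in folding_space:
        classes[shape5(structure)].append((structure, energy))
    return dict(classes)


def shreps(folding_space):
    """shrep(s, p): the minimum free energy member of every shape class."""
    return {p: min(members, key=lambda m: m[1])
            for p, members in shape_classes(folding_space).items()}


# Hypothetical toy folding space of a 20-nt sequence (energies in kcal/mol, invented).
toy = [
    ("((((....))))........", -4.1),
    ("(((......)))........", -3.0),
    ("((((....))))((....))", -5.2),
    ("(((....)))..((....))", -4.8),
]
print(sorted(shape_classes(toy)))            # the shape space P(s)
for p, (x, e) in shreps(toy).items():
    print(p, x, e)
```

Real shape analysis avoids enumerating F(s) altogether; this sketch only mirrors the set definitions.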
Structures and shapes as trees and strings
[Figure: the structure below drawn as a tree (level 0) and abstracted to shape trees at levels 3 and 5; node labels include sr, bl, hl, ml, HE and ML.]
Level 0 (structure): ((((.(((....)))((...(...)))))))
Level 1 shape: [_[_][_[_]]]
Level 3 abstract shape: [[][[]]]
Level 5 abstract shape: [[][]]
Shape algorithmics
Implementation of shape analysis:
shape abstractions are tree homomorphisms
integrate well with DP algorithms
allows for a priori rather than a posteriori analysis
compute shapes in parallel with energy
perform analyses on a per-shape basis
Any RNA folding program can implement shape abstraction. Currently: use RNAshapes.
Properties of shapes and shreps
Good properties:
shape classes are disjoint
shreps are interesting representatives
shapes have sequence-independent representation
shapes are meaningful across different sequences (of different length)
shapes and shreps can be computed efficiently
Bad properties:
shapes are too abstract
shapes are not abstract enough
Simple shape analysis with RNAshapes
The three top shreps of our tRNA example:
Shape      GGGCCCAUAGCUCAGUGGUAGAGUGCCUCCUUUGCAAGGAGGAUGCCCUGGGUUCGAAUCCCAGUGGGUCCA
[]         (((((((((((((((.((((.....(((((((...))))))).))))))))))).........)))))))).  -35.9 kcal/mol
[[][]]     ((((((((.....((.((((.....(((((((...))))))).))))))(((.......))).)))))))).  -32.2 kcal/mol
[[][][]]   ((((((...((((.......)))).(((((((...))))))).....(((((.......))))).)))))).  -31.7 kcal/mol
Shape []
[Figure: secondary structure drawing of the shrep of shape [] for the tRNA example.]
Shape [[][]]
[Figure: secondary structure drawing of the shrep of shape [[][]] for the tRNA example.]
Shape [[][][]]
[Figure: secondary structure drawing of the shrep of shape [[][][]] for the tRNA example.]
Shape Space Statistics
Condensation of the folding space.
Structure asymptotics:
$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^n$
Level-k shape asymptotics:
$P_1(n) \approx 0.98542 \cdot n^{-3/2} \cdot 2.40591^n$
$P_5(n) \approx 2.44251 \cdot n^{-3/2} \cdot 1.32218^n$
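The structure asymptotics can be checked against exact counts. A minimal sketch, assuming the homopolymer model in which any two positions may pair (no sequence constraints) as long as the structure is nested and every hairpin loop keeps at least m = 1 unpaired base; for that model the generating function has its dominant singularity at (3 - √5)/2, whose reciprocal is the growth rate 2.618034. Splitting on whether the last base is unpaired or paired gives the recursion used below; the function names are ours.

```python
def count_structures(n_max: int, m: int = 1) -> list[int]:
    """Exact S(n) for n = 0..n_max in the homopolymer model with minimum hairpin m."""
    S = [1] * (n_max + 1)                    # S(0) = S(1) = 1: only the open chain
    for L in range(2, n_max + 1):
        # last base unpaired -> S(L-1); or paired with base k+1, which leaves
        # k bases to its left and L-2-k >= m bases enclosed by the new pair
        S[L] = S[L - 1] + sum(S[k] * S[L - 2 - k] for k in range(L - 1 - m))
    return S


def asymptotic(a: float, b: float, n: int) -> float:
    """Evaluate a * n^(-3/2) * b^n, the form shared by S(n), P1(n) and P5(n)."""
    return a * n ** -1.5 * b ** n


S = count_structures(500)
for n in (20, 74, 200, 500):
    ratio = S[n] / asymptotic(1.104366, 2.618034, n)
    print(f"n={n}: exact S(n) = {float(S[n]):.4g}, exact/asymptotic = {ratio:.3f}")

# Condensation according to the slide's formulas: structures per shape.
for n in (50, 100, 200):
    s = asymptotic(1.104366, 2.618034, n)
    p1 = asymptotic(0.98542, 2.40591, n)
    p5 = asymptotic(2.44251, 1.32218, n)
    print(f"n={n}: S/P1 = {s / p1:.3g}, S/P5 = {s / p5:.3g}")
```

The exact-to-asymptotic ratio approaches 1 only slowly (the correction is of order 1/n). With a more realistic minimum hairpin length such as m = 3 the counts grow more slowly, so the constants in the formula are specific to m = 1.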
Empirically, the numbers are much smaller for a concrete sequence. See some statistics within 5% of the MFE:
Numbers of shapes versus structures
[Figure: number of structures and of shapes (y-axis, 0 to 400) versus sequence length in nt (x-axis, 0 to 300).]
Shapes versus structures, logarithmic scale
[Figure: number of structures and of shapes on a logarithmic scale (y-axis, 0.01 to 1e+18) versus sequence length N in nt (x-axis, 0 to 120); fitted curves: structures 0.0391 · 1.3968912^N, shapes 0.2064 · 1.1067094^N.]
Homogeneity in shape classes
The “Boltzmann Ensemble” on Ice
Best k shreps
[Figure by Björn Voß: shreps of the shapes [], [[][]] and [[][][]] computed with RNAshapes.]
Complete probabilistic shape analysis
“How much would you trust a structure with a probability of $0.1 \cdot 10^{-12}$, even when it is optimal?”
Chip Lawrence, Benasque 2003 and ISMB 2007
From energy to probability
According to Boltzmann statistics, sequence $s$ has structure $x$ with probability
$$\mathrm{Prob}(x) = \frac{e^{-E_x/RT}}{Q}$$
where $E_x$ is the folding energy of $x$, $T$ is the temperature, $R$ the universal gas constant, and $Q$ the “partition function”,
$$Q = \sum_{x \in F(s)} e^{-E_x/RT}$$
Accumulated shape probabilities:
$$\mathrm{Prob}(p) = \sum_{x \in F(s),\ \pi(x) = p} \mathrm{Prob}(x) \quad \text{for all } p \in P(s)$$
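The two formulas can be sketched in Python as follows. This assumes the folding space is given as an explicit, hypothetical list of (energy, shape) pairs; a real tool sums over all of F(s) by dynamic programming instead of enumerating it. Energies are shifted by their minimum before exponentiation, which cancels in Prob(x) but avoids overflow for large |E|. The function names are illustrative only.

```python
import math
from typing import Dict, List, Tuple

R = 0.0019872   # universal gas constant, kcal/(mol*K)
T = 310.15      # temperature in K (37 degrees Celsius)


def boltzmann_probabilities(energies: List[float]) -> List[float]:
    """Prob(x) = exp(-E_x/RT) / Q over the given list of structures."""
    e_min = min(energies)
    # shifting every energy by e_min cancels in the ratio and avoids overflow
    weights = [math.exp(-(e - e_min) / (R * T)) for e in energies]
    q = sum(weights)  # (shifted) partition function
    return [w / q for w in weights]


def shape_probabilities(structures: List[Tuple[float, str]]) -> Dict[str, float]:
    """Prob(p) = sum of Prob(x) over all structures x with pi(x) = p."""
    probs = boltzmann_probabilities([energy for energy, _ in structures])
    acc: Dict[str, float] = {}
    for (_, p), prob in zip(structures, probs):
        acc[p] = acc.get(p, 0.0) + prob
    return acc


# Hypothetical toy folding space: (free energy in kcal/mol, shape)
folding_space = [(-5.0, "[]"), (-4.6, "[][]"), (-4.5, "[][]"), (-4.4, "[][]"), (-3.0, "[]")]
print(shape_probabilities(folding_space))
```

In this toy list, `[][]` accumulates roughly 0.56 of the probability although the lowest-energy structure has shape `[]`: three moderately good `[][]` structures together outweigh one slightly better `[]` structure.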
New information from shape probabilities
Overtaking: Shape probabilities may contradict energy ranking
Shape     E (kcal/mol)   P           Rank by P
[]        -22.90         0.2370279   2nd
[][][]    -22.50         0.0999191   3rd
[][]      -22.30         0.5511424   1st
The shape with the lowest-energy representative, [], ranks only 2nd by probability; [][] overtakes it.
Björn Voß
À propos “complete”
Probabilistic shape analysis is computationally expensive:
probabilities give full information about the folding space, but
we cannot compute only the k most likely shapes; all shape classes must be evaluated
computation is feasible up to about 400 nt ...
but check out RapidShapes by Stefan Janssen
Requirements
Complete probabilistic shape analysis
requires an unambiguous grammar with correct dangles at all places
applies “classified” dynamic programming
takes time $O(1.1^n \cdot n^3)$, where $n = |s|$
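A toy illustration of “classified” dynamic programming: instead of one partition function per subsequence, each table entry holds one partition function per shape class. This sketch assumes a Nussinov-style model (a hypothetical energy of -1 kcal/mol per base pair, minimum hairpin size 3, no loop or dangle energies), not the thermodynamic model RNAshapes uses, and a level-5-like abstraction. The recursion (last base unpaired, or paired with some k) is unambiguous, so every structure is counted exactly once, which is what keeps the class sums correct. `shape_partition_functions`, `close_shape` and the constants are hypothetical names.

```python
import math
from functools import lru_cache
from typing import Dict

R, T = 0.0019872, 310.15   # gas constant in kcal/(mol*K); 37 degrees Celsius in K
PAIR_ENERGY = -1.0         # hypothetical toy energy per base pair, kcal/mol
MIN_HAIRPIN = 3            # at least 3 unpaired bases enclosed by a hairpin
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def n_components(p: str) -> int:
    """Number of top-level bracket groups in a shape string."""
    depth = count = 0
    for c in p:
        if c == "[":
            if depth == 0:
                count += 1
            depth += 1
        else:
            depth -= 1
    return count


def close_shape(inner: str) -> str:
    """Shape of a base pair whose enclosed region has shape `inner`."""
    if n_components(inner) == 1:
        return inner             # stack, bulge or interior loop: same helix
    return "[" + inner + "]"     # hairpin (empty inner) or multiloop


def shape_partition_functions(seq: str) -> Dict[str, float]:
    """Classified partition function: shape -> sum of Boltzmann weights."""
    w = math.exp(-PAIR_ENERGY / (R * T))   # Boltzmann weight of one base pair

    @lru_cache(maxsize=None)
    def z(i: int, j: int) -> Dict[str, float]:
        if j < i:
            return {"": 1.0}                   # empty region: one empty structure
        out = dict(z(i, j - 1))                # case 1: base j is unpaired
        for k in range(i, j - MIN_HAIRPIN):    # case 2: base j pairs with base k
            if (seq[k], seq[j]) not in CAN_PAIR:
                continue
            for left, z_left in z(i, k - 1).items():
                for inner, z_inner in z(k + 1, j - 1).items():
                    p = left + close_shape(inner)
                    out[p] = out.get(p, 0.0) + z_left * z_inner * w
        return out

    return dict(z(0, len(seq) - 1))


seq = "GGGAAACCCAGGAAACC"              # hypothetical example sequence
classes = shape_partition_functions(seq)
q = sum(classes.values())              # equals the ordinary partition function Q
for p, zp in sorted(classes.items(), key=lambda kv: -kv[1]):
    print(repr(p), round(zp / q, 4))   # Prob(p) for each shape class
```

Summing the returned values gives the ordinary partition function, and dividing each entry by that sum gives Prob(p). The per-entry dictionaries grow with the number of shape classes, which is, roughly, where the exponential factor in the running time comes from.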
Results from complete probabilistic analysis
Some observations:
Sequence          Shape 1          Prob.        Shape 2             Prob.
lin-4 precursor   []               0.99999994
tRNA-ala          []               0.989744     [[]]                0.008994
typical mRNA      [][[][]]         0.432154     [[[][]][]]          0.149831
HIV-1 Leader      [][[][[][]]]]    0.6164       [][[[][[][]]][]]    0.3492
The RNAshapes package
Modes of operation:
Computation of low-energy shape representative structures
Computation of accumulated shape probabilities
Computation of consensus shapes
No heuristics involved.
Available at http://bibiserv.techfak.uni-bielefeld.de/RNAshapes/
Application: shape based indexing
Assume we have an ncRNA candidate in some novel organism (⇒ lecture by C. Sharma) and want to know whether it resembles something known:
main resource: Rfam database with 600 structural RNA families
families represented by curated structural alignments (cf. Rfam lecture)
search via covariance models (cf. probabilistic models lecture)
search effort $O(n^4)$ per model
Filter techniques
Filter techniques are used to skip unsuccessful searches
1 BLAST filter
2 Ravenna HMM filter
3 shape index based filtering – RNAsifter by Stefan Janssen
Details on (1) and (2) in the Rfam Database lecture
Shape index construction
Shape index based search
[Figure: the query sequence (>Query: hg17_ct_RNAzset190_s5031) is folded into its shapes, e.g. [], [[][[][]]] and [[][]][][], plus 12,156 more, which are looked up in the shape index. Index shapes shown include [][[[[]]]], [[[]][[[]]]] and [[[[]]]][[[]]], with further groups of 53,116, 59,337, 93,840 and 112,489 shapes, as well as finer-level shapes that mark unpaired regions with _, e.g. _[_[_[]]]_.]
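The filtering idea behind a shape index can be sketched as an inverted index from shapes to families. The matching rule used here (a family becomes a candidate if it shares at least one shape with the query) is an assumption for illustration and may differ from RNAsifter's actual criterion; the family names and their shape lists are hypothetical, and only the three query shapes are taken from the figure.

```python
from typing import Dict, Iterable, List, Set


def build_shape_index(family_shapes: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert family -> shapes into shape -> families (built once, offline)."""
    index: Dict[str, Set[str]] = {}
    for family, shapes in family_shapes.items():
        for p in shapes:
            index.setdefault(p, set()).add(family)
    return index


def candidate_families(index: Dict[str, Set[str]], query_shapes: Iterable[str]) -> List[str]:
    """Families sharing at least one shape with the query (assumed criterion)."""
    hits: Set[str] = set()
    for p in query_shapes:
        hits |= index.get(p, set())
    return sorted(hits)


# Hypothetical index content
index = build_shape_index({
    "family_A": ["[][[[[]]]]", "[[[]][[[]]]]"],
    "family_B": ["[]", "[[][]]"],
    "family_C": ["[[[[]]]][[[]]]"],
})
query_shapes = ["[]", "[[][[][]]]", "[[][]][][]"]   # shapes of the query from the figure
print(candidate_families(index, query_shapes))
```

Only the returned families would then be searched with the expensive covariance model; all other families are skipped.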
Filtered search performance
[Plot: search performance of the shape index variants k-best-shape-index, SS_cons-shape-index, consensus-shape-index, hybrid-shape-index, union-shape-index, RNAalifold-shape-index and k-RNAlishapes-shape-index, compared with cmsearch --hmmfilter; x axis from 0.40 to 1.00, y axis from 0.00 to 0.30]
Average run times
[Plot: average run times of RNAsifter, cmsearch, HMM-filter and BLAST-filter; x axis from 0 to 900, y axis from 0 to 800]
Shape based matching
Search by structure ...
Assume you have a (single) transcript with a well-defined structure.
How can you search for structural homologues in related organisms?
Create a specialized folding program via Locomotif at http://bibiserv.cebitec.uni-bielefeld.de/locomotif
References on abstract shape analysis
Giegerich R, Voss B, Rehmsmeier M. Abstract shapes of RNA. Nucleic Acids Research 2004, 32(15):1-9.
Voss B, Giegerich R, Rehmsmeier M. Complete probabilistic analysis of RNA abstract shapes. BMC Biology 2006, Feb 15;4(1):5.
Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R. RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 2006, Feb 15;22(4):500-3.
Janssen S, Reeder J, Giegerich R. Shape based indexing for faster search of RNA family databases. BMC Bioinformatics 2008.
Locomotif
RapidShapes
The End
Thanks for your attention.