RNA: Secondary Structure Prediction and Analysis
description
Transcript of RNA: Secondary Structure Prediction and Analysis
![Page 1: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/1.jpg)
RNA: Secondary Structure Prediction and
Analysis
![Page 2: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/2.jpg)
Outline
1. RNA Folding
2. Dynamic Programming for RNA Secondary Structure Prediction
3. Covariance Model for RNA Structure Prediction
4. Small RNAs: Identification and Analysis
![Page 3: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/3.jpg)
Section 1:RNA Folding
![Page 4: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/4.jpg)
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs• A-U• G-C• G-U
• Bases can only pair with one other base.
![Page 5: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/5.jpg)
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs• A-U• G-C• G-U
• Bases can only pair with one other base.
2 Hydrogen Bonds
![Page 6: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/6.jpg)
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs• A-U• G-C• G-U
• Bases can only pair with one other base.
3 Hydrogen Bonds—more stable
![Page 7: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/7.jpg)
RNA Basics
• RNA bases: A,C,G,U
• Canonical Base Pairs• A-U• G-C• G-U
• Bases can only pair with one other base.
“Wobble Pairing”
![Page 8: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/8.jpg)
RNA Basics
http://www.genetics.wustl.edu/eddy/tRNAscan-SE/
• Various types of RNA:
• transfer RNA (tRNA)
• messenger RNA (mRNA)
• ribosomal RNA (rRNA)
• small interfering RNA (siRNA)
• micro RNA (miRNA)
• small nuclear RNA (snRNA)
• small nucleolar RNA (snoRNA)
![Page 9: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/9.jpg)
Section 2:Dynamic Programming for RNA Secondary Structure
Prediction
![Page 10: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/10.jpg)
RNA Secondary Structure
Hairpin loop Junction (Multiloop)Bulge Loop
Single-Stranded
Interior Loop
Stem
Image– Wuchty
Pseudoknot
![Page 11: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/11.jpg)
Sequence Alignment to Determine Structure
• Bases pair in order to form backbones and determine the secondary structure.
• Aligning bases based on their ability to pair with each other gives an algorithmic approach to determining the optimal structure.
![Page 12: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/12.jpg)
Base Pair Maximization: Dynamic Programming
Images – Sean Eddy
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 13: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/13.jpg)
Base Pair Maximization: Dynamic Programming
Images – Sean Eddy
€
S i, j( ) = max
⎧
⎨ ⎪ ⎪
⎩ ⎪ ⎪
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 14: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/14.jpg)
Base Pair Maximization: Dynamic Programming
Base pair at i and j
Images – Sean Eddy
€
S i, j( ) = max
S i +1, j −1( ) +1 (if i, j( ) base pair) ⎧
⎨ ⎪ ⎪
⎩ ⎪ ⎪
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 15: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/15.jpg)
Base Pair Maximization: Dynamic Programming
Base pair at i and j Unmatched at i
Images – Sean Eddy
€
S i, j( ) = max
S i +1, j −1( ) +1 (if i, j( ) base pair)
S i +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 16: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/16.jpg)
Base Pair Maximization: Dynamic Programming
Base pair at i and j Unmatched at jUnmatched at i
Images – Sean Eddy
€
S i, j( ) = max
S i +1, j −1( ) +1 (if i, j( ) base pair)
S i +1, j( )
S i, j −1( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 17: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/17.jpg)
Base Pair Maximization: Dynamic Programming
Base pair at i and j Unmatched at jUnmatched at i
Bifurcation
Images – Sean Eddy
€
S i, j( ) = max
S i +1, j −1( ) +1 (if i, j( ) base pair)
S i +1, j( )
S i, j −1( )
max1<k < j
S i,k( ) + S k +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) is the folding of the RNA subsequence of the strand from index i to index j which results in the highest number of base pairs.
• Recurrence:
![Page 18: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/18.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Images—Sean Eddy
![Page 19: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/19.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Initialize first two diagonals to 0
Images—Sean Eddy
![Page 20: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/20.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Fill in squares sweeping diagonally
Images—Sean Eddy
![Page 21: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/21.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Fill in squares sweeping diagonally
Images—Sean Eddy
![Page 22: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/22.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Bases cannot pair
Images—Sean Eddy
![Page 23: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/23.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Bases can pair, similar to matched alignment
Images—Sean Eddy
![Page 24: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/24.jpg)
Images – Sean Eddy
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Dynamic Programming—possible paths
Images—Sean Eddy
![Page 25: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/25.jpg)
Images – Sean Eddy
S(i, j – 1)
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Dynamic Programming—possible paths
Images—Sean Eddy
![Page 26: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/26.jpg)
Dynamic Programming—possible paths
Images – Sean Eddy
S(i + 1, j)
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Images—Sean Eddy
![Page 27: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/27.jpg)
Images—Sean EddyS(i + 1, j – 1) +1
Base Pair Maximization: Dynamic Programming
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
Dynamic Programming—possible paths
![Page 28: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/28.jpg)
Base Pair Maximization: Dynamic Programming
Bifurcation—add values for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
![Page 29: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/29.jpg)
Base Pair Maximization: Dynamic Programming
Bifurcation—add values for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
![Page 30: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/30.jpg)
Base Pair Maximization: Dynamic Programming
Bifurcation—add values for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
![Page 31: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/31.jpg)
Base Pair Maximization: Dynamic Programming
Bifurcation—add values for all k
Images—Sean Eddy
• Alignment Method:
• Align RNA strand to itself
• Score increases for feasible base pairs
• Each score independent of overall structure
• Bifurcation adds extra dimension
![Page 32: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/32.jpg)
Base Pair Maximization: Drawbacks
• Base pair maximization will not necessarily lead to the most stable structure.
• It may create structure with many interior loops or hairpins which are energetically unfavorable.
• In comparison to aligning sequences with scattered matches—not biologically reasonable.
![Page 33: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/33.jpg)
Energy Minimization
• Thermodynamic Stability
• Estimated using experimental techniques.
• Theory : Most Stable = Most likely
• No pseudoknots due to algorithm limitations.
• Attempts to maximize the score, taking thermodynamics into account.
• MFOLD and ViennaRNA
![Page 34: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/34.jpg)
Energy Minimization Results
• Linear RNA strand folded back on itself to create secondary structure• Circularized representation uses this requirement
• Arcs represent base pairing
Images – David Mount
![Page 35: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/35.jpg)
Energy Minimization Results
• All loops must have exactly three bases in them.
• Equivalent to having at least three base pairs between arc endpoints.
Images – David Mount
![Page 36: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/36.jpg)
Energy Minimization Results
• All loops must have exactly three bases in them.
• Equivalent to having at least three base pairs between arc endpoints.
Images – David Mount
![Page 37: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/37.jpg)
Energy Minimization Results
• All loops must have exactly three bases in them.
• Exception: Location where beginning and end of RNA come together in circularized representation.
Images – David Mount
![Page 38: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/38.jpg)
Energy Minimization Results
• All loops must have exactly three bases in them.
• Exception: Location where beginning and end of RNA come together in circularized representation.
Images – David Mount
![Page 39: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/39.jpg)
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming algorithm.
• In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations.
Images – David Mount
![Page 40: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/40.jpg)
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming algorithm.
• In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations.
Images – David Mount
![Page 41: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/41.jpg)
Trouble with Pseudoknots
• Pseudoknots cause a breakdown in the dynamic programming algorithm.
• In order to form a pseudoknot, checks must be made to ensure base is not already paired—this breaks down the recurrence relations.
Images – David Mount
![Page 42: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/42.jpg)
Energy Minimization: Drawbacks
• Computes only one optimal structure.
• Optimal solution may not represent the biologically correct solution.
![Page 43: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/43.jpg)
Section 3:Covariance Model for
RNA Structure Prediction
![Page 44: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/44.jpg)
Alternative Algorithms - Covariaton
• Incorporates Similarity-based method
• Evolution maintains sequences that are important
• Change in sequence coincides to maintain structure through base pairs (Covariance)
• Cross-species structure conservation example – tRNA
• Manual and automated approaches have been used to identify covarying base pairs
• Models for structure based on results
• Ordered Tree Model
• Stochastic Context Free Grammar
![Page 45: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/45.jpg)
Alternative Algorithms - Covariaton
• Expect areas of base pairing in tRNA to be covarying between various species.
![Page 46: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/46.jpg)
Alternative Algorithms - Covariaton
• Expect areas of base pairing in tRNA to be covarying between various species.
• Base pairing creates same stable tRNA structure in organisms.
![Page 47: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/47.jpg)
Alternative Algorithms - Covariaton
• Expect areas of base pairing in tRNA to be covarying between various species.
• Base pairing creates same stable tRNA structure in organisms.
• Mutation in one base yields pairing impossible and breaks down structure.
![Page 48: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/48.jpg)
Alternative Algorithms - Covariaton
• Expect areas of base pairing in tRNA to be covarying between various species.
• Base pairing creates same stable tRNA structure in organisms.
• Mutation in one base yields pairing impossible and breaks down structure.
• Covariation ensures ability to base pair is maintained and RNA structure is conserved.
![Page 49: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/49.jpg)
Binary Tree Representation of RNA Secondary Structure• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and “gap” (dash) are shown
• Pseudoknots still not represented
• Tree does not permit varying sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
![Page 50: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/50.jpg)
Binary Tree Representation of RNA Secondary Structure• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and “gap” (dash) are shown
• Pseudoknots still not represented
• Tree does not permit varying sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
![Page 51: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/51.jpg)
Binary Tree Representation of RNA Secondary Structure• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and “gap” (dash) are shown
• Pseudoknots still not represented
• Tree does not permit varying sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
![Page 52: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/52.jpg)
Binary Tree Representation of RNA Secondary Structure• Representation of RNA structure
using Binary tree
• Nodes represent
• Base pair if two bases are shown
• Loop if base and “gap” (dash) are shown
• Pseudoknots still not represented
• Tree does not permit varying sequences
• Mismatches
• Insertions & Deletions
Images – Eddy et al.
![Page 53: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/53.jpg)
Covariance Model
• Covariance Model: HMM which permits flexible alignment to an RNA structure – emission and transition probabilities
• Model trees based on finite number of states
• Match states – sequence conforms to the model:
• MATP: State in which bases are paired in the model and sequence.
• MATL & MATR: State in which either right or left bulges in the sequence and the model.
• Deletion – State in which there is deletion in the sequence when compared to the model.
• Insertion – State in which there is an insertion relative to model.
![Page 54: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/54.jpg)
Covariance Model
• Covariance Model: HMM which permits flexible alignment to an RNA structure – emission and transition probabilities
• Transitions have probabilities.
• Varying probability: Enter insertion, remain in current state, etc.
• Bifurcation: No probability, describes path.
![Page 55: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/55.jpg)
Covariance Model (CM) Training Algorithm
€
S i, j( ) = max
S i +1, j −1( ) + M i, j( )
S i +1, j( )
S i, j −1( )
maxi<k < j
S i, k( ) + S k +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model.
• Frequencies obtained by aligning model to “training data”—consists of sample sequences.
• Reflect values which optimize alignment of sequences to model.
![Page 56: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/56.jpg)
Covariance Model (CM) Training Algorithm
€
S i, j( ) = max
S i +1, j −1( ) + M i, j( )
S i +1, j( )
S i, j −1( )
maxi<k < j
S i, k( ) + S k +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model.
• Frequencies obtained by aligning model to “training data”—consists of sample sequences.
• Reflect values which optimize alignment of sequences to model.
Frequency of seeing the symbols (A, C, G, T) together in locations i and j depending on symbol.
![Page 57: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/57.jpg)
Covariance Model (CM) Training Algorithm
€
S i, j( ) = max
S i +1, j −1( ) + M i, j( )
S i +1, j( )
S i, j −1( )
maxi<k < j
S i, k( ) + S k +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model.
• Frequencies obtained by aligning model to “training data”—consists of sample sequences.
• Reflect values which optimize alignment of sequences to model.
Independent frequency of seeing the symbols (A, C, G, T) in locations i or j depending on symbol.
![Page 58: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/58.jpg)
Covariance Model (CM) Training Algorithm
€
S i, j( ) = max
S i +1, j −1( ) + M i, j( )
S i +1, j( )
S i, j −1( )
maxi<k < j
S i, k( ) + S k +1, j( )
⎧
⎨
⎪ ⎪
⎩
⎪ ⎪
• S(i, j) = Score at indices i and j in RNA when aligned to the Covariance Model.
• Frequencies obtained by aligning model to “training data”—consists of sample sequences.
• Reflect values which optimize alignment of sequences to model.
Independent frequency of seeing the symbols (A, C, G, T) in locations i or j depending on symbol.
![Page 59: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/59.jpg)
Alignment to CM Algorithm
• Calculate the probability score of aligning RNA to CM.
• Three dimensional matrix—O(n³)• Align sequence to given
subtrees in CM.• For each subsequence,
calculate all possible states.
• Subtrees evolve from bifurcations• For simplicity, left singlet is
default.
Images—Eddy et al.
![Page 60: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/60.jpg)
Alignment to CM Algorithm
• For each calculation, take into account:• Transition (T) to next state.• Emission probability (P) in
the state as determined by training data.
Images—Eddy et al.
![Page 61: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/61.jpg)
Alignment to CM Algorithm
• For each calculation, take into account:• Transition (T) to next state.• Emission probability (P) in
the state as determined by training data.
Images—Eddy et al.
![Page 62: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/62.jpg)
Alignment to CM Algorithm
• For each calculation, take into account:• Transition (T) to next state.• Emission probability (P) in
the state as determined by training data.
Images—Eddy et al.
![Page 63: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/63.jpg)
Alignment to CM Algorithm
• For each calculation, take into account:• Transition (T) to next state.• Emission probability (P) in
the state as determined by training data.
• Deletion—does not have emission probability associated with it.
Images—Eddy et al.
![Page 64: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/64.jpg)
Alignment to CM Algorithm
• For each calculation, take into account:• Transition (T) to next state.• Emission probability (P) in
the state as determined by training data.
• Deletion—does not have emission probability associated with it.
• Bifurcation—does not have state probability associated with it.
Images—Eddy et al.
![Page 65: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/65.jpg)
Covariance Model Drawbacks
• Needs to be well trained.
• Not suitable for searches of large RNA.
• Structural complexity of large RNA cannot be modeled
• Runtime
• Memory requirements
![Page 66: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/66.jpg)
Section 4:Small RNAs: Identification
and Analysis
![Page 67: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/67.jpg)
Discovery of small RNAs
• The first small RNA:• In 1993 Rosalind Lee was studying a non-
coding gene in C. elegans, lin-4, that wasinvolved in silencing of another gene,lin-14, at the appropriate time in thedevelopment of the worm C. elegans.
• Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
• Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
• The second small RNA wasn't discovered until 2000!
Rosalind Lee
![Page 68: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/68.jpg)
What are small ncRNAs?
• Two flavors of small non-coding RNA:
1. micro RNA (miRNA)
2. short interfering RNA (siRNA)
• Properties of small non-coding RNA:
• Involved in silencing other mRNA transcripts.
• Called “small” because they are usually only about 21-24 nucleotides long.
• Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
• Silence an mRNA by base pairing with some sequence on the mRNA.
![Page 69: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/69.jpg)
miRNA Pathway Illustration
![Page 70: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/70.jpg)
siRNA Pathway Illustration
Complementary base pairing facilitates the mRNA cleavage
![Page 71: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/71.jpg)
Features of miRNAs
• Hundreds miRNA genes are already identified in human genome.
• Most miRNAs start with a U
• The second 7-mer on the 5' end is known as the “seed.”
• When an miRNAs bind to their targets, the seed sequence has perfect or near-perfect alignment to some part of the target sequence.
• Example: UGAGCUUAGCAG...
![Page 72: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/72.jpg)
Features of miRNAs
• Many miRNAs are conserved across species: • For half of known human miRNAs, >18% of all occurrences
of one of these miRNA seeds are conserved among human, dog, rat, and mouse.
• As a rule, the full sequence of miRNAs is almost never completely complementary to the target sequence.• Common to see a loop or bulge after the seed when binding.• Loop/bulge is often a hairpin because of stability.
• The site at which miRNAs attack is often in their target's 3' UTR.
![Page 73: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/73.jpg)
Hairpin is more stable than a simple bulge
Bulges
The MRE is known as the “miRNA recognition element.” This is simply the sequence in the target that an miRNA binds to
miRNA Binding
![Page 74: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/74.jpg)
Locating miRNA Genes: Experimentally
• Locating miRNA experimentally is difficult.
• Procedure:
1. Find a gene that causes down-regulation of another gene.
2. Determine if no protein is encoded.
3. Analyze the sequence to determine if it is complementary to its target.
![Page 75: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/75.jpg)
Locating miRNA Genes: Comparative Genomics
• Idea: Find the seed binding sites.
1. Examine well-conserved 3' UTRs among species to find well-conserved 8-mers (A + seed) that might be an miRNA target sequence.
2. Look for a sequence complementary to this 8-mer to identify a potential miRNA seed. Once found, check flanking sequence to see if any stable hairpin structures can form—these are potentially pre-miRNAs.
3. To determine the possibility of secondary RNA structure, use RNAfold.
![Page 76: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/76.jpg)
Locating miRNA Genes: Example
• Suppose you found a well-conserved 8-mer in 3' UTRs (this could be where an miRNA seed binds in its target).
• Example: AGACTAGG
• Look elsewhere in genome for complementary sequence (this could be an miRNA seed).
• Example: TCTGATCC
• When TCTGATCC is found, check to see (with RNAfold) if the sequences around it could form hairpin; if so, this could be an miRNA gene.
![Page 77: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/77.jpg)
Finding miRNA Targets: Method 1
• Now we know of some miRNAs, but where do they attack?
• Goal: Find the targets of a set of miRNAs that are shared between human and mouse.
• Looking for the miRNA recognition element (MRE), not whole mRNA. This is just the part that the miRNA would bind to.
• Basic Assumption: Whole miRNA:MRE interactions (binding) are likely to have highly energetically favorable base pairing.
• Basic Method: Look through the conserved 3' UTRs—this is where the MREs are most likely to be located—and try to make an alignment that minimizes the binding energy between the miRNA sequence and the UTRs (most favorable).
![Page 78: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/78.jpg)
Finding miRNA Targets: Method 1
• Method:
• First look at the binding energies of all 38-mers of the mRNA when binding to the miRNA. Subsequently apply several filters to pick alignments that “look” like miRNA binding.
• Why 38-mers? ~22 nt for the miRNA and the rest to allow for bulges, loops, etc.
• Algorithm: Use a modified dynamic programming sequence alignment algorithm to calculate the binding energies for each 38-mer.
• Modifications: Scoring and speedup
![Page 79: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/79.jpg)
Finding miRNA Targets: Method 1
• Scoring:
• Mismatches and indels allowed.
• Matrix based on RNA-RNA binding energies.
• Use known binding energies of Watson-Crick pairing and wobble (G-U) pairing.
• Binding energy (score) calculated for every two adjacent pairings (unlike the standard alignment algorithm which just takes into account the “score” for one pair at a time).
• Adds dimensions to scoring matrix.
• Adds complexity to recurrence relation.
![Page 80: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/80.jpg)
Finding miRNA targets: Method 2
• Goal: Find the set of miRNA targets for miRNAs shared across multiple species
• Trying to identify which genes have 3' UTRs are attacked by miRNAs
• Basic Assumptions:
1. There is perfect binding to the miRNA seed.
2. Any leftover sequence wants to achieve optimal RNA secondary structure.
• Basic Method: For each species’ set of 3' UTRs, find sites where there is perfect binding of the miRNA seed and “optimal folding” nearby. Look for agreement among all the species.
![Page 81: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/81.jpg)
Method 2: Example
![Page 82: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/82.jpg)
Method 2: Steps
1. Find a perfect match to the miRNA seed.
2. Extend the matching region if possible.
3. Find the optimal folding for the remaining sequences.
4. Calculate the energy of this interaction.
![Page 83: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/83.jpg)
Method 2: Details
• Input: A set of miRNAs conserved among species and a set of 3' UTR sequences for those species.
• Method: For each organism:
1. Find all occurrences in the UTR sequences that match the miRNA seed exactly.
2. Extend this region with perfect or wobble pairings.
3. With the remaining sequence of the miRNA, use the program RNAfold to find optimal folding with the next 35 bases of the UTR sequence.
4. Calculate a score for this interaction based on the free energy of the interaction given by RNAfold.
![Page 84: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/84.jpg)
Method 2: Details
• Method Cont.:
5. Sum up the scores of all interactions for each UTR.
6. Rank all the organism's gene's UTRs by this score (sum of all interactions in that UTR).
7. Repeat the above steps for each organism.
8. Create a cutoff score and a cutoff rank for the UTRs.
9. Select the set of genes where the orthologous genes across all the sampled species have UTR's that score and rank above this cutoff.
![Page 85: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/85.jpg)
Method 2: Details
• Verification:
• Find the number of predicted binding sites per miRNA.
• Compare it to number of binding sites for a randomly generated miRNA.
• The result is much higher.
![Page 86: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/86.jpg)
Analysis of the Two Methods
• Method 1:
• Good at identifying very strong, highly complementary miRNA targets.
• Found gene targets with one miRNA binding site, failed to identify genes with multiple weaker binding sites.
• Method 2:
• Good at identifying gene targets that have many weaker interactions.
• Fails to identify single-site genes.
![Page 87: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/87.jpg)
Analysis of the Two Methods
• Both Methods:
• Speed is an issue.
• Won't find targets that aren't in the 3' UTR of a gene.
• We need more species sequenced!
• Conserved sequences are used to discover small RNAs.
• Conserved small RNAs are used to discover targets.
• Confidence in prediction of small RNAs and targets.
• Allows for broader scope with different combinations of species.
![Page 88: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/88.jpg)
Results
• Predicted a large portion of already known targets and provided direction for identifying undiscovered targets.
• Found that it is more common that genes are regulated by multiple small RNAs.
• Found that many small RNAs have multiple targets.
![Page 89: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/89.jpg)
A Novel siRNA Mechanism
• Recently, a new mechanism of siRNA activity was discovered.• Two genes (called A and B here) that have a cis-antisense
orientation (they are overlapping on opposite strands) have transcripts that produce an siRNA due to the dsRNA formed by their mRNA transcripts.
• Gene A is constitutive, gene B is induced by salt stress• Normally, just B's transcript is present.
• When both A and B are present, we get annealing to get dsRNA and this forms an siRNA.
• Since the siRNA is complementary to gene A's transcript, the siRNA attacks gene A, silencing it.
• These genes might provide direction to finding new siRNAs.
![Page 90: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/90.jpg)
Pathway: Illustration
Annealing of transcripts nicely sets up the dsRNA to be used later in making the siRNA
Both transcripts present if salt is present“A”
“B”
siRNA silences A
![Page 91: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/91.jpg)
References
• How Do RNA Folding Algorithms Work?. S.R. Eddy. Nature Biotechnology, 22:1457-1458, 2004.
• Borsani, O., Zhu, J, Verslues, P.E., Sunkar, R., Zhu, J.-K. (2005). Endogenous siRNAs Derived From a Pair of Natural cis-Antisense Transcripts Regulate Solt Tolerance in Arabidopsis. Cell 123, Jury, W.A. and Vaux Jr., H. (2005). The role of science in solving the world's emerging water problems. Proc. Natl Acad. Sci. USA 102: 15715-15720.
![Page 92: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/92.jpg)
References
• Kiriakidou, M., Nelson, P.T., Kouranov, A., Fitziev, P., Bouyioukos, C., Hatzigeorgiou, A., and Hatzigeorgiou, M. (2004). A combined computational-experimental approach predicts human microRNA targets. Genes & Dev. 18: 1165-1178.
• Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans Heterochronic Gene lin-4 Encodes Small RNAs with Antisense Complementarity to lin-14. Cell 75: 843-854.
![Page 93: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/93.jpg)
References
• Lee, Y. Kim, M, Han, J. Yeom, K-H, Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. The EMBO Journal 23: 4051-4060.
• Lewis, B.P., Shih, I., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). Prediction of Mammalian MicroRNA Targets. Cell 115: 787-798.
![Page 94: RNA: Secondary Structure Prediction and Analysis](https://reader036.fdocuments.in/reader036/viewer/2022062408/5681438a550346895db00906/html5/thumbnails/94.jpg)
References
• Xie, X, Lu, J, Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals. Nature 443: 338-345.