Post on 20-Jan-2016
RNA informatics
Unit 12
BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
Non-coding DNA (98.5% of the human genome)
Intergenic
Repetitive elements
Promoters
Introns
mRNA untranslated region (UTR)
RNA Molecules
mRNA
tRNA
rRNA
Other types of RNA
- RNase P: trimming the 5’ end of pre-tRNA
- Telomerase RNA: maintaining the chromosome ends
- Xist RNA: inactivation of the extra copy of the X chromosome
What are RNA and mRNA?
Traditional role as messenger molecule (mRNA)
RNA is a polymer of nucleotides A, U, C, and G transcribed from DNA
DNA: GATTACA -> RNA: GAUUACA
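The GATTACA to GAUUACA example can be reproduced with a few lines of Python. This is a minimal sketch that assumes the DNA is written as the coding strand, so that the RNA has the same spelling with T replaced by U; the function name `transcribe` is illustrative and not taken from the slides.

```python
def transcribe(dna: str) -> str:
    """Return the RNA spelling of a DNA coding-strand sequence (T -> U)."""
    dna = dna.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected DNA symbols: {sorted(unknown)}")
    return dna.replace("T", "U")


print(transcribe("GATTACA"))  # GAUUACA
```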
Non-coding RNA (RNA genes)
RNA enzymes: catalytic RNA
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
RNAi: RNA-mediated gene regulation
- Micro RNA (miRNA)
- Short-interfering RNA (siRNA)
Alternative splicing: small-nuclear RNA (snRNA)
Others: snoRNA, eRNA, srpRNA, tmRNA, gRNA
Structure essential to function for many ncRNAs
Some biological functions of ncRNA
Nuclear export
mRNA cellular localization
Control of mRNA stability
Control of translation
The function of the RNA molecule depends on its folded structure
Most biological molecules contain one-dimensional information that is called “sequence”, which can be treated as a “string” in computer science.
Molecules with sequence: DNA, RNA and proteins
Molecules without much sequence information: lipids and polysaccharides.
From Sequence to Structure
Can all the properties of a macromolecule be predicted from its sequence?
Three-dimensional structures
Alternative splicing
Kinetic properties
Etc.
RNA sequence hierarchy
1D: CCAUCUUCUCCUUGGAGAUUUGG
2D: [figure: secondary structure]
3D: [figure: three-dimensional structure]
Control of iron levels by mRNA secondary structure
Iron Responsive Element (IRE): a stem-loop with conserved bases, recognized by IRP1 and IRP2
[Figure: IRE stem-loop drawn 5’ to 3’, with N-N’ base pairs in the stem; IRP1/2 bound to the IRE on ferritin (F) mRNA and on transferrin receptor (TR) mRNA]
F: Ferritin = iron storage
TR: Transferrin receptor = iron uptake
Low iron: IRE-IRP inhibits translation of ferritin; IRE-IRP inhibits degradation of TR mRNA
High iron: IRE-IRP off -> ferritin translated; transferrin receptor mRNA degraded
RNA Secondary Structure
[Figure: the sequence 5’ GAUCUUGAUC 3’ folded into a hairpin; the paired ends (G-C, A-U, U-A, C-G) form the STEM and the unpaired U U in the middle forms the LOOP]
The RNA molecule folds on itself. The base pairing is as follows: G-C, A-U and G-U, each held together by hydrogen bonds.
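The pairing rules can be written down as a small lookup and applied to the hairpin sequence GAUCUUGAUC from the diagram. The sketch below is only an illustration: the names `can_pair` and `stem_length` are made up here, and the stem is found by naively pairing the 5’ end against the 3’ end and moving inward.

```python
# Allowed pairs from the slide: G-C, A-U and the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    """True if bases x and y can form a base pair."""
    return (x, y) in PAIRS


def stem_length(seq: str) -> int:
    """Count consecutive pairs formed between the two ends, moving inward."""
    i, j = 0, len(seq) - 1
    pairs = 0
    while i < j and can_pair(seq[i], seq[j]):
        pairs += 1
        i += 1
        j -= 1
    return pairs


print(stem_length("GAUCUUGAUC"))  # 4
```

The four pairs found are G-C, A-U, U-A and C-G; the two U bases left in the middle cannot pair with each other and form the loop.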
RNA Secondary Structure
[Figure: an example fold with its elements labeled: STEM, HAIRPIN LOOP, INTERNAL LOOP, BULGE and DANGLING ENDS (5’ and 3’)]
Examples of known interactions of RNA secondary structural elements
Pseudo-knot
Kissing hairpins
Hairpin-bulge contact
These patterns are excluded from the prediction schemes as their computation is too intensive.
What is RNA secondary structure/folding?
[Figure: secondary-structure elements: helix (stem), hairpin loop, bulge loop, internal loop, multi-branch loop]
2D: mRNA Regulatory elements
Mini-Rose and Macro-Rose at 37°C and 42°C
Legal structure
RNA secondary structure representation
Also: RNA 2D structure in Matlab
3D motifs
16S rRNA 2° structure can be predicted from 1° structure
Bacteria, Archaea, Eukarya
How is RNA folding done?
Simple Nussinov Folding Algorithm
Only scores interactions between paired bases
Useful for demonstrating general structure of more complex folding algorithms
Score for optimal structure from base i to base j: we want the highest scoring fold
Base i is unpaired, consider pairing between i+1 and j
Base j is unpaired, consider pairing between i and j-1
δ(i, j) = score for a pairing between i and j.
Pair i and j. Now consider pairing between i+1 and j-1.
i and j begin a bifurcation. Consider every possible bifurcation point k. Sum scores from each folded structure.
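The cases listed on these slides (i unpaired, j unpaired, i and j paired, bifurcation at k) are usually combined into one recurrence: γ(i, j) = max{ γ(i+1, j), γ(i, j-1), γ(i+1, j-1) + δ(i, j), max over i < k < j of γ(i, k) + γ(k+1, j) }. A minimal Python sketch of that recurrence follows. It assumes δ(i, j) = 1 for G-C, A-U and G-U pairs and 0 otherwise; the symbol γ, the names `nussinov` and `delta`, the dot-bracket output and the `min_loop` parameter are illustrative choices, not taken from the slides.

```python
# Assumed scoring: every allowed pair (G-C, A-U, G-U) scores 1.
PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}


def delta(a: str, b: str) -> int:
    """Score for pairing bases a and b (1 if they can pair, else 0)."""
    return 1 if a + b in PAIRS else 0


def nussinov(seq: str, min_loop: int = 0):
    """Return (best score, one optimal dot-bracket structure) for seq.

    min_loop is an optional extra, not part of the slides: the minimum
    number of bases that must lie between two paired bases.
    """
    n = len(seq)
    g = [[0] * n for _ in range(n)]  # g[i][j]: best score for seq[i..j]

    def pair_score(i: int, j: int) -> int:
        return delta(seq[i], seq[j]) if j - i > min_loop else 0

    # Fill by increasing subsequence length so sub-results already exist.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(g[i + 1][j], g[i][j - 1])  # i unpaired, or j unpaired
            if pair_score(i, j):
                best = max(best, g[i + 1][j - 1] + pair_score(i, j))  # i pairs with j
            for k in range(i + 1, j):  # bifurcation at k
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best

    # Traceback: walk the table again to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if g[i][j] == g[i + 1][j]:
            stack.append((i + 1, j))
        elif g[i][j] == g[i][j - 1]:
            stack.append((i, j - 1))
        elif pair_score(i, j) and g[i][j] == g[i + 1][j - 1] + pair_score(i, j):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if g[i][j] == g[i][k] + g[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    score = g[0][n - 1] if n else 0
    return score, "".join(structure)


print(nussinov("GGGAAACCC", min_loop=3))  # (3, '(((...)))')
```

The three nested loops make the fill step cubic in the sequence length, which is the general shape shared by the more complex energy-based folding algorithms.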
CONTRAfold
Problem: Given an RNA sequence, predict the most likely secondary structure
AUCCCCGUAUCGAUCAAAAUCCAUGGGUACCCUAGUGAAAGUGUAUAUACGUGCUCUGAUUCUUUACUGAGGAGUCAGUGAACGAACUGA
How does CONTRAfold work?
CONTRAfold looks at features that indicate a good structure
For example:
C-G base pairings
A-U base pairings
Helices of length 5
Hairpin loops of size 9
Bulge loops of size 2
CG/GC Base-pair stacking interactions
These examples are called thermodynamic parameters because they represent free energy values
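To make the idea of feature-based scoring concrete, the sketch below counts two kinds of features (base-pair types and hairpin loop sizes) in a structure written in dot-bracket notation and combines them as a weighted sum. Everything specific here is an assumption for illustration: the dot-bracket input, the feature names and the weights are made up, and this is not CONTRAfold's actual feature set, parameter values or code.

```python
from collections import Counter


def base_pairs(structure: str):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def count_features(seq: str, structure: str) -> Counter:
    """Count base-pair types and hairpin loop sizes in a structure."""
    feats = Counter()
    for i, j in base_pairs(structure):
        pair = "".join(sorted(seq[i] + seq[j]))  # "CG", "AU" or "GU"
        feats[f"pair_{pair}"] += 1
        inner = structure[i + 1:j]
        if inner and set(inner) == {"."}:  # this pair closes a hairpin loop
            feats[f"hairpin_size_{len(inner)}"] += 1
    return feats


def score(seq: str, structure: str, weights: dict) -> float:
    """Weighted sum of feature counts; unknown features weigh 0."""
    feats = count_features(seq, structure)
    return sum(weights.get(name, 0.0) * n for name, n in feats.items())


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {"pair_CG": 1.0, "pair_AU": 0.5, "hairpin_size_3": -0.5}

print(score("GGGAAACCC", "(((...)))", WEIGHTS))  # 2.5
```

A structure predictor built this way would search for the structure with the best total score rather than scoring one given structure.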
What is an RNA regulatory motif?
Motif: A conserved sequence element
A regulator binds to a regulatory motif
RNA regulatory motif: A motif used to regulate translation
[Figure: an RNA (G A U U A C A ...) containing the regulatory motif AUUAC, bound either by a regulatory protein or by a microRNA with the complementary bases U A A U G]
What is an accessible motif?
If a sequence is part of an intramolecular hybridization, it is unlikely to bind to regulators
We define a motif as “accessible” if none of its nucleotides is hybridized as part of the folding
Accessible motifs cont’d
Therefore, only accessible sequences should be scanned for regulatory motifs
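The accessibility rule can be sketched in Python as a filter over motif occurrences. The sketch assumes the predicted fold is available as a dot-bracket string, where "." marks an unpaired nucleotide; that representation, the function name `accessible_motif_sites` and the example sequence are illustrative assumptions, not taken from the slides.

```python
def accessible_motif_sites(seq: str, structure: str, motif: str):
    """Return start positions where motif occurs and all its bases are unpaired."""
    if not motif:
        raise ValueError("motif must not be empty")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    hits = []
    start = seq.find(motif)
    while start != -1:
        window = structure[start:start + len(motif)]
        if set(window) == {"."}:  # no nucleotide of the motif is hybridized
            hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


# Made-up example: the motif AUUAC sits in the loop of a hairpin.
seq = "GCGCAUUACGCGC"
structure = "((((.....))))"
print(accessible_motif_sites(seq, structure, "AUUAC"))  # [4]
```

If the same motif fell inside a stem, its window would contain brackets and the occurrence would be skipped.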
Results: Degradation Related Motifs
Prediction Tools based on Energy Calculation
Fold, Mfold
Zuker & Stiegler (1981) Nuc. Acids Res. 9:133-148
Zuker (1989) Science 244:48-52
RNAfold
Vienna RNA secondary structure server
Hofacker (2003) Nuc. Acids Res. 31:3429-3431
Links
http://rna.tbi.univie.ac.at/
http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
http://frontend.bioinfo.rpi.edu/applications/mfold/
http://frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/rna-form1.cgi
http://bioweb2.pasteur.fr/nucleic/intro-en.html#rna
http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=mfold
RNAalifold (Hofacker 2002)
From the Vienna RNA package
Predicts the consensus secondary structure for a set of aligned RNA sequences by using a modified dynamic programming algorithm that adds a covariance term to the standard energy model
Improvement in prediction accuracy
Other related programs
COVE: RNA structure analysis using the covariance model (implementation of the stochastic context-free grammar method)
QRNA (Rivas and Eddy 2001): Searching for conserved RNA structures
tRNAscan-SE: tRNA detection in genome sequences
Sean Eddy’s Lab, WU: http://www.genetics.wustl.edu/eddy
RNA families
Rfam: General non-coding RNA database
(most of the data is taken from specific databases)
http://www.sanger.ac.uk/Software/Rfam/
Includes many families of non-coding RNAs and functional motifs, as well as their alignments and their secondary structures
Rfam
379 different RNA families or functional motifs from mRNA UTRs etc.
GENE
INTRON
Cis ELEMENTS
Scopes of sequence analysis
Sequences only
- Sequences of DNA/RNA/proteins for defining transcription units and introns
Sequence with other kinds of data
- Microarray
- 3D data such as comparative modeling
- Metabolic data
Purposes of Sequence Analysis include
Identification of coding regions
Identification of regulatory elements
Identifying events of genetic recombination
Identifying the existence of selective pressures
Searching for homologous sequences
Identifying shared patterns of a group of sequences
Modeling secondary or 3D structures (ab initio modeling)
What can you do with a single sequence?
Information content in a sequence
G/C content
Codon usage
Synonymous/Non-synonymous mutations
Periodicity
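Of the single-sequence measures listed, G/C content is the simplest to compute: it is the fraction of positions that are G or C. A minimal Python sketch, with the illustrative function name `gc_content`:

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


print(round(gc_content("GAUUACA"), 3))  # 0.286
```

The same function works for DNA and RNA, since only G and C are counted.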
Based on 16S rRNA sequences, [...] 3 [...]s.
Information contained in SSU rRNA genes
SSU rRNA sequences are used for universal phylogenetic construction
Its length is ~1500 bp, i.e. ~3000 bits of information
Life begins ~3.5-3.8 billion years ago
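The step from ~1500 bp to ~3000 bits presumably rests on counting 2 bits per position: with four equally likely nucleotides, one position carries at most log2(4) = 2 bits. The sketch below makes that arithmetic explicit; the function name and the equal-frequency assumption are illustrative, not taken from the slides.

```python
import math


def max_information_bits(length: int, alphabet_size: int = 4) -> float:
    """Upper bound on information in a sequence, assuming equally likely symbols."""
    return length * math.log2(alphabet_size)


print(max_information_bits(1500))  # 3000.0
```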
Maximum average change of information in SSU rRNA genes
1 bit/million years of evolution
[...] < 1.
Homology vs Similarity
“Homology” implies common origin.
Homologous sequences are generally recognized by similarity
Similar sequences may not be homologous; the similarity may be due to convergent evolution or just chance.
Homology is qualitative, while similarity is quantitative
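One common way to put a number on similarity is percent identity between two aligned sequences. The Python sketch below assumes the two sequences are already aligned to the same length, with "-" marking gaps; definitions of percent identity differ in what they use as the denominator, and the choice here (all columns that are not gap-gap) is just one option. The function name is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which a and b carry the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column with gaps in both sequences is ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


print(round(percent_identity("GAUUACA", "GAUCACA"), 1))  # 85.7
```

A high value is evidence for homology but does not prove it: homology itself remains a yes-or-no statement about common origin.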
Orthologous vs Paralogous
Orthologues usually refer to homologous genes with the same function in different organisms.
Paralogues usually refer to homologous genes with different functions, usually in the same organism.
Bioinformatic tools cannot differentiate between the two.