Posted on 20-Jun-2015
Visualising RNA
Paul Gardner (paul.gardner@canterbury.ac.nz)
University of Canterbury, Christchurch, New Zealand.
March 20, 2013
Paul Gardner Visualising RNA
Feel free to share
- Feel free to tweet (@ppgardne), Google+, tumblr, ...
- Slides are available from http://www.slideshare.net/ppgardne/.
What is an RNA?
[Figure: a tRNA shown as (A) primary structure, 5’ GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3’; (B) secondary structure; and (C) tertiary structure. The acceptor stem, D loop, anticodon loop, variable loop and TΨC loop are labelled.]
What is Rfam?
- Sister database to Pfam
- Aims to annotate all ncRNA families
- Consortium headed by Alex Bateman (Wellcome Trust Sanger Institute), Sean Eddy (Janelia, Howard Hughes), Sam Griffiths-Jones (Manchester, BBSRC) and Paul Gardner (University of Canterbury, RSNZ)
Rfam: families of ncRNAs
http://rfam.sanger.ac.uk
http://rfam.janelia.org
Building an Rfam family
- A structure from the literature
Pollard KS, et al. (2006). An RNA gene expressed during cortical development evolved rapidly in humans. Nature.
Building an Rfam family
- An Rfam family, produced manually from the publication figures:

# STOCKHOLM 1.0

G.gallus.1       UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUAG
M.musculus.1     UAAAAUGGAGGAGAAAUUACAGCAAUUUAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCCGUGG
M.mulatta.1      UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
G.gorilla.1      UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
H.sapiens.1      UGAAACGGAGGAGACGUUACAGCAACGUGUCAGCUGAAAUGAUGGGCGUAGACGCACGUCAGCGGCGG
P.troglodytes.1  UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
P.abelii.1       UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
C.lupus.1        UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCGGUGC
T.truncatus.1    CGAAAAGGAGGGGAAAUUACAGCAAUUCAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
B.taurus.1       CGAAAUGGAGGAGAAAUUACAGCAAUUCAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
V.pacos.1        UGAAACAGAGGAGAAAUUACAGCAAUUCAUCAACCGAAAUGAUAGGGAUAGACAUGUGUCGGCAGUGG
M.lucifugus.1    CGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAUCCGUGG
O.anatinus.1     UGAAAUGGAGGAUAAAUUACAGCAAUUUAUCAAAUGAAAUUAUAGGUGUAGACACAUGUCAGCAAUGG
#=GC SS_cons     <<<<<<.<<<<<<<<<<<.....>>>>>.....>><<<<<.<<<.<<<....>>>.>>>.........
#=GC RF          uGaaacGGaGGagaaguuAcAGcaacuuAUcAgcuGaaacuaugGGcGUAGACgCAcgucAGcaguGg

G.gallus.1       AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
M.musculus.1     AAAUGGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
M.mulatta.1      AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
G.gorilla.1      AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
H.sapiens.1      AAAUGGUUUCUAUCAAAAUGAAAGUGUUUAGAGAUUUUCCUCAAGUUUCA
P.troglodytes.1  AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
P.abelii.1       AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
C.lupus.1        AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
T.truncatus.1    GAACACUUUCUAUCAAAAUUAAAGUACUUAGCGAUUUUCCUUAAAUUUCA
B.taurus.1       AAACCGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUUAAAUUUCA
V.pacos.1        AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGACUUUCCUCAAAUUUCA
M.lucifugus.1    AAACAGUUACGAUCAAAAUUAAAGUGUUUAGAGAUUUUCCUC.AAUUUUA
O.anatinus.1     AAACAAUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
#=GC SS_cons     .....>>>>>....<<<<<..............>>>>>>>>>..>>>>>>
#=GC RF          AAAuaguuuCUAUcaaaauuAAAGUAUUUAGAGauuuuCCuCAAguuuCa
//
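In the Stockholm alignment above, the SS_cons line marks base pairs with matching < and > characters, and each sequence continues from the first block into the second. A minimal Python sketch of reading such a file and listing the paired columns, assuming only the subset of Stockholm shown here (sequence lines, #=GC lines, blank lines between blocks, a // terminator). `parse_stockholm` and `base_pairs` are hypothetical helper names, not Rfam or Infernal code, and the pseudoknot letters of full WUSS notation are not handled. For real work, a tested reader such as Biopython's Bio.AlignIO (format "stockholm") is the safer choice.

```python
"""Sketch: read an interleaved Stockholm alignment and list SS_cons base pairs."""

# Bracket characters treated as base-pair delimiters (a WUSS-style subset).
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def parse_stockholm(text):
    """Return (seqs, gc) for one Stockholm alignment.

    seqs maps sequence names to aligned sequences; gc maps #=GC feature
    names (e.g. "SS_cons", "RF") to per-column strings. Interleaved blocks
    are concatenated in order; #=GF, #=GS and #=GR lines are skipped.
    """
    seqs, gc = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # blank lines separate blocks
        if line == "//":
            break                         # end of the alignment
        if line.startswith("#=GC"):
            _, feature, value = line.split(None, 2)
            gc[feature] = gc.get(feature, "") + value
        elif line.startswith("#"):
            continue                      # header and other markup lines
        else:
            name, value = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + value
    return seqs, gc


def base_pairs(ss):
    """Return sorted 0-based (i, j) column pairs encoded in a bracket string.

    Each bracket type has its own stack; any other character is unpaired.
    """
    stacks = {open_: [] for open_ in OPEN_TO_CLOSE}
    pairs = []
    for j, ch in enumerate(ss):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(j)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at column {j}")
            pairs.append((stack.pop(), j))
    unmatched = sorted(i for stack in stacks.values() for i in stack)
    if unmatched:
        raise ValueError(f"unmatched opening bracket(s) at columns {unmatched}")
    return sorted(pairs)


# Hypothetical two-sequence, two-block alignment (not Rfam data).
TOY = """# STOCKHOLM 1.0
seqA          GGGAA
seqB          GCGAA
#=GC SS_cons  <<<..

seqA          ACCC
seqB          AUGC
#=GC SS_cons  .>>>
//
"""

if __name__ == "__main__":
    seqs, gc = parse_stockholm(TOY)
    for i, j in base_pairs(gc["SS_cons"]):
        # 1-based columns, then the pair each sequence forms at those columns.
        print(i + 1, j + 1, [s[i] + s[j] for s in seqs.values()])
```

On the toy alignment this prints `1 9 ['GC', 'GC']`, `2 8 ['GC', 'CG']` and `3 7 ['GC', 'GU']`. The G-C to C-G swap in seqB is the kind of compensatory change that supports a base pair in a consensus structure.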
Building an Rfam family
- And the Wikipedia entry
Conflicting priorities
- A Curator’s priorities:
  1. New families
  2. Accuracy of models
  3. Annotation
  4. Functional codebase
  5. Website
  6. Visualisation
- A User’s priorities:
  1. FTP (bioinformaticians)
  2. Website
  3. Visualisation
  4. Number of families
  5. Accuracy of models
  6. Annotation
Image credits: www.conflictdynamics.org
2007: challenges
- Quality control
- Rewrite the website and add some bling
- Update the codebase
- Export annotation to Wikipedia
- User community input via RNA Biology
Visualisation priorities
SCALE
- From two to two million sequences, 30 to 3,000 nucleotides long, with 0 to 1,000 base pairs.
- AUTOMATED: thousands of families.
INFORMATIVE
- Generates biologically relevant hypotheses.
INCLUSIVE
- Make the most of our fantastic bioinformatics and visualisation community.
Examples
- Caveat: none of the images I am showing are final solutions; everything can be improved upon.
- Secondary structure
- Taxonomic distribution
- Alignment
- Genomic contexts & gene order
RNA Secondary Structure
[Figure: consensus secondary structure drawn from 5’ to 3’, with each nucleotide of the consensus sequence coloured by sequence conservation on a scale from 0 to 1.]
Gardner, Bateman & Poole (2010). SnoPatrol: how many snoRNA genes are there? Journal of Biology.
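The figure colours positions by sequence conservation on a 0 to 1 scale, but the slide does not say which measure is used. One common choice, assumed here, is information content normalised by its maximum: a column scores 1 - H/2, where H is the Shannon entropy (in bits) of its nucleotide frequencies, so an invariant column scores 1 and an evenly mixed column scores 0. A minimal Python sketch; `column_conservation` is a hypothetical name, gaps are simply ignored rather than penalised, and no small-sample correction is applied.

```python
"""Sketch: per-column conservation scores on a 0-1 scale for an RNA alignment."""
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def column_conservation(seqs):
    """Return one score per column: 1 - H/2, with H the Shannon entropy in bits.

    H uses A/C/G/U frequencies only (T is read as U); gaps and ambiguity
    codes are ignored, and a column with no nucleotides scores 0.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    max_entropy = math.log2(len(NUCLEOTIDES))  # 2 bits for a four-letter alphabet
    scores = []
    for col in range(length):
        residues = (s[col].upper().replace("T", "U") for s in seqs)
        counts = Counter(r for r in residues if r in NUCLEOTIDES)
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    # Hypothetical four-sequence alignment, for illustration only.
    alignment = ["GGAU", "GGCU", "GAGU", "G-UU"]
    print([round(x, 2) for x in column_conservation(alignment)])
    # -> [1.0, 0.54, 0.0, 1.0]
```

Mapping each score onto a colour ramp then gives a picture like the one above; a measure that also penalises gaps would lower the score of the second column.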
Old Taxonomic distributions: RybB
- Contamination is displayed first.
Old Taxonomic distributions: RybB
- After some scrolling
New Taxonomic distributions: RybB
- Sunbursts: concentric “pie charts” in which each outer ring contains the “children” of the nodes in the ring inside it.
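A sunburst layout like the one described above can be computed as a recursive split of angles: each child receives a slice of its parent's arc proportional to the number of sequences beneath it, and is drawn one ring further out. A minimal Python sketch, assuming the taxonomy arrives as nested dicts with sequence counts at the leaves; `sunburst`, `subtree_weight`, the example tree and its counts are all hypothetical, not Rfam's implementation or data.

```python
"""Sketch: angular layout of a sunburst chart from a nested taxonomy."""


def subtree_weight(node):
    """Number of sequences under a node (leaves are ints, internal nodes dicts)."""
    if isinstance(node, dict):
        return sum(subtree_weight(child) for child in node.values())
    return node


def sunburst(tree, start=0.0, end=360.0, depth=1, path=()):
    """Yield (path, ring, start_deg, end_deg) for every node in `tree`.

    Ring `depth` holds the children of the nodes in ring `depth - 1`; each
    child gets a share of its parent's arc proportional to its weight.
    """
    total = subtree_weight(tree)
    angle = start
    for name, child in tree.items():
        span = (end - start) * subtree_weight(child) / total if total else 0.0
        node_path = path + (name,)
        yield node_path, depth, angle, angle + span
        if isinstance(child, dict):
            # Children subdivide exactly this node's arc, one ring further out.
            yield from sunburst(child, angle, angle + span, depth + 1, node_path)
        angle += span


if __name__ == "__main__":
    # Hypothetical taxonomy with made-up sequence counts.
    taxa = {
        "Bacteria": {
            "Gammaproteobacteria": {"Escherichia": 6, "Salmonella": 3},
            "Firmicutes": 1,
        },
        "Eukaryota": 2,
    }
    for node_path, ring, a, b in sunburst(taxa):
        print(f"ring {ring}: {' > '.join(node_path):45s} {a:6.1f} to {b:6.1f} deg")
```

Because every arc is sized by its sequence count, a small, possibly contaminating clade occupies a thin wedge instead of being listed first.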
Alignments
- When we have sequenced everything, how is this view going to look?
Genomic contexts & Gene Order
- How can we display comparative gene-order information in a scalable fashion?
- Think of hundreds to thousands of genomes, and tens to hundreds of features.
Barquist L, et al. (2013). A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium. Nucleic Acids Research.
Open problems
- Evolution and RNA structure
- Scalable alignment visualisation (and editing)
  - As alignments grow, we need to be able to partition, compress and summarise groupings of sequences. Thousands of sequences from the same species are not interesting to view, nor is a screen full of gaps.
- Expression and conservation levels
- Genomic context & gene order
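One simple starting point for the partition, compress and summarise problem, sketched here under stated assumptions rather than as an Rfam feature: collapse identical sequences into a single row with a count, then hide columns that are gaps in more than a chosen fraction of rows. `summarise_alignment` and its `max_gap_fraction` threshold are hypothetical, and "-" and "." are assumed to be the only gap characters.

```python
"""Sketch: collapse duplicate aligned sequences and hide mostly-gap columns."""
from collections import Counter

GAP_CHARS = frozenset("-.")


def summarise_alignment(rows, max_gap_fraction=0.9):
    """Summarise (name, aligned_sequence) rows of equal length.

    Identical sequences collapse into one row with a count, most frequent
    first (ties keep input order), named after their first occurrence.
    Columns whose gap fraction exceeds `max_gap_fraction` are then removed
    from the displayed sequences.
    """
    if not rows:
        return []
    length = len(rows[0][1])
    if any(len(seq) != length for _, seq in rows):
        raise ValueError("all sequences must have the same aligned length")

    # Keep only columns that are not mostly gaps across all input rows.
    keep = [
        col for col in range(length)
        if sum(seq[col] in GAP_CHARS for _, seq in rows) / len(rows) <= max_gap_fraction
    ]

    # Collapse on the full sequence, so trimming never merges distinct rows.
    counts = Counter(seq for _, seq in rows)
    first_name = {}
    for name, seq in rows:
        first_name.setdefault(seq, name)

    return [
        (first_name[seq], "".join(seq[c] for c in keep), n)
        for seq, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical input: three identical rows and one variant.
    rows = [("s1", "AC-GU"), ("s2", "AC-GU"), ("s3", "AC-GU"), ("s4", "AUAGU")]
    for name, seq, n in summarise_alignment(rows, max_gap_fraction=0.5):
        print(f"{name:4s} {seq}  x{n}")
```

Grouping by species would follow the same pattern, keying rows on a species label (for example the prefix of names such as H.sapiens.1) instead of on identical sequence.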
Thanks!
- The Rfam Consortium:
  - Alex Bateman, Sean Eddy, Sam Griffiths-Jones, Sarah Burge, Eric Nawrocki, John Tate, Rob Finn, Jennifer Daub, Ruth Eberhardt
- Visualisation tools:
  - Ivo Hofacker, Yann Ponty, Jim Procter, Ian Holmes, Irmtraud Meyer, Zasha Weinberg and many others.
PPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the Royal Society of New Zealand.