Vizbi2013: Visualising RNA

Posted on 20-Jun-2015


Visualising RNA

Paul Gardner, paul.gardner@canterbury.ac.nz

University of Canterbury, Christchurch, New Zealand.

March 20, 2013

Paul Gardner Visualising RNA

Feel free to share

- Feel free to tweet (@ppgardne), Google+, tumblr, ...
- Slides are available from http://www.slideshare.net/ppgardne/.


What is an RNA?

[Figure: a tRNA shown at three levels: (A) primary structure, (B) secondary structure and (C) tertiary structure, with the acceptor stem, D loop, anticodon loop, variable loop and TΨC loop labelled.]

Primary structure (5’ to 3’): GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA


What is Rfam?

- Sister database to Pfam

- Aims to annotate all ncRNA families

- Consortium headed by Alex Bateman (Wellcome Trust Sanger Institute), Sean Eddy (Janelia, Howard Hughes), Sam Griffiths-Jones (Manchester, BBSRC), Paul Gardner (University of Canterbury, RSNZ)


Rfam: families of ncRNAs

http://rfam.sanger.ac.uk and http://rfam.janelia.org


Building an Rfam family

- A structure from the literature

Pollard KS, et al. (2006). An RNA gene expressed during cortical development evolved rapidly in humans. Nature.


Building an Rfam family

- An Rfam family, produced manually from publication figures:

# STOCKHOLM 1.0

G.gallus.1      UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUAG
M.musculus.1    UAAAAUGGAGGAGAAAUUACAGCAAUUUAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCCGUGG
M.mulatta.1     UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
G.gorilla.1     UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
H.sapiens.1     UGAAACGGAGGAGACGUUACAGCAACGUGUCAGCUGAAAUGAUGGGCGUAGACGCACGUCAGCGGCGG
P.troglodytes.1 UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
P.abelii.1      UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
C.lupus.1       UGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCGGUGC
T.truncatus.1   CGAAAAGGAGGGGAAAUUACAGCAAUUCAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
B.taurus.1      CGAAAUGGAGGAGAAAUUACAGCAAUUCAUCAGCUGAAAUUAUAGGUGUAGACACAUGUCAGCAGUGG
V.pacos.1       UGAAACAGAGGAGAAAUUACAGCAAUUCAUCAACCGAAAUGAUAGGGAUAGACAUGUGUCGGCAGUGG
M.lucifugus.1   CGAAAUGGAGGAGAAAUUACAGCAAUUUAUCAACUGAAAUUAUAGGUGUAGACACAUGUCAUCCGUGG
O.anatinus.1    UGAAAUGGAGGAUAAAUUACAGCAAUUUAUCAAAUGAAAUUAUAGGUGUAGACACAUGUCAGCAAUGG
#=GC SS_cons    <<<<<<.<<<<<<<<<<<.....>>>>>.....>><<<<<.<<<.<<<....>>>.>>>.........
#=GC RF         uGaaacGGaGGagaaguuAcAGcaacuuAUcAgcuGaaacuaugGGcGUAGACgCAcgucAGcaguGg

G.gallus.1      AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
M.musculus.1    AAAUGGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
M.mulatta.1     AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
G.gorilla.1     AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
H.sapiens.1     AAAUGGUUUCUAUCAAAAUGAAAGUGUUUAGAGAUUUUCCUCAAGUUUCA
P.troglodytes.1 AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
P.abelii.1      AAAUAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
C.lupus.1       AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
T.truncatus.1   GAACACUUUCUAUCAAAAUUAAAGUACUUAGCGAUUUUCCUUAAAUUUCA
B.taurus.1      AAACCGUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUUAAAUUUCA
V.pacos.1       AAACAGUUUCUAUCAAAAUUAAAGUAUUUAGAGACUUUCCUCAAAUUUCA
M.lucifugus.1   AAACAGUUACGAUCAAAAUUAAAGUGUUUAGAGAUUUUCCUC.AAUUUUA
O.anatinus.1    AAACAAUUUCUAUCAAAAUUAAAGUAUUUAGAGAUUUUCCUCAAAUUUCA
#=GC SS_cons    .....>>>>>....<<<<<..............>>>>>>>>>..>>>>>>
#=GC RF         AAAuaguuuCUAUcaaaauuAAAGUAUUUAGAGauuuuCCuCAAguuuCa
//
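The `#=GC SS_cons` line is what a structure drawing starts from: matching `<` and `>` mark base-paired alignment columns. As a minimal sketch, here is a hypothetical Python helper `pair_table` (not part of any Rfam tool) that turns such a line into a list of paired columns. It assumes a single, already concatenated SS_cons string and treats `()`, `<>`, `[]` and `{}` as bracket pairs; the WUSS notation used by Rfam also marks pseudoknots with matched upper/lower-case letters, which this sketch simply treats as unpaired.

```python
# Hypothetical helper (not part of any Rfam tool): convert a WUSS-style
# consensus structure, as found on an Rfam "#=GC SS_cons" line, into
# base-paired column indices.
OPENERS = {"(": ")", "<": ">", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def pair_table(ss_cons: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs of 0-based alignment columns, sorted by i."""
    # One stack per bracket type, so different bracket types are matched
    # independently of each other.
    stacks: dict[str, list[int]] = {open_: [] for open_ in OPENERS}
    pairs: list[tuple[int, int]] = []
    for col, char in enumerate(ss_cons):
        if char in OPENERS:
            stacks[char].append(col)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at column {col}")
            pairs.append((stack.pop(), col))
        # Every other character ('.', ',', '_', '-', ':', '~', pseudoknot
        # letters, ...) is treated as unpaired by this sketch.
    leftovers = sorted(col for stack in stacks.values() for col in stack)
    if leftovers:
        raise ValueError(f"unmatched opening bracket(s) at columns {leftovers}")
    return sorted(pairs)


if __name__ == "__main__":
    # Toy hairpin: columns 0-2 pair with columns 9-7.
    print(pair_table("<<<....>>>"))  # [(0, 9), (1, 8), (2, 7)]
```

For an interleaved alignment like the one above, the two `#=GC SS_cons` segments would need to be joined, in block order, before calling the function.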


Building an Rfam family

- And the Wikipedia entry


Conflicting priorities

- A curator’s priorities:
  1. New families
  2. Accuracy of models
  3. Annotation
  4. Functional codebase
  5. Website
  6. Visualization

- A user’s priorities:
  1. FTP (bioinformaticians)
  2. Website
  3. Visualization
  4. Number of families
  5. Accuracy of models
  6. Annotation

Image credits: www.conflictdynamics.org


2007: challenges

- Quality control
- Rewrite the website and add some bling
- Update the codebase
- Export annotation to Wikipedia
- User community input via RNA Biology


Visualisation priorities

SCALE
- From two to two million sequences, 30 to 3,000 nucleotides long, with 0 to 1,000 base pairs.
- AUTOMATED: thousands of families.

INFORMATIVE
- Generates biologically relevant hypotheses.

INCLUSIVE
- Make the most of our fantastic bioinformatics & visualisation community.
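To put the SCALE row in perspective, here is a rough worst-case estimate. It assumes, purely for illustration, a single family that is both the largest (two million sequences) and the longest (at least 3,000 alignment columns), stored at one byte per cell and shown on a screen of about 1,000 pixel rows:

```latex
\[
\underbrace{2\times10^{6}}_{\text{sequences}} \times \underbrace{3\times10^{3}}_{\text{columns}}
  = 6\times10^{9}\ \text{cells} \approx 6\ \text{GB at one byte per cell}
\]
\[
\frac{2\times10^{6}\ \text{rows}}{10^{3}\ \text{pixel rows per screen}}
  = 2\times10^{3}\ \text{screens, even at one pixel per sequence}
\]
```

Real families are much smaller than this bound, but the numbers show why drawing every row of every alignment cannot be the default view.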


Examples

- Caveat: none of the images I am showing are final solutions; everything can be improved upon.

- Secondary structure

- Taxonomic distribution

- Alignment

- Genomic contexts & gene order


RNA Secondary Structure

[Figure: consensus RNA secondary structure drawn from 5’ to 3’, with consensus nucleotides (including IUPAC ambiguity codes) coloured by sequence conservation on a scale from 0 to 1.]

Gardner, Bateman & Poole (2010). SnoPatrol: how many snoRNA genes are there? Journal of Biology.
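The secondary-structure figure colours each position by sequence conservation on a 0 to 1 scale, but the slide does not say how that score is computed. One common choice, assumed here, is one minus the normalised Shannon entropy of a column's nucleotide frequencies. A minimal Python sketch, with hypothetical function names and gaps simply ignored:

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def column_conservation(column: str) -> float:
    """Score one alignment column from 0 (uniform) to 1 (invariant).

    Assumed metric: 1 - H / log2(4), where H is the Shannon entropy of the
    A/C/G/U frequencies. Gaps and other characters are ignored; T is read as U.
    """
    counts = Counter(c for c in column.upper().replace("T", "U") if c in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an all-gap column carries no conservation signal
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(len(NUCLEOTIDES))


def conservation_profile(sequences: list[str]) -> list[float]:
    """Apply column_conservation to every column of a set of aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_conservation("".join(col)) for col in zip(*sequences)]


if __name__ == "__main__":
    aligned = ["UGAAAUGG", "UAAAAUGG", "CGAAACAG"]
    print([round(x, 2) for x in conservation_profile(aligned)])
```

Entropy-based scores reward columns dominated by one nucleotide; a simpler alternative is the frequency of the most common nucleotide, which would also fit a 0 to 1 colour scale.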


Old Taxonomic distributions: RybB

- Contamination displayed first.


Old Taxonomic distributions: RybB

- After some scrolling


New Taxonomic distributions: RybB

- Sunbursts: concentric “pie charts” in which each outer ring contains the “children” of the nodes in the ring inside it.
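The sunburst layout reduces to a short recursion: each node's arc is proportional to the number of sequences beneath it, and its children split that arc between them on the next ring outwards. A minimal Python sketch, assuming the taxonomy arrives as nested dicts with integer leaf counts (a hypothetical input format, not Rfam's), with toy counts rather than real RybB data:

```python
from dataclasses import dataclass
from typing import Union

# A node is either a leaf count (int) or a dict of named children.
Tree = Union[int, dict]


@dataclass
class Arc:
    label: str
    ring: int     # 0 = innermost ring
    start: float  # degrees
    end: float    # degrees


def total(node: Tree) -> int:
    """Number of sequences at or below a node."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())


def sunburst(tree: dict, start: float = 0.0, end: float = 360.0, ring: int = 0) -> list[Arc]:
    """Split the arc [start, end) among tree's children, then recurse outwards."""
    grand_total = total(tree)
    if grand_total == 0:
        return []
    arcs: list[Arc] = []
    angle = start
    for label, child in tree.items():
        span = (end - start) * total(child) / grand_total
        arcs.append(Arc(label, ring, angle, angle + span))
        if isinstance(child, dict):
            # Children sit on the next ring out, inside their parent's arc.
            arcs.extend(sunburst(child, angle, angle + span, ring + 1))
        angle += span
    return arcs


if __name__ == "__main__":
    # Toy counts for illustration only, not real RybB data.
    taxonomy = {"Proteobacteria": {"Gammaproteobacteria": 30, "Betaproteobacteria": 10}, "Firmicutes": 20}
    for arc in sunburst(taxonomy):
        print(arc)
```

Because arc size follows sequence counts, a handful of contaminant sequences only occupies a sliver of the chart instead of appearing at the top of a list.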


Alignments

- When we have sequenced everything, how is this view going to look?


Genomic contexts & Gene Order

- How can we display comparative gene-order information in a scalable fashion?

- Think of hundreds to thousands of genomes and tens to hundreds of features.

Barquist L, et al. (2013). A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium. Nucleic Acids Research.
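One way to make gene-order comparisons scale is to stop drawing every genome and instead draw each distinct arrangement once, weighted by how many genomes share it. A minimal Python sketch of that grouping step, assuming each genome's neighbourhood around the ncRNA has already been reduced to an ordered list of gene-family labels (the input and the `famA`-style labels are hypothetical). It treats an arrangement and its reverse as the same, since the locus may be read from either strand, and ignores gene orientation entirely:

```python
from collections import Counter

Arrangement = tuple[str, ...]


def canonical(order: list[str]) -> Arrangement:
    """Pick one representative for an arrangement and its reverse.

    Simplification: a locus read from the other strand appears reversed, so
    both directions are treated as the same arrangement; gene orientation
    (+/-) is ignored in this sketch.
    """
    forward, backward = tuple(order), tuple(reversed(order))
    return min(forward, backward)


def summarise_gene_orders(neighbourhoods: dict[str, list[str]]) -> list[tuple[Arrangement, int]]:
    """Collapse per-genome gene orders into distinct arrangements, most common first."""
    return Counter(canonical(order) for order in neighbourhoods.values()).most_common()


if __name__ == "__main__":
    # Hypothetical gene-family labels around an ncRNA in four genomes.
    neighbourhoods = {
        "genome1": ["famA", "ncRNA", "famB"],
        "genome2": ["famB", "ncRNA", "famA"],  # same locus, opposite strand
        "genome3": ["famA", "ncRNA", "famC"],
        "genome4": ["famA", "ncRNA", "famB"],
    }
    for arrangement, n in summarise_gene_orders(neighbourhoods):
        print(n, " - ".join(arrangement))
```

With thousands of genomes, the number of distinct arrangements is typically what limits the display, rather than the number of genomes.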


Open problems

- Evolution and RNA structure
- Scalable alignment visualisation (and editing)
  - As alignments grow, we need to be able to partition, compress and summarize groupings of sequences. Thousands of sequences from the same species are not interesting to view, nor is a screen full of gaps.

- Expression and conservation levels

- Genomic context & gene order
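The partition, compress and summarize idea for alignments can be sketched in a few lines. This minimal Python example assumes sequence names follow the `Genus.species.N` pattern of the Rfam seed alignment (e.g. `H.sapiens.1`). It collapses identical sequences within each species into one labelled row and then drops columns that are mostly gaps. The `compress_alignment` name and the 50% gap threshold are illustrative assumptions, not Rfam behaviour.

```python
GAP_CHARS = set("-.")


def species_of(name: str) -> str:
    """Assumed naming convention: 'Genus.species.N' -> 'Genus.species'."""
    return name.rsplit(".", 1)[0]


def compress_alignment(alignment: dict[str, str], max_gap_fraction: float = 0.5) -> list[tuple[str, str]]:
    """Collapse identical sequences per species and drop gap-dominated columns."""
    if not alignment:
        return []
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    # Partition: count identical rows within each species (dicts keep insertion order).
    groups: dict[tuple[str, str], int] = {}
    for name, seq in alignment.items():
        key = (species_of(name), seq)
        groups[key] = groups.get(key, 0) + 1

    # Compress: keep columns whose gap fraction, over all original
    # sequences, does not exceed max_gap_fraction.
    n_seqs = len(alignment)
    width = len(next(iter(alignment.values())))
    keep = [
        col
        for col in range(width)
        if sum(n for (_, seq), n in groups.items() if seq[col] in GAP_CHARS) / n_seqs
        <= max_gap_fraction
    ]

    # Summarize: one labelled row per group, showing how many sequences it stands for.
    return [
        (f"{species} (x{n})", "".join(seq[col] for col in keep))
        for (species, seq), n in groups.items()
    ]


if __name__ == "__main__":
    toy = {
        "H.sapiens.1": "AC-GU",
        "H.sapiens.2": "AC-GU",
        "H.sapiens.3": "ACUGU",
        "M.musculus.1": "AC--U",
    }
    for label, row in compress_alignment(toy):
        print(f"{label:<16} {row}")
```

Gap fractions are counted over the original sequences rather than the collapsed rows, so a column is not kept or dropped just because duplicates were merged.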


Thanks!

- The Rfam Consortium:
  - Alex Bateman, Sean Eddy, Sam Griffiths-Jones, Sarah Burge, Eric Nawrocki, John Tate, Rob Finn, Jennifer Daub, Ruth Eberhardt

- Visualisation tools:
  - Ivo Hofacker, Yann Ponty, Jim Procter, Ian Holmes, Irmtraud Meyer, Zasha Weinberg and many others.

PPG is supported by a Rutherford Discovery Fellowship from government funding, administered by the Royal Society of New Zealand.
