Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone...
Transcript of Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone...
![Page 1: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/1.jpg)
Tools for Community Genome Annotation:
Zmap and Otter
Jane Loveland Havana group, Wellcome Trust Sanger Institute,
Hinxton, Cambridge, UK
IWGSC – Standards and ProtocolsPAG XXV, Tuesday 17th January 2017
![Page 2: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/2.jpg)
Havana: Human and vertebrate analysis and annotation• Manual annotation of human, mouse, zebrafish, pig and rat
whole chromosomes or genomes• Human GENCODE annotation and working on mouse
GENCODE annotation• Annotation of specific regions: human MHC & LRC
haplotypes, multiple species MHCs & LRCs,
Vega: Vertebrate Genome Annotation• Ensembl derived browser focusing on manual annotation
![Page 3: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/3.jpg)
Overview
• Manual annotation process– Tools, data, biotypes
• Community Manual Annotation– Mouse, Swine autosomes (IRAG), Rat, Chicken
• New data and projects
![Page 4: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/4.jpg)
• Manual annotation process– Tools, data, biotypes
• Community Manual Annotation– Mouse, Swine autosomes (IRAG), Rat, Chicken
• New data and projects
![Page 5: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/5.jpg)
Automatic Annotation vs Manual
Automatic Annotation• Quick whole genome analysis ~
weeks• Consistent annotation• Use unfinished/illumina
sequence/shotgun assembly• No polyA sites/signals,
pseudogenes, lncRNAs• Limited functional annotation• Predicts ~75% loci
Manual Annotation• Slow~3 months per chromosome• Prefer finished (high quality) sequence• Flexible, can deal with inconsistencies in
data• Most rules have exception• Consult publications as well as databases• Extensive Biotypes:
– Excellent functional annotation– e.g. pseudogenes, lncRNA
Automated annotation alone is not sufficient for researchers needsGENCODE geneset
![Page 6: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/6.jpg)
Analysis and Annotation pipeline: Otter/ZMap
BLAST Gene predictions
RepeatMaskerCpG prediction
PfamRefSeq
Ensembl
OtterEditing Windows
Annotation process on desktop
Annotators can work anywhere as software communicates with server over HTTPThe server controls data access and allows multi-user annotation
Otter works with Zmap to provide a complete integrated annotation workbench
![Page 7: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/7.jpg)
Vertical and horizontalsplitting
Annotation tools: Zmap
![Page 8: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/8.jpg)
Annotation tools: Otter
![Page 9: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/9.jpg)
Annotation tools: Blixem
![Page 10: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/10.jpg)
Annotation tools: Dotter
![Page 11: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/11.jpg)
Manual Annotation: BiotypesAnnotation:
based on transcriptional evidenceProtein Coding
Known_CDSNovel_CDSPutative_CDSNonsense_mediated_decay
Transcript retained intronputative
Non-coding lincRNAAntisenseSense_intronicSense_overlapping3’_overlapping_ncRNA
PseudogeneProcessedUnprocessedTranscribedTranslatedUnitaryPolymorphic
ImmunoglobulinIG_pseudogeneIG_GeneTR_Gene
Biotypes
Sequencesfrom
databases
Set of guidelines to help make annotation decisions
![Page 12: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/12.jpg)
Mutually exclusive
Skipped exon
Retained intron
Alternative splice donor
Alternative splice acceptor
Alternative first exon
Alternative final exon
Reference model
Alternative Splicing:
![Page 13: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/13.jpg)
5’ end annotation:GPR56 (Human G protein-coupled receptor 56 gene)
![Page 14: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/14.jpg)
GRIN2BEnsembl Bodymap
BrainBreast
polyA
RNAseq data to extend 3’UTRs
Extended 3’ end
![Page 15: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/15.jpg)
Improvements of lncRNA annotation: understanding functionality
![Page 16: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/16.jpg)
Unprocessed
DuplicationReverse transcription and re-integration
AAAAAA
Processed
AAAAAA
* * **
HAVANA Pseudogene Loci:
![Page 17: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/17.jpg)
Polymorphic pseudogene
Sequencing error, pseudogene or polymorphic pseudogene?
Loss of function gene
![Page 18: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/18.jpg)
Havana team move to EBI April 2017VEGA is being archived and final release will be February 2017
![Page 19: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/19.jpg)
CCDS
ENSEMBL
Gold (merged): agreed ensembl/havana
Blue: non-coding
Red: coding (001 Havana, 201 Ensembl)
http://www.gencodegenes.org
Ensembl view: GENCODE geneset
![Page 20: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/20.jpg)
Update genes:
![Page 21: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/21.jpg)
ftp://ngs.sanger.ac.uk/production/gencode/update_trackhub/hub.txt
New Track Hub:
![Page 22: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/22.jpg)
• Manual annotation process– Tools, data, biotypes
• Community Manual Annotation– Mouse, Swine autosomes (IRAG), Rat, Chicken
• New data and projects
![Page 23: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/23.jpg)
Community Annotation:
•Part of IKMC with EUCOMM annotation in mouse:•KOMP and NorCOMM annotation (“Blessed Annotator”)
•Jamborees for species with strong community interest (“Gatekeeper”):•Xenopus tropicalis 2005 (cDNA)•Cow 2007 (Genomic WGS)
•Pig 2008 (Genomic WGS)2010 - 2013
•IR genes in Pig (~1300 genes) manually annotated by communityMany transcript variantsFound gene expansions and duplications Co-expression clustering analysis: some exhibited accelerated evolution
•Rat manual annotation 2013, 2015 (BBSRC)
•Chicken MHC 2016
![Page 24: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/24.jpg)
Community Annotation Approaches:The value of a genome is only as good as its annotationRat whole genome annotation of Rnor 6.0Chicken MHC
Otter/Zmap Annotation SoftwareAuthentication:Sanger single sign-on account (email)Registered email for Otter permitted users:Access to our data and and analysis pipeline
Mac and Linux: Platforms of choiceMonthly updates/bugfixes
Windows:Virtual machine image installed and run using VirtualBoxRuns an Ubuntu desktop with a bespoke Otter release
Over 2600 genes manually annotated that have been chosen by the rat community and RGD. Targeted annotation.
Final rat Vega and Ensembl merge in progress
![Page 25: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/25.jpg)
• Manual annotation process– Tools, data, biotypes
• Community Manual Annotation– Mouse, Swine autosomes (IRAG), Rat, Chicken
• New data and projects
![Page 26: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/26.jpg)
Long-read data:Better for discovering novel alternatively spliced transcripts and full-length transcripts
PacBio-CaptureSeq(human: brain, testis, heart, liver, HeLa, K562)(mouse: brain, testis, heart, liver, E7, E15)
SLR-RNAseq(human/mouse brain)
![Page 27: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/27.jpg)
SLR-seq readsHavanaannot.
Merged modelsSLR PacBio PacBio readsCAGE
PolyA-seq
Computationally combining evidence:
EDEM3
![Page 28: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/28.jpg)
Mouse strains:
![Page 29: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/29.jpg)
Itgb3
Mettl2
Efcab3Cagetags=2181
Cagetags=2685808aa
Ens_RNASeq_ENCODE_canonical_introns_plus
Mouse strain annotation reveals new genes:
Ruth Bennett
![Page 30: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/30.jpg)
Reference C57BL/6
129S1-SvImJ
Mouse strain annotation reveals strain specific coding transcripts: Ifi214
393aa NMD 420aa coding>Reference longest NMD transcript 393 AA. MVNEYKRIVLLTGLMGINDHDFRMVKSLLSKELKLNKMQDEYDRVKIADLMEDKFPKDAGVVQLIKLYKQIPGLGDIANKLKNEKAKAKRKGKGKRKTAAKRQRQEEPSTSQPMSTTNEDAEPESGRSTPDTQVAQLSLPTASRRNQAIQISPTIASSSGQTSSRSSETLQSIIQSPETPTRSSSRILDPPVSPGTAYSSAQALGVLLATPAKRQRLKNVPKEPSEENGYQQGSKKVMVLKVTEPFAYDMKGEKMFHATVATETEFFRVKVFDIVLKEKFIPNKVLTISNYVGCNGFINIYSASSVSEVNDGEPMNIPLSLRKSANRTPKINYLCSKRRGIFVNGVFTVCKKEERGYYICYEIGDDTGMMEVEVYGRLTNIACNPGDKLRLML*
Stop codon isn’t a SNP. Caused by a much larger disruption.
>129 Patch long coding transcript (equivalent to ref NMD) 420 AA. MVNEYKRIVLLTGLMGINDHDFRMVKSLLSKELKLNKMQDEYDRVKIADLMEDKFPKDAGVVQLIKLYKQIPGLGDIANKLKNEKAKAKRKGKGKRKTAAKRQRQEEPSTSQPMSTTNEDAEPESGRSTPDTQVAQLSLPTASRRNQAIQISPTIASSSGQTSSRSSETLQSIIQSPETPTRSSSRILDPPVSPGTAYSSAQALGVLLATPAKRQRLKNVPKEPSEENGYQLGSKKVMVLKVTEPFAYDMKGEKMFHATVATETEFFRVKVFDIVLKEKFIPNKVLTISNYVGCNGFINIYSASSVSEVNDGEPMNIPLSLRKSANRTPKINYLCSKRRGIFVNGVFTVCKKEERGYYICYEIGDDTGMMEVEVYGRLTNIACNPGDKLRLICFELTPDEETAWLRSTTHSNMQVIKARN*
Ruth Bennett
![Page 31: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/31.jpg)
Nlrp1a
Nlrp1b
Nlrp1c-ps
Mouse strain annotation reveals strain specific expansions and biotype differences:Nlrp locus Reference: C57BL/6 Strains: PWK/PhJ WSB/EiJ CAST/EiJ
Pseuo
Gene
Gene
Pseudogene
PWK WSB CAST
Gene Pseudo Gene
Gene Gene Pseudo
Pseudo Pseudo Gene
Pseudo Pseudo Gene
Pseudo Pseudo Pseudo
Expansion
Ruth Bennett
![Page 32: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/32.jpg)
MOUSE1. Brain2. Testis3. Heart4. Liver5. Embryo 7d6. Embryo 15d
HUMAN1. Brain2. Testis3. Heart4. Liver5. HeLa6. K562
Clinical data: RNA capture-seq across a range of tissues
![Page 33: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/33.jpg)
Clinical Data
Human KCNMA1:22 original transcriptsNow 92 transcripts25 from SLR-SeqSLR-Seq also extended other transcripts
Micro exons, NAGNAG, alternate exon use
Marie-Marthe Suner
![Page 34: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/34.jpg)
Final CommentsZmap/otter tool
Can handle large datasetsTraining, QC and feedback
Let the pipelines take the strainTargeted manual annotationSpend time on the tricky things
Feedback genome qualityReport errorsImprove the assembly
![Page 35: Tools for Community Genome Annotation: Zmap and Otter · 2017-02-10 · Automated annotation alone is not sufficient for researchers needs GENCODE geneset. ... The server controls](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7a102fc1fe4b85757d50/html5/thumbnails/35.jpg)
Havana:Adam FrankishIf BarnesRuth BennettAndrew BerryAlex BignellClaire DavidsonGloria Despacio-ReyesSarah DonaldsonMatthew HardyToby HuntMike KayJane LovelandDeepa ManthravadiGaurab MukherjeeJonathan MudgeMarie-Marthe SunerMark Thomas
Vega:Stephen TrevanionDan Sheppard
Annosoft:Ed GriffithsSteve Miller
Acknowledgements
Gencode:Jose GonzalezStephen Fitzgerald
ftp://ngs.sanger.ac.uk/production/gencode/update_trackhub/hub.txt