Importing Community annotations into VectorBase
Aims
• Provide the VectorBase community with tools for improving genome annotation.
• Must have low entry requirements, be scalable and (relatively) simple to use
Genome annotation
• First-pass genome annotation is almost always based on “automatic” computational approaches
  • ab initio
  • Similarity based
    • Transcript (ESTs, RNAseq)
    • Protein (nr protein database)
Genome annotation - building a pipeline
[Figure: pipeline flowchart. Box labels: Genome assembly; Map Repeats; Genefinding; Protein-coding genes; Map Transcripts; Map Peptides; nc-RNAs; Functional annotation; Submission to archival databases (Release)]
Current VectorBase annotation pipeline
• MAKER-based automatic annotation
  • Includes SNAP training and ab initio
  • RNAseq-based transcript similarity prediction
  • Taxonomically constrained peptide similarity prediction
  • Two rounds of prediction refinement; the final round includes all peptide similarity
• Community annotation phase
  • Capture gene structure changes
  • Metadata associated with locus (symbol, description, citation)
• Submission to INSDC, propagation to UniProt
• Presentation through VectorBase
[Figure labels: Start; 1.0 set (automatic); 1.1 set (published)]
Processing submissions
• 4 phases
  • Capture
  • Moderation
  • Storage
  • Integration
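
The four phases above run in a fixed order. As a rough illustration only, the sketch below models them as an ordered enumeration and passes a submission through one handler per phase. The names `Phase`, `process_submission` and the handler mapping are hypothetical and are not taken from the VectorBase code base.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The four processing phases, in the order listed on the slide."""
    CAPTURE = 1
    MODERATION = 2
    STORAGE = 3
    INTEGRATION = 4


def process_submission(record, handlers):
    """Pass `record` through one handler per phase, in phase order.

    `handlers` is a hypothetical mapping of Phase -> callable(record) -> record.
    """
    for phase in Phase:  # Enum iteration follows definition order
        record = handlers[phase](record)
    return record
```

Each handler receives the output of the previous phase, so a moderation step can reject or amend a record before it is stored.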
Capture: Community annotation decision tree
Community annotation decision tree
Tool of choice: WebApollo
• Web-based
• Eliminates the main drawback of the deprecated CAP system: GFF3 format validation
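
To give a sense of what GFF3 format validation involves for a submitter, here is a minimal sketch that checks a single feature line against a few basic rules of the GFF3 format (nine tab-separated columns, integer coordinates with start <= end, a valid strand, a phase on CDS features, `tag=value` attributes). It is an illustration under those assumptions, not the validator used by CAP, and the function name `validate_gff3_line` is hypothetical.

```python
def validate_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields

    problems = []
    # Coordinates are 1-based integers with start <= end.
    try:
        start_i, end_i = int(start), int(end)
        if start_i < 1 or start_i > end_i:
            problems.append("start must be >= 1 and <= end")
    except ValueError:
        problems.append("start and end must be integers")

    # Score is a number or '.' when absent.
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append("score must be a number or '.'")

    if strand not in {"+", "-", ".", "?"}:
        problems.append("strand must be one of + - . ?")

    # Phase is required for CDS features and optional otherwise.
    if ftype == "CDS":
        if phase not in {"0", "1", "2"}:
            problems.append("CDS features need phase 0, 1 or 2")
    elif phase not in {".", "0", "1", "2"}:
        problems.append("phase must be 0, 1, 2 or '.'")

    # Attributes are ';'-separated tag=value pairs.
    if attributes != ".":
        for pair in attributes.split(";"):
            if pair and "=" not in pair:
                problems.append(f"attribute {pair!r} is not tag=value")
    return problems
```

A web-based editor avoids these checks on the user side because gene models are drawn in the browser rather than uploaded as hand-prepared files.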
WebApollo example
Community annotation decision tree
Community annotation decision tree
Tool of choice: Web forms
Moderation & Storage
• Gene metadata captured through forms to spreadsheets
• Batch submissions use similar spreadsheet format
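
As an illustration of how spreadsheet-style metadata might be moderated before storage, the sketch below reads tab-separated rows and separates usable records from incomplete ones. The column names (`gene_id`, `symbol`, `description`, `citation`) are assumptions based on the metadata types named earlier in the slides; the real spreadsheet layout is not shown in the source, and `read_metadata` is a hypothetical helper.

```python
import csv
import io

# Assumed column names; the actual spreadsheet layout is not given in the slides.
ID_COLUMN = "gene_id"
PAYLOAD_COLUMNS = ("symbol", "description", "citation")


def read_metadata(tsv_text):
    """Split tab-separated metadata rows into (accepted, rejected) lists.

    A row is accepted when it names a gene and carries at least one
    piece of metadata (symbol, description or citation).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    accepted, rejected = [], []
    for row in reader:
        # Missing cells come back as None; surplus cells sit under the key None.
        clean = {k: (v or "").strip() for k, v in row.items() if k is not None}
        has_id = bool(clean.get(ID_COLUMN))
        has_payload = any(clean.get(col) for col in PAYLOAD_COLUMNS)
        (accepted if has_id and has_payload else rejected).append(clean)
    return accepted, rejected
```

Because form entries and batch submissions share a similar format, one reader of this kind could in principle serve both routes.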
Integration: Dataflow for ‘patch’ build
[Figure: dataflow diagram. Labels: CAP GFF3; WebApollo; Reference core; Updated geneset; TXT; Patch; Users; Stable IDs; Reports; Updated core; IDs; Reference core; CAP; Release core; Google Fusion Table; Xrefs; Release; Xrefs; Google Form; Metadata; Users; Commit]
Presentation of community annotation
Usage (as of 2015-03-30)
• 31 WebApollo instances (Organisms)
• 3,407 gene models
• Gene metadata (protein-coding loci)
  • 4,987 gene symbols
  • 512 gene synonyms
  • 57,878 gene descriptions
  • 910 loci citations from 208 publications
Supplementing annotations
• Community jamborees
  • ‘Standard’ improvement (e.g. Sandfly, snail communities)
  • Glossina community (e.g. March 2015, Kenya)
• VectorBase
  • Default Xref run includes symbol/description assignment via UniProt
  • Projection of gene descriptions via orthology from key marker species (e.g. An. gambiae). Due to be deployed for the June (VB-2015-06) release.
  • Supplemental data from genome papers (e.g. 16 Anopheles spp, Musca)
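
The idea of projecting gene descriptions via orthology can be sketched as follows. This is a simplified illustration that assumes one-to-one ortholog pairs and plain dictionaries; the function name `project_descriptions` and the data layout are hypothetical, and the slides do not describe the actual projection rules.

```python
def project_descriptions(target_descriptions, orthologs, reference_descriptions):
    """Suggest descriptions for target genes that lack one.

    target_descriptions:    target gene ID -> existing description ("" or missing if none)
    orthologs:              target gene ID -> reference gene ID (assumed one-to-one)
    reference_descriptions: reference gene ID -> description in the marker species
    """
    projected = {}
    for gene, ref_gene in orthologs.items():
        if target_descriptions.get(gene):
            continue  # never overwrite an existing description
        ref_description = reference_descriptions.get(ref_gene)
        if ref_description:
            projected[gene] = ref_description
    return projected
```

Returning only the newly projected entries keeps inherited descriptions separate from community-submitted ones, so they can be reviewed or flagged before integration.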
Deprecated CAP system example