Web Apollo at Genome Informatics 2014
APOLLO Collaborative Curation and Interactive Analysis of Genomes
Monica Munoz-Torres, PhD | @monimunozto
Suzanna Lewis, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik.
Berkeley Bioinformatics Open-Source Projects (BBOP), Genomics Division, Lawrence Berkeley National Laboratory
Genome Informatics. Cambridge, UK. September 2014
OUTLINE
• MANUAL CURATION is necessary, but does not always scale
• EMPOWER CURATORS collaborative genome annotation
• WEB APOLLO architecture, implementation, plans
• BBOP PROJECTS future plans
MANUAL ANNOTATION is necessary
• Automated genome analyses remain an imperfect art.
• Precise elucidation of biological features encoded in the genome requires careful examination and review.
  • Evaluate all available evidence and corroborate or modify genome element predictions.
  • Resolve discrepancies and validate automated gene model hypotheses.
• Desktop version of Apollo was designed to fit the manual annotation needs of genome projects such as Human, Mouse, Fruit fly, Zebrafish, etc.
Schiex et al. 2003. Nucleic Acids Res. 31(13): 3738-3741
[Figure: Automated Predictions, Experimental Evidence, Manual Curation]
CURATION in this context
1. Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses.
2. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data.
Examples
Comparing 7 ant genomes contributed to a better understanding of the evolution and organization of insect societies at the molecular level; e.g. division of labor, mutualism, chemical communication, etc.
Libbrecht et al. 2013. Genome Biology 14:212
Insect Methylome
[Figure labels: Queen Bee, Worker Bee, Castes, Larva, Dnmt RNAi, Royal jelly]
Kucharski et al. 2008. Science 319(5871): 1827-1830
Anchoring molecular markers to the reference genome pointed to chromosomal rearrangements and helped detect signals of adaptive radiation in Heliconius butterflies.
Joron et al. 2011. Nature 477: 203-206
BUT, MANUAL CURATION does not always scale
1. Museum: A small group of highly trained experts; e.g. GO.
2. Jamboree: A few very good biologists and a few very good bioinformaticians camp together during intense but short periods of time.
3. Cottage: Researchers work by themselves, then may or may not publicize results; … may be a dead end with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11): 1329-33.
Too many sequences and not enough hands to approach curation.
POWER TO THE CURATORS augment existing tools
Fill in the gap for all the things that won’t be easy to cover with these approaches; this will allow researchers to better contribute their efforts.
Give more people the power to curate! Big data are not a substitute for, but a supplement to, traditional data collection and analysis.
The Parable of Google Flu. Lazer et al. 2014. Science 343(6176): 1203-1205.
• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work
IMPROVING TOOLS FOR MANUAL ANNOTATION our plan
“More and more sequences”: more genomes, within populations and across species, are now being sequenced.
This calls for a universally accessible genome curation tool:
To produce accurate sets of genomic features.
To address the need to correct for more frequent assembly and automated prediction errors due to new sequencing technologies.
GENOME ANNOTATION an inherently collaborative task
Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo. The new JavaScript-based Apollo:
• Web based for easy access.
• Concurrent access supports real-time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
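WEB APOLLO architecture
[Figure: architecture diagram with three numbered components: (1) web-based client, (2) annotation editing engine, (3) server-side data service]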
WEB-BASED CLIENT user interaction
• Plugin to JBrowse.
• Graphic interface for editing operations and to handle user management.
• Two new kinds of tracks: DNA and User-created Annotations.
The client: 1) pulls from the data service, 2) sends “edit” operations to the server, and 3) “listens” for edits pushed back from the server.
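The pull / send / listen cycle above can be illustrated with a small in-memory sketch. This is not Web Apollo's code: the `EditServer` class, its method names, and the shape of the edit dictionary are all hypothetical, and a real deployment would push edits over the network rather than through direct callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EditServer:
    """Hypothetical in-memory stand-in for the annotation server."""
    annotations: Dict[str, dict] = field(default_factory=dict)
    listeners: List[Callable[[dict], None]] = field(default_factory=list)

    def pull(self) -> Dict[str, dict]:
        # Step 1: a client pulls the current set of annotations.
        return dict(self.annotations)

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        # Step 3 (setup): a client registers to "listen" for pushed edits.
        self.listeners.append(callback)

    def send_edit(self, edit: dict) -> None:
        # Step 2: a client sends an edit; the server stores it and then
        # pushes it to every listening client, including the sender.
        self.annotations[edit["id"]] = edit
        for notify in self.listeners:
            notify(edit)


# Two simulated clients, each keeping a log of the edits it has seen.
server = EditServer()
seen_by_alice: List[dict] = []
seen_by_bob: List[dict] = []
server.subscribe(seen_by_alice.append)
server.subscribe(seen_by_bob.append)

server.send_edit({"id": "gene-1", "operation": "set_exon_boundaries",
                  "start": 100, "end": 250})
```

Because every edit goes through the server before it is shown, both simulated clients end up with the same view, which is the property that concurrent curation relies on.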
ANNOTATION EDITING ENGINE the logic
• Server: Java servlet.
• Data Model (and I/O): GBOL. Chado-based. Simple Hibernate layer and wrapper bio-layer that considers the Sequence Ontology (SO).
• Editing Logic: Selects longest ORFs, flags non-canonical splice sites. This is where biology “reasons”.
• Plug-in Architecture: for sequence alignment searches (BLAT).
• JE (Java version of Berkeley DB): stores annotations, edits, and history.
• Real-time support.
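The two editing-logic checks named above (longest ORF, non-canonical splice sites) can be sketched in a few lines. This is an illustration in Python, not the engine's Java implementation. The function names are hypothetical, only the forward strand is scanned, coordinates are 0-based and end-exclusive, and anything other than a GT donor with an AG acceptor is treated as non-canonical, which is a simplifying assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward
    strand (0-based, end exclusive), or None if no complete ORF exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def noncanonical_splice_sites(seq, exons):
    """Given sorted, non-overlapping exons as (start, end) pairs, return
    one (intron_start, intron_end, donor, acceptor) tuple for every
    intron that does not begin with GT and end with AG."""
    seq = seq.upper()
    flagged = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        donor = seq[left_end:left_end + 2]
        acceptor = seq[right_start - 2:right_start]
        if donor != "GT" or acceptor != "AG":
            flagged.append((left_end, right_start, donor, acceptor))
    return flagged


# Toy sequences: two exons separated by an 8-base intron.
canonical = "ATGGCC" + "GTAAAAAG" + "TTTTAA"
unusual = "ATGGCC" + "GCAAAAAG" + "TTTTAA"
exons = [(0, 6), (14, 20)]
flags_ok = noncanonical_splice_sites(canonical, exons)
flags_bad = noncanonical_splice_sites(unusual, exons)
orf = longest_orf("CCATGAAATAGATGAAACCCGGGTAA")
```

In an editor, checks like these would run after each edit, so a curator sees a flag as soon as an exon boundary is moved onto an unusual splice site.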
SERVER SIDE DATA SERVICE access and broker data
• Data are processed with (Perl) pipelines that generate static JSON.
  • Cultural shift: Reliance on big genome centers is not so prominent any more, and not all data come from large repositories.
  • Mostly from GFF3s (both from sequencing centers and individual laboratories).
• Data repositories (i.e. Chado, UCSC-MySQL, DAS) accessed by Java data broker (Trellis): passes them as JSON to JBrowse for display.
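To make the GFF3-to-static-JSON step concrete, here is a minimal sketch. It is written in Python rather than Perl, it is not the actual pipeline, and the JSON layout is invented for illustration; it is not the format JBrowse consumes. The function names and the example feature are hypothetical. The nine tab-separated columns and the `key=value;` attribute syntax follow the GFF3 format.

```python
import json

GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase"]


def parse_gff3_line(line):
    """Turn one tab-separated GFF3 feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    record = dict(zip(GFF3_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value attributes.
    attributes = {}
    for pair in fields[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    record["attributes"] = attributes
    return record


def gff3_to_json(text):
    """Convert GFF3 text to a JSON array, skipping comments and blanks."""
    features = [parse_gff3_line(line) for line in text.splitlines()
                if line.strip() and not line.startswith("#")]
    return json.dumps(features, indent=2)


example = (
    "##gff-version 3\n"
    "scaffold1\tmaker\tgene\t1200\t4500\t.\t+\t.\tID=gene1;Name=example\n"
)
static_json = gff3_to_json(example)
```

Generating the JSON once, ahead of time, means the browser can fetch plain static files instead of querying a database on every request.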
CURRENT COLLABORATIONS crowdsourcing development too
• New avenues for landing on Apollo and customization of additional applications.
• Web services for alignment and functional annotation tools.
• RNAseq datasets being used to re-annotate the bovine genome, finding genes that neither RefSeq nor Ensembl predicted. Also creating a track of disagreement between sets.
• Bovine genome consortium making previous iterations of manual annotation efforts (from 3 assemblies ago) available for integration of curated models.
UNIVERSITY of MISSOURI
National Agricultural Library
CURRENT COLLABORATIONS training and contributions
Partnerships
UNIVERSITY of MISSOURI
National Agricultural Library
Nature Reviews Genetics 2009 (10), 346-347
Norwegian Spruce http://congenie.org/
Phlebotomus papatasi
Tallapoosa darter http://darter2.westga.edu/
Wasmannia auropunctata
Homo sapiens hg19
Pinus taeda http://dendrome.ucdavis.edu/treegenes/browsers/
FUTURE PLANS interactive analysis and curation of variants
• Interactive exploration of VCF files (e.g. from GATK, VAAST) in addition to BAM and GVF. Multiple tracks in one: visualization of genetic alterations and population frequency of variants.
• Clinical applications: analysis of Copy Number Variations for regulatory effects; overlaying display of the regulatory domains.
Phillips-Cremins and Corces. 2013. Mol. Cell 50(4): 461-474
TADs: topologically associating domains
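As a rough illustration of the "population frequency of variants" that such a VCF track would display, the sketch below computes the alternate-allele frequency for one VCF data line from its per-sample genotypes. The function name and the example line are hypothetical, and this is not a planned Web Apollo feature as implemented. It assumes the standard VCF layout: FORMAT in column 9, samples from column 10 on, and a `GT` field with `/` or `|` separators.

```python
def alt_allele_frequency(vcf_line):
    """Fraction of called alleles that are non-reference, or None if
    no genotype at this site was called."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) genotypes are counted the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            called += 1
            if allele != "0":
                alt_count += 1
    return alt_count / called if called else None


# Four samples: het, hom-alt (phased), missing, hom-ref.
line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1|1:9\t./.:0\t0/0:7"
frequency = alt_allele_frequency(line)
```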
FUTURE PLANS educational tools
We are working with educators to make Web Apollo part of their curricula.
Lecture Series.
In the classroom. At the lab.
Classroom exercises: from genome sequence to hypothesis.
Curation group dedicated to producing education materials for non-model organism communities.
Our team provides online documentation, hands-on training, and rapid response to users.
FEDERATED ENVIRONMENT other BBOP tools
ALL ARE WELCOME monthly, or permanently!
Open Call for Developers on the First Thursday of each month at 9:00AM (Pacific Time).
Message @monimunozto for details.
We Are Hiring: join us in the San Francisco Bay Area! http://tinyurl.com/jobs-at-bbop
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee, Alexie Papanicolaou (CSIRO), Monica Poelchau (USDA/NAL), fringy Richards (HGSC-BCM), BGI, Oliver Niehuis at 1KITE http://www.1kite.org/, and the Honey Bee Genome Sequencing Consortium.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com and O. Niehuis.
• For your attention, thank you!
Thank you.
Web Apollo
Nathan Dunn
Colin Diesh §
Deepak Unni §
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
Alumni
Gregg Helt
Ed Lee
Rob Buels*
Thanks!