GNPAnnot postergnp20100329 11



Project 2010

Aims: GNPAnnot is a green genomics project that aims to develop a community system for structural and functional annotation, supported by comparative genomics and dedicated to plant and bio-aggressor genomes. The system allows both automatic prediction and manual curation of genomic objects.

Gaëtan Droc (1), Valentin Guignon (1), Franc-Christophe Baurens (1), Vincent Jouffe (1), Claire Poiron (1), Juliette Lengellé (1), Olivier Garsmeur (1), Mathieu Rouard (2), Stéphanie Bocs (1)

Michael Alaux (3), Leatitia Brigitte (3), Delphine Steinbach Samson (3), Erik Kimmel (3), Cyril Pommier (3), Isabelle Luyten (3), Nancy Terrier (5), Philippe Leroy (4), Hadi Quesneville (3)

Joelle Amselem (6), Baptiste Brault (6), Adeline Simon (6), Victoria Dominguez Del Angel (6), Claire Hoede (6), Sabine Fillinger (6), Michel Meyer (6), Thierry Rouxel (6), Marc-Henri Lebrun (6)

(2) CfL, Bioversity, Montpellier
(4) UMR GDEC, INRA, Clermont
(5) UMR SPO, INRA, Montpellier
(6) UMR BIOGER, INRA, Versailles
(7) UMR BIO3P, INRA, Le Rheu
(8) UMR BIVI, INRA, Montpellier

References:
http://www.gnpannot.org/
http://www.gmod.org

Contact: [email protected]@[email protected]@versailles.inra.fr

Done with plant, insect and fungal genomic sequences:
- Predictions of protein-coding genes and transposable elements
- CAS core roundtrips: Chado, GBrowse, Apollo, Artemis
- Feature, qualifier, value, annotation rule definitions
- Annotator training courses & manual curation of biological features
- GMOD report development
- Chado controller development to manage access rights, annotation inspector & history
- In collaboration with GnpIntegr project, advanced search user interface / query builders: BioMart, Hibernate search (Lucene)
- Communications (posters, talks, Web site)
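The roundtrips between Chado and the GFF3-based tools listed above rely on coordinates surviving the trip unchanged. The following is a minimal sketch of that conversion, assuming GFF3's 1-based inclusive start/end and Chado's interbase (fmin, fmax) convention for feature locations. The function names are illustrative and are not part of GNPAnnot or GMOD.

```python
def gff3_to_chado(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based inclusive GFF3 interval to interbase (fmin, fmax)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF3 interval: {start}..{end}")
    # Interbase coordinates count the gaps between bases, so only the
    # start shifts by one; the end value stays the same.
    return start - 1, end


def chado_to_gff3(fmin: int, fmax: int) -> tuple[int, int]:
    """Convert an interbase (fmin, fmax) interval back to GFF3 start/end."""
    # Zero-length interbase features (fmin == fmax) are out of scope here.
    if fmin < 0 or fmax <= fmin:
        raise ValueError(f"invalid interbase interval: {fmin}..{fmax}")
    return fmin + 1, fmax


if __name__ == "__main__":
    fmin, fmax = gff3_to_chado(1000, 2500)
    print(fmin, fmax, fmax - fmin)      # interbase length is fmax - fmin
    print(chado_to_gff3(fmin, fmax))    # back to the original interval
```

A roundtrip check of this kind (convert, convert back, compare) is one simple way to test that an export/import cycle has not shifted any feature by one base.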

(1) UMR DAP, CIRAD, Montpellier

Fabrice Legeai (7), Goulven Kerbellec (9), Olivier Collin (10), Jean-Pierre Gauthier (7), Emmanuelle d’Alençon (8), François Cousserans (8), Philippe Fournier (8), Denis Tagu (7)

(3) URGI, INRA, Evry
(9) Korilog SARL, Muzillac
(10) IRISA, INRIA, Rennes

GNPAnnot CAS resource statistics

Rows with fewer than six counts give them in source order, without column assignment. Two cells of the table are marked "In progress".

Place | Subject | Unit | Organism | Genomic size (Mb) | Gene nb (predicted, curated, current) | TE nb (predicted, curated, current)
Montpellier | South & Tropical plants | DAP | Banana | 7.13 | 1378, 441, 1298 | 3836, 1279, 2095
Montpellier | South & Tropical plants | CfL | Palm tree | 0.27 | 43, 30, 41 | 5, 5, 9
Montpellier | South & Tropical plants | | Sugarcane | 1.30 | 133, 113
Versailles | Wheat & grapevine | URGI | Grapevine | 480.00 | 26346
Versailles | Wheat & grapevine | SPO | | |
Versailles | Wheat & grapevine | GDEC | Wheat 3B | 18.21 | 175, 175, 10782, 3222, 3222
Versailles | Fungi | BIOGER | Botrytis | 39.50 | 16360, 1096, 32
Versailles | Fungi | BIOGER | Leptosphaeria | 44.90 | 12469, 0, 1850, 472
Versailles | Fungi | BIOGER | Tuber | 124.90 | 7496, 1307, 2520, 0
Rennes | Insects | BIO3P | Aphid | 460.00 | 34821, 1926, 34547 | 498474, ~800, 498474
Rennes | Insects | BIVI | Lepidopteran | 4.00 | 1086, 70, 1086 | 2027, 0, 2027

Results: Architecture of GNPAnnot CAS in three bioinformatics platforms

Component core | Tools at Montpellier, Versailles and Rennes (in source order)
Gene structure automatic annotation | EuGène, EuGène, TriAnnot
Gene function & genome comparison | in-house pipeline, Funannot pipeline, MAUVE
TE automatic annotation | REPET, REPET, REPET
DBMS | Postgres Chado, Postgres Chado, MySQL BioDBSeqFeat, Postgres Chado
Genome browser | GBrowse, GBrowse, GBrowse
Genome editor | Artemis, Apollo, Apollo
Synteny viewer | Apollo, Cmap
Search & query builder | Biomart, Hibernate search, Apache Lucene
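As a rough reading aid for the resource statistics, the share of predicted genes that have been manually curated can be computed for the rows that report all six counts. This sketch assumes the first three counts of such a row are gene numbers in the order predicted, curated, current, as the column headers suggest. The helper names are illustrative, and the ratio is only an indicator, since curated genes need not be a strict subset of the predicted ones.

```python
# (predicted, curated) gene counts copied from the rows with all six counts.
GENE_COUNTS = {
    "Banana": (1378, 441),
    "Palm tree": (43, 30),
    "Aphid": (34821, 1926),
    "Lepidopteran": (1086, 70),
}


def curated_share(predicted: int, curated: int) -> float:
    """Return curated genes as a percentage of predicted genes."""
    if predicted <= 0:
        raise ValueError("predicted gene count must be positive")
    return 100.0 * curated / predicted


def summarize(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each organism to its curated share, rounded to one decimal."""
    return {
        organism: round(curated_share(predicted, curated), 1)
        for organism, (predicted, curated) in counts.items()
    }


if __name__ == "__main__":
    for organism, share in summarize(GENE_COUNTS).items():
        print(f"{organism}: {share}% of predicted genes curated")
```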

Database storage: CHADO with controller

Gene databanks:
- UniProt (Swiss-Prot, TrEMBL)
- GenBank / EMBL / DDBJ
- EST databanks
- MSU (rice) / TAIR (Arabidopsis)

TE databanks:
- Repbase
- TREP
- Plant Repeat Database
- Internal

Ontologies:
- Sequence Ontology
- Gene Ontology
- Feature Property

Pipeline stages: Prediction pipeline, Annotation storage, Annotation Browser, Annotation Editor

Format: GFF3

Query builders:
- InterMine
- BioMart
- Hibernate search GnpIntegr

Comparative genomics viewers:
- CMAP
- GBrowse_syn
- Apollo synteny viewer
- Artemis Comparison Tool

Genome browsers:
- GBrowse with access rights
- GMOD report
- Annotation history

Genome editors:
- Apollo
- Artemis with inspector

Gene automatic annotation (EuGene)

Structure combiner (nucleotide):
- EugeneIMM
- FGENESH
- SpliceMachine
- Gth
- BLASTx

Refinement, structure (nucleotide region) and function (protein):
- Gth
- tBLASTn | prot4EST | frameDP
- Exonerate
- BLASTp / BBMH
- InterProScan

Repeat automatic annotation (REPET)

Structure combiner (nucleotide):
- BLASTER / BLASTn
- RepeatMasker
- CENSOR
- MATCHER
- TRF
- Mreps
- BLASTER / tBLASTx

Other repeat analyses:
- RepSeek
- LTR_STRUC
- LTR_Finder
- LTRharvest
- TE nest
- FINDMITE

Comparative Genomics:
- Ensembl Compara
- Greenphyl
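The poster lists BLASTp / BBMH among the refinement tools without describing the procedure. As an illustration only, and assuming BBMH denotes a bidirectional (reciprocal) best-hit criterion between two protein sets, the idea can be sketched as follows. The identifiers and scores are hypothetical, and the input is assumed to be (query, subject, bit score) triples already parsed from two BLASTp runs.

```python
Hit = tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: list[Hit]) -> dict[str, str]:
    """Keep, for each query, the subject with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        # On ties the first hit seen is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: list[Hit], b_vs_a: list[Hit]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical hits between proteome A and proteome B.
    a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 120.0)]
    b_vs_a = [("b1", "a1", 245.0), ("b2", "a1", 95.0), ("b2", "a2", 80.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only a1 and b1 pick each other, so a2 is left without a partner even though it has a hit; that strictness is the point of a bidirectional criterion.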

Other ISs (e.g. GnpIS):
- DDBJ / EMBL / GenBank
- EST (GnpSeq, ESTtik)
- Marker (SIReGal, TropGene)
- Metabolism (BioCyc, KEGG)

Interoperability formats and qualifiers:
- faa, GFF3, fna
- GFF3, EMBL, fna, faa, db_xref, EC_number
- GFF3
- nwk, clustalW

Team work: Wiki, Alfresco, JIRA, CVS/SVN, Drupal

Other storage:
- Ensembl
- CMAP
- GBrowse_syn
- InterMine
- BioMart
- Flat files
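GFF3 recurs throughout the interoperability layer, so a short parsing sketch may help readers unfamiliar with the format. It assumes the standard nine tab-separated columns and `key=value` attributes separated by semicolons, with multiple values separated by commas. The example record, its identifiers and the `parse_gff3_line` helper are hypothetical and do not come from GNPAnnot data.

```python
from urllib.parse import unquote


def parse_gff3_line(line: str) -> dict:
    """Parse one GFF3 feature line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    attributes: dict[str, list[str]] = {}
    for pair in attrs.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Values may be comma-separated lists; GFF3 uses URL-style escapes.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),   # 1-based, inclusive
        "end": int(end),       # 1-based, inclusive
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


if __name__ == "__main__":
    # Hypothetical record; Dbxref is the GFF3 attribute for cross-references.
    example = "chr1\tEuGene\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example;Dbxref=UniProtKB:P12345"
    record = parse_gff3_line(example)
    print(record["type"], record["start"], record["end"], record["attributes"]["Dbxref"])
```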

Concept: The Community Annotation System (CAS) is user-friendly, generic, modular, portable, sustainable, upgradable and compatible

Figure labels: History, GBrowse, Artemis, GMOD report, Apollo

Ongoing work:
- JBrowse
- Annotation extractors, reconcilers & updaters (new genomic sequence, new gene annotation, other gene annotation set, new assembly of a genomic sequence)
- Comparative genomics
- Bioinformatics platform exchanges
- Integration of annotation history in the GMOD report
- Interoperability with other systems
- Communication (CECILL licences, publications)