GNPAnnot postergnp20100329 11

Post on 15-Apr-2022



Project 2010

Aims: GNPAnnot is a green genomics project that aims to develop a community system for structural and functional annotation, supported by comparative genomics and dedicated to plant and bio-aggressor genomes. The system allows both automatic prediction and manual curation of genomic objects.

Gaëtan Droc1, Valentin Guignon1, Franc-Christophe Baurens1, Vincent Jouffe1, Claire Poiron1, Juliette Lengellé1, Olivier Garsmeur1, Mathieu Rouard2, Stéphanie Bocs1

2 CfL, Bioversity, Montpellier

Michael Alaux3, Leatitia Brigitte3, Delphine Steinbach Samson3, Erik Kimmel3, Cyril Pommier3, Isabelle Luyten3, Nancy Terrier5, Philippe Leroy4, Hadi Quesneville3

4 UMR GDEC, INRA, Clermont 5 UMR SPO, INRA, Montpellier

Joelle Amselem6, Baptiste Brault6, Adeline Simon6, Victoria Dominguez Del Angel6, Claire Hoede6, Sabine Fillinger6, Michel Meyer6, Thierry Rouxel6, Marc-Henri Lebrun6

6 UMR BIOGER, INRA, Versailles 7 UMR BIO3P, INRA, Le Rheu

8 UMR BIVI, INRA, Montpellier

References:
http://www.gnpannot.org/
http://www.gmod.org

Contact:
stephanie.sidibe-bocs@cirad.fr
michael.alaux@versailles.inra.fr
fabrice.legeai@rennes.inra.fr
joelle.amselem@versailles.inra.fr

Done with plant, insect, fungal genomic sequences:
- Predictions of protein-coding genes and transposable elements
- CAS core roundtrips: Chado, GBrowse, Apollo, Artemis
- Feature, qualifier, value, annotation rule definitions
- Annotator training courses & manual curation of biological features
- GMOD report development
- Chado controller development to manage access rights, annotation inspector & history
- In collaboration with GnpIntegr project, advanced search user interface / query builders: Biomart, Hibernate search (Lucene)
- Communications (posters, talks, Web site)

1 UMR DAP, CIRAD, Montpellier

Fabrice Legeai7, Goulven Kerbellec9, Olivier Collin10, Jean-Pierre Gauthier7, Emmanuelle d’Alençon8, François Cousserans8, Philippe Fournier8, Denis Tagu7

9 Korilog SARL, Muzillac

10 IRISA, INRIA, Rennes

3 URGI, INRA, Evry

Results: Architecture of GNPAnnot CAS in three bioinformatics platforms

Component core (platforms: Montpellier, Versailles, Rennes)
- Gene structure automatic annotation: EuGène, EuGène, TriAnnot
- Gene function & genome comparison: in-house pipeline, Funannot pipeline, MAUVE
- TE automatic annotation: REPET, REPET, REPET
- DBMS: Postgres Chado, Postgres Chado, MySQL BioDBSeqFeat, Postgres Chado
- Genome browser: GBrowse, GBrowse, GBrowse
- Genome editor: Artemis, Apollo, Apollo
- Synteny viewer: Apollo, Cmap
- Search & query builder: Biomart, Hibernate search, Apache Lucene

GNPAnnot CAS resource statistics

Columns: Unit | Organism | Genomic size (Mb) | Gene nb: predicted, curated, current | TE nb: predicted, curated, current

Montpellier: South & Tropical plants
DAP | Banana | 7.13 | 1378 | 441 | 1298 | 3836 | 1279 | 2095
CfL | Palm tree | 0.27 | 43 | 30 | 41 | 5 | 5 | 9
Sugarcane | 1.30 | 133 | 113

Versailles: Wheat & grapevine
URGI / SPO | Grapevine | 480.00 | 26346
GDEC | Wheat 3B | 18.21 | 175 | 175 | 10782 | 3222 | 3222

Versailles: Fungi (BIOGER)
Botrytis | 39.50 | 16360 | 1096 | 32
Leptosphaeria | 44.90 | 12469 | 0 | 1850 | 472
Tuber | 124.90 | 7496 | 1307 | 2520 | 0

Rennes: Insects
BIO3P | Aphid | 460.00 | 34821 | 1926 | 34547 | 498474 | ~800 | 498474
BIVI | Lepidopteran | 4.00 | 1086 | 70 | 1086 | 2027 | 0 | 2027

Two cells of the table read "In progress".
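As a worked reading of the resource statistics, the share of predicted objects that have been manually curated can be computed from the rows that report all six counts. The sketch below is only an illustration: the "curated share" metric (curated divided by predicted) is an assumption made for this example, not a figure reported on the poster, and the function and variable names are hypothetical. The counts themselves are copied from the Banana, Palm tree and Lepidopteran rows.

```python
# Counts copied from three complete rows of the statistics table:
# (genes predicted, genes curated, TEs predicted, TEs curated).
ROWS = {
    "Banana": (1378, 441, 3836, 1279),
    "Palm tree": (43, 30, 5, 5),
    "Lepidopteran": (1086, 70, 2027, 0),
}


def curated_share(predicted: int, curated: int) -> float:
    """Return curated / predicted as a percentage (0.0 if nothing was predicted)."""
    if predicted == 0:
        return 0.0
    return 100.0 * curated / predicted


def summarize(rows):
    """Map each organism to (gene curated %, TE curated %), rounded to one decimal."""
    summary = {}
    for organism, (gene_pred, gene_cur, te_pred, te_cur) in rows.items():
        summary[organism] = (
            round(curated_share(gene_pred, gene_cur), 1),
            round(curated_share(te_pred, te_cur), 1),
        )
    return summary


if __name__ == "__main__":
    for organism, (gene_pct, te_pct) in summarize(ROWS).items():
        print(f"{organism}: genes {gene_pct}% curated, TEs {te_pct}% curated")
```

Rows with missing cells or approximate values (for example the "~800" curated TEs for Aphid) are left out on purpose, since a ratio computed from them would not be reliable.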

Database storage: CHADO with controller

Gene databanks:
- UniProt (Swiss-Prot, TrEMBL)
- GenBank / EMBL / DDBJ
- EST databanks
- MSU (rice) / TAIR (Arabidopsis)

TE databanks:
- Repbase
- TREP
- Plant Repeat Database
- Internal

Ontologies:
- Sequence Ontology
- Gene Ontology
- Feature Property

Prediction pipeline

Annotation storage

Annotation Browser

Annotation Editor

GFF3

Query builders:
- InterMine
- BioMart
- Hibernate search (GnpIntegr)

Comparative genomics viewers:
- CMap
- GBrowse_syn
- Apollo synteny viewer
- Artemis Comparison Tool

Genome browsers:
- GBrowse with access rights
- GMOD report
- Annotation history

Genome editors:
- Apollo
- Artemis with inspector

Gene automatic annotation (EuGene)

Structure combiner (nucleotide):
- EugeneIMM
- FGENESH
- SpliceMachine
- Gth
- BLASTx

Refinement, structure (nucleotide region) and function (protein):
- Gth
- tBLASTn | prot4EST | frameDP
- Exonerate
- BLASTp / BBMH
- InterProScan

Repeat automatic annotation (REPET)

Structure combiner (nucleotide):
- BLASTER / BLASTn
- RepeatMasker
- CENSOR
- MATCHER
- TRF
- Mreps
- BLASTER / tBLASTx

Other repeat analyses:
- RepSeek
- LTR_STRUC
- LTR_Finder
- LTRharvest
- TE nest
- FINDMITE

Comparative genomics:
- Ensembl Compara
- Greenphyl

Other ISs (e.g. GnpIS):
- DDBJ / EMBL / GenBank
- EST (GnpSeq, ESTtik)
- Marker (SIReGal, TropGene)
- Metabolism (BioCyc, KEGG)

Interoperability

faa GFF3 fna

GFF3, EMBL, fna, faa, db_xref, EC_number

Team work (Wiki, Alfresco, JIRA, CVS/SVN, Drupal)

GFF3
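GFF3 is the exchange format that links the prediction pipelines, storage, browsers and editors. As a minimal sketch of what a roundtrip through that format involves, the code below parses one GFF3 line into its nine tab-separated columns and writes it back unchanged. It follows the general GFF3 column layout only; it is not the parser used by Chado, GBrowse, Apollo or Artemis. The class and function names are hypothetical, and the example sequence name, gene ID and cross-reference accessions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import quote, unquote


@dataclass
class Gff3Feature:
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, List[str]]


def parse_gff3_line(line: str) -> Gff3Feature:
    """Parse one GFF3 feature line (nine tab-separated columns)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    attributes: Dict[str, List[str]] = {}
    # Column 9 holds key=value pairs separated by ';'; values may be
    # comma-separated lists and use URL-style percent escapes.
    for pair in columns[8].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return Gff3Feature(
        columns[0], columns[1], columns[2],
        int(columns[3]), int(columns[4]),
        columns[5], columns[6], columns[7],
        attributes,
    )


def format_gff3_line(feature: Gff3Feature) -> str:
    """Write a feature back as a GFF3 line, re-escaping attribute values."""
    attribute_text = ";".join(
        f"{quote(key, safe='')}=" + ",".join(quote(v, safe=" :/") for v in values)
        for key, values in feature.attributes.items()
    )
    return "\t".join([
        feature.seqid, feature.source, feature.feature_type,
        str(feature.start), str(feature.end),
        feature.score, feature.strand, feature.phase,
        attribute_text,
    ])


# Hypothetical example line: all identifiers are invented for illustration.
EXAMPLE_LINE = (
    "chr1\tEuGene\tgene\t1300\t9000\t.\t+\t.\t"
    "ID=gene0001;Name=hypothetical protein;"
    "Dbxref=UniProtKB:P12345,InterPro:IPR000001"
)

if __name__ == "__main__":
    feature = parse_gff3_line(EXAMPLE_LINE)
    print(feature.feature_type, feature.start, feature.end, feature.attributes["Dbxref"])
    print(format_gff3_line(feature) == EXAMPLE_LINE)
```

The sketch ignores comment and directive lines and does not escape the first eight columns, which a production parser would need to handle.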

nwk clustalW

Other storage:
- Ensembl
- CMap
- GBrowse_syn
- InterMine
- BioMart
- Flat files

Concept: The Community Annotation System (CAS) is user-friendly, generic, modular, portable, sustainable, upgradable and compatible

History GBrowse

Artemis GMOD report

Ongoing work:
- JBrowse
- Annotation extractors, reconcilers & updaters (new genomic sequence, new gene annotation, other gene annotation set, new assembly of a genomic sequence)
- Comparative genomics
- Bioinformatics platform exchanges
- Integration of annotation history in the GMOD report
- Interoperability with other systems
- Communication (CECILL licences, publications)

Apollo