The Generation Challenge Programme (GCP) Platform for Crop Research


Page 1: The Generation Challenge Programme (GCP)  Platform for Crop Research


The Generation Challenge Programme (GCP)

Platform for Crop Research

Richard Bruskiewich

and the rest of …

Page 2: The Generation Challenge Programme (GCP)  Platform for Crop Research

…The GCP SP4 team and Contributors

IRRI-CIMMYT Crop Research Informatics Laboratory:

Graham McLaren

Thomas Metz

Martin Senger

Ramil Mauleon

Mylah Anacleto

Michael Jonathan Mendoza

Victor Jun Ulat

Arllet Portugal

Ryan Alamban

Lord Hendrix Barboza

Jeffrey Detras

Kevin Manansala

Jeffrey Morales

Barry Peralta

Rowena Valerio

Nelzo Ereful

CIP:

Reinhard Simon

Edwin Rojas

ICRISAT:

Jayashree Balaji

ICARDA:

Akinnola Akintunde

NCGR:

Andrew Farmer

Gary Schiltz

SCRI:

Jennifer Lee

David Marshall

Cornell University:

Terry Casstevens

Pankaj Jaiswal

Dave Matthews

ACGT:

Ayton Meintjes

Jane Morris

CIRAD:

Manuel Ruiz

Alexis Dereeper

Matthieu Conte

Brigitte Courtois

Bioversity:

Mathieu Rouard

Tom Hazekamp

Milko Skofic

Raj Sood

NIAS:

Masaru Takeya

Koji Doi

Kouji Satoh

Shoshi Kikuchi

EMBRAPA:

Marcos Costa

Natalia Martins

Georgios Pappas

Guy Davenport

Trushar Shah

Kyle Braak

Sebastian Ritter

Yi Zhang

Sergio Gregorio

Joseph Hermocilla

Michael Echavez

Roque Almodiel

Samart Wanchana

Supat Thongjuea

Theo van Hintum (WUR), GCP Subprogramme 4 Leader

University of British Columbia:

Mark Wilkinson

GSC Bioinformatics Graduate Program, BC Cancer Agency:

Benjamin Good

James Wagner

Page 3: The Generation Challenge Programme (GCP)  Platform for Crop Research

Overview

Generation Challenge Programme crop informatics research and development

GCP platform architecture: Domain model & ontology

Application development framework

Page 4: The Generation Challenge Programme (GCP)  Platform for Crop Research

Challenge Programme

“I challenge the next generation to use new scientific tools and techniques to address the problems that plague the world’s poor”

Dr. Norman Borlaug

http://www.generationcp.org

Page 5: The Generation Challenge Programme (GCP)  Platform for Crop Research

What is it?

An international research programme established in 2003, projected to last 10 years, and hosted by the CGIAR with global partners from ARI and NARES

Research Themes Directed to Crop Improvement:

Genomics and comparative biology across species

Characterization of genetic diversity for allele mining

Gene transfer technologies

Five research subprogrammes, one of which is crop information systems development.

Page 6: The Generation Challenge Programme (GCP)  Platform for Crop Research

Challenge Programme

Cornell University USA

Wageningen University Netherlands

John Innes Centre UK

NIAS Japan

Agropolis France

CIP Peru

CIAT Colombia

CIMMYT Mexico

Bioversity Italy

WARDA Côte d’Ivoire

IRRI Philippines

ICRISAT India

ICARDA Syrian Arab Rep.

IITA Nigeria

EMBRAPA Brazil

BioTec Thailand

ACGT South Africa

ICAR India

CAAS China

Page 7: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Research: from Genotype to Phenotype

[Figure: flow diagram. Only its labels survive in this transcript.]

Column headings: Genetic Resources; Process; Product

Box labels: Genebank; Germplasm; Genotyping & Phenotyping; Mutants; NILs, RILs, Mapping pop.; Genomic annotation, Forward and Reverse Genetics, Gene arrays/gels; Candidate genes; Beneficial alleles Linked to Traits; Marker-aided Selection / Transformation; Advanced breeding lines as vehicles; Value-added varieties

Subprogrammes: SP1: Allelic Mining; SP2: Functional Assignment; SP3: Trait Synthesis

Page 8: The Generation Challenge Programme (GCP)  Platform for Crop Research

Integration across Diverse Crop Data

[Diagram of five linked data domains. The relationship labels are “has”, “determines” and “affects”.]

Germplasm: Inventory; Identification (passport); Genealogy

Genotype: Genetic Maps; Physical Maps; DNA Sequence; Functional Annotation; Molecular Variation (Natural or Induced)

Phenotype: Anatomical; Developmental; Field Performance; Stress Response

Environment: Location (GIS); Climate; Day Length; Ecosystem; Agronomy; Stresses

Molecular Expression: Transcriptome; Proteome; Metabolome; Physiology

Page 9: The Generation Challenge Programme (GCP)  Platform for Crop Research

Crop Information Systems: the Next

Large, globally distributed consortium

Diverse research requiring a diversity of tools

Large data sets with diverse data types

Many legacy informatics systems and tools

Global data integration required…

Key Issue: Interoperability

Page 10: The Generation Challenge Programme (GCP)  Platform for Crop Research

Some Basic GCP Research Objectives

Compile a list of germplasm meeting specific passport data criteria

Compile a list of genetic markers of interest from genetic and QTL maps

Retrieve genotypes of specified markers, for specified germplasm

Align gene expression data against QTL positional evidence to identify candidate gene loci for specified traits

Page 11: The Generation Challenge Programme (GCP)  Platform for Crop Research

A Generalized GCP Crop Research Integration Work Flow

[Workflow diagram. Its labels are listed below in extraction order.]

Components:

Comparative Map & Trait Viewer (NCGR/ISYS)

Genetic Map Data Source(s)

Generation Challenge Programme Domain Model & Middleware

Germplasm Passport/Phenotype/Genotype Query builder

Comparative (Functional) Genomics Tools

DIVA-GIS

Germplasm Data Source(s)

Genomics Data Source(s)

GIS Data Source(s)

Workflow steps:

Get/analyse a genetic map

Find germplasm genotyped with mapped markers

Get genotype & phenotype of germplasm

Get candidate genes in map interval

Get functional information about genes

Plot germplasm, genotype and phenotype on geographical maps

Analyse source environment of germplasm

Select “interesting” candidate genes; get alleles

Select adapted germplasm with favorable phenotype & alleles for further evaluation

Page 12: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Information Platform: User Perspective

An environment that provides improved access to data and analysis tools

[Diagram labels: applications; integrated databases and tools]

Page 13: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Information Platform – Developers’ Perspective

application layer

middleware

internet

Tapir, MOBY, etc.

Data Registry

local database layer

Page 14: The Generation Challenge Programme (GCP)  Platform for Crop Research

Generation CP Platform

http://pantheon.generationcp.org

Page 15: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Platform - General Architecture

“Model Driven Architecture” based on “platform independent” GCP scientific domain models, parameterized with controlled vocabulary (“ontology”)

GCP domain models mapped onto platform specific implementations.

Reference (Java) GCP platform application programming interface (API)

Page 16: The Generation Challenge Programme (GCP)  Platform for Crop Research

Semantics of the GCP Model Driven Architecture

GCP is trying to model the meaning (“semantics”) of the crop research world.

Semantics is found in the domain model at three distinct but interconnected levels:

System architectural level: general scientific semantics in terms of high-level object concepts (“object types”) and their global inter-relationships.

Entity level: attributes and behaviors internal to high-level object types.

Attribute level: attribute values of objects that range over data types: simple (e.g. identifiers, numbers), complex (other classes of entities) or ontology (such as Gene Ontology (GO) terms, for a gene product).
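The three levels can be made concrete with a small Java sketch. Every class and field name below is hypothetical and chosen only for illustration; none of them is taken from the GCP domain model itself. The sketch shows a high-level object type (`Gene`) whose attributes are simple (an identifier, a number), complex (another entity, `GeneProduct`), or ontology-valued (GO terms on the gene product).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ontology term: an accession from an external vocabulary plus its label. */
final class OntologyTerm {
    final String accession;
    final String label;

    OntologyTerm(String accession, String label) {
        this.accession = accession;
        this.label = label;
    }
}

/** Hypothetical entity used as a complex attribute value of Gene. */
final class GeneProduct {
    final String id;                                       // simple attribute: identifier
    final List<OntologyTerm> goTerms = new ArrayList<>();  // ontology attribute: GO terms

    GeneProduct(String id) {
        this.id = id;
    }
}

/** Hypothetical high-level object type; its fields are entity-level attributes. */
final class Gene {
    final String id;            // simple attribute: identifier
    final int start;            // simple attribute: number
    final GeneProduct product;  // complex attribute: another class of entity

    Gene(String id, int start, GeneProduct product) {
        this.id = id;
        this.start = start;
        this.product = product;
    }

    /** Entity-level behaviour: summarise the gene via its product's ontology terms. */
    String describe() {
        StringBuilder sb = new StringBuilder(id).append(" -> ").append(product.id);
        for (OntologyTerm term : product.goTerms) {
            sb.append(" [").append(term.accession).append(' ').append(term.label).append(']');
        }
        return sb.toString();
    }
}

final class AttributeLevelsDemo {
    /** Builds a toy gene; the identifiers are invented, the GO term is a real one. */
    static Gene exampleGene() {
        GeneProduct product = new GeneProduct("protein-1");
        product.goTerms.add(new OntologyTerm("GO:0003677", "DNA binding"));
        return new Gene("gene-1", 1200, product);
    }

    public static void main(String[] args) {
        System.out.println(exampleGene().describe());
    }
}
```

The system architectural level corresponds to the relationship between `Gene` and `GeneProduct`, the entity level to the fields and the `describe()` method inside each class, and the attribute level to the types those fields range over.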

Page 17: The Generation Challenge Programme (GCP)  Platform for Crop Research

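Layers of Semantics

[Diagram. Only its labels survive in this transcript.]

Layer 1: Object Model of the Scientific Domain…

Layers 2 and 3: …Parameterized with Ontology

Node labels: Germplasm; Phenotype; Observable; Attribute; Value; Plant Ontology

Relationship labels: has a; has an; with a; ranges over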

Page 18: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Domain Model Specification

High-level object types are specified with Unified Modeling Language (UML) and associated text narratives.

Major object classes are represented in the object model. More specialized object types are specified by subclassing major object types using ontology.

The reference model is coded in the Eclipse Modeling Framework (EMF), managed under source code versioning, and automatically compiled into other representations.

http://pantheon.generationcp.org/demeter

Page 19: The Generation Challenge Programme (GCP)  Platform for Crop Research

Scope of GCP Domain Model & Ontology

Core models: generic concepts – identification, entities, features, organization, data management

Models heavily parameterized by ontology (e.g. entity and feature “type” attributes)

Scientific models: extend the core model into specific scientific scopes relevant to GCP:

Germplasm data (including genetic resources passport)

Genomics including genotypes, maps, sequences and functional annotation.

Phenotype data

Environmental data (including geographical location)

Page 20: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Ontology

Every attribute in the GCP domain model with data type SimpleOntologyTerm, or a subclass thereof, is an integration point for an external ontology.

External public ontology (e.g. GO, PO, SO) reused when available, and new ontology developed within GCP to fill gaps.

Ontology consolidated into GCP database based on GMOD Chado CV tables, indexed within platform using a GCP formatted identifier (that retains the source’s identifier).

Page 21: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Domain Model Mappings onto Platform Specific Implementations

[Diagram: the GCP Domain Model (UML/EMF) and the GCP Ontology Database, mapped onto the implementations listed below.]

GCP Platform Java Middleware & Applications

OWL/RDF Ontology: VPIN/SSWAP.info

SOAP Web Services (BioMOBY, SoapLab, GDPC)

XML Schemata: GCP Data Templates, BioCASE/Tapir

http://pantheon.generationcp.org/demeter

Page 22: The Generation Challenge Programme (GCP)  Platform for Crop Research

Reference GCP Platform API

PantheonBase: a relatively simple core Java Application Programming Interface (API) for software integration:

DataSource: query data resources, using simple, ontology-driven SearchFilter specifications

DataTransformer: computational input/output

DataConsumer: communicate data to viewers
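How the three roles might fit together can be sketched in a few lines of Java. This is not the PantheonBase API: the slides give only the names `DataSource`, `SearchFilter`, `DataTransformer` and `DataConsumer`, so every method signature, the `InMemorySource` class, and the attribute names `name` and `species` are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Assumed shape of a filter: attribute name (ideally an ontology term) to required value. */
final class SearchFilter {
    final Map<String, String> criteria = new HashMap<>();

    SearchFilter with(String attribute, String value) {
        criteria.put(attribute, value);
        return this;
    }
}

/** Assumed shape of a data source: answers find() queries described by a filter. */
interface DataSource<T> {
    List<T> find(SearchFilter filter);
}

/** Assumed shape of a transformer: computational input to output. */
interface DataTransformer<I, O> {
    O transform(I input);
}

/** Assumed shape of a consumer: hands data to a viewer. */
interface DataConsumer<T> {
    void consume(T data);
}

/** Toy in-memory source; each record is an attribute-to-value map. */
final class InMemorySource implements DataSource<Map<String, String>> {
    private final List<Map<String, String>> records = new ArrayList<>();

    void add(String name, String species) {
        Map<String, String> record = new HashMap<>();
        record.put("name", name);
        record.put("species", species);
        records.add(record);
    }

    @Override
    public List<Map<String, String>> find(SearchFilter filter) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> record : records) {
            boolean matches = true;
            for (Map.Entry<String, String> c : filter.criteria.entrySet()) {
                if (!c.getValue().equals(record.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                hits.add(record);
            }
        }
        return hits;
    }
}

final class PipelineDemo {
    /** Runs source -> transformer -> consumer and returns what the "viewer" received. */
    static List<String> run() {
        InMemorySource source = new InMemorySource();
        source.add("rice-1", "Oryza sativa");
        source.add("rice-2", "Oryza sativa");
        source.add("maize-1", "Zea mays");

        // Transformer: reduce each matching record to its name.
        DataTransformer<List<Map<String, String>>, List<String>> toNames = hits -> {
            List<String> names = new ArrayList<>();
            for (Map<String, String> hit : hits) {
                names.add(hit.get("name"));
            }
            return names;
        };

        // Consumer: stands in for a viewer by collecting what it is given.
        List<String> shown = new ArrayList<>();
        DataConsumer<List<String>> viewer = shown::addAll;

        SearchFilter filter = new SearchFilter().with("species", "Oryza sativa");
        viewer.consume(toNames.transform(source.find(filter)));
        return shown;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Keeping the three roles as separate small interfaces means a viewer never needs to know which database or web service produced its data, which is the interoperability point the deck is making.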

http://pantheon.generationcp.org

Page 23: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP DataSource Interface

Page 24: The Generation Challenge Programme (GCP)  Platform for Crop Research

DataSource Interface

Page 25: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Data Source Implementations

Direct Integration of relational databases (Spring HttpInvoker, Hibernate, JPA): Developed for ICIS, GMOD Chado (beta)

Protocols:

Generalized Java Client to connect to BioMoby web services; Java support for GCP-compliant BioMoby web service provider development (beta)

Support for BioCASE/Tapir data source integration (prototyped)

GCP-compliant GDPC data source (prototyped)

SSWAP/VPIN wrapper (under discussion)

Some other direct custom data source wrappers

Page 26: The Generation Challenge Programme (GCP)  Platform for Crop Research

Some GCP BioMOBY docs…

http://cropwiki.irri.org/gcp/index.php/MOBY_Rice_Network

http://pantheon.generationcp.org/moby

http://moby.generationcp.org

Page 27: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP BioMoby Support – a Synopsis

1. MoSES + Dashboard developed (M. Senger).

2. GCP model specific BioMoby datatypes specified.

3. Java libraries partly developed for interconversion of GCP BioMoby data types to/from GCP domain model Java objects (Barboza).

4. GCP DataSource Java implementation developed for client side of BioMoby that maps GCP DataSource find() use cases onto BioMoby web services using XML configuration files (no coding).

5. Java design pattern for modular implementation of BioMoby web services that get their data from any GCP-compliant DataSource that supports a given find() use case.
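The idea in item 4, mapping find() use cases onto web services through configuration rather than code, can be sketched as follows. The actual GCP configuration schema is not shown in the slides, so this sketch substitutes the JDK's own `java.util.Properties` XML format as a stand-in. The class `UseCaseRouter` and all use-case and service names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical router: looks up which service handles a given find() use case. */
final class UseCaseRouter {
    private final Properties routes = new Properties();

    /** Loads the use-case-to-service mapping from XML in the java.util.Properties format. */
    UseCaseRouter(String xml) throws IOException {
        routes.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the configured service name, or null when the use case is not mapped. */
    String serviceFor(String useCase) {
        return routes.getProperty(useCase);
    }
}

final class RouterDemo {
    // Invented configuration: neither the keys nor the values are real GCP or BioMoby names.
    static final String CONFIG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
      + "<properties>\n"
      + "  <entry key=\"findGermplasmByName\">exampleGermplasmService</entry>\n"
      + "  <entry key=\"findMarkersByMap\">exampleMarkerService</entry>\n"
      + "</properties>\n";

    public static void main(String[] args) throws IOException {
        UseCaseRouter router = new UseCaseRouter(CONFIG);
        System.out.println(router.serviceFor("findGermplasmByName"));
        System.out.println(router.serviceFor("findQtlByTrait"));
    }
}
```

Supporting a new use case then means adding one `entry` element to the configuration; the client code does not change, which is what "no coding" suggests.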

Page 28: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP BioMoby “Sandwich”

Page 29: The Generation Challenge Programme (GCP)  Platform for Crop Research

(Partial) Inventory of 3rd Party Data Resources targeted for wrapping as GCP Data Sources

Data Type: Description

Microarray Data: MAXD database with microarray datasets from diverse GCP commissioned or competitive projects.

Genetic and QTL Mapping Data: QTL data available in ICIS, TropGenes. Genomic Diversity and Phenotype Connector (GDPC) connecting to Gramene, Panzea, GrainGenes et al.

Genomic Sequence Data and Annotation: NIAS KOME full length cDNA and RAP genome databases (?), connected to GCP web services by NIAS. OryzaSNP and GCP comparative genomic databases. Public sequence databases (via BioJava?)

Functional Genomics: OryGenesDb mutant data (CIRAD); IR64 rice mutant database (IRRI); Tos17 database (NIAS).

Germplasm Sample Characterization Data: Germplasm, passport, genotype and associated field data available in ICIS databases; TropGenes, MGIS, ICRIS.

Page 30: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Platform Implementations

Standalone workbench (“GenoMedium”): Eclipse Rich Client Platform (RCP)

Web-based workbench (“Koios”): AJAX, PHP, Java (server side), Java Web Start

NCGR Integrated SYStem (ISYS)

Direct tool integration (e.g. GCP MaxdLoad)

Page 31: The Generation Challenge Programme (GCP)  Platform for Crop Research

http://moby.generationcp.org

Page 32: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP Web-Based Search Engine

http://koios.generationcp.org

GCP semantics defined query

Summary of query hits

List of items matched

View details at 3rd party web site or in locally invoked 3rd party data viewer

Page 33: The Generation Challenge Programme (GCP)  Platform for Crop Research

(Partial) Inventory of 3rd Party Analysis/Viewer Software being targeted for GCP Integration

Tool: Purpose

SoapLab2: Remote computational services access

Taverna: Bioinformatics work flow management

Apollo: Genome sequence browser

Cytoscape: Visualization of networks

ATV: Phylogenetic tree visualization

JalView: Comparative sequence alignments

TMEV: Microarray data analysis

EASE, Mapman: Gene functional annotation

CMTV: Comparative mapping and QTL

MAXDLoad & MAXDView: Microarray data management

GDPC tools (Browser, Tassel): Genomic diversity analysis

Page 34: The Generation Challenge Programme (GCP)  Platform for Crop Research

GCP “Pantheon” Project in CropForge

http://cropforge.org/projects/pantheon/

Page 35: The Generation Challenge Programme (GCP)  Platform for Crop Research

Closing Perspective

The GCP is a global consortium of 22++ crop research partners who need to share diverse large data sets and tools, in a globally distributed manner.

Given the scope and duration of the GCP, developers within the consortium embraced the task of developing public global informatics standards for interoperability and integration.

The effort is an open source, global community building exercise.

We welcome the participation of any and all interested scientists and developers who might wish to use and/or contribute to the further evolution and application of these standards.