ODD-Genes: Accelerating data-driven scientific discovery NeSC Review 2003 NeSC 2003-09-30.


ODD-Genes: Accelerating data-driven scientific discovery

NeSC Review 2003

NeSC, 2003-09-30

Introduction

ODD-Genes Background

Science enabled by ODD-Genes
  Automating routine statistical conditioning of highly variable microarray results
  Discovering related data sources
  Querying discovered data sources for relevant data
  Identifying significant targets for focussed investigation

Caveats & further work

ODD-Genes Background

ODD-Genes is a demonstrator
  Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
  SunDCG’s TOG software allows for job submission on remote compute resources
  OGSA-DAI provides access, control and discovery of data resources

ODD-Genes used to investigate Wilms Tumour
  Routine statistical conditioning of microarray results
  Data-driven discovery of novel targets for investigation and potential therapy

Collaborative project
  NeSC/EPCC, Edinburgh, UK
  Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
  Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

SunDCG – Enabling Routine Statistical Conditioning

Choose analysis to perform

Automates analysis process
Provides predetermined workflow
Can run more than one analysis at a time
Multiple reproducible avenues for investigation
Reduces cost (human, machine), increases availability

TOG enables this by allowing access to HPC resources
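As a rough illustration of running more than one analysis at a time, the sketch below submits several conditioning steps as independent jobs and collects the results by name. It is not the TOG interface: a local thread pool stands in for remote job submission, and the two analyses (`log2_transform`, `median_centre`) are hypothetical examples, since the slides do not say which conditioning steps ODD-Genes runs.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
import math

# Hypothetical conditioning steps; the slides do not name the real analyses.
def log2_transform(values):
    return [math.log2(v) for v in values]

def median_centre(values):
    m = median(values)
    return [v - m for v in values]

# A predetermined set of analyses the researcher can choose from.
ANALYSES = {"log2": log2_transform, "median_centre": median_centre}

def run_analyses(values, names):
    """Submit each named analysis as an independent job; return results by name."""
    # Assumption: a local thread pool stands in for remote HPC job submission.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(ANALYSES[name], values) for name in names}
        return {name: future.result() for name, future in futures.items()}

results = run_analyses([1.0, 2.0, 4.0, 8.0], ["log2", "median_centre"])
```

Because every analysis receives the same input and is selected by name, the same run can be repeated later to reproduce the result.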

SunDCG - Conditioning Results

Results of conditioning can be analysed and investigated

Researcher has potentially several views of data to explore, all presented simultaneously in parallel (cf. the traditional serialised, manual process)
Researcher can reproduce this initial condition for repeated analyses
Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so

OGSA-DAI - Results Investigation

Multiple views of data

Raw
Heat Map
Cluster Map

Wilms Tumour study takes a new direction

Two genes appear significant in early development

Researchers would like more info on these genes…

OGSA-DAI - Data Resource Discovery

OGSA-DAI uses keywords to locate relevant data resources
May return data resources previously unknown to researcher
Researcher selects most interesting data resource to query for information about gene

Researcher selects Mouse atlas: narrow, deep database of spatial gene expression in mouse embryonic development
  Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions
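The keyword-based discovery step can be pictured with a minimal sketch like the one below. It is not the OGSA-DAI API: the `DataResource` class, the `discover` function and the keyword sets attached to each resource are all assumptions made for illustration. Only the two resource names come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class DataResource:
    name: str
    keywords: set = field(default_factory=set)

def discover(registry, query_keywords):
    """Return names of resources sharing a keyword with the query, best match first."""
    query = {k.lower() for k in query_keywords}
    scored = [(len(query & resource.keywords), resource) for resource in registry]
    matches = [(score, resource) for score, resource in scored if score > 0]
    # Highest keyword overlap first; ties broken alphabetically by name.
    matches.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [resource.name for _, resource in matches]

# Illustrative registry: the keyword sets are invented for this example.
REGISTRY = [
    DataResource("Mouse atlas", {"mouse", "embryo", "spatial", "gene expression"}),
    DataResource("GTI database", {"microarray", "genome-wide", "gene expression"}),
]

hits = discover(REGISTRY, ["mouse", "gene expression"])
```

A resource appears in the results only if its host has registered it with suitable keywords, so discovery is only as good as the registry behind it.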

OGSA-DAI - Data Resource Query

OGSA-DAI returns data from query

Data and annotation displayed

Data contains references to related images

Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression

These show that the genes are stem cell markers

Targets for focussed investigation, potential therapy

ODD-Genes Caveats & Further Work

ODD-Genes is a demonstrator
  Need to develop production applications for both routine statistical processing and data resource discovery and query
  Need to parameterise routine conditioning appropriately to complete automation

ODD-Genes requires Grid infrastructure
  Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)
  However, alternatives often proprietary, expensive, less flexible

ODD-Genes requires registration by data-hosts
  Critical mass of registered data sources