The Royal Society London, May 19-21st, 2010
Mouse models for human disease

Phenotype database interoperability and integration

Damian Smedley, EBI

Why do we need data integration and interoperability?

Centralised vs distributed solutions

[Diagram: data resources feeding a portal. Genomics: MGI, Ensembl. IKMC projects: KOMP, EUCOMM, NorCOMM, TIGM. Phenotype/Expression: Eurexpress/GXD etc., Europhenome. Strains: JaxMice, IMSR, EMMA.]

Three architectures are shown:
– Centralised warehouse v1: central database
– Centralised warehouse v2: nightly data syncs
– Distributed solution: web services

Centralised solutions

Advantages
– Better query performance for large datasets
– Easier to analyse raw data in one location

Disadvantages
– Regular data deposition is non-trivial
– Designing a single schema to store different types of data is not simple
– Persuading people to “give up” their data/databases/websites
– Will still need to be made interoperable with other data sources
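
To make the schema and deposition points concrete, here is a minimal sketch of a centralised warehouse. It assumes a hypothetical single table keyed on a shared gene identifier and a hypothetical nightly load function; the table layout, source names, identifiers, and values are all made up for illustration and do not describe any real resource.

```python
import sqlite3

# Hypothetical single schema: every source must be squeezed into these columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    gene_id   TEXT NOT NULL,   -- shared identifier, e.g. an MGI ID
    source    TEXT NOT NULL,   -- depositing resource
    data_type TEXT NOT NULL,   -- e.g. 'phenotype', 'expression'
    value     TEXT NOT NULL,
    PRIMARY KEY (gene_id, source, data_type, value)
)
"""

def nightly_sync(conn, source, rows):
    """Replace one source's rows with its latest deposit (hypothetical nightly job)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM annotation WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO annotation (gene_id, source, data_type, value) "
            "VALUES (?, ?, ?, ?)",
            [(gene_id, source, data_type, value) for gene_id, data_type, value in rows],
        )

def query_gene(conn, gene_id):
    """Return all annotations for one gene, answered locally from the warehouse."""
    cur = conn.execute(
        "SELECT source, data_type, value FROM annotation "
        "WHERE gene_id = ? ORDER BY source, value",
        (gene_id,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up identifiers and values, for illustration only.
nightly_sync(conn, "phenotype_db", [("MGI:0000001", "phenotype", "example phenotype term")])
nightly_sync(conn, "expression_db", [("MGI:0000001", "expression", "example tissue")])
```

Queries are fast because everything is local, but each source has to be re-deposited on a schedule and flattened into the one shared schema, which is where the listed disadvantages come from.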

Distributed solutions

Advantages
– Domain expertise at production site exploited
– Different types of data easily integrated as long as they share something in common such as a gene identifier
– No need for nightly data flow to keep data up to date
– No need for redundant data in each database
– Easier to persuade people to collaborate in a distributed scenario

Disadvantages
– Technical knowledge required to deploy the web services
– Potential query performance problems for large datasets (may need to provide summary level data)
– Potential problems performing analysis over all datasets
– Problems with services going down
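
As a rough illustration of the distributed approach, the sketch below queries several independent sources by a shared gene identifier and merges the answers, while tolerating a source that is down. The three service functions are hypothetical stand-ins for remote web services, and the identifier and records are made up.

```python
from typing import Callable, Dict, List

# Each hypothetical "service" maps a gene identifier to its own kind of records.
Service = Callable[[str], List[str]]

def phenotype_service(gene_id: str) -> List[str]:
    data = {"MGI:0000001": ["example phenotype term"]}  # made-up data
    return data.get(gene_id, [])

def expression_service(gene_id: str) -> List[str]:
    data = {"MGI:0000001": ["example tissue"]}  # made-up data
    return data.get(gene_id, [])

def unavailable_service(gene_id: str) -> List[str]:
    raise ConnectionError("service is down")  # simulates an outage

def federated_query(gene_id: str, services: Dict[str, Service]) -> Dict[str, object]:
    """Ask every source about one gene; record failures instead of aborting."""
    results: Dict[str, List[str]] = {}
    failed: List[str] = []
    for name, service in services.items():
        try:
            results[name] = service(gene_id)
        except ConnectionError:
            failed.append(name)
    return {"gene_id": gene_id, "results": results, "failed": failed}

services = {
    "phenotype": phenotype_service,
    "expression": expression_service,
    "strains": unavailable_service,
}
answer = federated_query("MGI:0000001", services)
```

No data is copied between sites, so nothing needs a nightly sync, but the combined answer is only as complete as the services that respond at query time.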

1000 Genomes - centralisation

International Cancer Genome Consortium

– Canada: Pancreas
– Australia: Pancreas
– China: Stomach
– Japan: Liver (virus-related)
– France: Liver (alcohol-related); Breast (HER2+ve)
– UK: Breast (several subtypes)
– Spain: CLL
– India: Oral Cavity

ICGC - distributed

Joint Ensembl and EurExpress query

IKMC portal: knockoutmouse.org

[Diagram: resources linked through the portal: GXD, Eurexpress, NorCOMM, EUCOMM, KOMP, TIGM, EMMA, KOMP rep, CMMR, IMSR, Ensembl, CREATE, Europhenome]

IKMC interoperability strategy

[Diagram: BioMart query interface(s) linking the following resources, each joined on MGI ID]
– IKMC (Sanger, UK)
– ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada)
– Phenotype (EuroPhenome etc.) (Harwell, UK)
– MGI (JAX, USA)
– EURExpress (Edinburgh, UK)
– Ensembl (Sanger, UK)
– GXD (JAX, USA)
– CREATE (EBI, UK)
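
To show what a query against one of these marts might look like, the sketch below builds an XML document modelled on the query format used by BioMart web services. It only constructs the request text and makes no network call. The dataset, filter, and attribute names are hypothetical placeholders, and the `Query` attribute values are assumptions to check against the target mart's own configuration.

```python
import xml.etree.ElementTree as ET

def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (no network call is made)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",   # assumed default schema name
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",    # assumed config version
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body

xml_text = build_biomart_query(
    dataset="example_dataset",                   # hypothetical dataset name
    filters={"mgi_id": "MGI:0000001"},           # hypothetical filter, made-up ID
    attributes=["mgi_id", "example_attribute"],  # hypothetical attribute names
)
```

Because every resource exposes the same identifier, the same filter value can in principle be reused across marts, which is what lets a single query interface span them.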

www.knockoutmouse.org/martsearch

Europhenome: raw and summary data

Possible strategy for phenotype data

[Diagram: BioMart query interface(s) linking the following resources, each joined on MGI ID]
– IKMC (Sanger, UK)
– ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada)
– MGI (JAX, USA)
– EURExpress (Edinburgh, UK)
– Ensembl (Sanger, UK)
– GXD (JAX, USA)
– CREATE (EBI, UK)
– High throughput phenotyping: central database
   – High throughput phenotyping centres
   – Presentation of raw results
   – Analysis to assign phenotypes to genes

Linking from IKMC portal

Phenotyping

Phenotype searches

Linking from IKMC portal

Mouse models for human disease

Acknowledgements

The whole CASIMIR consortium and in particular:
• Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock: MouseFinder tool
• MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
• BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora