2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. ·...

15
2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute

Transcript of 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. ·...

Page 1: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

2009 GMOD MeetingDhileep Sivam & Isabelle Phan

Seattle Biomedical ResearchInstitute

Page 2: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Seattle Biomedical Research Institute(SBRI)

• Founded in 1976• About 250 full-time staff• Focus on infectious disease• 13 Labs• Strong ties to the University of Washington• Bioinformatics Core

Page 3: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

How we first came to useChado

LmjF Probe Set LinJ Probe Set

LmjF V5.2 LinJ V2.0 LinJ V3.0 LinJ V4.0LmjF V4.0

Mapping MappingMapping

Result Set Result Set Result Set

Page 4: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Microarray Project

ChadoNimblegen

DataParsers

Analysis ToolsNormalization

ScalingFeature-level aggregation

RemappingVisualization

Page 5: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Use Case: SSGCIDSeattle Structural Genomics Center for Infectious Disease

Vaccine Targets!

Gene Cloning & Expression

Protein Crystallization

Structure Determination

Bioinformatic Screening

Project Aim

3D Protein Structure

NIAID Emerging and re-emergingpriority pathogens

Structures will serve as a startingpoint for drug development

Multi-center

Page 6: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

SSGCID

Vaccine Targets!

Gene Cloning & Expression

Protein Crystallization

Structure Determination

Bioinformatic Screening

Page 7: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

SSGCID

Chado

ExternalSequenceResources

BLAST Screening

ExportParsers

Bulk Loader

Page 8: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Things that have come up…Complexity of querying

BLAST results

Gene Models

Complexity of queryingmicroarray data

MaterializedViews

SimplestPossible Model

“Grouping of Genes” DBXrefs

Page 9: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

warehouse

ProteomicsMicroarrayStructural genomics

Data access

curation

Automatedanalysispipeline

Sequence data management at SBRI

Page 10: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Chado + GUS: why do weneed both?

• Chado– Collaboration with IGS– Annotation tools: Manatee (apollo), Ergatis

• Internal data production

• Gus– Collaboration with UPenn– Web front end

• External data access

Page 11: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Chado

ProteomicsMicroarrayStructural genomics

ManateeManual annotation

ErgatisAnalysis pipeline

Sequence data management at SBRI

GUS

GUS WDK

Page 12: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Chado2GUS: Lost intranslation

• Chado– Denormalized

schema• Polymorphism

– Mysql (IGS Chado)

• GUS– Normalized schema

• Subclassing

– Postgres port fromOracle

Page 13: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

Picking the best of two worlds

• Chado– Biological data model– Flexibility

• GUS– Software engineering– Flexibility

Page 14: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical

The future?

• SQL-free data production– Instead of custom wrappers over raw SQL:

• ORMs: Chado Hibernate, ActiveRecords• Unified object model

• RDBMS-free data mining– Instead of GUS predefined query + set

combination• Biomart + Galaxy• RDF + triple store + sparql (object store + Lucene)

Page 15: 2009 GMOD Meetinggmod.org/mediawiki/images/7/70/2009_GMOD_Meeting_Dhileep... · 2009. 1. 16. · Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute. Seattle Biomedical