A sea of standards for omics data: sink or swim? J Tenenbaum, SA Sansone, M Haendel Open Access Journal Club 10/3/13

Trend toward data sharing

Report: Recommendations to NIH Director

• Data and Informatics Working Group Report to Advisory Committee to the Director of NIH (6/12)
– Expert advice on management, integration, and analysis of large biomedical research datasets
– Goals included “advance basic and translational science by facilitating and enhancing the sharing of research-generated data”
– Recommendation 1a: Establish a Minimal Metadata Framework for Data Sharing

IOM: Toward Precision Medicine

• Revised disease taxonomy based on molecular mechanisms
• Build information commons: data on large populations of patients become broadly available for research use
• Build knowledge network: add value by converting data to knowledge in context of biology and clinical care

2009: Let’s build a standards-compliant omics data repository

• What does it mean to build an omics data repository that is standards compliant?
• What standards exist?
• What is a data standard?
• What’s the best one for our purposes?

But how much do we know about these standards?

Courtesy of SA Sansone

The Punchline[s]

1. There are many different definitions of what constitutes a ‘data standard’.
2. No one standard is the ‘right’ standard; the choice depends on particular needs.
3. Resources are needed to help researchers navigate the standards landscape.

Exercise: Identify standards in genomics

Standard         Type
MIAME            Reporting guideline
ISA-TAB          Exchange format
MAGE-TAB         Exchange format
MAGE-ML          Exchange format
SOFT             Exchange format
MINiML           Exchange format
GO               Terminology artifact
EFO              Terminology artifact
OBI              Terminology artifact
MGED Ontology    Terminology artifact
MAGE-OM          Object model
FuGE             Object model
SEND             Exchange format
GEML             Exchange format
FUGO             Terminology artifact
MAML             Exchange format
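
The exercise table can also be read as a small lookup structure. The sketch below is illustrative only: it transcribes the standard-to-type pairs from this slide into a Python dictionary and groups them by type. The names `STANDARD_TYPES` and `group_by_type` are hypothetical, not part of any catalog or tool mentioned in the talk.

```python
from collections import defaultdict

# Standard -> type, transcribed from the exercise table on this slide.
STANDARD_TYPES = {
    "MIAME": "Reporting guideline",
    "ISA-TAB": "Exchange format",
    "MAGE-TAB": "Exchange format",
    "MAGE-ML": "Exchange format",
    "SOFT": "Exchange format",
    "MINiML": "Exchange format",
    "GO": "Terminology artifact",
    "EFO": "Terminology artifact",
    "OBI": "Terminology artifact",
    "MGED Ontology": "Terminology artifact",
    "MAGE-OM": "Object model",
    "FuGE": "Object model",
    "SEND": "Exchange format",
    "GEML": "Exchange format",
    "FUGO": "Terminology artifact",
    "MAML": "Exchange format",
}


def group_by_type(standard_types):
    """Invert the mapping: type -> alphabetically sorted list of standards."""
    groups = defaultdict(list)
    for name, kind in standard_types.items():
        groups[kind].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}


if __name__ == "__main__":
    for kind, names in group_by_type(STANDARD_TYPES).items():
        print(f"{kind} ({len(names)}): {', '.join(names)}")
```

Grouping this way makes the point of the exercise visible: even within genomics, "standard" covers reporting guidelines, exchange formats, terminology artifacts, and object models.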

Use cases, by level of rigor (level: use case example, then explanation)

• Low: Inter-lab collaboration. Data should meet minimal standards for structure and documentation to enable comprehension, but answers to questions are just an email/phone call/hallway away. At least until that person leaves the lab.
• Medium: Publishing. Data should use standardized formats and annotation sufficient to enable both comprehension and reproducibility, with little or no interaction with the data owner.
• High: Make available through public data repository. In addition to being comprehensible and reproducible, annotation should be structured in a way that enables querying for datasets that match specific criteria.
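
The three levels are cumulative: each one adds requirements on top of the level below. A minimal sketch of that idea follows, assuming each requirement can be reduced to a yes/no flag. The flag names (`documented_structure`, `standardized_format`, and so on) are hypothetical paraphrases of the slide text, not fields of any real standard.

```python
from enum import IntEnum


class Rigor(IntEnum):
    LOW = 1     # inter-lab collaboration
    MEDIUM = 2  # publishing
    HIGH = 3    # public data repository


# What each level adds on top of the level below it.
# Flag names are hypothetical paraphrases of the slide text.
REQUIREMENTS = {
    Rigor.LOW: {"documented_structure"},
    Rigor.MEDIUM: {"standardized_format", "reproducible_without_owner"},
    Rigor.HIGH: {"queryable_annotation"},
}


def highest_rigor(properties):
    """Return the highest level whose cumulative requirements are all met.

    Returns None if even the lowest level is not satisfied.
    """
    properties = set(properties)
    required = set()
    met = None
    for level in sorted(REQUIREMENTS):
        required |= REQUIREMENTS[level]
        if not required <= properties:
            break
        met = level
    return met


if __name__ == "__main__":
    dataset = {"documented_structure", "standardized_format",
               "reproducible_without_owner"}
    print(highest_rigor(dataset))
```

Because the check stops at the first unmet level, a dataset with queryable annotation but no basic documentation still does not qualify for any level.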

Standards Criteria

• The standard itself
• Adoption and user community
• Additional factors

The Standard Itself

• Specification documentation
• Ease of implementation (e.g., need for programmer support)
• Human and machine readability
• Formal structure
• Expressivity: breadth of information that can be represented
• Ease of use, e.g., minimal required fields, text-based interface familiar to biologists

Adoption and User Community

• Broad adoption and implementation, outside initial group
• Support supplied by the user community
• Use by community databases
• Software development that supports the standard (e.g., for curating, submitting to databases)
• Responsiveness to community requests
• Availability of examples of use
• Requirements of relevant authoritative bodies, e.g., funders, publishers, etc.

Additional Factors

• Integration/compatibility with other standards
• Extensibility and flexibility to cover new domains
• Conversion and mapping, when applicable
• Cost (e.g., open vs. licensing fee)

Potential resources to assist in standards selection and adoption

• Lay person’s primer to standards
• Consumer reviews
• Standards selection wizard
• Standards adoption helpdesk
• Quality assurance tools
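
One way a standards selection wizard could work is to rate each candidate against the three criteria groups from the earlier slides and weight those groups by what the user cares about. The sketch below is an assumption about how such a tool might be built, not a description of an existing one. The candidates "Standard A" and "Standard B", their 0 to 5 ratings, and the default weights are all made up for illustration.

```python
# Criteria groups from the "Standards Criteria" slide.
# The weights are illustrative assumptions, not values from the talk.
DEFAULT_WEIGHTS = {
    "standard_itself": 0.4,
    "adoption_community": 0.4,
    "additional_factors": 0.2,
}

# Hypothetical candidates with made-up ratings on a 0-5 scale.
EXAMPLE_CANDIDATES = {
    "Standard A": {"standard_itself": 4, "adoption_community": 2, "additional_factors": 3},
    "Standard B": {"standard_itself": 3, "adoption_community": 5, "additional_factors": 2},
}


def score(ratings, weights=DEFAULT_WEIGHTS):
    """Weighted mean of the ratings over the criteria groups named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[group] * ratings[group] for group in weights) / total_weight


def rank(candidates, weights=DEFAULT_WEIGHTS):
    """Return candidate names ordered from best to worst score."""
    return sorted(candidates, key=lambda name: score(candidates[name], weights), reverse=True)


if __name__ == "__main__":
    for name in rank(EXAMPLE_CANDIDATES):
        print(f"{name}: {score(EXAMPLE_CANDIDATES[name]):.2f}")
```

With the default weights, "Standard B" ranks first. With weights that consider only the standard itself, for example `rank(EXAMPLE_CANDIDATES, {"standard_itself": 1.0})`, "Standard A" comes out on top. That reversal reflects the second punchline: no one standard is the "right" one, because the answer depends on particular needs.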

Acknowledgments

• Contributors to the BioSharing catalog
• CTSA Omics data standards working group

Funding
– NIH UL1RR024128
– David H. Murdock
– NIH R24OD011883
– CTSA 10-001:100928SB23
– Oxford e-Research Centre
– UK Biotechnology and Biological Sciences Research Council (BBSRC) BB/I000771/1 and BB/I025840/1

• Simon Lin
• Bill Barry
• David Beck
• Colette Blach
• Jim Cimino
• Todd Ferris
• Carol Haynes
• Curtis Hendrickson
• Carol Hill
• Ken Kawamoto
• Tahsin Kurc
• John Osborne
• Jeff Pennington
• Sarah Wheelan

• Helpful Resources
– Mark Musen
– Richard Scheuermann
– Kristi Eckeron
– Russ Altman