NIH BD2K DataMed metadata model - Force11, 2016

Susanna-Assunta Sansone, PhD

Associate Director, Principal Investigator

Consultant, Honorary Academic Editor

A Data Discovery Index prototype that:

•  Helps users find and access shared data

•  Interoperates in the NIH Commons

(biomedical digital assets)

[Figure: Data Discovery Index, aggregators (A, B, C) and data]

Organizing framework and portal for data

Dashed lines: mapping of metadata standards, links to aggregators, data

Aggregators: repositories or various indices

Data: digital research objects

Pilot projects* Core development team

* There is work for everyone (and more)

Designed as an element of the ecosystem

•  Define a metadata specification that supports the intended capabilities of the DataMed prototype

•  Synergies with many groups, including:

   -  BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)

   -  BD2K cross-centers Metadata WG

   -  ELIXIR EXCELERATE WP5 Interoperability

The model and serializations

Created using 2 complementary approaches:

•  top-down: analyzing use cases

•  bottom-up: mapping existing standards/schemas

The model and serializations

Bottom-up approach: mapped schemas

•  schema.org

•  DataCite

•  RIF-CS

•  W3C HCLS dataset descriptions (mapping of many models including DCAT, PROV, VoID, Dublin Core)

•  Project Open Metadata (used by HealthData.gov)

•  ISA

•  BioProject

•  BioSample

•  MINiML

•  PRIDE-ml

•  MAGE-TAB

•  GA4GH metadata schema

•  SRA XML

•  CDISC SDM / element of BRIDG model
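As a rough illustration of what a bottom-up mapping involves, the sketch below folds records from two of the source vocabularies into one shared set of elements and reports what does not fit. The source keys (`name`, `keywords`, `subject`, and so on) are real schema.org and Dublin Core terms; the target element names, the `CROSSWALKS` table and the `map_record` helper are hypothetical and are not the actual DataMed model.

```python
from typing import Any

# Crosswalks from source vocabularies to a shared set of elements.
# The target names on the right-hand side are illustrative only;
# they are NOT the element names of the DataMed model.
CROSSWALKS: dict[str, dict[str, str]] = {
    "schema.org": {
        "name": "title",
        "description": "description",
        "identifier": "identifier",
        "keywords": "keywords",
    },
    "dublin_core": {
        "title": "title",
        "description": "description",
        "identifier": "identifier",
        "subject": "keywords",
    },
}


def map_record(source_schema: str, record: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Map one source record onto the shared elements.

    Returns the mapped record and a sorted list of source fields
    that have no counterpart in the crosswalk.
    """
    crosswalk = CROSSWALKS[source_schema]
    mapped = {target: record[src] for src, target in crosswalk.items() if src in record}
    unmapped = sorted(key for key in record if key not in crosswalk)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = map_record(
        "schema.org",
        {"name": "Example dataset", "keywords": ["proteomics"], "license": "CC0"},
    )
    print(mapped)
    print(unmapped)
```

The list of unmapped fields is the useful by-product here: it shows where a source schema carries more detail than a deliberately small shared model can hold.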

•  Model to be implemented and tested in DataMed

   -  we aimed for maximum coverage of use cases with a minimal number of data elements

   -  we foresee that not all questions can be answered in full

•  Repositories workshop on June 23

   -  hands-on experience mapping to the model

   -  many databases won't have all these metadata elements; conversely, domain-specific databases have more

•  Discussion ongoing to create an extension as part of bioschemas.org

What is next?

Prototype, model, mappings, documentation and more at https://biocaddie.org and https://github.com/biocaddie

Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
