Data Standards for Data Integration - NIH Common Fund · PDF filesoftware classification Data...
Transcript of Data Standards for Data Integration - NIH Common Fund · PDF filesoftware classification Data...
caspase activity PubChem nomenclature
fluorescence viability
cheminformatics semantic
domain
binding based programming knowledge search technology end point
thesauri article object PDSP XML enzyme substrate based versioning natural language high-thoughput screening (HTS) software classification polysemes Data Standards for Data Integration
biological pathways Beta-Lactamase Induction dehydrogenase activity specificity
subject headings cyclic AMP redistribution RDF calcium redistribution OWL novel chemical tools
library pharmaceutical
individuals tags
semantic web structural biology
Stephan Schrer University of Miami
indexing
homographs energy transfer
authorized terms
data sets screening
chemical biology
activity enzyme reporter
standards
servers search tool
small molecule
biological assay
mical probes ChemBank
disease networks biomedical knowledge
Fluorogenic substrate GFP induction
controlled vocabulary
subject indexing schemes
taxonomies
synonyms
concepts structure meta-data
information exchange properties annotation
clas
cheATP Lucises ferin Coupled
SPARC Workshop, NIH, Feb 25-26 2015
Types of data standards Reporting guideline (checklist) specifies what
information need to be captured about an experiment for a particular purpose
(Controlled) vocabulary terminological resource that provides the identification and definition of entities
Data exchange format is a specification how data are encoded to be computer-readable / -processable
Data structure refers to organization of data, data schema, entity relations
2
O. Search all of BloSharlng LOG IN OR REGISTER
biosharing O
COMMUNITY CONTENT SUMMARY
Contribute by submitting a standard Found a bug? Please tell us!
Standards
BioSharing standards have been partly compiled by linking to BioPortal, MIBBI and the Equator Network.
Or you can filter on MIBBI Foundry reporting guidelines or 080 Foundry terminology artifacts.
X REPORTING GUIDELINE
View as Grid View as Table
D No Publlc1tlon [J Hl!IS Publication
~ No M1lnt1lner ~ Has M1lnulner
Standard Type C'ear
REPORTING GUIDELINE
EXCHANGE FORMAT
TERMINOLOGY ARTIFACT
arch for tandar O. Search
AMIS Article Minimum Information Standard
REPORTING GUIOELINE
~Systems
L.:., Publications a
MHHF ~Reset
Showing records 1 50 of 69.
ARRIVE Animals in Research: Reporting In Vivo ..
REPORTING GUIOELINE
S Systems
L.:., Publications a
BioDBCore Core Attributes of Biological Databases
REPORTING GUIOELINE
~Systems
L.:., Publications
a a
ABOUT
BioSharing Standards (http://www.biosharing.org)
3
http:http://www.biosharing.org
a mibbi Bioscience reporting guidelines and tools @ Portal
@ Foundry
@ About
Minimum Information guidelines from diverse bioscience communities
If you want to register your checklist to MIBBI, please contact the BioSharing team Excel spreadsheet and XML document (schema) describing all registered projects
Bioscience projects registered with MIBBI
CIMR Core Information for Metabolomics Reporting
GIATE Guidelines for Information A bout Therapy Experiments
MIABE Minimal Information About a Bioactive Entity
MIABiE Minimum Information About. a Biofilm Experiment
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About. a Microarray Experiment
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About. a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIAPegAE Minimum Information About. a Peptide Array Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About. a Simulation Experiment
MIASPPE Minimum Information About Sample Preparation for a Phosphoproteomics Experiment
MIATA Minimum Information About. T Cell Assays
MICEE Minimum Information about a Cardiac Electrophysiology Experiment
MIDE Minimum Information required for a DMET Experiment
MIFlowCyj Minimum Information for a Flow Cytometry Experiment
I !. x
I !. x
I !. x
I !. x I !. x
I !. x
Checklists Minimum Information Guidelines
4
Minimum Information Standard may not exist
Regenbase: Integration of diverse data related to nerve regeneration in the context of spinal cord injury
http://regenbase.org 5
http:http://regenbase.org
Vocabulary vs. ontology Controlled vocabularies / thesauri
describe what things mean (link terms to human description)
Entities with identity criteria Share knowledge in a common language Natural language synonyms for search and text mining
6
Vocabulary vs. ontology Ontologies
Contains entities (classes) and their relationships (object properties)
Capture / abstract knowledge using logical axioms Explicit specification (OWL-DL)
Building formal (computable) models Computing with knowledge (reasoning engines) Foundation of Semantic Web information systems
7
Ontology resources
NCBO Bioportal EBI Ontology Lookup Service OLS
OBO Foundry
8
Metadata specifications Metadata: Data not directly measured in an experiment (or obtained in a study)
Why metadata: Facilitate data replicability, reproducibility, reuse Interpret results, perform data analysis, hypotheses Repurpose data for other projects Information systems (search, query, data integration
and exchange)
What metadata to capture in a standardized format with controlled vocabularies (and formal descriptions)?
9
A useful distinction of metadata Model metadata: Required to understand, interpret, and meaningfully
integrate experimental results Typically queryable in software systems Important parameters to describe conclusions (data
visualizations)
Confounder metadata: Non model metadata required to replicate and
reproduce experimental results Needed for data forensics (e.g. batches of reagents,
maintenance of experimental equipment, etc.) 10
Standardized metadata Capture all (detailed descriptions, SOP) Make model metadata explicit (controlled vocabulary, standard format)
But whats really model metadata? Data and informatics use cases
Types of queries and analyses Integration with other data sources Information systems / UI components Consider re-use of data for other projects
LINCS Metadata Standards: Vempati et al J Biomol Screen 2014 11
Data Coordination
12
Repository
V' '\/'l tw I
Central Database I repository
-
-~ I
Stch+flesulls+Nav ~
lfoAv"J ID@
=
~ t:~ ~~ @ ~Standards Computing Globalindex Data APls liiill!
"-t:~-Repository
Data set IDs and provenance Permanent ID via (authoritaitve) repository or data publication
DOI PURL
Capture data provenance PROV-O: The PROV Ontology (W3C)
http://www.w3.org/TR/prov-o/ PAV (Provenance, Authoring Versioning) Ontology
http://purl.org/pav/
13
http://purl.org/pavhttp://www.w3.org/TR/prov-o
Provenance The link between source data, computation / processing and derived data / results static verifiable record track changes compare / discrepancies repeat / reproduce Citation version data release PDIFF, Woodman 2011
14
RDBMS Ontologies Closed world assumption Open world assumption No reasoning support Reasoning support Need to know schema
for highly specialized queries
Provide restriction-free framework (formal semantics)
Data sharing not easy, no semantics
Easy data/knowledge sharing
Efficient RDBMS access Triple store access Established technology Relatively early stage Industry standard Standards emerging
RDBMS vs Semantic Web technologies
Replicability vs Reproducibility
[repeat] same
experiment same lab
same experiment
different set up
[ rep1rod u ce]
[replicate] same
experiment different lab
different experiment
some of same
reuse Drummond C Replicabil~y is not Re oducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
Data Standards for Data IntegrationSlide Number 2Slide Number 3Slide Number 4Slide Number 5Slide Number 6Slide Number 7Slide Number 8Slide Number 9Slide Number 10Slide Number 11Slide Number 12Slide Number 13Slide Number 14RDBMS vs Semantic Web technologiesReplicability vs Reproducibility