What is in there for Public Health? · 11/21/2008 · Ontologies and data integration in...
Transcript of What is in there for Public Health? · 11/21/2008 · Ontologies and data integration in...
Ontologies and data integration in biomedicineWhat is in there for Public Health?
Olivier Bodenreider
Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA
Public Health Informatics Fellowship ProgramNovember 21, 2008
Why integrate data?
Lister Hill National Center for Biomedical Communications 3
Motivation
Multiple sources of dataEach focusing on one aspect
Annotated with specific vocabulariesSpecific format
Need to be integrated to infer new factsTranslational medicine (“Bench to bedside”)BiosurveillanceTranslation into policy (Public health)
Lister Hill National Center for Biomedical Communications 4
Example Oncology
Cancerregistries
Researchdata
Clinicalrepositories
Lister Hill National Center for Biomedical Communications 5
Integrating subdomains
Cancerregistries
ICD-O
Researchdata
NCIThesaurus
Clinicalrepositories
SNOMED CT
Lister Hill National Center for Biomedical Communications 6
Example Oncology
Multiple data repositoriesInternational Classification of Diseases-Oncology (ICD-O-3)
Cancer registriesEpidemiology, Public health
SNOMED CTPatient recordsClinical care
NCI ThesaurusAnnotation of research data
Lister Hill National Center for Biomedical Communications 7
SNOMED CT
http://www.clinical-info.co.uk/
Lister Hill National Center for Biomedical Communications 8
NCI Thesaurus
http://nciterms.nci.nih.gov/
Lister Hill National Center for Biomedical Communications 9
ICD-O-3
Morphology[…]814-838 Adenomas and adenocarcinomas
8140/3 Adenocarcinoma, NOS
Anatomy[…]C60-C63 Male genital organs
C61 Prostate gland– C61.9 Prostate, NOS
Prostate gland
Adenocarcinomaof prostate
Lister Hill National Center for Biomedical Communications 10
Cancerregistries
Researchdata
Clinicalrepositories
Terminology integration NCI Metathesaurus
ICD-O
NCIThesaurus
SNOMED CT
NCI
Meta.C61.9 Prostate, NOS
C12410 Prostate gland
41216001 Prostatic structure
C0033572
Lister Hill National Center for Biomedical Communications 11
Approaches to data integration
WarehousingSources to be integrated are transformed into a common format and converted to a common vocabulary
MediationLocal schema (of the sources)Global schema (in reference to which the queries are made)
Lister Hill National Center for Biomedical Communications 12
Ontologies and warehousing
RoleProvide a conceptualization of the domain
Help define the schemaInformation model vs. ontology
Provide value sets for data elementsEnable standardization and sharing of data
ExamplesBioSense (original design)Repositories for translational research (CTSA)Clinical information systems
Lister Hill National Center for Biomedical Communications 13
Ontologies and mediation
RoleReference for defining the global schemaMap between local and global schemas
ExamplesBioSense (redesign)BioMediatorOntoFusion
Lister Hill National Center for Biomedical Communications 14
Ontologies Normalization
From the London Bills of Mortality…
Lister Hill National Center for Biomedical Communications 15
Ontologies Normalization
…to HITSP vocabulary specifications for interoperability http://www.hitsp.org/
Lister Hill National Center for Biomedical Communications 16
Gene Ontology
Ontologies Inference
gene
GO
PubMed
Gene name
OMIM
Sequence
InteractionsGlycosyltransferase
Congenital muscular dystrophy
Entrez Gene
[Sahoo, Medinfo 2007]
Lister Hill National Center for Biomedical Communications 17
From glycosyltransferaseto congenital muscular dystrophy
MIM:608840 Muscular dystrophy, congenital, type 1D
GO:0008375
has_associated_phenotype
has_molecular_function
EG:9215LARGE
acetylglucosaminyl-transferase
GO:0016757glycosyltransferase
GO:0008194isa
GO:0008375 acetylglucosaminyl-transferase
GO:0016758
Ontologies, data integrationand public health
Case study: BioSense
Lister Hill National Center for Biomedical Communications 19
Lister Hill National Center for Biomedical Communications 20
BioSense Integrating multiple datasets
Acute care data (emergency room)Demographic dataClinical data
Chief complaintsInitial / final diagnosisLaboratory data
Pharmacy data (national pharmacies)Laboratory data (national labs)Poison control center data…
Lister Hill National Center for Biomedical Communications 21
Case study Materials and methods
MaterialsBioSense publications and presentations
Methods: look for mentions ofData integration approachesUse of ontologies / terminologies / vocabularies
NormalizationInference
Lister Hill National Center for Biomedical Communications 22
BioSense Approaches to data integration
Initial approach: Warehouse
[Chambers,PHIN 2008]
Lister Hill National Center for Biomedical Communications 23
BioSense Approaches to data integration
Redesign: Mediation (Federation)
[Lenert, ASTHORoudtable 2008]
Lister Hill National Center for Biomedical Communications 24
BioSense Ontologies
Normalized vocabulary for data exchange
Lister Hill National Center for Biomedical Communications 25
BioSense Ontologies
Normalized vocabulary for data exchangeLink data elements to value sets from standard vocabularies
http://www.hitsp.org/
Lister Hill National Center for Biomedical Communications 26
BioSense Ontologies
Ontologies for aggregation and inference
[Lenert, ASTHORoudtable 2008]
Lister Hill National Center for Biomedical Communications 27
Public Health Reference Ontology
Summary
Lister Hill National Center for Biomedical Communications 29
Summary
Integration is key to making sense of the many health data repositories
E.g., Biosurveillance
Two major data integration approachesData warehousingMediation
Ontologies play a major role in data integrationNormalization of vocabularySupporting aggregation and inference
Lister Hill National Center for Biomedical Communications 30
Current trends
Semantic Web technologies provide a convenient platform for integrating biomedical data
Standard tooling
Many biomedical ontologies available, ongoing harmonization
Standard vocabulary
Emerging semantic interoperability specifications (e.g., HITSP, BRIDG, caBIG, HL7, …)
Standard information models
Lister Hill National Center for Biomedical Communications 31
Some persisting challenges
Reconcile data annotated to different ontologiesIncomplete terminology integration systems (e.g., UMLS, RxNorm), limited harmonization between terminologiesLack of permanent identifiers for biomedical entities
Biomedical ontologiesAvailabilityFormalism (OWL, OBO, RRF, …)Quality
MedicalOntologyResearch
Olivier Bodenreider
Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA
Contact:Web: