Biomedical Informatics
Data aggregation across an academic medical center:
The role of semantic harmonization
CG Chute MD DrPH
Professor and Chair, Biomedical Informatics
Mayo Clinic College of Medicine
© Mayo Clinic College of Medicine 2006 2
Data Aggregation
a how-to guide
1. Build up the Blue space (Core registry content)
2. Support cloneable registry infrastructure
3. Ensure data governance adopts standards
4. Annotate stuff with semantic metadata
5. Balance data investment with Analytics
• Semantic Integration – the LexGrid project

From Practice-based Evidence
to Evidence-based Practice
[Figure: diagram with the labels Patient Encounters; Clinical Databases, Registries et al.; Medical Knowledge; Clinical Guidelines; Expert Systems; Informatics theory & application; inference. Stage labels: Data, Inference, Knowledge Management, Decision support.]

The Historical Center of the
Health Data Universe
[Figure labels: Clinical Data; Billable Diagnoses]

Copernican Health Data Universe
(Niklas Koppernigk)
[Figure labels: Billable Diagnoses; Clinical Data; Guidelines; Scientific Literature; Medical Literature; Genomic Characteristics]

On the nature of data registries in Cancer
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver]
Disease Registry Silos
Clinical and Genomic Data elements

Most registries share “core” data
The balance of data will always be specific
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver]
“Core” registry data: Common to all registries. Demographics, labs, Dxs, Pxs
“specialized” registry data: Disease, site, or Px specific

Most registries share “core” data
The balance of data will always be specific
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver, with added labels “Site” and “Core”]
“Core” registry data: Common to all registries. Demographics, labs, Dxs, Pxs
“specialized” registry data: Disease, site, or Px specific

“Core” registry data has two primary sources
Databases and Humans
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver, with labels “Site” and “Core”]
Automated data population
Humanly annotated: Transcribed, curated, edited, …

An obvious goal is to increase the proportion of the Blue space
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver, with labels “Site” and “Core”]
Automated data population
Humanly annotated: Transcribed, curated, edited, …

Recognize that registry specific data will always exist
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver, with labels “Site” and “Core”]
Automated data population
Humanly annotated: Transcribed, curated, edited, …

Recommendation #1
Build up the Blue space
Focus on infrastructure resources that can
• Systematically collect data from clinical and genomic source systems – build warehouse
• Prioritize use of the warehouse – build marts
• Automate Core sections of registries ∅ End-user data retrievals from warehouse
• May create epidemiology, quality marts…
• Emphasize tools that have greatest impact on the questions and users

Building the Blue space
Extract, Transform, Load (ETL) – the obvious
• Problem lists, diagnoses, findings, Chief Comp.
• Co-Path, pathology notes, CAP protocols
• Laboratory data (atop SOA LIMS environment)
• Imaging data – Radiology notes, Px, Dx
• Treatments and Interventions
  • Drug, chemotherapies, OTCs, Rx elsewhere…
  • Surgeries, bedside procedures, interventional Ψ…
  • Radiation, implants, devices, stents…
• Genomics, Proteomics, LIMS, …
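A minimal sketch of the Extract, Transform, Load pattern named on this slide, assuming two hypothetical source systems (a problem list and a lab system) with different field names. The record layout, the `CoreFact` type, and the sample codes are illustrative assumptions, not Mayo's warehouse design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreFact:
    """One harmonized row of "Blue space" (core) data. Hypothetical layout."""
    patient_id: str
    domain: str   # e.g. "diagnosis" or "lab"
    code: str
    source: str   # which source system the row came from


def extract():
    # Hypothetical rows standing in for two clinical source systems.
    problem_list = [{"mrn": "P1", "dx": "C50.9"}, {"mrn": "P2", "dx": "I63.9"}]
    lab_system = [{"patient": "P1", "test": "2345-7", "value": 98}]
    return problem_list, lab_system


def transform(problem_list, lab_system):
    # Map each source's own field names onto the shared CoreFact shape.
    facts = [CoreFact(r["mrn"], "diagnosis", r["dx"], "problem_list")
             for r in problem_list]
    facts += [CoreFact(r["patient"], "lab", r["test"], "lab_system")
              for r in lab_system]
    return facts


def load(warehouse, facts):
    # The "warehouse" here is just a dict keyed by patient id.
    for fact in facts:
        warehouse.setdefault(fact.patient_id, []).append(fact)
    return warehouse


warehouse = load({}, transform(*extract()))
```

Each additional source system would add one extract step and one mapping in `transform`, while the warehouse keeps a single record shape.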

Registries, intelligently implemented, are good
[Figure: one column per registry: Bone, Blood, Brain, Colon, Pancreas, Prostate, Liver]
Suite of Cancer Registries
Synergistically wrought
Computationally populated

Mayo’s Registry Design and Deployment
strategic, consistent, supportable, useful
[Figure: suites of registries by domain: Cancer, OB-Gyn, Urologic, Quality, Pulmonary, Neuro, Heart]

Registries connected to scalable “Blue” space
Efficiently get consistent core data to registries
[Figure: suites of registries by domain: Cancer, OB-Gyn, Urologic, Quality, Pulmonary, Neuro, Heart. Label: Core from Warehouse]

Registries can inform the Warehouse about their site specific or human annotated data
[Figure: suites of registries by domain: Cancer, OB-Gyn, Urologic, Quality, Pulmonary, Neuro, Heart. Label: Warehouse the Extensions]

Registries may evolve into non-transactional, federated resources → “marts” of their own
[Figure: suites of registries by domain: Cancer, OB-Gyn, Urologic, Quality, Pulmonary, Neuro, Heart, each labeled UI]

Recommendation #2
Support Cloneable Registry Infrastructure
• Metaphor of registries as “kits”
• Just add your specialized data
• Interfaces to “Core” data as commodity
• Federation to Network or Grid of Registries
• Warehouse as coordinating core – initially
• Retrieval Tools scalable across registries
• Emergence of standard registry “Mart”
• Common interfaces across federated data
• Common security and auditing infrastructure
• Establish cost-effective research resource base
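The "kit" metaphor can be sketched as a registry class whose core section is filled from the warehouse and whose specialized fields are declared per registry. This is an illustration under stated assumptions: the `Registry` class, its method names, and the sample warehouse dict are hypothetical. Only the core field list (demographics, labs, Dxs, Pxs) comes from the earlier slides.

```python
# Core fields shared by every registry, per the "core" registry data slides.
CORE_FIELDS = ("demographics", "labs", "dxs", "pxs")


class Registry:
    """A cloneable registry "kit": shared core plus declared specialized fields."""

    def __init__(self, name, specialized_fields):
        self.name = name
        self.specialized_fields = tuple(specialized_fields)
        self.rows = {}

    def populate_core(self, warehouse):
        # Automated data population: copy core fields from the warehouse.
        for patient_id, record in warehouse.items():
            row = self.rows.setdefault(patient_id, {})
            for field in CORE_FIELDS:
                row[field] = record.get(field)

    def annotate(self, patient_id, field, value):
        # Human annotation is limited to the fields this registry declared.
        if field not in self.specialized_fields:
            raise KeyError(f"{field!r} is not a specialized field of {self.name}")
        self.rows.setdefault(patient_id, {})[field] = value


# Hypothetical warehouse content for one patient.
warehouse = {"P1": {"demographics": {"sex": "F"}, "labs": [], "dxs": ["C50.9"], "pxs": []}}

breast = Registry("breast", specialized_fields=["tumor_grade"])
breast.populate_core(warehouse)
breast.annotate("P1", "tumor_grade", "2")

# "Cloning" the kit for another disease only changes the specialized fields.
colon = Registry("colon", specialized_fields=["polyp_count"])
colon.populate_core(warehouse)
```

Both registries receive identical core rows, so retrieval tools written against the core fields work across them.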

Data Standards
Presume an accepted premise…
• Comparable and consistent data is crucial for pooling data and meta-analyses
• Data generated in standard form is more interoperable
• Data interchange outside the institution
  • Patient – Personal Health Records
  • Referring physicians
  • NHIN – RHIO (within One Mayo)
  • NIH collaboration, Big Science
    • caBIG, PharmGKB, CTSA - NCRR

The Genomic Era
• The genomic transformation of medicine far exceeds the introduction of antibiotics and aseptic surgery
• The binding of genomic biology and clinical medicine will accelerate
• The implications for shared semantics across the basic science and clinical communities are unprecedented
• The implications for Public Health surveillance and inference are profound

The Continuum Of Biomedical Informatics
Bioinformatics meets Medical Informatics
[Figure: scale from 0 to 10 running from Biology to Medicine, with the labels Genomics; Proteomics; Metabolic Pathways; Molecular Modeling; Molecular Simulation; Cellular Models; Molecular Assays; Genomic Testing; Biospecimens; Lab Data; Trials Data; Disease & Syndrome; Medical Imaging; Patient Record Data; EHR Structures]
Chasm of Semantic Despair

Recommendation #3
Ensure data governance adopts standards
• Process well on track in Data Trust efforts
• Must synergize research and clinical agendas
• Recognize this is all the same problem
• Mayo cannot sustain costs of idiosyncratic data
  • Maintenance of “one of a kind”
  • Opportunity costs in collaboration and interchange
  • Cannot leverage vision of patient “repositories”
  • Divergence: Warehouse, registries, data-marts...
• Efficient Phenotype-Genotype discovery

Semantic Metadata
Annotations for Applications
• Data Organization
• Data Annotation
• Ontology driven architectures
• Semantic interoperability
• Data retrieval Analytic subsets
  • The right data for the problem
• Knowledge enabled analyses
  • The right analyses for the data

Content vs. Structure
Semantic implications for applications
[Figure, adapted from Rossi-Mori: two Equivalent representations]
Terminologic Model: Family History of Breast Cancer; Family History of Heart Disease; Family History of Stroke
Information Model: Family History, with Content: Breast Cancer; Heart Disease; Stroke
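A small sketch of why the two representations on this slide are equivalent: a single term such as "Family History of Breast Cancer" and an information-model entry (a Family History section holding the content Breast Cancer) can both be reduced to the same (context, content) pair. The function names, the dict layout, and the string-prefix rule are assumptions for illustration. Real terminologies need far more than prefix matching.

```python
PREFIX = "Family History of "


def from_terminology(term):
    """Split a pre-coordinated term into (context, content)."""
    if not term.startswith(PREFIX):
        raise ValueError(f"not a family-history term: {term!r}")
    return ("Family History", term[len(PREFIX):])


def from_information_model(entry):
    """Read (context, content) from a structured record entry."""
    return (entry["section"], entry["content"])


terms = [
    "Family History of Breast Cancer",
    "Family History of Heart Disease",
    "Family History of Stroke",
]
entries = [
    {"section": "Family History", "content": "Breast Cancer"},
    {"section": "Family History", "content": "Heart Disease"},
    {"section": "Family History", "content": "Stroke"},
]

normalized_terms = [from_terminology(t) for t in terms]
normalized_entries = [from_information_model(e) for e in entries]
```

An application that compares only the raw strings would treat the two sources as unrelated. Comparing the normalized pairs shows that they carry the same meaning.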

Aggregation Logics by domain
rule-based aggregations
[Figure labels: Decision Support and Error Detection; Public Health and Surveillance; Reimbursement and Management; Outcome Research and Epidemiology; Findings; Interventions; Events]

Data organization
• Health IT Standards
  • Syntactic – XML, HL7, database schema
  • Semantic – OWL, HL7, ontologies
  • Interoperable
• Self Describing Objects
  • Data storage technology (NCRR grant)
• Annotation Metadata
  • Provenance
  • Data management
  • Content, Context, Structure, Semantics
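As a rough illustration of a self-describing object, the sketch below attaches provenance plus content, context, structure, and semantics annotations to a single value and round-trips it through JSON. The `AnnotatedDatum` type, its field names, and the sample values are assumptions made for this example. The slide does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnnotatedDatum:
    """A value that carries its own annotation metadata (hypothetical layout)."""
    value: str
    content: str      # what kind of thing the value is
    context: str      # where in the record it was asserted
    structure: str    # how the value is encoded
    semantics: str    # which code system or ontology gives it meaning
    provenance: str   # which system or process produced it


datum = AnnotatedDatum(
    value="C50.9",
    content="diagnosis code",
    context="problem list entry",
    structure="coded string",
    semantics="ICD-10 (assumed code system for this example)",
    provenance="problem_list extract (hypothetical source)",
)


def to_self_describing(d):
    # Serialize the value together with its metadata.
    return json.dumps(asdict(d), sort_keys=True)


serialized = to_self_describing(datum)
restored = AnnotatedDatum(**json.loads(serialized))
```

Because the metadata travels with the value, a downstream mart can decide whether the datum is the right data for its question without consulting the source system.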

Recommendation #4
Annotate stuff with semantic metadata
• Know what data you have
• Extract, transform, load (ETL) the right stuff
• Craft data marts (registries) from the right pieces
• Use the right data for the question
• Ensure data integrity, reliability, consistency

Mayo-IBM Semantic Annotation Project
• Collaboration on application of semantic annotation to clinical and genomic data
• Apply UIMA tools
• Leverage LexGrid standards
• De facto evolution of distributed/SOA in Mayo research
• Demonstrate transformation of information model to interpretable semantics
• HL7 TermInfo; NCI caBIG; ONC NHIN (HITSP, HL7, NLM)
• Evolve warehouse collaboration to support Semantics
  • Data Navigation
  • Context Specification
  • Query Formation
  • Intelligent Analytic pipelining

Recommendation #5
Balance data investment with Analytics
• Data all dressed up with nowhere to go
• Analytics poses considerable challenge
  • De facto bottleneck in many areas
  • Ideal methods not always obvious
• Collaborative opportunity with many partners
  • IBM, TGen, Workflow partners

The Lexical Grid
(LexGrid)

The Lexical Grid
Definition
• The LexGrid package represents a comprehensive set of software and services to load, publish, and access vocabulary or ontological resources.
• Provides a single information model
• Published online, cross-linked, and indexed on demand
• Provides standardized building blocks and tools
• Supports large-scale vocabulary adoption and use

LexGrid Conceptual Architecture
[Figure labels: LexGrid Node; Data Services; Java; .NET; Import; Export; Index; Embed; Browse and Edit; Editors; Browsers; Query Tools; Clients; Lex*Web; LexBIG; CTS; formats RRF, OBO, OWL, XML, Text, Protégé]

Common Terminology Services (CTS)
• An HL7 ANSI standard
• Defines the minimum set of requirements for interoperability across disparate healthcare applications
• A specification for accessing terminology content
• The CTS identifies the minimum set of functional characteristics a terminology resource must possess for use in HL7.
• A functional model
• Defining the functional characteristics of vocabulary as a set of Application Programming Interfaces (APIs)
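The idea of a functional model expressed as APIs can be sketched with an abstract interface that any terminology resource must implement. The two method names below are hypothetical stand-ins and are not the operation names defined by the HL7 CTS specification. The in-memory implementation and its "demo" code system are likewise invented for illustration.

```python
from abc import ABC, abstractmethod


class TerminologyService(ABC):
    """Hypothetical minimal contract a terminology resource must satisfy."""

    @abstractmethod
    def is_valid_code(self, code_system: str, code: str) -> bool:
        """Return True if the code exists in the named code system."""

    @abstractmethod
    def lookup_designation(self, code_system: str, code: str) -> str:
        """Return the human-readable name for a code."""


class InMemoryTerminologyService(TerminologyService):
    """Toy implementation backed by nested dicts."""

    def __init__(self, content):
        self._content = content

    def is_valid_code(self, code_system, code):
        return code in self._content.get(code_system, {})

    def lookup_designation(self, code_system, code):
        return self._content[code_system][code]


# "demo" is an invented code system, not a real vocabulary.
service = InMemoryTerminologyService(
    {"demo": {"D1": "Breast Cancer", "D2": "Stroke"}}
)
```

Client applications written against `TerminologyService` do not need to know whether the content sits in memory, in a database, or behind a remote node.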

LexGrid for caBIG
(LexBIG)
• Software development for NCI and Cancer Biomedical Informatics Grid (caBIG)
• Focused on common model and API
• Use Cases and requirements demanded a rich API
• Coordinated infrastructure for Cancer Research
• Clinical Trials, Integrative Cancer Research, Tissue Banking and Pathology Tools
• Vocabulary, Common Data Elements, Architecture

NCBO LexBIO
National Center for Biomedical Ontology
• Technology and methods allowing scientists to create, disseminate, and manage biomedical information
• Infrastructure for Biomedical Ontologies and Biomedical Data (LexBIO)
• Machine-able Knowledge Representation
• LexGrid – Spanning the Chasm of Semantic Despair

LexPHIN
CDC Public Health Information Network
• Adoption of the LexGrid Model
• Replace PHIN Vocabulary Services (VS)
• Addresses genomic characterization of disease
• Span semantic chasm with Gene Ontology
• Organized Value Sets
• Outbreak Management System
• Biosurveillance and BioSense aggregation

Health Level Seven (HL7)
• Vocabulary and value domain management
• Tooling for vocabulary submissions
• Includes change events for HL7 governance process

LexGrid Applications to
Semantic Annotation and Integration
• Basis for NLP (Natural Language Processing) entity annotation – clinical notes
• Harmonize data elements, value sets
  • Getting the data right
• Information retrieval and navigation
  • Getting the right data
• Grounding for data governance
• Foundation for semantic interoperability
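Harmonizing a data element against a value set can be sketched as a lookup from each registry's local spellings to one agreed set of codes, with anything unmapped set aside for human curation. The value set, the local mapping table, and the sample inputs below are invented for illustration and do not reflect LexGrid content or its API.

```python
# Hypothetical standard value set for a "sex" data element.
VALUE_SET = {"M": "Male", "F": "Female", "U": "Unknown"}

# Hypothetical local spellings seen across registries, mapped to the value set.
LOCAL_TO_STANDARD = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "2": "F",
}


def harmonize(values):
    """Return (harmonized codes, raw values that could not be mapped)."""
    harmonized, unmapped = [], []
    for raw in values:
        code = LOCAL_TO_STANDARD.get(str(raw).strip().lower())
        if code is None:
            unmapped.append(raw)   # left for human curation, not guessed
        else:
            harmonized.append(code)
    return harmonized, unmapped


harmonized, unmapped = harmonize(["Male", "F", 2, "n/a"])
```

Keeping unmapped values visible, rather than forcing them to "U", preserves the distinction between data that is unknown and data that is merely unharmonized.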

Conclusions – Data Integration
• Heterogeneous data breeds idiosyncratic structure
• Data element alignment is not sufficient
• Semantics derive from information models and content
• Semantic annotation of data is crucial
• Tools such as LexGrid must be applied

Resources
LexGrid Project
http://informatics.mayo.edu/LexGrid
LexBIG Forge Site
http://gforge.nci.nih.gov/projects/lexbig
caBIG LexGrid CVS
http://cabigcvs.nci.nih.gov/viewcvs.cgi/lexgrid
NCBO Project
http://www.bioontology.org