Systems Architecture & Big data management - rhis.net.bd · Systems Architecture & Big data...
E-Health Data Standardizations
Systems Architecture &
Big data management
Javed Mostafa, MA, PhD
McColl Distinguished Term Professor
Carolina Health Informatics Program
The University of North Carolina at Chapel Hill
December 12th, 2017
Seminar Outline
Part 1: Electronic Health Record System
• Why Data Standardization?
• System Components
• Integrating Data and Data Services
Part 2: Big Data Management & Analytics
• Big Data in Healthcare: Data Management Principles
• Data Analytics
Why EHR?
EHR promises to:
Reduce error and improve safety
Improve organizational and service efficiency
Particularly over large, distributed services
Why Data Standardization?
In EHR systems a large number of data
standards are utilized, for:
1) Transactions across diverse systems
2) Organization and storage
3) Data manipulation and access
4) Reporting and presentation
5) Quality improvement and safety
6) Reimbursement and payment
Major Components of EHR
Over a long period of system development,
conducted in isolation, several different
systems have been created to serve different
purposes
● Clinical Documentation
● Nursing Documentation
● Laboratory
● Pharmacy
● Computerized Physician Order Entry (CPOE)
System Architecture
https://www.slideshare.net/MegasChara/course-7-unit-1-introduction-overview-components-of-hit-systems
Key Areas of Data
● Registration & Admin data + meta data (MD)
● Clinical data + MD
● Lab data + MD
● Nursing data + MD
● Radiology data + MD
● Pharmacy data + MD
● Care coordination data + MD
All data above are assembled to create the EHR
for a single patient
RADT: Registration & Admission I
These data include vital information for
accurate patient identification and
assessment, including, but not necessarily
limited to, name, demographics, next of kin,
employer information, chief complaint, patient
disposition, etc.
RADT: Registration & Admission II
The registration portion of an EHR contains a
unique patient identifier, usually consisting of a
numeric or alphanumeric sequence that is
unidentifiable outside the organization or
institution in which it serves.
RADT data allows an individual’s health
information to be aggregated for use in
clinical analysis and research.
LIS: Laboratory Information System I
Laboratory information systems (LIS) are
used as hubs to integrate orders, results from
laboratory instruments, schedules, billing, and
other administrative information.
Laboratory data is only infrequently integrated
fully with the EHR.
LIS: Laboratory Information System II
Even when the LIS is made by the same vendor
as the EHR, many machines and analyzers are
used in the diagnostic laboratory process that
are not easily integrated within the EHR.
For example, the Cerner LIS interfaces over 400
different laboratory instruments. Cerner, a
major vendor of both LIS and EHR systems,
reported that 60 percent of its LIS installations
were standalone (i.e., not integrated).
RIS: Radiology Information System
Radiology information systems (RIS) are used
by radiology departments to tie together
patient radiology data (e.g., orders,
interpretations, patient identification
information) and images
The typical RIS will include patient tracking,
scheduling, results reporting, and image
tracking functions
A RIS is usually used in conjunction with a
picture archiving and communication system
(PACS), which manages digital radiography
studies
PIS: Pharmacy Information System
Inventory management for drugs/medications;
often includes drug order fulfillment robots and
payer formularies
When in-house, based in the hospital or the
provider setting, the system is linked to
computerized order entry
Frequently, the system exists outside the provider
setting and hence is not directly integrated
CPOE: Computerized Physician Order Entry
Computerized physician order entry (CPOE)
permits clinical providers to electronically order
laboratory, pharmacy, and radiology services.
CPOE systems offer a range of functionality,
from pharmacy ordering capabilities alone to
more sophisticated systems such as complete
ancillary service ordering, alerting, customized
order sets, and result reporting
CDS: Clinical Documentation System
Physician, nurse, and other clinician notes
Flow sheets (vital signs, input and output, problem lists,
MARs)
Discharge summaries
Transcription document management
Medical records abstracts
Advance directives or living wills
Durable powers of attorney for healthcare decisions
Consents (procedural)
Big Picture …
Many more systems
and components exist
… associated with the
EHR Platform
Data Integration into CDR (Clinical Data
Repository)
https://www.slideshare.net/MegasChara/course-7-unit-1-introduction-overview-components-of-hit-systems
CDR: Data Access and Presentation
Data aggregated from diverse systems
and assembled for access and
presentation on demand
Query: Find patients with CHF (congestive
heart failure) who are not taking an ACE
(angiotensin-converting-enzyme) inhibitor
Would not be possible without
integration, as the billing system
holds the diagnosis code and the
pharmacy system holds the
medication profile
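The query above can be sketched against a toy integrated repository. This is a minimal illustration only: the table names `billing_diagnosis` and `pharmacy_medication`, their columns, and the patient rows are hypothetical, and matching CHF by the ICD-10-CM prefix `I50` is an assumption about how diagnoses are coded.

```python
import sqlite3

# Hypothetical, simplified stand-ins for two source systems feeding a CDR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing_diagnosis (patient_id TEXT, dx_code TEXT);
    CREATE TABLE pharmacy_medication (patient_id TEXT, drug_name TEXT, drug_class TEXT);

    INSERT INTO billing_diagnosis VALUES
        ('P001', 'I50.9'),   -- heart failure, unspecified (ICD-10-CM)
        ('P002', 'I50.9'),
        ('P003', 'E11.9');   -- type 2 diabetes without complications
    INSERT INTO pharmacy_medication VALUES
        ('P001', 'lisinopril', 'ACE inhibitor'),
        ('P002', 'furosemide', 'loop diuretic'),
        ('P003', 'metformin', 'biguanide');
""")

def chf_without_ace(connection):
    """Return IDs of patients with a CHF diagnosis and no ACE inhibitor on file."""
    rows = connection.execute("""
        SELECT DISTINCT d.patient_id
        FROM billing_diagnosis AS d
        WHERE d.dx_code LIKE 'I50%'
          AND NOT EXISTS (
              SELECT 1 FROM pharmacy_medication AS m
              WHERE m.patient_id = d.patient_id
                AND m.drug_class = 'ACE inhibitor')
        ORDER BY d.patient_id
    """).fetchall()
    return [r[0] for r in rows]

print(chf_without_ace(conn))  # ['P002']
```

The diagnosis comes from one table and the medication profile from another, so the question can only be answered once both are reachable through a shared patient identifier.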
IDN: Integrated Delivery Network
An Integrated Delivery Network (IDN) is an
infrastructure typically used to assemble data
into a single repository called the Clinical Data
Repository (CDR)
Several challenges need to be resolved:
Data standards in and across components
Interface or Application Programming Interfaces
(APIs)
Reconciling duplication and redundancies
Best-of-Breed Architecture
[Diagram labels: RADT, LAB, PHARM, RADIOLOGY, CPOE, CLINICAL DOCS, NURSING DOCS; INTERFACES; Clinical Data Dictionary; Clinical Data Repository]
Unified Database
[Diagram labels: CLINICAL DOCS, NURSING DOCS, LAB, PHARM, CPOE, RADT, RADIOLOGY; Clinical Data Repository]
Which Architecture to use?
Best-of-breed offers the maximum flexibility of choosing the best systems for specific departments / applications
Requires interfaces for each
Back-up / recovery is difficult
Unified database option demands dealing with a single vendor for all major components
Introduces less ambiguity / unpredictability, hence ensures higher availability
Hybrid option is more common
Should it be onsite or cloud (ASP)?
Most EHR vendors require installation of
client-server type systems, for which expertise is
needed to support server maintenance and
customizing/modifying client interfaces as
applications demand
Application Service Provider (ASP) approach
removes the necessity of setting up and
maintaining the server environment, as the
vendor takes responsibility for this aspect
Customization is still needed on the user end to
match local needs
Tea Break!
Next: Big Data
Management &
Analytics
Data & Insights
Volume of data production poses difficult challenges
• Clinical decisions
• Evidence-based medicine
• Comparative effectiveness
Analytics is a key functional component to support
• Visualization
• Data summarization
• Recommendations
• Online, human-machine driven decision making
A basic example of analytics technique
Data Production in General
According to International Data Corp. (IDC), data production is doubling every two years
Data production in the sciences as well as in clinical operational settings is very intensive and growing
Units of Data
Unit             Value          Size in bytes
bit (b)          0 or 1         1/8 of a byte
byte (B)         8 bits         1 byte
kilobyte (KB)    1000^1 bytes   1,000 bytes
megabyte (MB)    1000^2 bytes   1,000,000 bytes
gigabyte (GB)    1000^3 bytes   1,000,000,000 bytes
terabyte (TB)    1000^4 bytes   1,000,000,000,000 bytes
petabyte (PB)    1000^5 bytes   1,000,000,000,000,000 bytes
exabyte (EB)     1000^6 bytes   1,000,000,000,000,000,000 bytes
zettabyte (ZB)   1000^7 bytes   1,000,000,000,000,000,000,000 bytes
yottabyte (YB)   1000^8 bytes   1,000,000,000,000,000,000,000,000 bytes
http://techterms.com/help/data_storage_units_of_measurement
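The decimal units in the table above follow one rule: each step up multiplies by 1000. A minimal sketch of that rule follows; the helper names `to_bytes` and `convert` are illustrative, not from any library.

```python
# Decimal (SI) storage units, matching the table above: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (a power of 1000)."""
    return value * 1000 ** UNITS.index(unit)

def convert(value, from_unit, to_unit):
    """Convert between any two units listed in UNITS."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)

print(to_bytes(1, "PB"))        # 1000000000000000
print(convert(44, "PB", "TB"))  # 44000.0
```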
Big Data in Health Care
• Kaiser Permanente, the California-based health network with more than 9 million members, is estimated to have between 26.5 petabytes and 44 petabytes of patient data under management from electronic health record (EHR) data alone, including images and annotations. This is roughly the amount of information contained in 4,400 Libraries of Congress.
• U.S. health care data alone reached 150 exabytes in 2011. Five exabytes (one exabyte is 10^18 bytes) of data would contain all the words ever spoken by human beings on earth. At this rate, big data for U.S. health care will soon reach zettabyte (10^21 bytes) scale and even yottabyte (10^24 bytes) scale not long after.
According to an Institute for Health Technology Transformation report, 2015
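As a rough worked example of "soon", the sketch below combines the 150-exabyte figure for 2011 with the doubling-every-two-years rate quoted earlier from IDC. Applying a general doubling rate to U.S. health care data is an assumption made here for illustration only, not a claim from either source.

```python
import math

def years_to_reach(start, target, doubling_years=2.0):
    """Years for `start` to grow to `target` if it doubles every `doubling_years`."""
    return doubling_years * math.log2(target / start)

# 150 EB (the 2011 figure) growing to 1 ZB, which is 1000 EB.
years = years_to_reach(150, 1000)
print(round(years, 1))  # 5.5
```

Under that assumed rate, 150 exabytes would pass one zettabyte roughly five and a half years after 2011.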
Advances in technology are creating an explosion of data across all industries
Variety of Information
▪ 80% of new data growth is unstructured content
▪ Emails, images, audio, video, etc.
Volume of Digital Data
▪ Machine-generated data: sensors, RFID, etc.
Velocity of Decision Making
▪ Rapidly changing business climate
▪ Need to get ahead of the curve: predict issues
and fix them
New Data New Information!
Enterprise Data Warehouse: Spencer and Merkel (2010), HIMSS 2010, Session 68.
Health Analytics Objective: Inquiry Centered Analytics Environment
[Figure: Health Analytics at the center of three areas: Administrative, Research, and Clinical. Example questions shown: "Can I access my lab test results?", "Can I reduce the cost of care?", "Can I aggregate data to ID public health risks?", "Can I provide safer care?"]
Challenges to the expert or experience-based practice
[Figure: Facts per Decision plotted over time (1990, 2000, 2010, 2020) against Human Cognitive Capacity. The Analytics Requirement rises through: Decisions by Clinical Symptoms; Diagnostic Imaging: Functional and Anatomical; Genetics; Proteomics and other effector molecules (1M proteins); Decisions for patients with multiple conditions. Annotations: Gene Sequencing Data, Cost $1K, Volume 10PB/Yr, 2012. Inset bar chart: US CT/day, with values 1500, 16000, and 80000 on an axis from 0 to 90,000.]
Five Focus Areas for Healthcare Analytics
Patient/Member Analytics
Quality of Care Analytics
Incentive Analytics
Wellness & Chronic Disease Analytics
Operational Efficiency
Data Governance
Data Storage & Representation Standards
Relational model vs. multidimensional data
Data warehouse data model and applications
Secondary analysis of EHR data
Relational Structure
Relation is a term which comes from mathematics and represents a simple two-dimensional table. Representation based on logical associations only! No pointers …
Relation = Table
Patient_ID Visit_Date Branch
Table Creation
Schema Generation
Normalization
Elimination of anomalies and redundancies
Normal forms by decomposition
Table Manipulation
Structured Query Language (SQL)
Select, Insert, Update, Delete, etc.
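The four manipulation statements can be tried on the example relation using Python's built-in `sqlite3` module. This is a minimal sketch: the table name `visit`, the column types, the composite key, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table creation: column names follow the slide; types and key are assumptions.
cur.execute("""
    CREATE TABLE visit (
        Patient_ID TEXT NOT NULL,
        Visit_Date TEXT NOT NULL,   -- ISO 8601 date string
        Branch     TEXT NOT NULL,
        PRIMARY KEY (Patient_ID, Visit_Date)
    )
""")

# Insert two hypothetical visits.
cur.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [("P123", "2017-12-01", "North"), ("P124", "2017-12-02", "South")],
)
# Update one row.
cur.execute("UPDATE visit SET Branch = ? WHERE Patient_ID = ?", ("East", "P124"))
# Delete the other row.
cur.execute("DELETE FROM visit WHERE Patient_ID = ?", ("P123",))
# Select what remains.
remaining = cur.execute("SELECT Patient_ID, Visit_Date, Branch FROM visit").fetchall()
print(remaining)  # [('P124', '2017-12-02', 'East')]
```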
Using SQL: Schema Generation
CREATE SCHEMA s_name AUTHORIZATION owner_name
  domain definition
  table definition
  view definition, etc.
Schema owner can grant access to tables, columns, and views
Table without Normalization
Last_Name   DOB         Medication1     Medication2
Faraday     1/1/1960    Acetaminophen   Cough Syrup
Thomas      7/5/1975    Cimetidine      Ibuprofen
Nemo        31/7/1966   Acetaminophen   Aspirin
Coulomb     12/2/1980   Advil           Saline
Nemo        31/7/1966   Lexapro         Prozac
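One way to remove the repeating medication columns and the duplicated patient row is to split the table in two, as sketched below. The rows are transcribed from the table above, reading the garbled date as 31/7/1966. The surrogate `patient_id` and the rule that the same name and date of birth identify the same patient are assumptions for illustration.

```python
# Rows from the table above: (Last_Name, DOB, Medication1, Medication2).
wide_rows = [
    ("Faraday", "1/1/1960", "Acetaminophen", "Cough Syrup"),
    ("Thomas", "7/5/1975", "Cimetidine", "Ibuprofen"),
    ("Nemo", "31/7/1966", "Acetaminophen", "Aspirin"),
    ("Coulomb", "12/2/1980", "Advil", "Saline"),
    ("Nemo", "31/7/1966", "Lexapro", "Prozac"),
]

def split_repeating_group(rows):
    """Split wide rows into a patient table and a one-medication-per-row table."""
    patients = {}      # (last_name, dob) -> surrogate patient id
    medications = []   # (patient_id, medication)
    for last_name, dob, *meds in rows:
        key = (last_name, dob)
        # Assumption: same name and DOB means the same patient.
        patient_id = patients.setdefault(key, len(patients) + 1)
        for med in meds:
            medications.append((patient_id, med))
    patient_table = [(pid, name, dob) for (name, dob), pid in patients.items()]
    return patient_table, medications

patient_table, medications = split_repeating_group(wide_rows)
print(len(patient_table), len(medications))  # 4 10
```

Each patient now appears once, and a third or fourth medication becomes another row rather than another column.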
Improved Table …
Nurse_ID   Nurse_Role      Bonus_Rate
1235       Surgery         3.5
1412       Critical_Care   3
1311       ED              3.5
Functional Dependencies (FD)
Nurse_ID -> Nurse_Role
Nurse_ID -> Bonus_Rate
Nurse_Role -> Bonus_Rate
Normalized Table
Nurse_ID Nurse_Role
1235 Surgery
1412 Critical_Care
1311 ED
Nurse_Role Bonus_Rate
Surgery 3.5
Critical_Care 3
ED 3.5
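The decomposition can be checked mechanically. The sketch below tests whether a functional dependency holds in the sample rows, projects the table into the two normalized tables, and rejoins them to confirm nothing is lost. The helper names `holds` and `project` are illustrative, and a dependency holding in three sample rows is evidence for it, not proof.

```python
rows = [
    {"Nurse_ID": 1235, "Nurse_Role": "Surgery", "Bonus_Rate": 3.5},
    {"Nurse_ID": 1412, "Nurse_Role": "Critical_Care", "Bonus_Rate": 3},
    {"Nurse_ID": 1311, "Nurse_Role": "ED", "Bonus_Rate": 3.5},
]

def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def project(rows, columns):
    """Project onto columns, dropping duplicate tuples but keeping order."""
    return list(dict.fromkeys(tuple(row[c] for c in columns) for row in rows))

nurse_role = project(rows, ["Nurse_ID", "Nurse_Role"])
role_bonus = project(rows, ["Nurse_Role", "Bonus_Rate"])

# Joining on Nurse_Role reconstructs the original table (lossless decomposition).
bonus_by_role = dict(role_bonus)
rejoined = [(nid, role, bonus_by_role[role]) for nid, role in nurse_role]

print(holds(rows, "Nurse_Role", "Bonus_Rate"))  # True
print(holds(rows, "Bonus_Rate", "Nurse_Role"))  # False
```

Because Bonus_Rate depends on Nurse_Role rather than directly on the nurse, storing it once per role means a rate change touches a single row.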
Components of the Relational Model
Attributes
• Domain
• Degree of relation
Tuples
Keys
• Primary
• Foreign
Relational Model Trade-offs
Advantages
• Easy to express associations among tuples/records
• Easy to manipulate
Disadvantages
• Hard to express multi-dimensional relationships
Multi-dimensional Relational Structure
Example of Two-Dimensional vs. Multi-Dimensional
[Figure: Two Dimensional Model. A table with PRODUCT rows (P123, P124, P125, P126, …) and REGION columns (REG1, REG2, REG3).]
[Figure: Three dimensional data cube. Axes: Product (P123, P124, P125, P126, …), Region (Reg 1, Reg 2, Reg 3), and Fiscal Quarter (Qtr 1, Qtr 2, Qtr 3, Qtr 4).]
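The relationship between the cube and the two-dimensional table can be sketched with a plain dictionary keyed by (product, region, quarter). The cell values are invented for illustration, and `roll_up` and `slice_cube` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, quarter) -> units sold.
cube = {
    ("P123", "Reg 1", "Qtr 1"): 10,
    ("P123", "Reg 2", "Qtr 1"): 4,
    ("P123", "Reg 1", "Qtr 2"): 7,
    ("P124", "Reg 1", "Qtr 1"): 3,
    ("P124", "Reg 3", "Qtr 4"): 8,
}
DIMENSIONS = ("product", "region", "quarter")

def roll_up(cube, keep):
    """Aggregate the cube down to the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)

def slice_cube(cube, dimension, member):
    """Fix one dimension to a single member, keeping the matching cells."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == member}

# Summing over quarters gives the two-dimensional product-by-region table.
two_d = roll_up(cube, ("product", "region"))
print(two_d[("P123", "Reg 1")])  # 17
```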
Multi-dimensional Schemas
• Multi-dimensional schemas are specified using:
– Dimension table
  • It consists of tuples of attributes of the dimension.
– Fact table
  • Each tuple is a recorded fact. This fact contains some measured or observed variable(s) and identifies it with pointers to dimension tables. The fact table contains the data and the dimensions associated with the data
Multi-dimensional Schemas
• Star schema:
– Consists of a fact table with a single table for each dimension
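A minimal star schema can be sketched with Python's built-in `sqlite3` module, reusing the product, region, and quarter dimensions from the cube example. All table names, column names, and values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension (names and columns are illustrative).
    CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_region  (region_id  TEXT PRIMARY KEY, region_name  TEXT);
    CREATE TABLE dim_quarter (quarter_id TEXT PRIMARY KEY, fiscal_year  INTEGER);

    -- Fact table: one measured value plus a key into each dimension table.
    CREATE TABLE fact_sales (
        product_id TEXT REFERENCES dim_product(product_id),
        region_id  TEXT REFERENCES dim_region(region_id),
        quarter_id TEXT REFERENCES dim_quarter(quarter_id),
        units_sold INTEGER
    );

    INSERT INTO dim_product VALUES ('P123', 'Product 123'), ('P124', 'Product 124');
    INSERT INTO dim_region  VALUES ('REG1', 'Region 1'), ('REG2', 'Region 2');
    INSERT INTO dim_quarter VALUES ('Q1', 2017), ('Q2', 2017);
    INSERT INTO fact_sales VALUES
        ('P123', 'REG1', 'Q1', 10),
        ('P123', 'REG2', 'Q1', 4),
        ('P124', 'REG1', 'Q2', 3);
""")

def units_by_region(connection):
    """Join the fact table to one dimension and aggregate the measure."""
    return connection.execute("""
        SELECT r.region_name, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.region_name
        ORDER BY r.region_name
    """).fetchall()

print(units_by_region(conn))  # [('Region 1', 13), ('Region 2', 4)]
```

The fact table sits at the center and every query reaches a dimension in a single join, which is what gives the schema its star shape.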
Data Warehouse: Multidimensional
Representation
Comparison with Traditional Databases
• Data warehouses are mainly optimized for appropriate retrospective data access
– Traditional databases are transactional and optimized for “real-time” access
• Data warehouses emphasize historical data, as their main purpose is to support time-series and trend analysis
• Compared with transactional databases, data warehouses are nonvolatile
• In transactional databases, transactions usually change records in the database. By contrast, information in a data warehouse is relatively coarse grained and the refresh policy is carefully chosen, usually incremental
CDW: Clinical Data Warehouse Architecture
[Diagram: CDR and DSS sources, Stage, ADS, and Diabetes and Inpatient targets, connected by ETL steps numbered 1 to 5]
CDR to Stage: 93 ETL Jobs
DSS to Stage: 98 ETL Jobs
Stage to ADS: 175 ETL Jobs
ADS to IDM: 33 ETL Jobs
ADS to DDM: 17 ETL Jobs
18 – 20 hours from source data to application
DSS: Decision Support; CDR: Clinical Data Repository; ETL: Extract Transform Load; ADS: Atomic Data Store
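Each ETL job in such a pipeline has the same three-step shape. The sketch below shows that shape with an incremental refresh, where only rows changed since the last load are processed. The field names (`pid`, `glucose`, `updated`), the sample rows, and the dictionary standing in for the target store are all hypothetical.

```python
from datetime import date

# Hypothetical source rows; field names and values are invented for illustration.
source = [
    {"pid": " p001 ", "glucose": "110", "updated": date(2017, 12, 1)},
    {"pid": "p002", "glucose": "95.5", "updated": date(2017, 12, 10)},
]

def extract(rows, since):
    """Extract: keep only rows changed after the last load (incremental refresh)."""
    return [r for r in rows if r["updated"] > since]

def transform(row):
    """Transform: standardize identifiers and convert values to numeric types."""
    return {
        "patient_id": row["pid"].strip().upper(),
        "glucose_mg_dl": float(row["glucose"]),
        "updated": row["updated"],
    }

def load(target, rows):
    """Load: insert or overwrite each row in the target, keyed by patient_id."""
    for row in rows:
        target[row["patient_id"]] = row
    return target

def run_etl(rows, target, since):
    """Run one extract, transform, load pass."""
    staged = [transform(r) for r in extract(rows, since)]
    return load(target, staged)

warehouse = run_etl(source, {}, since=date(2017, 12, 5))
print(list(warehouse))  # ['P002']
```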
CDW Data: Subjects
Account, Allergy, Ambulatory Claim, Charge, Contact Information, Core Measures, Diagnosis, Drug, Drug Order,
Health Maintenance, Immunizations, Labs, Medications, Observation, Order, Organization, Patient, Patient Infection,
Patient Readmission, Patient Visit, Provider, Payer, Payment, Problem, Procedure, Vital Signs
Notes and Reports include: Ancillary Reports, Cardiology Reports, Clinical Notes, ECG Reports, GI Reports
How Do We Use CDW?
Multiple ways
Scholarly Usage: Research Portal
Internal Quality and Performance Assessment
Standards review and reporting (accreditation and payor guidelines)
Research Portal: Cohort Discovery
CDW: Different Levels of Access
Level of Access: De-identified & Aggregated Data
Scope of Data: Must not contain any HIPAA defined data elements that may potentially reveal identity
Required Actions:
• No authorization needed, log into research portal “Guest” login

Level of Access: De-identified Data with Ad-hoc Reporting
Scope of Data: Must not contain any HIPAA defined data elements that may potentially reveal identity
Required Actions:
• Approval by CDW-H governance committee
• Signed CDW-H Data Access Agreement
• UNC IRB approval or waiver

Level of Access: Limited Data Set
Scope of Data: Largely de-identified PHI but may include some identifiers
Required Actions:
• Approval by CDW-H governance committee
• Signed CDW-H Data Access Agreement
• UNC IRB approval or waiver

Level of Access: Complete Data Set
Scope of Data: PHI that includes identifiers beyond the limited fields
Required Actions:
• Approval by CDW-H governance committee
• Signed CDW-H Data Access Agreement
• UNC IRB approval or waiver
• HIPAA Accounting of Disclosures
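The access levels above amount to a lookup from level to required actions. The sketch below encodes that lookup as it appears on the slide; the function name `outstanding_actions` is hypothetical, and the code says nothing about how the CDW-H actually enforces access.

```python
# Required actions per access level, transcribed from the table above.
COMMON = [
    "Approval by CDW-H governance committee",
    "Signed CDW-H Data Access Agreement",
    "UNC IRB approval or waiver",
]
REQUIRED_ACTIONS = {
    "De-identified & Aggregated Data": [],
    "De-identified Data with Ad-hoc Reporting": COMMON,
    "Limited Data Set": COMMON,
    "Complete Data Set": COMMON + ["HIPAA Accounting of Disclosures"],
}

def outstanding_actions(level, completed):
    """List the required actions for `level` that are not yet in `completed`."""
    return [a for a in REQUIRED_ACTIONS[level] if a not in completed]

print(outstanding_actions("Complete Data Set", set(COMMON)))
# ['HIPAA Accounting of Disclosures']
```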
Questions: Javed Mostafa, [email protected]
Please take a look at: Carolina Health Informatics Program, UNC, http://chip.unc.edu
Thank you!
This presentation was produced with the support of the United States
Agency for International Development (USAID) under the terms of MEASURE
Evaluation cooperative agreement AID-OAA-L-14-00004. MEASURE
Evaluation is implemented by the Carolina Population Center, University of
North Carolina at Chapel Hill in partnership with ICF International; John
Snow, Inc.; Management Sciences for Health; Palladium; and Tulane
University. Views expressed are not necessarily those of USAID or the United
States government.
www.measureevaluation.org