Enhancing Patient Outcomes with Big Data: Two Case Studies
Tuesday, March 1, 2016
David A. Friedenberg Ph.D., Principal Research Statistician, Battelle
Nancy McMillan Ph.D., Research Leader, Battelle
Conflict of Interest
David Friedenberg, Ph.D. and Nancy McMillan, Ph.D.
Salary: Salaried employees of Battelle
Agenda
• Introduction
• Using EHR data to predict acute kidney injuries (AKI)
• Analyzing intracortical brain data to reanimate a paralyzed limb
• Conclusion
• Q&A
Learning Objectives
• Demonstrate with real data how data from EHRs can be used to develop accurate disease prediction models
• Describe a system that bypasses a damaged spinal cord: large amounts of data collected from a cortical implant drive a muscle stimulation system, which moves a paralyzed limb under the control of the subject's thoughts
• Discuss the potential and some of the pitfalls of using big data to improve patient outcomes, using two real-world examples
Benefits Realized for the Value of Health IT
The value steps impacted were:
• Treatment/Clinical
• Electronic Secure Data
(http://www.himss.org/ValueSuite)
• 86%/77% sensitivity/specificity of 24-hr AKI prediction
• Movement possible: a paralyzed person can control hand movements with thoughts
Introduction and Methods Used
• Hypothesis – Inpatient AKI is predictable in advance based on electronic health record (EHR) data
• Goal - Predict AKI 24 hours in advance of its occurrence
• Approach – Conduct a retrospective analysis of hospital inpatients to develop a predictive model for identifying patients that are at-risk for AKI
• Monitoring Requirement – the 6-hour urinary output rate and the serum creatinine concentration difference from baseline are both available continuously for a six-hour period
• AKI Encounters – 878 adult, non-prisoner encounters meeting the AKI Network Level 2 or 3 criteria and satisfying the monitoring requirement for 6 or more hours immediately prior to the first AKI event
• Control Encounters – 5096 adult, non-prisoner encounters containing a period of 6 or more hours during which the monitoring requirement was continuously met
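Both cohort definitions hinge on whether the monitoring requirement held for at least 6 consecutive hours. A minimal sketch, assuming each encounter has been reduced to a hypothetical list of hourly flags (`monitored[h]` is True when both the 6-hour urinary output rate and the creatinine difference from baseline were available during hour `h`); the function names are illustrative, not the study's code.

```python
def longest_monitored_run(monitored):
    """Length (in hours) of the longest run of consecutive monitored hours."""
    best = current = 0
    for ok in monitored:
        current = current + 1 if ok else 0
        best = max(best, current)
    return best


def qualifies_as_control(monitored, min_hours=6):
    """Control encounters: a monitored run of >= min_hours anywhere in the encounter."""
    return longest_monitored_run(monitored) >= min_hours


def qualifies_as_aki(monitored, first_aki_hour, min_hours=6):
    """AKI encounters: the min_hours immediately before the first AKI hour must all be monitored."""
    if first_aki_hour < min_hours:
        return False
    return all(monitored[first_aki_hour - min_hours:first_aki_hour])
```

Note the asymmetry this encodes: for AKI encounters the window is pinned to the hours just before the first AKI event, while for controls any qualifying window will do.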
Methods Used (Continued)
• The study database was populated with the following data types for each study encounter:
– demographic data
– medications administered
– lab test results
– urinary output rates
– vital measurements
– present-on-admission (POA) diagnoses
– problem list diagnoses
– procedures performed
• Employing two-thirds of the study data:
– Statistical optimization routines were applied to select the risk factors that were most predictive of a future occurrence of AKI
– A logistic regression model employing the selected risk factors was derived from the data and used to produce an AKI risk index on a scale of 0 to 100
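The split-and-fit steps above can be sketched in plain Python. This is an illustration under stated assumptions: encounters are split at random (two-thirds development, one-third test), the logistic model is fitted by simple batch gradient ascent rather than whatever optimizer the study used, and the 0 to 100 risk index is assumed to be 100 times the predicted probability. The slides do not give the exact mapping, so `risk_index` is hypothetical.

```python
import math
import random


def split_encounters(encounter_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign whole encounters to a development set (~2/3) and a test set (~1/3)."""
    ids = sorted(set(encounter_ids))
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient ascent on the logistic log-likelihood; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def risk_index(w, x):
    """Hypothetical mapping of the predicted AKI probability onto a 0-100 scale."""
    return 100.0 * sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

Splitting by encounter, not by record, keeps every record of an encounter on the same side of the development/test boundary.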
Etiological Model
• The purpose of the etiological model is to:
– Identify physiological causal pathways leading to the adverse outcome
– Identify risk factors associated with the causal pathways
• Model development involves:
– Literature review
– Consultation with healthcare professionals
– Knowledge of risk factors for which useful information exists in electronic patient records
Data Management Process
[Figure: data flow across the PHI Server, LU (limited use) Server, LU Analytics Server, and an External System]
• Stages: Receipt → Process → Transfer → Clean → Compute → Release
• Activities shown in the figure:
– Raw data converted to limited use; PHI reviewer reviews
– Raw limited-use data loaded to SQL database
– Map data to standardized format
– Backup; validation processes performed
– Calculation of relevant features suggested by literature
– Raw limited-use data released to analysis team
– Database made available to analysis team; PHI reviewer reviews
– Figures, tables, and summary results
EHR Data Types Utilized
1. Events
– Hospital Admission/Discharge
– Patient Location Intervals
– Patient Care Level Intervals
2. Admission and Discharge Data
– Hospital Admission Data
– Hospital Discharge Data
3. Patient Data
4. Clinical Flowsheet Data
– Clinical Observations
– Intake & Output
– Ventilation Data
– Vital Signs
5. Test and Procedure Results
6. Medication Administered
7. Problem List Entries
8. Diagnoses and Procedures
– Diagnosis Codes
– Procedure Codes
9. Orders
– Lab Test Orders
– Procedure Orders
– Medication Orders
Analysis Dataset
• Static Data Set – Contains:
– A single record per patient
– Data for static variables, i.e., variables that do not vary significantly during a patient’s ICU stay (Age, Race, Weight, etc.)
• Dynamic Data Set – Contains:
– Multiple time-stamped records per patient
– Data for dynamic variables, i.e., variables that vary significantly during a patient’s ICU stay (Vitals, Urinary Output Rate, Serum Creatinine Concentration, etc.)
– Each record contains the full set of dynamic variable values that may be used for clinical decision-making from the record’s timestamp to the timestamp of the subsequent record
– A new record is generated when (a) a new value of any dynamic variable is entered into the patient EMR or (b) a value in the current dynamic patient record expires
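The record-generation rule for the dynamic data set can be sketched as follows. This is a minimal illustration assuming each variable has a fixed, hypothetical shelf life (`validity_hours`) after which its last value expires; the slides do not state the actual expiry rules. A record is emitted at every time a value is entered or a value expires, and it holds every value still valid at that time.

```python
def build_dynamic_records(observations, validity_hours):
    """
    observations: list of (time_h, variable, value) tuples for one patient.
    validity_hours: hypothetical per-variable shelf life (hours) after which a value expires.
    Returns [(timestamp, {variable: value}), ...]; each record holds the values usable
    from its timestamp until the next record's timestamp.
    """
    obs = sorted(observations)
    event_times = set()
    for t, var, _ in obs:
        event_times.add(t)                         # (a) a new value is entered
        event_times.add(t + validity_hours[var])   # (b) a current value expires
    records = []
    for ts in sorted(event_times):
        state = {}
        for t, var, value in obs:
            if t <= ts < t + validity_hours[var]:
                state[var] = value                 # later observations overwrite earlier ones
        records.append((ts, state))
    return records
```

For example, with a heart rate valid for 1 hour and a creatinine value valid for 6 hours, a record appears when the heart rate lapses even though no new measurement arrived.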
Model Selection and Fitting
• Stepwise selection of static & dynamic variables to include in the outcome likelihood model based on statistical and practical significance
• Easy for data sets with one observation per patient
• Challenging for data sets with multiple observations per patient
– Correlations among records for each patient invalidate statistical inferences for simple models
– Fitting models that include a random patient effect and produce meaningful estimates of static variable effects was ineffective
– Currently implementing a “k-fold” method based on contributions to area under the ROC curve
• Alerts are based on:
– Selection of an alerting threshold that balances the competing goals of minimizing false negatives and false positives
• Performance of fixed-threshold alerting procedures is characterized in terms of sensitivity and specificity
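The slides do not say exactly how the "k-fold" procedure scores variables by their contribution to the area under the ROC curve. A minimal sketch of two building blocks such a procedure would likely need, under that caveat: folds assigned by patient, so that correlated records from one patient never straddle training and validation, and an AUC computed from the Mann-Whitney statistic. One hypothetical use is to compare cross-validated AUC with and without a candidate variable.

```python
import random


def grouped_folds(patient_ids, k=5, seed=0):
    """Assign every record of a patient to the same fold (one entry per record)."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    return [fold_of[p] for p in patient_ids]


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```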
Alerts – Using the Model
• Receiver operating characteristic (ROC) curves are used to:
– Characterize true positive/false positive behavior for all possible thresholds
– Aid in threshold selection
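Tracing an ROC curve amounts to sweeping the alert threshold over every distinct risk score and recording sensitivity and specificity at each one. The sketch below does that for the rule "alert if score ≥ threshold". Picking the threshold that maximizes Youden's J (sensitivity + specificity − 1) is an assumed, illustrative choice; the slides say only that the threshold balances false negatives against false positives.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct score used as an alert threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / n_pos, 1 - fp / n_neg))
    return points


def youden_threshold(scores, labels):
    """Threshold maximizing sensitivity + specificity - 1 (one of many possible rules)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)[0]
```

Plotting sensitivity against 1 − specificity for these points gives the ROC curve; a clinical team might instead fix a minimum sensitivity and take the best specificity that satisfies it.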
Risk Attribution
• For every outcome likelihood model, there is a list of contributing risk variables
• The attribution process creates a vector of attribution percentages, one for each risk variable
• The attribution percentages add to 100%
• Each attribution percentage characterizes the degree to which the likelihood of an adverse outcome is attributable to the corresponding risk variable
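The slides do not specify how attribution percentages are computed. One plausible rule, shown here purely as a hypothetical sketch, assigns each risk variable its share of the increase in log-odds relative to a reference patient, βᵢ(xᵢ − refᵢ), keeping only risk-raising contributions and scaling the shares to total 100%. Variable names and coefficients are illustrative, not the study's model.

```python
def attribute_risk(coefficients, patient, reference):
    """
    Hypothetical attribution rule: each variable's share of the positive log-odds
    increase beta_i * (x_i - ref_i) relative to a reference patient, scaled to 100%.
    """
    contrib = {v: max(0.0, b * (patient[v] - reference[v])) for v, b in coefficients.items()}
    total = sum(contrib.values())
    if total == 0:
        return {v: 0.0 for v in coefficients}   # no variable raises risk above the reference
    return {v: 100.0 * c / total for v, c in contrib.items()}


# Illustrative (made-up) coefficients on the log-odds scale
shares = attribute_risk(
    {"condition_a": 1.2, "condition_b": 1.5, "weight_kg": -0.01},
    patient={"condition_a": 1, "condition_b": 1, "weight_kg": 68},
    reference={"condition_a": 0, "condition_b": 0, "weight_kg": 68},
)
```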
The Predictive Model
• Example applications of the predictive model:
– The AKI incidence rate for a patient weighing 68 kg and experiencing none of the 6 conditions in the figure above is 1.72%, and the corresponding odds of AKI are 0.0175
– The AKI incidence rate for a patient weighing 68 kg who is post-open-heart-surgery and on a ventilator (but is experiencing none of the other 4 conditions in the figure above) is 20.3%, with corresponding AKI odds of 0.254
• The risk factors selected for the predictive model and their multiplicative contributions to the odds of a future AKI event are reported in the figure to the right
$\text{Odds} = \dfrac{p_1}{1 - p_1}$, where $p_1$ is the probability (incidence rate) of AKI
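The worked examples above follow directly from the odds formula. A short check using only the figures quoted on the slide: converting the odds back to probabilities reproduces the quoted incidence rates, and the ratio of the two odds is the combined multiplicative factor of the two conditions under the multiplicative model described above (the individual factors are in the figure, which is not reproduced here).

```python
def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)


def probability(o):
    """Inverse of the odds formula: p = odds / (1 + odds)."""
    return o / (1.0 + o)


baseline_odds = 0.0175   # 68 kg, none of the 6 conditions (from the slide)
combined_odds = 0.254    # 68 kg, post-open-heart-surgery and on a ventilator (from the slide)

print(round(100 * probability(baseline_odds), 2))   # 1.72 (% incidence)
print(round(100 * probability(combined_odds), 1))   # 20.3 (% incidence)
print(round(combined_odds / baseline_odds, 1))      # 14.5 (combined odds multiplier)
```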
Prediction Performance
• The performance of the predictive model / AKI risk index was characterized:
– Employed the complementary one-third of the study data that was not used in developing the model as the test set
– Determined sensitivity and specificity values for all risk index thresholds and plotted sensitivity vs. 1 − specificity to create receiver operating characteristic (ROC) curves
– Produced ROC curves for predicting AKI at 6, 12 and 24 hours prior to AKI event (see figure below for 24-hr ROC curve)
• The AKI risk index was very effective at predicting the future occurrence of AKI as evidenced in the table below
AKI Prediction Conclusions
• Based on the results of this study, it is concluded that many patients who are at-risk for AKI can be identified in advance based on data typically stored in an electronic health record (EHR) system
• The methods used in this study should be similarly applicable to other hospital complications such as the need for ventilator assistance, sepsis, shock, infection following surgery, heart attack, and stroke
• Predictive models for hospital complications (like the model derived in this study) can form the basis for in-hospital EHR applications (see mock-up) that monitor patients’ risk indices over time and attribute risk to contributing factors included in the predictive models
Clinical Predictive Analytics Acknowledgements
• Battelle
– Steve Rust, Ph.D.
– Dan Haber
– Mark Davis
– Doug Mooney, Ph.D.
– Michele Morara
– Darlene Wells
• Ohio State University
– Naeem Ali, MD
– Andrew Thomas, MD
– Phyllis Teeter
Analyzing intracortical brain data to reanimate a paralyzed limb
Study overview
• FDA- and IRB-approved clinical IDE study
• Investigate the effectiveness of cortically controlled neuromuscular stimulation in restoring movement in a paralyzed person
• Study participant is a 24-year-old male who suffered a complete C5/C6 spinal cord injury in a diving accident
Source: Battelle and Ohio State
Big Data Challenges
• Large Volume of Data
– 30,000 samples/sec × 96 electrodes
– Filter data artifacts
– Isolate signal from noise
• Fast Processing
– Need to pull data, filter, isolate, decode, and stimulate in 0.1 sec
• Offline Data Analysis
– Optimize decoders and algorithms
– Design experiments
– Publications
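The arithmetic behind "large volume" and "fast processing" is quick to check. The sampling rate, channel count, and 0.1 sec loop come from the slide; the 2 bytes per sample (16-bit samples) is an assumption used only to convert samples into bytes.

```python
SAMPLES_PER_SEC_PER_CHANNEL = 30_000   # from the slide
CHANNELS = 96                          # from the slide
BYTES_PER_SAMPLE = 2                   # assumption: 16-bit samples
LOOP_SECONDS = 0.1                     # from the slide: pull, filter, isolate, decode, stimulate

samples_per_sec = SAMPLES_PER_SEC_PER_CHANNEL * CHANNELS
samples_per_loop = round(samples_per_sec * LOOP_SECONDS)
megabytes_per_sec = samples_per_sec * BYTES_PER_SAMPLE / 1e6
gigabytes_per_hour = megabytes_per_sec * 3600 / 1e3

print(samples_per_sec)               # 2880000
print(samples_per_loop)              # 288000
print(megabytes_per_sec)             # 5.76
print(round(gigabytes_per_hour, 1))  # 20.7
```

So every 0.1 sec control cycle must ingest and process roughly 288,000 new samples.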
How does it work?
[Figures: neural spikes, neural data, and a neural spike panel. Image credits: Wikimedia Commons; Eric Chudler, UW]
Turning Big Data into Manageable Data
• Nearly 3 million samples/sec (30,000 samples/sec × 96 electrodes) is a ton of data
• By isolating the signal and filtering out the noise, data can be summarized in a much more compact form
• Common theme in many successful big data applications
• Big raw data can be more useful when converted to smaller, more concise data
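A minimal sketch of that kind of reduction, collapsing each channel's raw stream into one number per 0.1 sec window. The RMS summary used here is a deliberately simple stand-in chosen for illustration; the features described later in this talk are wavelet-based.

```python
import math


def summarize_windows(samples, fs=30_000, window_s=0.1):
    """Collapse one channel's raw samples into one RMS value per non-overlapping window."""
    n = int(round(fs * window_s))
    features = []
    for start in range(0, len(samples) - n + 1, n):
        window = samples[start:start + n]
        features.append(math.sqrt(sum(v * v for v in window) / n))
    return features
```

At 30,000 samples/sec, each 0.1 sec window of 3,000 raw samples becomes a single value, a 3,000-fold reduction per channel.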
Filters
• 0.3 Hz 1st-order high-pass and 7.5 kHz 3rd-order low-pass Butterworth analog hardware filter applied to data
• 60 Hz background filter
• Stimulation artifact
• Static shock
• We can build filters to remove known signal artifacts
• Use wavelets to build robust features
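As an illustration of a filter for a known artifact, here is a sketch of a digital 60 Hz notch: a second-order IIR biquad with coefficients from the widely used RBJ "Audio EQ Cookbook" notch design. This is an assumed example of the technique, not necessarily the filter the team used; the quality factor `q` (narrowness of the notch) is an illustrative choice.

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Second-order IIR notch (RBJ cookbook form), normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(x, b, a):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))
```

The zeros sit exactly on the unit circle at 60 Hz, so steady-state hum is removed while content far from 60 Hz (e.g., spike-band activity in the kHz range) passes almost unchanged. A narrow notch rings for a while after onset, which is one reason such filters run continuously rather than per window.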
Wavelets as an alternative to spikes
• Wavelets are used for nonparametric regression, signal processing, image analysis etc.
• Represent the raw electrical signal using a wavelet basis
• Localized in both time and frequency
• Coefficients can then be used to represent the signal
Sources: http://www.aticourses.com/blog/index.php/tag/continuous-wavelet-transform/; Elements of Statistical Learning, 2nd edition
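To make "represent the signal using a wavelet basis" concrete, here is a sketch of a multilevel discrete wavelet transform. For brevity it uses the Haar wavelet, the simplest orthonormal wavelet, rather than the db4 wavelet referred to later in this talk; each level splits the signal into a coarser approximation and a detail (higher-frequency) band, and the coefficients preserve the signal's energy.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multilevel decomposition: returns ([detail_1, ..., detail_L], final approximation)."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)          # detail_1 is the finest (highest-frequency) band
    return details, approx
```

Because each coefficient is tied to a specific position and scale, the representation is localized in both time and frequency, as the slide notes.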
Spikes vs. Wavelets
• Spikes are usually manually sorted (subjective)
• There are methods for automatic spike sorting – PCA, threshold crossing etc.
• Wavelet choice may have some effect, but optimization can be automated
• We hypothesize the wavelet signal will not decline over time as much as the spike signal
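Threshold crossing, mentioned above, is usually the detection step that precedes any sorting. A minimal sketch, assuming a negative-going threshold set at `k` times a robust noise estimate (median(|x|)/0.6745) and a refractory period that suppresses repeat detections of the same spike; `k = 4.0` and the 1 ms refractory period are illustrative defaults, not values from this study.

```python
import statistics


def detect_spikes(signal, fs=30_000, k=4.0, refractory_ms=1.0):
    """
    Negative threshold-crossing detector. Noise sigma is estimated robustly as
    median(|x|) / 0.6745; a spike is logged when the signal first drops below
    -k * sigma, and further crossings are ignored for the refractory period.
    """
    sigma = statistics.median(abs(v) for v in signal) / 0.6745
    threshold = -k * sigma
    dead = int(fs * refractory_ms / 1000)
    spikes, last = [], -dead - 1
    for i in range(1, len(signal)):
        crossed = signal[i] < threshold <= signal[i - 1]
        if crossed and i - last > dead:
            spikes.append(i)
            last = i
    return spikes
```

The result depends directly on `k` and the noise estimate; as recorded signal amplitudes drift over time, so do the detected spike counts, which is the kind of decline the wavelet approach is hypothesized to resist.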
Wavelet choice
(Figure source: D. Farina et al., 2007)
Wavelet decomposition and multi-unit activity (MUA)
• Wavelet decomposition was used to extract and characterize the raw signal in different frequency sub-bands
• Wavelet methods do not require spike sorting
• Multi-unit activity (MUA) is defined as the activity corresponding to wavelet scales 4 and 5
• Single-unit activity (SUA) is defined as the activity corresponding to wavelet scales 0–3
[Figure: SUA, MUA, and LFP frequency bands (Sharma et al., 2015)]
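Given detail coefficients from a wavelet decomposition, the scale definitions above translate into band features. A sketch under two assumptions: the input list is indexed so that list position equals the slide's scale number, and each band is summarized by its log mean power. Both the indexing convention and the log-power summary are illustrative, not the published feature definition.

```python
import math


def band_features(details_by_scale, sua_scales=range(0, 4), mua_scales=(4, 5)):
    """
    details_by_scale: detail coefficients per wavelet scale (assumed: list index == scale number).
    Returns the log mean power of the coefficients in the SUA (scales 0-3) and MUA (scales 4-5) bands.
    """
    def log_power(scales):
        coeffs = [c for s in scales for c in details_by_scale[s]]
        return math.log(sum(c * c for c in coeffs) / len(coeffs) + 1e-12)   # epsilon avoids log(0)

    return {"SUA": log_power(sua_scales), "MUA": log_power(mua_scales)}
```

Applying this per channel and keeping one band value yields one feature per electrode, the same shape as the 96-feature decoder input described below.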
db4 Wavelet Decomposition
(Figure source: Battelle)
Decoding
• Signal is input into our decoding (aka classification) algorithms
• Translate brain activity to imagined movement
• Training data is acquired by having the subject imagine the movements of an animated hand he is seeing on a screen
• Decoders are trained and then used to control the sleeve in test mode
SVM Decoders
• We use a custom regularized Support Vector Machine (Humber et al., 2012)
• Separate decoders for each movement
• 96 wavelet features, one for each channel
• Predictor variables go through a non-linear Gaussian radial basis kernel to capture relationships between channels
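The custom regularized training procedure is not reproduced here. This sketch shows only how an already-trained Gaussian RBF-kernel SVM scores a feature vector, and one hypothetical way to combine separate per-movement decoders: pick the movement with the highest positive score, else a hypothetical "rest" label. All parameters (`gamma`, support vectors, coefficients, bias) are illustrative stand-ins for trained values.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian radial basis kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_score(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Decision value f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b; dual_coefs hold alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


def decode_movement(x, decoders):
    """decoders: {movement: (support_vectors, dual_coefs, bias)}; highest positive score wins."""
    scores = {name: svm_score(x, *params) for name, params in decoders.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "rest"
```

In the real system `x` would be the 96 per-channel wavelet features; the kernel lets each decoder respond to nonlinear combinations of channels rather than a weighted sum.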
NeuroLife Acknowledgements
• Battelle
– Chad Bouton, MS
– Nicholas Annetta, MS
– Gaurav Sharma, PhD
– Stephanie Kute, PhD
– Nick Skomrock, MS
– Vimal Buck, MS
– Fritz Eubanks, PhD
– Jeff Friend
– Brad Glenn, PhD
– Mingming Zhang, PhD
• Ohio State University
– Ali Rezai, MD
– Jerry Mysiw, MD
– Dina Aziz
– Marcie Bockbrader, MD, PhD
– Ammar Shaikhouni, MD, PhD
– Per Sederberg, PhD
– Dylan Nielson, MD, PhD
Questions
• David A. Friedenberg, Ph.D.
• https://www.linkedin.com/in/davidafriedenberg
• Nancy McMillan, Ph.D.