Tuesday, March 1, 2016 - HIMSS20Calculation of relevant features suggested by literature Raw limited...

Enhancing Patient Outcomes with Big Data: Two Case Studies

Tuesday, March 1, 2016

David A. Friedenberg Ph.D., Principal Research Statistician, Battelle

Nancy McMillan Ph.D., Research Leader, Battelle

Conflict of Interest

David Friedenberg, Ph.D. and Nancy McMillan, Ph.D.

Salary: Salaried employees of Battelle

Agenda

• Introduction

• Using EHR data to predict acute kidney injuries (AKI)

• Analyzing intracortical brain data to reanimate a paralyzed limb

• Conclusion

• Q&A

Learning Objectives

• Demonstrate with real data how data from EHRs can be used to develop accurate disease prediction models

• Describe a system for bypassing a damaged spinal cord by using large amounts of data collected from a cortical implant to control a muscle stimulation system which moves a paralyzed limb controlled by the subject's thoughts

• Discuss the potential and some of the pitfalls of using big data to improve patient outcomes using two real world examples

Benefits Realized for the Value of Health IT

The value steps impacted were:

• Treatment/Clinical

• Electronic Secure Data

http://www.himss.org/ValueSuite

• 86%/77% sensitivity/specificity of 24-hr AKI prediction

• Movement possible: a paralyzed person can control hand movements with thoughts

Introduction and Methods Used

• Hypothesis – Inpatient AKI is predictable in advance based on electronic health record (EHR) data

• Goal – Predict AKI 24 hours in advance of its occurrence

• Approach – Conduct a retrospective analysis of hospital inpatients to develop a predictive model for identifying patients who are at risk for AKI

• Monitoring Requirement – 6-hour urinary output rate and serum creatinine concentration difference from baseline are both available continuously for a six-hour period

• AKI Encounters – 878 adult, non-prisoner encounters meeting the AKI Network Level 2 or 3 criteria and satisfying the monitoring requirement for 6 or more hours immediately prior to the first AKI event

• Control Encounters – 5096 adult, non-prisoner encounters for which there was a period of 6 or more hours sometime during the encounter during which the monitoring criterion was continuously met

Methods Used (Continued)

• The study database was populated with the following data types for each study encounter:

– demographic data

– medications administered

– lab test results

– urinary output rates

– vital measurements

– present-on-admission (POA) diagnoses

– problem list diagnoses

– procedures performed

• Employing two-thirds of the study data:

– Statistical optimization routines were applied to select the risk factors that were most predictive of a future occurrence of AKI

– A logistic regression model employing the selected risk factors was derived from the data and used to produce an AKI risk index on a scale of 0 to 100
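The two-thirds/one-third split and the 0 to 100 risk index described on this slide can be sketched as follows. This is a minimal illustration only: the coefficient values, variable names, and the random split are assumptions made for the example and are not the study's fitted model or its actual partitioning procedure.

```python
import math
import random

def split_encounters(encounter_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign encounters to a training set and a held-out test set."""
    ids = list(encounter_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

def risk_index(features, coefficients, intercept):
    """Logistic regression probability rescaled to a 0-100 risk index."""
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return 100.0 * probability

# Hypothetical coefficients, for illustration only (not the study's model).
COEFFICIENTS = {"on_ventilator": 1.2, "post_open_heart_surgery": 1.5}
INTERCEPT = -4.0

# 878 AKI encounters + 5096 control encounters = 5974 study encounters.
train_ids, test_ids = split_encounters(range(5974))

baseline = risk_index({"on_ventilator": 0, "post_open_heart_surgery": 0},
                      COEFFICIENTS, INTERCEPT)
elevated = risk_index({"on_ventilator": 1, "post_open_heart_surgery": 1},
                      COEFFICIENTS, INTERCEPT)
```

Multiplying the model probability by 100 is the simplest way to obtain a 0 to 100 scale; the slides do not say whether the study used this or another rescaling.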

Etiological Model

• The purpose of the etiological model is to identify:

– Physiological causal pathways leading to the adverse outcome

– Risk factors associated with the causal pathways

• Model development involves:

– Literature review

– Consultation with healthcare professionals

– Knowledge of risk factors for which useful information exists in electronic patient records

Data Management Process

[Figure: data management workflow; the text recovered from the diagram is listed below]

• Stages: Receipt, Process, Transfer, Clean, Compute, Release

• Steps shown in the diagram:

– Raw data converted to limited use; PHI reviewer reviews

– Raw limited use data loaded to SQL database

– Map data to standardized format

– Backup; validation processes performed

– Calculation of relevant features suggested by literature

– Raw limited use data released to analysis team

– Database made available to analysis team; PHI reviewer reviews

– Figures, tables, and summary results

• Systems: PHI Server, LU Server, LU Analytics Server, External System

EHR Data Types Utilized

1. Events

Hospital Admission/Discharge

Patient Location Intervals

Patient Care Level Intervals

2. Admission and Discharge Data

Hospital Admission Data

Hospital Discharge Data

3. Patient Data

4. Clinical Flowsheet Data

Clinical Observations

Intake & Output

Ventilation Data

Vital Signs

5. Test and Procedure Results

6. Medication Administered

7. Problem List Entries

8. Diagnoses and Procedures

Diagnosis Codes

Procedure Codes

9. Orders

Lab Test Orders

Procedure Orders

Medication Orders

Analysis Dataset

• Static Data Set – Contains:

– A single record per patient

– Data for static variables, i.e. variables that do not vary significantly during a patient’s ICU stay (Age, Race, Weight, etc.)

• Dynamic Data Set – Contains:

– Multiple time-stamped records per patient

– Data for dynamic variables, i.e. variables that vary significantly during a patient’s ICU stay (Vitals, Urinary Output Rate, Serum Creatinine Concentration, etc.)

– Each data set record contains the full record of dynamic variable values that may be used for clinical decision-making from the record’s timestamp to the timestamp of the subsequent record

– A new record is generated when (a) a new value of any dynamic variable is entered into the patient EMR or (b) a value in the current dynamic patient record expires
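The record-generation rule for the dynamic data set can be sketched as below. The validity windows (`valid_for`), the variable names, and the choice to skip a record when the set of usable values is unchanged are assumptions for illustration; the slides do not state how expiry times were defined.

```python
def build_dynamic_records(observations, valid_for):
    """Build time-stamped records of the currently usable dynamic values.

    observations: list of (time_hr, variable, value) tuples.
    valid_for: hours each variable's value stays usable (an assumed rule).
    """
    obs = sorted(observations)
    # A record can start when a value is entered or when a held value expires.
    event_times = {t for t, _, _ in obs}
    event_times |= {t + valid_for[var] for t, var, _ in obs}

    records = []
    previous = None
    for now in sorted(event_times):
        current = {}
        for t, var, value in obs:
            if t <= now < t + valid_for[var]:
                current[var] = value  # later observations overwrite earlier ones
        # Simplification: do not repeat a record whose contents did not change.
        if current != previous:
            records.append((now, dict(current)))
            previous = current
    return records

# Hypothetical observations for one patient (times in hours since admission).
observations = [
    (0.0, "heart_rate", 88),
    (1.0, "serum_creatinine", 1.1),
    (2.0, "heart_rate", 95),
]
valid_for = {"heart_rate": 4.0, "serum_creatinine": 24.0}
records = build_dynamic_records(observations, valid_for)
```

Each record holds from its own timestamp until the timestamp of the next record, which matches the interval definition given above.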

Model Selection and Fitting

• Stepwise selection of static & dynamic variables to include in the outcome likelihood model based on statistical and practical significance

• Easy for data sets with one observation per patient

• Challenging for data sets with multiple observations per patient

– Correlations among records for each patient invalidate statistical inferences for simple models

– Attempts to fit models that include a random patient effect while still producing meaningful estimates of static variable effects were ineffective

– Currently implementing a “k-fold” method based on contributions to area under the ROC curve

• Alerts are based on:

– Selection of an alerting threshold that balances the competing goals of minimizing false negatives and false positives

• Performance of fixed-threshold alerting procedures is characterized in terms of sensitivity and specificity
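To make the threshold trade-off concrete, the sketch below scores a fixed-threshold alert rule on a small made-up set of risk indices and outcomes. The numbers are invented for illustration and are not study data.

```python
def sensitivity_specificity(risk_indices, had_aki, threshold):
    """Alert when the risk index meets the threshold, then score the alerts."""
    tp = fp = tn = fn = 0
    for risk, aki in zip(risk_indices, had_aki):
        alert = risk >= threshold
        if alert and aki:
            tp += 1          # alert raised, AKI occurred
        elif alert and not aki:
            fp += 1          # alert raised, no AKI (false positive)
        elif not alert and aki:
            fn += 1          # no alert, AKI occurred (false negative)
        else:
            tn += 1          # no alert, no AKI
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Made-up risk indices (0-100) and outcomes for eight encounters.
risks = [5, 12, 30, 45, 60, 72, 85, 90]
outcomes = [False, False, False, True, False, True, True, True]

low_threshold = sensitivity_specificity(risks, outcomes, threshold=40)
high_threshold = sensitivity_specificity(risks, outcomes, threshold=70)
```

In this toy data the lower threshold catches every AKI case at the cost of one false alert, while the higher threshold removes the false alert but misses one case.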

Alerts – Using the Model

• Receiver operating characteristic (ROC) curves are used to:

– Characterize true positive/false positive behavior for all possible thresholds

– Aid in threshold selection
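A ROC curve is built by sweeping the alert threshold over every observed score and recording the true positive and false positive rates at each step. The sketch below does this on made-up scores and labels (not study data) and integrates the curve with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) for every threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: no alerts at all
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up risk indices and outcomes (1 = AKI occurred, 0 = no AKI).
scores = [5, 12, 30, 45, 60, 72, 85, 90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

For this toy data the area is 15/16, because 15 of the 16 possible (AKI, non-AKI) pairs are ranked in the correct order.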

Risk Attribution

• For every outcome likelihood model, there is a list of contributing risk variables

• The attribution process creates a vector of attribution percentages, one for each risk variable

• The attribution percentages add to 100%

• Each attribution percentage characterizes the degree to which the likelihood of an adverse outcome is attributable to the corresponding risk variable
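The slides do not say how the attribution percentages are computed. One simple possibility for a logistic model, shown here purely as an assumption, is to take each variable's share of the total absolute log-odds contribution (coefficient times value). The coefficients and variable names are hypothetical.

```python
def attribution_percentages(coefficients, values):
    """Share of the total log-odds contribution owed to each risk variable.

    Assumed scheme: |coefficient * value| normalized to sum to 100%.
    """
    contributions = {name: abs(coefficients[name] * values[name])
                     for name in coefficients}
    total = sum(contributions.values())
    if total == 0:
        # No active risk variables: nothing to attribute.
        return {name: 0.0 for name in contributions}
    return {name: 100.0 * c / total for name, c in contributions.items()}

# Hypothetical log-odds coefficients and one patient's current indicators.
coefficients = {
    "on_ventilator": 1.0,
    "post_open_heart_surgery": 1.5,
    "low_urine_output": 0.5,
}
patient = {"on_ventilator": 1, "post_open_heart_surgery": 1, "low_urine_output": 0}

shares = attribution_percentages(coefficients, patient)
```

Here the two active conditions split the attribution 40% to 60%, and the inactive one receives 0%, so the vector sums to 100% as the slide requires.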

The Predictive Model

• Example applications of the predictive model:

– The AKI incidence rate for a patient weighing 68 kg and experiencing none of the 6 conditions in the figure is 1.72%, and the corresponding odds of AKI are 0.0175

– The AKI incidence rate for a patient weighing 68 kg who is post-open-heart-surgery and on a ventilator (but is experiencing none of the other 4 conditions in the figure) is 20.3%, with corresponding odds of AKI of 0.254

• The risk factors selected for the predictive model and their multiplicative contributions to the odds of a future AKI event are reported in the figure to the right

$\text{Odds} = \dfrac{p_1}{1 - p_1}$
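The relationship between incidence rate and odds can be checked directly against the two worked examples on this slide. The function names are illustrative; the only inputs are the 1.72% and 20.3% rates quoted above.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Inverse of the odds transform: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

baseline_odds = odds_from_probability(0.0172)   # no listed conditions
combined_odds = odds_from_probability(0.203)    # open-heart surgery + ventilator

# Because the model is multiplicative in the odds, the two conditions
# together scale the baseline odds by this factor.
combined_multiplier = combined_odds / baseline_odds
```

The baseline works out to about 0.0175, and the second case to roughly 0.255, which agrees with the quoted 0.254 up to rounding of the 20.3% rate. The implied combined multiplier for the two conditions is between 14 and 15.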

Prediction Performance

• The performance of the predictive model / AKI risk index was characterized:

– Employed the complementary one-third of the study data that was not used in developing the model as the test set

– Determined sensitivity and specificity values for all risk index thresholds and plotted sensitivity vs. 1-specificity to create receiver operating characteristic (ROC) curves

– Produced ROC curves for predicting AKI at 6, 12 and 24 hours prior to AKI event (see figure below for 24-hr ROC curve)

• The AKI risk index was effective at predicting the future occurrence of AKI, reaching 86% sensitivity and 77% specificity for 24-hour prediction (see the table below)

AKI Prediction Conclusions

• Based on the results of this study, many patients who are at risk for AKI can be identified in advance from data typically stored in an electronic health record (EHR) system

• The methods used in this study should be similarly applicable to other hospital complications such as the need for ventilator assistance, sepsis, shock, infection following surgery, heart attack, and stroke

• Predictive models for hospital complications (like the model derived in this study) can form the basis for in-hospital EHR applications (see mock-up) that monitor patients’ risk indices over time and attribute risk to contributing factors included in the predictive models

Clinical Predictive Analytics Acknowledgements

• Battelle

– Steve Rust, Ph.D.

– Dan Haber

– Mark Davis

– Doug Mooney, Ph.D.

– Michele Morara

– Darlene Wells

• Ohio State University

– Naeem Ali, MD

– Andrew Thomas, MD

– Phyllis Teeter

Analyzing intracortical brain data to reanimate a paralyzed limb

Study overview

• FDA- and IRB-approved clinical IDE study

• Investigate the effectiveness of cortically controlled neuromuscular stimulation to restore movement in a paralyzed person

• Study participant is a 24-year-old male who suffered a complete C5/C6 spinal cord injury from a diving accident

Source: Battelle and Ohio State

Big Data Challenges

• Large Volume of Data

– 30,000 samples/sec x 96 electrodes

– Filter data artifacts

– Isolate signal from noise

• Fast Processing

– Need to pull data, filter, isolate, decode and stimulate within 0.1 sec

• Offline Data Analysis

– Optimize Decoders and Algorithms

– Design experiments

– Publications
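The data volume and timing figures on this slide work out as follows. The sketch uses only the sampling rate, electrode count, and 0.1 sec deadline stated above; the constant names are illustrative.

```python
SAMPLES_PER_SEC_PER_ELECTRODE = 30_000
ELECTRODES = 96
WINDOW_SEC = 0.1  # time allowed to pull, filter, isolate, decode and stimulate

# Total raw samples arriving each second across the whole array.
total_samples_per_sec = SAMPLES_PER_SEC_PER_ELECTRODE * ELECTRODES

# Samples that must be handled inside one processing window.
samples_per_window_per_electrode = round(SAMPLES_PER_SEC_PER_ELECTRODE * WINDOW_SEC)
samples_per_window = samples_per_window_per_electrode * ELECTRODES
```

That is 2,880,000 samples per second in total, or 288,000 samples (3,000 per electrode) in every 0.1 sec window.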

How does it work?

Image Source: Wikimedia Commons

Neural Spikes

Image by Eric Chudler, UW

Neural data

Neural Spike Panel

Turning Big Data into Manageable Data

• 30,000 samples/sec x 96 electrodes (about 2.9 million samples/sec in total) is a very large data stream

• By isolating the signal and filtering out the noise, the data can be summarized in a much more compact form

• Common theme in many successful big data applications

• Big raw data can be more useful when converted to smaller, more concise data

Filters

• 0.3 Hz 1st-order high-pass and 7.5 kHz 3rd-order low-pass Butterworth analog hardware filter applied to data

• 60Hz background filter

• Stimulation artifact

• Static shock

• We can build filters to remove known signal artifacts

• Use wavelets to build robust features
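As an illustration of removing a known artifact, the sketch below applies a second-order digital notch filter at 60 Hz to a signal sampled at 30,000 samples/sec. The filter design (a standard biquad notch), the quality factor, and the test tones are assumptions chosen for the example; the slides do not describe how the 60 Hz filter was actually implemented.

```python
import math

def notch_coefficients(notch_hz, sample_rate_hz, q=10.0):
    """Second-order (biquad) notch filter coefficients, normalized so a[0] = 1."""
    w0 = 2.0 * math.pi * notch_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(signal, b, a):
    """Filter a sequence using the biquad difference equation."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

FS = 30_000  # samples per second, as stated for the recording system
hum = [math.sin(2 * math.pi * 60 * n / FS) for n in range(FS)]     # 60 Hz artifact
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]  # stand-in signal

b, a = notch_coefficients(60.0, FS)
hum_out = apply_biquad(hum, b, a)
tone_out = apply_biquad(tone, b, a)
```

After the start-up transient dies away, the 60 Hz tone is suppressed almost completely while the 1 kHz tone passes through essentially unchanged.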

Wavelets as an alternative to spikes

• Wavelets are used for nonparametric regression, signal processing, image analysis etc.

• Represent the raw electrical signal using a wavelet basis

• Localized in both time and frequency

• Coefficients can then be used to represent the signal
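The idea of representing a signal by wavelet coefficients can be sketched with the Haar wavelet, the simplest wavelet basis. Note that Haar is swapped in here only to keep the example short; it is not the wavelet used in the study. The input values are made up.

```python
import math

def haar_step(signal):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Multi-level decomposition; len(signal) must be divisible by 2**levels."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)  # details[0] is the finest (highest-frequency) band
    return approx, details

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(signal, levels=3)

# The transform is orthonormal, so signal energy is preserved in the coefficients.
signal_energy = sum(v * v for v in signal)
coefficient_energy = sum(v * v for v in approx) + sum(
    v * v for band in details for v in band)
```

Each detail band captures activity in one frequency range at specific times, which is what "localized in both time and frequency" means in practice.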

http://www.aticourses.com/blog/index.php/tag/continuous-wavelet-transform/ Elements of Statistical Learning 2nd edition

Spikes vs. Wavelets

• Spikes are usually manually sorted (subjective)

• There are methods for automatic spike sorting – PCA, threshold crossing etc.

• Wavelet choice may have some effect, but optimization can be automated

• We hypothesize the wavelet signal will not decline over time as much as the spike signal


Wavelet choice


D. Farina et al, 2007

Wavelet decomposition and Multi-unit activity (MUA)

• Wavelet decomposition was used to decompose the raw signal into different frequency sub-bands

• Wavelet methods do not require spike sorting

• Multi-unit activity (MUA) is defined as that corresponding to wavelet scales 4 and 5

• Single-unit activity (SUA) is defined as that corresponding to wavelet scales 0-3

[Figure: SUA, MUA, and LFP bands. Sharma et al., 2015]

db4 Wavelet Decomposition


Source: Battelle

Decoding

• Signal is input into our decoding (aka classification) algorithms

• Translate brain activity to imagined movement

• Training data is acquired by having the subject imagine the movements of an animated hand he is seeing on a screen

• Decoders are trained and then used to control sleeve in test mode


SVM Decoders

• We use a custom regularized Support Vector Machine (Humber et al., 2012)

• Separate decoders for each movement

• 96 wavelet features, one for each channel

• Predictor variables go through a non-linear Gaussian radial basis kernel to capture relationships between channels
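The Gaussian radial basis kernel and the one-decoder-per-movement arrangement can be sketched as below. The support vectors, weights, `gamma` value, and the rule of picking the highest-scoring decoder are all hypothetical choices for illustration; this is not the custom regularized SVM cited above.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decoder_score(features, decoder, gamma):
    """Decision value of one kernel decoder for a 96-channel feature vector."""
    return decoder["bias"] + sum(
        w * rbf_kernel(sv, features, gamma)
        for sv, w in zip(decoder["support_vectors"], decoder["weights"]))

def decode_movement(features, decoders, gamma):
    """Assumed combination rule: pick the movement with the highest score."""
    scores = {name: decoder_score(features, d, gamma)
              for name, d in decoders.items()}
    return max(scores, key=scores.get), scores

N_CHANNELS = 96  # one wavelet feature per channel
open_pattern = [1.0 if ch < 48 else 0.0 for ch in range(N_CHANNELS)]
close_pattern = [0.0 if ch < 48 else 1.0 for ch in range(N_CHANNELS)]

# Hypothetical decoders, one per movement, each with two support vectors.
decoders = {
    "hand_open": {"support_vectors": [open_pattern, close_pattern],
                  "weights": [1.0, -1.0], "bias": 0.0},
    "hand_close": {"support_vectors": [close_pattern, open_pattern],
                   "weights": [1.0, -1.0], "bias": 0.0},
}

observed = [0.9 if ch < 48 else 0.1 for ch in range(N_CHANNELS)]
movement, scores = decode_movement(observed, decoders, gamma=0.05)
```

Because the kernel depends on the distance across all 96 channels at once, each decoder responds to joint activity patterns rather than to any single channel.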


NeuroLife Acknowledgements

• Battelle

– Chad Bouton, MS

– Nicholas Annetta, MS

– Gaurav Sharma, PhD

– Stephanie Kute, PhD

– Nick Skomrock, MS

– Vimal Buck, MS

– Fritz Eubanks, PhD

– Jeff Friend

– Brad Glenn, PhD

– Mingming Zhang, PH

• Ohio State University

– Ali Rezai, MD

– Jerry Mysiw, MD

– Dina Aziz

– Marcie Bockbrader, MD, PhD

– Ammar Shaikhouni, MD, PhD

– Per Sederberg, PhD

– Dylan Nielson, MD, PhD

Questions

• David A. Friedenberg, Ph.D.

• https://www.linkedin.com/in/davidafriedenberg

• Nancy McMillan, Ph.D.
