Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D....

Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department of Medicine


Page 1

Phenotype generation from EMR by tensor factorization

SEDI Durham Cohort

James Lu M.D. Ph.D.

Department of Electrical and Computer Engineering

Department of Medicine

Page 2

3.2 Trillion / yr (~21% of GDP)

Health System Under Pressure

Page 3

Small molecules, medical devices, biologics, diagnostics, genomics, transcriptomics…

Operations

Novel technology

Align incentives, risk sharing, quality metrics, reducing readmissions, six sigma/lean, …

Where do I achieve cost arbitrage?

How do we identify which patients to study?

Where is my patient going to go next?

Can we reorganize patient flow?

Page 4

Computable phenotypes are a top-down process

PheKB, Northwestern

Page 5

Many variations of computable phenotypes require adjudication by physicians.

Richesson, et al. 2013

Expensive and time-consuming

Page 6

EMR data is large and complicated

Durham County, 2007-2011

Patient level

• >240,000 patients
• Birthday
• Death (where available)
• Gender
• Race
• Ethnicity

Visit level

• 4.4 million patient visits
• Average of 18 measurements recorded per visit
• Indicator of presence/absence of particular diseases (computed)
• Encounter date (start, end)
• Location (DHRH, DUH, DRH)
• Path (for example, ED -> inpatient)
• Inpatient / Outpatient

Intervention level

> 60,000 types of observations

• CPT
• ICD9 diagnoses
• ICD9 procedures
• Lab values
• Medications
• Vitals

Caveats:

• Temporal gaps – people are only patients when they are sick
• We want to incorporate all of this information
• We don’t want to be fooled by mistakes and bias

Page 7

Decompose each touch with the health care system into its parts

● Each visit is a 5-D tensor (~1 billion elements)

● Patient
● Diagnosis / Billing Codes
● Labs
● Medications
● Time

● Model as Counts

● Decompose into a sum of K rank-1 components (each an outer product of one vector per mode)

With Piyush Rai and Changwei Hui

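$$\mathcal{Y} \sim \mathrm{Pois}\left(\sum_{k=1}^{K} \mathbf{u}_k^{\text{patient}} \circ \mathbf{u}_k^{\text{codes}} \circ \mathbf{u}_k^{\text{labs}} \circ \mathbf{u}_k^{\text{medications}} \circ \mathbf{u}_k^{\text{time}}\right)$$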
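As a rough illustration of the decomposition on this slide, the following Python sketch builds a small 5-mode rate tensor as a sum of K rank-1 terms and draws Poisson counts from it. The mode sizes, the value of K, the gamma-distributed factor values, and the helper name `cp_rate_tensor` are all assumptions chosen for illustration; the slides do not state the model's priors or its inference procedure.

```python
import numpy as np


def cp_rate_tensor(factors):
    """Sum of K rank-1 terms over five modes.

    `factors` is a list of five nonnegative arrays, each of shape
    (mode_size, K). Column k of every array together forms one
    rank-1 component (an outer product across the five modes).
    """
    return np.einsum("ak,bk,ck,dk,ek->abcde", *factors)


rng = np.random.default_rng(0)

# Toy mode sizes (hypothetical); the real tensor has ~1 billion elements.
mode_sizes = {"patient": 6, "codes": 5, "labs": 4, "medications": 3, "time": 2}
K = 3  # number of latent components (assumed)

# Nonnegative factor matrices, one per mode (gamma draws are an assumption).
factors = [rng.gamma(shape=1.0, scale=0.5, size=(n, K)) for n in mode_sizes.values()]

rate = cp_rate_tensor(factors)   # Poisson rate for every tensor cell
counts = rng.poisson(rate)       # simulated count tensor Y

print(rate.shape, counts.sum())
```

The toy sizes only show the structure; at the scale quoted on the slide a dense tensor like `rate` would probably not be materialized this way.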

Page 8

Computational phenotypes are a bottom-up process. Factors represent latent phenotypes.

Evaluate 11,242 patients with ~23 million data points with morbidity outcomes in diabetes

Alprazolam

Urate

Factor 2

Factor 10

Malignant Neoplasm Prostate

Clinical Trial Participation

Secondary Malignant Neoplasms of Bone

External Catheter Set

CEAAG 15-3

Allopurinol

Evening Primrose Oil

Systemic Lupus Erythematosus

Side Effects from Statins

Shoulder Pain

Calcidiol

Jo-1

Page 9

Patients are composites of common and rare latent phenotypes.

ER/ EKG

Standard Labs (e.g. CBC/BMP)

Kidney Disease

Hypertension

Surgical Patient

Patient by Factor Score Matrix, 40 most common phenotypes

Page 10

Compare Outcome prediction to Known Algorithm (UKPDS)

UKPDS: UK Prospective Diabetes Study outcomes model used to predict MI, Death, and Stroke

7 demographic + lab variables: age, ethnicity, smoking status, A1c, HDL, total cholesterol, and systolic BP

Dataset

• Original 7 variable model
• All Data
• Non Matrix Factorization
• Tensor Factorization

Can we predict outcomes in the next year?

• Death
• AMI
• Stroke

Classification model: fit data with random forests, evaluated by 10-fold cross-validation

With Joseph Lucas
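The evaluation setup on this slide (a random forest classifier scored with 10-fold cross-validation) can be sketched with scikit-learn as below. This is a minimal illustration on synthetic data, not the study's pipeline: the feature matrix `X` stands in for patient factor scores, the outcome `y` is simulated, and the AUC scoring metric, the number of trees, and the stratified fold assignment are assumptions, since the slides do not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a patient-by-factor score matrix (hypothetical sizes).
n_patients, n_factors = 400, 8
X = rng.gamma(shape=1.0, scale=1.0, size=(n_patients, n_factors))

# Simulated binary outcome (e.g. an event within the next year) that
# depends on the first two factors; purely illustrative.
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 0.5
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One score per fold; ROC AUC is an assumed metric.
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over {len(scores)} folds: {scores.mean():.3f}")
```

To compare feature sets in this style, the same `cv` splits would be reused with a different `X` for each dataset, and one such model would be fit per outcome.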

Page 11

Tensor-derived factors perform better than the original UKPDS model in all outcomes and provide comparable performance to the “all-data” model

Stroke is similar to Dat