Phenotype generation from EMR by tensor factorization
SEDI Durham Cohort
James Lu, M.D., Ph.D.
Department of Electrical and Computer Engineering
Department of Medicine
Health System Under Pressure
3.2 Trillion / yr (~21% of GDP)
Novel technology: small molecules, medical devices, biologics, diagnostics, genomics, transcriptomics, …
Operations: align incentives, risk sharing, quality metrics, reducing readmissions, Six Sigma / lean, …
Where do I achieve cost arbitrage?
How do we identify which patients to study?
Where is my patient going to go next?
Can we reorganize patient flow?
Computable phenotypes are a top-down process
PheKB, Northwestern
Many variations of computable phenotypes require adjudication by physicians.
Richesson, et al. 2013
Expensive and time-consuming
EMR Data is Large and Complicated
Durham County, 2007-2011
Patient level
• >240,000 patients
• Birthday
• Death (where available)
• Gender
• Race
• Ethnicity
Visit level
• 4.4 million patient visits
• Average of 18 measurements recorded per visit
• Indicator of presence/absence of particular diseases (computed)
• Encounter date (start, end)
• Location (DHRH, DUH, DRH)
• Path (ED -> inpatient, for example)
• Inpatient / Outpatient
Intervention level
>60,000 types of observations
• CPT
• ICD9 diagnoses
• ICD9 procedures
• Lab values
• Medications
• Vitals
Caveats:
• Temporal gaps: people are only patients when they are sick
• We want to incorporate all of this information
• We don't want to be fooled by mistakes and bias
Decompose each touch with the health care system into its parts
● Each visit is a 5-D tensor (~1 billion elements)
● Modes: Patient, Diagnosis/Billing Codes, Labs, Medications, Time
● Model as counts
● Decompose into a sum of K rank-1 components (one vector per mode)
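As a rough illustration of "model as counts", the sketch below tallies event records into a sparse count tensor keyed by index tuples. The record fields, the three modes used here (patient, code, month), and the helper name `build_count_tensor` are all assumptions made for this example; the talk's tensor has five modes, and its actual construction is not described on the slide.

```python
from collections import Counter

# Hypothetical event records; fields and values are made up for illustration.
events = [
    {"patient": "p1", "code": "250.00", "month": 0},
    {"patient": "p1", "code": "250.00", "month": 0},
    {"patient": "p1", "code": "401.9", "month": 3},
    {"patient": "p2", "code": "401.9", "month": 3},
]

def build_count_tensor(events, modes):
    """Count events per index tuple; returns (sparse counts, per-mode vocabularies)."""
    vocab = {m: {} for m in modes}   # maps each raw value to an integer index
    counts = Counter()               # sparse tensor: index tuple -> count
    for ev in events:
        # Assign the next free index the first time a value is seen in a mode.
        idx = tuple(vocab[m].setdefault(ev[m], len(vocab[m])) for m in modes)
        counts[idx] += 1
    return counts, vocab

counts, vocab = build_count_tensor(events, ["patient", "code", "month"])
print(dict(counts))
```

Storing only the nonzero cells matters at the scale quoted on the slide, since most cells of a tensor with around a billion elements would be zero.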
With Piyush Rai and Changwei Hui
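$$\mathcal{Y} \sim \mathrm{Pois}\left(\sum_{k=1}^{K} u^{(1)}_k \circ u^{(2)}_k \circ u^{(3)}_k \circ u^{(4)}_k \circ u^{(5)}_k\right)$$
[Figure: each rank-1 component drawn as an outer product of mode vectors (… x Codes x Labs x Medications x Time), summed over components: + …]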
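The count model and rank-K decomposition on this slide can be sketched with a small nonnegative CP factorization fitted under a Poisson likelihood. This is a minimal illustration using standard multiplicative (KL-divergence) updates on a dense toy tensor, not the authors' method, which the slide does not specify. The function names, the toy dimensions, and the three-mode example are assumptions; the same code accepts five modes.

```python
import numpy as np

def reconstruct(factors):
    """Sum of K rank-1 outer products; factors[n] has shape (I_n, K)."""
    K = factors[0].shape[1]
    out = np.zeros(tuple(f.shape[0] for f in factors))
    for k in range(K):
        comp = factors[0][:, k]
        for f in factors[1:]:
            comp = np.multiply.outer(comp, f[:, k])
        out += comp
    return out

def poisson_loglik(Y, M, eps=1e-10):
    """Poisson log-likelihood of counts Y under rates M, dropping the log(Y!) term."""
    return float(np.sum(Y * np.log(M + eps) - M))

def poisson_cp(Y, K, n_iter=200, seed=0, eps=1e-10):
    """Nonnegative CP fit via multiplicative updates for the Poisson/KL objective."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, K)) + 0.1 for dim in Y.shape]
    for _ in range(n_iter):
        for n in range(Y.ndim):
            ratio = Y / (reconstruct(factors) + eps)
            # Unfold along mode n: rows index mode n, columns index all other modes.
            R = np.moveaxis(ratio, n, 0).reshape(Y.shape[n], -1)
            # Khatri-Rao product of the other factors, in matching C-order layout.
            others = [f for m, f in enumerate(factors) if m != n]
            kr = others[0]
            for f in others[1:]:
                kr = (kr[:, None, :] * f[None, :, :]).reshape(-1, K)
            factors[n] *= (R @ kr) / (kr.sum(axis=0) + eps)
    return factors

# Toy data: counts drawn from a known rank-2 model (sizes are arbitrary).
rng = np.random.default_rng(1)
true_factors = [rng.gamma(2.0, 1.0, size=(d, 2)) for d in (6, 5, 4)]
Y = rng.poisson(reconstruct(true_factors))
fitted = poisson_cp(Y, K=2, n_iter=50)
print(poisson_loglik(Y, reconstruct(fitted)))
```

A dense reconstruction like this is only practical for toy sizes; at the scale of the EMR tensor, a sparse formulation that touches only nonzero counts would be needed.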
Computational phenotypes are a bottom-up process.
Factors represent latent phenotypes.
Evaluate 11,242 patients with ~23 million data points, with morbidity outcomes in diabetes.
[Figure: top-loading terms for Factor 2 and Factor 10. Terms shown: Alprazolam, Urate, Malignant Neoplasm Prostate, Clinical Trial Participation, Secondary Malignant Neoplasms of Bone, External Catheter Set, CEAAG 15-3, Allopurinol, Evening Primrose Oil, Systemic Lupus Erythematosus, Side Effects from Statins, Shoulder Pain, Calcidiol, Jo-1]
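One way to read a factor as a latent phenotype is to list the items with the largest loadings in that factor's column. The sketch below does this for a made-up loading matrix; the item names, the numbers, and the helper `top_terms` are hypothetical and are not taken from the talk's results.

```python
import numpy as np

def top_terms(factor_matrix, names, k, n=3):
    """Return the n item names with the largest loadings on factor k."""
    order = np.argsort(factor_matrix[:, k])[::-1][:n]
    return [names[i] for i in order]

# Made-up loadings: rows are items (codes, labs, or medications), columns are factors.
names = ["item_a", "item_b", "item_c", "item_d"]
loadings = np.array([
    [0.9, 0.0],
    [0.1, 0.7],
    [0.5, 0.2],
    [0.0, 0.8],
])

print(top_terms(loadings, names, k=0, n=2))  # ['item_a', 'item_c']
print(top_terms(loadings, names, k=1, n=2))  # ['item_d', 'item_b']
```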
Patients are composites of common and rare latent phenotypes.
[Figure: Patient by Factor Score Matrix, 40 most common phenotypes. Labeled factors include ER/EKG, Standard Labs (e.g. CBC/BMP), Kidney Disease, Hypertension, Surgical Patient]
Compare Outcome Prediction to a Known Algorithm (UKPDS)
UKPDS: UK Prospective Diabetes Study outcomes model, used to predict MI, death, and stroke
7 demographic + lab variables: age, ethnicity, smoking status, A1c, HDL, total cholesterol, and systolic BP
Datasets compared: Original 7 variable model, All Data, Non Matrix Factorization, Tensor Factorization
Can we predict outcomes in the next year? Death, AMI, Stroke
Classification model: fit data with random forests, 10-fold cross-validation
With Joseph Lucas
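The evaluation setup named on this slide, a random forest scored with 10-fold cross-validation, might look roughly like the following with scikit-learn. The synthetic features and labels stand in for patient-by-factor scores and a one-year outcome; they are assumptions for illustration, as are the choice of ROC AUC as the metric and the forest size.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 300 "patients", 5 "factor scores", binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
# cv=10 with a classifier uses stratified 10-fold splits.
scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(scores.mean())
```

Running the same procedure once per feature set (the 7-variable model, all data, and the factor scores) and once per outcome would give the kind of comparison the slide describes.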
Tensor-derived factors perform better than the original UKPDS model on all outcomes and provide performance comparable to the "all-data" model
Stroke is similar to Dat