Bulk Learning on EHR Data
Po-Hsiang Chiu, George Hripcsak
Department of Biomedical Informatics, Columbia University
In a Nutshell…
Bulk Learning is a batch-phenotyping method/framework that uses multiple diseases collectively (i.e. the bulk learning set) as the substrate for model learning and evaluation. Model stacking is used to construct an abstract feature representation of low sample complexity, which reduces training requirements.
Phenotyping
• Definition
  source: http://www.evolution.berkeley.edu/
• Diseases and subtypes
• Concept-driven disease cohorts
  – 100 infectious diseases as the domain of study (i.e. bulk learning set)
  – Phenotypic models associated with lab tests, medicinal prescriptions
• Dimensionality reduction
Bulk Learning Basics I
• A batch-phenotyping method/framework
• Addresses two central issues in the predictive analytical approach to computational phenotyping
  – Feature engineering
    • Medical ontology for feature decomposition
    • E.g. MED (http://med.dmi.columbia.edu)
  – Data annotation
    • Ensemble learning (e.g. stacked generalization [Wolpert 1992])
    • Feature abstraction for dimensionality reduction
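As a rough illustration of feature decomposition, the sketch below groups EHR feature names by medical concept. The lookup table, the feature names, and the `decompose_features` helper are hypothetical stand-ins; the code does not query MED or reflect its actual structure.

```python
from collections import defaultdict

# Hypothetical concept lookup standing in for an ontology query (assumption).
CONCEPT_GROUP = {
    "blood_culture_result": "microbiology",
    "urine_culture_result": "microbiology",
    "vancomycin_order": "antibiotic",
    "serum_sodium": "blood_test",
    "urine_glucose": "urine_test",
}


def decompose_features(feature_names, concept_group):
    """Partition feature names into concept groups; unknown names go to 'ungrouped'."""
    groups = defaultdict(list)
    for name in feature_names:
        groups[concept_group.get(name, "ungrouped")].append(name)
    return dict(groups)


features = ["serum_sodium", "blood_culture_result", "vancomycin_order",
            "urine_culture_result", "urine_glucose", "heart_rate"]
feature_groups = decompose_features(features, CONCEPT_GROUP)
for group, names in feature_groups.items():
    print(group, names)
```

Each resulting group would then get its own base model, so no single model has to learn over the full raw feature set.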
Bulk Learning Basics II
• Uses diagnostic codes (e.g. ICD-9) as surrogate labels to establish “approximate predictive models.”
• Why surrogate labels (e.g. ICD-9)?
  – Features extracted from EHR can be large
  – More compact representation of the training data
  – “Free” supervised signals that are sufficiently close but can be obtained without extra work
• Objective: Build statistical models in abstract feature space
  – Create a small annotation set (i.e. gold standard) that serves as a proxy dataset for downstream model evaluations
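A minimal sketch of how surrogate (silver-standard) labels might be derived from diagnostic codes follows. The condition-to-code mapping, the patient records, and the `silver_labels` helper are illustrative assumptions, not the code sets used in the study.

```python
# Illustrative condition -> ICD-9 code sets (assumption; not the study's definitions).
SURROGATE_CODES = {
    "pneumonia": {"481", "486"},
    "urinary_tract_infection": {"599.0"},
}

# Hypothetical patients with the ICD-9 codes recorded for them.
patient_codes = {
    "p1": {"486", "401.9"},
    "p2": {"599.0"},
    "p3": {"250.00"},
}


def silver_labels(patient_codes, condition_codes):
    """Label a patient 1 if any recorded code belongs to the condition's code set."""
    return {pid: int(bool(codes & condition_codes))
            for pid, codes in patient_codes.items()}


pneumonia_labels = silver_labels(patient_codes, SURROGATE_CODES["pneumonia"])
uti_labels = silver_labels(patient_codes, SURROGATE_CODES["urinary_tract_infection"])
print(pneumonia_labels)
print(uti_labels)
```

These labels cost nothing to obtain but inherit any coding errors, which is why a small gold-standard set is still needed for evaluation.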
Bulk Learning Basics III
• Why inspect multiple (infectious) diseases?
  – Using multiple diseases as substrate and identifying their common elements
  – Example stacking architecture (under stacked generalization method)
[Figure: example stacking architecture]
Level 0: Microbiology Measure, Antibiotic Measure, Intravenous Chemistry Measure, Urinary Chemistry Measure, Other Phenotypic Measures (e.g. Antiviral)
Level 1: Attributes: Level-0 Probabilities and Indicators; Target: Diagnostic Codes (Silver Standard)
Level 2: Attributes: Level-1 Probabilities and ICD-9; Target: True Labels (Gold Standard)
[Figure: Four Example Base Models. Raw features (f11, f12, …, f1j; f21, …, f2j; f31, …, f3j; f41, …) feed Level-0 logistic units m1, a1, b1, u1 for microbiology, antibiotic, blood test, and urine test. At Level 1, the per-disease outputs (e.g. m1(1), a1(1), b1(1), u1(1) across diseases …, i-1, i, i+1, …) and the global outputs m1g, a1g, b1g, u1g feed a global unit (global2).]
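The three-level architecture in the figure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the logistic unit is a hand-written gradient-descent fit, the "indicators" attribute at Level 1 is omitted, and out-of-fold predictions are used so that each level is trained on outputs the lower level produced for unseen rows.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit one logistic unit by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
                for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / len(X)
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def out_of_fold_proba(X, y, k=5):
    """Probability for each row from a model that was not trained on that row."""
    oof = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict_proba(model, [X[i] for i in held_out])):
            oof[i] = p
    return oof


# Synthetic data (assumption): true labels, a noisy ICD-9 surrogate, two raw features per group.
rng = random.Random(0)
n = 120
truth = [int(rng.random() < 0.5) for _ in range(n)]
silver = [t if rng.random() < 0.9 else 1 - t for t in truth]
group_names = ["microbiology", "antibiotic", "blood_test", "urine_test"]
raw = {g: [[t + rng.gauss(0, 1), t + rng.gauss(0, 1)] for t in truth]
       for g in group_names}

# Level 0: one logistic base model per concept group, target = silver label.
level0 = {g: out_of_fold_proba(raw[g], silver) for g in group_names}
level0_rows = [[level0[g][i] for g in group_names] for i in range(n)]

# Level 1: combine level-0 probabilities, still against the silver label.
level1 = out_of_fold_proba(level0_rows, silver)

# Level 2: small gold-standard subset; attributes = level-1 probability and ICD-9 flag.
gold_idx = list(range(30))
level2_X = [[level1[i], float(silver[i])] for i in gold_idx]
level2_model = fit_logistic(level2_X, [truth[i] for i in gold_idx])
final = predict_proba(level2_model, level2_X)
print(len(level1), len(final))
```

The point of the design is that the Level-2 model sees only two attributes per case, so a few dozen annotated cases can be enough to fit and evaluate it.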
Moving Forward
• Summary
  – Bulk learning is a framework with at least the following system choices
    • The bulk learning set (of target conditions) => base models
    • Classification algorithms (guideline: probabilistic classifiers + well-calibrated)
    • Stacking architecture (multiple tiers => levels of abstractions)
    • Strategy for combining individual (local) disease models to a global model
  – Advantage: Can use a small annotated sample for model construction and evaluation within the abstract feature space (e.g. level-1 data)
    • 83 clinical cases were labeled in this study (to be discussed more comprehensively)
  – Challenge: The model involving the interaction between abstract features and ICD-9 does not generalize well into the region of the data where the ICD-9 coding was incorrect
    • Multiple types of surrogate labels
• Ongoing and future work
  – Semi-supervised learning, active learning
  – Complex decision boundary?
  – Other surrogate labels
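The summary's guideline that classifiers be probabilistic and well calibrated can be checked with simple diagnostics. The sketch below computes a Brier score and a binned reliability table; the probabilities and labels are made-up values, and the helper names are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def reliability_table(probs, labels, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            pos_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_p, pos_rate, len(members)))
    return rows


# Made-up base-model outputs and labels, for illustration only.
probs = [0.1, 0.3, 0.7, 0.9]
labels = [0, 0, 1, 1]

score = brier_score(probs, labels)
table = reliability_table(probs, labels)
print(round(score, 3))
for mean_p, pos_rate, count in table:
    print(f"predicted {mean_p:.2f}  observed {pos_rate:.2f}  n={count}")
```

In a well-calibrated model the predicted and observed columns track each other, which matters here because higher stacking levels consume the lower-level probabilities as features.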
References
[1] D.H. Wolpert, Stacked generalization, Neural Networks. 5 (1992) 241–259.
[2] K.M. Ting, I.H. Witten, Issues in stacked generalization, J. Artif. Intell. Res. 10 (1999) 271–289.
[3] J. Jin Chen, C. Cheng Wang, R. Runsheng Wang, Using Stacked Generalization to Combine SVMs in Magnitude and Shape Feature Spaces for Classification of Hyperspectral Data, IEEE Trans. Geosci. Remote Sens. 47 (2009) 2193–2205.
[4] David Baorto, James Cimino, et al. Available: http://med.dmi.columbia.edu. Access date: Oct 20, 2016.
[Figure: bulk learning workflow]
1. Define Feature Groups Using Medical Ontology
  1a. Gather EHR data according to medical concepts
  1b. Use Medical Entities Dictionary to delineate feature scopes
  1c. Apply feature selection within each concept group
2. Compute Base Models
  Four example base models (logistic units over raw features): microbiology, antibiotic, blood test, urine test
3. Compute Meta Models (via Ensemble Learning)
  3a. Per-disease ensembles: compute local level-1 models
  3b. Cross-disease ensemble: compute a global level-1 model
Figure labels: Individual Level-1 Local Units; Level-1 Global Unit; Level-1 abstract features; Global level-1 features
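Steps 3a and 3b can be sketched as below. This assumes the global level-1 model is fit on level-0 probabilities pooled across all diseases, which is one plausible reading of the figure rather than a confirmed detail. The disease names, the synthetic probabilities, and the minimal logistic fit are all illustrative.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Minimal logistic unit trained by batch gradient descent (illustrative)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
                for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / len(X)
    return w, b


def predict_one(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic level-0 outputs (assumption): per disease, rows of four base-model
# probabilities (microbiology, antibiotic, blood test, urine test) plus silver labels.
rng = random.Random(1)
level0 = {}
for disease in ["disease_A", "disease_B", "disease_C"]:  # hypothetical names
    y = [int(rng.random() < 0.5) for _ in range(50)]
    X = [[min(0.99, max(0.01, 0.3 + 0.4 * t + rng.gauss(0, 0.15))) for _ in range(4)]
         for t in y]
    level0[disease] = (X, y)

# 3a. Per-disease ensembles: one local level-1 model per disease.
local_models = {d: fit_logistic(X, y) for d, (X, y) in level0.items()}

# 3b. Cross-disease ensemble: one global level-1 model on the pooled rows.
pooled_X = [x for X, _ in level0.values() for x in X]
pooled_y = [t for _, y in level0.values() for t in y]
global_model = fit_logistic(pooled_X, pooled_y)

# Level-1 abstract features for one case: local and global probabilities.
case = level0["disease_A"][0][0]
abstract = [predict_one(local_models["disease_A"], case),
            predict_one(global_model, case)]
print(abstract)
```

Pooling across diseases gives the global model more training rows than any single disease provides, at the cost of ignoring disease-specific weighting of the base models.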