Learning Patient Engagement in Care Management: Performance vs. Interpretability

Subhro Das ([email protected]), MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA, USA
Chandramouli Maduri ([email protected]), IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Ching-Hua Chen ([email protected]), IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pei-Yun S. Hsueh ([email protected]), IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

ABSTRACT
The health outcomes of high-need patients can be substantially influenced by the degree of patient engagement in their own care. The role of care managers includes enrolling patients into care programs and keeping them sufficiently engaged in the program, so that patients can attain various goals. The attainment of these goals is expected to improve the patients' health outcomes. In this paper, we present a real-world, data-driven method and a behavioral engagement scoring pipeline for scoring the engagement level of a patient in two regards: (1) their interest in enrolling into a relevant care program, and (2) their interest in and commitment to program goals. We use this score to predict a patient's propensity to respond (i.e., to a call for enrollment into a program, or to an assigned program goal). Using real-world care management data, we show that our scoring method successfully predicts patient engagement. We also show that we are able to provide interpretable insights to care managers, using prototypical patients as a point of reference, without sacrificing prediction performance.
KEYWORDS
Care management, personalization, metric learning, patient-centered care, interpretability, augmented intelligence

1 INTRODUCTION
Care management is a patient-centered approach to population health that is "designed to assist patients and their support systems in managing medical conditions more effectively." [4] The aims of care management decision support (CMDS) may include: (a) identifying populations with modifiable risks, (b) aligning care management services to population needs, and (c) identifying and training personnel to deliver care management services [1]. The focus of our paper is on the first of these three aims. To improve the identification of populations with modifiable risks, the Agency for Healthcare Research and Quality (AHRQ) summarized a set of recommendations [1]. Among the recommendations was that researchers should investigate (a) the benefit of care management services to different patient segments and (b) the parameters that affect modifiable risks.

With respect to the AHRQ recommendation for achieving the benefits of CMDS, understanding patient segments to drive patient engagement is an inextricable part of the equation for success. By patient engagement, we refer to "the actions individuals take to obtain the greatest benefit from the health care services available to them." [24] In addition, it is essential to develop methods that can identify segment-differentiating parameters that affect modifiable risk factors. These risk factors contribute to a significant portion of the global disease burden, especially for patients with chronic conditions and those transitioning from one care setting to another [2, 30, 31]. The return on investment of care management depends not only on how much the patient stands to gain from a clinical perspective, but also on how likely the patient is to actively engage in care management interventions.
At the same time, we recognize that, despite its high potential, the adoption of machine learning methods in CMDS scenarios has been slow. This is partly due to the gap between how humans and machines make decisions, i.e., the "black-box" nature of high-performing ML methods. To bridge the gap, there has been a recent push for more studies in model explainability in AI/ML to help human decision makers understand how insights were derived and how they can act on data-driven insights [3, 7].

Because these real-world applications need to work in production environments that often do not come with clearly defined data schemas, we will further discuss how to incorporate the developed methods into a Behavioral Engagement Scoring (BES) pipeline (including dynamic feature engineering and an API) to enable its application to care management transaction records in a schema-agnostic fashion. The pipeline is expected to enhance decision support for care managers by helping them prioritize their efforts based on eventual outcomes/success, not just clinical health risk. Our methods are informed and validated using real-world care management records for patients who are either transitioning from "hospital to home" or are eligible for a chronic disease management program.

The rest of this paper is organized as follows. We first discuss how the current study is positioned with respect to related work in CMDS applications and model interpretability. Then, we introduce the CMDS dataset used in this study, as well as the data-driven methods developed for two CMDS tasks: program enrollment and goal attainment. We discuss the trade-off between model interpretability and performance observed in the development of machine learning models for identifying explainable engagement behavioral profiles and for personalized engagement scoring from real-world care management data.

arXiv:1906.08339v1 [cs.LG] 19 Jun 2019



2 RELATED WORK
In this paper, we focus our investigation on implementing quantitative, data-driven ML methods to identify patient segments who are most likely to engage in care management, and on care manager decision support for explaining why they may or may not engage. While a few quantitative survey-based methods for measuring patient engagement exist [12, 13], no prior studies have successfully measured patient engagement from real-world data.

Moreover, although prior studies have attempted to identify risk stratification and disease progression parameters that differentiate clinical risk and longer-term outcomes [21, 23], to our knowledge, this is the first study that has proposed to learn engagement strategies from data, based on an understanding of quantifiable differences in patient engagement levels and of segment-differentiating parameters that affect modifiable risk factors.

To bridge the gap between human and machine understanding in the context of CMDS, this study has also explored the potential effect of providing explainable insights on the performance of our models. Recent reviews [14, 19] have shown that a majority of studies in model interpretability are tied to the optimization of certain model properties that are presumed to be beneficial for improving human understanding of the models. Among them, many studies focus on reducing model complexity. Examples include applying regularization operators to reduce the number of parameters [9], restricting policy search to a subspace of simpler forms [15, 27], bringing semantically similar items together [8], and re-training easier-to-understand models to classify the results obtained by black-box models for model-agnostic explanation [25].

To make AI/ML more "actionable" for health decision makers (either health professionals or patients themselves), human-computer interaction researchers have been conducting qualitative studies to identify interpretability-impeding confounders and understand individual differences [26]. With the emergence of deep learning approaches in AI/ML, more studies are now learning patterns that can be represented in explicitly presentable formats (e.g., temporal visualization [6], natural language rationalization explanations [20]), as well as developing interactive tools to untangle models learned in a high-dimensional space [11].

To further differentiate engagement strategies from CMDS data, in this paper we particularly focus on methods that account for case-based reasoning to improve the interpretability of clustering models, e.g., selecting prototypical cases to represent the learned clusters [18]. In particular, we incorporate locally supervised metric learning [28] and prototypical case-based reasoning in a machine learning model to identify explainable engagement behavioral profiles and to produce personalized engagement scores.

The main contributions of our paper are as follows. First, we present a quantitative and personalized approach to identifying patients who are more likely to engage in care management, and demonstrate empirically, using real-world data, that our methods provide more accurate engagement behavior predictions compared to a "one-size-fits-all" population approach. Second, these insights are made explainable by identifying prototypical patients within a personalized patient segment; we show that, in our case, explainability does not come at the expense of model performance.

Figure 1: Care Management flow.

3 DATA

3.1 Care Management Decision Support
For patients with complex care needs, it is important to coordinate across the patients' caregivers and providers to account for the differing advice received from clinicians, the varying medications, and the adverse drug events [22]. In practice, this is often achieved by implementing structured care programs, in which a predetermined set of rules is given to care managers to coordinate with patients and bridge care gaps between hospital and home. Care management history, therefore, captures the important transactions between care managers and patients during the care coordination process, and is an important and growing source of data for behavioral understanding.

The CMDS workflow from which this dataset was derived is depicted in Fig. 1. At the center of this figure is the Care Manager (e.g., a licensed nurse, social worker or other certified specialist), who attempts to engage the patient, typically via the telephone, and whose primary objective is to influence modifiable and prioritized risk factors, as identified by the Patient's engagement strategy. The Care Manager receives her assigned pool of patients from the Quality Director, whose primary objective is to align care manager skills with patient needs and to determine the appropriate care strategies. Finally, the Patient responds to the Care Manager's feedback and coaching and may provide his/her own input on the goals to be set and how to achieve them. The interactions between the care manager and patient are captured in both structured and unstructured formats. In the current study, we use only the structured data contained in the care management transaction records.

3.2 Care Management Records
We apply our method to the care program logs of a private, not-for-profit healthcare network, comprising 4,504 transition-of-care and 440 chronic care patient interactions over a 22-month period. These program engagement records were collected between December 2015 and October 2017. For each patient engagement timeline, we extracted 53 features ranging from basic demographic information (e.g., age, gender) to the patients' care program context (e.g., program experience, whether the patient enrolled in the program,


Figure 2: Program enrollment timeline.

Figure 3: Goal attainment timeline.

days in the program, number of days until completion of the program) and the interactions between care managers and patients (e.g., the date when the recorded call occurred).

We then prepared datasets with respect to the two real-world tasks to which we aim to apply the BES pipeline for decision support: program enrollment ("ENROLL") and goal attainment ("GOAL").

3.3 Program Enrollment
Table 1 summarizes the CM records used to generate the ENROLL dataset from all patients with different enrollment statuses for each assigned care program. The types of assigned care programs include those for patients transitioning from hospital to home after being discharged ("Transition") and those involving a chronic disease management process ("Involve").

Status              | Involve (Chronic Care) | Transition
--------------------|------------------------|-----------
Completed Program   | 30.00%                 | 67.30%
DidNotEnroll        |  4.09%                 |  5.71%
Disenrolled         | 42.27%                 | 25.95%
Enrolled            | 23.64%                 |  1.04%

Table 1: Enrollment status of patients.

The structured CM transaction records capture the dates when a care program is assigned, started and ended. These records also contain indicators of whether the care program was completed with its goals attained, or ended prematurely. The assigned program takes 16 days to complete on average, and 279 days at the maximum. The structured CM records of program enrollment also show that enrolling into a program takes one day on average and a maximum of 15 days. Fig. 4 summarizes the program enrollment status based on the day of the week on which the recorded call to the target patient was made. We observed that most of the decisions are made in calls placed at the start of the week.

Figure 4: Enrollment distribution over the week.

3.4 Goal Attainment
The GOAL dataset is composed of 28 different goals, which we classified into six focus areas: Educational (e.g., demonstrates understanding of post-discharge, diabetes education), Implementation (e.g., adequate functional, transportation, support for healthy coping), Medications (e.g., adherence with medication regimen), Reducing Risks (e.g., resolving care gaps), Self Care (e.g., understands benefits of/demonstrates being physically active, healthy diet needs, failure of symptom management), and Other (e.g., effective care transition and management plans). Fig. 5 summarizes the goals assigned based on the age category.

Figure 5: Goal assignment across different focus areas.

Every patient has multiple goals to achieve: 90% of the patients have fewer than 3 goals, and 65% of the patients have fewer than 2 goals. The goal attainment status is indicated by a binary flag (i.e., 1 for meeting the goal and 0 otherwise). Fig. 6 summarizes the goal attainment percentage (as indicated by the status shown in the CM records) for each goal focus area, i.e., the number of goals whose status is shown as 'met' divided by the total number of goals assigned for each key area.

The interventions of each goal area are grouped into seven categories: Referral (e.g., referral to see a nutritionist for diabetes diet education), Education (e.g., educate patients on the importance of physical activity), Coordination (e.g., follow up with providers on refills), Screening (e.g., assess breathing symptoms), Coaching (e.g., provide a log for side-effect recording), Tracking, and Other (including following up with provider treatment).

3

Page 4: Learning Patient Engagement in Care Management ...Learning Patient Engagement in Care Management: Performance vs. Interpretability Subhro Das subhro.das@ibm.com MIT-IBM Watson AI Lab

Focus Areas    | Coaching | Coordination | Education | Referral | Screening | Tracking | Other | Total | Status Met
---------------|----------|--------------|-----------|----------|-----------|----------|-------|-------|-----------
Educational    |      138 |           18 |       277 |      100 |        65 |        0 |    23 |   621 |     81.62%
Implementation |        3 |          192 |         4 |        7 |         0 |        0 |     0 |   206 |     98.54%
Medications    |        7 |           96 |        30 |        7 |        90 |        0 |    10 |   240 |     84.91%
Reducing Risks |        0 |            1 |         0 |      545 |        53 |        0 |     1 |   600 |     97.00%
Self-care      |      130 |            0 |       189 |       12 |       116 |        0 |    49 |   496 |     78.69%
Other          |       29 |            0 |      2561 |       14 |        29 |       12 |    19 |  2644 |     99.19%
Total          |      307 |          307 |      3061 |      685 |       353 |       12 |   102 |  4827 |

Figure 6: Goal attainment records of patients distributed across focus areas & intervention categories.

4 BEHAVIORAL ENGAGEMENT SCORING PIPELINE

In this paper, we aim to address two key questions: (1) What are the patient segments that lead to differences in the engagement benefits of CM services? (2) What drives differential behavioral responses in CMDS? To answer these questions, we develop a Behavioral Engagement Scoring (BES) pipeline to identify the engagement outcome-differentiating factors in care management. The task of engagement scoring serves as a key step in the pipeline to quantify patient engagement tendency for care plan personalization and downstream decision support.

Specifically, the BES pipeline is composed of four components: 1) dynamically extract behavioral features and outcomes from care management transaction records, 2) apply engagement outcome-driven feature transformation through locally supervised distance metric learning, 3) uncover distinctive patient segments and their behavioral profiles (including prototypical users) based on hierarchical clustering, and 4) learn a BES scorer for each behavioral profile based on a Generalized Linear Model (GLM) to estimate the propensity to respond, e.g., whether a patient is inclined to enroll in a certain program, or complete a goal, given the intervention assigned by his/her care manager.

4.1 Dynamic Feature Engineering
In addition to the main research goal of answering the two key questions, another developmental goal of BES is to enable scalable and flexible engagement scoring over incoming provider data, so as to assist in devising engagement strategies for real-life CMDS tasks in a production environment. To support this developmental goal, the dynamic feature engineering component provides a common data model and a standard run-time supporting a standards-based analytics environment, allowing feature generation rules to be written once and used repeatedly with different provider data sources.

Feature engineering is the process that converts raw data into explanatory factors. These factors are used in the BES pipeline to train engagement scoring models. A multitude of knowledge-based and data-driven approaches are available for feature generation. On one hand, knowledge-based features can be generated from the literature on related topics. On the other hand, data-driven approaches are applied to convert elements of the raw data into features, and then to train models for understanding which features are the most important in gauging patient engagement level.

In this study, we adopt a hybrid knowledge-augmented, data-driven approach to perform feature engineering. To save time in the manual feature generation process and to enable better model generalizability across CM data from different providers, we further automate the BES pipeline to generate features in a provider-agnostic fashion, by implementing a back-end logic module with embedded knowledge-based rules for converting data into features based on a universal schema coded in configuration files.

The dynamic feature generation process has resulted in over 700 features. This component also employs an automatic feature selection procedure based on L1- and L2-based regularization.
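The paper does not publish its selection code; as a rough illustration only, L1-based feature selection of the kind described above can be sketched with scikit-learn on synthetic stand-in data (the feature counts, model choice, and regularization strength here are our own assumptions, not the authors'):

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # stand-in for the ~700 engineered features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # outcome driven by only two features

# An L1 penalty shrinks uninformative coefficients to exactly zero;
# SelectFromModel keeps only the features with non-zero weights.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape[1], "of", X.shape[1], "features kept")
```

An L2 penalty would instead shrink all coefficients without zeroing them, so it is typically paired with a weight threshold rather than used alone for selection.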

4.2 Engagement Outcome-driven Distance Learning

The primary motivation to learn an engagement outcome-driven distance metric is to project patient feature-based vectors onto a subspace wherein patients with similar engagement outcomes are closer to each other, whereas those with opposite engagement outcomes are far apart. In this transformed vector subspace, we can then identify cohesive behavioral profiles (using clustering) that drive differential patient responses.

In this study, we adapt the Locally Supervised Metric Learner (LSML) [28, 29], which helps estimate the engagement outcome-adjusted distances among patients in the newly transformed vector subspace. The patient features extracted using the protocols mentioned in Subsection 4.1 are represented as X ∈ Ω, where Ω is the vector space, and the class labels Y ∈ {0, 1}. For the task of program enrollment, the outcome variable is y = 1 if the status is "Enrolled" or "Completed", and y = 0 if the status is "DidNotEnroll" or "Disenrolled". Similarly, for the task of goal attainment, the outcome variable is y = 1 if the status is "Met", and y = 0 otherwise. Considering that there are n program enrollment records and d-dimensional features, the feature matrix is X = [x_1, · · · , x_i, · · · , x_n] ∈ R^{n×d}. A similar operational definition is also applicable to goal attainment.

Here, we consider a generalized Mahalanobis distance, $d_\Sigma(x_i, x_j)$:

$$d_\Sigma(x_i, x_j) = \left( (x_i - x_j)^T \Sigma (x_i - x_j) \right)^{\frac{1}{2}} \qquad (1)$$


where $\Sigma \in \mathbb{R}^{d \times d}$ is a positive semi-definite matrix. We aim to minimize the following objective $J$ over the matrix $\Sigma$:

$$J = \sum_{i=1}^{n} \left( \sum_{j:\, x_j \in N_i^o} d_\Sigma^2(x_i, x_j) \;-\; \sum_{k:\, x_k \in N_i^e} d_\Sigma^2(x_i, x_k) \right) \qquad (2)$$

where $N_i^o$, the homogeneous neighborhood of $x_i$, is the $|N_i^o|$ nearest data points of $x_i$ with the same outcome, and $N_i^e$, the heterogeneous neighborhood of $x_i$, is the $|N_i^e|$ nearest data points of $x_i$ with the opposite outcome. Since $\Sigma$ is positive semi-definite and symmetric, it can be decomposed as $\Sigma = W^T W$. The $W^*$ that minimizes (2) renders the data into the desired space, where records with similar outcomes are compact and those with opposite outcomes are distant:

$$W^* = \operatorname*{arg\,min}_{W:\, \Sigma = W^T W} J. \qquad (3)$$

Refer to [29] for the complete LSML algorithm that derives $W^* \in \mathbb{R}^{d \times d}$, the feature transformation matrix. We employed $W^*$ to obtain the projected feature set, $\tilde{X} \in \Omega$,

$$\tilde{x}_i = W^* x_i, \quad \forall i. \qquad (4)$$
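To make the connection between Eqs. (1) and (4) concrete, the following sketch (our own illustration, not the authors' code) checks numerically that the generalized Mahalanobis distance with Σ = WᵀW equals the plain Euclidean distance between the W-projected vectors:

```python
import numpy as np

def d_sigma(xi, xj, W):
    """Generalized Mahalanobis distance of Eq. (1) with Sigma = W^T W."""
    diff = xi - xj
    return float(np.sqrt(diff @ (W.T @ W) @ diff))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))                      # a candidate transformation W
xi, xj = rng.normal(size=3), rng.normal(size=3)  # two patient feature vectors

# Eq. (4): projecting with W turns d_Sigma into a plain Euclidean distance.
assert np.isclose(d_sigma(xi, xj, W), np.linalg.norm(W @ xi - W @ xj))
```

This identity is what lets the rest of the pipeline work with ordinary Euclidean geometry in the projected subspace.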

For the rest of the pipeline, we leverage the outcome-adjustedprojection of features (4) and the corresponding Mahalanobis dis-tance (1) between patient-based vectors to learn patient segmentsand to estimate each patient’s propensity to respond to care man-agers’ interventions.
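The full LSML optimization is given in [29]; purely as a sketch, the objective $J$ of Eq. (2) for a fixed candidate $W$ can be evaluated as below (the neighborhood size and toy data are our own assumptions). On well-separated data, $J$ is negative because same-outcome neighbors are closer than opposite-outcome ones:

```python
import numpy as np

def lsml_objective(X, y, W, k=2):
    """Eq. (2) with Sigma = W^T W: summed squared distances to the k nearest
    same-outcome neighbours minus those to the k nearest opposite-outcome
    neighbours, evaluated in the W-projected space."""
    Xp = X @ W.T
    J = 0.0
    for i in range(len(X)):
        d2 = np.sum((Xp - Xp[i]) ** 2, axis=1)
        same = np.where(y == y[i])[0]
        same = same[same != i]            # homogeneous neighborhood N_i^o
        opp = np.where(y != y[i])[0]      # heterogeneous neighborhood N_i^e
        J += np.sort(d2[same])[:k].sum() - np.sort(d2[opp])[:k].sum()
    return J

# Two well-separated outcome groups: same-outcome points are close, so J < 0.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(lsml_objective(X, y, np.eye(2)))
```

Minimizing this quantity over $W$ (Eq. (3)) is what LSML does; the sketch above only evaluates it.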

4.3 Learning Patient Segment-based Behavioral Profiles

Because we hypothesize that there exist patient segments within which patients tend to exhibit certain levels of similarity, we aim to capture that similarity in behavioral profiles and to understand engagement-indicative patterns with respect to the profiles patients fit in. However, it is not known a priori what the optimal number of patient segments is. With that objective and constraint in mind, hierarchical clustering [10] is employed to identify patient segments based on the outcome-adjusted distances in the newly projected vector subspace and to learn the key factors that drive the differential engagement outcomes.

Among a variety of linkage methods for hierarchical clustering, we choose Complete Linkage, which computes the inter-segment similarity $d_{CL}(G_1, G_2)$ between two segments, say $G_1$ and $G_2$, from their furthest pair:

$$d_{CL}(G_1, G_2) = \max_{x_i \in G_1,\, x_j \in G_2} d_{\Sigma^*}(x_i, x_j), \quad \Sigma^* = W^{*T} W^* \qquad (5)$$

$$\qquad\qquad\;\; = \max_{\tilde{x}_i \in G_1,\, \tilde{x}_j \in G_2} d_{\text{euclidean}}(\tilde{x}_i, \tilde{x}_j). \qquad (6)$$

Experimentation with other linkage methods, e.g., Ward's method, further confirms that the Complete Linkage method uncovers the patient segmentation with the highest engagement outcome-differentiating power (as measured by ANOVA scores across segments). The remaining challenge is thus to determine the number of segments. For this, an automatic tuning algorithm, the Elbow method [17], is applied to compute the optimal number of segments. The Elbow method tracks the acceleration of distance growth among segments and thresholds the agglomeration at the point where the acceleration is the highest. Hence, our population is clustered into k* segments, C = {C_1, · · · , C_{k*}}, each capturing distinctive patterns that drive differential patient responses in engagement.
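For illustration only (a sketch on synthetic data with our own parameter choices, not the study's implementation), complete-linkage agglomeration and an elbow heuristic on the merge heights can be combined with SciPy:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# Synthetic "projected" patient vectors forming three well-separated segments.
Xp = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])

Z = linkage(Xp, method="complete")   # Eqs. (5)-(6): furthest-pair linkage
merge_heights = Z[:, 2][::-1]        # heights of the merges, largest first

# Elbow heuristic: the acceleration (second difference) of the merge heights
# peaks where the agglomeration starts joining genuinely distinct segments.
acceleration = np.diff(merge_heights, 2)
k_star = int(np.argmax(acceleration)) + 2
labels = fcluster(Z, t=k_star, criterion="maxclust")
print("k* =", k_star)
```

On this toy data the heuristic recovers the three planted segments; on real CM data the elbow can be flatter and may warrant a manual sanity check.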

In addition, as we expect it to be easier to interpret patient needs from behavioral profiles by example, we propose to identify prototypical patients in each of the patient segments. The prototypical patient cases in each segment are expected to serve as examples that showcase the distinctive patterns in its behavioral profile. This will help interpret the engagement scores output by the BES pipeline. The prototypical patient cases, $pU_k$, are defined as the $p = |pU_k|$ subjects with a positive engagement outcome, i.e., $y = 1$, who are closest to the centroid of each patient segment $C_k$ in the engagement outcome-adjusted vector subspace:

$$pU_k = \operatorname*{arg\,min}_{S:\, S \subset C_k,\, |S| = p,\, y_i = 1\, \forall x_i \in S} \; \sum_{x_i \in S} (\tilde{x}_i - \mu_{C_k})^T (\tilde{x}_i - \mu_{C_k}), \quad \forall k = 1, \cdots, k^*, \qquad (7)$$

where $\mu_{C_k}$ is the centroid of the patient segment $C_k$. In our analysis, we have chosen $p = 20$. The advantages of having a prototypical case-based component in the pipeline are illustrated in Figure 9 using a synthetic set of 2-D data.

The advantages of using prototypical patient cases include: (a) removing model training noise due to ambiguous cases near the segment borders, and (b) improving computational efficiency, as it takes significantly less run-time to update models learned on a significantly reduced set of data.

4.4 Estimating propensity to respond
For each of the patient segments projected on the transformed vector subspace, we learn a separate generalized linear model (GLM) to compute engagement scores for each patient in that segment who has been assigned a program to enroll in or a goal to attain. The engagement scores are estimates of each patient's propensity to respond to his/her care manager's engagement calls or interventions. The GLM for each segment $k$ is represented by

$$y_i = \beta_0^k + \beta_1^k x_{1i} + \cdots + \beta_d^k x_{di}, \quad \forall x_i \in C_k, \qquad (8)$$

where the feature weights $\beta^k = [\beta_0^k, \cdots, \beta_d^k]$ are computed by minimizing the least squared errors over all the data points from segment $k$. Using the optimized feature weights for each segment, the propensity to respond is estimated for each patient based on his/her features. We then use the computed engagement scores to train a Support Vector Machine (SVM) classifier to predict the engagement outcome of each patient. The feature weights of the GLM also provide more explainable insights specific to each patient segment and to the patients belonging to that segment.

5 RESULTS & DISCUSSION
In this study, we introduce the Behavioral Engagement Scoring pipeline to gauge patient engagement level based on patient segmentation and to identify distinctive patterns driving differential responses to engagement. The pipeline is designed to (1) uncover patient segments that lead to differences in the engagement benefits of care management services, (2) identify behavioral profiles that drive differential engagement responses, and (3) enable scalable


Method                    Metric     Program Enrollment            Goal
                                     Involve   Transition          Attainment
                                               (Chronic Care)
Population-Based          Accuracy   0.96      0.96                0.89
(BASELINE)                Precision  0.95      0.94                0.93
                          Recall     0.99      1.00                0.95
                          F1         0.97      0.97                0.94
Behavior Profile-Driven   Accuracy   0.93      0.74                0.79
                          Precision  0.95      0.94                0.92
                          Recall     0.96      0.65                0.83
                          F1         0.95      0.76                0.87
Prototypical User-Driven  Accuracy   0.93      0.74                0.95
                          Precision  0.95      0.93                0.99
                          Recall     0.96      0.65                0.96
                          F1         0.95      0.76                0.97

Table 2: Performance comparison with 5-fold cross-validation.

and flexible engagement scoring in a production environment for real-life care management tasks at each touch point.

The BES pipeline first segments patients based on the patterns exhibited during patient-CM interactions and engagement outcomes. Our hypothesis is that although each feature contains only weak signals to differentiate overall engagement outcomes, when considered collectively in a segment, the combined feature sets can explain rich engagement behaviors for care planning.

Take the task of program enrollment for example. The BES pipeline first identifies a group of five patient segments from the ENROLL dataset, each of which is found to associate with one behavioral profile of distinctive engagement characteristics. Each segment is then exemplified by incorporating information about the prototypical patient cases (defined as a subset of the top 20 patient cases that are the most representative of the identified segment). The BES pipeline also identifies interaction patterns that are specific to program enrollment behaviors for further interpretation, and trains GLM models for predicting engagement outcomes. The same procedure is then repeated to train the BES pipeline for the task of goal attainment on the GOAL dataset.
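The text does not spell out here how the "most representative" cases are chosen; one simple, commonly used criterion is proximity to the segment centroid. The NumPy sketch below selects up to 20 prototypes per segment under that assumption, on synthetic data:

```python
import numpy as np

def prototype_indices(X, labels, k, n_proto=20):
    """Indices of up to n_proto members of segment k closest to the
    segment centroid -- one simple notion of 'most representative'."""
    members = np.flatnonzero(labels == k)
    centroid = X[members].mean(axis=0)
    dists = np.linalg.norm(X[members] - centroid, axis=1)
    return members[np.argsort(dists)[:n_proto]]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))          # synthetic patient features
labels = rng.integers(0, 5, size=200)  # five hypothetical segments

proto_idx = prototype_indices(X, labels, k=0)
print(len(proto_idx), "prototypical cases selected for segment 0")
```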

5.1 Performance evaluation on engagement outcome prediction

To evaluate the performance of the BES pipeline, the predicted engagement outcome of each patient task is compared with what actually happened as indicated in the care management transaction records. The results across all patient tasks are then aggregated to evaluate the overall performance of the models in terms of precision, recall, accuracy and F1-score.
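A minimal sketch of this evaluation loop, using scikit-learn's `cross_val_predict` to aggregate out-of-fold predictions from 5-fold cross-validation into a single set of metrics (the data here are synthetic stand-ins, not the actual care management records):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)  # synthetic engagement outcomes

# Out-of-fold predictions from 5-fold CV, then one aggregated set of
# metrics across all patient tasks.
pred = cross_val_predict(SVC(), X, y, cv=5)
for name, metric in [("accuracy", accuracy_score),
                     ("precision", precision_score),
                     ("recall", recall_score), ("f1", f1_score)]:
    print(f"{name}: {metric(y, pred):.2f}")
```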

Two versions of the BES pipeline are evaluated to understand the trade-off between model performance and interpretability. The first, "Behavioral Profile-Driven" version trains engagement scoring models for each of the patient segments using all available data belonging to patients in that segment. The second, "Prototypical User-Driven" version trains models using only the prototypical patient cases. The performance of the two BES pipelines is also compared with the baseline condition, wherein the "Population-Based" version trains an SVM classification model for engagement outcome prediction using all available data, without differentiating patient segments.

Figure 7: Interpretable engagement insights for Care Managers for shared decision making.

Performance evaluation with 5-fold cross-validation is shown in Table 2. The results show that the BES pipeline yields engagement response prediction models of high precision, which implies a high percentage of successful engagements when following the BES recommendations for prioritization. It helps predict patient responses ("whether to engage") for each type of engagement task with high precision (>90%).

We also explore the potential effect of providing explainable insights on the performance of our models. The precision-based performance metrics of both the Behavioral Profile-Driven and Prototypical User-Driven versions are comparable to or better than those of the BASELINE, i.e., training engagement scoring models from the entire population data. The results are encouraging, as our proposed BES solution produces more explainable insights based on patient segments and prototypical patient cases without sacrificing model performance.

5.2 Drivers of Differential Patient Response

For each of the two care management tasks, we identify five behavioral profiles for tailoring care management strategies of patient engagement, and surface insights for each target patient based on the behavioral profile of his/her most closely related patient segment. To achieve this, we analyze the feature weights of the model trained for each segment to pinpoint the drivers that contribute the most to engagement outcome prediction. This generates interpretations that help Care Managers understand the rationale of the predictions offered by the behavioral engagement scorer in the pipeline.
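A minimal sketch of this driver analysis, assuming the per-segment GLM weights are available as a matrix; the feature names and weight values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical per-segment GLM weights (rows: segments, cols: features);
# the feature names are illustrative, not the actual BES features.
feature_names = ["cm_attempts", "goal_type", "call_day", "age", "gender"]
weights = np.array([
    [0.80, -0.10, 0.05, 0.02, 0.01],   # segment 0
    [0.10, -0.60, 0.20, 0.03, 0.02],   # segment 1
])

# Rank drivers within each segment by absolute weight magnitude.
for k, w in enumerate(weights):
    ranked = [feature_names[i] for i in np.argsort(-np.abs(w))]
    print(f"segment {k} top drivers: {ranked[:3]}")
```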

Fig 8 demonstrates the variability of differential patient response drivers across the different behavioral profiles for goal attainment. The feature rankings differ significantly when the population-level feature rankings are compared with those of the patient segments (as indicated by Spearman's coefficient ρ; p < 0.01). Most of the patient segments exhibit a more complex pattern of behavioral response than the population-level one.
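The rank comparison itself can be sketched with `scipy.stats.spearmanr`; the two rankings below are hypothetical placeholders, not the rankings from Fig 8:

```python
from scipy.stats import spearmanr

# Hypothetical feature rankings (rank of each of six features):
population_rank = [1, 2, 3, 4, 5, 6]   # population-level ordering
segment_rank = [4, 1, 6, 2, 5, 3]      # ordering within one segment

# A rho near 1 means the segment agrees with the population ordering;
# a low or negative rho signals segment-specific response drivers.
rho, p_value = spearmanr(population_rank, segment_rank)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```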



Figure 8: Feature weights that indicate patient response drivers across the five different behavioral profiles for goal attainment.

In some patient segments, we observe strong indicators of care manager influence, e.g., how long the care manager (CM) has been trying to help the patient attain a goal, or the number of attempts before goal attainment or before the CM decided to close a goal. In other patient segments, we observe that certain goals yield more positive engagement responses than others. For example, in the first and third segments, Self-care and Educational goals are more likely to be attained. The opposite has also been observed in some segments: in the second segment, for example, Medication-related goals are less likely to engage patients toward attainment. Moreover, call context matters in some segments. For example, for patients in the fourth segment, calling on a Tuesday is more likely to yield better engagement responses.

It is worth noting that although we include age and gender among the features in the analysis, the results show that patient demographics matter less than expected, contributing only neutrally to the modeling of engagement response prediction.

5.3 Interface and API for Shared Decision Making based on Engagement Insights

The BES pipeline also includes a web-based user interface that provides access to explainable insights about the gauged engagement level and the predicted response. In addition, the pipeline derives best-practice insights using the examples illustrated by the prototypical patient cases in each segment. Care managers can use the interface to learn about the target patients and their related patient segments for prioritization and best-practice learning.

Fig 7 shows the interactive tooling, based on a demo API customized for care managers, that supports pre-call decisions about which patients to call first for program enrollment and for goal attainment, and about the reasons why they might or might not be engaged.

6 CONCLUSION & FUTURE WORK

The main contributions of our paper are as follows. First, we present a quantitative and personalized approach to identifying patients who are more likely to engage in care management, and demonstrate empirically, using real-world data, that our methods provide more accurate engagement behavior predictions compared to a "one-size-fits-all" population-based approach. Second, these insights are made explainable by identifying prototypical patient cases within a personalized patient segment. Performance evaluation results show that, in our case, explainability does not come at the expense of model performance.

Analyzing observational transaction data of care management interaction logs with respect to clinical factors only overlooks an important part of the equation leading to better outcomes. It is expected that segment-level incidence rates might result in biased estimates of the effect of interventions. Applying the BES pipeline, which properly adjusts for individual and patient segment information, enables a more accurate estimate of engagement effects and supports care management decision-making in a shared decision-making scenario. The quantification of heterogeneous engagement effects in patient segments goes beyond the existing care quality metrics to add another perspective of behavioral understanding for providers using care management programs.

As for the issue of bridging the gap between human and machine decision making, simply optimizing model properties is not sufficient to warrant actions from health decision makers [5]. More research is needed to identify additional evaluation metrics that can serve as proxy measures of performance in real-life user tasks [16]. This line of research can help evaluate how to make sense of the models and analytical results in order to support decision makers' actions, as well as how to validate the derived insights directly to automate decisions. Doing so is truly interdisciplinary work, and is expected to advance future milestones in developing tools for creating deployment and feedback processes and aligning with the need to generate real-world evidence on best practice.

REFERENCES

[1] Agency for Healthcare Research and Quality. 2015. Implications for Medical Practice, Health Policy, and Health Services Research. Technical Report. Rockville, MD, USA.



Figure 9: Pipeline evaluation using prototypical patient cases.

[2] Randall S Brown, Deborah Peikes, Greg Peterson, Jennifer Schore, and Carol M Razafindrakoto. 2012. Six features of Medicare coordinated care demonstration programs that cut hospital admissions of high-risk patients. Health Affairs 31, 6 (2012), 1156–1166.

[3] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15. ACM Press, New York, NY, USA, 1721–1730. https://doi.org/10.1145/2783258.2788613

[4] Center for Health Care Strategies. 2007. Care Management Definition and Framework. Technical Report.

[5] Jonathan H. Chen and Steven M. Asch. 2017. Machine Learning and Prediction in Medicine: Beyond the Peak of Inflated Expectations. New England Journal of Medicine 376, 26 (Jun 2017), 2507–2509. https://doi.org/10.1056/NEJMp1702071

[6] Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. (Aug 2016). arXiv:1608.05745 http://arxiv.org/abs/1608.05745

[7] DARPA. [n. d.]. DARPA Explainable AI Program. Retrieved January 28, 2019 from https://www.darpa.mil/program/explainable-artificial-intelligence

[8] Sanjoy Dey, Kelvin Lim, Gowtham Atluri, Angus MacDonald, Michael Steinbach, and Vipin Kumar. 2012. A pattern mining based integrative framework for biomarker discovery. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB '12. ACM Press, New York, NY, USA, 498–505. https://doi.org/10.1145/2382936.2383000

[9] Jacob Feldman. 2000. Minimization of Boolean complexity in human concept learning. Nature 407, 6804 (Oct 2000), 630–633. https://doi.org/10.1038/35036586

[10] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics, New York, NY, USA.

[11] Google. [n. d.]. Google AI experiment: Visualizing high-dimensional space. Retrieved January 28, 2019 from https://experiments.withgoogle.com/ai/visualizing-high-dimensional-space

[12] G. Graffigna, S. Barello, A. Bonanomi, and E. Lozza. 2015. Measuring patient engagement: development and psychometric properties of the Patient Health Engagement (PHE) Scale. Frontiers in Psychology 6, 274 (2015).

[13] J. H. Hibbard, J. Stockard, E. R. Mahoney, and M. Tusler. 2004. Development of the Patient Activation Measure (PAM): conceptualizing and measuring activation in patients and consumers. Health Services Research 39, 4 Pt 1 (2004), 1005–1026.

[14] Pei-Yun S. Hsueh, S. Dey, S. Das, and T. Wetter. 2017. Making sense of patient-generated health data for interpretable patient-centered care: The transition from "More" to "Better". Vol. 245. https://doi.org/10.3233/978-1-61499-830-3-113

[15] Xinyu Hu, Pei-Yun S Hsueh, Ching-Hua Chen, Keith M Diaz, Ying-Kuen K Cheung, and Min Qian. 2017. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning. AMIA Annual Symposium Proceedings 2017 (2017), 930–939. http://www.ncbi.nlm.nih.gov/pubmed/29854160 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5977571

[16] Ravi Karkar, Jasmine Zia, Roger Vilardaga, Sonali R Mishra, James Fogarty, Sean A Munson, and Julie A Kientz. 2015. A framework for self-experimentation in personalized health. Journal of the American Medical Informatics Association (2015).

[17] David J Ketchen and Christopher L Shook. 1996. The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal 17, 6 (1996), 441–458.

[18] Been Kim, Cynthia Rudin, and Julie A. Shah. 2014. The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification. 1952–1960.

[19] Himabindu Lakkaraju, Stephen H. Bach, and Jure Leskovec. 2016. Interpretable Decision Sets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16. ACM Press, New York, NY, USA, 1675–1684. https://doi.org/10.1145/2939672.2939874

[20] Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. (Jun 2016). arXiv:1606.04155 http://arxiv.org/abs/1606.04155

[21] Bin Liu, Ying Li, Zhaonan Sun, Soumya Ghosh, and Kenney Ng. 2018. Early Prediction of Diabetes Complications from Electronic Health Records: A Multi-Task Survival Analysis Approach. AAAI (2018). https://www.semanticscholar.org/paper/Early-Prediction-of-Diabetes-Complications-from-A-Liu-Li/28dec33fc71b9139e7e1b6c4a1b32b2947c53176

[22] Peter V Long. 2017. Effective Care for High-need Patients: Opportunities for Improving Outcomes, Value, and Health. National Academy of Medicine.

[23] Yen-Fu Luo and Anna Rumshisky. [n. d.]. Interpretable Topic Features for Post-ICU Mortality Prediction. http://www.cs.uml.edu/~arum/publications/YFLuo_AMIA_2016.pdf

[24] Michigan Care Management Resource Center Home. [n. d.]. Patient Engagement. Retrieved January 28, 2019 from https://micmrc.org/topics/patient-engagement-0

[25] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?". In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16. ACM Press, New York, NY, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778

[26] James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. 1994. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. J. Amer. Statist. Assoc. 89, 427 (Sep 1994), 846. https://doi.org/10.2307/2290910

[27] Manu Sridharan and Gerald Tesauro. 2002. Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions. Springer, Boston, MA, 217–234. https://doi.org/10.1007/978-1-4615-1107-6_11

[28] Jimeng Sun, Daby Sow, Jianying Hu, and Shahram Ebadollahi. 2010. Localized supervised metric learning on temporal physiological data. In Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 4149–4152.



[29] Jimeng Sun, Fei Wang, Jianying Hu, and Shahram Ebadollahi. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter 14, 1 (2012), 16–24.

[30] Jaakko Tuomilehto, Jaana Lindström, Johan G Eriksson, Timo T Valle, Helena Hämäläinen, Pirjo Ilanne-Parikka, Sirkka Keinänen-Kiukaanniemi, Mauri Laakso, Anne Louheranta, Merja Rastas, et al. 2001. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. New England Journal of Medicine 344, 18 (2001), 1343–1350.

[31] Jeffrey A West, Nancy H Miller, Kathleen M Parker, Deborah Senneca, Ghassan Ghandour, Mia Clark, George Greenwald, Robert S Heller, Michael B Fowler, and Robert F DeBusk. 1997. A comprehensive management system for heart failure improves clinical outcomes and reduces medical resource utilization. American Journal of Cardiology 79, 1 (1997), 58–63.
