
Using Visual Analytics to Interpret Predictive Machine Learning Models

Josua Krause JOSUA.KRAUSE@NYU.EDU

New York University Tandon School of Engineering, Brooklyn, NY 11201 USA

Adam Perer ADAM.PERER@US.IBM.COM

IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 USA

Enrico Bertini ENRICO.BERTINI@NYU.EDU

New York University Tandon School of Engineering, Brooklyn, NY 11201 USA

Abstract

It is commonly believed that increasing the interpretability of a machine learning model may decrease its predictive power. However, inspecting input-output relationships of those models using visual analytics, while treating them as black boxes, can help to understand the reasoning behind outcomes without sacrificing predictive quality. We identify a space of possible solutions and provide two examples of where such techniques have been successfully used in practice.

1. Introduction

There is growing demand among data scientists to generate and deploy predictive models in a variety of domains so that the patterns unearthed from massive amounts of data can be leveraged and converted into actionable insights. Predictive modeling is defined as the process of developing a mathematical tool or model that generates an accurate prediction (Kuhn & Johnson, 2013). Data scientists often turn to machine learning, where the goal is to create predictive models based on information automatically learned from data with ground truth. However, these machine learning techniques are often black boxes and may be selected based only on performance metrics such as high accuracy scores, and not necessarily based on the interpretability and actionable insights of the model.

Recently, we have worked on a variety of techniques to make predictive models more interpretable by bringing humans in the loop through visual analytics. In this paper, we provide initial reflections on interpretability and the role visual analytics plays in this space based on our experience building such solutions. We first reflect on the role and meaning of interpretation and model transparency, then we reflect on the role of visual analytics in this space, then we describe two visual analytics systems we developed for model interpretation, and finally we conclude by proposing interesting questions for further research.

2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, USA. Copyright by the author(s).

2. Why and when is interpretation needed?

It is important to start this discussion by clarifying that interpretation may not always be necessary in machine learning. There are plenty of situations in which building a model, testing and refining it, and finally putting it in production is absolutely appropriate (e.g., chess playing or face recognition). It is also important to point out that interpretation is necessarily a human activity and, as such, it may be costly and error prone. When, then, do we need human interpretation? Why do we need to involve humans in the machine learning process? Without loss of generality, we have identified three main opportunities/needs in which interpretability is a highly desirable feature:

1. Data understanding and discovery. Machine learning is typically used as a tool to make predictions, but it does not have to be used exclusively for this purpose. Machine learning models can also be used as a way to help understand and observe complex realities by providing abstractions that can be used by humans to enable interpretation and discovery.

2. Trust building and accountability. In some mission-critical situations, e.g., when models make important decisions about human beings, it is important to have a better understanding of what a model does and why. Increasing model interpretation and transparency can play a role in increasing trust and accountability.

3. Model comparison and diagnostics. Model developers often need to understand where models fail to make correct decisions and also how multiple models differ. This is another circumstance in which interpretability can play a positive role.

Figure 1. Debugging model performance using partial dependence. Instead of a direct relationship between higher Glucose lab values (x-axis) and higher risk scores (y-axis), the model predicts a low risk for average Glucose lab values. The histogram below the plot, showing the distribution of the values found in the input data, indicates that most patients have the average value. Since missing values are imputed using the average Glucose value, the valley in the plot can be explained by the outcome independence of this value due to the high number of missing values.

It is important to note that many subjective properties play a role in model interpretation: e.g., plausibility, comprehension, trustworthiness, actionability.

3. Model Transparency, Representation, and Interpretability

While a machine learning technique is often defined as being more or less interpretable, it is important to point out that it is hard to assign a level of interpretability to a whole class of techniques. For instance, classification rules are considered a highly interpretable method, but how does one interpret the results of classification rules when the number of rules is very high (e.g., on the order of hundreds or even thousands)? Similarly, neural networks are commonly regarded as having low interpretability, yet recent visualization methods allow one to look into some of the internal decisions the network makes, which increases the interpretability of the model (Yosinski et al., 2015; Xu et al., 2015; Zeiler & Fergus, 2013).

We therefore propose to make the following distinctions. First, we propose to distinguish between model structure and model representation. As we will explain with our examples, it is possible through visual analytics to represent a model even without having access to its internal logic or structure. Therefore, we propose that its interpretability cannot be defined exclusively by what specific training method was used or what internal representation/structure has been derived. Second, we propose to describe models as having different degrees of transparency, from fully opaque to fully transparent, and to define their representations as more or less interpretable.

Figure 2. Identifying changes to features that reduce the risk of a high-risk patient. The features are sorted by decreasing impact to make large sets of features manageable. The background color of each feature indicates the predicted risk for this value. Impossible values, if known, appear grayed out, and the slider snaps back to the last possible value if selected.

This observation is particularly important when we consider the commonly held belief that interpretability and accuracy in machine learning are at odds (Breiman, 2001). While we cannot definitively refute such a claim here, we believe it is important to consider that this may actually be a false dichotomy. If models can be interpreted by looking at input/output behavior, disregarding what internal structures produce such behavior, then it is possible to use highly accurate models and yet retain interpretability.

4. The Role of Visual Analytics in Interpretation

Visual analytics uses the power of visual perception as a way to involve humans in solving complex problems with data. Such human involvement, broadly speaking, can be desirable for two reasons: (1) because humans are better than machines at solving some tasks; (2) because human understanding and interpretation is desired. By designing perceptually effective visual representations (Munzner, 2014; Treisman, 1985; Ware, 2004), we can enable fast, accurate, and hopefully trustworthy interpretation of machine learning models.


Figure 3. Using class signatures to describe groups of patients admitted to the hospital because of different medications. Each column represents one group (cluster-step), whereas each row shows the number of patients in this group taking a particular medication (the bar from the middle towards the right shows the percentage of patients taking the medication; the bar towards the left shows the percentage not taking it). The color of the bar shows the distribution of the true outcome labels as found in the input data. The background of the rows shows the discriminativeness of a medication (darker being more discriminative with respect to all other clusters; rank-step). Above each group, a t-SNE projection of the items shows its relation to the other groups. ROC, precision-recall, and accuracy curves are shown on the left to facilitate selecting two thresholds to filter high-signal patients (contrast-step).

We identify two generic modalities that can be used for model interpretation with visual analytics:

1. Visualizing Model Structure (White-Box). For transparent models, e.g., decision trees, one option is to use visualization to represent the structure built through the training method. Several examples exist in this area, especially for decision trees and rules (e.g., van den Elzen & van Wijk, 2011; Yang, 2005; Liu et al., 2006).

2. Visualizing Model Behavior (Black-Box). Another option is to use visualization as a way to look at the behavior of the model by looking exclusively at the relationship between input and output.

In the following, we focus exclusively on the second case.

4.1. Visualizing Model Behavior (Input/Output)

Model interpretation through visualization of the input/output behavior of a model has a number of interesting properties, and it has so far received comparatively less attention than white-box approaches.

The most interesting property of this approach is that it does not depend on what specific method has been used to train a model and, as such, it has the big advantage of being extremely flexible and generic. Furthermore, by creating methods that enable interpretation of models by looking exclusively at their behavior, we can study the role of interpretation independently from model representation.

We have identified three main mechanisms through which model behavior can be observed and analyzed:

1. Item(s) to outcome. In this modality, the user "probes" the model by creating input examples with desired properties and observing what output the model generates (e.g., an input image obtained from a camera or a patient descriptor obtained by specifying a given set of properties).

2. Single feature to outcome. In this modality, the user inspects and observes the relationship between one feature at a time and how its values relate (correlate) to the outcome. In our first example below (Prospector), we show how this can be done in practice.

3. Multiple features to outcome. Finally, in this modality, the visualization aims at representing the relationship between many features and their values and how they relate to the outcome. In our second example below (Class Signatures), we show one way this idea can be realized in practice. A minimal code sketch of all three probing modalities follows this list.
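The three modalities above can all be exercised with nothing more than a model's prediction function. The following sketch is purely illustrative: the toy data, the choice of GradientBoostingClassifier, and the probe values are assumptions rather than part of the original systems; any black-box scoring function could take the model's place.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                    # toy training data
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy ground truth
model = GradientBoostingClassifier().fit(X, y)   # treated as a black box below

def score(Z):
    return model.predict_proba(Z)[:, 1]          # the only access we assume

# 1. Item(s) to outcome: probe the model with a hand-crafted input.
probe = np.array([[1.2, -0.3, 0.0, 0.5]])
print("probe score:", score(probe)[0])

# 2. Single feature to outcome: sweep one feature, keep the rest fixed.
for v in np.linspace(-2, 2, 5):
    x = probe.copy()
    x[0, 0] = v
    print("feature 0 =", round(v, 2), "->", score(x)[0])

# 3. Multiple features to outcome: sweep a grid over two features jointly.
for v0 in np.linspace(-2, 2, 3):
    for v1 in np.linspace(-2, 2, 3):
        x = probe.copy()
        x[0, 0], x[0, 1] = v0, v1
        print((v0, v1), "->", score(x)[0])
```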

An additional aspect worth mentioning in relation to these three mechanisms is how the data necessary to observe input/output behavior is obtained. Here, we have identified three main options (not mutually exclusive): training data, test data (hold-out), and simulated data.

While the first two types of data are very common in machine learning training and validation steps, we notice that simulated data is much less common.


By simulated data we mean data that is synthetically generated (and as such may not belong to training or test data) by letting the user specify its properties. One example of this situation is given below (Prospector), in which, in a disease prediction task based on electronic health records, the user can specify a "fictional" patient by dragging sliders that define his or her values.

Being able to work with simulated data seems to be a particularly useful and promising direction of research when we consider the idea of probing machine learning models through visual analytics.

5. Prospector

Prospector is a novel visual analytics system designed to help analysts better understand predictive models (Krause et al., 2016). Prospector aims to support data scientists in going beyond judging predictive models solely on their accuracy scores by also including model interpretability and actionable insights. It leverages the concept of partial dependence (Friedman, 2001), a diagnostic technique that was designed to communicate how features affect the prediction, and makes this technique fully interactive.

Figure 1 shows how partial dependence can be used to debug machine learning models in Prospector. In this example, imputation of missing values created unexpected behavior of the inspected classifier. Partial dependence is given by

$$\mathrm{pdp}_f(v) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{pred}(x_i) \quad \text{with} \quad x_{if} = v$$

where N is the number of rows in the input matrix x, pred is the prediction function that takes one input row (a feature vector) and returns a prediction score, and f is the feature used to compute the partial dependence plot. The formula computes the average outcome over all input rows while changing the value of feature f to the input value v for each row x_i. The original input data is otherwise kept fixed. This allows for observing the influence of f on the prediction scores. Unlike generalized additive models, e.g., (Caruana et al., 2015), this technique is model agnostic.
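Computed directly from this definition, partial dependence needs only the prediction function and the input matrix. The following is a minimal sketch under the same assumptions as the previous example; the function and variable names are illustrative and not taken from Prospector's implementation.

```python
import numpy as np

def partial_dependence(predict, X, f, values):
    """pdp_f(v): average prediction over all rows with feature f forced to v."""
    scores = []
    for v in values:
        X_mod = X.copy()
        X_mod[:, f] = v                       # overwrite feature f in every row
        scores.append(predict(X_mod).mean())  # average score over all rows
    return np.asarray(scores)

# Example usage with the toy model and score() function defined earlier:
# values = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
# pdp = partial_dependence(score, X, f=0, values=values)
```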

Prospector also supports localized inspection, so users can understand why certain data results in a specific prediction, and even lets users hypothesize new data by tweaking values and seeing how the predictive model responds. Users can interactively tweak feature values and see how the prediction responds, as well as find the most impactful features using a novel model-agnostic local feature importance metric that only depends on partial dependence.
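One way to realize this kind of localized, "what-if" inspection programmatically is to sweep a single feature of one data point and watch the score move. The sketch below is only an approximation of the idea: the range-based importance proxy and the helper names are assumptions, not the local feature importance metric actually used in Prospector.

```python
import numpy as np

def local_response(predict, x, f, values):
    """Prediction scores for one item x as feature f is swept over `values`."""
    probes = np.tile(x, (len(values), 1))   # copies of x, one per probe value
    probes[:, f] = values
    return predict(probes)

def local_importance(predict, x, f, values):
    """Crude local impact: how far the score can move by changing f alone."""
    r = local_response(predict, x, f, values)
    return float(r.max() - r.min())

def most_impactful_change(predict, x, f, values):
    """Value of f that maximizes the score for this item (a suggested change)."""
    r = local_response(predict, x, f, values)
    return float(values[int(np.argmax(r))]), float(r.max())
```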

Figure 2 shows the prediction inspection portion of the Prospector UI, which allows users to examine the features contributing to the prediction of a selected data point. All of the features' partial dependence bars are shown in a scrollable list, with the data's feature values set with circular labels. Users can drag the circular label to change the value of any feature and see the prediction change in real time. Users can change the sort order of the partial dependence bars by using the buttons at the top. In addition to sorting by the feature weight and relevance as determined by the predictive model, if available, users can also sort according to our local feature importance and impactful changes. If impactful changes are chosen as the sort order, the suggested changes to each feature are indicated with a white circular label in the partial dependence bar.

Figure 4. Using the receiver operating characteristic curve (ROC curve) for determining filter thresholds. The curve shows how the true positive rate (TPR; y-axis) and the false positive rate (FPR; x-axis) change with decreasing thresholds (the threshold values are not shown in the graph). The extended contingency matrix on the right shows the actual numbers of true positives, false positives, true negatives, and false negatives.

6. Class Signatures

With Class Signatures, we propose a visual analytics workflow to interpret predictive associations between a large set of binary features and a binary target. For this we use a four-step pipeline: model, contrast, cluster, and rank, and a visual analytics interface that allows the end user to detect and interpret such associations. After modeling the predictive associations using a binary classifier, we leverage the prediction scores with two user-defined thresholds, one for positive cases and one for negative cases, to focus only on data items with a strong predictive signal, increasing contrast. Then, we cluster the positive and negative examples separately. This groups together data points that have the same predicted outcome and a similar configuration of values. Finally, we rank each feature in the computed clusters using discriminative analysis across all clusters.
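The first three steps of this pipeline can be sketched with standard tooling. The concrete choices below, a logistic-regression model, hand-picked score thresholds, and k-means with a fixed number of clusters, are assumptions made for illustration; the workflow itself does not prescribe them, and the ROC curve (as in Figure 4) is only one possible aid for picking the two thresholds.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import roc_curve

def contrast_and_cluster(X, y, pos_thresh=0.8, neg_thresh=0.2, k=3, seed=0):
    # Model: fit a binary classifier on binary features X and binary target y.
    scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    # An ROC curve over the scores (as in Figure 4) can help a user pick the
    # two thresholds interactively; here they are simply fixed arguments.
    roc = roc_curve(y, scores)  # (fpr, tpr, thresholds)

    # Contrast: keep only items with a strong predictive signal.
    pos = X[scores >= pos_thresh]
    neg = X[scores <= neg_thresh]

    # Cluster: group positive and negative high-signal items separately.
    def cluster(items):
        return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(items)

    return (pos, cluster(pos)), (neg, cluster(neg)), roc
```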

For interpreting the results with visual analytics, we use Class Signatures as shown in Figure 3. As our input features are binary in nature, we show in the class signatures how consistently present a feature is in a cluster. This is indicated by bars growing both to the right (the percentage of items for which the feature is present) and to the left (the percentage for which it is not present). In combination with the discriminative measure of the features (computed as Gini importance and visually encoded as the shade of the feature backgrounds, where darker means more discriminative), users can formulate rules that explain predictions for different sub-groups of data items.
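The discriminativeness shading can be grounded in a simple computation. The one-vs-rest Gini formulation below is our own reading of the rank step and should be taken as an assumption: it scores each binary feature by how much splitting on it reduces Gini impurity when separating one cluster from all others.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a binary 0/1 label vector."""
    if len(labels) == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def feature_discriminativeness(X, cluster_labels, cluster_id):
    """Gini decrease of each binary feature for cluster_id vs. all other clusters."""
    target = (cluster_labels == cluster_id).astype(float)
    scores = []
    for f in range(X.shape[1]):
        present, absent = target[X[:, f] == 1], target[X[:, f] == 0]
        w = len(present) / len(target)
        split = w * gini(present) + (1.0 - w) * gini(absent)
        scores.append(gini(target) - split)   # larger = more discriminative (darker)
    return np.asarray(scores)
```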

The proposed workflow allows for a more fine-grained analysis of the driving factors of a predictive task than commonly used feature importance techniques. This is due to the observation that many phenomena have multiple underlying reasons for the same result. Thus, an explanation is needed that distinguishes which features were actually responsible for given data points. Class Signatures provide this distinction in the form of user-interpretable rules.

7. Conclusion

We have provided an initial characterization of how visual analytics can be used for model interpretation, with a focus on visualizing input/output behavior rather than model structure. To exemplify these approaches, we presented two practical examples of systems we built to understand the reasoning of classification algorithms while treating them as black boxes. As such, these examples provide initial anecdotal evidence that we do not necessarily need to accept lower prediction performance in order to gain in interpretability (Breiman identifies an inverse relation between model performance and interpretability of a machine learning model (Breiman, 2001)). Of course, there is much future work to do, as the space of possible solutions for analyzing and understanding input-output relationships of machine learning models using visual analytics is largely unexplored. We identify two major research trajectories to develop these ideas further: (1) develop and validate more solutions that exploit this specific modality, and (2) study human interpretation through user studies aimed at better defining interpretation and establishing metrics useful for comparison of alternative solutions.

References

Breiman, Leo. Statistical modeling: The two cultures. Statistical Science, 16(3):199–231, Aug 2001.

Caruana, Rich, Lou, Yin, Gehrke, Johannes, Koch, Paul, Sturm, Marc, and Elhadad, Noemie. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pp. 1721–1730, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3664-2.

Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, Oct 2001.

Krause, Josua, Perer, Adam, and Ng, Kenney. Interacting with predictions: Visual inspection of black-box machine learning models. ACM CHI 2016, 2016.

Kuhn, M. and Johnson, K. Applied Predictive Modeling. Springer London, Limited, 2013.

Liu, Bing, Zhao, Kaidi, Benkler, Jeffrey, and Xiao, Weimin. Rule interestingness analysis using OLAP operations. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 297–306. ACM, 2006.

Munzner, Tamara. Visualization Analysis and Design. A K Peters/CRC Press, Natick, MA, USA, 2014. ISBN 9781466508910.

Treisman, Anne. Preattentive processing in vision. Comput. Vision Graph. Image Process., 31(2):156–177, Aug 1985. ISSN 0734-189X. doi: 10.1016/S0734-189X(85)80004-9. URL http://dx.doi.org/10.1016/S0734-189X(85)80004-9.

van den Elzen, Stef and van Wijk, Jarke J. BaobabView: Interactive construction and analysis of decision trees. In Visual Analytics Science and Technology (VAST), IEEE Conference on, pp. 151–160, Oct 2011.

Ware, Colin. Information Visualization: Perception for Design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004. ISBN 1558608192.

Xu, Kelvin, Ba, Jimmy, Kiros, Ryan, Cho, Kyunghyun, Courville, Aaron, Salakhutdinov, Ruslan, Zemel, Rich, and Bengio, Yoshua. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2048–2057, 2015.

Yang, Li. Pruning and visualizing generalized association rules in parallel coordinates. Knowledge and Data Engineering, IEEE Transactions on, 17(1):60–70, 2005.

Yosinski, Jason, Clune, Jeff, Nguyen, Anh, Fuchs, Thomas, and Lipson, Hod. Understanding neural networks through deep visualization. In Deep Learning Workshop, International Conference on Machine Learning (ICML), 2015.

Zeiler, Matthew D. and Fergus, Rob. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013. URL http://arxiv.org/abs/1311.2901.
