Simultaneous Measurement Imputation and Outcome Prediction...

26
Proceedings of Machine Learning Research 106:126, 2019 Machine Learning for Healthcare Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation Charles Hamesse [email protected] KTH Royal Institute of Technology Royal Military Academy Stockholm, Sweden Brussels, Belgium Ruibo Tu [email protected] KTH Royal Institute of Technology Stockholm, Sweden Paul Ackermann [email protected] Karolinska University Hospital Stockholm, Sweden Hedvig Kjellstr¨ om [email protected] KTH Royal Institute of Technology Stockholm, Sweden Cheng Zhang [email protected] Microsoft Research Cambridge, UK Abstract Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries. Rehabilitation after such a musculoskeletal injury remains a prolonged process with a very variable out- come. Accurately predicting rehabilitation outcome is crucial for treatment decision sup- port. However, it is challenging to train an automatic method for predicting the ATR rehabilitation outcome from treatment data, due to a massive amount of missing entries in the data recorded from ATR patients, as well as complex nonlinear relations between measurements and outcomes. In this work, we design an end-to-end probabilistic frame- work to impute missing data entries and predict rehabilitation outcomes simultaneously. We evaluate our model on a real-life ATR clinical cohort, comparing with various baselines. The proposed method demonstrates its clear superiority over traditional methods which typically perform imputation and prediction in two separate stages. 1. Introduction Soft tissue injuries, such as Achilles Tendon Rupture (ATR), are increasing in recent decades (Huttunen et al., 2014). Such injuries require lengthy healing processes with abundant com- plications, which can cause severe incapacity in individuals. Influences of various factors such as patient demographics and dierent treatment methods are not clear for the re- habilitation outcome due to large variations in symptoms and the long healing process. Additionally, many medical examinations are not carried out for a large portion of patients Both authors contributed equally to this manuscript. c 2019 C. Hamesse, R. Tu, P. Ackermann, H. Kjellstr¨om & C. Zhang.

Transcript of Simultaneous Measurement Imputation and Outcome Prediction...

Page 1: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Proceedings of Machine Learning Research 106:1–26, 2019 Machine Learning for Healthcare

Simultaneous Measurement Imputation and Outcome

Prediction for Achilles Tendon Rupture Rehabilitation

Charles Hamesse⇤

[email protected]

KTH Royal Institute of Technology Royal Military AcademyStockholm, Sweden Brussels, Belgium

Ruibo Tu⇤

[email protected]

KTH Royal Institute of TechnologyStockholm, Sweden

Paul Ackermann [email protected]

Karolinska University HospitalStockholm, Sweden

Hedvig Kjellstrom [email protected]

KTH Royal Institute of TechnologyStockholm, Sweden

Cheng Zhang [email protected]

Microsoft Research

Cambridge, UK

Abstract

Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries. Rehabilitationafter such a musculoskeletal injury remains a prolonged process with a very variable out-come. Accurately predicting rehabilitation outcome is crucial for treatment decision sup-port. However, it is challenging to train an automatic method for predicting the ATRrehabilitation outcome from treatment data, due to a massive amount of missing entriesin the data recorded from ATR patients, as well as complex nonlinear relations betweenmeasurements and outcomes. In this work, we design an end-to-end probabilistic frame-work to impute missing data entries and predict rehabilitation outcomes simultaneously.We evaluate our model on a real-life ATR clinical cohort, comparing with various baselines.The proposed method demonstrates its clear superiority over traditional methods whichtypically perform imputation and prediction in two separate stages.

1. Introduction

Soft tissue injuries, such as Achilles Tendon Rupture (ATR), are increasing in recent decades(Huttunen et al., 2014). Such injuries require lengthy healing processes with abundant com-plications, which can cause severe incapacity in individuals. Influences of various factorssuch as patient demographics and di↵erent treatment methods are not clear for the re-habilitation outcome due to large variations in symptoms and the long healing process.Additionally, many medical examinations are not carried out for a large portion of patients

⇤ Both authors contributed equally to this manuscript.

c� 2019 C. Hamesse, R. Tu, P. Ackermann, H. Kjellstrom & C. Zhang.

Page 2: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

since they can be costly and/or painful. Thus, accurately predicting the ATR rehabilitationoutcome at di↵erent stages using existing measurements is highly interesting, and can beused for decision support for practitioners. Moreover, ATR is one example of a wider classof medical conditions. In these situations, patients first need acute treatments, then gothrough a long-term and uncertain rehabilitation process (Horstmann et al., 2012). Deci-sion support tools for practitioners are in general of great need, and outcome predictionplays an important role in the medical decision-making.

Predicting ATR rehabilitation outcomes is extremely challenging for both medical ex-perts and machines. This is mainly due to large number of noisy or absent measurementsfrom various medical instruments. Medical tests and outcome scores for ATR involve a largevariety of metrics. The total number of those metrics is on the magnitude of hundreds. How-ever, only a subset of all possible medical measurements are used for a patient. Thus, theobservations are very sparse. In this work, we use an ATR cohort which is collected frommultiple hospitals in the past five years. The sparsity of this cohort is the consequence ofseveral phenomena: firstly, they are aggregated from di↵erent studies realized by di↵erentclinicians who have di↵erent procedures; secondly, some measurements can be painful, costlyor time-consuming such that not all patients are willing to take them. Such phenomena arecommon in many medical cohorts. Moreover, among those measurements, many are noisyand establish highly non-linear relationship to the rehabilitation outcome, which makes theoutcome prediction task di�cult.

Leveraging data-driven approaches, we design machine learning models to predict po-tential ATR rehabilitation outcomes with sparse and noisy data for patients, and providedecision support for practitioners. In particular, we develop a probabilistic model to addresstwo problems at once: imputing the missing values of the costly medical measurements forpatients, and predicting patients’ final rehabilitation outcome. We focus on predicting theATR rehabilitation outcome, while our framework can be further applied to a wider domainof conditions beyond ATR.

Technical significance. We propose a novel probabilistic framework where probabilisticmatrix factorization is combined with a Bayesian Neural Network (BNN) for rehabilitationoutcome prediction with the noisy and sparse dataset. Our method shows clear improvementfor this task comparing to traditional methods. For outcome prediction with such a cohort,traditional methods commonly need two stages: Firstly, missing values are imputed usingmethods such as mean-imputing or zero-imputing; secondly, a linear model is used to predictthe outcome with the imputed data (Arverud et al., 2016; Bostick et al., 2010). Thesemethods commonly lead to a low prediction quality because of the low imputation qualityand the linear relationship assumption between measurements and outcomes.

Our framework simultaneously imputes the missing values and predicts the rehabilita-tion outcomes. We first adopt a probabilistic latent variable model to predict the missingentries in the hospital stay measurements. Prediction based methods, such as using la-tent variable models, in general, demonstrate superior performance for data imputationcompared to traditional methods such as mean imputation (Sche↵er, 2002; Buuren andGroothuis-Oudshoorn, 2010; Keshavan et al., 2010; Ma et al., 2018). These latent variablessummarize patients’ underlying health situation in a low-dimensional space and impute themissing entries based on the patient status. We then combine the latent variable imputation

2

Page 3: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

model with BNN to predict ATR rehabilitation outcomes as one integrated probabilisticmodel. BNN is highly flexible, thus can handle non-linear relationships between rehabilita-tion outcomes and measurements. Moreover, our model is fully probabilistic, thus providesuncertainty estimation of the prediction results. In an end-to-end manner, our frameworkprovides significant improvement in the clinical standard.

Clinical relevance. ATR rehabilitation is a prolonged process with unpredictable vari-ation in the individual long-term outcome. The optimal and individualized rehabilitationprotocol is unknown and therefore inappropriate treatments may often be provided leadingto worse outcome for the patient and increased cost for society. In this case, prediction ofATR rehabilitation outcome can shorten the healing process by helping clinicians to choosee↵ective treatments based on varying patient characteristics.

There is a large number of ATR treatments and rehabilitation protocols and also assess-ments made on the patients. Choosing suitable treatments for patients is still challenging.Moreover, di↵erent measurements may vary in price and time to perform. Given imputedvalues for measurements and the predicted values for outcomes with calibrated uncertaintyusing our method, the clinicians can make decisions on the patient treatment and moni-toring more easily. For example, if the model predicts an unobserved measurement valuewith high confidence and the predicted value is in a clinical normal range, the clinician doesnot need to measure this value anymore. Thus, time and cost are saved by not perform-ing unnecessary medical assessments. Otherwise, if the prediction indicates any abnormalsituation or high uncertainty for an important measurement, it is worthwhile to apply thismedical instrument and obtain the measurement value for this patient. The predicted out-come also helps the clinician to better estimate the patient status in general and aids thetreatment decisions.

The paper is structured as follows: We discuss related work (Section 2) and describethe ATR cohort (Section 3). We then introduce the proposed model (Section 4). Finally weevaluate our proposed method against multiple baselines. The experimental results demon-strate clear improvement for ATR rehabilitation outcome prediction using our proposedmodel (Section 5).

2. Related Work

Our work focuses on utilizing machine learning methods for ATR rehabilitation outcomeprediction. We use a latent variable model based on probabilistic matrix factorization toaddress the missing entry problem, and then use the estimated patient state to predict therehabilitation outcome through BNN. There is very limited work on using machine learningto address the ATR outcome prediction. We revisit the related work in the following threeaspects: ATR analysis, AI in a generic health-care setting and missing value imputation,which is a key component for this type of applications.

Achilles tendon rupture analysis. Numerous studies have been carried out on under-standing the treatment and rehabilitation of ATR due to its importance in health-care.However, most studies are performed with a clinical approach, and use traditional statis-tic analysis, typically linear regression. Machine learning based approaches have not beenwidely adopted in the field of ATR research. As such, tools for rehabilitation outcome pre-

3

Page 4: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

diction using machine learning are of great interest. Here, we briefly review some relatedwork on ATR.

Olsson et al. (2014) employ linear regression to predict the rehabilitation outcome usingvariables such as age, sex, body mass index (BMI) or physical activity. The result shows thatusing traditional statistical models such as linear regression yields a limited prediction abilitydespite having a wide range of clinically relevant variables. A more recent study showsthat assessing clinical markers of tendon callus production (procollagen type I N-terminalpropeptide (PINP) and type III N-terminal propeptide (PIIINP)) shortly after operationcan help improve the prediction of long-term patient-reported outcomes applying multiplelinear regression on Achilles Tendon Total Rupture Score (ATRS) one year post-injury (Alimet al., 2016). Additionally, microcirculation in the tendon was also shown to be a strongpredictor of the patient outcome after ATR (Praxitelous et al., 2017). Although insightful,this research also shows that some accurate measurements, such as microcirculation, canbe expensive or di�cult to obtain. Therefore, utilizing a large range of cost-e�cient datato predict the rehabilitation outcome is desirable.

AI in health-care. There is a broad spectrum of machine learning methods used forgeneric medical applications. When dealing with large amounts of data, deep learningalgorithms show promising results. For example, long short-term memory networks (LSTM)and convolutional neural networks (CNN) has been applied to various clinical tasks such asmortality prediction in ICU setting where the data are often gathered from sensor readings(Suresh et al., 2017; Chalapathy et al., 2016; Jo et al., 2017; Purushotham et al., 2017).

However, health-care datasets often have a limited number of patients with large num-bers of variables from di↵erent instruments. This often leads to datasets with many missingentries. At the same time, being able to encode existing medical research results in newmodels and providing interpretable results are desirable features in many health-care re-lated applications. In this case, probabilistic models are needed. Depending on the medicalcontext, di↵erent types of models are used. For example, Lasko (2014) employs Gaussianprocesses to predict irregular and discrete medical events. Schulam and Saria (2015) de-sign a hierarchical latent variable model to predict the trajectory of an individual’s disease.These models are developed for di↵erent medical contexts and are not directly applicableto our application setting. In this work, we use ATR as an example and design a model topredict patients’ rehabilitation outcomes after acute treatments.

Missing value imputation. Most real-life medical cohorts have a large amount of miss-ing values. Traditional methods such as zero imputation or mean imputation ease theanalysis but may lead to low imputation accuracy. For the datasets with missing values,matrix factorization based methods are shown to be e↵ective for many missing value impu-tation applications (Shi et al., 2016; Troyanskaya et al., 2001), and frequently used for otherapplications of the matrix completion problem, i.e., collaborative filtering (Ocepek et al.,2015). Many e�cient algorithms have been proposed, such as Singular Value Thresholding(SVT) (Cai et al., 2010), Fixed Point Continuation (FPC) (Ma et al., 2011), and Inex-act Augmented Lagrange Multiplier (IALM) (Lin et al., 2010). Typically, these methodsconstruct a matrix factorization objective and optimize it using traditional convex optimiza-tion techniques. Extensions, such as Singular Value Projection (SVP) (Jain et al., 2010)and OptSpace (Keshavan and Oh, 2009) consider observation noise in the objective. How-

4

Page 5: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

ever, the sparse dataset can damage the performance of matrix factorization based methods(Mnih and Salakhutdinov, 2008). In this case, probabilistic matrix factorization (Mnih andSalakhutdinov, 2008) is an alternative solution for sparse and imbalanced datasets.

In this work, we combine a probabilistic matrix factorization approach similar to thatof Matchbox (Stern et al., 2009), with a supervised learning approach using models such asBayesian neural networks (Neal, 2012). Therefore, we can impute missing values based onlatent patient traits with the sparse ATR dataset and predict the rehabilitation outcome inan end-to-end probabilistic framework.

3. Cohort

In our work, the cohort is a real-life dataset collected from multiple previous studies by anorthopedic group (Valkering et al., 2017; Domeij-Arverud et al., 2016). A snapshot of thedataset is shown in Figure 1. There are 442 patients in the dataset (N = 442). The numberof measurements is M = 297, and the number of the outcome scores is S = 63. We denotethe first N ⇥ M part of the dataset as the predictors, P, and the second N ⇥ S part asthe scores, S. The percentage of missing values is 69.5% in the predictors and 64.2% in thescores.

Length Weight . . . DVT 2 . . . ATRS 12 sti↵1 190 79.8 . . . ⇥ . . . 82 ⇥ 76.5 . . . 0 . . . ⇥3 ⇥ ⇥ . . . 1 . . . 104 178 96.7 . . . 0 . . . ⇥

Figure 1: A snapshot of the Achilles Tendon Rupture (ATR) cohort. Each row representsa patient’s medical record and each column represents a measurement. In thisexample, DVT 2 refers to the presence of deep venous thrombosis after two weeksand the ATRS 12 sti↵ measurements refer to the Achilles Tendon Rupture Score(ATRS) metrics of sti↵ness after 12 months. ⇥ indicates the entry is missing.

Problem setting. We review a typical case of patient journey first and then introducethe problems. ATR patients typically go to hospital to get a treatment immediately afteran injury. There, their demographic data are registered. As part of the treatment process,they go through a number of tests from various medical instruments. Due to the complexityof these tests (e.g. in terms of time, cost, pain, invasiveness, accuracy), not all patients gothrough the same procedure. This leads to a lot of missing data and a lot of variation inwhich measurements are missing. After the treatment, patients are discharged from thehospital to heal. To monitor the healing process, they are asked to return to the hospitalfor rehabilitation examination after 3, 6 and 12 months. Not all tests are applied for allpatients in the study, since not all patients return on time for rehabilitation examination.Thus, the rehabilitation outcome scores also have a large amount of missing entries.

Based on the patient journey, we split these variables into two categories. The first onecontains patient demographics and measurements realized during their stay at a hospital.

5

Page 6: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

These measurements include features such as age, BMI, blood tests of various chemicalsrelated to tendon callus production, whether there was surgical intervention, or informationon post-operative treatment. Variables in this category are referred to as predictors in thefollowing text. The second category is the scores, and includes all metrics of rehabilitationoutcomes such as ATRS or Foot and Ankle Outcome Score (FAOS). An example snapshotof the dataset is depicted in Figure 1. In this work, we will impute the predictors andpredict the scores.

4. Methods

We design an end-to-end probabilistic model to simultaneously impute the missing entries inthe predictors and predict the rehabilitation outcomes. The data imputation part is a latentvariable model which can be used separately or be part of the end-to-end model. For theprediction part, we provide multiple alternatives of modeling choices, including Bayesianlinear regression and Bayesian neural networks, using either the learned latent variablesor the imputed predictors as inputs. In this section, we first introduce the basic dataimputation unit and then introduce our end-to-end model for simultaneous data imputationand rehabilitation outcome prediction.

4.1. Measurement imputation

We first present the component of the model which aims to recover the missing measure-ments in the predictors part of the matrix, P 2 RN⇥M . We formulate the missing dataimputation problem into a collaborative filtering problem. Typically, matrix factorizationmodels are used in collaborative filtering for recommender systems. They work by de-composing the user-item interaction matrix into the product of two lower dimensionalityrectangular matrices to predict unseen ratings. Thus, this technique can be used for dataimputation. We adopt a probabilistic matrix factorization based method (Stern et al., 2009),where the latent traits are used to model the personal preference of users and the ratingsfor all items are predicted, but in this work we model the patient state and predict themissing measurements. The result is a latent variable model with Gaussian distributions,and its graphical representation is shown in Figure 2(a).

For N patients, M measurements, S scores and a latent space of size D, the model as-sumes that the patient measurement a�nity matrix A 2 RN⇥M is generated from thepatient traits U 2 RN⇥D, which reflect the health status of the patient, and predic-tor traits V 2 RM⇥D, which map di↵erent health status to measurements from vari-ous medical instruments. We use Gaussian distributions to model these entries. Thus,p(U|�2

U) =QN

i=1N (ui|µU,�2

U1), p(V|�2

V) =QM

j=1N (vj |µV,�2

V1). Not all measurementsare observed. The measurement imputation model is

p(P|U,V,�2

P) =NY

n=1

MY

m=1

"N (Pnm|uT

nvm,�2

P)

#I(n,m)

, (1)

6

Page 7: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Pnm

udnvdm

D

M

N

(a) Matrixfactoriza-tion

Pnm

udnvdm

Sns bs

wms

M

N

S

D

(b) Using dense predic-tors P

Pnm

udnvdm

Sns bs

wds

D

M S

N

(c) Using patient traitsU

Figure 2: Graphical representation of the proposed models with probabilistic matrix fac-torization and Bayesian linear regression (or BNN). Half-shaded nodes describepartially observed variables. Panel (a) is the graphical representation of the prob-abilistic matrix factorization model for data imputation only. Panel (b) showsthe model which uses the imputed measurements to predict the rehabilitationoutcome. Panel (c) shows the model which uses the patient traits, a latent rep-resentation of the patient state, to predict the rehabilitation outcome.

where N (x|µ,�2) is the probability density function of the Gaussian distribution with meanµ and variance �2. I is an observation indication matrix, defined as

I(n,m) =

(1 if Pn,m is observed,

0 otherwise.(2)

Thus, we can use the observed measurements to train the model, and use the generatedmeasurements a�nity A to impute the missing data. Eventually, we want the observedentries in P to be as close as possible to the corresponding entries in A.

4.2. Simultaneous data imputation and outcome prediction

We present our proposed models which impute the missing entries and predict the rehabil-itation outcome. Based on the model presented before, we add the second component topredict the scores matrix S 2 RN⇥S using the patient information. The patient informationcan be the imputed measurement matrix as shown in Figure 2(b), or the patient trait vectorwhich is a low-dimensional summary of the patient state as shown in Figure 2(c).

7

Page 8: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Bayesian linear regression. We consider a Bayesian linear regression model first. Thescore is modeled as

p(S | W,b,X) =NY

n=1

SY

s=1

"N (Sns | xnws + bs,�

2

S)

#I0ns

, (3)

p(W) = N (W | 0,�2

w1) ,

p(b) = N (b | 0,�2

b) ,

where the input X is either the predictors or the patient traits, W 2 RM⇥S and b 2 RS

are weights and bias parameters for Bayesian linear regression. S 2 RN⇥S indicates theobserved rehabilitation scores, which can be seen as the rehabilitation outcome Bmasked byboolean observation indicator I0, described similarly to the previous section. For a patientwho has gone through the rehabilitation monitoring, our model can be used to predictthe missing scores. For a new patient who has just received treatment, our model canpredict the future healing outcome. In the case of the predictors (Figure 2(b)), we makeuse of the observed values so that the input X is P = I ⇤ P + (1 � I) ⇤ A, where I is theN ⇥ M measurement observation indicator described in Section 4.1. In the case of thepatient traits (Figure 2(c)), we simply use U as the input X. In fact, predictors P containmore information but also more noise, and U can be seen as a summary of each patient’scharacteristics. Therefore, we do our experiments with either P or U as inputs for thesecond component. Figure 2 displays the graphical model in these two cases.

Bayesian neural network. We also consider a BNN, i.e. a neural network with proba-bilistic distributions on its weights and biases. In this case, we have the following conditionaldistribution of the scores

p(S | ✓,X) =NY

n=1

SY

s=1

"N (Sns | NN(xn; ✓),�

2

S)

#I0ns

, (4)

where NN is a Bayesian neural network parameterized by ✓, the collection of all weightsand biases of the network. Typically, we consider fully connected layers with hyperbolictangent activations. For a network of L layers, we have

Hl = tanh⇣Hl�1Wl + bl

⌘for l = 1, ..., L , (5)

where Hl is the output of layer l (H0 = X), Wl is the matrix of weights from neuronsof layer l � 1 to neurons of layer l, and bl is the bias vector for layer l. We start ourexperiments by setting up priors on weights according to Xavier’s initialization (Glorot andBengio, 2010). That is, the prior variance of a weight w that feeds into the j-th neuronof layer l depends on nlj,in, the number of neurons feeding into this neuron, and nlj,out,the number of neurons which the result is fed to. For a weight wlij going from layer l � 1,neuron i to layer l, neuron j, we have

p(wlij) = N (wlij | 0,�2

wlij), (6)

�2

wlij=

2

nlj,in + nlj,out. (7)

8

Page 9: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

We limit the complexity of the networks that we evaluate a small number of hidden layers,since there is a limited amount of data. A model with increased complexity would be moreprone to overfitting. The graphical model resembles the one in Figure 2, except that insteadof the weights W and biases b, we have the set of parameters of the network ✓. The exactshape of the network is described in the experiments section.

Rehabilitation outcome prediction at various timestamps. As we discussed before,the patient returns to hospital for rehabilitation monitoring after 3, 6, and 12 months. Thusat di↵erent timestamps, we have di↵erent amounts of observed data. We move further inthe patient’s journey and apply changes to the previous model so that it can be used at the3-month or 6-month mark. In the first case, we rearrange our inputs and move the scoresat 3 months from S to P. We denote these rearranged inputs P3 and S3. We apply thesame procedure for the 6-month mark and define P6 and S6. We demonstrate in the nextsection that if we add more information from the healing monitoring, the performance ofthe final healing outcome prediction (at 12 months) is clearly improved.

Inference. We run inference on this whole model in a end-to-end manner. We use vari-ational inference with the KL divergence (Blei et al., 2017; Zhang et al., 2017). We imple-mented all our models with the Edward library (Tran et al., 2016), a probabilistic program-ming library. We use Gaussian distributions to approximate all posteriors.

5. Experiments

We evaluate our method in this section. We first verify our model and inference algorithmusing a synthetic dataset. We then focus on the real-world ATR rehabilitation cohort andpresent preprocessing details. We compare our proposed method with multiple baselines.Finally, we discuss all experimental results. The experimental results show that our pro-posed end-to-end model clearly improves the predictive performance in comparison to thebaselines. Additionally, we evaluate the rehabilitation outcome prediction at various times-tamps and show that the accuracy of the rehabilitation outcome prediction increases withmore observations.

5.1. Inference verification with synthetic data

We test our model and inference algorithm with a synthetic dataset first. We build thissynthetic dataset based on the generative process of the model and infer the latent param-eters. We observe that our algorithms can successfully recover the latent parameters in alldi↵erent settings.

More precisely, we use N = 100, M = 30, P = 10, and a latent space of size D =10. We generate the true patient and measurement traits by sampling from a normaldistribution with mean 0.5 and variance 0.5. As discussed in (Zhang et al., 2015), thismodel is symmetric: parameters can rotate and inference can yield multiple valid solutionsfor U and V. To ensure that we recover the true latent traits, we fix the upper squareof V to the D ⇥ D identity matrix to avoid parameter rotation. Also, we add Gaussiannoise with variance 0.1 to the resulting P matrix. We set priors matching this generativeprocess. For evaluation, we randomly split the data into a training and a testing set withproportions 80%-20%.

9

Page 10: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

0 1 2

0

2

True traits

Recovered

traits

Figure 3: Synthetic data experiment. Theplot demonstrates the learned la-tent traits with respect to theground-truth. For a point in thefigure, its x-coordinate is the truetrait value, and y-coordinate isthe corresponding recovered traitvalue. Di↵erent colors representdi↵erent latent traits. For brevity,we don’t show all the recovered la-tent traits.

By comparing the inferred latent variables with their ground-truth values, we see thatwe can recover all of them. As an example, we show the ability of our end-to-end modelwith linear regression on P (Figure 2(b)) to recover the patients’ latent traits and to predictmissing values in P and S. Figure 3 depicts examples of the recovered patient traits. Wecan see that recovered values are close to true values because points in Figure 3 are close tothe diagonal of the square figure. Additionally, we evaluate the training and testing errorwith the Mean Absolute Error (MAE). We obtain an average training error of 0.042 for Pand 0.027 for S, and an average testing error of 0.056 and 0.037 in the same order. Theseresults validate our model and inference algorithm. Next, we evaluate our model on thereal-life ATR dataset.

5.2. Preprocessing of the Achilles tendon rupture rehabilitation cohort

The cohort comes from clinical records with various formats, thus preprocessing is neededbefore we evaluate our method. First, we convert the whole dataset to numerical values, forexample we convert the starting and ending time of the surgery to surgery duration. Thisprocess is done under the supervision of medical experts and the whole list of variables thatwe use are presented in the appendix. The ranges of measurements di↵er significantly dueto the various units in use. We normalize every variable to be in the range of [0, 1]. Theseare a�ne transformations and the original value can be easily recovered. We do not fill inthe missing data in the preprocessing steps as it is one of the goals of our model.

5.3. Baselines

We compare our proposed method with seven variations of our proposed model with twotypes of baselines. The first type of baseline uses traditional data imputation methodsto impute the missing values in P and predict S. The second one is a two-stage versionof our proposed model where data imputation and rehabilitation outcome prediction areperformed in a sequential manner.

Traditional data imputation. We first consider imputing the per-patient mean (Schef-fer, 2002) to all missing values. For each patient, the mean value of their observationsbelonging to the training set is imputed to all their missing measurements. We also apply

10

Page 11: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Component 2 Input BLR P BLR S BLR SATRS

P 2-stage (mean) 0.228 ± 0.0014 0.230± 0.008 0.200 ± 0.010P 2-stage (OptSpace) 0.224± 0.0028 0.207 ± 0.009 0.193 ± 0.010P 2-stage (SoftImpute) 0.2049 ± 0.002 0.206 ± 0.008 0.192 ± 0.010P 2-stage (SVP) 0.316 ± 0.003 0.205 ± 0.012 0.200 ± 0.014P 2-stage (IALM) 0.237 ± 0.008 0.201 ± 0.011 0.201 ± 0.010P 2-stage (PMF) 0.164 ± 0.002 0.220 ± 0.006 0.201 ± 0.007U 2-stage (PMF) 0.164 ± 0.002 0.237 ± 0.006 0.208 ± 0.006P EE (proposed) 0.181 ± 0.001 0.202± 0.003 0.195± 0.005U EE (proposed) 0.178± 0.001 0.164± 0.004 0.146±0.005

Component 2 Input BNN P BNN S BNN SATRS

P 2-stage (mean) 0.228 ± 0.0014 0.233 ± 0.005 0.202 ± 0.005P 2-stage (OptSpace) 0.224± 0.0028 0.203 ± 0.008 0.187± 0.009P 2-stage (SoftImpute) 0.2049 ± 0.002 0.201 ± 0.007 0.186 ± 0.008P 2-stage (SVP) 0.316 ± 0.003 0.194 ± 0.010 0.187 ± 0.010P 2-stage (IALM) 0.237 ± 0.008 0.187± 0.011 0.187 ± 0.009P 2-stage (PMF) 0.164 ± 0.002 0.207 ± 0.007 0.190 ± 0.007U 2-stage (PMF) 0.164 ± 0.002 0.208± 0.007 0.190 ± 0.007P EE (proposed) 0.158 ± 0.001 0.152±0.004 0.143±0.004U EE (proposed) 0.167±0.001 0.174±0.003 0.152± 0.003

Table 1: Mean Absolute Error (MAE) and standard deviation over 5 runs for outcomeprediction. Each time, we use random splits of the data with 80% data for trainingand 20% data for testing. “EE” indicates end-to-end which is our proposed model.“2-stage” is the baseline model where data imputation and rehabilitation outcomeprediction are performed in a sequential manner. BLR stands for Bayesian LinearRegression and BNN stands for Bayesian Neural Network. For the 2-stage models,the error on P remains the same because the matrix P is imputed once with themean imputation or the matrix factorization based methods. In addition, wereport the MAE for the ATRS separately. The target is that the MAE of SATRS

gets smaller than 0.1, because only a di↵erence larger than 0.1 is considered to beclinical di↵erent.

traditional matrix factorization based methods: OptSpace (Keshavan et al., 2010), Soft-Impute (Mazumder et al., 2010), Singular Value Projection (Jain et al., 2010) and InexactAugmented Lagrange Multiplier (Lin et al., 2010), to missing values. The predicted valuesbased on observations are imputed to all the missing values. We then use the imputed datato predict rehabilitation outcomes using Bayesian linear regression and Bayesian neuralnetwork.

11

Page 12: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Two-stage version of the proposed model. We run inference on the probabilisticmatrix factorization part and only retrieve predictions for the first part of the dataset, P,and the patient trait matrix U. Then, we define the second model which uses either linearregression or a neural network on this output to give the scores predictions, S. Inference isrun separately on each component. The intent is to compare our end-to-end model with itsdirect multi-staged equivalent.

5.4. Results

We split the training and testing set to reflect the treatment journey. In all of our experi-ments, we first pick training and testing data for P and S with the following strategy: werandomly pick 80% of the patients, take all their available data for training and leave theremaining 20% of patients for testing. In other words, we split the dataset on a per-patientbasis. We do this since the goal of our work is to predict the rehabilitation outcome aftera patient receives the initial treatment. All experiments are repeated 5 times, with all thelearned variables getting reset at each run.

We use grid search for hyper-parameter tuning, starting with the matrix factorizationpart. We observe that the prior mean on the traits has little e↵ect on the end performance.However, the prior variance on both traits and scores has a big impact on how the modelfits the training data. We evaluate latent space sizes D 2 [1, 20] as well as latent traitvariances �2

U and �2

V, ranging from 0.1 to 0.9 by steps of 0.2. We find the optimal D to be 8and the optimal variance to be 0.5. Next, we tune the linear regression and neural network.We start by tuning the linear regression then turn it into a neural network with growingcomplexity by adding activation functions and layers as soon as we find that the model lacksexpressive power. We run grid searches on the prior means and variances of the weights andbiases as well as the observation noise on S. We notice that for the model to properly fit thetraining data, weights need to have a very small variance. This is expected since the datais very high-dimensional and the results of dot products need to be constrained in [0, 1]).Doing so, we find that dividing the weight variance computed with Xavier’s initializationby 103 and the prior observation noise by 104 yields the best results.

We report the performance of the rehabilitation outcome prediction in Table 1. TheMean Absolute Error (MAE) on the testing set is used as metric for our results. In practice,only a di↵erence larger than 0.1 is considered to be clinically di↵erent. Thus, results withMAE within 0.1 are ideal. To compare with this standard, we evaluate prediction methodsonly with 11 ATRS (10 criteria and the sum), whose results are shown in SATRS columns ofTable 1 and Table 2. We can see that our proposed end-to-end model with neural networkapplied on the whole P matrix achieves the best performance for predicting rehabilitationoutcomes and its SATRS result is close to the ideal MAE target 0.1. We see that forpredicting S, using the patient traits U works better in the case of linear regression, andusing the whole predictors matrix P works better in the case of neural network. Thisis certainly due to the fact that the dimensionality of P makes it di�cult for a simplemodel such as linear regression to extract the key features; a task that a more complexneural network would manage better. In these experiments, we start with a neural networkthat basically replicates the linear regression then gradually add complexity until we can’t

12

Page 13: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

improve the performance without overfitting. The optimal network we found has 1 hiddenlayer with P (the number of columns of S) hidden units and a hyperbolic tangent activation.

Our proposed method shows clear improvement on the rehabilitation outcome predictionover baselines. We can also see that latent variable models have a good performance on themissing value imputation. Our proposed model is trained for the rehabilitation outcomeprediction, so SVP and IALM could have the better performance on the missing valueimputation than ours.

Discharge P 3 Month P3 6 Month P6

MAE S3 0.177 ± 0.006MAE SATRS�3 0.173 ± 0.005MAE S6 0.172± 0.007 0.178± 0.006MAE SATRS�6 0.167 ± 0.009 0.169±0.010MAE S12 0.138 ± 0.006 0.140 ± 0.006 0.132± 0.006MAE SATRS�12 0.111 ± 0.003 0.114 ± 0.004 0.108±0.003

Table 2: Rehabilitation outcome prediction performance comparison at various timestamps.Our proposed model with Bayesian neural network is used for this evaluation. Weshow that the final rehabilitation outcome prediction accuracy increases with timeand our model can be used for the rehabilitation outcome prediction at variousrehabilitation stages. SATRS is evaluated only with ATRS.

Evaluation of the rehabilitation outcome prediction at di↵erent timestamps.

Here we evaluate the ability of our model to predict scores at di↵erent timestamps whenwe extend P to include scores at 3 months (yielding P3) and 6 months (yielding P6). Wereport the performance per-timestamp of our model with P, P3 and P6 in Table 2.

We observe that including future measurements helps predicting the final scores. In-cluding all the previously observed data in the predictors helps improving the accuracy offuture score predictions. Moreover, results of the ATRS prediction at 12 months are closeto 0.1 which is our target value.

Per-variable analysis. We further evaluate the prediction accuracy of our best perform-ing model by looking the mean error for each variable. Taking the model with P with BNNas an example, Figures 4 and 5 display the errors and the number of data points availablefor each variable.

In Figure 5, we show that the number of scores per period varies. In fact, each periodhas at least 11 ATRS (10 criteria and the sum in blue) and 5 FAOS scores (in red). On topof that, scores at 6 and 12 months both include additional tests such as the evaluation of theheel rise angle (in green). The clinical practice uses scores at 12 months more because theycan reflect rehabilitation states better. Figure 5 shows that our model is able to predict therehabilitation outcome at 12 months better comparing to 3 and 6 months.

13

Page 14: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

0

0.2

0.4Error

0 20 40 60 80 100 120 140 160 180 200 220 240 260 2800

200

400

Variable in P

Cou

nt

Figure 4: Per-variable mean MAE and number of data points available for training for thepredictors P.

0 5 10 15 20 25 30 35 40 45 50 55 600

0.10.20.30.40.5

Error

0 5 10 15 20 25 30 35 40 45 50 55 600

20406080

100

Cou

nt

3 months 6 months 12 months

Figure 5: Per-variable mean absolute error (MAE) and number of data points available fortraining for di↵erent components of scores at di↵erent rehabilitation time. Bluebars represent ATRS. Red bars represent FAOS. Green bars represent other testscores.

6. Conclusions

We developed a probabilistic end-to-end framework to simultaneously predict the rehabil-itation outcome and impute the missing entries in data cohort in the context of AchillesTendon Rupture (ATR) rehabilitation. We evaluated our model and compared its perfor-mance with multiple baselines. We demonstrated a clear improvement in the accuracy ofthe predicted outcomes in comparison with traditional data imputation methods. Addi-tionally, the performance of our method on rehabilitation outcome prediction is close to theideal clinical result.

14

Page 15: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Future work. There is still considerable work to be done in the interpretation of ourresults, in a clinical sense. An analysis of the impact of each predictor in each model as inPopkes et al. (2019) and a discussion on how these relate to the ATR clinical experiencesare desirable to strengthen this work from a medical point of view. Additionally, we arekeen to work closely with practitioners to validate our method in a real-life clinical context.We would work on improving the accuracy and interpretability of our model to make itbeneficial for both patients and practitioners in real-life health-care process.

A computational aspect to be considered is to investigate in depth on the uncertaintyestimation of our model. The Bayesian framework o↵ers a way to compute uncertaintieswhen predicting outcome scores, however, in many tasks, these models have shown to beover confident (Nalisnick et al., 2018). It would be useful for to know when a new patientarrives, how well - with which certainty - we can predict their outcome scores.

The proposed method is a general framework that can be applied to numerous health-care applications involving a long-term healing process after the treatment. In the future,we would collaborate with more health-care departments, test and improve our method inthese applications.

References

Md Abdul Alim, Simon Svedman, Gunnar Edman, and Paul Ackermann. Procollagenmarkers in microdialysate can predict patient outcome after achilles tendon rupture.2016.

E Domeij Arverud, Per Anundsson, Eva Hardell, Gunilla Barreng, Gunnar Edman, AliLatifi, Fausto Labruto, and PW Ackermann. Ageing, deep vein thrombosis and malegender predict poor outcome after acute achilles tendon rupture. The bone & joint journal,98(12):1635–1641, 2016.

David M Blei, Alp Kucukelbir, and Jon D McAuli↵e. Variational inference: A review forstatisticians. Journal of the American Statistical Association, 2017.

Geo↵ P Bostick, Nadr M Jomha, Amar A Suchak, and Lauren A Beaupre. Factors associatedwith calf muscle endurance recovery 1 year after achilles tendon rupture repair. journalof orthopaedic & sports physical therapy, 40(6):345–351, 2010.

S van Buuren and Karin Groothuis-Oudshoorn. mice: Multivariate imputation by chainedequations in r. Journal of statistical software, pages 1–68, 2010.

Jian-Feng Cai, Emmanuel J Candes, and Zuowei Shen. A singular value thresholding algo-rithm for matrix completion. SIAM Journal on Optimization, 2010.

Raghavendra Chalapathy, Ehsan Zare Borzeshi, and Massimo Piccardi. Bidirectional lstm-crf for clinical concept extraction. arXiv preprint arXiv:1611.08373, 2016.

E Domeij-Arverud, P Anundsson, E Hardell, G Barreng, G Edman, A Latifi, F Labruto, andPW Ackermann. Ageing, deep vein thrombosis and male gender predict poor outcomeafter acute achilles tendon rupture. Bone Joint J, 2016.

15

Page 16: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Xavier Glorot and Yoshua Bengio. Understanding the di�culty of training deep feedforwardneural networks. In Proceedings of the thirteenth international conference on artificialintelligence and statistics, 2010.

Thomas Horstmann, C Lukas, J Merk, Torsten Brauner, and Annegret Mndermann. Deficits10-years after achilles tendon repair. 2012.

Tuomas T. Huttunen, Pekka A Kannus, Christer Gustav Rolf, Li Fellander-Tsai, andVille M. Mattila. Acute achilles tendon ruptures: incidence of injury and surgery insweden between 2001 and 2012. The American journal of sports medicine, 2014.

Prateek Jain, Raghu Meka, and Inderjit S Dhillon. Guaranteed rank minimization viasingular value projection. In Advances in Neural Information Processing Systems, 2010.

Yohan Jo, Lisa Lee, and Shruti Palaskar. Combining lstm and latent topic modeling formortality prediction. arXiv preprint arXiv:1709.02842, 2017.

Raghunandan H Keshavan and Sewoong Oh. A gradient descent algorithm on the grassmanmanifold for matrix completion. arXiv preprint arXiv:0910.5260, 2009.

Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion fromnoisy entries. Journal of Machine Learning Research, 2010.

Thomas A Lasko. E�cient inference of gaussian-process-modulated renewal processes withapplication to medical event data. In Uncertainty in artificial intelligence: proceedings ofthe... conference. Conference on Uncertainty in Artificial Intelligence, 2014.

Zhouchen Lin, Minming Chen, and Yi Ma. The augmented lagrange multiplier method forexact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.

Chao Ma, Sebastian Tschiatschek, Konstantina Palla, Jose Miguel Hernandez Lobato, Se-bastian Nowozin, and Cheng Zhang. Eddi: E�cient dynamic discovery of high-valueinformation with partial vae. arXiv preprint arXiv:1809.11142, 2018.

Shiqian Ma, Donald Goldfarb, and Lifeng Chen. Fixed point and bregman iterative methodsfor matrix rank minimization. Mathematical Programming, 2011.

Rahul Mazumder, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithmsfor learning large incomplete matrices. Journal of machine learning research, 2010.

Andriy Mnih and Ruslan R Salakhutdinov. Probabilistic matrix factorization. In Advancesin neural information processing systems, 2008.

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshmi-narayanan. Do deep generative models know what they don’t know? arXiv preprintarXiv:1810.09136, 2018.

Radford M Neal. Bayesian learning for neural networks. 2012.

Uros Ocepek, Joze Rugelj, and Zoran Bosnic. Improving matrix factorization recommen-dations for examples in cold start. Expert Systems with Applications, 2015.

16

Page 17: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

N Olsson, J Karlsson, BI Eriksson, A Brorsson, M Lundberg, and KG Silbernagel. Abilityto perform a single heel-rise is significantly related to patient-reported outcome afterachilles tendon rupture. Scandinavian journal of medicine & science in sports, 2014.

Anna-Lena Popkes, Hiske Overweg, Ari Ercole, Yingzhen Li, Jose Miguel Hernandez-Lobato, Yordan Zaykov, and Cheng Zhang. Interpretable outcome prediction with sparsebayesian neural networks in intensive care. arXiv preprint arXiv:1905.02599, 2019.

Praxitelis Praxitelous, Gunnar Edman, and Paul W Ackermann. Microcirculation afterachilles tendon rupture correlates with functional and patient-reported outcome. Scandi-navian journal of medicine & science in sports, 2017.

Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. Benchmark of deeplearning models on large healthcare mimic datasets. arXiv preprint arXiv:1710.08531,2017.

Judi Sche↵er. Dealing with missing data. 2002.

Peter Schulam and Suchi Saria. A framework for individualizing predictions of disease tra-jectories by exploiting multi-resolution structure. In Proceedings of the 28th InternationalConference on Neural Information Processing Systems, 2015.

Weiwei Shi, Yongxin Zhu, S Yu Philip, Tian Huang, Chang Wang, Yishu Mao, and YufengChen. Temporal dynamic matrix factorization for missing data prediction in large scalecoevolving time series. IEEE Access, 2016.

David Stern, Ralf Herbrich, and Thore Graepel. Matchbox: Large scale bayesian recom-mendations. In Proceedings of the 18th International World Wide Web Conference, 2009.

Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, andMarzyeh Ghassemi. Clinical intervention prediction and understanding with deep neuralnetworks. In Machine Learning for Healthcare Conference, pages 322–337, 2017.

Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M.Blei. Edward: A library for probabilistic modeling, inference, and criticism, 2016.

Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, RobertTibshirani, David Botstein, and Russ B Altman. Missing value estimation methods fordna microarrays. Bioinformatics, 2001.

Kars P Valkering, Susanna Aufwerber, Francesco Ranuccio, Enricomaria Lunini, GunnarEdman, and Paul W Ackermann. Functional weight-bearing mobilization after achillestendon rupture enhances early healing response: a single-blinded randomized controlledtrial. Knee Surgery, Sports Traumatology, Arthroscopy, 2017.

Cheng Zhang, Mike Gartrell, Thomas Minka, Yordan Zaykov, and John Guiver. Groupbox:A generative model for group recommendation. Technical Report MSR-TR-2015-61, 2015.

Cheng Zhang, Judith Butepage, Hedvig Kjellstrom, and Stephan Mandt. Advances invariational inference. arXiv preprint arXiv:1711.05597, 2017.

17

Page 18: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

Appendix A. Variables

A.1. Predictors

0 ID1 Study2 Gender3 Age4 DIC age 405 Length6 Weight7 BMI8 DIC BMI 279 Smoker10 ln Age11 ln Lenght12 ln Weight13 ln BMI14 Inj side15 Complication16 Paratenon17 Fascia18 PDS19 Surg comp20 Treatment Group21 Ort B22 Plast23 Healthy control24 Plast foot25 Vacoped26 VTIS27 TTS28 ln TTS29 TTS no of 24h cycles30 ln TTS no of 24h cycles31 TEST TTS 48h POL32 TEST TTS 24h POL33 TEST TTS 12h POL34 DIC TTS by VTIS median35 TRICH 48 84 TTS36 TRICH 48 96 TTS37 TRICH 48 72 TTS38 NEW Time er to op start39 DIC NEW Time er to op start

18

Page 19: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

40 DIC3 NEW Time er to op start41 OLD Time er to op start42 Op time43 DIC op 34min44 Op B Dic45 OP GBG dic46 Op time dic47 OP NR48 EXP49 Q RANK B50 Q RANK ABCD51 NR of Op52 DIC nr OP53 ASS Y N54 DIK SPEC55 PP IPC Study B56 Pump pat reg57 Pump reg58 Highest pump reg59 DIC 86h highest pump reg60 Pump comp61 Incl Excl62 DVT 263 DVT 6w64 DVT 2w and 6w65 DVT 2w or 6w66 DVT 8w67 Any dvt68 Inf 2w69 Inf 6w70 Any inf71 Rerupture72 Adeverse events 173 Adeverse events 274 Adeverse events 375 Adeverse events 476 Preinjury77 Post op78 D PAS79 Preinj 2cl80 Preinj 3cl81 Post op 2cl82 Con Power I

19

Page 20: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

83 Con Power U84 LSI Con Power85 Total work I86 Total work U87 NEW LSI Total work88 LSI Total work89 Repetition I90 Repetition U91 LSI Repetitions92 Height Max I93 Height Max U94 LSI Height95 Height A I96 Height A U97 LSI Height Ave98 Ecc Power I99 Height Min I100 Ecc Power U101 Height Min U102 LSI Height 2cl103 LSI Height Min104 LSI Ecc Power105 Muscle vein thrombosis 2106 Thompson 2107 Wound 2108 Podometer day1109 Podometer day2110 Podometer day3111 Podometer day4112 Podometer day5113 Podometer day6114 Podometer day7115 Podometer day8116 Podometer day9117 Podometer day10118 Podometer day11119 Podometer day12120 Podometer day13121 Podometer day14122 Podometer day15123 Podometer day16124 Mean podometer125 Total podometer

20

Page 21: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

126 DIC podometer 16500127 Podometer on day of microdialysis128 Podometer on day minus 1129 Podometer on day minus 2130 Subjective load day1131 Subjective load day2132 Subjective load day3133 Subjective load day4134 Subjective load day5135 Subjective load day6136 Subjective load day7137 Subjective load day8138 Subjective load day9139 Subjective load day10140 Subjective load day11141 Subjective load day12142 Subjective load day13143 Subjective load day14144 Subjective load day15145 Subjective load day16146 Mean subjective load147 DIC 43 Mean subjective load148 Days until microdialysis149 Load on day of microdialysis150 Load on day minus 1151 Load on day minus 2152 Number of days with load prior to microdialysis153 DIC 13 days with load154 VAS day1 act155 VAS day1 pas156 VAS day2 act157 VAS day2 pas158 VAS day3 act159 VAS day3 pas160 VAS day4 act161 VAS day4 pas162 VAS day5 act163 VAS day5 pas164 VAS day6 act165 VAS day6 pas166 VAS day7 act167 VAS day7 pas168 VAS injured 2weeks

21

Page 22: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

169 VAS control 2weeks170 Calf circumference injured 1171 Calf circumference injured mean172 Calf circumference control 1173 Calf circumference control mean174 Plantar flexion injured 1175 Plantar flexion injured 2176 Plantar flexion injured 3177 Plantar flexion injured mean178 Plantar flexion control 1179 Plantar flexion control 2180 Plantar flexion control 3181 Plantar flexion control mean182 Dorsal flexion injured 1183 Dorsal flexion injured 2184 Dorsal flexion injured 3185 Dorsal flexion injured mean186 Dorsal flexion control 1187 Dorsal flexion control 2188 Dorsal flexion control 3189 Dorsal flexion control mean190 Q1191 Q2192 Q3193 Q4194 Q5195 EQ5D ix196 VAS197 VAS 2198 Gluc2 2 i199 Gluc2 3 i200 Gluc2 4 i201 GLUC injured mean202 Gluc2 2 c203 Gluc2 3 c204 Gluc2 4 c205 GLUC control mean206 Lact2 2 i207 Lact2 3 i208 Lact2 4 i209 LAC injured mean210 Lact2 2 c211 Lact2 3 c

22

Page 23: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

212 Lact2 4 c213 LAC control mean214 Pyr2 2 i215 Pyr2 3 i216 Pyr2 4 i217 PYR injured mean218 Pyr2 2 c219 Pyr2 3 c220 Pyr2 4 c221 PYR control mean222 Glyc2 2 i223 Glyc2 3 i224 Glyc2 4 i225 GLY injured mean226 Glyc2 2 c227 Glyc2 3 c228 Glyc2 4 c229 GLY control mean230 Glut2 2 i231 Glut2 3 i232 Glut2 4 i233 GLUT injured mean234 Glut2 2 c235 Glut2 3 c236 Glut2 4 c237 GLUT control mean238 Lac2 Pyr2 ratio 2 i239 Lac2 Pyr2 ratio 3 i240 Lac2 Pyr2 ratio 4 i241 LAC2 PYR2 ratio injured mean242 Lac2 Pyr2 ratio 2 c243 Lac2 Pyr2 ratio 3 c244 Lac2 Pyr2 ratio 4 c245 LAC2 PYR2 ratio control mean246 PINP injured247 PIIINP injured248 Bradford injured249 PINP normalized Injured250 PIIINP normalized injured251 PINP uninjured252 PIIINP uninjured253 Bradford uninjured254 PINP normalized Uninjured

23

Page 24: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

255 PIIINP normalized uninjured256 Collagen257 Glut 2 inj values258 P ratio inj259 DIC PIIINP260 DIC PIIINP 3261 P ratio uninj262 FIL OP STUDY263 Gly inv264 RF injured265 RF uninjured266 BZ injured267 BZ uninjured268 MF injured269 MF uninjured270 T RF injured271 T RF uninjured272 T MF injured273 T MF uninjured274 T HR injured275 T HR uninjured276 Ratio MF RF injured277 Ratio MF RF uninjured278 B1 D66279 Sthlm gbg280 stepsxload day1281 stepsxload day2282 stepsxload day3283 stepsxload day4284 stepsxload day5285 stepsxload day6286 stepsxload day7287 stepsxload day8288 stepsxload day9289 stepsxload day10290 stepsxload day11291 stepsxload day12292 stepsxload day13293 stepsxload day14294 stepsxload day15295 stepsxload day16296 INC A42

24

Page 25: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

A.2. Scores

0 DIC TTS by valid ATRS80 median1 Control 1yr2 Heel rise average height injured 6mo3 Heel rise average height control 6mo4 di↵erence heel raise 6mo5 Heel rise average height injured 1yr6 Heel rise average height control 1yr7 di↵erence heel raise 1yr8 ATRS 3 Strenght9 ATRS 3 tired10 ATRS 3 sti↵11 ATRS 3 pain12 ATRS 3 ADL13 ATRS 3 Surface14 ATRS 3 stairs15 ATRS 3 run16 ATRS 3 jump17 ATRS 3 phys18 ATRS 3 Sum19 ATRS item1 6month20 ATRS item2 6month21 ATRS item3 6month22 ATRS item4 6month23 ATRS item5 6month24 ATRS item6 6month25 ATRS item7 6month26 ATRS item8 6month27 ATRS item9 6month28 ATRS item10 6month29 ATRS total score 6month30 ATRS 12 strength31 ATRS 12 tired32 ATRS 12 sti↵33 ATRS 12 pain34 ATRS 12 ADL35 ATRS 12 Surface36 ATRS 12 stairs37 ATRS 12 run38 ATRS 12 jump39 ATRS 12 phys40 ATRS 12m41 valid ATRS 12m

25

Page 26: Simultaneous Measurement Imputation and Outcome Prediction ...proceedings.mlr.press/v106/hamesse19a/hamesse19a.pdf · work to impute missing data entries and predict rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for ATR

42 ATRS 2cl43 FAOS 3 Pain44 FAOS 3 Symptom45 FAOS 3 ADL46 FAOS 3 sport rec47 FAOS 3 QOL48 FAOS 6 Symptom49 FAOS 6 Pain50 FAOS 6 ADL51 FAOS 6 Sport Rec52 FAOS 6 QOL53 FAOS 12 Pain54 FAOS 12 Symptom55 FAOS 12 ADL56 FAOS 12 Sport Rec57 FAOS 12 QOL58 ATRS 12 pain log59 ATRS 12 ADL log60 ATRS 12 surface log61 ATRS 12 phys log62 FAOS 12 pain log

26