Computer Methods and Programs in Biomedicine 90 (2008) 154–166
Journal homepage: www.intl.elsevierhealth.com/journals/cmpb
doi:10.1016/j.cmpb.2007.12.002

Computing normalised prediction distribution errors to evaluate nonlinear mixed-effect models: The npde add-on package for R

Emmanuelle Comets a,b,*, Karl Brendel c, France Mentré a,b,d

a INSERM, U738, Paris, France
b Université Paris 7, UFR de Médecine, Paris, France
c Institut de Recherches Internationales Servier, Courbevoie, France
d AP-HP, Hôpital Bichat, UF de Biostatistiques, Paris, France

Article history: Received 3 August 2007; received in revised form 14 November 2007; accepted 3 December 2007

Keywords: Population pharmacokinetics; Population pharmacodynamics; Model evaluation; Nonlinear mixed-effects model; npde

Abstract

Pharmacokinetic/pharmacodynamic data are often analysed using nonlinear mixed-effect models, and model evaluation should be an important part of the analysis. Recently, normalised prediction distribution errors (npde) have been proposed as a model evaluation tool. In this paper, we describe an add-on package for the open source statistical package R, designed to compute npde. npde take into account the full predictive distribution of each individual observation and handle multiple observations within subjects. Under the null hypothesis that the model under scrutiny describes the validation dataset, npde should follow the standard normal distribution. Simulations need to be performed beforehand, using for example the software used for model estimation. We illustrate the use of the package with two simulated datasets, one under the true model and one with different parameter values, to show how npde can be used to evaluate models. Model estimation and data simulation were performed using NONMEM version 5.1.

© 2007 Elsevier Ireland Ltd. All rights reserved.

* Corresponding author at: INSERM U738, Université Paris 7, UFR de Médecine, site Bichat, 16 rue Henri Huchard, 75018 Paris, France. Tel.: +33 1 44 85 62 77; fax: +33 1 44 85 62 80. E-mail address: [email protected] (E. Comets).

1. Introduction

The analysis of longitudinal data is prominent in pharmacokinetic (PK) and pharmacodynamic (PD) studies, especially during drug development [1]. Nonlinear mixed-effect models are increasingly used as they are able to represent complex nonlinear processes and to describe both between and within subject variability. The evaluation of these models is gaining importance as the field of their application widens, ranging from dosage recommendation to clinical trial simulations [2]. Following the definition of Yano et al. [2]: "the goal of model evaluation is objective assessment of the predictive ability of a model for domain-specific quantities of interest, or to determine whether the model deficiencies (the final model is never the 'true model') have a noticeable effect in substantive inferences".

Despite the recommendations of drug agencies [3,4] stressing the importance of model evaluation, a recent survey based on all published PK and/or PD analyses over the period 2002–2004 shows that it is infrequently reported and often inadequately performed [5]. One possible explanation is the lack of consensus concerning a proper evaluation method. Following the development of linearisation-based approaches for the estimation of parameters in nonlinear mixed-effect models, standardised prediction errors [6] have been widely


used as diagnostic tools, not the least because they were computed in the main software used in population PKPD analyses, NONMEM [7], where they are reported under the name weighted residuals (WRES). However, because of the linearisation involved in their computation, there is no adequate test statistic. In 1998, Mesnil et al. proposed prediction discrepancies, which were easily computed due to the discrete nature of the non-parametric distribution estimated, to validate a PK model for mizolastine [8]. Prediction discrepancies (pd) are defined as the percentile of an observation in the predictive distribution for that observation, under the null hypothesis (H0) that the model under scrutiny adequately describes a validation dataset. The predictive distribution is obtained assuming the posterior distribution of the parameters estimated by maximum likelihood, disregarding the estimation error (the so-called plug-in approach [9]). By construction, pd follow a uniform distribution over [0,1], providing a test. In the Bayesian literature, this idea of using the whole predictive distribution for model evaluation has been proposed by Gelfand et al. [10] and is also discussed by Gelman et al. [11]. Yano et al. extended this notion in a non-Bayesian framework, proposing the approach known as the Posterior Predictive Check (PPC) [2], while Holford advocated a more visual approach under the name Visual Predictive Check (VPC) [12]. Mentré and Escolano [13] discuss how prediction discrepancies relate to one of the three forms of PPC described by Yano. For non-discrete distributions, Mentré and Escolano proposed to compute prediction discrepancies by Monte–Carlo integration [13,14]. In their original version, however, pd did not take into account the fact that subjects usually contribute several measurements, which induces correlations between pd, leading to an increased type I error. This was improved in a further work, and the uncorrelated and normalised version of pd was termed normalised prediction distribution errors (npde) [15]. npde have better properties than WRES and can also be used to evaluate covariate models [16]. They can be used for internal or external evaluation, depending on whether they are computed on the dataset used to build the model (internal evaluation) or on an external dataset.

The computation of the npde however requires some programming. We therefore developed an add-on package, npde, for R, the open source language and environment for statistical computing and graphics [17], to enable easy computation of the npde [18]. Other packages such as Xpose [19], for diagnostics and exploration, and PFIM [20,21], for the evaluation and optimisation of population designs, have been developed in R for the analysis of population PK and/or PD studies. Xpose is very useful as an aid for model assessment and run management for studies performed with the NONMEM software [7], which is widely used in this field but has next to no plotting capabilities, so that R was a good choice of language for the implementation of npde.

In Section 2, we briefly recall how npde are computed. In Section 3, we describe the main features and usage of the package. In Section 4, we illustrate the use of the package with two simulated examples. The examples are simulated based on the well known dataset theophylline, available both in R and NONMEM: the first (Vtrue) is simulated with the model used for the evaluation, while the second (Vfalse) is simulated assuming a different set of parameters, and we show how npde can be used to reject the model for Vfalse but not for Vtrue.

2. Computational method and theory

2.1. Models and notations

Let B denote a building (or learning) dataset and V a validation dataset. B is used to build a population model called MB. Evaluation methods compare the predictions obtained by MB, using the design of V, to the observations in V. V can be the learning dataset B (internal evaluation) or a different dataset (external evaluation). The null hypothesis (H0) is that data in the validation dataset V can be described by model MB.

Let i denote the ith individual (i = 1, ..., N) and j the jth measurement in an individual (j = 1, ..., n_i, where n_i is the number of observations for subject i). Let n_tot denote the total number of observations (n_tot = Σ_i n_i). Let Y_i be the n_i-vector of observations in individual i. Let the function f denote the nonlinear structural model; f can represent for instance the PK model. The statistical model for the observation y_ij in patient i at time t_ij is given by:

y_ij = f(t_ij, θ_i) + ε_ij    (1)

where θ_i is the vector of the individual parameters and ε_ij is the residual error, which is assumed to be normal with zero mean. The variance of ε_ij may depend on the predicted concentrations f(t_ij, θ_i) through a (known) variance model. Let ξ denote the vector of unknown parameters of this variance model.

In PKPD studies for instance, it is frequently assumed that the variance of the error follows a combined error model:

var(ε_ij) = σ²_inter + σ²_slope f(t_ij, θ_i)²    (2)

where σ_inter and σ_slope are two parameters characterising the variance. In this case, ξ = (σ_inter, σ_slope)′. This combined variance model covers the case of a homoscedastic error variance model, where σ_slope = 0, and the case of a constant coefficient of variation error model, where σ_inter = 0.

Another usual assumption in PKPD analyses is that the distribution of the individual parameters θ_i is normal, or log-normal, as in:

θ_i = h(μ, X_i) e^{η_i}    (3)

where μ is the population vector of the parameters, X_i a vector of covariates, h a function giving the expected value of the parameters depending on the covariates, and η_i the vector of random effects in individual i. η_i usually follows a normal distribution N(0, Ω), where Ω is the variance-covariance matrix of the random effects, but other parametric or non-parametric assumptions can be used for the distribution of the random effects, as in the first paper proposing prediction discrepancies in the context of non-parametric estimation [8]. Although npde were developed in the area of PK and PD analyses, they are a general way of evaluating mixed-effect models
and require only observations and corresponding predicted distributions.

We denote P the vector of population parameters (also called hyperparameters) estimated using the data in the learning dataset B: P = (μ′, vect(Ω)′, ξ′)′, where vect(Ω) is the vector of unknown values in Ω. Model MB is defined by its structure and by the hyperparameters PB estimated from the learning dataset B.
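To make the notation concrete, here is a minimal Python sketch (not part of the npde package, which is written in R) simulating observations for one subject under Eqs. (1)–(3). The structural model f, the design and all parameter values below are illustrative assumptions, loosely inspired by a one-compartment oral PK model.

```python
import numpy as np

def f(t, theta):
    """Illustrative structural model: one-compartment, first-order
    absorption and elimination; theta = (ka, V, k); dose is arbitrary."""
    ka, V, k = theta
    dose = 4.0
    return dose * ka / (V * (ka - k)) * (np.exp(-k * t) - np.exp(-ka * t))

def simulate_subject(t, mu, omega, sig_inter, sig_slope, rng):
    # Eq. (3): log-normal individual parameters, theta_i = h(mu) * exp(eta_i)
    eta = rng.multivariate_normal(np.zeros(len(mu)), omega)
    theta_i = mu * np.exp(eta)
    pred = f(t, theta_i)
    # Eq. (2): combined error model for the residual variance
    sd = np.sqrt(sig_inter**2 + (sig_slope * pred) ** 2)
    # Eq. (1): observation = structural prediction + residual error
    return pred + rng.normal(0.0, sd)

rng = np.random.default_rng(42)
times = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24])
mu = np.array([1.5, 0.46, 0.087])       # ka, V, k (illustrative means)
omega = np.diag([0.45, 0.015, 0.017])   # variances of the random effects
y = simulate_subject(times, mu, omega, 0.09, 0.26, rng)
```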

2.2. Definition and computation of npde

Let F_ij denote the cumulative distribution function (cdf) of the predictive distribution of Y_ij under model MB. We define the prediction discrepancy pd_ij as the value of F_ij at observation y_ij, F_ij(y_ij). F_ij can be computed using Monte–Carlo simulations.

Using the design of the validation dataset V, we simulate K datasets V^sim(k) (k = 1, ..., K) under model MB. Let Y_i^sim(k) denote the vector of simulated observations for the ith subject in the kth simulation. pd_ij is computed as the percentile of y_ij in the empirical distribution of the y_ij^sim(k):

pd_ij = F_ij(y_ij) ≈ (1/K) Σ_{k=1..K} δ_ijk    (4)

where δ_ijk = 1 if y_ij^sim(k) < y_ij and 0 otherwise.

By construction, prediction discrepancies (pd) are expected

to follow U(0, 1), but only in the case of one observation per subject; within-subject correlations introduced when multiple observations are available for each subject induce an increase in the type I error of the test [13]. To correct for this correlation, we compute the empirical mean E(Y_i) and empirical variance-covariance matrix var(Y_i) over the K simulations. The empirical mean is obtained as:

E(Y_i) = (1/K) Σ_{k=1..K} Y_i^sim(k)

and the empirical variance is:

var(Y_i) = (1/(K − 1)) Σ_{k=1..K} (Y_i^sim(k) − E(Y_i)) (Y_i^sim(k) − E(Y_i))′

We use the var function from R to provide unbiased estimates of var(Y_i).

Decorrelation is performed simultaneously for the simulated data:

Y_i^sim(k)* = var(Y_i)^{−1/2} (Y_i^sim(k) − E(Y_i))

and for the observed data:

Y_i* = var(Y_i)^{−1/2} (Y_i − E(Y_i))

Decorrelated pd are then obtained using the same formula as in (4) but with the decorrelated data, and we call the resulting variables prediction distribution errors (pde):

pde_ij = F*_ij(y*_ij) ≈ (1/K) Σ_{k=1..K} δ*_ijk    (5)

where δ*_ijk = 1 if y_ij^sim(k)* < y*_ij and 0 otherwise.

Sometimes it can happen that some observations lie either below or above all the simulated data corresponding to that observation. In this case, we define the corresponding pde_ij as:

pde_ij = 1/K        if y_ij < y_ij^sim(k) for all k
pde_ij = 1 − 1/K    if y_ij > y_ij^sim(k) for all k    (6)

Under H0, if K is large enough, the distribution of the prediction distribution errors should follow a uniform distribution over the interval [0,1] by construction of the cdf. Normalised prediction distribution errors can then be obtained using the inverse function of the normal cumulative density function, implemented in most software:

npde_ij = Φ^{−1}(pde_ij)    (7)

By construction, if H0 is true, npde follow the N(0, 1) distribution without any approximation and are uncorrelated within an individual.
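The computation in Eqs. (4)–(7) can be sketched for a single subject as follows, in Python for illustration (the package itself implements this in R). The inverse of the Cholesky factor is used here as one possible inverse square root of the covariance matrix.

```python
import numpy as np
from statistics import NormalDist

def npde_one_subject(yobs, ysim):
    """yobs: n-vector of observations for one subject;
    ysim: (K, n) array of simulations of that subject under the model."""
    K = ysim.shape[0]
    # Empirical mean and (unbiased) covariance over the K simulations
    Ey = ysim.mean(axis=0)
    Vy = np.cov(ysim, rowvar=False)
    # Decorrelation: multiply by an inverse square root of Vy,
    # here the inverse of its Cholesky factor
    L = np.linalg.cholesky(Vy)
    ystar = np.linalg.solve(L, yobs - Ey)
    ysimstar = np.linalg.solve(L, (ysim - Ey).T).T
    # Eq. (5): percentile of the decorrelated observation among the simulations
    pde = (ysimstar < ystar).mean(axis=0)
    # Eq. (6): observations below/above all simulations get 1/K and 1 - 1/K
    pde = np.clip(pde, 1.0 / K, 1.0 - 1.0 / K)
    # Eq. (7): inverse normal cdf
    return np.array([NormalDist().inv_cdf(p) for p in pde])
```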

2.3. Tests and graphs

Under the null hypothesis that model MB adequately describes the data in the validation dataset, the npde follow the N(0, 1) distribution. We use three tests to test this assumption: (i) a Wilcoxon signed rank test, to test whether the mean is significantly different from 0; (ii) a Fisher test for variance, to test whether the variance is significantly different from 1; (iii) a Shapiro–Wilk test, to test whether the distribution is significantly different from a normal distribution. The package also reports a global test, which consists in considering the three tests above with a Bonferroni correction. The p-value for this global test is then reported as the minimum of the three p-values multiplied by 3 (or 1 if this value is larger than 1) [22]. Before these tests are performed, we report the first three central moments of the distribution of the npde (mean, variance, skewness) as well as the kurtosis, where we define kurtosis as the fourth moment minus 3, so that the kurtosis of N(0, 1) is 0 (sometimes called excess kurtosis). The expected values of these four variables for N(0, 1) are respectively 0, 1, 0 and 0. We also give the standard errors for the mean (S.E. = s/√n_tot) and variance (S.E. = s² √(2/(n_tot − 1))), where s² is the empirical variance.
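These tests can be sketched in Python with scipy.stats (an illustration, not the package's R code; rendering the "Fisher test for variance" as a two-sided χ² test of the sample variance against 1 is an assumption on my part):

```python
import numpy as np
from scipy import stats

def gof_tests(npde):
    """Three goodness-of-fit tests on a vector of npde, plus a
    Bonferroni-corrected global test."""
    n = len(npde)
    p_mean = stats.wilcoxon(npde).pvalue          # H0: location = 0
    chi2_stat = (n - 1) * npde.var(ddof=1)        # H0: variance = 1
    p_var = 2 * min(stats.chi2.cdf(chi2_stat, n - 1),
                    stats.chi2.sf(chi2_stat, n - 1))
    p_norm = stats.shapiro(npde).pvalue           # H0: normality
    p_global = min(1.0, 3 * min(p_mean, p_var, p_norm))  # Bonferroni
    return {"mean": p_mean, "variance": p_var,
            "normality": p_norm, "global": p_global}
```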

Graphs can be used to visualise the shape of the distribution of the npde. The following graphs are plotted by default: (i) a QQ-plot of the npde (the line of identity is overlaid, and the npde are expected to fall along this line); (ii) a histogram of the npde (the density of the expected N(0, 1) distribution is overlaid to show the expected shape); and scatterplots of (iii) npde versus X and (iv) npde versus the predicted Y, where we expect to see no trend if H0 is true. For the last plot, the package computes for each observation the predicted Y as the empirical mean over the K simulations of the simulated predictive distribution (denoted E(y_ij^sim(k))), which is reported under the name ypred along with the npde and/or pd.

3. Program description

3.1. Overview

The program is distributed as an add-on package (or library) for the free statistical software R. A guide for the installation of R and of add-on packages such as npde can be found on CRAN (the Comprehensive R Archive Network) at the following url: http://cran.r-project.org/. R is available free of charge and runs on all operating systems, which made it a very convenient language for the development of npde. The package requires only observed and simulated data to compute the npde, and does not use the model itself.

The npde library contains 14 functions. Fig. 1 presents the function hierarchy starting with the function npde. A similar graph is obtained with the function autonpde, without the call to the function pdemenu. An additional function (plotpd) can be called directly by the user to plot diagnostic graphs involving the prediction discrepancies instead of the npde, and is therefore not represented on the graph. The functions for skewness and kurtosis were modified from the two functions of the same name proposed in the e1071 package for R [23].

Fig. 1 – Function hierarchy for the npde library, and brief description of each function. The functional hierarchy is given for a user call to npde. With autonpde, the hierarchy is the same save for the initial call to pdemenu.


The methods described in Section 2 are implemented as follows. Observed and simulated data are read in two matrices. For each subject, the empirical mean and variance of the simulated data are computed using the R functions mean, apply and cov. The inverse square root of the variance matrix is obtained by the Cholesky decomposition, using the functions chol and solve. The remaining computations involve matrix and vector multiplications. All these functions are available in the R program.
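In numpy terms (for illustration only; the package uses R's chol and solve), the inverse square root used in the decorrelation can be obtained as the inverse of the Cholesky factor. The covariance matrix below is made up:

```python
import numpy as np

# An example covariance matrix (illustrative values)
V = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])

# Upper-triangular Cholesky factor, as returned by R's chol(): V = U'U
U = np.linalg.cholesky(V).T
# One inverse square root of V: W = inv(U)', so that W V W' = I
Vinv_sqrt = np.linalg.inv(U).T

# Check: decorrelating with Vinv_sqrt yields the identity covariance
assert np.allclose(Vinv_sqrt @ V @ Vinv_sqrt.T, np.eye(3))
```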

The documentation contains the simulated examples vtrue.dat and vfalse.dat, as well as the original data file and the control files used for estimation and simulation. The simulated data simdata.dat used to compute the npde for both simulated datasets can be downloaded from the website.

3.2. Preparing the input

The package needs two files: the file containing the dataset to be evaluated (hereafter named 'observed data') and the file containing the simulations (hereafter named 'simulated data'). The package does not perform the simulations. R, NONMEM [7], MONOLIX [24] or another program can be used for that purpose, and the two following files should be prepared beforehand.

Observed data: the observed data file must contain at least the following three columns: id (patient identification), xobs (design variable such as time, X, ...) and yobs (observations such as DV, concentrations, effects...). An additional column may be present in the dataset to indicate missing data (MDV). In this case, this column should contain values of 1 to indicate missing data and 0 to indicate observed data (as in NONMEM or MONOLIX). Alternatively, missing data can be coded using a dot ('.') or the character string NA directly in the column containing yobs. The computation of the npde will remove missing observations.

Other columns may be present but will not be used by the library. The actual order of the columns is unimportant, since the user may specify which columns contain the requested information, but the default order is 1 = id, 2 = xobs, 3 = yobs and no MDV column. A file header may be present, and column separators should be one of: blank space(s), tabulation mark, comma (,) or semi-colon (;).
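As a sketch, reading such a file might look like this in Python (illustrative only: the function name read_observed is hypothetical, comma separators are assumed, and the actual package handles more formats):

```python
import csv

def read_observed(path, iid=0, ix=1, iy=2, imdv=None):
    """Read an observed data file with columns id, xobs, yobs and an
    optional MDV column; skip missing data (MDV=1, NA or '.')."""
    rows = []
    with open(path) as fh:
        reader = csv.reader(fh, delimiter=",")
        next(reader)                        # assume a header line
        for rec in reader:
            if imdv is not None and rec[imdv].strip() == "1":
                continue                    # MDV=1 marks missing data
            yval = rec[iy].strip()
            if yval in ("NA", "."):
                continue                    # missing observation in yobs
            rows.append((rec[iid], float(rec[ix]), float(yval)))
    return rows
```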

Simulated data: the simulated data file should contain the K simulated datasets stacked one after the other. Within each simulated dataset, the order of the observations must be the same as within the observed dataset. The dimensions of the two datasets must be compatible: if nobs is the number of lines in the observed dataset, the file containing the simulated datasets must have K × nobs lines. The simulated data file may contain a header, but not repeated headers for each simulated dataset.

The simulated data file must contain at least three columns, in the following order: id (patient identification), xsim (independent variable), ysim (dependent variable). The column setup is fixed and cannot be changed by the user, contrary to the observed data. Additional columns may be present but will not be used by the package. The id column must be equal to the id column of the observed dataset repeated K times, and the xsim column must be equal to the xobs column of the observed dataset repeated K times. If missing data are present in the observed data, they should be present in the simulated datasets, and the corresponding lines will be removed from all simulated datasets during the computation.

Examples of a simulated and observed dataset are availablein the subdirectory doc/inst of the library.
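The consistency rules above (K × nobs lines, with the id and xsim columns repeating the observed design K times) can be checked with a short sketch; check_files is a hypothetical helper, not part of the package:

```python
def check_files(obs_rows, sim_rows, K):
    """obs_rows and sim_rows are lists of (id, x, y) tuples (headers removed).
    Verify that the simulated file stacks K copies of the observed design."""
    nobs = len(obs_rows)
    # the simulated file must have K * nobs lines
    assert len(sim_rows) == K * nobs, "simulated file must have K x nobs lines"
    obs_id = [r[0] for r in obs_rows]
    obs_x = [r[1] for r in obs_rows]
    # id and x columns must repeat the observed design K times, in order
    assert [r[0] for r in sim_rows] == obs_id * K, "id columns do not match"
    assert [r[1] for r in sim_rows] == obs_x * K, "x columns do not match"
    return True
```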

BQL data: BQL (below the quantification limit, LOQ) or otherwise censored data are currently not appropriately handled by npde. If a maximum likelihood estimation method taking censored data into account has been used for the estimation, these data should be removed from the dataset or set to missing, using for example an MDV item, pending future extensions of npde. On the other hand, if BQL data were set to LOQ or LOQ/2, they should remain in the dataset. npde will likely detect model misspecification related to these data, and we suggest removing times for which too many observations are BQL before computing npde, since otherwise they might bias the results of the tests. During the simulations, negative or BQL data may be simulated because of the error model. At present, these values should be kept as is, because the decorrelation step requires the whole predictive distribution. A transform-both-sides approach or the use of a double exponential model can be used to avoid simulating negative concentrations, but this will not solve the BQL problem.

3.3. Computing npde

The package provides a function called npde to enter an interactive menu, where the user is prompted to enter the names of the files and the values of the different parameters required to compute npde. The menu is self-explanatory, and help pages are provided to explain the meaning of the different parameters. Fig. 3 shows an example of using this function (text entered by the user is shown in bold grey). The example will be detailed in Section 4. The package checks the names that are provided and prompts the user for a new name if the corresponding file cannot be found.

Optionally, pd can also be computed. Although pd do not take multiple observations into account [13], they are faster to compute than npde and can be used to diagnose model deficiencies. Also, when the computation of npde fails due to numerical difficulties, an error message is printed and pd are computed instead (with the corresponding plots). This problem can happen especially when model adequacy is very poor.

3.4. Output

During execution, the function prints the results of the tests described in the methods (Section 2.3). An example of running npde can be found in Section 4.

In addition to the output printed on screen, three additional types of results are produced by default: first, an R object containing several elements, including the npde and/or pd, is returned as the value of the function; second, a graph file containing diagnostic plots of the npde is shown in the graphic window and saved in a file; third, the results are saved to a text file. Options are available so that the numerical results and graphs are not saved on disk, and so that the function returns nothing. Let us now discuss these three outputs in more detail.
The object returned by the function contains seven elements: (i) a data frame obsdat containing the observed data, with the following elements: id (patient ID), xobs (observed X) and yobs (observed Y); (ii) ydobs: the decorrelated observed data y*_ij; (iii) ydsim: the decorrelated simulated data y_ij^sim(k)*; (iv) ypred: the predicted values; (v) xerr: an integer (0 if no error occurred during the computation); (vi) npde: the normalised prediction distribution errors; (vii) pd: the prediction discrepancies.

A graphic R window appears after the computation is finished, containing the four plots detailed in Section 2.3. These plots are saved to disk (unless boolsave=F). The name of the file is given by the user (see Fig. 3), and an extension is added depending on the format of the graph (one of: Postscript, JPEG, PNG or PDF, corresponding to the extensions .eps, .jpeg, .png and .pdf respectively).

The results are saved in a text file with the following columns: id (patient ID), xobs (observed X), ypred (predicted Y), npde, pd. The name of the file is the same as the name of the file in which the graphs are saved, with the extension .npde.

Sometimes the function is unable to compute the decorrelated prediction distribution errors for one or more subjects. This failure occurs during the decorrelation step, and a warning message is printed on screen. When npde cannot be computed, the program automatically computes pd, even if the calc.pd=F option was used. In this case, diagnostic graphs are plotted (see next section) but tests are not performed.

3.5. Other functions of interest

The npde function can be used to interactively fill in the required information. Alternatively, a function called autonpde is provided, in which this information can be set as arguments. This function requires two mandatory arguments: the name of the observed data file (or the name of the R dataframe), and the name of the simulated data file (or the name of the R dataframe). A number of additional optional arguments can be used to control message printing and output. These arguments and their significance are given in Table 1. An example of a call to autonpde is given in Section 4.

A function called plotnpde can be used to plot the graphs described in the previous section. The arguments for this function are the observed X, the npde and the predicted Y (ypred). The function plotnpde is called by autonpde and npde. A similar function, plotpd, can be used to plot diagnostic plots for the pd. These include a QQ-plot of pd versus the expected uniform U(0, 1) distribution, a histogram of the pd, and scatterplots of pd versus X and versus ypred.

The tests described in the previous section for npde can be performed using the function testnpde (called by autonpde and npde). This function requires only the npde as argument.

Table 1 – Options available for the autonpde function

Option     | Effect                                                                                  | Default value
iid        | Column with ID in the observed data file                                                | 1
ix         | Column with X in the observed data file                                                 | 2
iy         | Column with Y in the observed data file                                                 | 3
imdv       | Column with MDV in the observed data file                                               | 0 (none)
namsav     | Name of the file where results will be saved (without extension)                        | output
boolsave   | Whether results should be saved to disk                                                 | T
type.graph | Graph format (one of PDF, postscript, JPEG or PNG)(a)                                   | eps (postscript)
output     | Whether the function returns the results                                                | T
verbose    | Whether a message should be printed as the computation of npde begins in a new subject  | F
calc.npde  | Whether normalised prediction distribution errors should be computed                    | T
calc.pd    | Whether prediction discrepancies should be computed                                     | F

(a) JPEG and PNG formats are only available if the version of R used has been built to enable JPEG and PNG output. If this is not the case and the user selects JPEG or PNG format, the program will automatically switch to PDF and a warning will be printed.

4. Illustrative example

4.1. Data

To illustrate the use of the package, we simulated data based on the well known toy dataset recording the pharmacokinetics of the anti-asthmatic drug theophylline. The data were collected by Upton in 12 subjects given a single oral dose of theophylline, who then contributed 11 blood samples over a period of 25 h [7]. We removed the data at time zero from the dataset and applied a one-compartment model with first-order absorption and elimination, as previously proposed [25]. The variability was modelled using an exponential model for the interindividual variability and a combined error model for the residual variability. The model was parameterised in absorption rate constant ka (h−1), volume of distribution V (L) and elimination rate constant k (h−1), and did not include covariates. Interindividual variability was modelled using an exponential model for the three PK parameters. A correlation between the parameters k and V was assumed (cor(η_k, η_V)). Using NONMEM (version 5.1) with the FOCE INTERACTION estimation method, we obtained the parameter estimates reported in Table 2. This model and these parameter estimates correspond to MB.

As in [15], we then simulated two external validation datasets with the design of the real dataset: Vtrue was simulated under MB (H0), using the parameters reported in Table 2, while Vfalse (H1) was simulated assuming a bioavailability divided by 2 (so that V/F is multiplied by 2). These datasets are stored in two files called respectively vtrue.dat and vfalse.dat.


Table 2 – Parameter estimates for the theophylline concentration dataset

    Population mean                  Interindividual variability
    ka (h−1)           1.51          ωka (−)            0.67
    V (L)              0.46          ωV (−)             0.12
    k (h−1)            0.087         ωk (−)             0.13
    σinter (mg L−1)    0.088         cor(ηk, ηV) (−)    0.99
    σslope (−)         0.26

A one-compartment model was used, parameterised with the absorption rate constant ka, the volume of distribution V, and the elimination rate constant k. A correlation between V and k (cor(ηk, ηV)) was estimated along with the standard deviations of the three parameters. The model for the variance of the residual error is given in Eq. (2).

Fig. 2 shows plots of the concentration versus time profiles for the two datasets.

4.2. Simulation setup

The K simulations under MB, needed to compute the npde, were also performed using NONMEM. The control file used for the estimation was modified to set the values of the parameters (PK parameters, variability and error model) to the values in Table 2, and the number of simulations was set to K = 2000. The simulated data were saved to a file called simdata.dat.
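In the paper these simulations were performed with NONMEM; the following Python sketch shows the equivalent computation for one subject's design, under stated simplifications: the k–V correlation is ignored and the combined error model is taken as sd = σinter + σslope·f, one common parameterisation (the exact form used in the paper is its Eq. (2)). Names and structure are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1234)

def simulate_profiles(times, dose, K, theta, omega, sig_inter, sig_slope):
    """Simulate K replicate concentration profiles for one design.

    times: (n,) sampling times; theta = (ka, V, k) population values;
    omega = (w_ka, w_V, w_k) standard deviations of the log-normal
    interindividual variability (correlations ignored here).
    Returns a (K, n) array of simulated concentrations.
    """
    ka, V, k = theta
    eta = rng.normal(0.0, omega, size=(K, 3))
    ka_i = ka * np.exp(eta[:, 0:1])
    V_i = V * np.exp(eta[:, 1:2])
    k_i = k * np.exp(eta[:, 2:3])
    # structural model: one compartment, first-order absorption/elimination
    f = dose * ka_i / (V_i * (ka_i - k_i)) * (
        np.exp(-k_i * times) - np.exp(-ka_i * times))
    # combined residual error (assumed sd = sig_inter + sig_slope * f)
    sd = sig_inter + sig_slope * f
    return f + rng.normal(size=f.shape) * sd
```

Running this with the design of Section 4.5 (dose 4.5 mg, 10 sampling times) and K = 2000 yields a (2000, 10) matrix describing the predicted distribution at each time.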

The simulated data describe the predicted distribution for MB, so we use them to compute the npde for both Vtrue and Vfalse.

4.3. Computing npde for Vtrue

The function npde was used to compute the npde for the simulated dataset Vtrue, and the results were redirected to the R object myres.true with the following command:

myres.true<-npde()

Fig. 2 – Concentration versus time data for the two simulated datasets Vtrue (left) and Vfalse (right).


Fig. 3 shows the set of questions (in black) answered by the user (in grey).

Fig. 4 shows the output printed on screen. The first four central moments of the distribution of the npde are given first; here they are close to the expected values for N(0, 1), that is, 0 for the mean, skewness and (excess) kurtosis and 1 for the variance. Then, the three tests for mean, variance and normality, as well as the adjusted p-value for the global test, are given. Here, none of the tests are significant. Fig. 5 shows the graphs plotted for npde. The upper left graph is a quantile–quantile plot comparing the distribution of the npde to the theoretical N(0, 1) distribution, and the upper right graph is the histogram of the npde with the density of N(0, 1) overlaid. Both graphs show that the normality assumption is not rejected. In the two lower graphs, npde are plotted against time (the independent variable X) and predicted concentrations (predicted Y), respectively. These two graphs do not show any trend in the npde.
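The four moments reported in this output can be computed with a small helper (ours, not part of the package; note that scipy's kurtosis returns the excess kurtosis by default, matching the convention above):

```python
import numpy as np
from scipy import stats

def npde_moments(x):
    """Mean, variance, skewness and excess kurtosis of a sample.
    Under H0, npde should give values close to 0, 1, 0 and 0."""
    x = np.asarray(x, dtype=float)
    return x.mean(), x.var(ddof=1), stats.skew(x), stats.kurtosis(x)
```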

4.4. Computing npde for Vfalse

We now use the autonpde function to compute the npde for the second dataset, Vfalse, setting the parameters as arguments to the function with the following command:

myres.false<-autonpde("vfalse.dat","simdata.dat",1,3,4,namesav="vfalse",calc.pd=T)

Fig. 6 shows the output printed on screen and Fig. 7 shows the corresponding graphs. The graphs and the Shapiro–Wilk test show that the normality assumption itself is not rejected, but the tests of the mean and variance indicate that the distribution is shifted (mean −0.45) and has an increased variance (standard deviation 1.3) compared to N(0, 1). The scatterplots in the lower part of Fig. 7 also show a clear pattern, with positive npde for low concentrations and negative npde for high concentrations, reflecting model misfit.


Page 8: evaluacion mixto no lineal


Fig. 3 – Example of a call to the function npde, where user input is shown in bold grey.

Fig. 4 – Output of the function npde applied to dataset Vtrue.

4.5. Influence of the number of simulations

A full study of the choice of the number of simulations (K) that should be performed with respect to the size of V is beyond the scope of this paper. However, to assess the influence of the number of simulations on the results, we performed a small simulation study. Because the computation of the npde can be time-consuming, we simulated designs where all subjects have the same sampling times and the same dose, and thus the same predicted distribution. The dose chosen was the median dose received by the actual patients (4.5 mg) and the 10 times were close to those observed (t = {0.25, 0.5, 1, 2, 3.5, 5, 7, 9, 12, 24}). Then, we simulated the predicted distribution with K simulations (K in {100, 200, 500, 1000, 2000, 5000}) and used the same var(Yi) and E(Yi) to decorrelate the vectors of observations for each simulated subject. To assess the influence of K for different designs, we simulated four different datasets, with N = 12, 100, 250 and 500 subjects respectively. One set of simulations was performed under H0 while the other set of simulations was performed under the same parameter assumptions as for Vfalse.

Fig. 8 shows the base-10 logarithm of the p-values (log10(p)) obtained for the global test, for the first set of simulations. The three tests (mean, variance, and normality) show the same


Fig. 5 – Graphs plotted for Vtrue. Quantile–quantile plot of npde versus the expected standard normal distribution (upper left). Histogram of npde with the density of the standard normal distribution overlaid (upper right). Scatterplot of npde versus observed X (lower left). Scatterplot of npde versus ypred (lower right).

qualitative behaviour (data not shown). Each graph represents one simulated dataset under H0 for N = 12, 100, 250 and 500 subjects respectively. In the graphs, we represent log10(p) because for large numbers of subjects, p-values become very small, and we jittered the value of K by randomly adding a number between −50 and 50, so that the points would not be superimposed. In these graphs, we observe that small values of K are unreliable: for N = 100, the scatter stabilises around K = 1000, but for N = 250, K = 2000 appears to be necessary, and even larger values should be used for N = 500. When K is small and N is large (here, N = 100), we do not simulate enough concentrations to reliably describe the predicted distribution of the concentrations, and several observed concentrations may be ascribed the same value of npde, so that the normality test in particular often fails. When K increases, the variability in the p-values decreases and the mean p-value stabilises, but large numbers of subjects require large values of K. The program issues a warning when K is smaller than 1000, but even that may not be sufficient when dealing with very large databases. In particular, we see that for 500 subjects with 10 observations per subject, even K = 5000 may not be sufficient. Further work is needed to give more specific recommendations.

Fig. 6 – Output of the function npde applied to dataset Vfalse.

The second set of simulations, under H1, is shown in Fig. 9 for the datasets simulated with 12, 100 and 250 subjects. For the simulations with 500 subjects and some of the simulations with 250 subjects, the p-value of the test was reported as 0 due to the numerical approximation involved in the software, so we fixed an arbitrary cut-off of log10(p) = −150. The model is strongly rejected regardless of the value of K and N. With our choice of model assumptions for Vfalse, therefore, the value of K has little influence in rejecting the wrong model.

Fig. 7 – Graphs plotted for Vfalse. Quantile–quantile plot of npde versus the expected standard normal distribution (upper left). Histogram of npde with the density of the standard normal distribution overlaid (upper right). Scatterplot of npde versus observed X (lower left). Scatterplot of npde versus ypred (lower right).

Fig. 8 – Influence of the number of simulations (K) on the p-value, represented as log10(p), of the global test under H0, for four simulated datasets with respectively 12, 100, 250 and 500 subjects. In each graph, the solid line represents the median of the 10 simulations (×) performed for each value of K. A dotted line is plotted for y = log10(0.05).


Fig. 9 – Influence of the number of simulations (K) on the p-value, represented as log10(p), of the global test under H1, for four simulated datasets with respectively 12, 100, 250 and 500 subjects. In each graph, the solid line represents the median of the 10 simulations (×) performed for each value of K. A dotted line is plotted for y = log10(0.05).

5. Concluding remarks

Model evaluation is an important part of modelling. Diagnostic graphs are useful to diagnose potential problems with models, and plots of (weighted) residuals versus independent variables or predicted concentrations are a major part of this diagnostic. Weighted residuals are computed using the dataset used for model estimation (internal evaluation) whereas standardised prediction errors are computed using a different dataset (external evaluation). The shortcomings of standardised prediction errors, however, have been publicised as improved approaches based on simulations were made possible by increasing computer power [13,16]. Conditional weighted residuals have been proposed recently [26] but still suffer from the approximation involved. More sophisticated approaches now use simulations to obtain the whole predictive distribution. They include the visual predictive check (VPC), which complements traditional diagnostic graphs and improves detection of model deficiencies [12], as well as normalised prediction distribution errors. npde do not involve any approximation of the model function and therefore have better properties [15].

Concerning the evaluation of the npde, the posterior distribution of the parameters is assumed to be located only at the maximum likelihood estimate, without taking into account the estimation error; this plug-in approach was shown to be equally efficient in a very simple pharmacokinetic setting [2]. Mentre and Escolano discuss the implications of this choice in more detail, noting in particular that there are debates in the Bayesian literature as to whether the plug-in approach may not actually be preferable in practice [13]. A second practical limitation consists in using a limited number of simulations to compute the npde. Based on the results of the simulation study (Section 4.5), we recommend using at least K = 1000, but the actual number depends on the dataset involved, and should be increased when the dataset includes a large number of subjects. This will be investigated in more detail in future work on npde.

Although the computation of npde is not difficult, it does require some programming ability. With the package npde we provide a tool to compute them easily, using the validation dataset and data simulated under the null hypothesis that model MB describes the validation dataset, with the design of the validation dataset. A global test is performed to check whether the shape, location and variance parameters of the distribution of the npde correspond to those of the theoretical distribution. The tests based on npde have better properties than the tests based on pd [16], because of the decorrelation. However, the decorrelation does not make the observations independent, except when the model is linear so that the joint distribution of the Yi is normal. For nonlinear models such as those under study, more work is necessary to assess the statistical properties of the tests.
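To make the decorrelation step concrete, here is an illustrative Python sketch of the computation for one subject, following the algorithm described in [13,15] (our own simplified rendering, not the package code): the empirical mean and covariance of the K simulated vectors provide a Cholesky square root used to decorrelate both the observed and the simulated vectors; the pde are the ranks of the decorrelated observations within the decorrelated simulations, and the npde their normal quantiles.

```python
import numpy as np
from scipy import stats

def compute_npde(yobs, ysim):
    """npde for one subject (illustrative).

    yobs: (n,) observed vector; ysim: (K, n) vectors simulated under the
    model with the subject's design. Returns the (n,) vector of npde.
    """
    K, n = ysim.shape
    mu = ysim.mean(axis=0)
    cov = np.cov(ysim, rowvar=False)
    # decorrelate with the inverse Cholesky factor of the covariance
    L_inv = np.linalg.inv(np.linalg.cholesky(cov))
    ystar = L_inv @ (yobs - mu)
    simstar = (L_inv @ (ysim - mu).T).T
    # pde: rank of each decorrelated observation among its simulations
    pde = (simstar < ystar).mean(axis=0)
    pde = np.clip(pde, 1.0 / (2 * K), 1.0 - 1.0 / (2 * K))  # avoid +/-inf
    return stats.norm.ppf(pde)
```

When the observations are drawn from the same distribution as the simulations, the resulting npde are close to standard normal quantiles.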

In addition, the normality test appears very powerful, especially when the datasets become large. When a model is rejected, the QQ-plots and plots of npde versus time, predictions or covariates should therefore also be considered to assess whether, in spite of the significant p-value, the model may not be considered sufficient to describe the data. Graphs of the pd should also be plotted when investigating model deficiencies, since the decorrelation involved in the computation of the npde may sometimes smooth the plots and mask model misfit, and pd can then offer additional insight.

To combine the three tests, the Bonferroni procedure was preferred to the previously used Simes procedure based on the result of a simulation study, in which we found the type I error of the global test to be close to 5% when using a Bonferroni correction [16]. The global test with a Simes correction can be easily computed using the p-values returned by the function testnpde(). Default diagnostic graphs and diagnostics are also plotted to check model adequacy. Other diagnostic graphs can be plotted, against covariates for instance, using the npde returned by the package. A current limitation of npde concerns BQL concentrations, which the present version of npde does not handle properly. Recently, estimation methods that handle censored data by taking into account their contribution to the log-likelihood were implemented in NONMEM [27] and MONOLIX [28], making them readily available to the general user. In the next extension to npde, we therefore plan to propose and implement a method to handle BQL data for models using these estimation methods. In the meantime, we suggest removing times for which too many observations are BQL before computing npde, since otherwise they might bias the results of the tests. A column specifying which concentrations should be removed can be used for that purpose.

Simulations need to be performed before using the package. This was not thought to be problematic since simulations can be performed easily with the most frequently used software in the field, NONMEM, with a minimal modification of the control file, or with MONOLIX, out of the box. The simulations involved in the computation of npde are the same as those needed to perform VPC [12]. There is however no clear test for VPC, although testing strategies have been evaluated based on quantiles [29], and in addition multiple observations per subject induce correlations in the VPC. On the other hand, npde have been decorrelated, and should follow the expected standard normal distribution, thus providing a one-step test of model adequacy. Another problem is that when each subject has different doses and designs, it may be difficult to make sense of VPC. An unbalanced design is not a problem with npde since simulations are used to obtain the predictive distribution for each observation using the design for each individual. This makes them a kind of normalised VPC.

As a recent review points out, npde should be considered as an addition to the usual diagnostic metrics, and, as for all simulation-based diagnostics, care must be taken to simulate data reproducing the design and features of the observed data [30]. In particular, caution must be exercised when features like BQL or missing data, dropouts, poor treatment adherence, or adaptive designs are present.


6. Availability

npde can be downloaded from http://www.npde.biostat.fr. The documentation included in the package provides a detailed User Guide as well as an example of simulation setup, containing the NONMEM estimation and simulation control files, and the observed and simulated datasets.

npde is a package distributed under the terms of the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

Conflict of interest statement

Karl Brendel is an employee of Servier (Courbevoie, France).

Acknowledgment

The authors wish to thank Dr Saik Urien (INSERM, Paris) and Pr Nick Holford (University of Auckland, New Zealand) for helpful discussions and suggestions.

References

[1] L. Aarons, M.O. Karlsson, F. Mentre, F. Rombout, J.L. Steimer, A. van Peer, C.B. Experts, Role of modelling and simulation in Phase I drug development, Eur. J. Pharm. Sci. 13 (2001) 115–122.

[2] Y. Yano, S.L. Beal, L.B. Sheiner, Evaluating pharmacokinetic/pharmacodynamic models using the posterior predictive check, J. Pharmacokinet. Pharmacodyn. 28 (2001) 171–192.

[3] Food and Drug Administration, Guidance for Industry: Population Pharmacokinetics, FDA, Rockville, Maryland, USA, 1999.

[4] Committee for Medicinal Products for Human Use, European Medicines Agency, Draft Guideline on Reporting the Results of Population Pharmacokinetic Analyses, EMEA, 2006.

[5] K. Brendel, C. Dartois, E. Comets, A. Lemenuel-Diot, C. Laveille, B. Tranchand, P. Girard, C.M. Laffont, F. Mentre, Are population pharmacokinetic and/or pharmacodynamic models adequately evaluated? A survey of the literature from 2002 to 2004, Clin. Pharmacokinet. 46 (2007) 221–234.

[6] S. Vozeh, P. Maitre, D. Stanski, Evaluation of population (NONMEM) pharmacokinetic parameter estimates, J. Pharmacokinet. Biopharm. 18 (1990) 161–173.

[7] A. Boeckmann, L. Sheiner, S. Beal, NONMEM Version 5.1, University of California, NONMEM Project Group, San Francisco, 1998.

[8] F. Mesnil, F. Mentre, C. Dubruc, J. Thenot, A. Mallet, Population pharmacokinetics analysis of mizolastine and validation from sparse data on patients using the nonparametric maximum likelihood method, J. Pharmacokinet. Biopharm. 26 (1998) 133–161.

[9] M. Bayarri, J. Berger, P values for composite null models, J. Am. Statist. Assoc. 95 (2000) 1127–1142.

[10] A.E. Gelfand, Markov chain Monte Carlo in practice, Chapman & Hall, Boca Raton, 1996.

[11] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, Chapman & Hall, London, 1995.

[12] N. Holford, The Visual Predictive Check: superiority to standard diagnostic (Rorschach) plots (Abstract 738), in: 14th Meeting of the Population Approach Group in Europe, Pamplona, Spain, 2005.


[13] F. Mentre, S. Escolano, Prediction discrepancies for the evaluation of nonlinear mixed-effects models, J. Pharmacokinet. Biopharm. 33 (2006) 345–367.

[14] F. Mentre, S. Escolano, Validation methods in population pharmacokinetics: a new approach based on predictive distributions with an evaluation by simulations, in: 10th Meeting of the Population Approach Group in Europe, Basel, Switzerland, 2001.

[15] K. Brendel, E. Comets, C. Laffont, C. Laveille, F. Mentre, Metrics for external model evaluation with an application to the population pharmacokinetics of gliclazide, Pharm. Res. 23 (2006) 2036–2049.

[16] K. Brendel, E. Comets, F. Mentre, Normalised prediction distribution errors for the evaluation of nonlinear mixed-effects models (Abstract), in: 16th Meeting of the Population Approach Group in Europe, Copenhagen, Denmark, 2007.

[17] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2004.

[18] E. Comets, K. Brendel, F. Mentre, Normalised prediction distribution errors in R: the npde library (Abstract 1120), in: 16th Meeting of the Population Approach Group in Europe, Copenhagen, Denmark, 2007.

[19] E.N. Jonsson, M.O. Karlsson, Xpose – an S-PLUS based population pharmacokinetic/pharmacodynamic model building aid for NONMEM, Comput. Methods Programs Biomed. 58 (1999) 51–64.

[20] S. Retout, S. Duffull, F. Mentre, Development and implementation of the population Fisher information matrix for the evaluation of population pharmacokinetic designs, Comput. Methods Programs Biomed. 65 (2001) 141–151.

[21] S. Retout, F. Mentre, Optimisation of individual and population designs using Splus, J. Pharmacokinet. Pharmacodyn. 30 (2003) 417–443.

[22] S.P. Wright, Adjusted p-values for simultaneous inference, Biometrics 48 (1992) 1005–1013.

[23] E. Dimitriadou, K. Hornik, F. Leisch, D. Meyer, A. Weingessel, e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, 2006. R package version 1.5-16.

[24] M. Lavielle, MONOLIX (MOdeles NOn LIneaires a effets miXtes), MONOLIX group, Orsay, France, 2005.

[25] M. Davidian, D. Giltinan, Nonlinear Models for Repeated Measurement Data, Chapman & Hall, London, 1995.

[26] A. Hooker, C. Staatz, M. Karlsson, Conditional weighted residuals (CWRES): a model diagnostic for the FOCE method, Pharm. Res. 24 (2007) 2187–2197.

[27] S. Beal, Ways to fit a pharmacokinetic model with some data below the quantification limit, J. Pharmacokinet. Pharmacodyn. 28 (2001) 481–504.

[28] A. Samson, M. Lavielle, F. Mentre, Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: application to HIV dynamics model, Comput. Stat. Data Anal. 51 (2006) 1562–1574.

[29] J. Wilkins, M. Karlsson, N. Jonsson, Patterns and power for the visual predictive check (Abstract 1029), in: 15th Meeting of the Population Approach Group in Europe, Brugges, Belgium, 2006.

[30] M. Karlsson, R. Savic, Diagnosing model diagnostics, Clin. Pharmacol. Ther. 82 (2007) 17–20.