Validation of predictive regression models
Ewout W. Steyerberg, PhD
Clinical epidemiologist
Frank E. Harrell, PhD
Biostatistician
Personal background
Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands
Frank Harrell: Health Evaluation Sciences,
Univ of Virginia, Charlottesville, VA, USA
“Validation of predictions from
regression models is of
paramount importance”
Learning objectives: knowledge of
common types of regression models
fundamental assumptions of regression models
performance criteria of predictive models
principles of different types of validation
Performance objectives
To be able to explain why validation is
necessary for predictive models
To be able to judge the adequacy of a
validation procedure
Predictive models provide quantitative estimates of an outcome, e.g.
Quality of life one year after surgery
Death at 30 days after surgery
Long term survival
Predictive models are often based on regression analysis
y ~ a + sum(bi*xi)
y: outcome variable
a: intercept
bi: regression coefficient i
xi: predictor variable i
i in [1,many], usually 2 to 20
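A minimal sketch of how the formula above turns predictor values into a prediction for one patient. The function name and all numeric values are hypothetical and chosen only for illustration; they are not taken from any fitted model.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for a single patient."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical intercept, coefficients and predictor values
a = 1.0
b = [0.5, -2.0]
x = [4.0, 1.0]
y_hat = linear_predictor(a, b, x)
```

In linear regression this value is the predicted outcome itself; in logistic and Cox regression it is transformed further before it is read as a risk.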
3 examples of regression
Quality of life one year after surgery:
continuous outcome, linear regression
Death at 30 days after surgery:
binary outcome, logistic regression
Long term survival:
time-to-outcome, Cox regression
Predictive models make assumptions
Distribution
Linearity of continuous variables
Additivity of effects
Example: a simple logistic regression model
30day mortality ~ a + b1*sex + b2*age
Assumptions:
Distribution of 30day mortality is binomial
Age has a linear effect
The effects of sex and age can be added
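A minimal sketch of how this model yields a predicted probability: the linear part a + b1*sex + b2*age is passed through the logistic function. The coefficient values below are assumptions made up for illustration, not estimates from the data discussed in these slides, and the 0/1 coding of sex is likewise assumed.

```python
import math

def predicted_mortality(a, b1, b2, sex, age):
    """Predicted probability of 30-day mortality from the logistic model."""
    lp = a + b1 * sex + b2 * age          # additive, linear in age
    return 1.0 / (1.0 + math.exp(-lp))    # logistic transformation

# Hypothetical coefficients; sex coded 0/1, age in years
p = predicted_mortality(a=-6.0, b1=0.3, b2=0.06, sex=1, age=70)
```

The additivity assumption shows up as the plain sum inside `lp`; the linearity assumption shows up as age entering only through `b2 * age`.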
Assessing model assumptions
Examine model residuals
Perform specific tests
add nonlinear terms, e.g. age + age^2
add interaction terms, e.g. sex*age
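A small sketch of what adding these terms means in practice: the predictor row for each patient is extended with a squared age term and a sex-by-age product, and the model is refitted. The function name is hypothetical. If the extra coefficients turn out to be close to zero, that supports the linearity and additivity assumptions.

```python
def expand_predictors(sex, age):
    """Return the predictor row [sex, age, age^2, sex*age]."""
    return [sex, age, age ** 2, sex * age]

# Example row for a patient with sex coded 1 and age 70
row = expand_predictors(1, 70)
```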
Model assumptions and predictions
Better predictions if assumptions are met
Some violation inherent in empirical data
Evaluate predictions in new data
Evaluation of predictions
Calibration
average of predictions correct?
low and high predictions correct?
Discrimination
distinguish low risk from high risk
patients?
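The two criteria above can be sketched with a few lines of plain Python. This is an illustration under assumptions, not the method used for the results in these slides: discrimination is computed as the area under the ROC curve (the proportion of event/non-event pairs in which the event has the higher prediction, ties counting one half), and calibration is checked on average and within groups of increasing predicted risk. All function names and the toy data are hypothetical.

```python
def c_statistic(outcomes, predictions):
    """Area under the ROC curve from all event/non-event pairs."""
    events = [p for y, p in zip(outcomes, predictions) if y == 1]
    non_events = [p for y, p in zip(outcomes, predictions) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))

def calibration_in_the_large(outcomes, predictions):
    """Observed event rate minus mean prediction (0 is ideal)."""
    n = len(outcomes)
    return sum(outcomes) / n - sum(predictions) / n

def calibration_by_group(outcomes, predictions, n_groups):
    """(mean prediction, observed rate) per group of increasing risk.

    Assumes n_groups is not larger than the number of patients.
    """
    pairs = sorted(zip(predictions, outcomes))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(y for _, y in group) / len(group)
        table.append((mean_pred, observed))
    return table

# Toy data, made up for illustration
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
auc = c_statistic(y, p)
citl = calibration_in_the_large(y, p)
groups = calibration_by_group(y, p, 2)
```

Plotting the pairs returned by `calibration_by_group` against the diagonal gives a calibration plot.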
Example: predicted probabilities
[Figure: actual 30-day mortality plotted against predicted probability of 30-day mortality, both axes from 0.0 to 0.4]
Area under ROC: 0.77
Calibration: OK
3 types of validation
Apparent: performance on sample used to
develop model
Internal: performance on population
underlying the sample
External: performance on related but
slightly different population
Apparent validity
Easy to calculate
Results in optimistic performance
estimates
Apparent estimates optimistic since same data used for:
Definition of model structure:
e.g. selection and coding of variables
Estimation of model parameters:
e.g. regression coefficients
Evaluation of model performance:
e.g. calibration and discrimination
Internal validity
More difficult to calculate
Test model in new data, random from
underlying population
Why internal validation?
Honest estimate of performance should
be obtained, at least for a population
similar to the development sample
Internally validated performance sets an
upper limit to what may be expected in
other settings (external validity)
External validity
Moderately easy to calculate when new
data are available
Test model in new data, different from
development population
Why external validation?
Various factors may differ from
development population, including
different selection of patients
different definitions of variables
different diagnostic or therapeutic
procedures
Internal validation techniques
Split-sample:
development / validation
Cross-validation:
alternating development / validation
extreme: n-1 develop / 1 validate
(‘jack-knife’)
Bootstrap
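The first two techniques differ only in how patients are assigned to development and validation parts. A minimal sketch of that assignment, using patient indices 0 to n-1; the function names, the default 50/50 split and the seeds are assumptions for illustration.

```python
import random

def split_sample(n, dev_fraction=0.5, seed=0):
    """One random split into development and validation indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * dev_fraction))
    return idx[:cut], idx[cut:]

def k_fold(n, k, seed=0):
    """k (development, validation) pairs; every patient is validated once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]

dev, val = split_sample(10)
folds = k_fold(10, 5)        # 5-fold cross-validation
leave_one_out = k_fold(10, 10)  # extreme case: n-1 develop / 1 validate
```

With `k` equal to `n`, each validation part holds a single patient, which is the extreme case named above.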
Bootstrap is the preferred internal validation technique
bootstrap sample for model development:
n patients drawn with replacement
original sample for validation: n patients
difference: optimism
efficiency: development and validation on n
patients
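The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the data are simulated with made-up coefficients, the logistic model is fitted by Newton-Raphson (the slides do not say how the model was fitted), performance is the area under the ROC curve, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit; returns [a, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        # tiny ridge on the diagonal guards against a singular matrix
        hess = [[1e-8 if i == j else 0.0 for j in range(k)] for i in range(k)]
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

def auc(beta, X, y):
    """Area under the ROC curve of the model beta in data (X, y)."""
    lp = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    ev = [s for s, yi in zip(lp, y) if yi == 1]
    ne = [s for s, yi in zip(lp, y) if yi == 0]
    wins = sum((e > m) + 0.5 * (e == m) for e in ev for m in ne)
    return wins / (len(ev) * len(ne))

def bootstrap_validation(X, y, n_boot=200, seed=1):
    """Return (apparent AUC, mean optimism, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(fit_logistic(X, y), X, y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n drawn with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip a sample without both outcomes
        beta_b = fit_logistic(Xb, yb)               # develop in bootstrap sample
        diffs.append(auc(beta_b, Xb, yb) - auc(beta_b, X, y))  # test in original
    optimism = sum(diffs) / len(diffs)
    return apparent, optimism, apparent - optimism

# Simulated patients; the coefficients -6.0, 0.3 and 0.07 are made up
sim = random.Random(0)
X, y = [], []
for _ in range(200):
    sex, age = sim.randint(0, 1), sim.uniform(40, 85)
    X.append([sex, age])
    y.append(1 if sim.random() < sigmoid(-6.0 + 0.3 * sex + 0.07 * age) else 0)

apparent, optimism, corrected = bootstrap_validation(X, y)
```

Each bootstrap model is evaluated twice, once in its own bootstrap sample and once in the original sample; the mean difference estimates the optimism, which is subtracted from the apparent performance.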
Example: bootstrap results for logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area under the ROC curve: 0.77
Mean area of 200 bootstrap samples: 0.772
Mean area of 200 tests in original: 0.762
Optimism in apparent performance: 0.01
Optimism-corrected area: 0.76
External validation techniques
Temporal validation: same
investigators, validate in recent years
Spatial validation (other place): same
investigators, cross-validate in centers
Fully external: other investigators, other
centers
Example: external validity of logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area in 785 patients: 0.77
Tested in 20,318 other patients: 0.74
Tested by other investigators: ?
Example: external validation
[Figure: actual 30-day mortality plotted against predicted probability of 30-day mortality, both axes from 0.0 to 0.4]
Area under ROC: 0.74
Calibration: reasonable
Summary
Apparent validity gives an optimistic
estimate of model performance
Internal validity may be estimated by
bootstrapping
External validity should be determined
in other populations
Key references
Tutorial and book on multivariable models (Harrell 1996, Stat Med 15: 361-87; Harrell 2001, Regression Modeling Strategies, Springer)
Empirical evaluations of strategies (Steyerberg 2000, Stat Med 19: 1059-79)
Internal validation (Steyerberg 2001, JCE 54: 774-81)
External validation (Justice 1999, Ann Intern Med 130: 515-24; Altman 2000, Stat Med 19: 453-73)
Links
Interactive text book on predictive modeling
http://www.neri.org/symptom/mockup/Chapter_8/
Harrell’s Regression modeling strategies
http://hesweb1.med.virginia.edu/biostat/rms/