Fundamentals of Model Calibration: Theory & Practice
ISPOR 17th Annual International Meeting, Washington, DC, USA, 4 June 2012
Confidential property of Optum. Do not distribute or reproduce without express permission from Optum.
Workshop Leaders
• Douglas Taylor, MBA – Associate Director, Ironwood Pharmaceuticals Inc, Cambridge, MA USA
• Ankur Pandya, PhD MPH – Graduate Student, Harvard University, Boston, MA USA
• David Thompson, PhD – Executive Vice President & Senior Scientist, OptumInsight, Boston, MA USA
Acknowledgements
We would like to thank our colleagues who have contributed much to this research over the last several years:
Kristen Gilmore, Rowan Iskandar, Denise Kruzikas, Kevin Leahy, Vivek Pawar, Milton Weinstein
Workshop Objectives
• Discuss rationale for model calibration: in what circumstances is calibration needed?
• Provide overview of the model calibration process: selection of inputs, specifying the objective function, implementing the search process, and evaluating the calibration results
• Describe advanced topics in model calibration, including incorporation of calibrated inputs into uncertainty analyses
• Illustrate concepts through real-world examples
[Diagram: Data Sources → Model Inputs → Model → Model Outputs]
Concept of Model Calibration
• Calibration traditionally conceptualized as an important (but not necessary) step in model validation:
  – If reliable benchmark data exist, then predictive validity can be assessed & the model calibrated if found to be inaccurate
  – Otherwise, the model cannot be impugned for not being calibrated
• Calibration task involves systematic adjustment of model parameter estimates so that model outputs more accurately reflect external benchmarks
• Calibration requires the modeler to allow model outputs to govern model inputs, rather than the other way around
When is calibration needed?
• Model validity threatened by spatial variation (eg, if being adapted from original setting to a foreign one)
[Figure: CHD risk vs. cholesterol level, with separate curves for France and the US]
When is calibration needed?
• Model validity threatened by temporal variation (eg, if input data are old or secular changes have occurred since their collection)
[Figure: CHD risk vs. cholesterol level, with separate curves for US 2010 and US 1980]
When is calibration needed?
• Model validity threatened by heterogeneity (eg, population average data available, but subgroup data not)
[Figure: CHD risk vs. cholesterol level, with separate curves for US women, US average, and US men]
Model Calibration Process
[Diagram: iterative cycle: Estimate Model Parameters → Run Model → Assess Results → Adjust Inputs]
• Looks straightforward, but …
  – What criteria do we employ to assess model results?
  – How do we go about adjusting model inputs?
  – How do we know when we are done?
Thank You.
Contact Info: David Thompson, [email protected]
FUNDAMENTALS OF MODEL CALIBRATION: THEORY & PRACTICE
Identifying Inputs to Calibrate
• Theoretically, any input could be calibrated…
• But inputs should be related to the “problem” to justify using calibration:
  – Adapted from one setting to another
– Estimated from heterogeneous populations
– Affected by temporal changes in epidemiology or practice patterns
Identifying Calibration Targets
• Targets should be based on setting‐specific (or otherwise appropriate) data
• Model should predict these types of events (age‐specific, composite outcomes, etc.)
Goodness‐of‐Fit
• Assess how well model outputs match observed data
• Three potential approaches:
–Acceptable windows
–Minimizing deviations
– Likelihood functions
Acceptable Windows
• Compare model‐predicted outcomes to established ranges for each endpoint
• Suitable when there are multiple endpoints of interest
• Easy to implement
• Limitation: Does not capture the degree of closeness
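The acceptable-windows check is simple to express in code. A minimal sketch (the function name, endpoint names, and window values here are hypothetical illustrations, not from the workshop):

```python
def within_windows(predictions, windows):
    """Check each model-predicted endpoint against its acceptable window.

    predictions: dict mapping endpoint name -> model output
    windows: dict mapping endpoint name -> (lower_bound, upper_bound)
    Returns True only if every endpoint falls inside its window.
    Note the limitation from the slide: this is pass/fail only and does
    not capture the degree of closeness to the targets.
    """
    return all(lo <= predictions[name] <= hi
               for name, (lo, hi) in windows.items())

# Hypothetical example: two endpoints, two candidate parameter sets' outputs
windows = {"incidence": (7.5, 9.0), "mortality": (2.0, 3.0)}
print(within_windows({"incidence": 8.2, "mortality": 2.4}, windows))  # True
print(within_windows({"incidence": 9.5, "mortality": 2.4}, windows))  # False
```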
Acceptable Windows – Example
[Figure: model-predicted outcomes plotted against the upper and lower bounds of the acceptable window, shown across successive calibration attempts]
Minimizing Deviation
• Summary measure of relative distance of model‐produced results from benchmarks
• Captures magnitude of goodness‐of‐fit
• Easy to implement
• Weights all endpoints equally, unless weighting scheme introduced
Percentage Deviation
$$\text{Weighted Mean Percentage Deviation} = \sum_{i=1}^{n} w_i \, \frac{\left|\mathit{pred}_i - \mathit{obs}_i\right|}{\mathit{obs}_i}$$

Where:
  n = number of endpoints
  pred_i = model-based estimate of the ith endpoint
  obs_i = data-based target value of the ith endpoint
  w_i = weight assigned to the ith endpoint
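The deviation measure is easy to compute. A minimal Python sketch of the idea (endpoint values below are hypothetical; absolute deviations are used so positive and negative deviations do not cancel):

```python
def weighted_mean_pct_deviation(pred, obs, weights=None):
    """Weighted mean percentage deviation of model outputs from targets.

    pred, obs: sequences of model estimates and target values per endpoint
    weights: optional per-endpoint weights (defaults to equal weights
    summing to 1, an assumed convention)
    """
    n = len(obs)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * abs(p - o) / o for w, p, o in zip(weights, pred, obs))

# Hypothetical example: three endpoints, equal weights
print(weighted_mean_pct_deviation([8.2, 105.0, 0.5], [8.0, 100.0, 0.5]))
```

A search procedure would then adjust inputs to minimize this quantity.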
Minimizing Deviation ‐ Example
[Figure: model-produced results compared with the target value for each endpoint]
Likelihood Functions
• How likely the model‐produced results are in light of the observed outcomes
• Incorporates precision of endpoint data
• Harder to implement:
  – Need data on sample sizes
  – Have to know (or assume) distributions
Likelihood Functions ‐ Example
• Assume incidence has a binomial distribution:

$$\Pr(K = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

• Where:
  k = # of events observed in model
  n = sample size of outcome data
  p = # of events observed in outcome data / n
Likelihood Function ‐ Example
n = person-years; k = events; incidence (y-axis) = (k / n) × 1000
Target: k = 23, n = 2,800, incidence ≈ 8.21
Likelihood Function ‐ Example
[Figure: age-specific incidence (per 1000 person-years) across ages 45‐54, 55‐64, 65‐74, and 75‐84, comparing the ARIC target with Parameter Set 1 and Parameter Set 2. For one target (k = 23, n = 2,800): Parameter Set 1 gives k = 28, L = 0.047; Parameter Set 2 gives k = 14, L = 0.013.]
Likelihood Function ‐ Example
[Figure: same plot with a second target added (k = 287, n = 49,000): Parameter Set 1 gives k = 368, L = 0.00000064; Parameter Set 2 gives k = 240, L = 0.00045.]
Combining Likelihoods
• Multiply likelihoods (if independent)
• Sum log‐likelihoods
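A sketch of combining independent endpoints by summing log-likelihoods, using the two targets from the preceding example (binomial likelihoods assumed, as above):

```python
from math import comb, log

def binomial_log_likelihood(k, n, p):
    """Log of the binomial likelihood Pr(K = k)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Slide example, Parameter Set 1 across two endpoints:
# (k=28 vs target p=23/2800; k=368 vs target p=287/49000)
endpoints = [(28, 2800, 23 / 2800), (368, 49000, 287 / 49000)]
total = sum(binomial_log_likelihood(k, n, p) for k, n, p in endpoints)
print(total)  # higher (less negative) values indicate better overall fit
```

Summing logs is numerically safer than multiplying many small likelihoods directly.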
Summary of Goodness‐of‐Fit Options

• Acceptable Windows: easy to implement; not sensitive to magnitude of deviations
• Deviations: easy to implement; captures magnitude of deviations; weights for multiple endpoints will be subjective
• Likelihood-based: need specific data; need to know (or assume) distribution; gives “meaningful” goodness-of-fit measures (i.e., likelihoods are probabilities)
Parameter Search Methods
• How to adjust inputs during calibration?
  – Manual adjustment
– Random searches
– Directed‐search algorithms
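As one illustration, a random search can be sketched in a few lines of Python; the toy model, objective, and bounds below are placeholders, not the workshop's model:

```python
import random

def random_search(run_model, objective, bounds, n_draws=1000, seed=42):
    """Random-search calibration sketch: draw candidate parameter sets
    uniformly within bounds and keep the one with the best (lowest)
    objective-function value.

    bounds: list of (low, high) tuples, one per calibrated input
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_draws):
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        score = objective(run_model(candidate))
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

# Toy stand-in model: output is the sum of two inputs; target output is 1.0
run_model = lambda params: sum(params)
objective = lambda output: abs(output - 1.0)
params, score = random_search(run_model, objective, [(0, 1), (0, 1)])
print(params, score)  # score should be close to 0
```

A directed-search algorithm (e.g., Nelder-Mead) would replace the uniform draws with moves informed by previous objective-function values.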
Fundamentals of Model Calibration: Theory & Practice – Advanced Topics
EXCEL DEMONSTRATION
Results of 100 calibrations of simple model
Advanced Topics
• Probabilistic and deterministic sensitivity analysis for calibrated disease models
• Incorporating uncertainty of calibration endpoints in calibrated oncology models
• Identification of and correction for bias introduced from calibrating longitudinal models to cross-sectional data
Probabilistic and deterministic sensitivity analyses for calibrated disease models
Why CSA Was Needed
[Scatter plot: incremental cost ($0–$600) vs. incremental QALY (0.000–0.016) from the PSA of a single calibration, with the $50K threshold line]
Median: $10,500
Mean: $10,600
95% CI: ($7,800; $13,900)
Confidential | 39
Why CSA Was Needed
• Sources of uncertainty:
  – Algorithm
    · Analyst in a manual calibration
    · Starting seed/search space in a random calibration
    · Starting simplex in a Nelder-Mead calibration
  – Objective function (is really quite subjective); choices include:
    · Calibration targets
    · Weighting scheme
  – Stopping point
CSA Methods
• Evaluated algorithm uncertainty by choosing 5 different starting Nelder-Mead simplexes
• Evaluated objective function uncertainty by choosing 5 different objective functions
• Combined simplexes and weights for a total of 25 different calibrations
• Deterministic sensitivity analysis was performed by examining cost-effectiveness results for each calibration while holding all other parameters constant
• Probabilistic sensitivity analysis was performed by bootstrapping (with equal probability) the 25 calibrations within a 2nd order Monte Carlo simulation for other model parameters
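The bootstrapping step can be sketched as follows; the 25 calibrated parameter sets and the sampled cost input below are illustrative placeholders, not the workshop's actual values:

```python
import random

# Hypothetical: 25 calibrated parameter sets from 5 simplexes x 5 weights.
# Each "set" is a dict of calibrated inputs (illustrative values only).
calibrations = [{"p_progress": 0.01 + 0.001 * i} for i in range(25)]

def psa_iteration(rng):
    """One 2nd-order Monte Carlo draw: bootstrap one calibration with
    equal probability, then sample the other (non-calibrated) model
    parameters from their distributions."""
    calib = rng.choice(calibrations)          # calibration uncertainty
    cost = rng.normalvariate(10000, 1000)     # e.g., a sampled cost input
    return calib, cost

rng = random.Random(0)
draws = [psa_iteration(rng) for _ in range(1000)]
print(len(draws))  # 1000 iterations, each with a bootstrapped calibration
```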
CSA Deterministic Results
• ICER* by simplex and weighting scheme:

            Weight 1   Weight 2   Weight 3   Weight 4   Weight 5
Simplex 1    $8,400    $13,800     $4,400    $11,600     $5,300
Simplex 2   $17,100    $20,800     $7,800    $15,100     $8,100
Simplex 3   $20,500    $11,500    $27,800    $17,300    $10,900
Simplex 4   $20,700    $22,000     $1,500     $8,000     $5,400
Simplex 5   $20,700    $21,000    $39,100    $12,100     $8,900

• Median ICER: $12,600
• Mean ICER: $14,000
• Range: $1,500 ‐ $39,000

*ICER: Incremental Cost-Effectiveness Ratio (Cost per QALY gained) for vaccination vs. no vaccination
PSA for a Single Calibration
[Scatter plot: incremental cost ($0–$600) vs. incremental QALY (0.000–0.016), with the $50K threshold line]
Median: $10,500
Mean: $10,600
95% CI: ($7,800; $13,900)
CSA Probabilistic SA Results
[Scatter plot: incremental cost (−$100 to $600) vs. incremental QALY (0.000–0.016), with the $50K threshold line]
Vaccination of age cohorts is compared with no vaccination among the same age cohorts. Each square represents a calibration and each color represents the PSA around that calibration.
Median: $12,600
Mean: $14,000
95% CI: ($2,700; $29,100)
Representing uncertainty in calibration targets
Objective
Demonstrate methods for incorporating uncertainties in calibration targets into sensitivity analyses (PSA) using an oncology example
Model
[Diagram: three-state Markov model with Non-Progressed, Progressed, and Dead states]
We constructed hypothetical PFS and OS (with censoring) curves for two treatments and a corresponding three-state Markov model
Three transition probabilities for each treatment were calibrated (using Excel Solver) to simultaneously fit the PFS/OS curves, using mean squared deviation as the objective function
Uncertainty in cost-effectiveness results was represented by cost-effectiveness acceptability curves (CEAC) of lifetime costs and quality-adjusted life-years
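A minimal sketch of a three-state model trace and the mean-squared-deviation objective described above; the transition-probability names and cycle count are assumptions for illustration, not the authors' actual implementation (which used Excel Solver):

```python
def run_markov(p_np_p, p_np_d, p_p_d, n_cycles=24):
    """Trace a three-state Markov model (Non-Progressed, Progressed, Dead).

    p_np_p: per-cycle probability Non-Progressed -> Progressed
    p_np_d: per-cycle probability Non-Progressed -> Dead
    p_p_d:  per-cycle probability Progressed -> Dead
    Returns (pfs, os) curves: proportion progression-free and alive.
    """
    np_, p_, d_ = 1.0, 0.0, 0.0
    pfs, os_ = [1.0], [1.0]
    for _ in range(n_cycles):
        new_p = p_ * (1 - p_p_d) + np_ * p_np_p
        new_d = d_ + np_ * p_np_d + p_ * p_p_d
        np_ = np_ * (1 - p_np_p - p_np_d)
        p_, d_ = new_p, new_d
        pfs.append(np_)
        os_.append(np_ + p_)
    return pfs, os_

def msd(pfs, os_, target_pfs, target_os):
    """Mean squared deviation from the PFS/OS targets (the objective)."""
    devs = ([(a - b) ** 2 for a, b in zip(pfs, target_pfs)]
            + [(a - b) ** 2 for a, b in zip(os_, target_os)])
    return sum(devs) / len(devs)

# Illustrative run with assumed per-cycle probabilities
pfs, os_ = run_markov(0.05, 0.01, 0.10)
print(msd(pfs, os_, pfs, os_))  # 0.0 when outputs equal the targets
```

A solver would adjust the three probabilities per treatment to minimize `msd` against the constructed PFS/OS curves.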
Analysis
We will look at results of three increasingly comprehensive PSAs using second-order Monte Carlo simulation (SMCS):
1. Conventional PSA: includes only probability distributions of costs and utilities
2. Calibration Parameter PSA: reflects uncertainty in the target PFS/OS curves by specifying beta distributions for failure probabilities at each PFS/OS time point, simulating multiple replicates of the PFS/OS data from these distributions, re-estimating and refitting the curves for each replicate, and incorporating the resulting calibrated parameter sets into the SMCS
3. Calibration Structural PSA: reflects uncertainty associated with calibration methods by varying curve-fitting parameters (initial values, constraints, objective function)
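The replicate-simulation step of the Calibration Parameter PSA can be sketched as follows. The Beta(k + 1, n - k + 1) parameterization for each interval's failure probability is an assumed convention (the slides specify only "beta distributions"), and the interval counts below are hypothetical:

```python
import random

def simulate_failure_probs(events, at_risk, n_replicates=200, seed=1):
    """Represent sampling uncertainty in the target curves by drawing
    each interval's failure probability from Beta(k + 1, n - k + 1),
    a common choice for a binomial proportion, per time point.

    events, at_risk: per-interval counts from the trial data
    Returns n_replicates simulated sets of failure probabilities; each
    replicate would then be re-fit to yield a calibrated parameter set.
    """
    rng = random.Random(seed)
    return [
        [rng.betavariate(k + 1, n - k + 1) for k, n in zip(events, at_risk)]
        for _ in range(n_replicates)
    ]

# Hypothetical per-interval counts for one endpoint
reps = simulate_failure_probs([12, 18, 15, 10, 4], [100, 88, 65, 47, 23])
print(len(reps), len(reps[0]))  # 200 replicates x 5 time points
```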
Sensitivity analysis process flow
1. Generate 200 survival curves from trial data reflecting sampling error
2. Calibrate model to generate 200 parameter sets (or apply alternative calibration methods to generate 200 parameter sets)
3. Bootstrap the 200 parameter sets within the PSA
Sample Kaplan-Meier Data
Timepoint        0     4     9    14    19    24
OS   At Risk   100    88    65    47    23     9
OS   Censored    0     7     9    12    14     7
PFS  At Risk   100    80    48    27    12     3
PFS  Censored    0     6     8     7     7     3
Uncertainty estimates
Life-table estimates are computed by counting the numbers of censored and uncensored observations falling in time intervals

$$[t_{i-1}, t_i], \quad i = 1, \ldots, k+1, \quad \text{where } t_0 = 0 \text{ and } t_{k+1} = \infty$$

The conditional probability of an event in $[t_{i-1}, t_i]$ is estimated by

$$\hat{q}_i = \frac{d_i}{n'_i}$$

where $d_i$ is the number of events in the interval, $n'_i = n_i - w_i/2$ is the effective sample size in $[t_{i-1}, t_i]$, and $w_i$ is the number of units censored in the interval.

The estimated standard error is

$$\hat{\sigma}(\hat{q}_i) = \sqrt{\frac{\hat{p}_i \hat{q}_i}{n'_i}}, \quad \text{where } \hat{p}_i = 1 - \hat{q}_i$$
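These life-table quantities can be computed directly. The sketch below applies the formulas to the sample Kaplan-Meier OS data, assuming the censored row gives the count censored in the interval ending at each timepoint (events are then the drop in the at-risk count minus the censored count):

```python
from math import sqrt

def life_table(at_risk, censored, events):
    """Actuarial life-table estimates per interval, following the slide:
    effective sample size n'_i = n_i - w_i/2, conditional event
    probability q_i = d_i / n'_i, and SE sqrt(q_i * (1 - q_i) / n'_i)."""
    rows = []
    for n, w, d in zip(at_risk, censored, events):
        n_eff = n - w / 2
        q = d / n_eff
        se = sqrt(q * (1 - q) / n_eff)
        rows.append((n_eff, q, se))
    return rows

# OS counts from the sample K-M table; events per interval derived as
# (drop in at-risk count) minus (censored in the interval).
at_risk  = [100, 88, 65, 47, 23]
censored = [7, 9, 12, 14, 7]
events   = [100 - 88 - 7, 88 - 65 - 9, 65 - 47 - 12, 47 - 23 - 14, 23 - 9 - 7]
rows = life_table(at_risk, censored, events)
for n_eff, q, se in rows:
    print(f"n'={n_eff:.1f}  q={q:.3f}  se={se:.3f}")
```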
Uncertainty in survival curves and calibration
[Figure: generated OS curves alongside calibrated OS curves]
Comparison of PSA approaches
CEAC comparison
[Figure: cost-effectiveness acceptability curves under the three PSA approaches]
Calibrating Longitudinal Models to Cross-Sectional Data: The Effect of Temporal Changes in Health Practices
Objective
• One set of calibrated transition probabilities for cervical cancer model
Problem
• Pap smear screening practices changed over time
• Calibration targets reflect current and past screening patterns:
  – Older women (>65 years): less screening when they were young
  – Younger women: exposed to higher screening rates at the same ages
Annual screening coverage by age
[Figure: % screened (0%–70%) by age (10–100), with separate curves for women <65 and ≥65]
Model
[Diagram: model with screening, calibrated so that its outputs match SEER targets]
How did we calibrate?
Three approaches:
• Single-stage model / single-stage calibration
• Two-stage model run w/ single-stage calibration
• Two-stage model / two-stage calibration
Results
Incidence and mortality rates per 100,000 (age 65+):

                                                  Incidence   Mortality
SEER target                                         13.41       7.14
Single-stage model / single-stage calibration       13.43       5.68
Two-stage model run w/ single-stage calibration     15.81      10.50
Two-stage model / two-stage calibration             13.41       7.32
Implication
• Effects of temporal changes are important when calibrating longitudinal models to cross-sectional data
Conclusions
• Time is always a limiting factor: with more time a “better” solution can almost always be found
• Calibration can affect the interpretation of cost-effectiveness results
• In order to characterize the uncertainty in a calibrated model:
  – Results should be reported as a range from different calibrations
  – Calibration should be included in probabilistic sensitivity analyses
  – Uncertainty in calibration targets should be considered
• Adjustments may need to be made to account for temporal shifts in data
• Using a combination of calibration methods is likely the most efficient way to arrive at good calibrations
DISCUSSION