
Fundamentals of Model Calibration: Theory & Practice

ISPOR 17th Annual International Meeting, Washington, DC USA, 4 June 2012


Workshop Leaders
• Douglas Taylor, MBA – Associate Director, Ironwood Pharmaceuticals Inc, Cambridge, MA USA
• Ankur Pandya, PhD MPH – Graduate Student, Harvard University, Boston, MA USA
• David Thompson, PhD – Executive Vice President & Senior Scientist, OptumInsight, Boston, MA USA


Acknowledgements

We would like to thank our colleagues who have contributed much to this research over the last several years:
• Kristen Gilmore
• Rowan Iskandar
• Denise Kruzikas
• Kevin Leahy
• Vivek Pawar
• Milton Weinstein


Workshop Objectives
• Discuss the rationale for model calibration: in what circumstances is calibration needed?
• Provide an overview of the model calibration process: selection of inputs, specifying the objective function, implementing the search process, and evaluating the calibration results
• Describe advanced topics in model calibration, including incorporation of calibrated inputs into uncertainty analyses
• Illustrate concepts through real-world examples


Concept of Model Calibration

[Diagram: Data Sources feed Model Inputs → Model → Model Outputs]

• Calibration traditionally conceptualized as an important, but not necessary, step in model validation:
  – If reliable benchmark data exist, then predictive validity can be assessed and the model calibrated if found to be inaccurate
  – Otherwise, the model cannot be impugned for not being calibrated
• The calibration task involves systematic adjustment of model parameter estimates so that model outputs more accurately reflect external benchmarks
• Calibration requires the modeler to assess how model outputs can govern model inputs, rather than the other way around


When is calibration needed?
• Model validity threatened by spatial variation (e.g., if being adapted from original setting to a foreign one)

[Figure: CHD risk vs cholesterol level, with separate curves for the US and France]


When is calibration needed?
• Model validity threatened by temporal variation (e.g., if input data are old or secular changes have occurred since their collection)

[Figure: CHD risk vs cholesterol level, with separate curves for the US in 1980 and 2010]


When is calibration needed?
• Model validity threatened by heterogeneity (e.g., population-average data available, but subgroup data not)

[Figure: CHD risk vs cholesterol level, with separate curves for US women, US average, and US men]


Model Calibration Process

[Diagram: iterative cycle: Estimate Model Parameters → Run Model → Assess Results → Adjust Inputs → back to Run Model]

• Looks straightforward, but…
  – What criteria do we employ to assess model results?
  – How do we go about adjusting model inputs?
  – How do we know when we are done?

Thank You.

Contact Info: David Thompson, [email protected]

FUNDAMENTALS OF MODEL CALIBRATION: THEORY & PRACTICE

Identifying Inputs to Calibrate

• Theoretically, any input could be calibrated…
• But inputs should be related to the "problem" to justify using calibration:
  – Adapted from one setting to another
  – Estimated from heterogeneous populations
  – Affected by temporal changes in epidemiology or practice patterns

Identifying Calibration Targets

• Targets should be based on setting‐specific (or otherwise appropriate) data

• Model should predict these types of events (age‐specific, composite outcomes, etc.)

Goodness‐of‐Fit

• Assess how well model outputs match observed data

• Three potential approaches:

– Acceptable windows
– Minimizing deviations
– Likelihood functions

Acceptable Windows

• Compare model‐predicted outcomes to established ranges for each endpoint

• Suitable when there are multiple endpoints of interest

• Easy to implement

• Limitation: does not capture the degree of closeness (see the sketch below)
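To make the idea concrete, here is a minimal sketch of an acceptable-windows check; the endpoint names, target windows, and model outputs are hypothetical illustrations, not values from the workshop.

```python
# Hypothetical model outputs and acceptable windows (lower, upper) per endpoint.
model_outputs = {"incidence_per_1000": 8.7, "mortality_per_1000": 3.1}

acceptable_windows = {
    "incidence_per_1000": (7.5, 9.0),
    "mortality_per_1000": (2.0, 2.9),
}

def within_windows(outputs, windows):
    """Return True only if every model output falls inside its window."""
    return all(lo <= outputs[name] <= hi for name, (lo, hi) in windows.items())

# False here: mortality (3.1) misses its window, however narrowly --
# the check does not say by how much, which is the limitation noted above.
print(within_windows(model_outputs, acceptable_windows))
```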

Acceptable Windows – Example

[Figure sequence: model-predicted outcomes plotted against an upper bound and a lower bound for each endpoint, shown over successive calibration iterations]

Minimizing Deviation

• Summary measure of relative distance of model‐produced results from benchmarks

• Captures magnitude of goodness‐of‐fit

• Easy to implement

• Weights all endpoints equally, unless weighting scheme introduced

Percentage Deviation

$$\text{Weighted Mean Percentage Deviation} = \sum_{i=1}^{n} w_i \left| \frac{pred_i - obs_i}{obs_i} \right|$$

where:
n = number of endpoints
pred_i = model-based estimate of the ith endpoint
obs_i = data-based target value of the ith endpoint
w_i = weight assigned to the ith endpoint
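As a rough illustration, a minimal sketch of this objective function in Python, assuming absolute relative deviations are summed; the endpoint values and weights are hypothetical.

```python
def weighted_mean_pct_deviation(pred, obs, weights):
    """Sum over endpoints of w_i * |pred_i - obs_i| / obs_i."""
    return sum(w * abs(p - o) / o for p, o, w in zip(pred, obs, weights))

pred = [8.4, 10.2, 11.9]         # model-based estimates (hypothetical)
obs = [8.2, 10.0, 12.5]          # data-based target values (hypothetical)
weights = [1 / 3, 1 / 3, 1 / 3]  # equal weights across three endpoints

# Lower values indicate a better overall fit across the endpoints.
print(weighted_mean_pct_deviation(pred, obs, weights))
```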

Minimizing Deviation – Example

[Figure sequence: model-predicted outcomes plotted against target values, shown over successive calibration iterations]

Likelihood Functions

• How likely the model‐produced results are in light of the observed outcomes

• Incorporates precision of endpoint data

• Harder to implement:
  – Need data on sample sizes
  – Have to know (or assume) distributions

Likelihood Functions – Example
• Assume incidence has a binomial distribution:

$$\Pr(K = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$

where:
k = # of events observed in model
n = sample size of outcome data
p = # of events observed in outcome data / n
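A minimal sketch of this likelihood computation; the target (k = 23 events in n = 2,800 person-years) comes from the example slides, and the computed values approximately reproduce the likelihoods quoted there.

```python
from math import comb

def binomial_likelihood(k_model, n, p):
    """Pr(K = k_model) under a Binomial(n, p) distribution."""
    return comb(n, k_model) * p**k_model * (1 - p) ** (n - k_model)

n = 2800
p = 23 / n  # event probability implied by the outcome data

print(binomial_likelihood(28, n, p))  # parameter set 1 (slide reports L ~ 0.047)
print(binomial_likelihood(14, n, p))  # parameter set 2 (slide reports L ~ 0.013)
```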

Likelihood Function – Example

[Figure: age-specific incidence per 1,000 person-years (y-axis = (k / n) × 1,000, where n = person-years and k = events) by age group (45–54, 55–64, 65–74, 75–84 yrs), comparing the ARIC target with two candidate parameter sets. For the target of k = 23 events in n = 2,800 person-years (incidence ≈ 8.21), Parameter Set 1 yields k = 28 (L = 0.047) and Parameter Set 2 yields k = 14 (L = 0.013)]

Likelihood Function – Example

[Figure: the same comparison with a larger target sample added. For the target of k = 287 events in n = 49,000 person-years, Parameter Set 1 yields k = 368 (L = 0.00000064) and Parameter Set 2 yields k = 240 (L = 0.00045); with the more precise target data, the likelihoods differ by roughly three orders of magnitude]

Combining Likelihoods

• Multiply likelihoods (if independent)

• Sum log-likelihoods (see the sketch below)
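A minimal sketch of the combination step, assuming independent endpoints; the two values are Parameter Set 1's likelihoods from the example slides.

```python
import math

likelihoods = [0.047, 0.00000064]  # per-endpoint likelihoods for one parameter set

# Summing log-likelihoods equals the log of the product, but avoids
# floating-point underflow when many small likelihoods are multiplied.
log_likelihood = sum(math.log(L) for L in likelihoods)
print(log_likelihood)
```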

Summary of Goodness-of-Fit Options

• Acceptable windows: easy to implement; not sensitive to magnitude of deviations
• Deviations: easy to implement; captures magnitude of deviations; weights for multiple endpoints will be subjective
• Likelihood-based: gives "meaningful" goodness-of-fit measures (i.e., likelihoods are probabilities); need specific data; need to know (or assume) distributions

Parameter Search Methods
• How to adjust inputs during calibration? (see the sketch below)
  – Manual adjustment
  – Random searches
  – Directed-search algorithms
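A rough sketch of the latter two strategies; `run_model` below is a hypothetical stand-in that maps candidate inputs to a goodness-of-fit score (lower is better), not part of the workshop materials.

```python
import random
from scipy.optimize import minimize

def run_model(params):
    # Hypothetical placeholder: a smooth bowl with its optimum at (0.3, 0.7).
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2

# Random search: draw candidates uniformly from plausible input ranges
# and keep the best-scoring one.
random.seed(42)
best = min(
    ([random.uniform(0, 1), random.uniform(0, 1)] for _ in range(1000)),
    key=run_model,
)
print("random search:", best)

# Directed search: Nelder-Mead simplex starting from an initial guess.
result = minimize(run_model, x0=[0.5, 0.5], method="Nelder-Mead")
print("Nelder-Mead:", result.x)
```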


Fundamentals of Model Calibration: Theory & Practice
Advanced Topics

EXCEL DEMONSTRATION


[Figure: results of 100 calibrations of a simple model]


Advanced Topics
• Probabilistic and deterministic sensitivity analysis for calibrated disease models
• Incorporating uncertainty of calibration endpoints in calibrated oncology models
• Identification of and correction for bias introduced from calibrating longitudinal models to cross-sectional data


Probabilistic and deterministic sensitivity analyses for calibrated disease models


Why CSA Was Needed

[Figure: PSA scatter plot of incremental cost ($0–$600) vs incremental QALY (0.000–0.016) for a single calibration, with the $50K threshold line. ICER median: $10,500; mean: $10,600; 95% CI: ($7,800; $13,900)]


Why CSA Was Needed
• Sources of uncertainty:
  – Algorithm: the analyst in a manual calibration; the starting seed/search space in a random calibration; the starting simplex in a Nelder-Mead calibration
  – Objective function: is really quite subjective; choices include calibration targets and the weighting scheme
  – Stopping point


CSA Methods
• Evaluated algorithm uncertainty by choosing 5 different starting Nelder-Mead simplexes
• Evaluated objective function uncertainty by choosing 5 different objective functions
• Combined simplexes and weights for a total of 25 different calibrations
• Deterministic sensitivity analysis was performed by examining cost-effectiveness results for each calibration while holding all other parameters constant
• Probabilistic sensitivity analysis was performed by bootstrapping (with equal probability) the 25 calibrations within a 2nd-order Monte Carlo simulation for other model parameters (see the sketch below)
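A minimal sketch of the probabilistic step; `sample_other_parameters` and `run_cost_effectiveness` are hypothetical stand-ins for the vaccination model's actual PSA machinery.

```python
import random

calibrations = [f"calibration_{i}" for i in range(25)]  # 5 simplexes x 5 weights

def sample_other_parameters():
    # Hypothetical: draw the non-calibrated parameters for one PSA iteration.
    return {"cost_multiplier": random.lognormvariate(0, 0.1)}

def run_cost_effectiveness(calibration, other_params):
    # Hypothetical placeholder returning an ICER for this draw.
    return random.uniform(1500, 39100)

random.seed(1)
icers = []
for _ in range(10000):  # 2nd-order Monte Carlo draws
    calibration = random.choice(calibrations)  # equal-probability bootstrap
    icers.append(run_cost_effectiveness(calibration, sample_other_parameters()))

icers.sort()
print("median ICER:", icers[len(icers) // 2])
```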


CSA Deterministic Results

• ICER* by simplex and weight:

            Weight 1   Weight 2   Weight 3   Weight 4   Weight 5
Simplex 1    $8,400    $13,800     $4,400    $11,600     $5,300
Simplex 2   $17,100    $20,800     $7,800    $15,100     $8,100
Simplex 3   $20,500    $11,500    $27,800    $17,300    $10,900
Simplex 4   $20,700    $22,000     $1,500     $8,000     $5,400
Simplex 5   $20,700    $21,000    $39,100    $12,100     $8,900

• Median ICER: $12,600
• Mean ICER: $14,000
• Range: $1,500 – $39,100

*ICER: Incremental Cost-Effectiveness Ratio (cost per QALY gained) for vaccination vs. no vaccination


PSA for a Single Calibration

[Figure: PSA scatter plot of incremental cost ($0–$600) vs incremental QALY (0.000–0.016) for a single calibration, with the $50K threshold line. ICER median: $10,500; mean: $10,600; 95% CI: ($7,800; $13,900)]


CSA Probabilistic SA Results

[Figure: PSA scatter plot of incremental cost (−$100–$600) vs incremental QALY (0.000–0.016) across the 25 calibrations, with the $50K threshold line. Vaccination of age cohorts is compared with no vaccination among the same age cohorts; each square represents a calibration and each color represents the PSA around that calibration. ICER median: $12,600; mean: $14,000; 95% CI: ($2,700; $29,100)]


Representing uncertainty in calibration targets


Objective

Demonstrate methods for incorporating uncertainties in calibration targets into sensitivity analyses (PSA) using an oncology example


Model

[Diagram: three-state Markov model with states Non-Progressed → Progressed → Dead]

We constructed hypothetical PFS and OS (with censoring) curves for two treatments and a corresponding three-state Markov model

Three transition probabilities for each treatment were calibrated (using Excel Solver) to simultaneously fit the PFS/OS curves, using mean squared deviation as the objective function

Uncertainty in cost-effectiveness results was represented by cost-effectiveness acceptability curves (CEAC) of lifetime costs and quality-adjusted life-years
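A minimal sketch of the calibration step described above, using SciPy's Nelder-Mead in Python rather than Excel Solver; the target PFS/OS curves and starting values are hypothetical, and a real application would constrain each probability to [0, 1].

```python
import numpy as np
from scipy.optimize import minimize

cycles = 24
target_pfs = 0.90 ** np.arange(1, cycles + 1)  # hypothetical PFS targets
target_os = 0.95 ** np.arange(1, cycles + 1)   # hypothetical OS targets

def run_markov(p_np_p, p_np_d, p_p_d):
    """Trace the three-state cohort; return PFS and OS by cycle."""
    non_prog, prog, dead = 1.0, 0.0, 0.0  # everyone starts Non-Progressed
    pfs, os_ = [], []
    for _ in range(cycles):
        non_prog, prog, dead = (
            non_prog * (1 - p_np_p - p_np_d),
            non_prog * p_np_p + prog * (1 - p_p_d),
            dead + non_prog * p_np_d + prog * p_p_d,
        )
        pfs.append(non_prog)
        os_.append(non_prog + prog)
    return np.array(pfs), np.array(os_)

def objective(params):
    """Mean squared deviation from the PFS and OS curves, fit jointly."""
    pfs, os_ = run_markov(*params)
    return np.mean((pfs - target_pfs) ** 2) + np.mean((os_ - target_os) ** 2)

result = minimize(objective, x0=[0.05, 0.02, 0.10], method="Nelder-Mead")
print("calibrated transition probabilities:", result.x)
```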


Analysis

We will look at the results of three increasingly comprehensive PSAs using second-order Monte Carlo simulation (SMCS):
• Conventional PSA: includes only probability distributions of costs and utilities
• Calibration Parameter PSA: reflects uncertainty in the target PFS/OS curves by specifying beta distributions for failure probabilities at each PFS/OS time point, simulating multiple replicates of the PFS/OS data from these distributions, re-estimating and refitting the curves for each replicate, and incorporating the resulting calibrated parameter sets into the SMCS (see the sketch below)
• Calibration Structural PSA: reflects uncertainty associated with calibration methods by varying curve-fitting parameters (initial values, constraints, objective function)
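A minimal sketch of the beta-resampling step in the Calibration Parameter PSA; the event and at-risk counts are hypothetical, and each replicate curve would subsequently be refit to produce a calibrated parameter set for the SMCS.

```python
import numpy as np

rng = np.random.default_rng(0)

events = np.array([12, 20, 15, 10])    # failures per interval (hypothetical)
at_risk = np.array([100, 80, 48, 27])  # patients at risk per interval (hypothetical)

n_replicates = 200
# Beta(events + 1, at_risk - events + 1): a posterior-style draw of the
# failure probability at each time point, one set per replicate.
replicate_fail_probs = rng.beta(
    events + 1, at_risk - events + 1, size=(n_replicates, len(events))
)

# Convert each replicate's interval failure probabilities to a survival curve.
replicate_curves = np.cumprod(1 - replicate_fail_probs, axis=1)
print(replicate_curves.shape)  # (200, 4): one simulated curve per replicate
```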


Sensitivity analysis process flow

[Flow diagram: Generate 200 survival curves from trial data reflecting sampling error → Calibrate model to generate 200 parameter sets (or use alternative calibration methods to generate 200 parameter sets) → Bootstrap 200 parameter sets within PSA]


Sample Kaplan-Meier Data

Timepoint       0     4     9    14    19    24
OS  At Risk   100    88    65    47    23     9
    Censored    0     7     9    12    14     7
PFS At Risk   100    80    48    27    12     3
    Censored    0     6     8     7     7     3


Uncertainty estimates

Life-table estimates are computed by counting the numbers of censored and uncensored observations that fall in time intervals $[t_{i-1}, t_i)$, $i = 1, 2, \ldots, k+1$, where $t_0 = 0$ and $t_{k+1} = \infty$.

The conditional probability of an event in $[t_{i-1}, t_i)$ is estimated by

$$\hat{q}_i = \frac{d_i}{n'_i}$$

where $d_i$ is the number of events in the interval and $n'_i = n_i - w_i/2$ is the effective sample size in $[t_{i-1}, t_i)$, with $w_i$ the number of units censored in the interval.

The estimated standard error is

$$\hat{\sigma}(\hat{q}_i) = \sqrt{\frac{\hat{q}_i \hat{p}_i}{n'_i}}$$

where $\hat{p}_i = 1 - \hat{q}_i$.
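A minimal sketch of these quantities applied to the sample OS data above, assuming the censored row counts units censored during the interval ending at each timepoint (so events are the drop in the at-risk count not explained by censoring).

```python
import numpy as np

at_risk = np.array([100, 88, 65, 47, 23, 9])  # OS at-risk counts from the table
censored = np.array([0, 7, 9, 12, 14, 7])     # OS censored counts from the table

# Events d_i in each interval: drop in at-risk not explained by censoring.
d = at_risk[:-1] - at_risk[1:] - censored[1:]
w = censored[1:]                 # units censored within each interval
n_eff = at_risk[:-1] - w / 2     # effective sample size n' = n - w/2

q_hat = d / n_eff                              # conditional event probability
se_q = np.sqrt(q_hat * (1 - q_hat) / n_eff)    # estimated standard error

for i, (q, se) in enumerate(zip(q_hat, se_q), start=1):
    print(f"interval {i}: q_hat = {q:.3f}, SE = {se:.3f}")
```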


Uncertainty in survival curves and calibration

[Figure: two panels comparing the generated OS curves with the calibrated OS curves]


Comparison of PSA approaches


CEAC comparison

Calibrating Longitudinal Models to Cross-Sectional Data: The Effect of Temporal Changes in Health Practices

Objective
• One set of calibrated transition probabilities for cervical cancer model

Problem
• Pap smear screening practices changed over time
• Calibration targets reflect current and past screening patterns:
  – Older women (>65 years): less screening when they were young
  – Younger women: exposed to higher screening rates at the same ages

Annual screening coverage by age

[Figure: % screened (0%–70%) by age (10–100), contrasting screening exposure for the cohorts of women currently ≥65 vs <65]

Model

How did we calibrate?

[Diagram: three calibration approaches, each fitting the model with screening to SEER targets:
• Single-stage model / single-stage calibration
• Two-stage model run w/ single-stage calibration
• Two-stage model / two-stage calibration]

Results

Incidence and mortality rates per 100,000 (age 65+):

                                                  Incidence   Mortality
SEER target                                           13.41        7.14
Single-stage model / single-stage calibration         13.43        5.68
Two-stage model run w/ single-stage calibration       15.81       10.50
Two-stage model / two-stage calibration               13.41        7.32

Implication

• Effects of temporal changes are important when calibrating longitudinal models to cross-sectional data


Conclusions
• Time is always a limiting factor – with more time a "better" solution can almost always be found
• Calibration can affect the interpretation of cost-effectiveness results
• In order to characterize the uncertainty in a calibrated model:
  – Results should be reported as a range from different calibrations
  – Calibration should be included in probabilistic sensitivity analyses
  – Uncertainty in calibration targets should be considered
• Adjustments may need to be made to account for temporal shifts in data
• Using a combination of calibration methods is likely the most efficient way to arrive at good calibrations

DISCUSSION