Learning objectives - ku


Transcript of Learning objectives - ku

Page 1: Learning objectives - ku

Risk predictions and performance measures

Course: Statistical Evaluation of Diagnostic and Predictive Models

Thomas Alexander Gerds (University of Copenhagen)

Summer School, Barcelona, June 29, 2015

1 / 46

Overview lectures and R tutorials

I Day 1 Risk predictions and performance measures

I Day 2 Modelling and added value of a new marker

I Day 3 Cross-validation

I Day 4 Survival analysis

2 / 46

Computer exercises: R statistical software

R is free, works on all platforms, supports reproducible research, can produce nice figures, and provides solutions for the purpose of our course.

3 / 46

Learning objectives

I The role of statistical models for medical decision making

I Statistical measures of predictive accuracy:

  I Sensitivity, specificity, predictive values
  I Reclassification tables, ROC curves, calibration curves, AUC, Brier score, c-index

I Risk predictions obtained from logistic regression, Cox regression and random forests

I Model evaluation in external data, cross-validation and bootstrap

I Tools for censored event time outcome and competing risks

4 / 46

Page 2: Learning objectives - ku

Terminology

In statistics, a diagnosis is an estimate of the patient's current status.

A prediction/prognosis is an estimate of the patient's future status.

5 / 46

The role of statistics

Question: Doctors diagnose and predict patients' outcomes every day, so why do we need statistics?

Answer: Using statistics, we can learn from what happened to many past patients, if it is stored in a database.

I Valuable information for new patients.

I Improved accuracy of diagnoses and predictions.

I Evidence based information to assist medical decision making.

6 / 46

Comparing doctors 1

1 DA Redelmeier, DA Bloch, and DH Hickam. Assessing predictive accuracy: how to compare Brier scores. Journal of Clinical Epidemiology, 44:1141-6, 1991.

7 / 46

Who is asking the question?

A patient needs to know:

I Am I diseased? (current status)

I Will I develop the disease? (future status)

I Should I stop smoking?

I Do I really need chemotherapy?

The community wants an applicable tool

A basic researcher wants a biologically plausible model

A statistician wants a general strategy/algorithm

8 / 46

Page 3: Learning objectives - ku

Can doctors predict the end of life?

Tom Koch (Bioethicist)2:

After Fred (as we'll call him) was diagnosed with an inoperable tumour the doctors told him he had perhaps six months to live. A merchant seaman, Fred decided to spend his savings enjoying himself while waiting for the end. More than eight months later, Fred showed up at a hospice still dying, but now broke, . . .

2 Koch (2009). End of life, year after year after year. CMAJ, Vol. 181, no. 11.

9 / 46

Prognosis vs prediction

Stephen R. Workman (Medical Doctor):

Dr. Koch confuses a prediction with a prognostic assessment. A physician can guess (predict) how long a patient will live.

An accurate assessment of prognosis differs from a prediction in that it requires stating a range of survival based upon relevant data.

Some patients die within weeks but others live for months or years. The average survival is around six months.

Assuming that data is correctly applied to a particular patient, such prognostic assessments are never 'wrong.'

It would be nice if physicians were clairvoyant, but as Yogi Berra noted: "It's tough to make predictions, especially about the future."

10 / 46

Can a statistical model predict the end of life?

Henderson & Keiding (both bio-statisticians)3:

Doctors' survival predictions for terminally ill patients have been shown to be inaccurate and there has been an argument for less guesswork and more use of carefully constructed statistical indices.

As statisticians, the authors are less confident in the predictive value of statistical models and indices for individual survival times.

The authors argue that models and statistical indices can be useful at the group or population level, but that human survival is so uncertain that even the best statistical analysis cannot provide single-number predictions of real use for individual patients.

3 J Med Ethics 2005;31:703-706.

11 / 46

Tailored probability

Michael Kattan (computer scientist, patient):

When I was diagnosed with lymphoma 11 years ago, I was eager to learn my prognosis.4

I really wanted a predicted probability of survival and didn't specifically care what the prognostic factors were, what my relative risk might be, or in what risk group I belonged.

We should produce increasingly accurate prediction models by increasing sample sizes, adding informative markers, and applying more sophisticated modeling approaches.

4 Statistical prediction models, artificial neural networks, and the sophism 'I am a patient, not a statistic'. Journal of Clinical Oncology, 20:885-887, 2002.

12 / 46

Page 4: Learning objectives - ku

Prognostic versus predictive biomarkers 5

A prognostic biomarker provides information on the likely course of the cancer disease in an untreated individual . . .

. . . evidence based prognostic markers would be extremely helpful in selecting patients for adjuvant systemic treatment.

5 Nils Brünner (University of Copenhagen). What is the difference between "predictive and prognostic biomarkers"? Can you give some examples? Connection 2009, 13:18.

13 / 46

Prognostic versus predictive biomarkers 5

A predictive biomarker is a marker which can be used to identify sub-populations of patients who are most likely to respond to a given therapy.

Predictive biomarkers are the basis for individualized or tailor-made treatment. E.g., KRAS mutation predicts resistance to EGFr antibody therapy.

14 / 46

Example: mutant KRAS group

Amado et al. (2008) JCO

15 / 46

Example: wild-type KRAS group

These results show that KRAS mutations predict for lack of clinical benefit to panitumumab therapy. KRAS status should be considered in selecting patients with mCRC as candidates for panitumumab monotherapy. J Clin Oncol 26:1626-1634.

16 / 46

Page 5: Learning objectives - ku

Definition

A medical diagnostic test is a decision rule

X = { 1  positive / disease
    { 0  negative / non-disease

The test can be based on a biomarker.

17 / 46

Example: screening for prostate cancer

The first commercial Prostate Specific Antigen (PSA) test:

positive: if PSA > 4.0 ng/mL
negative: if PSA ≤ 4.0 ng/mL

I The reference range of serum PSA is 0.0–4.0 ng/mL (based on a study of 472 healthy men where 99% had a total PSA level below 4 ng/mL).

I There are arguments that this level should be lowered to 2.5 ng/mL.

18 / 46

From the Internet 6

University of Michigan researchers identify new blood test for prostate cancer. The test looks at 22 biomarkers; the results are more accurate than PSA.

These 22 biomarkers appear to be the right number. If you used too many or too few, the accuracy went down a bit. Our findings held up when we tested the model on an independent set of blood serum samples, . . .

6 http://www.eurekalert.org/pub_releases/2005-09/uomh-uri091905.php

19 / 46

Predicted probabilities

A prediction rule is a fully specified mathematical function for mapping from patient characteristics to a predicted probability.

Patient characteristics may include:

I conventional predictors such as age, gender, blood pressure, etc.
I biomarkers and (high dimensional) genetic markers
I exposure history (until today)
I treatment

The function is described by (estimated) parameters, such as regression coefficients and cut-off thresholds.

A set of biomarkers or a gene signature is not a prediction model.
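For concreteness, here is a minimal R sketch of a prediction rule (simulated, hypothetical data and coefficients, not a real model): the fitted logistic regression, together with its estimated coefficients, defines a function r that maps patient characteristics to a predicted probability.

    ## minimal sketch of a prediction rule (simulated data)
    set.seed(1)
    d <- data.frame(age = rnorm(200, 60, 10), marker = rnorm(200))
    d$Y <- rbinom(200, 1, plogis(-6 + 0.08 * d$age + 0.5 * d$marker))
    fit <- glm(Y ~ age + marker, data = d, family = binomial)
    ## r(X): maps the characteristics of a new patient to a predicted probability
    r <- function(newdata) predict(fit, newdata = newdata, type = "response")
    r(data.frame(age = 65, marker = 1.2))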

20 / 46

Page 6: Learning objectives - ku

Prostate Cancer Risk Calculator Available Online

I American researchers have developed and released an online calculator to predict a man's risk of developing prostate cancer based on his age, biopsy results, PSA levels, and digital rectal exam results.

I The original Prostate Cancer Prevention Trial (PCPT) Prostate Cancer Risk Calculator (PCPTRC) posted in 2006 was developed based upon 5519 men in the placebo group of the Prostate Cancer Prevention Trial.

I All of these 5519 men initially had a prostate-specific antigen (PSA) value less than or equal to 3.0 ng/ml, were followed for seven years with annual PSA and digital rectal examination (DRE), and had at least one prostate biopsy.

21 / 46

Prostate Cancer Risk Calculator Available Online

PSA, family history, DRE findings, and history of a prior negative prostate biopsy provided independent predictive value to the calculation of risk of a biopsy that showed presence of cancer.

Disclaimer
The calculator is in principle only applicable to men under the following restrictions:

I Age 55 or older

I No previous diagnosis of prostate cancer

I DRE and PSA results less than 1 year old

22 / 46

Prostate Cancer Risk Calculator in action 7

7 http://www.prostate-cancer-risk-calculator.info/

23 / 46

Prostate Cancer Risk Calculator in action 7

24 / 46

Page 7: Learning objectives - ku

Prostate Cancer Risk Calculator in action 7

25 / 46

Prostate Cancer Risk Calculator in action 7

26 / 46

What is behind the 'Prostate Cancer Risk Calculator'

I The Prostate Cancer Prevention Trial 8

I Here we used prostate biopsy data from 5519 participants in the PCPT to examine whether interactions among these variables (PSA level, family history of prostate cancer, age, race, and digital rectal examination) can be used to predict prostate cancer risk in an individual patient.

I We used multivariable logistic regression to model the risk of prostate cancer by considering all possible combinations of main effects and interactions.

I The models chosen were those that minimized the Bayesian information criterion (BIC) and maximized the average out-of-sample area under the receiver operating characteristic curve (via 4-fold cross-validation).
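As a rough illustration of this strategy (a sketch with simulated data and made-up variable names, not the original PCPT analysis), candidate logistic regression models with and without interactions can be compared by BIC in R:

    ## sketch: compare main-effects and interaction models by BIC
    set.seed(2)
    d <- data.frame(psa = rlnorm(500), age = rnorm(500, 62, 6),
                    famhist = rbinom(500, 1, 0.2))
    d$Y <- rbinom(500, 1, plogis(-2 + 0.5 * log(d$psa) + 0.8 * d$famhist))
    fit.main <- glm(Y ~ log(psa) + age + famhist, data = d, family = binomial)
    fit.int  <- glm(Y ~ log(psa) * famhist + age, data = d, family = binomial)
    BIC(fit.main, fit.int)  # the smaller BIC is preferred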

8 Thompson et al. J Natl Cancer Inst, 98(8):529-34, 2006.

27 / 46

Notation for Binary Markers

Y: Outcome (disease status), e.g. coronary heart disease
X: Test result (biomarker), e.g. exercise stress test

           Y = 1            Y = 0
X = 0   False negative   True negative
X = 1   True positive    False positive

28 / 46

Page 8: Learning objectives - ku

Evaluation of Binary Markers

To what extent does a biomarker reflect true disease status?

True positive rate: TPR = P(X = 1 | Y = 1) = sensitivity
False positive rate: FPR = P(X = 1 | Y = 0) = 1 − specificity

Ideal tests have FPR = 0 and TPR = 1, but usually both error rates have to be optimized simultaneously.

29 / 46

Estimating TPR and FPR

Use a case control study if disease prevalence is low:

F̂PR = No. controls with positive test / No. controls

T̂PR = No. cases with positive test / No. cases

Confidence intervals are either obtained exactly or via:

F̂PR ± 1.96 √( F̂PR (1 − F̂PR) / ncontrol )

T̂PR ± 1.96 √( T̂PR (1 − T̂PR) / ncase )

Note that each standard error uses the sample the rate is estimated from: the controls for FPR and the cases for TPR.

30 / 46

Example: Coronary Artery Surgery Study 9

I Y : Coronary heart disease status

I X : Exercise Stress Test

           Y = 0   Y = 1
X = 0     327     208
X = 1     115     815

I F̂PR = 115/(115 + 327) = 0.26
I T̂PR = 815/(208 + 815) = 0.80
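These numbers, and the Wald confidence intervals from the previous slide, can be checked with a few lines of R (a sketch; the object names are ours):

    ## CASS 2x2 table: rows X = 0/1, columns Y = 0/1
    tab <- matrix(c(327, 115, 208, 815), nrow = 2,
                  dimnames = list(X = c("0", "1"), Y = c("0", "1")))
    fpr <- tab["1", "0"] / sum(tab[, "0"])  # 115/442 = 0.26
    tpr <- tab["1", "1"] / sum(tab[, "1"])  # 815/1023 = 0.80
    se <- function(p, n) sqrt(p * (1 - p) / n)
    fpr + c(-1, 1) * 1.96 * se(fpr, sum(tab[, "0"]))  # CI for FPR
    tpr + c(-1, 1) * 1.96 * se(tpr, sum(tab[, "1"]))  # CI for TPR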

9 Data from Table 2 of: Weiner DA, Ryan TJ, McCabe CH, et al. Correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). NEJM 301(5):230-235, 1979.

31 / 46

Continuous Markers

I In many clinical applications, biomarkers are continuous (example: Prostate Specific Antigen (PSA) for prostate cancer)

I For any given cut-off value c, we may define a test

           Y = 0            Y = 1
X ≥ c   False positive   True positive
X < c   True negative    False negative

I Classification accuracy:

FPR(c) = P(X ≥ c | Y = 0), TPR(c) = P(X ≥ c | Y = 1)

Page 9: Learning objectives - ku

Example: Pancreatic Cancer Study

The antigens CA-125 and CA 19-9 are possible biomarkers of pancreatic cancer 10

[Figure: distribution of CA-125 among cases and controls]

Trade-off between:

I Increase c ⇒ FPR ↓ and TPR ↓
I Decrease c ⇒ FPR ↑ and TPR ↑

10 Wieand et al (1989) studied ncase = 90 patients with pancreatic cancer and ncontrol = 51 control patients with pancreatitis.

33 / 46

The Receiver Operating Characteristic (ROC) Curve:

[Figure: ROC curve, true positive rate (sensitivity) plotted against false positive rate (1 − specificity); it shows the perfect test in the top-left corner, the diagonal of no predictive value, and an actual test in between.]

Estimation of FPR(c) and TPR(c)

F̂PR(c) = No. controls with X ≥ c / No. controls

T̂PR(c) = No. cases with X ≥ c / No. cases

The ROC curve plots TPR(c) against FPR(c) for all different cut-off values c.
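A minimal R sketch of an empirical ROC curve built from these estimators (hypothetical marker data):

    ## empirical ROC curve: 50 cases, 50 controls (simulated marker)
    set.seed(3)
    x <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))
    y <- rep(c(1, 0), each = 50)
    cuts <- sort(unique(x), decreasing = TRUE)
    tpr <- sapply(cuts, function(cc) mean(x[y == 1] >= cc))
    fpr <- sapply(cuts, function(cc) mean(x[y == 0] >= cc))
    plot(fpr, tpr, type = "s", xlab = "False positive rate (1 - specificity)",
         ylab = "True positive rate (sensitivity)")
    abline(0, 1, lty = 2)  # a test with no predictive value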

34 / 46

Limitation of ROC analysis

The classification accuracy measures are good:

I for describing the capacity a marker has in distinguishing a diseased subject from a non-diseased subject.

I in the discovery stage when interest lies in identifying markers for disease diagnosis and prognosis.

In clinical practice, patients and clinicians are often more interested in predictive values:

I How likely is it that the patient is truly diseased if the test is positive?

I How likely is it that the patient is truly disease free if the test is negative?

35 / 46

Predictive Values

The positive predictive value: PPV = P(Y = 1 | X = 1)
The probability that a patient with a positive test is diseased

The negative predictive value: NPV = P(Y = 0 | X = 0)
The probability that a patient with a negative test is not diseased

I A perfect test has PPV = 1 and NPV = 1.

I A useless test has PPV = Prev and NPV = 1 − Prev,

where Prev = disease prevalence.

36 / 46

Page 10: Learning objectives - ku

Estimation of predictive values

In a cohort study the PPV(c) and NPV(c) can be estimated directly as:

P̂PV(c) = No. cases among subjects with X ≥ c / No. subjects with X ≥ c

N̂PV(c) = No. controls among subjects with X < c / No. subjects with X < c

In a case-control study PPV and NPV cannot be estimated (by using this formula).

37 / 46

Relation to TPR and FPR

The clinical interpretations of PPV and NPV are different from those of FPR and TPR. But the values are closely related via the prevalence and Bayes' theorem:

PPV(c) = TPR(c) · Prev / { TPR(c) · Prev + FPR(c) · (1 − Prev) }

NPV(c) = {1 − FPR(c)} · (1 − Prev) / [ {1 − FPR(c)} · (1 − Prev) + {1 − TPR(c)} · Prev ]
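These formulas translate directly into R (a sketch; the prevalence of 10% in the example is assumed for illustration):

    ## PPV and NPV from TPR (sensitivity), FPR (1 - specificity) and prevalence
    ppv <- function(tpr, fpr, prev)
      tpr * prev / (tpr * prev + fpr * (1 - prev))
    npv <- function(tpr, fpr, prev)
      (1 - fpr) * (1 - prev) / ((1 - fpr) * (1 - prev) + (1 - tpr) * prev)
    ## the CASS test (TPR = 0.80, FPR = 0.26) at an assumed prevalence of 10%
    ppv(0.80, 0.26, 0.1)  # about 0.25
    npv(0.80, 0.26, 0.1)  # about 0.97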

38 / 46

Exercise 1.1: Screening tests (Clayton & Hills, 1993)

I Consider a screening test which has sensitivity of 90% and specificity of 99.5%. The prevalence of the disease is 0.001, so before carrying out the test, the probability that a person has the disease is 0.001.

I Draw a probability tree in which the first level of branching refers to having the disease or not, and the second level to being positive or negative on the screening test. Fill in the probabilities for each of the branches and calculate the probabilities for the four possible combinations of disease and test.

I Draw the tree the other way, so that the first level of branching refers to being positive or negative on the screening test and the second level to having the disease or not. Fill in the probabilities for the branches of this tree. What is the probability of a person having the disease after a positive test result?

I Consider a second screening test that has sensitivity 50% and specificity 99.9%. Compare this test to the one above in terms of positive and negative predictive values in two populations with disease prevalence 0.001 and 0.1, respectively.

39 / 46

The role of the statistical model

[Diagram: a data base feeds the statistical model; the patient's characteristics enter the model to obtain a predicted risk, which informs the medical decision together with personal costs.]

40 / 46

Page 11: Learning objectives - ku

Idea & aims

A risk prediction model provides a probability, the predicted risk: r(Xi) ∈ [0,1].

Idea: Use r(Xi) as a continuous marker for the event.

Evaluation:

I To assess if the model is calibrated.

I To assess if the model is able to discriminate.

I To choose between two risk prediction models (algorithms).

I To assess the value of a new marker.

41 / 46

Calibration

A predicted risk of 17% is reliable if it can be expected that the event will occur to about 17 out of 100 patients who all received a predicted risk of 17%.

A statistical model is calibrated if it provides reliable predictions for all subjects.

A calibration plot shows predicted risks versus observed proportions.
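A minimal R sketch of such a plot, with outcomes simulated so that the predicted risks are calibrated by construction:

    ## calibration plot: mean predicted risk vs observed proportion per group
    set.seed(4)
    r <- runif(1000)            # hypothetical predicted risks
    y <- rbinom(1000, 1, r)     # outcomes generated from the risks
    g <- cut(r, breaks = seq(0, 1, by = 0.1))  # ten risk groups
    plot(tapply(r, g, mean), tapply(y, g, mean),
         xlab = "Mean predicted risk", ylab = "Observed proportion",
         xlim = c(0, 1), ylim = c(0, 1))
    abline(0, 1, lty = 2)       # perfect calibration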

42 / 46

The Brier score

The accuracy of the predictions, measured by the squared distance between prediction and event status, is called the Brier score:

(1/n) Σ_{i=1}^{n} {Yi − r(Xi)}²

I (1 − r(Xi))² if subject i has the event.
I (0 − r(Xi))² if subject i is event-free.

The square root of the average Brier score is the expected distance between the risk prediction and the 0-1 outcome.

The lower the better. Benchmarks:

I coin toss

I random number

I always 50% risk

I 0 (perfect prediction)
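The score and its benchmarks are one line each in R (a sketch with a toy outcome vector):

    ## Brier score: mean squared distance between outcome and predicted risk
    brier <- function(y, r) mean((y - r)^2)
    y <- c(0, 1, 1, 0, 1)
    brier(y, rbinom(5, 1, 0.5))  # coin toss: predict 0% or 100% at random
    brier(y, runif(5))           # random number between 0 and 1
    brier(y, rep(0.5, 5))        # always 50% risk: 0.25
    brier(y, y)                  # perfect prediction: 0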

43 / 46

AUC and c-index

A model discriminates well if it can correctly distinguish high risk from low risk subjects.

The area under the ROC curve (AUC) or concordance index (c-index) is the fraction of pairs (i,j) in which the predicted risk was higher for the subject who had the event:

r(Xi) > r(Xj) and Yi(t) = 1, Yj(t) = 0: pair (i,j) concordant
r(Xi) < r(Xj) and Yi(t) = 1, Yj(t) = 0: pair (i,j) discordant

The higher the better. Benchmarks:

I Coin toss, null model, constant prediction

I 100% (perfect discrimination)
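A minimal R sketch of the c-index for an uncensored binary outcome (concordant pairs count 1, ties 1/2):

    ## c-index: fraction of case-control pairs ordered correctly by r
    cindex <- function(y, r) {
      p <- expand.grid(i = which(y == 1), j = which(y == 0))
      mean((r[p$i] > r[p$j]) + 0.5 * (r[p$i] == r[p$j]))
    }
    cindex(c(0, 1, 1, 0), c(0.2, 0.8, 0.6, 0.4))  # 1: perfect discrimination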

44 / 46

Page 12: Learning objectives - ku

Benchmarks

No matter the performance measure, any risk prediction model should outperform:

I a random prediction.

I a constant prediction.

I an estimate of the event probability which ignores all subject specific predictors, i.e., ignores the covariates that are associated with the event.

I an estimate of the event based on a model which only includes random noise to predict the outcome.

A new risk prediction model should outperform the conventional model:

I in terms of prediction performance.

I in terms of convenience, cost, and risk for the doctor and the patient.

45 / 46

Exercise 1.2: Benchmark values

Compute Brier score and AUC in the following data:

Y   X   r1     r2     r3     r4
0   17  0.06   0.66   0.78   0.79
1   23  0.04   0.90   0.99   1.00
1   21  0.05   0.85   0.98   0.98
0    3  0.33   0.05   0.00   0.18
0   13  0.08   0.42   0.42   0.55
1   21  0.05   0.85   0.91   0.98
1   23  0.04   0.90   0.90   1.00
0    8  0.12   0.17   0.02   0.30
1    9  0.11   0.21   0.07   0.34

Compare results also with benchmark values obtained by predicting the risk via:

I coin toss: predict either 0% or 100% risk

I constant: predict always 50% risk

I random: predict a sample from the uniform distribution U[0,1]

Report the Brier score and the AUC for models r1, r2, r3, r4 and the three benchmarks.
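To work on the exercise in R, the data can be entered as a data frame; the brier and cindex sketches shown earlier can then be applied column by column:

    ## data of Exercise 1.2
    d <- data.frame(
      Y  = c(0, 1, 1, 0, 0, 1, 1, 0, 1),
      X  = c(17, 23, 21, 3, 13, 21, 23, 8, 9),
      r1 = c(0.06, 0.04, 0.05, 0.33, 0.08, 0.05, 0.04, 0.12, 0.11),
      r2 = c(0.66, 0.90, 0.85, 0.05, 0.42, 0.85, 0.90, 0.17, 0.21),
      r3 = c(0.78, 0.99, 0.98, 0.00, 0.42, 0.91, 0.90, 0.02, 0.07),
      r4 = c(0.79, 1.00, 0.98, 0.18, 0.55, 0.98, 1.00, 0.30, 0.34))
    ## e.g. Brier scores of the four models:
    sapply(d[c("r1", "r2", "r3", "r4")], function(r) brier(d$Y, r))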

46 / 46