
The Receiver Operating Characteristic (ROC) Curve

EPP 245

Statistical Analysis of Laboratory Data

November 30, 2006 EPP 245 Statistical Analysis of Laboratory Data

Binary Classification

• Suppose each case belongs to exactly one of two groups, and that we know the correct classification (“truth”).

• Suppose we have a prediction method that produces a single numerical value, and that small values of that number suggest membership in group 1 and large values suggest membership in group 2.


• If we pick a cutpoint t, we can assign any case with a predicted value ≤ t to group 1 and the others to group 2.

• For that value of t, we can compute the number correctly assigned to group 2 and the number incorrectly assigned to group 2 (true positives and false positives).

• For t small enough, all cases will be assigned to group 2, and for t large enough, all cases will be assigned to group 1.

• The ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the cutpoint t varies.
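The cutpoint sweep described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used later in the slides: the function name `roc_points`, the scores, and the labels are all made up for the example, and group 2 is treated as the "positive" group.

```python
def roc_points(scores, in_group2):
    """Return (false positive rate, true positive rate) pairs, one per cutpoint.

    A case is assigned to group 2 ("positive") when its score exceeds the
    cutpoint t, matching the rule "score <= t goes to group 1".
    """
    n_pos = sum(in_group2)
    n_neg = len(in_group2) - n_pos
    # Candidate cutpoints: one below every score (everything assigned to
    # group 2), then each distinct observed score in increasing order.
    cutpoints = [float("-inf")] + sorted(set(scores))
    points = []
    for t in cutpoints:
        tp = sum(1 for s, y in zip(scores, in_group2) if s > t and y)
        fp = sum(1 for s, y in zip(scores, in_group2) if s > t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up scores and true labels (1 = group 2), for illustration only.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
in_group2 = [0, 0, 1, 0, 1, 1]
curve = roc_points(scores, in_group2)
for fpr, tpr in curve:
    print(f"FPR = {fpr:.3f}  TPR = {tpr:.3f}")
```

The first point is (1, 1), where every case is assigned to group 2, and the last is (0, 0), where every case is assigned to group 1; the curve is traced between those two corners as t increases.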


Juul's IGF data

Description:

The 'juul' data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject at various ages, with the bulk of the data collected in connection with school physical examinations.

Variables:

age: a numeric vector (years).

menarche: a numeric vector. Has menarche occurred (code 1: no, 2: yes)?

sex: a numeric vector (1: boy, 2: girl).

igf1: a numeric vector. Insulin-like growth factor (μg/l).

tanner: a numeric vector. Codes 1-5: Stages of puberty a.m. Tanner.

testvol: a numeric vector. Testicular volume (ml).


Predicting Menarche

• Subset the Juul data to females between 8 and 20 years old

• Predict menarche from age as a quantitative variable and Tanner score as a qualitative variable, using dummy variables

• Menarche re-coded to be 0/1
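As a rough sketch of these three preparation steps, the Python below filters rows, recodes the outcome, and builds the Tanner dummies. It is not the code used for the slides. The function name `prepare`, the sample rows, the inclusive age bounds, and the choice to drop rows with missing values are assumptions made for illustration; the variable names `men1` and `tan2` to `tan5` follow the regression command in the slides.

```python
def prepare(rows):
    """Keep girls aged 8 to 20 and build the 0/1 outcome and Tanner dummies."""
    out = []
    for r in rows:
        # Drop boys and ages outside 8..20 (inclusive bounds are an assumption).
        if r["sex"] != 2 or r["age"] is None or not (8 <= r["age"] <= 20):
            continue
        # Drop rows missing the outcome or the Tanner stage (an assumption).
        if r["menarche"] is None or r["tanner"] is None:
            continue
        rec = {"age": r["age"], "men1": r["menarche"] - 1}  # codes 1/2 -> 0/1
        # Tanner stage 1 is the reference level, so it gets no dummy variable.
        for stage in (2, 3, 4, 5):
            rec[f"tan{stage}"] = 1 if r["tanner"] == stage else 0
        out.append(rec)
    return out


# Made-up rows in the layout of the juul data, for illustration only.
rows = [
    {"age": 9.5, "menarche": 1, "sex": 2, "tanner": 1},
    {"age": 14.2, "menarche": 2, "sex": 2, "tanner": 4},
    {"age": 13.0, "menarche": None, "sex": 1, "tanner": 3},  # boy: dropped
    {"age": 25.0, "menarche": 2, "sex": 2, "tanner": 5},     # too old: dropped
]
prepared = prepare(rows)
```

With Tanner stage 1 as the reference level, each dummy's odds ratio in the fitted model is read relative to stage 1.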


. logistic men1 age tan2 tan3 tan4 tan5

Logistic regression                               Number of obs   =        519
                                                  LR chi2(5)      =     568.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.327218                       Pseudo R2       =     0.7906

------------------------------------------------------------------------------
        men1 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.944062   .7162327     7.56   0.000     2.762915    5.630151
        tan2 |   .0444044   .0486937    -2.84   0.005     .0051761    .3809341
        tan3 |   .1369598    .095596    -2.85   0.004     .0348712    .5379227
        tan4 |   .6969611   .3898228    -0.65   0.519     .2328715    2.085935
        tan5 |   9.169558   7.638664     2.66   0.008     1.791671     46.9287
------------------------------------------------------------------------------

. predict pmen
(option p assumed; Pr(men1))

. predict pmen1, xb
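The two predictions saved above are linked by the inverse logit: the probability is 1 / (1 + exp(−xb)), where xb is the linear prediction. The Python sketch below illustrates that link and shows that the transformation preserves the ordering of cases, so either scale ranks the cases identically. The helper name `inv_logit` and the linear-prediction values are made up for the example; only the odds ratio for age is taken from the output above.

```python
import math


def inv_logit(xb):
    """Map a linear prediction (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Made-up linear predictions, for illustration only.
linear = [-6.0, -1.5, 0.0, 2.0, 7.5]
probs = [inv_logit(x) for x in linear]

# The inverse logit is strictly increasing, so both scales give the same
# ordering of cases, and therefore the same ROC curve.
order_linear = sorted(range(len(linear)), key=lambda i: linear[i])
order_probs = sorted(range(len(probs)), key=lambda i: probs[i])
same_order = order_linear == order_probs

# An odds ratio is exp(coefficient), so the age coefficient on the
# linear-prediction scale is the log of the reported odds ratio.
age_coef = math.log(3.944062)
```

This is why the histograms of the linear prediction look more spread out than those of the probability: the probability scale compresses large positive and negative values toward 1 and 0.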


. histogram pmen

. graph export pmenhist.wmf

. histogram pmen if men1==0, title("Pre-Menarche")

. graph export pmenhist0.wmf

. histogram pmen if men1==1, title("Post-Menarche")

. graph export pmenhist1.wmf

. histogram pmen1

. graph export pmen1hist.wmf

. hist pmen1 if men1==0, title("Pre-Menarche")

. graph export pmen1hist0.wmf

. hist pmen1 if men1==1, title("Post-Menarche")

. graph export pmen1hist1.wmf

. lroc

Logistic model for men1

number of observations =      519
area under ROC curve   =   0.9867

. graph export pmenroc.wmf


[Figure: histogram of the predicted probability for all cases. x-axis: Pr(men1), 0 to 1; y-axis: Density.]


[Figure: "Pre-Menarche" histogram of the predicted probability for cases with men1 == 0. x-axis: Pr(men1), 0 to 1; y-axis: Density.]


[Figure: "Post-Menarche" histogram of the predicted probability for cases with men1 == 1. x-axis: Pr(men1), 0 to 1; y-axis: Density.]


[Figure: histogram of the linear prediction for all cases. x-axis: Linear prediction, -10 to 10; y-axis: Density.]


[Figure: "Pre-Menarche" histogram of the linear prediction for cases with men1 == 0. x-axis: Linear prediction, -10 to 5; y-axis: Density.]


[Figure: "Post-Menarche" histogram of the linear prediction for cases with men1 == 1. x-axis: Linear prediction, -5 to 10; y-axis: Density.]


[Figure: ROC curve for the logistic model. x-axis: 1 - Specificity, 0 to 1; y-axis: Sensitivity, 0 to 1.]

Area under ROC curve = 0.9867
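One way to read the area under the ROC curve is as the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The Python sketch below computes the area that way. It is an illustration under stated assumptions, not the computation Stata performs internally; the function name `auc_by_pairs` and the data are made up.

```python
def auc_by_pairs(scores, labels):
    """Area under the empirical ROC curve via pairwise comparisons.

    Counts, over all (positive, negative) pairs, how often the positive case
    has the higher score; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores and true labels (1 = positive), for illustration only.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
auc = auc_by_pairs(scores, labels)
print(f"AUC = {auc:.4f}")
```

On this reading, an area of 0.9867 means the model ranks a post-menarche case above a pre-menarche case in nearly all such pairs; a value of 0.5 would correspond to ranking no better than chance.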