The Receiver Operating Characteristic (ROC) Curve
EPP 245
Statistical Analysis of Laboratory Data
November 30, 2006 EPP 245 Statistical Analysis of Laboratory Data
Binary Classification
• Suppose each case belongs to exactly one of two groups, and that we know the correct classification (“truth”).
• Suppose we have a prediction method that produces a single numerical value, where small values suggest membership in group 1 and large values suggest membership in group 2.
• If we pick a cutpoint t, we can assign any case with a predicted value ≤ t to group 1 and the others to group 2.
• For that value of t, treating group 2 as “positive”, we can compute the number correctly assigned to group 2 (true positives) and the number incorrectly assigned to group 2 (false positives).
• For t small enough, all cases will be assigned to group 2, and for t large enough, all will be assigned to group 1.
• The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as t varies.
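The cutpoint sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name roc_points and the toy scores are hypothetical (they are not the Juul data), labels use 1 for group 2 and 0 for group 1, and a case goes to group 2 when its score is strictly greater than t.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutpoint.

    labels: 1 for group 2 (the "positive" group), 0 for group 1.
    A case is assigned to group 2 when its score is strictly greater than t.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Cutpoints: one below every score (everything assigned to group 2),
    # then each distinct observed score in increasing order.
    cutpoints = [float("-inf")] + sorted(set(scores))
    points = []
    for t in cutpoints:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up scores for illustration only (not the Juul data).
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
points = roc_points(scores, labels)
```

The first point corresponds to a cutpoint below every score (all cases in group 2) and the last to a cutpoint at the largest score (all cases in group 1), matching the two extremes in the bullets.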
Juul's IGF data
Description:
The 'juul' data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject at various ages, with the bulk of the data collected in connection with school physical examinations.
Variables:
age       a numeric vector (years).
menarche  a numeric vector. Has menarche occurred (code 1: no, 2: yes)?
sex       a numeric vector (1: boy, 2: girl).
igf1      a numeric vector. Insulin-like growth factor (µg/l).
tanner    a numeric vector. Codes 1-5: Stages of puberty a.m. Tanner.
testvol   a numeric vector. Testicular volume (ml).
Predicting Menarche
• Subset the Juul data to females between 8 and 20 years old
• Predict menarche from age as a quantitative variable and Tanner score as a qualitative variable, using dummy variables
• Menarche recoded to 0/1 (0: no, 1: yes)
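A minimal Python sketch of this data preparation follows. It rests on several assumptions that the slides do not state: that the age limits are inclusive, that men1 is menarche minus 1, that tan2 to tan5 are indicators for Tanner stages 2 to 5 with stage 1 as the reference category, and that rows with missing values are dropped. The function name prepare and the example rows are hypothetical, not taken from the Juul data.

```python
def prepare(rows):
    """Subset to females aged 8 to 20 and build the model variables."""
    out = []
    for r in rows:
        # Skip rows with a missing value in any variable the model needs.
        if any(r.get(k) is None for k in ("age", "sex", "menarche", "tanner")):
            continue
        # Keep girls (sex == 2) aged 8 to 20; inclusive limits are an assumption.
        if r["sex"] != 2 or not (8 <= r["age"] <= 20):
            continue
        # Recode menarche from 1/2 to 0/1 (assumed: men1 = menarche - 1).
        rec = {"age": r["age"], "men1": r["menarche"] - 1}
        # Dummy variables for Tanner stages 2-5; stage 1 is the reference.
        for stage in (2, 3, 4, 5):
            rec[f"tan{stage}"] = 1 if r["tanner"] == stage else 0
        out.append(rec)
    return out


# Made-up rows for illustration only (not the Juul data).
rows = [
    {"age": 12.5, "sex": 2, "menarche": 1, "tanner": 2},
    {"age": 15.0, "sex": 2, "menarche": 2, "tanner": 5},
    {"age": 14.0, "sex": 1, "menarche": None, "tanner": 4},  # boy: dropped
    {"age": 25.0, "sex": 2, "menarche": 2, "tanner": 5},     # over 20: dropped
]
model_rows = prepare(rows)
```

Using four indicators for a five-level score avoids collinearity with the intercept; each Tanner odds ratio is then read relative to stage 1.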
. logistic men1 age tan2 tan3 tan4 tan5
Logistic regression                               Number of obs   =        519
                                                  LR chi2(5)      =     568.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.327218                       Pseudo R2       =     0.7906

------------------------------------------------------------------------------
        men1 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.944062   .7162327     7.56   0.000     2.762915    5.630151
        tan2 |   .0444044   .0486937    -2.84   0.005     .0051761    .3809341
        tan3 |   .1369598    .095596    -2.85   0.004     .0348712    .5379227
        tan4 |   .6969611   .3898228    -0.65   0.519     .2328715    2.085935
        tan5 |   9.169558   7.638664     2.66   0.008     1.791671     46.9287
------------------------------------------------------------------------------

. predict pmen
(option p assumed; Pr(men1))
. predict pmen1, xb
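The two predictions are linked by the logistic function: pmen is the predicted probability and pmen1 is the linear predictor (log odds). The Python sketch below illustrates that link and the conversion of the reported age odds ratio back to a coefficient on the log-odds scale. The helper name inv_logit is hypothetical; only the odds ratio 3.944062 comes from the output above.

```python
import math


def inv_logit(xb):
    """Map a linear predictor (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# The logistic command reports odds ratios; the coefficient on the
# log-odds scale is the natural log of the odds ratio.
age_odds_ratio = 3.944062          # from the output above
age_coef = math.log(age_odds_ratio)  # about 1.37 per year of age

# The transform is strictly increasing, so ranking cases by the linear
# predictor or by the probability gives the same order.
xb_values = [-10, -5, 0, 5, 10]
probs = [inv_logit(xb) for xb in xb_values]
```

Because the transform is monotone, any cutpoint on pmen1 corresponds to a cutpoint on pmen that classifies the cases identically, so both predictions yield the same ROC curve.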
. histogram pmen
. graph export pmenhist.wmf
. histogram pmen if men1==0, title("Pre-Menarche")
. graph export pmenhist0.wmf
. histogram pmen if men1==1, title("Post-Menarche")
. graph export pmenhist1.wmf
. histogram pmen1
. graph export pmen1hist.wmf
. hist pmen1 if men1==0, title("Pre-Menarche")
. graph export pmen1hist0.wmf
. hist pmen1 if men1==1, title("Post-Menarche")
. graph export pmen1hist1.wmf
. lroc
Logistic model for men1
number of observations =      519
area under ROC curve   =   0.9867
. graph export pmenroc.wmf
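One way to read the reported area: the area under an empirical ROC curve equals the proportion of (positive, negative) pairs in which the positive case has the higher score, with ties counted as one half. The Python sketch below computes that proportion directly. It is an illustration of the interpretation, not a description of how lroc computes the area; the function name auc_concordance and the toy scores are hypothetical.

```python
def auc_concordance(scores, labels):
    """Proportion of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0   # positive case scored higher: concordant pair
            elif p == n:
                total += 0.5   # tie: counted as half
    return total / (len(pos) * len(neg))


# Made-up scores for illustration only (not the Juul data).
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
auc = auc_concordance(scores, labels)
```

Under this reading, an area of 0.9867 says that in nearly 99% of pairs of one post-menarche and one pre-menarche girl, the model gives the post-menarche girl the higher predicted probability.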
[Figure: histogram of pmen. x-axis: Pr(men1), 0 to 1; y-axis: Density.]
[Figure: histogram of pmen, titled "Pre-Menarche". x-axis: Pr(men1), 0 to 1; y-axis: Density.]
[Figure: histogram of pmen, titled "Post-Menarche". x-axis: Pr(men1), 0 to 1; y-axis: Density.]
[Figure: histogram of pmen1. x-axis: Linear prediction, -10 to 10; y-axis: Density.]
[Figure: histogram of pmen1, titled "Pre-Menarche". x-axis: Linear prediction, -10 to 5; y-axis: Density.]
[Figure: histogram of pmen1, titled "Post-Menarche". x-axis: Linear prediction, -5 to 10; y-axis: Density.]
[Figure: ROC curve for the logistic model. x-axis: 1 - Specificity, 0.00 to 1.00; y-axis: Sensitivity, 0.00 to 1.00. Area under ROC curve = 0.9867.]