Considerations for Evaluation of Diagnostic Performance
C E N T E RGLAUCOMAUCSDUCSD
Felipe A. Medeiros, M.D., Ph.D.
Professor of Ophthalmology
Hamilton Glaucoma Center, University of California, San Diego
I have the following financial disclosures:
• Research Support:
– National Eye Institute R01 EY021818
– Alcon, Inc.
– Allergan, Inc.
– Carl Zeiss Meditec, Inc.
– Heidelberg Engineering
– Optovue
– Reichert
– Sensimed
– Merck, Inc.
Potential Applications of Imaging Instruments
• Diagnosis
– Decrease diagnostic uncertainty in those SUSPECTED of a condition (suspected of having damage or suspected of having progression)
• Screening
– Identify abnormal or suspected cases in the general population or pre-selected subjects (e.g., older population, positive family history)
• Prognosis
– Determine risk of developing a condition
Potential Applications of Imaging Instruments
The design of the study should take into account the purpose of the test and involve the clinically relevant population
Diagnostic Accuracy Studies
• Diagnostic tests are used to decrease uncertainty about the presence of a condition
• Cross-sectional assessment: The test is applied at a single point in time in those suspected of having the disease
• Longitudinal assessment: The test is applied multiple times during follow-up in suspects or those with confirmed disease
Does this patient have glaucoma?
Does this patient have disease progression?
Suspicious optic disc appearance, with normal or suspicious visual field results
Uncertainty is what characterizes glaucoma suspects
Does this patient have glaucoma?
Diagnostic Studies in Imaging
• Diagnostic Studies in Glaucoma
– Cases: Glaucoma patients with repeatable visual field loss
– Controls: Healthy individuals (volunteers without any suspicious finding)
• Diagnostic accuracy measures
– Proportion of cases correctly identified by the test as being abnormal (sensitivity)
– Proportion of controls correctly identified by the test as being normal (specificity)
– ROC curves
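The three accuracy measures listed above can be sketched in a few lines. This is a minimal illustration, not the analysis used in any cited study: the scores, the cut-off, and the function names are hypothetical, and a higher score is assumed to mean a more abnormal test result.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Return (sensitivity, specificity) for a given cut-off.

    Assumption: a result is called abnormal when score >= cutoff.
    """
    true_positives = sum(score >= cutoff for score in case_scores)
    true_negatives = sum(score < cutoff for score in control_scores)
    return (true_positives / len(case_scores),
            true_negatives / len(control_scores))


def roc_area(case_scores, control_scores):
    """Area under the ROC curve as the probability that a randomly
    chosen case scores higher than a randomly chosen control
    (ties count as one half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical test scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.1, 0.3, 0.5, 0.2]

sens, spec = sensitivity_specificity(cases, controls, cutoff=0.6)
auc = roc_area(cases, controls)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen cut-off, whereas the ROC area summarizes performance over all cut-offs.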
Typical cases included in most studies…
Why do we need an imaging instrument to diagnose glaucoma in this patient?
A typical control… healthy volunteer
IOP = 10 mmHg
Limitations of “conventional” studies
• Case-control studies including only patients with well-defined disease and healthy subjects…
… are important for an initial evaluation of a diagnostic test
• However... In clinical practice, diagnostic tests are used to evaluate patients who are suspected of having the disease, not patients with confirmed disease
“when the diagnosis is obvious to the eye, we don’t need further diagnostic tests”
(Straus SE et al. Evidence-Based Medicine: How to Practice and Teach it)
Limitations of “conventional” studies
• The population sample included in most studies may not be representative of the one in which we apply the diagnostic tests in everyday practice
• Estimates of sensitivity and specificity obtained from these studies may not be directly applicable in clinical practice
Do test results distinguish patients with and without the target disorder among those in whom it is clinically sensible to suspect the disorder?
“If sensitivity is determined in seriously ill subjects and specificity in clearly healthy individuals, both will be grossly overestimated” (Sackett)
Br J Ophthalmol. 2007 Mar;91(3):273-4
Healthy sample, recruited from the general population
Test more abnormal (larger cup area or smaller rim area)
Cut-off to determine abnormality: only 1% false positives
For example, abnormal MRA
Should I expect the test to have only 1% false positives in my clinical practice?
Healthy sample, recruited from the general population
Test more abnormal (larger cup area or smaller rim area)
Clinical population will be enriched by individuals with suspicious discs. The test will not have the same specificity
Healthy sample, recruited from the general population
Test more abnormal (larger cup area or smaller rim area)
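The loss of specificity can be illustrated with a small calculation. This is a sketch under stated assumptions, not data from any study: test values are modelled as normally distributed, and eyes without glaucoma that are referred for suspicious discs are assumed to sit one standard deviation toward the abnormal side of the general healthy population. The distributions and variable names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical model: higher value = more abnormal
# (larger cup area or smaller rim area).
healthy_general = NormalDist(mu=0.0, sigma=1.0)

# Assumption for illustration: non-glaucomatous eyes seen in clinic were
# referred because their discs look suspicious, so their values are shifted.
suspects_without_glaucoma = NormalDist(mu=1.0, sigma=1.0)

# Cut-off chosen so that only 1% of the healthy general sample is flagged.
cutoff = healthy_general.inv_cdf(0.99)

spec_general = healthy_general.cdf(cutoff)
spec_clinic = suspects_without_glaucoma.cdf(cutoff)

print(f"Specificity in healthy general sample: {spec_general:.3f}")
print(f"Specificity in clinic suspects:        {spec_clinic:.3f}")
```

With this assumed shift the same cut-off yields roughly 9% false positives instead of 1%. The size of the drop depends entirely on how different the referred population is from the normative sample.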
If the purpose of the test is to complement current clinical evaluation…
How should we design diagnostic accuracy studies?
An example:
Accuracy of ECG for diagnosing myocardial infarction
Accuracy of ECG for myocardial infarction
• Review of diagnostic accuracy studies of ECG (1)
• All included patients were suspected of MI at the time of the ECG
• Diagnosis of MI was subsequently confirmed or ruled out based on levels of cardiac enzymes (reference test)
1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational clinical examination. Is this patient having a myocardial infarction? JAMA 1998;280:1256-63.
Diagnostic Accuracy of ECG
Patients suspected of MI (chest pain)
ECG
Reference test: Cardiac enzymes
– Myocardial infarction
– Without myocardial infarction (chest pain from other causes)
1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational clinical examination. Is this patient having a myocardial infarction? JAMA 1998;280:1256-63.
How should we design diagnostic accuracy studies in glaucoma?
Diagnostic Accuracy of Test X
Patients suspected of having glaucoma (high IOP, suspicious discs, etc.)
TEST X
Reference test
– Glaucoma
– No glaucoma
What reference standard should be used?
Visual Field: not a good reference standard if you want to evaluate the additional clinical benefit of test X
What reference standard should be used?
Cross-sectional optic disc evaluation (photos, slit-lamp)
Not a good reference standard because of poor accuracy and prognostic value
What reference standard should be used?
Diagnostic studies involving glaucoma suspects will require longitudinal follow-up (historical or prospective)
SUSPECTS
Apply the diagnostic test at BASELINE
Follow them over time (visual fields, photos)
– Progress: glaucoma
– Do not progress
Is this a good design?
This is a prognostic study
Important to assess measures of prognostic ability (hazard ratios are not enough)
Need to account for other risk factors
Test may have a weak prognostic ability but be a good diagnostic test
Test may have a weak prognostic ability but be a good diagnostic test
Ocular hypertensive eyes WITHOUT nerve damage
Follow them over time
– Progressive nerve damage: glaucoma
– No progression
Imaging tests will not be able to discriminate these eyes at baseline (they don’t have damage): poor prognostic value
Imaging tests may clearly separate eyes with nerve damage from those without damage after follow-up: good diagnostic value
Test may have a weak prognostic ability but be a good diagnostic test
Ocular hypertensive eyes WITHOUT nerve damage
Follow them over time
– Progressive nerve damage: glaucoma
– No progression
How can we assess diagnostic performance here if we cannot use visual fields as reference standard?
Historical follow-up
Progressive Optic Disc Damage is Highly Predictive of Development of Functional Loss in Glaucoma
Predictor                      R² (95% confidence interval)
Optic Disc Progression         79% (65% - 87%)
Baseline GON grading           21% (9% - 37%)
Baseline vertical C/D ratio    21% (8% - 37%)
Baseline IOP                   10% (2% - 22%)
CCT                            6% (1% - 15%)
Baseline PSD                   26% (15% - 40%)
Age                            23% (11% - 39%)
Medeiros FA, et al. Arch Ophthalmol 2009; 127: 1250-6.
• Influence of the studied population on the Diagnostic Accuracy of CSLO (HRT)
Medeiros FA, et al. IOVS 2007; 48: 214-22.
The Effect of Study Design…
• Compared Diagnostic Accuracy of CSLO in 2 scenarios:
• ANALYSIS 1
– Discriminate patients with glaucomatous visual field loss from healthy subjects (recruited from general population)
• ANALYSIS 2
– Discriminate suspects who have glaucoma from suspects who do not have glaucoma (using history of previous optic disc progression as reference standard)
Medeiros FA, et al. IOVS 2007; 48: 214-22.
Analysis 2: Cohort of Glaucoma Suspects
Suspect A, Suspect B
Analysis 2
Suspect, but normal
Patient followed for 18 years without treatment and without any changes to the optic nerve and VF
Analysis 2
Evidence of PROGRESSIVE glaucomatous damage confirms diagnosis of glaucoma
The Effect of Study Design…
• Differences in diagnostic accuracy in the two analyses
– Glaucoma Probability Score (GPS)
Medeiros FA, et al. IOVS 2007; 48: 214-22.
The Effect of Study Design…
Analysis 1: ROC curve area = 0.89
Analysis 2: ROC curve area = 0.65
Medeiros FA, et al. IOVS 2007; 48: 214-22.
Spectrum Bias
• If patients are referred as suspicious of glaucoma by the same characteristic measured by the diagnostic test (e.g., rim thinning), the test will have little additional value for decreasing diagnostic uncertainty
Potential Applications of Imaging Instruments
• Diagnosis
– Decrease diagnostic uncertainty in those SUSPECTED of a condition (suspected of having damage or suspected of having progression)
• Screening
– Identify abnormal or suspected cases in the general population or pre-selected subjects (e.g., older population, positive family history)
• Prognosis
– Determine risk of developing a condition
Screening
• Estimates of diagnostic accuracy obtained in clearly glaucomatous versus healthy eyes seem to be more relevant to opportunistic screening situations
Diagnostic Studies
SDOCT Retinal Nerve Fiber Layer
Felipe A. Medeiros, M.D., Ph.D.
Professor of Ophthalmology
Hamilton Glaucoma Center, University of California, San Diego
• 134 glaucoma suspects followed for an average of 14 ± 3.6 years
• 42 progressive GON (photos)
• 86 control eyes (no progression, followed untreated)
Ophthalmology 2012 (E-pub ahead of publication)
RNFL parameters performed significantly better than topographic disc parameters for diagnosing damage in suspects
ROC area SDOCT average thickness = 0.86
ROC area HRT rim area = 0.72
Ophthalmology 2012 (E-pub ahead of publication)
RNFL versus optic disc topography
• In clinical practice, patients are usually referred as suspected of having glaucoma because of suspicious rim thinning or large cups
• In this situation, RNFL parameters seem to offer more benefit as a complementary test to clinical examination
• 138 eyes with glaucomatous visual field defects
• 106 healthy eyes
• Diagnostic accuracies of the 3 SDOCT instruments
Ophthalmology 2011;118:1334–1339
Comparison of Cirrus HDOCT, RTVue and Spectralis
Areas under ROC curves
                         Spectralis   Cirrus   RTVue
Global RNFL thickness    0.862        0.864    0.844
Inferior thickness       0.837        0.849    0.846
Superior thickness       0.865        0.863    0.841
Ophthalmology 2011;118:1334–1339
Diagnostic Studies of RNFL parameters
• Diagnostic accuracy estimates obtained from “conventional studies” (healthy versus glaucomatous field loss) are similar to those estimates obtained from cohort studies of suspects
Disease severity still affects accuracy of RNFL parameters…
Areas under ROCs for different levels of disease severity (SDOCT)
         RNFL thickness
VFI      Global   Inferior   Superior
100%     0.822    0.851      0.812
90%      0.886    0.871      0.874
80%      0.932    0.920      0.921
70%      0.962    0.954      0.952
IOVS 2010;51:4104–410
Does SDOCT RNFL assessment assist in glaucoma diagnosis?
“Diagnosis is not about finding the truth but limiting uncertainty”
(Straus SE et al. Evidence-Based Medicine: How to Practice and Teach it)
Evaluating a Glaucoma Suspect
Pre-test probability of disease = 30%
How can SDOCT help us with this patient?
Pre-test probability of glaucoma (30%)
IMAGING TEST
SDOCT result
Post-test probability of glaucoma?
How do we calculate the post-test probability?
Post-test odds of disease = LR x Pre-test odds of disease
Odds = probability / (1 - probability)
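The two formulas above can be combined into a short calculation. This is a minimal sketch with hypothetical function names. The likelihood ratio of 10 is an arbitrary illustrative value, not a published SDOCT figure.

```python
def probability_to_odds(probability):
    """Odds = probability / (1 - probability)."""
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = LR x pre-test odds, converted back to a probability."""
    post_test_odds = likelihood_ratio * probability_to_odds(pre_test_probability)
    return odds_to_probability(post_test_odds)


# Pre-test probability of 30%, with a hypothetical likelihood ratio of 10.
example = post_test_probability(0.30, 10.0)
print(f"Post-test probability: {example:.3f}")
```

A likelihood ratio of 1 leaves the probability unchanged, which is why a test result with LR close to 1 does little to reduce diagnostic uncertainty.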
Continuous Likelihood Ratios for Glaucoma Diagnosis with Spectral Domain OCT
The likelihood ratio for a specific test value is the slope of the tangent to the ROC curve at the corresponding point
Lisboa R, Mansouri K, Zangwill L, Weinreb RN and Medeiros FA. ARVO 2011
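The relation between the likelihood ratio and the ROC tangent can be checked numerically. The sketch below assumes a binormal model with made-up parameters (RNFL thickness of 100 ± 10 µm in healthy eyes and 80 ± 12 µm in glaucomatous eyes), chosen purely for illustration. It does not reproduce the method or data of the cited abstract, and the function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical thickness distributions (micrometres); lower = more abnormal.
healthy = NormalDist(mu=100.0, sigma=10.0)
glaucoma = NormalDist(mu=80.0, sigma=12.0)


def likelihood_ratio(thickness):
    """LR for one specific test value: ratio of the two densities."""
    return glaucoma.pdf(thickness) / healthy.pdf(thickness)


def roc_slope(thickness, step=1e-3):
    """Slope of the ROC curve at the cut-off `thickness`.

    The test is called abnormal when the measurement is <= cut-off, so
    sensitivity = glaucoma.cdf(cut-off) and
    false-positive rate = healthy.cdf(cut-off).
    The slope is estimated with a central finite difference.
    """
    d_sensitivity = glaucoma.cdf(thickness + step) - glaucoma.cdf(thickness - step)
    d_false_positive = healthy.cdf(thickness + step) - healthy.cdf(thickness - step)
    return d_sensitivity / d_false_positive


for value in (70.0, 85.0, 95.0):
    print(value, likelihood_ratio(value), roc_slope(value))
```

Under these assumptions the two quantities agree at every cut-off, and thinner measurements carry larger likelihood ratios than a single abnormal/normal label would convey.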
Evaluating a Glaucoma Suspect
Pre-test probability of disease = 30%
SDOCT result
Post-test probability of disease = 93%
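The likelihood ratio implied by this example is not stated on the slide, but it can be back-calculated from the two probabilities using the odds relation. The following sketch is plain arithmetic on the 30% and 93% figures. The function name is hypothetical.

```python
def probability_to_odds(probability):
    """Odds = probability / (1 - probability)."""
    return probability / (1.0 - probability)


pre_test_probability = 0.30
post_test_probability = 0.93

# Post-test odds = LR x pre-test odds, so LR = post-test odds / pre-test odds.
implied_lr = (probability_to_odds(post_test_probability)
              / probability_to_odds(pre_test_probability))
print(f"Implied likelihood ratio: {implied_lr:.1f}")
```

Moving from 30% to 93% corresponds to a likelihood ratio of about 31 for this patient's SDOCT result.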
Diagnostic Accuracy of SDOCT RNFL
• Diagnostic accuracy studies involving the clinically relevant population show a clear benefit of RNFL assessment with SDOCT in decreasing uncertainty in glaucoma diagnosis
For a cross-sectional study describing the distribution of measurements in known normal and known diseased subjects (excluding suspects):
a) How should the diseased subject population be defined? What clinical work-up should be done to establish these populations?
A: Cross-sectional studies including only clearly diseased subjects versus healthy normals may lead to biased estimates of accuracy. This seems to be worse for some instruments and parameters than for others.
Studies involving healthy versus glaucomatous field loss should be conducted for initial evaluation, but should be followed by studies evaluating the clinically relevant population (suspects).
Can the diseased population be further divided by severity of disease using a clinical reference method (e.g., perimetry)? If so, should the clinical performance of the imaging device be characterized separately for each severity group?
A: Yes. It is important to include a broad spectrum of disease severity and to use methods that evaluate diagnostic performance according to disease severity (e.g., ROC regression methods).
For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:
a) What subjects should be included in the study population (including glaucoma suspects)?
This is a prognostic study. What are you trying to predict?
If we are trying to predict development of glaucoma, the population should be of suspects (without clear signs of damage)
For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:
b) What is an appropriate clinical reference standard?
Again, it depends on what you are trying to predict. If it is development of glaucoma, then appropriate standards could be development of visual field loss and/or progressive optic nerve damage.
For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:
c) What minimum follow-up period would provide assurance that the subject has been appropriately classified?
We do not necessarily need to establish a minimum period, as long as we take censoring into account. It is very important to adequately evaluate prognostic ability (c-index, modified R², reclassification tables, predictiveness measure).
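One way to see how censoring enters a measure of prognostic ability is Harrell's c-index, sketched below in plain Python. This is a minimal illustration of one common definition, not necessarily the variant intended on the slide. The follow-up times, event indicators and baseline risk scores are invented for the example.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored follow-up data.

    A pair of subjects is comparable only when the one with the shorter
    follow-up actually had the event; pairs whose shorter follow-up ended
    in censoring are skipped. A comparable pair is concordant when the
    subject who progressed earlier had the higher baseline risk score
    (ties in score count as one half).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical cohort: years of follow-up, 1 = progressed / 0 = censored,
# and a baseline risk score derived from the test (higher = more abnormal).
follow_up_years = [2, 4, 6, 8, 10]
progressed = [1, 0, 1, 0, 0]
baseline_risk = [0.9, 0.3, 0.4, 0.5, 0.1]

c_index = harrell_c_index(follow_up_years, progressed, baseline_risk)
print(f"c-index: {c_index:.3f}")
```

Subjects with short, censored follow-up still contribute as the longer-surviving member of a pair, which is why a fixed minimum follow-up is not strictly required for this kind of measure.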