Considerations for Evaluation of Diagnostic Performance

65
CENTER GLAUCOMA UCSD UCSD Considerations for Evaluation of Considerations for Evaluation of Diagnostic Performance Diagnostic Performance Felipe A. Medeiros Felipe A. Medeiros , M.D., Ph.D. , M.D., Ph.D. Professor of Ophthalmology Professor of Ophthalmology Hamilton Glaucoma Center Hamilton Glaucoma Center University of California, San Diego University of California, San Diego

Transcript of Considerations for Evaluation of Diagnostic Performance

C E N T E RGLAUCOMAUCSDUCSD

Considerations for Evaluation of Considerations for Evaluation of Diagnostic PerformanceDiagnostic Performance

Felipe A. MedeirosFelipe A. Medeiros, M.D., Ph.D., M.D., Ph.D.Professor of OphthalmologyProfessor of Ophthalmology

Hamilton Glaucoma CenterHamilton Glaucoma Center University of California, San DiegoUniversity of California, San Diego

C E N T E RGLAUCOMAUCSDUCSD

I have the following financial disclosures:I have the following financial disclosures:

•• Research Support:Research Support:–– National Eye Institute R01 National Eye Institute R01 EY021818 –– Alcon, Inc.Alcon, Inc.–– Allergan, Inc.Allergan, Inc.–– CarlCarl--Zeiss Meditec, Inc.Zeiss Meditec, Inc.–– Heidelberg EngineeringHeidelberg Engineering–– OptovueOptovue–– ReichertReichert–– SensimedSensimed–– Merck, Inc. Merck, Inc.

C E N T E RGLAUCOMAUCSDUCSD

• Diagnosis

• Decrease diagnostic uncertainty in those SUSPECTED of a condition (suspected of having damage or suspected of having progression)

• Screening

• Identify abnormal or suspected cases in the general population or pre- selected subjects (e.g., older population, positive family history)

• Prognosis• Determine risk of developing a condition

PotentialPotential Applications of Imaging Instruments Applications of Imaging Instruments

C E N T E RGLAUCOMAUCSDUCSD

The design of the study should take into account the purpose of the test and involve

the clinically relevant population

PotentialPotential Applications of Imaging Instruments Applications of Imaging Instruments

C E N T E RGLAUCOMAUCSDUCSD

• Diagnostic tests are used to decrease uncertainty about presence of a condition

• Cross-sectional assessment: The test is applied at a single point in time in those suspected of having the disease

• Longitudinal assessment: The test is applied multiple times during follow-up in suspects or those with confirmed disease

Diagnostic Accuracy StudiesDiagnostic Accuracy Studies

Does this patient have glaucoma?

Does this patient have disease progression?

C E N T E RGLAUCOMAUCSDUCSD

Suspicions optic disc appearanceSuspicions optic disc appearance, with normal or , with normal or suspicious visual field resultssuspicious visual field results

Uncertainty is what characterizes glaucoma suspects

Does this patient have glaucoma?

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Studies in ImagingDiagnostic Studies in Imaging

•• Diagnostic Studies in Glaucoma Diagnostic Studies in Glaucoma –– CasesCases: : Glaucoma patients with Glaucoma patients with repeatable visual field lossrepeatable visual field loss

–– Controls:Controls: Healthy individuals Healthy individuals (volunteers without any suspicious (volunteers without any suspicious finding)finding)

•• Diagnostic accuracy measuresDiagnostic accuracy measures–– Proportion of cases correctly identified by the test as being Proportion of cases correctly identified by the test as being

abnormal (sensitivity)abnormal (sensitivity)

–– Proportion of controls correctly identified by the test as beingProportion of controls correctly identified by the test as being normal (specificity)normal (specificity)

–– ROC curvesROC curves

C E N T E RGLAUCOMAUCSDUCSD

Typical cases included in most studiesTypical cases included in most studies……

Why do we need an imaging instrument to diagnose glaucoma in this patient ?

C E N T E RGLAUCOMAUCSDUCSD

A typical controlA typical control…… healthy volunteerhealthy volunteer

IOP = 10 mmHg

C E N T E RGLAUCOMAUCSDUCSD

LimitaLimitations of tions of ““conventionalconventional”” studiesstudies

•• CaseCase--control studies including only patients with wellcontrol studies including only patients with well--defined disease defined disease and healthy subjectsand healthy subjects

…… are important for an initial evaluation of a diagnostic testare important for an initial evaluation of a diagnostic test

•• However...However...... In clinical practice, ... In clinical practice, diagnosticdiagnostic tests are used to evaluate tests are used to evaluate

patients who are patients who are suspectedsuspected of having the disease, not of having the disease, not patients with confirmed diseasepatients with confirmed disease

““when the diagnosis is obvious to the eye, we donwhen the diagnosis is obvious to the eye, we don’’t need further diagnostic testst need further diagnostic tests””

(Straus SE et al. Evidence(Straus SE et al. Evidence--Based Medicine: How to Practice and Teach it) Based Medicine: How to Practice and Teach it)

C E N T E RGLAUCOMAUCSDUCSD

Limitations of Limitations of ““conventionalconventional”” studiesstudies

•• The population sample included in most studies may not be The population sample included in most studies may not be representative of the one in which we apply the diagnostic representative of the one in which we apply the diagnostic tests in everyday practicetests in everyday practice

•• Estimates of sensitivity and specificity obtained from these Estimates of sensitivity and specificity obtained from these studies may not be directly applicable in clinical practicestudies may not be directly applicable in clinical practice

C E N T E RGLAUCOMAUCSDUCSD

Do test results distinguish patients with and without the target disorder among those in whom it is clinically sensible to suspect the disorder?

“If sensitivity is determined in seriously ill subjects and specificity in clearly healthy individuals, both will be grossly overestimated” (Sackett)

Br J Ophthalmol. 2007 Mar;91(3):273-4

C E N T E RGLAUCOMAUCSDUCSD

C E N T E RGLAUCOMAUCSDUCSD

Healthy sample, recruited from the general population

Test more abnormal (larger cup area or smaller rim area)

C E N T E RGLAUCOMAUCSDUCSD

Cut-off to determine abnormalityOnly 1% false-positives

For example, abnormal MRA

Should I expect the test to have only 1% false positives in my

clinical practice?

Healthy sample, recruited from the general population

Test more abnormal (larger cup area or smaller rim area)

C E N T E RGLAUCOMAUCSDUCSD

Clinical population will be enriched by individuals with suspicious discs. The test will not have the same specificity

Healthy sample, recruited from the general population

Test more abnormal (larger cup area or smaller rim area)

C E N T E RGLAUCOMAUCSDUCSD

If the purpose of the test is to complement If the purpose of the test is to complement current clinical evaluationcurrent clinical evaluation……

How should we design How should we design diagnostic accuracy studies? diagnostic accuracy studies?

C E N T E RGLAUCOMAUCSDUCSD

An example: An example:

Accuracy of ECG for diagnosing Accuracy of ECG for diagnosing myocardial infarctionmyocardial infarction

C E N T E RGLAUCOMAUCSDUCSD

Accuracy of ECG for myocardial infarctionAccuracy of ECG for myocardial infarction

1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational cl1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational clinical examination. inical examination. Is this patient having a myocardial infarction? JAMA 1998;280:12Is this patient having a myocardial infarction? JAMA 1998;280:125656--63.63.

• Review of diagnostic accuracy studies of ECG1

• All included patients were suspect of MI at the time of the ECG

• Diagnosis of MI was subsequently confirmed or ruled out based on levels of cardiac enzymes (reference test)

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Accuracy of ECGDiagnostic Accuracy of ECG

1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational cl1. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. The rational clinical examination. inical examination. Is this patient having a myocardial infarction? JAMA 1998;280:12Is this patient having a myocardial infarction? JAMA 1998;280:125656--63.63.

Patients suspected of MI (Chest pain)

Myocardial infarction

ECG

Without myocardial infarction(Chest pain from other causes)

Reference test: Cardiac enzymes

C E N T E RGLAUCOMAUCSDUCSD

How should we design How should we design diagnostic accuracy studies in diagnostic accuracy studies in

glaucoma? glaucoma?

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Accuracy of Test X Diagnostic Accuracy of Test X

Patients suspected of having glaucoma(High IOP, suspicious discs, etc)

Glaucoma

TEST X

No glaucoma

Reference test

C E N T E RGLAUCOMAUCSDUCSD

What reference standard What reference standard should be used? should be used?

Visual Field – Not a good reference standard if you want to evaluate additional clinical benefit of test X

C E N T E RGLAUCOMAUCSDUCSD

What reference standard What reference standard should be used? should be used?

Cross-sectional optic disc evaluation (photos, slit-lamp)

Not a good reference standard because of poor accuracy, prognostic value

C E N T E RGLAUCOMAUCSDUCSD

What reference standard What reference standard should be used? should be used?

Diagnostic studies involving glaucoma suspects will requirelongitudinal follow-up (historical or prospective)

C E N T E RGLAUCOMAUCSDUCSD

SUSPECTS

Apply the diagnostic test BASELINE

Follow them over timeVisual fields, photos

Progress:Glaucoma

Do not progress

Is this a good design?

This is a prognostic study

Important to assess measures of prognostic ability (hazard ratios not enough)

Need to account for other risk factors

Test may have a weak prognostic ability but be a good diagnostic test

C E N T E RGLAUCOMAUCSDUCSD

Test may have a weak prognostic ability Test may have a weak prognostic ability but be a good diagnostic testbut be a good diagnostic test

Ocular hypertensive eyes WITHOUT nerve damage

Progressive nerve damage:Glaucoma

No progressionFollow them over time

Imaging tests will not be able to discriminate these eyes at baseline

(they don’t have damage) Poor prognostic value

Imaging tests may clearly separate eyes with nerve damage

from those without damage hereGood diagnostic value

C E N T E RGLAUCOMAUCSDUCSD

Test may have a weak prognostic ability Test may have a weak prognostic ability but be a good diagnostic testbut be a good diagnostic test

Ocular hypertensive eyes WITHOUT nerve damage

Progressive nerve damage:Glaucoma

No progressionFollow them over time

How can we assess diagnostic performance hereif we cannot use visual fields as reference standard?

Historical follow-up

C E N T E RGLAUCOMAUCSDUCSD

Progressive Optic Disc Damage is Highly Progressive Optic Disc Damage is Highly Predictive of Development of Functional Loss Predictive of Development of Functional Loss in Glaucoma in Glaucoma

Predictor R2 (95% confidence interval)

Optic Disc Progression 79% (65% - 87%)

Baseline GON grading 21% (9% - 37%)

Baseline vertical C/D ratio 21% (8% - 37%)

Baseline IOP 10% (2% – 22%)

CCT 6% (1% – 15%)

Baseline PSD 26% (15% - 40%)

Age 23% (11% - 39%)

Medeiros FA, et al. Arch Ophthalmol 2009; 127: 1250-6.

C E N T E RGLAUCOMAUCSDUCSD

Does it matter? Does it matter?

C E N T E RGLAUCOMAUCSDUCSDMedeiros FA, et al. IOVS 2007; 48: 214Medeiros FA, et al. IOVS 2007; 48: 214--22. 22.

•• Influence of the studied population on the Diagnostic Influence of the studied population on the Diagnostic Accuracy of CSLO (HRT) Accuracy of CSLO (HRT)

C E N T E RGLAUCOMAUCSDUCSD

The Effect of Study DesignThe Effect of Study Design……

•• Compared Diagnostic Accuracy of CSLO in 2 scenarios:Compared Diagnostic Accuracy of CSLO in 2 scenarios:

•• ANALYSIS 1ANALYSIS 1–– Discriminate patients with glaucomatous visual field loss from Discriminate patients with glaucomatous visual field loss from

healthy subjectshealthy subjects (recruited from general population)(recruited from general population)

•• ANALYSIS 2ANALYSIS 2–– Discriminate suspects who have glaucoma from suspects who do Discriminate suspects who have glaucoma from suspects who do

not have glaucoma (using history of previous optic disc not have glaucoma (using history of previous optic disc progression as reference standard)progression as reference standard)

Medeiros FA, et al. IOVS 2007; 48: 214Medeiros FA, et al. IOVS 2007; 48: 214--22. 22.

C E N T E RGLAUCOMAUCSDUCSD

Analysis 2 Analysis 2 Cohort of Glaucoma SuspectsCohort of Glaucoma Suspects

Suspect A Suspect B

C E N T E RGLAUCOMAUCSDUCSD

Analysis 2 Analysis 2

Suspect, but normal

Patient followed for 18 years without treatment and Patient followed for 18 years without treatment and without any changes to the optic nerve and VFwithout any changes to the optic nerve and VF

C E N T E RGLAUCOMAUCSDUCSD

Analysis 2Analysis 2

Evidence of PROGRESSIVE glaucomatous damage confirms diagnosis of glaucoma

C E N T E RGLAUCOMAUCSDUCSD

The Effect of Study DesignThe Effect of Study Design……

•• Diferences in diagnostic accuracy in the two Diferences in diagnostic accuracy in the two analyses analyses –– Glaucoma Probability Score (GPS)Glaucoma Probability Score (GPS)

Medeiros FA, et al. IOVS 2007; 48: 214Medeiros FA, et al. IOVS 2007; 48: 214--22. 22.

C E N T E RGLAUCOMAUCSDUCSD

The Effect of Study DesignThe Effect of Study Design……

Analysis 1ROC curve area = 0.89

Analysis 2ROC curve area = 0.65

Medeiros FA, et al. IOVS 2007; 48: 214Medeiros FA, et al. IOVS 2007; 48: 214--22. 22.

C E N T E RGLAUCOMAUCSDUCSD

Spectrum BiasSpectrum Bias

•• If patients are referred as suspicious of glaucoma by the If patients are referred as suspicious of glaucoma by the same characteristic measured by the diagnostic test (e.g., rim same characteristic measured by the diagnostic test (e.g., rim thinning), the test will have little additional value for thinning), the test will have little additional value for decreasing diagnostic uncertaintydecreasing diagnostic uncertainty

C E N T E RGLAUCOMAUCSDUCSD

• Diagnosis

• Decrease diagnostic uncertainty in those SUSPECTED of a condition (suspected of having damage or suspected of having progression)

• Screening

• Identify abnormal or suspected cases in the general population or pre- selected subjects (e.g., older population, positive family history)

• Prognosis• Determine risk of developing a condition

PotentialPotential Applications of Imaging Instruments Applications of Imaging Instruments

C E N T E RGLAUCOMAUCSDUCSD

• Estimates of diagnostic accuracy obtained in clearly glaucomatous versus healthy eyes seem to be more relevant to opportunistic screening situations

ScreeningScreening

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Studies Diagnostic Studies SDOCT Retinal Nerve Fiber LayerSDOCT Retinal Nerve Fiber Layer

Felipe A. MedeirosFelipe A. Medeiros, M.D., Ph.D., M.D., Ph.D.Professor of OphthalmologyProfessor of Ophthalmology

Hamilton Glaucoma CenterHamilton Glaucoma Center University of California, San DiegoUniversity of California, San Diego

C E N T E RGLAUCOMAUCSDUCSD

•• 134 glaucoma suspects followed for average of 14 134 glaucoma suspects followed for average of 14

±±

3.6 years3.6 years

•• 42 progressive GON (photos)42 progressive GON (photos)

•• 86 control eyes (no progression followed untreated)86 control eyes (no progression followed untreated)

Ophthalmology 2012 (E-pub ahead of publication)

C E N T E RGLAUCOMAUCSDUCSDOphthalmology 2012 (E-pub ahead of publication)

RNFL parameters performed significantly better than topographic RNFL parameters performed significantly better than topographic disc parameters for diagnosing damage in suspectsdisc parameters for diagnosing damage in suspects

ROC area SDOCT average thickness = 0 .86

ROC area HRT rim area = 0.72

C E N T E RGLAUCOMAUCSDUCSD

RNFL versus optic disc topographyRNFL versus optic disc topography

•• In clinical practice, patients are usually referred as In clinical practice, patients are usually referred as suspected of having glaucoma because of suspected of having glaucoma because of suspicious rim thinning or large cupssuspicious rim thinning or large cups

•• In this situation, RNFL parameters seem to offer In this situation, RNFL parameters seem to offer more benefit as a complementary test to clinical more benefit as a complementary test to clinical examinationexamination

C E N T E RGLAUCOMAUCSDUCSD

•• 138 eyes with glaucomatous visual field defects138 eyes with glaucomatous visual field defects•• 106 healthy eyes106 healthy eyes•• Diagnostic accuracies of the 3 SDOCT instrumentsDiagnostic accuracies of the 3 SDOCT instruments

Ophthalmology 2011;118:1334Ophthalmology 2011;118:1334––13391339

C E N T E RGLAUCOMAUCSDUCSD

Comparison of Cirrus HDOCT, RTVue and Spectralis

Spectralis Cirrus RTVue

Global RNFL thickness 0.862 0.864 0.844

Inferior thickness 0.837 0.849 0.846

Superior thickness 0.865 0.863 0.841

Areas under ROC curves

Ophthalmology 2011;118:1334Ophthalmology 2011;118:1334––13391339

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Studies of RNFL parametersDiagnostic Studies of RNFL parameters

•• Diagnostic accuracy estimates obtained from Diagnostic accuracy estimates obtained from ““conventional conventional studiesstudies”” (healthy versus glaucomatous field loss) are similar (healthy versus glaucomatous field loss) are similar to those estimates obtained from cohort studies of suspectsto those estimates obtained from cohort studies of suspects

C E N T E RGLAUCOMAUCSDUCSD

Disease severity still affects Disease severity still affects

accuracy of RNFL parametersaccuracy of RNFL parameters……

C E N T E RGLAUCOMAUCSDUCSDIOVS 2010;51:4104–4109

VFI = 95%

VFI = 85%

C E N T E RGLAUCOMAUCSDUCSD

Areas under ROCs for different Areas under ROCs for different levels of disease severitylevels of disease severity

(SDOCT)(SDOCT)

RNFL thickness

VFI Global Inferior Superior

100% 0.822 0.851 0.812

90% 0.886 0.871 0.874

80% 0.932 0.920 0.921

70% 0.962 0.954 0.952

IOVS 2010;51:4104–410

C E N T E RGLAUCOMAUCSDUCSDIOVS 2010;51:4104–4109

C E N T E RGLAUCOMAUCSDUCSD

Does SDOCT RNFL assessment Does SDOCT RNFL assessment

assist in glaucoma diagnosis?assist in glaucoma diagnosis?

““Diagnosis is not about finding the truth but Diagnosis is not about finding the truth but limiting uncertaintylimiting uncertainty””

(Straus SE et al. Evidence-Based Medicine: How to Practice and Teach it)

C E N T E RGLAUCOMAUCSDUCSD

Evaluating a Glaucoma SuspectEvaluating a Glaucoma Suspect

Pre-test probability of disease = 30%

C E N T E RGLAUCOMAUCSDUCSD

How can SDOCT help us with this patient ?How can SDOCT help us with this patient ?

Pre-test probabilityof glaucoma (30%)

IMAGINGTEST

Post-test probabilityof glaucoma ?

SDOCT result

C E N T E RGLAUCOMAUCSDUCSD

How do we calculate the postHow do we calculate the post--test probability?test probability?

PostPost--test odds of disease = LR x Pretest odds of disease = LR x Pre--test odds of diseasetest odds of disease

Odds = probability / 1- probability

C E N T E RGLAUCOMAUCSDUCSD

Continuous Likelihood Ratios for Glaucoma Continuous Likelihood Ratios for Glaucoma Diagnosis with Spectral Domain OCTDiagnosis with Spectral Domain OCT

The likelihood ratio for a specific test value is the is the tangent to the ROC curve at the corresponding value

Lisboa R, Mansouri K, Zangwill L, Weinreb RN and Medeiros FA. ARLisboa R, Mansouri K, Zangwill L, Weinreb RN and Medeiros FA. ARVO 2011VO 2011

C E N T E RGLAUCOMAUCSDUCSDLisboa R, Mansouri K, Zangwill L, Weinreb RN and Medeiros FA. ARLisboa R, Mansouri K, Zangwill L, Weinreb RN and Medeiros FA. ARVO 2011VO 2011

Continuous Likelihood Ratios for Glaucoma Continuous Likelihood Ratios for Glaucoma Diagnosis with Spectral Domain OCTDiagnosis with Spectral Domain OCT

C E N T E RGLAUCOMAUCSDUCSD

Pre-test probability of disease = 30%

Evaluating a Glaucoma SuspectEvaluating a Glaucoma Suspect

Post-test probability of disease = 93%

SDOCT result

C E N T E RGLAUCOMAUCSDUCSD

Diagnostic Accuracy of SDOCT RNFLDiagnostic Accuracy of SDOCT RNFL

•• Diagnostic accuracy studies involving the clinically Diagnostic accuracy studies involving the clinically relevant population show a clear benefit of RNFL relevant population show a clear benefit of RNFL assessment with SDOCT in decreasing uncertainty in assessment with SDOCT in decreasing uncertainty in glaucoma diagnosisglaucoma diagnosis

C E N T E RGLAUCOMAUCSDUCSD

Addressing the Questions Addressing the Questions

AskedAsked……

C E N T E RGLAUCOMAUCSDUCSD

For a cross-sectional study describing the distribution of measurements in known normal and known diseased subjects (excluding suspects):

a) How should the diseased subject population be defined? What clinical work-up should be done to establish these populations?

A: Cross-sectional studies including only clearly disease subjects versus healthy normals may lead to biased estimates of accuracy. This seems to be worse for certain instruments/parameters versus others.

Studies involving healthy versus glaucomatous field loss should be conducted for initial evaluation, but should be followed by studies evaluating the clinically relevant population (suspects)

C E N T E RGLAUCOMAUCSDUCSD

Can the diseased population be further divided by severity of disease using a clinical reference method (e.g., perimetry)? If so, should the clinical performance of the imaging device be characterized separately for each severity group?

A: YES. Important to include a broad spectrum of disease severity and use methods to evaluate diagnostic performance according to disease severity (e.g., ROC regression methods).

C E N T E RGLAUCOMAUCSDUCSD

For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:

a)What subjects should be included in the study population (including glaucoma suspects)?

This is a prognostic study. What are you trying to predict? This is a prognostic study. What are you trying to predict?

If we are trying to predict development of glaucoma, the populatIf we are trying to predict development of glaucoma, the population ion should be of suspects (without clear signs of damage)should be of suspects (without clear signs of damage)

C E N T E RGLAUCOMAUCSDUCSD

For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:

b) What is an appropriate clinical reference standard? Again, depends on what you are trying to predict. If development of glaucoma, then appropriate standards could be development of visual field loss and/or progressive optic nerve damage

C E N T E RGLAUCOMAUCSDUCSD

For a longitudinal diagnostic performance study where the structural measurement is taken at baseline and the clinical reference standard consists of a baseline assessment with follow-up:

c)What minimum follow-up period would provide assurance that the subject has been appropriately classified?

Not necessarily we need to establish a minimum period, as long as we take into account censoring. It is very important to adequately evaluate prognostic ability (c-index, modified R2, reclassification tables, predictiveness measure)