CAT (Critically Appraised Topic) (adapted from Sackett, et al. 2000) 1-page summary of evidence...

CAT (Critically Appraised Topic) (adapted from Sackett et al., 2000)

1-page summary of evidence resulting from critical appraisal of an article, test, etc.
Answers a specific foreground question
– “Compared to no treatment, does parent-administered treatment significantly improve the language skills of toddlers with language delay?”
First part of CAT identical for tx and dx studies (see handout pp. 2-3)

Clinical bottom line: (appears 1st but completed last)
Clinical question:
Search terms:
Appraised by whom, and date:
Synopsis of key (memorable) information, in a concise, maximally useful format (e.g., types of subjects, procedures, measures, results, etc.)

CAT-egories (appraisal points) for a study of therapy (Sackett et al., 2000)

Prospective, controlled?
Random assignment?
Comparing ≥ 2 conditions?
Recognizable subjects?
Evidence of pre-tx group similarity?
Blinding (insofar as possible) of evaluators, relevant others?

Appraisal points (cont.)

Control over nuisance variables?
Valid, reliable measures of tx effects?
Statistically significant difference (p-value)?
Practically significant difference (d-value)?
Precision of treatment effects (narrow CI)?
Outcomes for all enrolled?
Cost-benefit and feasibility analyses?

A sample treatment CAT

CAT: Language of delayed toddlers improves in response to parent-administered focused stimulation
Clinical bottom line: Compared to an untreated control group, motivated mothers of low-vocabulary toddlers significantly decreased their speaking rate and language complexity and increased their vocabulary inputs in response to ~18 hr of instruction in focused stimulation techniques, and their children produced significantly more words and early grammatical forms.
Clinical question: Compared to no treatment, does parent-administered treatment significantly improve the language skills of toddlers with language delay?
Search terms: word learning AND toddlers, PubMed clinical query
Appraised by: Dollaghan

Key appraisal points

Prospective, controlled                 Yes
Randomized                              Yes
Comparing ≥ 2 conditions                Yes
Recognizable Ss                         Yes
Pre-tx similarity                       Yes
Blinding                                Yes Cn; no parent
Control over nuisance variables         Yes
Valid, reliable measures                Yes
Statistically significant differences   Yes
Practically significant differences     Yes
Precision of treatment effects          No
Outcomes for all enrolled               Yes
Cost-benefit, feasibility analyses      Yes

Critical appraisal of evidence on diagnostic indicators

The key variables by which individuals are identified as members of a class, ostensibly to improve prediction and outcome for them
Myriad diagnostic indicators have been proposed in communication sciences and disorders
Diagnostic indicators in your area of interest?

Most diagnostic indicators in CSD are based on “Phase I” studies

Group mean comparison studies
– People with, and people without, the condition of interest are compared with respect to a proposed indicator
Correlational studies
– Association between proposed indicator and accepted indicators
Such studies can’t address the two most crucial features of a diagnostic indicator: accuracy and precision

Accuracy and precision

Accuracy
– The ability of an indicator to identify a condition of interest, i.e., the amount of agreement between the proposed indicator and a reference standard
Precision
– Width of confidence intervals (CI) for estimates of accuracy

Accuracy of a diagnostic indicator

The ability of an indicator to identify a condition of interest, i.e., the amount of agreement between the proposed indicator and a reference standard
Preferred measures of diagnostic accuracy: positive and negative likelihood ratios (Battaglia et al., 2002)

Positive Likelihood Ratio (LR+)

Reflects the degree of confidence that a person who scores in the positive (affected or disordered) range on a dx indicator does have the disorder
Formula: LR+ = sensitivity / (1 - specificity)
The higher the LR+, the more informative the indicator for identifying people who have the disorder

Interpreting LR+ values (Sackett et al., 1991)

LR+ ≥ 20    Very high; virtually certain that a person with this score has the disorder
LR+ = 10    High; disorder very likely in a person with this score
LR+ = 4     Intermediate; the indicator is suggestive of disorder but insufficient to diagnose
LR+ = 1     Equivocal; a person who scores in the disordered range on the measure may or may not have the disorder; the measure provides no new information

Negative Likelihood Ratio (LR-)

Reflects the degree of confidence that a person scoring in the negative (normal) range on the diagnostic indicator truly does not have the disorder
Formula: LR- = (1 - sensitivity) / specificity
The lower the LR-, the more informative the indicator for ruling out the presence of disorder

Interpreting LR- values (Sackett et al., 1991)

LR- ≤ 0.10   Very low; virtually certain that a person scoring in this range does not have the disorder
LR- = 0.20   Low; disorder very unlikely
LR- = 0.40   Intermediate; the indicator is suggestive but insufficient to rule out the disorder
LR- = 1.0    Equivocal; a person scoring in the normal range on this measure may or may not be normal

Calculating sensitivity and specificity (nothing more than LR precursors)

Sensitivity: the percentage of people with the disorder that the new indicator correctly classifies as disordered
Specificity: the percentage of people who don’t have the disorder that the new indicator correctly classifies as not disordered
The “true” status of every individual with regard to the disorder is established according to a gold (or reference) standard

                     Disorder Status (re: Gold Standard)
                     + Disorder (LI)       - Disorder (LN)
New Test Result
+ Disorder (LI)      a  True positive      b  False positive
- Disorder (LN)      c  False negative     d  True negative
                     # with disorder       # without disorder


                     + Disorder (LI)       - Disorder (LN)
New Test Result
+ Disorder (LI)      a  True positive      b  False positive
- Disorder (LN)      c  False negative     d  True negative

Sensitivity = a/(a+c) (the proportion of people with the disorder that the new test identifies as having the disorder)


                     + Disorder            - Disorder
New Test Result
+ Disorder           a                     b
- Disorder           c                     d

Specificity = d/(b+d) (the proportion of people without the disorder that the new test identifies as not having the disorder)

Example

100 children diagnosed with language impairments (LI) and enrolled in language intervention, and 100 same-age children with no history of language impairment (LN), were administered a new test of grammatical morphology.
80 of the children with LI, and 30 of the children with LN, scored in the disordered range on the new measure.

                     + Disorder (LI)       - Disorder (LN)
New Test Result
+ Disorder (LI)      a = 80                b = 30
- Disorder (LN)      c = (20)              d = (70)
                     100 with disorder     100 without disorder

Sens = a/(a+c) = 80/100 = .80
Spec = d/(b+d) = 70/100 = .70
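The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not part of the original handout: the class name `TwoByTwo` and its method names are hypothetical, and the fields a-d follow the cell labels of the 2x2 table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table (hypothetical helper, cell labels as on the slide)."""
    a: int  # true positives: have the disorder, new test positive
    b: int  # false positives: no disorder, new test positive
    c: int  # false negatives: have the disorder, new test negative
    d: int  # true negatives: no disorder, new test negative

    def sensitivity(self) -> float:
        # Proportion of people WITH the disorder that the test flags.
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Proportion of people WITHOUT the disorder that the test clears.
        return self.d / (self.b + self.d)


# Counts from the grammatical morphology example: 100 LI and 100 LN children.
example = TwoByTwo(a=80, b=30, c=20, d=70)
print(example.sensitivity())  # 0.8
print(example.specificity())  # 0.7
```

Keeping the four cell counts, rather than only the two proportions, matters later: confidence intervals need the raw numbers.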

Why not just use sensitivity and specificity as measures of accuracy?

It’s their interrelationship that is most important overall
Sensitivity and specificity vary substantially according to sample characteristics, including N, base rate (prevalence), severity, confusability
Likelihood Ratios are not impervious to sample characteristics, but are much less affected than are sensitivity and specificity

Calculating Likelihood Ratios

Sens = .80
Spec = .70
LR+ = sens/(1-spec) = .80/.30 = 2.67
LR- = (1-sens)/spec = .20/.70 = 0.29
Several programs, some free on the web, are set up to allow entry in 2x2 table format
In addition to accuracy measures, they also provide information on precision
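As a rough sketch of what such programs compute for the accuracy measures, the two likelihood ratios follow directly from sensitivity and specificity. The function name `likelihood_ratios` is hypothetical, and returning infinity when a denominator is zero is an assumption of this sketch, not something the slides specify.

```python
import math


def likelihood_ratios(sens: float, spec: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity (illustrative helper)."""
    # LR+ = sens / (1 - spec); undefined when spec == 1, reported here as infinity.
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    # LR- = (1 - sens) / spec; undefined when spec == 0, reported here as infinity.
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.80, 0.70)
print(round(lr_pos, 2), round(lr_neg, 2))  # 2.67 0.29
```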

Precision of a diagnostic indicator

Width of confidence intervals (CI) for sensitivity, specificity, and likelihood ratios, calculated by adding and subtracting a multiple of standard error (e.g., 1.96 SE for a 95% CI)
Standard error depends on sample size and reliability; larger samples and higher reliability will result in narrower CIs, all else being equal
Sackett et al. (2000) appendix shows how to calculate CIs by hand, and programs (some free) provide CIs given raw numbers in a 2x2 table
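The "estimate plus or minus 1.96 SE" recipe can be sketched for a single proportion such as sensitivity. This is the simple normal-approximation (Wald) interval, chosen here only because it matches the description above. It is an assumption that it is the most useful illustration, and the function name `wald_ci` is hypothetical. Programs that use exact or score-based intervals will report somewhat different limits, especially for small samples.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation CI for a proportion: p +/- z * SE (illustrative only)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error shrinks as n grows
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Sensitivity of .80 estimated from 100 children with the disorder.
low, high = wald_ci(80, 100)
print(f"{low:.2f}-{high:.2f}")  # 0.72-0.88
```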

Sample size and precision: 95% CIs for studies with same LRs but different Ns

Value          N = 200 (95% CI)    N = 20 (95% CI)
Sens = .80     (0.71-0.87)         (0.44-0.98)
Spec = .70     (0.60-0.79)         (0.35-0.93)
LR+ = 2.67     (1.98-3.70)         (1.12-7.66)
LR- = 0.29     (0.19-0.42)         (0.08-0.87)
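The effect of sample size on likelihood-ratio precision can be sketched with a common log-scale approximation for the CI of a ratio of two proportions. The slides do not say which method produced the tabled intervals, so this is an assumed stand-in: its limits will not match the table exactly, but it shows the same widening as N drops from 200 to 20. The function names `lr_ci` and `lr_intervals` are hypothetical.

```python
import math


def lr_ci(lr, x1, n1, x2, n2, z=1.96):
    """Approximate CI for the ratio (x1/n1)/(x2/n2), computed on the log scale."""
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


def lr_intervals(a, b, c, d):
    """Return {name: (estimate, lower, upper)} for LR+ and LR- from 2x2 counts."""
    n_dis, n_non = a + c, b + d          # column totals: with / without disorder
    lr_pos = (a / n_dis) / (b / n_non)   # sens / (1 - spec)
    lr_neg = (c / n_dis) / (d / n_non)   # (1 - sens) / spec
    return {
        "LR+": (lr_pos, *lr_ci(lr_pos, a, n_dis, b, n_non)),
        "LR-": (lr_neg, *lr_ci(lr_neg, c, n_dis, d, n_non)),
    }


large = lr_intervals(80, 30, 20, 70)  # N = 200, same proportions as the example
small = lr_intervals(8, 3, 2, 7)      # N = 20, same proportions

for label, result in (("N = 200", large), ("N = 20", small)):
    for name, (est, lo, hi) in result.items():
        print(f"{label}  {name} = {est:.2f} ({lo:.2f}-{hi:.2f})")
```

The point estimates are identical in both runs; only the interval widths change, which is the point of the table above.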

CAT-ing evidence on a diagnostic indicator (Sackett et al., 2000; Battaglia et al., 2002)

Does the study report a comparison between measures, or measure and gold standard?
– sine qua non for evidence of diagnostic accuracy
Was the gold (or reference) standard valid, reliable, and/or reasonable?
– Gold standard and new indicator also must be independent to avoid incorporation bias that can inflate accuracy measures

Criteria for diagnostic indicators (cont.)

Were patients enrolled prospectively and consecutively (or by random assignment), and
Did the sample include a spectrum of patient types and severities?
– These two criteria are important in avoiding spectrum bias, in which the sample includes only clear-cut or hand-picked cases and thus does not represent the diagnostic task


Were the new measure and the reference standard administered independently, by different examiners, and
Were the examiners blinded to the subject’s performance on the other test and to other relevant subject information?
Were the new measure and the reference standard both administered to all subjects and controls?
– Important to avoid differential verification bias, when controls are assumed to be normal without testing on gold standard


Do likelihood ratios suggest adequate diagnostic accuracy?
– LR+ ≥ 4.0 (≥ 10, cf. Bayes Library, 2002)
– LR- ≤ 0.40 (≤ 0.20, cf. Bayes Library, 2002)
Precision (narrow confidence intervals)?
Feasibility for usual clinical practice?
Value (i.e., better than current measure)?
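The likelihood-ratio cutoffs in this checklist can be sketched as a small screening function. The function name `accuracy_rating`, the wording of the returned labels, and the treatment of the cutoffs as inclusive are all assumptions made for illustration. The checklist also requires precision, feasibility, and value, which this sketch does not address.

```python
def accuracy_rating(lr_pos: float, lr_neg: float) -> str:
    """Classify a pair of likelihood ratios against the two sets of cutoffs.

    Cutoffs are taken from the checklist; treating them as inclusive is an assumption.
    """
    if lr_pos >= 10.0 and lr_neg <= 0.20:
        return "meets stricter criteria"
    if lr_pos >= 4.0 and lr_neg <= 0.40:
        return "meets more lenient criteria"
    return "does not meet criteria"


# The worked example from earlier slides: LR+ = 2.67, LR- = 0.29.
print(accuracy_rating(2.67, 0.29))  # does not meet criteria
```

Both ratios must clear their cutoff: the worked example has an acceptable LR- but an LR+ below 4.0, so it fails overall.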

Evidence on norm-referenced tests as diagnostic indicators for early LI

Many norm-referenced tests have diagnosis of LI as their explicit purpose
A growing number of tests meet typical psychometric criteria, e.g., N = 100 subjects per age level; reliability ≥ .90; means, standard deviations, and standard errors of measurement
But very few provide evidence of diagnostic accuracy or precision, and none meet the recommended critical appraisal criteria

Norm-referenced tests not providing information on accuracy or precision

Test of Language Development (TOLD)
Sequenced Inventory of Communication Development (SICD)
Test of Early Language Development (TELD)
Reynell Scales
MacArthur Communicative Development Inventories (CDI)

A few tests provide information allowing accuracy and precision to be calculated

Test and cutoff                        Age   LI   LN   LR+ (95% CI)       LR- (95% CI)
PLS-4 Total language score ≤ 85        3     24   24   6.7 (2.6-19.4)     0.19 (.08-.42)
                                       4     23   23   18 (3.6-102)       0.23 (.10-.44)
                                       5     28   28   4.4 (2.1-10.2)     0.26 (.12-.50)
                                       3-5   75   75   6.7 (3.7-12.5)     0.23 (.14-.35)
CELF-P Total language score < 85       3-5   80   80   5.3 (2.9-10.2)     0.45 (.34-.58)
CELF-P Total language score < 77       3-5   80   80   12.7 (4.4-37.8)    0.54 (.43-.66)

But note that these studies would fail many of the other critical appraisal criteria, their accuracy notwithstanding.

The situation is no better for other proposed diagnostic indicators

Few compare indicator to a gold standard, so accuracy can’t be determined
Few used blinded examiners, so a high potential for context and other biases
Small samples, wide CIs (rarely provided)
When sensitivity and specificity have been reported, they have sometimes been calculated incorrectly and/or misinterpreted

I choose not to despair

Knowing the limitations of our diagnostic tools is an important prerequisite to designing better diagnostic tools
Several possible ways forward, most involving clinician-researcher partnerships

A way forward to EBP in Speech-language pathology and Audiology

Designing studies to meet the criteria for strong evidence
– e.g., STARD (Bossuyt et al., 2003) statement
Large-scale, cooperative studies of diagnostic indicators
– CARE-COAD model (Straus et al., 2002)
Dealing with the absence of a gold standard
– e.g., Demissie et al., 1998; Dunson, 2001; reliability and outcome studies
Diagnostic studies as multivariable, prediction research (Moons & Grobbee, 2002)

Test yourself

Critical appraisal of diagnostic test (handout p. 5)
Critical appraisal of treatment study (handout p. 4)

Critical appraisal and CAT enable the remaining steps to EBP

5. Decide whether the evidence is strong enough to influence your clinical practice
6. Integrate the evidence with the “intangibles”
7. Update!

EBP is itself a set of assumptions, not a cult

Ultimately, strong evidence will be needed to determine whether EBP results in improved clinical service.
And EBP can’t be applied blindly, to all kinds of problems...

As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Thanks!
