Holger Schunemann TEACH 2010
Holger Schünemann, MD, PhD
Chair, Department of Clinical Epidemiology & Biostatistics
Michael Gent Chair in Healthcare Research
McMaster University, Hamilton, Canada
TEACH, New York, NY
August 11 - 13, 2010
Content
• Evidence hierarchies
• From evidence to recommendations
Case scenario
A 13 year old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too and several chickens had died unexpectedly a few days before the girl fell sick.
Interventions: antivirals, such as neuraminidase inhibitors oseltamivir and zanamivir
PICO
Clinical question:
Population: Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or Zanamivir)
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007
There are no RCTs!
• Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)?
• I suspect the answer is “yes”
• If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words perhaps “no RCTs”
Hierarchy of evidence based on quality
STUDY DESIGN
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations
• Expert Opinion
BIAS
Which hierarchy?
Organization: Evidence, Recommendation
• AHA: B, Class I
• ACCP: A, 1
• SIGN: IV, C
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease
What to do?
Evidence based healthcare decisions
Research evidence
Population values and preferences
(Clinical) state and circumstances
Expertise
Haynes et al. 2002
Explain the following?
• Confounding, effect modification & ext. validity
• Concealment of randomization
• Blinding (who is blinded in a double blinded study?)
• Intention to treat analysis and its correct application
• P-values and confidence intervals
“Everything should be made as simple as possible but not simpler.”
BMJ 2003
Relative risk reduction: …. > 99.9 % (1/100,000)
U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007
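The relative risk reduction quoted for the parachute example can be checked with simple arithmetic. The sketch below uses the U.S. Parachute Association figures from the slide; the risk of death without a parachute (set to 1.0 here) is an assumption made purely for illustration and is not a figure from the talk.

```python
# Figures reported by the U.S. Parachute Association for 2007 (from the slide)
deaths = 18
jumps = 2_200_000

# Assumption for illustration only: a jump without a parachute is always fatal
risk_without_parachute = 1.0

# Risk of death per jump when a parachute is used
risk_with_parachute = deaths / jumps

# Relative risk and relative risk reduction (RRR = 1 - RR)
relative_risk = risk_with_parachute / risk_without_parachute
relative_risk_reduction = 1 - relative_risk

print(f"Risk of death per jump with parachute: {risk_with_parachute:.7f}")
print(f"Relative risk reduction: {relative_risk_reduction:.4%}")
```

Under that assumption the relative risk reduction comes out above 99.9%, which is consistent with the figure on the slide.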
A prominent grading system
Level of evidence: Source of evidence
• I: SR, RCTs
• II: Cohort studies
• III: Case-control studies
• IV: Case series
• V: Expert opinion
Grades of recommendation: A, B, C, D
Simple hierarchies are (too) simplistic
STUDY DESIGN
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations
BIAS
Expert Opinion
Schünemann & Bone, 2003
So?
• Users want to know how good the evidence is
  – (rightly so) not equipped to assess it themselves; someone needs to
• Confusion about which grading system to use
  – What methods to grade?
• Simple systems leading to recommendations are not enough
GRADE Working Group
Grades of Recommendation Assessment, Development and Evaluation
CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005, AJRCCM 2006, Chest 2006, BMJ 2008
• International group: ACCP, AHRQ, Australian NHMRC, BMJ Clinical Evidence, CC, CDC, McMaster, NICE, Oxford CEBM, SIGN, UpToDate, USPSTF, WHO
• Aim: to develop a common, transparent and sensible system for grading the quality of evidence and the strength of recommendations
• International group of guideline developers, methodologists & clinicians from around the world (>100 contributors) – since 2000
GRADE Uptake
• World Health Organization
• Allergic Rhinitis in Asthma Guidelines (ARIA)
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
• Over 40 major organizations
GRADE Quality of Evidence
In the context of a systematic review
• The quality of evidence reflects the extent to which we are confident that an estimate of effect is correct.
In the context of making recommendations
• The quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.
Likelihood of and confidence in an outcome
Definition of grades of evidence
• /A/High: Further research is very unlikely to change confidence in the estimate of effect.
• /B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
• /C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
• /D/Very low: Any estimate of effect is very uncertain.
Determinants of quality
• RCTs start high
• observational studies start low
• 5 factors that can lower quality
  1. Limitations in detailed design and execution (risk of bias criteria)
  2. Inconsistency (or heterogeneity)
  3. Indirectness (PICO and applicability)
  4. Imprecision (number of events and confidence intervals)
  5. Publication bias
• 3 factors can increase quality
  – large magnitude of effect
  – all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
  – dose-response gradient
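The determinants above can be read as a starting level plus adjustments (the deck elsewhere notes that RCTs start high and observational data start low). The following is a minimal sketch of that bookkeeping only; in practice GRADE ratings are judgments, not arithmetic. The function name `rate_quality` and its parameters are hypothetical and chosen for illustration.

```python
# The four GRADE quality levels, ordered from lowest to highest
LEVELS = ["very low", "low", "moderate", "high"]

# Starting level by study design: RCTs start high, observational studies start low
START_INDEX = {"rct": 3, "observational": 1}


def rate_quality(study_design, downgrades=0, upgrades=0):
    """Return a quality level for one outcome.

    study_design: "rct" or "observational"
    downgrades:   total levels removed across the 5 lowering factors
                  (risk of bias, inconsistency, indirectness,
                  imprecision, publication bias)
    upgrades:     total levels added across the 3 raising factors
                  (large effect, dose-response, plausible confounding)
    """
    index = START_INDEX[study_design] - downgrades + upgrades
    # Clamp to the valid range: nothing below "very low" or above "high"
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# An RCT body of evidence with serious risk of bias and serious imprecision
print(rate_quality("rct", downgrades=2))
# Observational evidence with a very large effect (two levels up)
print(rate_quality("observational", upgrades=2))
```

Keeping the levels in an ordered list makes the clamping explicit: evidence cannot fall below "very low" or rise above "high", however many factors apply.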
1. Design and Execution
• Limitations/risk of bias
  – Randomization
  – lack of concealment
  – intention to treat principle violated
  – inadequate blinding
  – loss to follow-up
  – early stopping for benefit
The evidence for the effect of sublingual immunotherapy in children with allergic rhinitis on the development of asthma comes from a single randomised trial with no description of randomisation, concealment of allocation, and type of analysis, no blinding, and 21% of children lost to follow-up. These very serious limitations would limit our confidence in the estimates of effect and likely lead to downgrading the quality of evidence.
1. Design and Execution
• From Cates, CDSR 2008
Overall judgment required
I B II V III
Quality of evidence across studies
• Outcome #1: Quality: High
• Outcome #2: Quality: Moderate
• Outcome #3: Quality: Very low
Oseltamivir for Girl with Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
  – Hospitalization: OR 0.22 (0.02 – 2.16)
  – Pneumonia: OR 0.15 (0.03 - 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~ $45 per treatment course
What are the factors that determine your decisions?
GRADE: Factors influencing decisions and recommendations
• Quality of Evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Resource
Determinants of the strength of recommendation
Factors that can strengthen a recommendation, each with a comment:
• Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.
Oseltamivir – Avian Influenza
Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir
(Factor. Decision. Explanation.)
• Lower quality evidence. Decision: Yes. The quality of evidence is very low.
• Uncertainty about the balance of benefits versus harms and burdens. Decision: Yes. The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.
• Uncertainty or differences in values. Decision: No. All patients and care providers would accept treatment for H5N1 disease.
• Uncertainty about whether the net benefits are worth the costs. Decision: No. For treatment of sporadic patients the price is not high ($45).
Frequent “yes” answers will increase the likelihood of a weak recommendation
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).
Values and Preferences
Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment.
Schunemann et al. The Lancet ID, 2007
Reassessment of clinical practice guidelines
• Editorial by Shaneyfelt and Centor (JAMA 2009)
  – “Too many current guidelines have become marketing and opinion-based pieces…”
  – “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
  – “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
  – “Time has come for CPG development to again be centralized, e.g., AHRQ…”
Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence-based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
• Criteria for evidence assessment across questions and outcomes
• Criteria for moving from evidence to recommendations
• Transparent, systematic
  – four categories of quality of evidence
  – two grades for strength of recommendations
• Transparency in decision making and judgments is key
Are values important?
Should resources be considered?
2. Consistency of results
• Look for explanation for inconsistency
  – patients, intervention, comparator, outcome, methods
• Judgment
  – variation in size of effect
  – overlap in confidence intervals
  – statistical significance of heterogeneity
  – I²
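One of the judgment aids listed above, I², can be computed from study-level estimates. The sketch below uses the standard definitions of Cochran's Q (with inverse-variance weights) and I² = max(0, (Q - df) / Q). The effect estimates and standard errors are made-up illustration values, not data from the talk, and the function name `i_squared` is hypothetical.

```python
def i_squared(effects, standard_errors):
    """Return (Q, I2 in percent) for study effects on a common scale,
    for example log odds ratios."""
    # Inverse-variance weights: more precise studies count more
    weights = [1 / se ** 2 for se in standard_errors]
    # Fixed-effect pooled estimate
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical log odds ratios and standard errors from four trials
log_odds_ratios = [-0.60, -0.10, -0.85, 0.05]
std_errors = [0.20, 0.25, 0.30, 0.22]

q, i2 = i_squared(log_odds_ratios, std_errors)
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

I² is only one input to the judgment: the deck lists it alongside variation in effect size and overlap of confidence intervals, so a single number should not decide downgrading on its own.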
3. Directness of Evidence
• differences in
  – populations/patients (mild versus severe COPD, older, sicker or more co-morbidity)
  – interventions (all inhaled steroids, new vs. old)
  – outcomes (important vs. surrogate; long-term health-related quality of life, short-term functional capacity, laboratory exercise, spirometry)
• indirect comparisons
  – interested in A versus B
  – have A versus C and B versus C
  – formoterol versus salmeterol versus tiotropium
4. Publication Bias
• Publication bias
  – Cave! Only few small studies
A systematic review of topical treatments for seasonal allergic conjunctivitis showed that patients using topical sodium cromoglycate were more likely to perceive benefit than those using placebo. However, only small trials reported clinically and statistically significant benefits of active treatment, while a larger trial showed a much smaller and a statistically not significant effect. These findings suggest that smaller studies demonstrating smaller effects might not have been published.
5. Imprecision
• small sample size
• small number of events
  – wide confidence intervals
  – uncertainty about magnitude of effect
Observational studies examining the impact of exclusive breastfeeding on development of allergic rhinitis in high risk infants showed a relative risk of 0.87 (95% CI: 0.48 to 1.58) that neither rules out important benefit nor important harm (Mimouni Bloch 2002).
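The breastfeeding example can be turned into a small check of what a confidence interval does and does not rule out. This is a sketch only: the thresholds for "important" benefit (relative risk 0.75) and "important" harm (relative risk 1.25) are illustrative assumptions, not values given in the talk, and `ci_rules_out` is a hypothetical helper name.

```python
def ci_rules_out(lower, upper, benefit_threshold=0.75, harm_threshold=1.25):
    """Summarise what a 95% CI for a relative risk rules out.

    A relative risk below 1 is taken to mean benefit. The two thresholds
    are illustrative assumptions for "important" benefit and harm.
    """
    return {
        # Benefit is ruled out only if the whole CI lies above the benefit threshold
        "rules_out_important_benefit": lower > benefit_threshold,
        # Harm is ruled out only if the whole CI lies below the harm threshold
        "rules_out_important_harm": upper < harm_threshold,
        # The CI excludes "no effect" if it does not contain 1
        "excludes_no_effect": lower > 1.0 or upper < 1.0,
    }


# Relative risk 0.87 (95% CI: 0.48 to 1.58) from the breastfeeding example
print(ci_rules_out(0.48, 1.58))
```

With these assumed thresholds, the interval from the slide rules out neither important benefit nor important harm, which is the situation the slide describes as imprecision.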
What can raise quality? 3 Factors
• large magnitude can upgrade one level
  – very large two levels
  – common criteria
    • everyone used to do badly
    • almost everyone does well
  – The parachute example: if we’d look at the observational evidence, we’d find a very large effect and the evidence probably would be high quality for preventing death
• dose response relation (higher dose of brain radiation in childhood leukemia leads to greater risk of late malignancies)
• Residual confounding unlikely to be responsible for observed effect
Guideline development
Process
Prioritise problems & scoping
Establish guideline panel and develop questions, including outcomes
Find and critically appraise systematic review(s) and/or prepare protocol(s) for systematic review(s) and prepare systematic review(s) (searches, selection of studies, data collection and analysis)
Prepare an evidence profile
Assess the quality of evidence for each outcome
Prepare a Summary of Findings table
If developing guidelines: Assess the overall quality of evidence and decide on the direction (which alternative) and strength of the recommendation
Draft guideline
Consult with stakeholders and/or external peer reviewers
Disseminate guidelines
Update review or guidelines when needed
Adapt guidelines, if needed
Prioritise guidelines/recommendations for implementation
Implement or support implementation of the guidelines
Evaluate the impact of the guidelines and implementation strategies
Update systematic review/guidelines
Systematic review
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: critical, important, or not important
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome (outcomes across studies)
  – RCTs start high, observational data start low
  – Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
  – Grade up: 1. Large effect 2. Dose response 3. Confounders
  – Result: High, Moderate, Low, or Very low
Guideline development
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations:
  – For or against (direction)
  – Strong or weak (strength)
• By considering: quality of evidence, balance benefits/harms, values and preferences
• Revise if necessary by considering: resource use (cost)
• Wording:
  – “We recommend using…”
  – “We suggest using…”
  – “We recommend against using…”
  – “We suggest against using…”
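Two steps of this flow lend themselves to a short sketch: taking the overall quality from the lowest-rated critical outcome, and mapping direction and strength onto the four standard phrasings. The helper names `overall_quality` and `recommendation_wording` are hypothetical, and the outcome ratings in the usage example are invented for illustration rather than taken from the talk.

```python
# Quality levels ordered from lowest to highest
ORDER = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Overall quality = lowest quality among outcomes rated critical.

    outcomes: list of (name, importance, quality) tuples, where importance
    is "critical", "important" or "not important".
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=ORDER.index)


def recommendation_wording(direction, strength):
    """Map direction ("for"/"against") and strength ("strong"/"weak")
    to the phrasing used on the slide."""
    verb = "recommend" if strength == "strong" else "suggest"
    against = " against" if direction == "against" else ""
    return f"“We {verb}{against} using…”"


# Invented ratings for illustration only
outcomes = [
    ("mortality", "critical", "moderate"),
    ("hospitalizations", "critical", "high"),
    ("resource use", "important", "very low"),
]
print(overall_quality(outcomes))
print(recommendation_wording("for", "weak"))
```

Note that the "very low" rating for the non-critical outcome does not pull the overall rating down; only critical outcomes count in this step.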
Oxford Centre for Evidence Based Medicine
Levels of Evidence and Grades of Recommendations - 23 November 1999
For each level of evidence, entries are listed by type of question where given: Therapy/Prevention, Aetiology/Harm; Prognosis; Diagnosis; Economic analysis.

Grade of Recommendation A
Level 1a
  – Therapy/Prevention, Aetiology/Harm: SR (with homogeneity) of RCTs
  – Prognosis: SR (with homogeneity*) of inception cohort studies; or a CPG validated on a test set.
  – Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; or a CPG validated on a test set.
  – Economic analysis: SR (with homogeneity*) of Level 1 economic studies
Level 1b
  – Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
  – Prognosis: Individual inception cohort study with > 80% follow-up
  – Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard.
  – Economic analysis: Analysis comparing all (critically-validated) alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
Level 1c
  – Therapy/Prevention, Aetiology/Harm: All or none
  – Prognosis: All or none case-series
  – Diagnosis: Absolute SpPins and SnNouts
  – Economic analysis: Clearly as good or better, but cheaper. Clearly as bad or worse but more expensive. Clearly better or worse at the same cost.

Grade of Recommendation B
Level 2a
  – Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
  – Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs.
  – Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
  – Economic analysis: SR (with homogeneity*) of Level >2 economic studies
Level 2b
  – Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
  – Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; or CPG not validated in a test set.
  – Diagnosis: Any of: Independent blind or objective comparison; Study performed in a set of non-consecutive patients, or confined to a narrow spectrum of study individuals (or both) all of whom have undergone both the diagnostic test and the reference standard; A diagnostic CPG not validated in a test set.
  – Economic analysis: Analysis comparing a limited number of alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.
Level 2c
  – Therapy/Prevention, Aetiology/Harm: “Outcomes” Research
  – Prognosis: “Outcomes” Research
Level 3a
  – Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies
Level 3b
  – Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
  – Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
  – Economic analysis: Analysis without accurate cost measurement, but including a sensitivity analysis incorporating clinically sensible variations in important variables.

Grade of Recommendation C
Level 4
  – Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies)
  – Prognosis: Case-series (and poor quality prognostic cohort studies)
  – Diagnosis: Any of: Reference standard was unobjective, unblinded or not independent; Positive and negative tests were verified using separate reference standards; Study was performed in an inappropriate spectrum** of patients.
  – Economic analysis: Analysis with no sensitivity analysis

Grade of Recommendation D
Level 5
  – Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  – Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  – Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  – Economic analysis: Expert opinion without explicit critical appraisal, or based on economic theory

Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and Sharon Straus).
USPSTF - Grade Definitions After May 2007: Certainty
Level of Certainty: Description
High: The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.
Moderate: The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by such factors as:
• The number, size, or quality of individual studies.
• Inconsistency of findings across individual studies.
• Limited generalizability of findings to routine primary care practice.
• Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.
Low: The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of:
• The limited number or size of studies.
• Important flaws in study design or methods.
• Inconsistency of findings across individual studies.
• Gaps in the chain of evidence.
• Findings not generalizable to routine primary care practice.
• Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.
The USPSTF defines certainty as "likelihood that the USPSTF assessment of the net benefit of a preventive service is correct."
• Recommendations for prognosis
  – Use prognostic information to determine baseline risk for healthcare decisions
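Centers for Disease Control and Prevention (CDC)
Columns: Evidence of Effectiveness; Execution (Good or Fair); Design Suitability (Greatest, Moderate, or Least); Number of Studies; Consistent; Effect Size; Expert Opinion

Strong
• Execution: Good; Design suitability: Greatest; Number of studies: At Least 2; Consistent: Yes; Effect size: Sufficient; Expert opinion: Not Used
• Execution: Good; Design suitability: Greatest or Moderate; Number of studies: At Least 5; Consistent: Yes; Effect size: Sufficient; Expert opinion: Not Used
• Execution: Good or Fair; Design suitability: Greatest; Number of studies: At Least 5; Consistent: Yes; Effect size: Sufficient; Expert opinion: Not Used
• Meet Design, Execution, Number, and Consistency Criteria for Sufficient But Not Strong Evidence; Effect size: Large; Expert opinion: Not Used

Sufficient
• Execution: Good; Design suitability: Greatest; Number of studies: 1; Consistent: Not Applicable; Effect size: Sufficient; Expert opinion: Not Used
• Execution: Good or Fair; Design suitability: Greatest or Moderate; Number of studies: At Least 3; Consistent: Yes; Effect size: Sufficient; Expert opinion: Not Used
• Execution: Good or Fair; Design suitability: Greatest, Moderate, or Least; Number of studies: At Least 5; Consistent: Yes; Effect size: Sufficient; Expert opinion: Not Used

Expert Opinion
• Execution: Varies; Design suitability: Varies; Number of studies: Varies; Consistent: Varies; Effect size: Sufficient; Expert opinion: Supports a Recommendation

Insufficient
• A. Insufficient Designs or Execution
• B. Too Few Studies
• C. Inconsistent
• D. Small
• E. Not Used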
GRADE - 2004
Types of questions
Background Questions
• Definition: What is Avian Influenza?
• Mechanism: What is the mechanism of action of oseltamivir?
Foreground Questions
• Benefit > harm: In patients with avian influenza, does oseltamivir therapy improve survival, …?
Framing a foreground question
Population: Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir
Comparison: No pharmacological intervention
Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance
Schunemann, et al., The Lancet ID, 2007
Evidence Profiles/Summaries
Galactomannan ELISA for the diagnosis of invasive aspergillosis
Question: Should Galactomannan ELISA be used for the diagnosis of invasive aspergillosis?
Authors of the profile: Nancy Santesso, Jan Brozek, Holger Schünemann
Population: immunocompromized patients, mostly hematology patients [1]
Setting: mainly hematology or cancer departments, mainly inpatients
New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference Test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: Proven or probable invasive aspergillosis
Bibliography: Leeflang et al. Galactomannan detection for invasive aspergillosis in immunocompromized patients. Cochrane Database of Systematic Reviews 2008, Issue 4. CD007394

Columns: Number of Participants (studies); Study design; Risk of bias; Indirectness; Inconsistency; Imprecision; Publication bias; Effect (number per 1000 patients tested) at prevalence 1%, 12% and 44% [4]; Quality of the evidence (GRADE); Importance [5]

True positives (Patients correctly classified as having invasive aspergillosis) [6]
• 2777 (18 studies); cohort, case-control
• Risk of bias: no; Indirectness: no; Inconsistency: no; Imprecision: no; Publication bias: unlikely
• Effect per 1000: prevalence 1%: 6 (5 to 8); prevalence 12%: 77 (60 to 92); prevalence 44%: 282 (220 to 339)
• Quality: HIGH; Importance: 7

True negatives (Patients correctly classified as not having invasive aspergillosis) [7]
• 2777 (18 studies); cohort, case-control
• Risk of bias: no; Indirectness: no; Inconsistency: no; Imprecision: no; Publication bias: unlikely
• Effect per 1000: prevalence 1%: 941 (901 to 960); prevalence 12%: 836 (801 to 854); prevalence 44%: 532 (510 to 543)
• Quality: HIGH; Importance: 7

False positives (Patients incorrectly classified as having invasive aspergillosis) [8]
• 2777 (18 studies); cohort, case-control
• Risk of bias: no; Indirectness: serious [9]; Inconsistency: no; Imprecision: no; Publication bias: unlikely
• Effect per 1000: prevalence 1%: 50 (89 to 30); prevalence 12%: 44 (79 to 26); prevalence 44%: 28 (50 to 17)
• Quality: MODERATE; Importance: 8

False negatives (Patients incorrectly classified as not having the disease) [10]
• 2777 (18 studies); cohort, case-control
• Risk of bias: no; Indirectness: serious [11]; Inconsistency: no; Imprecision: no; Publication bias: unlikely
• Effect per 1000: prevalence 1%: 4 (5 to 2); prevalence 12%: 43 (60 to 28); prevalence 44%: 158 (220 to 101)
• Quality: MODERATE; Importance: 9

Inconclusive results [12]
• 901 (7 studies); cohort, case-control
• All other columns: –; Importance: 4

Difference in complications [13]
• All columns: –; Importance: 3

Difference in resource use [13]
• All columns: –; Importance: 6

Based on Sensitivity of 0.64 (0.50-0.77) and Specificity of 0.95 (0.91-0.97)

Footnotes:
[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] GRADE recommends classifying outcomes on a 9 point scale: 1-3 is not important; 4-6 is important; and, 7-9 is critical to a decision.
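The per-1000 point estimates in this profile follow from the pooled sensitivity and specificity and the assumed prevalence. The sketch below reproduces that arithmetic; the function name `per_1000` is hypothetical, and rounding halves upward is an assumption chosen because it matches the figures shown in the profile.

```python
from decimal import Decimal, ROUND_HALF_UP

# Pooled accuracy estimates stated in the evidence profile
SENSITIVITY = Decimal("0.64")
SPECIFICITY = Decimal("0.95")


def per_1000(prevalence, tested=1000):
    """Expected test results per `tested` patients at a given prevalence."""
    prevalence = Decimal(str(prevalence))
    diseased = tested * prevalence   # patients who truly have the disease
    healthy = tested - diseased      # patients who do not
    counts = {
        "true_positives": diseased * SENSITIVITY,
        "false_negatives": diseased * (1 - SENSITIVITY),
        "true_negatives": healthy * SPECIFICITY,
        "false_positives": healthy * (1 - SPECIFICITY),
    }
    # Round to whole patients; half-up rounding is an assumption
    return {name: int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
            for name, value in counts.items()}


# The three prevalence scenarios used in the profile
for prevalence in ("0.01", "0.12", "0.44"):
    print(prevalence, per_1000(prevalence))
```

Decimal arithmetic is used instead of floats so that borderline values such as 940.5 true negatives at 1% prevalence round the same way on every run.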
Summary of Findings table Galactomannan ELISA for the diagnosis of invasive aspergillosis Patient or population: immunocompromized patients, mostly hematology patients1 Settings: mainly hematology or cancer departments, mainly inpatients New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum2 Cut-off value: 1.5 ODI Reference Test: composite of EORTC/MSG clinical and histological criteria3 Threshold: Proven or probable invasive aspergillosis
Outcomes
Illustrative comparative numbers (95% CI) (footnote 4): assumed numbers with galactomannan ELISA compared to reference test (EORTC/MSG criteria)

True positives (Patients correctly classified as having invasive aspergillosis)
• Prevalence 1%: 6 (5 to 8)
• Prevalence 12%: 77 (60 to 92)
• Prevalence 44%: 282 (220 to 339)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: High
• Comments: True positives are important because patients will be treated with improved survival and spared the invasive diagnostic procedures; however, they will suffer the toxicity of the treatment.

True negatives (Patients correctly classified as not having invasive aspergillosis)
• Prevalence 1%: 941 (901 to 960)
• Prevalence 12%: 836 (801 to 854)
• Prevalence 44%: 532 (510 to 543)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: High
• Comments: True negatives are important, because patients will not be treated unnecessarily, will be spared the toxic effects of treatment, and will be reassured.

False positives (Patients incorrectly classified as having invasive aspergillosis)
• Prevalence 1%: 50 (89 to 30)
• Prevalence 12%: 44 (79 to 26)
• Prevalence 44%: 28 (50 to 17)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: Moderate (footnote 5)
• Comments: False positives are important, because patients will be treated unnecessarily and many will be exposed to nephrotoxic drugs, and also because of a possibly delayed diagnosis and treatment of the true cause of symptoms.

False negatives (Patients incorrectly classified as not having the disease)
• Prevalence 1%: 4 (5 to 2)
• Prevalence 12%: 43 (60 to 28)
• Prevalence 44%: 158 (220 to 101)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: Moderate (footnote 6)
• Comments: False negatives are important because of the missed diagnosis: patients either will not be treated, with a 60–90% mortality rate, or treatment will be delayed, with uncertain consequences for mortality.

Inconclusive results
• Illustrative numbers, participants, quality: See comment
• Comments: Inconclusive results were not reported. Most likely they would result in repeating the index test and therefore increased cost.

Complications
• Illustrative numbers, participants, quality: See comment
• Comments: Complications were not reported.

Resource use
• Illustrative numbers, participants, quality: See comment
• Comments: Resource use was not reported.
CI: Confidence interval; EORTC: European Organization for Research and Treatment of Cancer; MSG: Mycoses Study Group
GRADE Working Group grades of evidence
• High quality: Further research is very unlikely to change our confidence in the estimate of effect.
• Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
• Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
• Very low quality: We are very uncertain about the estimate.
Footnotes:
1 One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
2 A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
3 A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
4 Estimates of prevalence of IA were based on the median and range of values in included studies.
5 There is uncertainty to what extent a delayed diagnosis and treatment of true cause of symptoms would impact on patient-important outcomes.
6 There is some uncertainty to what extent a delayed diagnosis and treatment would impact mortality and other outcomes.
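The illustrative numbers in the Summary of Findings table are consistent with applying the pooled sensitivity of 0.64 and specificity of 0.95 to 1000 tested patients at each assumed prevalence. A minimal Python sketch of that arithmetic follows; the per-1000 denominator and the function name are assumptions made for illustration and are not stated on the slides.

```python
def counts_per_1000(prevalence, sensitivity, specificity, n=1000):
    """Expected test outcomes among n tested patients (n=1000 is an assumption)."""
    diseased = n * prevalence
    healthy = n - diseased
    return {
        "TP": diseased * sensitivity,        # diseased, test positive
        "FN": diseased * (1 - sensitivity),  # diseased, test negative
        "TN": healthy * specificity,         # not diseased, test negative
        "FP": healthy * (1 - specificity),   # not diseased, test positive
    }

# Pooled estimates quoted in the evidence profile
SENSITIVITY, SPECIFICITY = 0.64, 0.95

for prevalence in (0.01, 0.12, 0.44):
    counts = counts_per_1000(prevalence, SENSITIVITY, SPECIFICITY)
    summary = ", ".join(f"{name} {value:.1f}" for name, value in counts.items())
    print(f"Prevalence {prevalence:.0%}: {summary}")
```

At 12% prevalence, for instance, this gives about 77 true positives and 43 false negatives, in line with the table. Substituting the confidence limits of sensitivity and specificity gives the corresponding intervals.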
[Figure: the GRADE process, spanning systematic review and guideline development]
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: critical, important, or not important
• Outcomes across studies: create evidence profile with GRADEpro (summary of findings & estimate of effect for each outcome)
• Rate quality of evidence for each outcome (high, moderate, low, very low): RCTs start high, observational data start low
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations: for or against (direction), strong or weak (strength)
– By considering: quality of evidence, balance of benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…”, “We suggest using…”, “We recommend against using…”, “We suggest against using…”
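The step "rate overall quality of evidence across outcomes based on lowest quality of critical outcomes" can be sketched in a few lines of Python. The outcome names and ratings below are hypothetical; the cut-off of 7-9 for "critical" follows the 9 point importance scale mentioned in the evidence profile footnotes.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # ordered worst to best

def overall_quality(outcomes):
    """Return the lowest quality rating among critical outcomes (importance 7-9)."""
    critical = [o for o in outcomes if o["importance"] >= 7]
    if not critical:
        raise ValueError("no critical outcomes to rate")
    return min((o["quality"] for o in critical), key=LEVELS.index)

# Hypothetical outcomes, for illustration only
outcomes = [
    {"name": "mortality", "importance": 9, "quality": "moderate"},
    {"name": "hospitalization", "importance": 7, "quality": "low"},
    {"name": "resource use", "importance": 5, "quality": "very low"},
]
print(overall_quality(outcomes))
```

Here the overall rating is "low": the "very low" rating for resource use does not count, because that outcome is important but not critical.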
Questions
GRADE: recommendation – quality of evidence
Clear separation:
1) Recommendation: 2 grades – weak/conditional/optional or strong (for or against an intervention)
– Balance of benefits and downsides, values and preferences, resource use and quality of evidence
2) 4 categories of quality of evidence: High, Moderate, Low, Very low
– methodological quality of evidence
– likelihood of bias
– by outcome and across outcomes
*www.GradeWorking-Group.org
GRADE Quality of Evidence
In the context of making recommendations:
• The quality of evidence reflects the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.
Likelihood of and confidence in an outcome
Definition of grades of evidence
• A/High: Further research is very unlikely to change confidence in the estimate of effect.
• B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
• C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
• D/Very low: Any estimate of effect is very uncertain.
Determinants of quality
• RCTs start high
• observational studies start low
• 5 factors that can lower quality
1. Limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
– dose-response gradient
1. Design and Execution/Risk of Bias
Examples:
• Inappropriate selection of exposed and unexposed groups
• Failure to adequately measure/control for confounding
• Selective outcome reporting
• Failure to blind (e.g. outcome assessors)
• High loss to follow-up
• Lack of concealment in RCTs
• Intention to treat principle violated
Design and Execution/RoB
From Cates, CDSR 2008
Design and Execution/RoB
Overall judgment required
• Yes
• No
• Don’t know or undecided
Who believes the risk of bias is of concern?
No, there are no serious limitations
Yes, there are serious limitations
Yes, there are very serious limitations
Would you downgrade for risk of bias?
No, there are no serious limitations – although there ….
Yes, there are serious limitations – most people would agree that selective reporting is….
Yes, there are very serious limitations – there is a risk of bias but only for the one criterion of selective reporting
Would you downgrade for risk of bias?
2. Inconsistency of results (Heterogeneity)
• if inconsistency, look for explanation
– patients, intervention, comparator, outcome
• if unexplained inconsistency, lower quality
Reminders for immunization uptake
3. Directness of Evidence
• differences in
– populations/patients (children – neonates, women in general – pregnant women)
– interventions (all vaccines, new – old)
– comparator appropriate (new policy – old or no policy)
– outcomes (important – surrogate: cases prevented – seroconversion)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– Vaccine A versus Placebo versus Vaccine B
4. Publication Bias
• Should always be suspected
– Only small “positive” trials
– For profit interest
– Various methods to evaluate – none perfect, but clearly a problem
Egger M, Smith DS. BMJ 1995;310:752-54
[Figure: Publication bias – I.V. Mg in acute myocardial infarction]
• Meta-analysis: Yusuf S. Circulation 1993
• ISIS-4: Lancet 1995
[Figure: Funnel plot; x-axis: odds ratio, y-axis: standard error]
Symmetrical: No publication bias
[Figure: Funnel plot; x-axis: odds ratio, y-axis: standard error]
Asymmetrical: Publication bias?
File drawer problem
• No interest in publishing or being published
5. Imprecision
• Small sample size
– small number of events
• Wide confidence intervals
– uncertainty about magnitude of effect
• Extent to which confidence in estimate of effect is adequate to support decision
Example: Immunization in children
What can raise quality?
1. large magnitude can upgrade (RRR 50%/RR 2)
– very large two levels (RRR 80%/RR 5)
– criteria
• everyone used to do badly
• almost everyone does well
– parachutes to prevent death when jumping from airplanes
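The thresholds on this slide (RRR 50%/RR 2 for one level, RRR 80%/RR 5 for two levels) can be written as a small helper. This is a sketch only: the function name is hypothetical, and treating RR 0.5 and RR 0.2 as the protective-side counterparts of RR 2 and RR 5 follows from RRR = 1 − RR. It looks at magnitude alone and leaves aside everything else a panel would weigh before upgrading.

```python
def upgrade_for_large_effect(rr):
    """Levels to upgrade for magnitude of effect, given a relative risk rr > 0."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    # Fold protective effects (rr < 1) onto the same scale as harmful ones
    magnitude = rr if rr >= 1 else 1 / rr
    if magnitude >= 5:   # RR of 5 or more, i.e. RRR of 80% or more
        return 2
    if magnitude >= 2:   # RR of 2 or more, i.e. RRR of 50% or more
        return 1
    return 0

for rr in (1.5, 3.0, 0.1):
    print(rr, upgrade_for_large_effect(rr))
```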
Reminders for immunization uptake
What can raise quality?
2. dose response relation
– (higher INR – increased bleeding)
– childhood lymphoblastic leukemia
• risk for CNS malignancies 15 years after cranial irradiation
• no radiation: 1% (95% CI 0% to 2.1%)
• 12 Gy: 1.6% (95% CI 0% to 3.4%)
• 18 Gy: 3.3% (95% CI 0.9% to 5.6%)
3. all plausible confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
All plausible residual confounding would result in an overestimate of effect
• Hypoglycaemic drug phenformin causes lactic acidosis
• The related agent metformin is under suspicion for the same toxicity.
• Large observational studies have failed to demonstrate an association
– Clinicians would be more alert to lactic acidosis in the presence of the agent
• Vaccine – adverse effects
Quality assessment criteria
Study design and initial quality of a body of evidence:
• Randomised trials: High
• Observational studies: Low
Lower if:
• Risk of bias: -1 Serious, -2 Very serious
• Inconsistency: -1 Serious, -2 Very serious
• Indirectness: -1 Serious, -2 Very serious
• Imprecision: -1 Serious, -2 Very serious
• Publication bias: -1 Likely, -2 Very likely
Higher if:
• Large effect: +1 Large, +2 Very large
• Dose response: +1 Evidence of a gradient
• All plausible residual confounding: +1 Would reduce a demonstrated effect, +1 Would suggest a spurious effect if no effect was observed
Quality of a body of evidence:
• A/High (four plus)
• B/Moderate (three plus)
• C/Low (two plus)
• D/Very low (one plus)
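The quality assessment criteria amount to simple bookkeeping: start at High for randomised trials or Low for observational studies, subtract levels for each concern, add levels for each strengthening factor, and stay within the four grades. A minimal sketch with hypothetical function and parameter names; each judgment behind the numbers is qualitative, and the arithmetic only records it.

```python
GRADES = ["D/Very low", "C/Low", "B/Moderate", "A/High"]

def rate_quality(design, downgrades=(), upgrades=()):
    """Combine GRADE judgments into a quality grade.

    design: "rct" (starts High) or "observational" (starts Low).
    downgrades: levels lost per factor (1 serious, 2 very serious).
    upgrades: levels gained per factor (1 or 2).
    """
    start = {"rct": 3, "observational": 1}[design]
    score = start - sum(downgrades) + sum(upgrades)
    # Clamp to the four available grades
    return GRADES[max(0, min(score, len(GRADES) - 1))]

# Randomised trials with serious risk of bias and serious imprecision
print(rate_quality("rct", downgrades=[1, 1]))        # C/Low
# Observational studies with a very large effect
print(rate_quality("observational", upgrades=[2]))   # A/High
```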
Design and Execution/RoB
Cancer, anticoagulation and mortality
Akl E, Barba M, Rohilla S, Terrenato I, Sperati F, Schünemann HJ. “Anticoagulation for the long term treatment of venous thromboembolism in patients with cancer”. Cochrane Database Syst Rev. 2008 Apr 16;(2):CD006650.
Content
• Background
• Quality of evidence
• Moving from evidence to recommendations
Strength of recommendation
“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”
• Strong or weak/conditional
Determinants of the strength of recommendation
Factors that can strengthen a recommendation, with comments:
• Quality of the evidence: The higher the quality of evidence, the more likely a strong recommendation is warranted.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.
Developing recommendations
Case scenario
A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too, and several chickens had died unexpectedly a few days before the girl fell sick.
Methods – WHO Rapid Advice Guidelines for management of Avian Flu
• Applied findings of a recent systematic evaluation of guideline development for WHO/ACHR
• Group composition (including panel of 13 voting members):
– clinicians who treated influenza A(H5N1) patients
– infectious disease experts
– basic scientists
– public health officers
– methodologists
• Independent scientific reviewers: Identified systematic reviews, recent RCTs, case series, animal studies related to H5N1 infection
Oseltamivir for Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: 40$ per treatment course
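One way to read the two odds ratios above is to ask whether each 95% confidence interval includes 1 (no effect), which bears on the imprecision judgment. A small sketch using the figures from the slide; the helper name is hypothetical, and crossing 1 is only one of several signals of imprecision.

```python
def ci_includes_no_effect(lower, upper, null_value=1.0):
    """True if the confidence interval for a ratio measure contains the null value."""
    return lower <= null_value <= upper

# Odds ratios with 95% CIs quoted for oseltamivir in seasonal influenza
estimates = {
    "Hospitalization": (0.22, 0.02, 2.16),
    "Pneumonia": (0.15, 0.03, 0.69),
}
for outcome, (odds_ratio, lower, upper) in estimates.items():
    verdict = "includes 1" if ci_includes_no_effect(lower, upper) else "excludes 1"
    print(f"{outcome}: OR {odds_ratio} ({lower} to {upper}) {verdict}")
```

The interval for hospitalization includes 1, so that estimate is compatible with both benefit and harm; the interval for pneumonia excludes 1. Both come from seasonal influenza, so indirectness remains a separate concern for H5N1.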
From evidence to recommendation
Factors that can strengthen a recommendation, with comments:
• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic conditions
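The remark that a small reduction in relative risk still leads to a large absolute effect follows from: absolute risk reduction = baseline risk × relative risk reduction. The sketch below uses made-up baseline risks and a made-up 10% relative risk reduction purely to show the arithmetic; none of these numbers come from the guideline.

```python
def events_prevented_per_1000(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction expressed per 1000 treated patients."""
    return 1000 * baseline_risk * relative_risk_reduction

# Hypothetical inputs: the same 10% relative reduction at two baseline risks
for baseline in (0.01, 0.60):
    prevented = events_prevented_per_1000(baseline, 0.10)
    print(f"baseline risk {baseline:.0%}: about {prevented:.0f} fewer events per 1000")
```

The same relative reduction prevents far more events when the baseline risk is high, which is the situation in an illness with a high case fatality.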
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).
Values and Preferences
Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment.
Schunemann et al. The Lancet ID, 2007
Implications of a strong recommendation
• Patients: Most people in this situation would want the recommended course of action and only a small proportion would not
• Clinicians: Most patients should receive the recommended course of action
• Policy makers: The recommendation can be adopted as a policy in most situations
Implications of a conditional/weak recommendation
• Patients: The majority of people in this situation would want the recommended course of action, but many would not
• Clinicians: Be more prepared to help patients to make a decision that is consistent with their own values/decision aids and shared decision making
• Policy makers: There is a need for substantial debate and involvement of stakeholders
[Figure: the GRADE process, spanning systematic review and guideline development]
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: critical, important, or not important
• Outcomes across studies: create evidence profile with GRADEpro (summary of findings & estimate of effect for each outcome)
• Rate quality of evidence for each outcome (high, moderate, low, very low): randomization increases initial quality
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations: for or against (direction), strong or weak (strength)
– By considering: quality of evidence, balance of benefits/harms, values and preferences
– Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…”, “We suggest using…”, “We recommend against using…”, “We suggest against using…”
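The four standard phrasings pair a direction (for or against) with a strength (strong or weak). As a sketch, the pairing can be written as a lookup table; matching "recommend" to strong and "suggest" to weak follows the usual GRADE convention, and the function name is hypothetical.

```python
WORDING = {
    ("for", "strong"): "We recommend using…",
    ("for", "weak"): "We suggest using…",
    ("against", "strong"): "We recommend against using…",
    ("against", "weak"): "We suggest against using…",
}

def recommendation_wording(direction, strength):
    """Map direction ("for"/"against") and strength ("strong"/"weak") to a phrase."""
    try:
        return WORDING[(direction, strength)]
    except KeyError:
        raise ValueError(f"unknown combination: {direction!r}, {strength!r}") from None

print(recommendation_wording("for", "strong"))
```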
Issues in guideline development for immunization
• Causation versus effects of intervention
– Causation not equivalent to efficacy of interventions
– Bradford Hill
• Nearly half a century old – tablet from the mountain?
• Harms caused by medications
– Assumption is that removal of drug (or no exposure) leads to NO adverse effects
• How confident can one be that removal of the exposure is effective in preventing disease?
– Whether drugs or environmental factors, it will depend on the intervention to remove exposure
GRADE and immunizations
• Can herd immunity following immunisation and indirect effects on the co-circulation of other pathogens typically be ascertained only through the use of observational epidemiological methods?
– Innovative randomized controlled trials (RCTs) using cluster-randomization increasingly being conducted to provide this evidence
• A 94% protective effect of a live, monovalent vaccine against measles is classified as “moderate level of scientific evidence.”
– GRADE’s strength of association criteria may be applied to increase the grade by 2 levels – from “low” to “high” - possible in this situation
• GRADE ratings do not give credit to “gradient of effects with scale of population level impact compatible with degree of coverage.”
– GRADE’s dose-response criterion would apply to such gradients
• May anti-vaccination lobby groups abuse the ratings?
– Abuse of any system is possible: equally likely that increased transparency provided by the GRADE framework can strengthen, rather than undermine, the trust in vaccines and other interventions
Bradford Hill and GRADE
Bradford Hill criteria and their consideration in GRADE:
• Strength: Strength of association and imprecision in effect estimate
• Consistency: Consistency across studies, i.e. across different situations (different researchers)
• Temporality: Study design, specific study limitations; RCTs fulfil this criterion better than observational studies
• Biological gradient: Dose response gradient
• Specificity: Indirectness
• Biological plausibility: Indirectness, publication bias
• Coherence: Indirectness
• Experiment: Study design, randomization
• Analogy: Existing association for critical outcomes will lead to not downgrading the quality
Bradford Hill Criteria
• Would agree with modifying if these criteria were not already considered
• Emphasis is on strength of recommendation, quality of evidence is only one factor
Conclusions
• Clinical practice guidelines should be based on the best available evidence to be evidence based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
• Criteria for evidence assessment across questions and outcomes
• Criteria for moving from evidence to recommendations
• Transparent, systematic
– four categories of quality of evidence
– two grades for strength of recommendations
• Transparency in decision making and judgments is key