Holger Schunemann TEACH 2010

Post on 30-Mar-2016

Holger Schünemann, MD, PhD Chair, Department of Clinical Epidemiology & Biostatistics Michael Gent Chair in Healthcare Research McMaster University, Hamilton, Canada

TEACH, New York, NY, August 11-13, 2010

Content

• Evidence hierarchies
• From evidence to recommendations

Case scenario

A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too, and several chickens had died unexpectedly a few days before the girl fell sick.

Interventions: antivirals, such as neuraminidase inhibitors oseltamivir and zanamivir

PICO

Clinical question:

Population: Avian Flu/influenza A (H5N1) patients

Intervention: Oseltamivir (or Zanamivir)

Comparison: No pharmacological intervention

Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance

WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007

There are no RCTs!

• Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)?
• I suspect the answer is “yes”
• If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words perhaps “no RCTs”

Hierarchy of evidence based on quality

STUDY DESIGN
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations
• Expert Opinion

BIAS

Which hierarchy?

Organization    Evidence    Recommendation
AHA             B           Class I
ACCP            A           1
SIGN            IV          C

Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease

What to do?

Evidence based healthcare decisions

Research evidence

Population values and preferences

(Clinical) state and circumstances

Expertise

Haynes et al. 2002

Explain the following?

• Confounding, effect modification & ext. validity
• Concealment of randomization
• Blinding (who is blinded in a double blinded study?)
• Intention to treat analysis and its correct application
• P-values and confidence intervals

“Everything should be made as simple as possible but not simpler.”

BMJ 2003

Relative risk reduction: … > 99.9% (1/100,000)

U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007
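The arithmetic behind the parachute figures can be sketched as follows. This is a minimal illustration only: the counts are the ones quoted on the slide, while the risk of death without a parachute is an assumption made here (taken as close to 100%) and does not come from the talk.

```python
# Counts quoted on the slide (U.S. Parachute Association, 2007).
JUMPS = 2_200_000
DEATHS = 18
INJURIES = 821

# Assumption for illustration only: jumping without a parachute is
# treated as (almost) always fatal. This value is not from the slides.
ASSUMED_RISK_WITHOUT_PARACHUTE = 1.0


def risk(events: int, total: int) -> float:
    """Proportion of jumps that ended in the event."""
    return events / total


def relative_risk_reduction(risk_treated: float, risk_control: float) -> float:
    """RRR = 1 - (risk with intervention / risk without intervention)."""
    return 1.0 - risk_treated / risk_control


death_risk = risk(DEATHS, JUMPS)
injury_risk = risk(INJURIES, JUMPS)
rrr = relative_risk_reduction(death_risk, ASSUMED_RISK_WITHOUT_PARACHUTE)

print(f"Deaths: about 1 in {1 / death_risk:,.0f} jumps")
print(f"Injuries: about 1 in {1 / injury_risk:,.0f} jumps")
print(f"Relative risk reduction for death: {rrr:.4%}")
```

Under that assumption the relative risk reduction exceeds 99.9%, which is the order of magnitude shown on the slide, even though no randomized trial exists.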

A prominent grading system

Level of evidence    Source of evidence
I                    SR, RCTs
II                   Cohort studies
III                  Case-control studies
IV                   Case series
V                    Expert opinion

Grades of recommendation: A, B, C, D

Simple hierarchies are (too) simplistic

STUDY DESIGN
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations

BIAS

Expert Opinion

Schünemann & Bone, 2003

So?

• Users want to know how good the evidence is
– (rightly so) not equipped to do – someone needs to
• Confusion about what grading system
– What methods to grade?
• Simple systems leading to recommendations not enough

GRADE Working Group

Grades of Recommendation Assessment, Development and Evaluation

CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005, AJRCCM 2006, Chest 2006, BMJ 2008

• International group: ACCP, AHRQ, Australian NHMRC, BMJ Clinical Evidence, CC, CDC, McMaster, NICE, Oxford CEBM, SIGN, UpToDate, USPSTF, WHO

• Aim: to develop a common, transparent and sensible system for grading the quality of evidence and the strength of recommendations

• International group of guideline developers, methodologists & clinicians from around the world (>100 contributors) – since 2000

GRADE Uptake

• World Health Organization
• Allergic Rhinitis and its Impact on Asthma (ARIA) Guidelines
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
• Over 40 major organizations

GRADE Quality of Evidence

In the context of a systematic review
• The quality of evidence reflects the extent to which we are confident that an estimate of effect is correct.

In the context of making recommendations
• The quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.

Likelihood of and confidence in an outcome

Definition of grades of evidence

• /A/High: Further research is very unlikely to change confidence in the estimate of effect.
• /B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
• /C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
• /D/Very low: Any estimate of effect is very uncertain.

Determinants of quality

• RCTs
• observational studies
• 5 factors that can lower quality
1. limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
– dose-response gradient
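The rating logic listed above can be sketched in a few lines. This is an illustrative simplification, not GRADE software: the starting levels ("RCT start high, obs. data start low") are stated elsewhere in this deck, but the function name, the factor keys, and the convention that a "serious" concern costs one level and a "very serious" concern two are assumptions made here. In practice each step is a judgment, not a calculation.

```python
# Quality levels from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Starting level by study design, as stated on the slides.
START = {"rct": "high", "observational": "low"}

# Hypothetical keys for the five downgrading and three upgrading factors.
DOWNGRADE_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "residual_confounding"}


def rate_quality(design, downgrades=None, upgrades=None):
    """Return a quality level for one outcome.

    downgrades / upgrades map a factor name to the number of levels
    to move (assumed here: 1 = serious or large, 2 = very serious or very large).
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    unknown = (set(downgrades) - DOWNGRADE_FACTORS) | (set(upgrades) - UPGRADE_FACTORS)
    if design not in START or unknown:
        raise ValueError(f"unknown design or factor: {design!r} {sorted(unknown)}")

    position = LEVELS.index(START[design])
    position += sum(upgrades.values()) - sum(downgrades.values())
    # The scale has only four levels, so clamp the result to its ends.
    position = max(0, min(position, len(LEVELS) - 1))
    return LEVELS[position]


# A single RCT judged to have very serious limitations in design and execution.
print(rate_quality("rct", downgrades={"risk_of_bias": 2}))
# Observational evidence with a very large effect.
print(rate_quality("observational", upgrades={"large_effect": 2}))
```

The clamp reflects that evidence cannot be rated below "very low" or above "high", however many factors apply.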

1. Design and Execution

• Limitations/risk of bias
– Randomization
– lack of concealment
– intention to treat principle violated
– inadequate blinding
– loss to follow-up
– early stopping for benefit

The evidence for the effect of sublingual immunotherapy in children with allergic rhinitis on the development of asthma comes from a single randomised trial with no description of randomisation, concealment of allocation, and type of analysis, no blinding, and 21% of children lost to follow-up. These very serious limitations would limit our confidence in the estimates of effect and likely lead to downgrading the quality of evidence.

1. Design and Execution
• From Cates, CDSR 2008

1. Design and Execution

Overall judgment required


Quality of evidence across studies

Outcome #1    Quality: High
Outcome #2    Quality: Moderate
Outcome #3    Quality: Very low

Oseltamivir for Girl with Avian Flu

Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~ $45 per treatment course
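One way to read the two odds ratios above is to check whether each confidence interval includes the value of no effect (OR = 1). The sketch below uses only the numbers quoted on the slide; the function name is made up for this illustration.

```python
def ci_excludes_no_effect(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True if the confidence interval does not contain the no-effect value."""
    return not (lower <= null_value <= upper)


# Odds ratios and confidence limits as quoted on the slide (seasonal influenza).
estimates = {
    "Hospitalization": (0.22, 0.02, 2.16),
    "Pneumonia": (0.15, 0.03, 0.69),
}

for outcome, (odds_ratio, lower, upper) in estimates.items():
    verdict = "excludes" if ci_excludes_no_effect(lower, upper) else "includes"
    print(f"{outcome}: OR {odds_ratio} ({lower} to {upper}) {verdict} OR = 1")
```

The interval for hospitalization spans 1, so that estimate is compatible with both benefit and harm; the interval for pneumonia does not. Both come from seasonal influenza rather than H5N1 patients, so they remain indirect evidence for the case scenario.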

What are the factors that determine your decisions?

GRADE: Factors influencing decisions and recommendations

• Quality of Evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Resource use

Determinants of the strength of recommendation

Factors that can strengthen a recommendation, with comment

Quality of the evidence
Comment: The higher the quality of evidence, the more likely a strong recommendation is warranted.

Balance between desirable and undesirable effects
Comment: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.

Values and preferences
Comment: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.

Costs (resource allocation)
Comment: The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.

Oseltamivir – Avian Influenza

Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir

Lower quality evidence
Decision: ☑ Yes □ No
Explanation: The quality of evidence is very low

Uncertainty about the balance of benefits versus harms and burdens
Decision: ☑ Yes □ No
Explanation: The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.

Uncertainty or differences in values
Decision: □ Yes ☑ No
Explanation: All patients and care providers would accept treatment for H5N1 disease

Uncertainty about whether the net benefits are worth the costs
Decision: □ Yes ☑ No
Explanation: For treatment of sporadic patients the price is not high ($45).

Frequent “yes” answers will increase the likelihood of a weak recommendation

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).

Schunemann et al. The Lancet ID, 2007

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).

Values and Preferences

Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment.

Schunemann et al. The Lancet ID, 2007

Reassessment of clinical practice guidelines

• Editorial by Shaneyfelt and Centor (JAMA 2009)
– “Too many current guidelines have become marketing and opinion-based pieces…”
– “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
– “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
– “Time has come for CPG development to again be centralized, e.g., AHRQ…”

Conclusions

• Clinical practice guidelines should be based on the best available evidence to be evidence-based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
• Criteria for evidence assessment across questions and outcomes
• Criteria for moving from evidence to recommendations
• Transparent, systematic
– four categories of quality of evidence
– two grades for strength of recommendations
• Transparency in decision making and judgments is key

Are values important? Should resources be considered?

2. Consistency of results

• Look for explanation for inconsistency
– patients, intervention, comparator, outcome, methods
• Judgment
– variation in size of effect
– overlap in confidence intervals
– statistical significance of heterogeneity
– I²
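For readers who have not met the I² statistic named above, here is a minimal sketch of how it is commonly computed from Cochran's Q, using I² = max(0, (Q − df) / Q) × 100%. The study effects and variances in the example are hypothetical values chosen for illustration, not data from the talk.

```python
def cochran_q(effects, variances):
    """Return (Q, pooled effect) using inverse-variance (fixed-effect) weights.

    effects:   per-study effect estimates on an additive scale (e.g. log odds ratios)
    variances: per-study variances of those estimates
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, pooled


def i_squared(q: float, n_studies: int) -> float:
    """Percentage of variability attributed to heterogeneity rather than chance."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios and variances for four studies.
log_ors = [-0.9, -0.1, -0.6, 0.2]
variances = [0.04, 0.05, 0.03, 0.06]

q, pooled = cochran_q(log_ors, variances)
print(f"Q = {q:.2f}, pooled log OR = {pooled:.2f}, I2 = {i_squared(q, len(log_ors)):.0f}%")
```

I² is only one input to the judgment; the slide lists it alongside the size of the variation, the overlap of confidence intervals and the heterogeneity test.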

3. Directness of Evidence

• differences in
– populations/patients (mild versus severe COPD, older, sicker or more co-morbidity)
– interventions (all inhaled steroids, new vs. old)
– outcomes (important vs. surrogate; long-term health-related quality of life, short-term functional capacity, laboratory exercise, spirometry)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– formoterol versus salmeterol versus tiotropium
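The "A versus C and B versus C" situation above is often handled with an adjusted indirect comparison through the common comparator C (a method usually attributed to Bucher and colleagues; the slides do not name a method, so this choice is an assumption). The sketch below works on the log odds ratio scale, and all input numbers are hypothetical.

```python
import math


def indirect_comparison(or_ac, se_log_ac, or_bc, se_log_bc, z=1.96):
    """Indirect estimate of A versus B from A versus C and B versus C.

    or_ac, or_bc:         odds ratios against the common comparator C
    se_log_ac, se_log_bc: standard errors of the corresponding log odds ratios
    Returns (odds ratio A vs B, lower limit, upper limit).
    """
    log_or_ab = math.log(or_ac) - math.log(or_bc)
    # The variances add, so the indirect estimate is less precise
    # than either of the direct comparisons it is built from.
    se_ab = math.sqrt(se_log_ac ** 2 + se_log_bc ** 2)
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper


# Hypothetical inputs: A vs C has OR 0.70, B vs C has OR 0.85.
or_ab, lower, upper = indirect_comparison(0.70, 0.15, 0.85, 0.20)
print(f"Indirect OR, A vs B: {or_ab:.2f} ({lower:.2f} to {upper:.2f})")
```

The widening of the interval is one reason indirect comparisons count as indirect evidence: uncertainty from both source comparisons carries over into the result.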

4. Publication Bias

• Publication bias
– Cave! Only few small studies

A systematic review of topical treatments for seasonal allergic conjunctivitis showed that patients using topical sodium cromoglycate were more likely to perceive benefit than those using placebo. However, only small trials reported clinically and statistically significant benefits of active treatment, while a larger trial showed a much smaller and a statistically not significant effect. These findings suggest that smaller studies demonstrating smaller effects might not have been published.

5. Imprecision

• small sample size
• small number of events
– wide confidence intervals
– uncertainty about magnitude of effect

Observational studies examining the impact of exclusive breastfeeding on development of allergic rhinitis in high risk infants showed a relative risk of 0.87 (95% CI: 0.48 to 1.58) that neither rules out important benefit nor important harm (Mimouni Bloch 2002).
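The breastfeeding example can be checked mechanically once thresholds for an "important" effect are chosen. In the sketch below the thresholds (a relative risk of 0.75 for important benefit and 1.25 for important harm) are assumptions picked for illustration; the slide does not state which thresholds were used.

```python
def imprecision_check(lower, upper, benefit_threshold=0.75, harm_threshold=1.25):
    """Describe what a relative-risk confidence interval fails to rule out.

    Assumes an undesirable outcome, so a relative risk below 1 is a benefit.
    The default thresholds are illustrative assumptions, not values from the slides.
    """
    return {
        "includes_no_effect": lower <= 1.0 <= upper,
        "important_benefit_not_ruled_out": lower <= benefit_threshold,
        "important_harm_not_ruled_out": upper >= harm_threshold,
    }


# Relative risk 0.87 (95% CI 0.48 to 1.58), as quoted above.
result = imprecision_check(0.48, 1.58)
for finding, present in result.items():
    print(f"{finding}: {present}")
```

With these thresholds all three findings are true, which matches the statement that the interval rules out neither important benefit nor important harm.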

What can raise quality? 3 Factors

• large magnitude can upgrade one level
– very large two levels
– common criteria
   • everyone used to do badly
   • almost everyone does well
– The parachute example: if we’d look at the observational evidence, we’d find a very large effect and the evidence probably would be high quality for preventing death
• dose response relation (higher dose of brain radiation in childhood leukemia leads to greater risk of late malignancies)
• Residual confounding unlikely to be responsible for observed effect

Guideline development

Process

Prioritise problems & scoping

Establish guideline panel and develop questions, including outcomes

Find and critically appraise systematic review(s) and/or Prepare protocol(s) for systematic review(s) and Prepare systematic review(s) (searches, selection of studies, data collection and analysis)

Prepare an evidence profile

Assess the quality of evidence for each outcome

Prepare a Summary of Findings table

If developing guidelines: Assess the overall quality of evidence and Decide on the direction (which alternative) and strength of the recommendation

Draft guideline

Consult with stakeholders and/or external peer reviewers

Disseminate guidelines

Update review or guidelines when needed

Adapt guidelines, if needed

Prioritise guidelines/recommendations for implementation

Implement or support implementation of the guidelines

Evaluate the impact of the guidelines and implementation strategies

Update systematic review/guidelines

[Flow diagram: from systematic review to guideline development]

Systematic review
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: Critical, Important, Not important
• Create evidence profile with GRADEpro
• Outcomes across studies: Summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: High, Moderate, Low, Very low
– RCT start high, obs. data start low
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders

Guideline development
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations:
– For or against (direction)
– Strong or weak (strength)
• By considering: Quality of evidence, Balance benefits/harms, Values and preferences
• Revise if necessary by considering: Resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
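Two steps of this process are mechanical enough to sketch: taking the overall quality as the lowest quality among the critical outcomes, and choosing the wording of the recommendation from its direction and strength. The function names and the dictionary layout are assumptions made for this illustration; the outcome list is hypothetical.

```python
# Quality levels from lowest to highest.
ORDER = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Lowest quality rating among the outcomes rated as critical."""
    critical = [o["quality"] for o in outcomes if o["importance"] == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is needed")
    return min(critical, key=ORDER.index)


def recommendation_phrase(direction: str, strength: str) -> str:
    """Map direction (for/against) and strength (strong/weak) to the standard wording."""
    if direction not in ("for", "against") or strength not in ("strong", "weak"):
        raise ValueError("direction must be for/against, strength must be strong/weak")
    verb = "recommend" if strength == "strong" else "suggest"
    against = " against" if direction == "against" else ""
    return f"We {verb}{against} using…"


# Hypothetical outcomes for one clinical question.
outcomes = [
    {"name": "Mortality", "importance": "critical", "quality": "moderate"},
    {"name": "Hospitalization", "importance": "critical", "quality": "high"},
    {"name": "Resource use", "importance": "important", "quality": "very low"},
]

print(overall_quality(outcomes))
print(recommendation_phrase("for", "strong"))
print(recommendation_phrase("against", "weak"))
```

Outcomes that are important but not critical do not pull the overall rating down, which is why the "very low" rating for resource use is ignored in this example.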

Oxford Centre for Evidence Based Medicine

Levels of Evidence and Grades of Recommendations - 23 November 1999.

Columns: Grade of Recommendation; Level of Evidence; Therapy/Prevention, Aetiology/Harm; Prognosis; Diagnosis; Economic analysis

Grade of Recommendation A

Level 1a
Therapy/Prevention, Aetiology/Harm: SR (with homogeneity) of RCTs
Prognosis: SR (with homogeneity*) of inception cohort studies; or a CPG validated on a test set.
Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; or a CPG validated on a test set.
Economic analysis: SR (with homogeneity*) of Level 1 economic studies

Level 1b
Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
Prognosis: Individual inception cohort study with > 80% follow-up
Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard.
Economic analysis: Analysis comparing all (critically-validated) alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.

Level 1c
Therapy/Prevention, Aetiology/Harm: All or none
Prognosis: All or none case-series
Diagnosis: Absolute SpPins and SnNouts
Economic analysis: Clearly as good or better, but cheaper. Clearly as bad or worse but more expensive. Clearly better or worse at the same cost.

Grade of Recommendation B

Level 2a
Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs.
Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
Economic analysis: SR (with homogeneity*) of Level >2 economic studies

Level 2b
Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; or CPG not validated in a test set.
Diagnosis: Any of: Independent blind or objective comparison; Study performed in a set of non-consecutive patients, or confined to a narrow spectrum of study individuals (or both) all of whom have undergone both the diagnostic test and the reference standard; A diagnostic CPG not validated in a test set.
Economic analysis: Analysis comparing a limited number of alternative outcomes against appropriate cost measurement, and including a sensitivity analysis incorporating clinically sensible variations in important variables.

Level 2c
Therapy/Prevention, Aetiology/Harm: “Outcomes” Research
Prognosis: “Outcomes” Research

Level 3a
Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies

Level 3b
Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
Economic analysis: Analysis without accurate cost measurement, but including a sensitivity analysis incorporating clinically sensible variations in important variables.

Grade of Recommendation C

Level 4
Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies)
Prognosis: Case-series (and poor quality prognostic cohort studies)
Diagnosis: Any of: Reference standard was unobjective, unblinded or not independent; Positive and negative tests were verified using separate reference standards; Study was performed in an inappropriate spectrum** of patients.
Economic analysis: Analysis with no sensitivity analysis

Grade of Recommendation D

Level 5
Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
Economic analysis: Expert opinion without explicit critical appraisal, or based on economic theory

Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and Sharon Straus).

USPSTF - Grade Definitions After May 2007: Certainty

Level of Certainty and Description

High
The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.

Moderate
The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by such factors as:
• The number, size, or quality of individual studies.
• Inconsistency of findings across individual studies.
• Limited generalizability of findings to routine primary care practice.
• Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.

Low
The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of:
• The limited number or size of studies.
• Important flaws in study design or methods.
• Inconsistency of findings across individual studies.
• Gaps in the chain of evidence.
• Findings not generalizable to routine primary care practice.
• Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.

The USPSTF defines certainty as "likelihood that the USPSTF assessment of the net benefit of a preventive service is correct."

• Recommendations for prognosis
– Use prognostic information to determine baseline risk for healthcare decisions

Centers for Disease Control and Prevention (CDC)

Columns: Evidence of Effectiveness | Execution (Good or Fair) | Design Suitability (Greatest, Moderate, or Least) | Number of Studies | Consistent | Effect Size | Expert Opinion

Strong | Good | Greatest | At Least 2 | Yes | Sufficient | Not Used
Strong | Good | Greatest or Moderate | At Least 5 | Yes | Sufficient | Not Used
Strong | Good or Fair | Greatest | At Least 5 | Yes | Sufficient | Not Used
Strong | Meet Design, Execution, Number, and Consistency Criteria for Sufficient But Not Strong Evidence | Large | Not Used

Sufficient | Good | Greatest | 1 | Not Applicable | Sufficient | Not Used
Sufficient | Good or Fair | Greatest or Moderate | At Least 3 | Yes | Sufficient | Not Used
Sufficient | Good or Fair | Greatest, Moderate, or Least | At Least 5 | Yes | Sufficient | Not Used

Expert Opinion | Varies | Varies | Varies | Varies | Sufficient | Supports a Recommendation

Insufficient | A. Insufficient Designs or Execution | B. Too Few Studies | C. Inconsistent | D. Small | E. Not Used

GRADE - 2004

Case scenario

A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too, and several chickens had died unexpectedly a few days before the girl fell sick.

Potential interventions: antivirals, such as neuraminidase inhibitors oseltamivir and zanamivir

Types of questions

Background Questions
• Definition: What is Avian Influenza?
• Mechanism: What is the mechanism of action of oseltamivir?

Foreground Questions
• Benefit > harm: In patients with avian influenza, does oseltamivir therapy improve survival, …?

Framing a foreground question

Population: Avian Flu/influenza A (H5N1) patients

Intervention: Oseltamivir

Comparison: No pharmacological intervention

Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance

Schunemann, et al., The Lancet ID, 2007

Evidence Profiles/Summaries

Galactomannan ELISA for the diagnosis of invasive aspergillosis

Question: Should Galactomannan ELISA be used for the diagnosis of invasive aspergillosis?
Authors of the profile: Nancy Santesso, Jan Brozek, Holger Schünemann
Population: immunocompromized patients, mostly hematology patients [1]
Setting: mainly hematology or cancer departments, mainly inpatients
New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference Test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: Proven or probable invasive aspergillosis
Bibliography: Leeflang et al. Galactomannan detection for invasive aspergillosis in immunocompromized patients. Cochrane Database of Systematic Reviews 2008, Issue 4. CD007394

Columns: Number of Participants (studies); Study design; Risk of bias; Indirectness; Inconsistency; Imprecision; Publication bias; Effect (number per 1000 patients tested) at Prevalence 1% [4], Prevalence 12% [4] and Prevalence 44% [4]; Quality of the evidence (GRADE); Importance [5]

True positives (Patients correctly classified as having invasive aspergillosis) [6]
Number of Participants (studies): 2777 (18 studies)
Study design: cohort, case-control
Risk of bias: no. Indirectness: no. Inconsistency: no. Imprecision: no. Publication bias: unlikely.
Effect per 1000 patients tested: Prevalence 1%: 6 (5 to 8). Prevalence 12%: 77 (60 to 92). Prevalence 44%: 282 (220 to 339).
Quality of the evidence (GRADE): HIGH
Importance: 7

True negatives (Patients correctly classified as not having invasive aspergillosis) [7]
Number of Participants (studies): 2777 (18 studies)
Study design: cohort, case-control
Risk of bias: no. Indirectness: no. Inconsistency: no. Imprecision: no. Publication bias: unlikely.
Effect per 1000 patients tested: Prevalence 1%: 941 (901 to 960). Prevalence 12%: 836 (801 to 854). Prevalence 44%: 532 (510 to 543).
Quality of the evidence (GRADE): HIGH
Importance: 7

False positives (Patients incorrectly classified as having invasive aspergillosis) [8]
Number of Participants (studies): 2777 (18 studies)
Study design: cohort, case-control
Risk of bias: no. Indirectness: serious [9]. Inconsistency: no. Imprecision: no. Publication bias: unlikely.
Effect per 1000 patients tested: Prevalence 1%: 50 (89 to 30). Prevalence 12%: 44 (79 to 26). Prevalence 44%: 28 (50 to 17).
Quality of the evidence (GRADE): MODERATE
Importance: 8

False negatives (Patients incorrectly classified as not having the disease) [10]
Number of Participants (studies): 2777 (18 studies)
Study design: cohort, case-control
Risk of bias: no. Indirectness: serious [11]. Inconsistency: no. Imprecision: no. Publication bias: unlikely.
Effect per 1000 patients tested: Prevalence 1%: 4 (5 to 2). Prevalence 12%: 43 (60 to 28). Prevalence 44%: 158 (220 to 101).
Quality of the evidence (GRADE): MODERATE
Importance: 9

Inconclusive results [12]
Number of Participants (studies): 901 (7 studies)
Study design: cohort, case-control
All other columns: –
Importance: 4

Difference in complications [13]
All columns: –
Importance: 3

Difference in resource use [13]
All columns: –
Importance: 6

Effects are based on Sensitivity of 0.64 (0.50-0.77) and Specificity of 0.95 (0.91-0.97)

Footnotes:
[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] GRADE recommends classifying outcomes on a 9 point scale: 1-3 is not important; 4-6 is important; and, 7-9 is critical to a decision.
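The "number per 1000 patients tested" figures in this profile follow from the sensitivity, the specificity and the assumed prevalence. The sketch below reproduces the point estimates from the values quoted in the profile (sensitivity 0.64, specificity 0.95, prevalence 1%, 12% and 44%); the function name is made up for this illustration, and the confidence limits are not recomputed.

```python
def per_1000(prevalence: float, sensitivity: float, specificity: float, n: int = 1000):
    """Expected test results among n patients tested, as unrounded numbers."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    return {"TP": true_pos, "FN": false_neg, "TN": true_neg, "FP": false_pos}


# Values quoted in the evidence profile.
SENSITIVITY = 0.64
SPECIFICITY = 0.95

for prevalence in (0.01, 0.12, 0.44):
    counts = per_1000(prevalence, SENSITIVITY, SPECIFICITY)
    summary = ", ".join(f"{name} {value:.1f}" for name, value in counts.items())
    print(f"Prevalence {prevalence:.0%}: {summary}")
```

Printed to one decimal place, the results agree with the table up to rounding. The comparison across prevalences also shows why the same test produces far more false negatives per 1000 patients in a high-prevalence setting.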

Summary of Findings table

Galactomannan ELISA for the diagnosis of invasive aspergillosis

Patient or population: immunocompromised patients, mostly hematology patients [1]
Settings: mainly hematology or cancer departments, mainly inpatients
New Test: commercial Platelia® sandwich ELISA detecting galactomannan in serum [2]
Cut-off value: 1.5 ODI
Reference Test: composite of EORTC/MSG clinical and histological criteria [3]
Threshold: Proven or probable invasive aspergillosis

Columns: Outcomes; Illustrative comparative numbers (95% CI) [4], i.e. assumed numbers with galactomannan ELISA compared to reference test (EORTC/MSG criteria); Number of participants (studies); Quality of the evidence; Comments

Outcome: True positives (Patients correctly classified as having invasive aspergillosis)
• Prevalence 1%: 6 (5 to 8)
• Prevalence 12%: 77 (60 to 92)
• Prevalence 44%: 282 (220 to 339)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: High
• Comments: True positives are important because patients will be treated with improved survival and spared the invasive diagnostic procedures; however, they will suffer the toxicity of the treatment.

Outcome: True negatives (Patients correctly classified as not having invasive aspergillosis)
• Prevalence 1%: 941 (901 to 960)
• Prevalence 12%: 836 (801 to 854)
• Prevalence 44%: 532 (510 to 543)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: High
• Comments: True negatives are important because patients will not be treated unnecessarily, will be spared the toxic effects of treatment, and will be reassured.

Outcome: False positives (Patients incorrectly classified as having invasive aspergillosis)
• Prevalence 1%: 50 (89 to 30)
• Prevalence 12%: 44 (79 to 26)
• Prevalence 44%: 28 (50 to 17)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: Moderate [5]
• Comments: False positives are important because patients will be treated unnecessarily and many will be exposed to nephrotoxic drugs, and because diagnosis and treatment of the true cause of symptoms may be delayed.

Outcome: False negatives (Patients incorrectly classified as not having the disease)
• Prevalence 1%: 4 (5 to 2)
• Prevalence 12%: 43 (60 to 28)
• Prevalence 44%: 158 (220 to 101)
• Number of participants (studies): 2777 (18)
• Quality of the evidence: Moderate [6]
• Comments: False negatives are important because the diagnosis is missed: patients either will not be treated, with a 60–90% mortality rate, or treatment will be delayed, with uncertain consequences for mortality.

Outcome: Inconclusive results
• Illustrative comparative numbers, participants (studies), quality of the evidence: See comment
• Comments: Inconclusive results were not reported. Most likely they would result in repeating the index test and therefore increased cost.

Outcome: Complications
• Illustrative comparative numbers, participants (studies), quality of the evidence: See comment
• Comments: Complications were not reported.

Outcome: Resource use
• Illustrative comparative numbers, participants (studies), quality of the evidence: See comment
• Comments: Resource use was not reported.

CI: Confidence interval; EORTC: European Organization for Research and Treatment of Cancer; MSG: Mycoses Study Group

GRADE Working Group grades of evidence
• High quality: Further research is very unlikely to change our confidence in the estimate of effect.
• Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
• Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
• Very low quality: We are very uncertain about the estimate.

[1] One study that included only patients with HIV/AIDS was excluded, because this patient group and setting differed from the other included patient groups in such a way that authors of the review regarded the study population as not being representative.
[2] A diagnostic test for invasive aspergillosis (IA) needs to be not too invasive or a burden to immunocompromised patients. The galactomannan ELISA test on a serum specimen is less burdensome than a current reference standard.
[3] A true reference standard is autopsy combined with culture from autopsy specimens; a clinical reference standard is composed of EORTC/MSG clinical and histological criteria.
[4] Estimates of prevalence of IA were based on the median and range of values in included studies.
[5] There is uncertainty to what extent a delayed diagnosis and treatment of the true cause of symptoms would impact on patient-important outcomes.
[6] There is some uncertainty to what extent a delayed diagnosis and treatment would impact mortality and other outcomes.
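The illustrative numbers in the Summary of Findings table can be reproduced from the pooled accuracy estimates. The sketch below assumes the numbers are expressed per 1000 patients tested (an assumption; the denominator is not stated on the slide) and uses the sensitivity of 0.64 and specificity of 0.95 quoted with the evidence profile. The function name `per_1000` is illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_1000(prevalence, sensitivity, specificity, n=1000):
    """Split n tested patients into TP, FN, TN and FP counts."""
    prev, sens, spec = (Decimal(str(x)) for x in (prevalence, sensitivity, specificity))
    diseased = prev * n      # patients who truly have the disease
    healthy = n - diseased   # patients who truly do not
    cells = {
        "TP": diseased * sens,
        "FN": diseased * (1 - sens),
        "TN": healthy * spec,
        "FP": healthy * (1 - spec),
    }
    # Round half up so that, for example, 49.5 is reported as 50.
    return {k: int(v.quantize(Decimal("1"), rounding=ROUND_HALF_UP)) for k, v in cells.items()}


for prevalence in (0.01, 0.12, 0.44):
    print(prevalence, per_1000(prevalence, sensitivity=0.64, specificity=0.95))
```

Under that assumption the point estimates match the table at all three prevalences. Substituting the confidence limits of sensitivity or specificity in the same way gives the ranges shown in parentheses.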

GRADE process: from systematic review to guideline development

Systematic review
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: critical, important, or not important
• Outcomes across studies
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: high, moderate, low, or very low
– RCT start high, obs. data start low
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders

Guideline development
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations:
– For or against (direction)
– Strong or weak (strength)
• By considering: quality of evidence, balance benefits/harms, values and preferences
• Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…”, “We suggest using…”, “We recommend against using…”, “We suggest against using…”
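The rule "rate overall quality of evidence across outcomes based on lowest quality of critical outcomes" can be sketched in a few lines. This is a minimal illustration, not GRADEpro's implementation: the function names and the outcome list are hypothetical, and the 1-9 importance scale (1-3 not important, 4-6 important, 7-9 critical) is the one described in the evidence profile footnotes.

```python
QUALITY_ORDER = ["very low", "low", "moderate", "high"]


def importance(rating):
    """Map a 1-9 importance rating to one of three categories."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important"
    return "not important"


def overall_quality(outcomes):
    """Lowest quality among critical outcomes; None if no outcome is critical."""
    critical = [o["quality"] for o in outcomes if importance(o["rating"]) == "critical"]
    if not critical:
        return None
    return min(critical, key=QUALITY_ORDER.index)


# Hypothetical outcome list for illustration only.
outcomes = [
    {"name": "outcome A", "rating": 8, "quality": "high"},
    {"name": "outcome B", "rating": 9, "quality": "moderate"},
    {"name": "outcome C", "rating": 6, "quality": "very low"},
]
print(overall_quality(outcomes))
```

Outcome C has the lowest quality but is only "important", so it does not drive the overall rating; the result is determined by the critical outcomes A and B.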

Questions

GRADE: recommendation – quality of evidence

Clear separation:

1) Recommendation: 2 grades – weak/conditional/optional or strong (for or against an intervention)
– Balance of benefits and downsides, values and preferences, resource use and quality of evidence

2) 4 categories of quality of evidence: High, Moderate, Low, Very low
– methodological quality of evidence
– likelihood of bias
– by outcome and across outcomes

*www.GradeWorking-Group.org

GRADE Quality of Evidence

In the context of making recommendations:
• The quality of evidence reflects the extent of our confidence that the estimates of an effect are adequate to support a particular decision or recommendation.

Likelihood of and confidence in an outcome

Definition of grades of evidence

• A/High: Further research is very unlikely to change confidence in the estimate of effect.
• B/Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
• C/Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
• D/Very low: Any estimate of effect is very uncertain.

Determinants of quality

• RCTs start high
• observational studies start low
• 5 factors that can lower quality
1. Limitations in detailed design and execution (risk of bias criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
– large magnitude of effect
– all plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
– dose-response gradient

1. Design and Execution/Risk of Bias

Examples:
• Inappropriate selection of exposed and unexposed groups
• Failure to adequately measure/control for confounding
• Selective outcome reporting
• Failure to blind (e.g. outcome assessors)
• High loss to follow-up
• Lack of concealment in RCTs
• Intention to treat principle violated

Design and Execution/RoB

From Cates, CDSR 2008

Design and Execution/RoB

Overall judgment required

Who believes the risk of bias is of concern?

• Yes
• No
• Don’t know or undecided

Would you downgrade for risk of bias?

• No, there are no serious limitations
• Yes, there are serious limitations
• Yes, there are very serious limitations

Would you downgrade for risk of bias?

• No, there are no serious limitations – although there ….
• Yes, there are serious limitations – most people would agree that selective reporting is….
• Yes, there are very serious limitations – there is a risk of bias but only for the one criterion of selective reporting

2. Inconsistency of results (Heterogeneity)

• if inconsistency, look for explanation
– patients, intervention, comparator, outcome
• if unexplained inconsistency, lower quality

Reminders for immunization uptake

3. Directness of Evidence

• differences in
– populations/patients (children – neonates, women in general – pregnant women)
– interventions (all vaccines, new – old)
– comparator appropriate (new policy – old or no policy)
– outcomes (important – surrogate: cases prevented – seroconversion)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– Vaccine A versus Placebo versus Vaccine B
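To see why indirect comparisons lower confidence, it helps to work through how an A versus B estimate is obtained from A versus C and B versus C. The sketch below uses the adjusted indirect comparison on the log odds ratio scale, a standard technique that is not named on the slide; the input numbers are hypothetical and the function names are illustrative.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def indirect_or(or_ac, ci_ac, or_bc, ci_bc, z=1.96):
    """Adjusted indirect comparison of A versus B through common comparator C."""
    log_or_ab = math.log(or_ac) - math.log(or_bc)
    # Variances add, so the indirect estimate is less precise than either input.
    se_ab = math.sqrt(se_from_ci(*ci_ac) ** 2 + se_from_ci(*ci_bc) ** 2)
    return (
        math.exp(log_or_ab),
        (math.exp(log_or_ab - z * se_ab), math.exp(log_or_ab + z * se_ab)),
    )


# Hypothetical inputs: vaccine A vs placebo and vaccine B vs placebo.
or_ab, ci_ab = indirect_or(0.50, (0.35, 0.71), 0.65, (0.48, 0.88))
print(f"A vs B: OR {or_ab:.2f} (95% CI {ci_ab[0]:.2f} to {ci_ab[1]:.2f})")
```

Because the two variances add, the confidence interval for A versus B is wider than that of either direct comparison, which is one reason such evidence is treated as less direct.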

4. Publication Bias

• Should always be suspected
– Only small “positive” trials
– For profit interest
– Various methods to evaluate – none perfect, but clearly a problem

Egger M, Smith DS. BMJ 1995;310:752-54

I.V. Mg in acute myocardial infarction

Publication bias

• Meta-analysis: Yusuf S. Circulation 1993
• ISIS-4: Lancet 1995

Funnel plot (y-axis: standard error; x-axis: odds ratio)

Symmetrical: No publication bias

Funnel plot (y-axis: standard error; x-axis: odds ratio)

Asymmetrical: Publication bias?

File drawer problem: No interest in publishing or being published

5. Imprecision

• Small sample size
– small number of events
• Wide confidence intervals
– uncertainty about magnitude of effect
• Extent to which confidence in estimate of effect adequate to support decision

Example: Immunization in children

What can raise quality?

1. large magnitude can upgrade (RRR 50%/RR 2)
– very large two levels (RRR 80%/RR 5)
– criteria
• everyone used to do badly
• almost everyone does well
– parachutes to prevent death when jumping from airplanes
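The magnitude thresholds above can be read as a simple lookup. The sketch below is one possible reading, assuming the thresholds are inclusive and that protective and harmful effects are treated symmetrically (RR 0.5 like RR 2, RR 0.2 like RR 5); the function name is illustrative, and in practice the decision to upgrade remains a judgment rather than a mechanical rule.

```python
def large_effect_upgrade(relative_risk):
    """Levels to upgrade for magnitude of effect (0, 1 or 2)."""
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # Fold protective and harmful effects onto one scale, so RR 0.2 is treated like RR 5.
    magnitude = max(relative_risk, 1 / relative_risk)
    if magnitude >= 5:   # RRR 80% / RR 5: very large
        return 2
    if magnitude >= 2:   # RRR 50% / RR 2: large
        return 1
    return 0


for rr in (1.5, 0.5, 2, 0.06):
    print(rr, large_effect_upgrade(rr))
```

A relative risk of 0.06 corresponds to a 94% protective effect, which falls well inside the "very large" band.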

Reminders for immunization uptake

What can raise quality?

2. dose response relation
– (higher INR – increased bleeding)
– childhood lymphoblastic leukemia
• risk for CNS malignancies 15 years after cranial irradiation
• no radiation: 1% (95% CI 0% to 2.1%)
• 12 Gy: 1.6% (95% CI 0% to 3.4%)
• 18 Gy: 3.3% (95% CI 0.9% to 5.6%)

3. all plausible confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed

All plausible residual confounding would result in an overestimate of effect

Hypoglycaemic drug phenformin causes lactic acidosis

The related agent metformin is under suspicion for the same toxicity.

Large observational studies have failed to demonstrate an association
– Clinicians would be more alert to lactic acidosis in the presence of the agent

• Vaccine – adverse effects

Quality assessment criteria

Study design and initial quality of a body of evidence:
• Randomised trials: High
• Observational studies: Low

Lower if:
• Risk of bias: -1 Serious; -2 Very serious
• Inconsistency: -1 Serious; -2 Very serious
• Indirectness: -1 Serious; -2 Very serious
• Imprecision: -1 Serious; -2 Very serious
• Publication bias: -1 Likely; -2 Very likely

Higher if:
• Large effect: +1 Large; +2 Very large
• Dose response: +1 Evidence of a gradient
• All plausible residual confounding: +1 Would reduce a demonstrated effect; +1 Would suggest a spurious effect if no effect was observed

Quality of a body of evidence:
• A/High (four plus: ⊕⊕⊕⊕)
• B/Moderate (three plus: ⊕⊕⊕)
• C/Low (two plus: ⊕⊕)
• D/Very low (one plus: ⊕)
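The quality assessment criteria amount to a starting level plus adjustments. The sketch below is a minimal arithmetic reading of that table, with hypothetical function and key names; GRADE treats the final rating as a judgment informed by these factors, so this illustrates the logic rather than replacing that judgment.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # D, C, B, A


def grade_quality(design, downgrades=None, upgrades=None):
    """Rate a body of evidence for one outcome.

    design: "rct" starts high; "observational" starts low.
    downgrades: factor -> 0, 1 (serious) or 2 (very serious).
    upgrades: factor -> 0, 1 or 2 (2 only for a very large effect).
    """
    start = {"rct": 3, "observational": 1}[design]
    down = sum((downgrades or {}).values())
    up = sum((upgrades or {}).values())
    # Clamp to the four categories; nothing sits below "very low" or above "high".
    score = max(0, min(3, start - down + up))
    return LEVELS[score]


print(grade_quality("rct", downgrades={"risk of bias": 1}))
print(grade_quality("observational", upgrades={"large effect": 2}))
print(grade_quality("observational", downgrades={"indirectness": 1}))
```

The second call shows how observational evidence with a very large effect can move from low to high, and the third how a single serious concern moves observational evidence to very low.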

Design and Execution/RoB

Cancer, anticoagulation and mortality

Akl E, Barba M, Rohilla S, Terrenato I, Sperati F, Schünemann HJ. “Anticoagulation for the long term treatment of venous thromboembolism in patients with cancer”. Cochrane Database Syst Rev. 2008 Apr 16;(2):CD006650.

Content

• Background
• Quality of evidence
• Moving from evidence to recommendations

Strength of recommendation

“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”

• Strong or weak/conditional

Determinants of the strength of recommendation

Factors that can strengthen a recommendation, with comments:

• Quality of the evidence: The higher the quality of evidence, the more likely a strong recommendation is warranted.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.

Developing recommendations

Case scenario

A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chicken stock shares this room too, and several chickens had died unexpectedly a few days before the girl fell sick.

Methods – WHO Rapid Advice Guidelines for management of Avian Flu

Applied findings of a recent systematic evaluation of guideline development for WHO/ACHR

Group composition (including panel of 13 voting members):
• clinicians who treated influenza A(H5N1) patients
• infectious disease experts
• basic scientists
• public health officers
• methodologists

Independent scientific reviewers: Identified systematic reviews, recent RCTs, case series, animal studies related to H5N1 infection

Oseltamivir for Avian Flu

Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: $40 per treatment course
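The two odds ratios quoted above illustrate imprecision. A quick check, sketched below with an illustrative helper name, is whether each confidence interval excludes an odds ratio of 1 (no effect). This is only one aspect of judging imprecision, and these estimates come from seasonal influenza, so they are also indirect evidence for H5N1.

```python
def ci_excludes_no_effect(lower, upper, null_value=1.0):
    """True if the confidence interval lies entirely on one side of the null."""
    return upper < null_value or lower > null_value


# Odds ratios and confidence limits as quoted in the summary of findings.
estimates = {
    "Hospitalization": (0.22, 0.02, 2.16),
    "Pneumonia": (0.15, 0.03, 0.69),
}
for outcome, (odds_ratio, lower, upper) in estimates.items():
    verdict = "excludes" if ci_excludes_no_effect(lower, upper) else "includes"
    print(f"{outcome}: OR {odds_ratio} ({lower} to {upper}) {verdict} OR = 1")
```

The hospitalization interval spans both a large benefit and possible harm, whereas the pneumonia interval lies entirely below 1.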

From evidence to recommendation

Factors that can strengthen a recommendation, with comments:

• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic conditions
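The point that a small reduction in relative risk still leads to a large absolute effect follows from simple arithmetic when the baseline risk is high. The numbers below are hypothetical and chosen only to show the contrast; the slides state that case fatality is high but do not give the figures used here, and the function names are illustrative.

```python
def absolute_risk_reduction(baseline_risk, relative_risk):
    """Absolute risk reduction implied by a relative risk at a given baseline risk."""
    return baseline_risk * (1 - relative_risk)


def number_needed_to_treat(baseline_risk, relative_risk):
    """Patients to treat to prevent one event; undefined when there is no benefit."""
    arr = absolute_risk_reduction(baseline_risk, relative_risk)
    if arr <= 0:
        raise ValueError("no benefit, NNT is undefined")
    return 1 / arr


# Hypothetical numbers: the same relative risk (0.90) at a low and a high baseline risk.
for baseline in (0.01, 0.60):
    arr = absolute_risk_reduction(baseline, relative_risk=0.90)
    print(f"baseline {baseline:.0%}: {arr * 1000:.0f} fewer deaths per 1000 treated")
```

With these assumed inputs, the same 10% relative reduction prevents about 1 death per 1000 at a 1% baseline risk but about 60 per 1000 at a 60% baseline risk.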

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).

Schunemann et al. The Lancet ID, 2007

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).

Values and Preferences

Remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low values on adverse reactions, the development of resistance and costs of treatment.

Schunemann et al. The Lancet ID, 2007

Implications of a strong recommendation

• Patients: Most people in this situation would want the recommended course of action and only a small proportion would not

• Clinicians: Most patients should receive the recommended course of action

• Policy makers: The recommendation can be adopted as a policy in most situations

Implications of a conditional/weak recommendation

• Patients: The majority of people in this situation would want the recommended course of action, but many would not

• Clinicians: Be more prepared to help patients to make a decision that is consistent with their own values/decision aids and shared decision making

• Policy makers: There is a need for substantial debate and involvement of stakeholders

GRADE process: from systematic review to guideline development

Systematic review
• Formulate question (PICO)
• Select outcomes
• Rate importance of each outcome: critical, important, or not important
• Outcomes across studies
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
• Rate quality of evidence for each outcome: high, moderate, low, or very low
– Randomization increases initial quality
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders

Guideline development
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
• Formulate recommendations:
– For or against (direction)
– Strong or weak (strength)
• By considering: quality of evidence, balance benefits/harms, values and preferences
• Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…”, “We suggest using…”, “We recommend against using…”, “We suggest against using…”
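The four standard phrasings follow directly from the two properties of a recommendation, direction and strength. A small sketch of that mapping, with an illustrative function name:

```python
def recommendation_wording(direction, strength):
    """Return the opening phrase for a recommendation.

    direction: "for" or "against"; strength: "strong" or "weak".
    """
    if direction not in ("for", "against"):
        raise ValueError("direction must be 'for' or 'against'")
    if strength not in ("strong", "weak"):
        raise ValueError("strength must be 'strong' or 'weak'")
    verb = "recommend" if strength == "strong" else "suggest"
    against = " against" if direction == "against" else ""
    return f"We {verb}{against} using"


for direction in ("for", "against"):
    for strength in ("strong", "weak"):
        print(recommendation_wording(direction, strength), "...")
```

Strength selects the verb ("recommend" for strong, "suggest" for weak) and direction adds "against" where needed, so readers can recover both properties from the wording alone.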

Issues in guideline development for immunization

• Causation versus effects of intervention
– Causation not equivalent to efficacy of interventions
– Bradford Hill
• Nearly half a century old – tablet from the mountain?
• Harms caused by medications
– Assumption is that removal of drug (or no exposure) leads to NO adverse effects
• How confident can one be that removal of the exposure is effective in preventing disease?
– Whether drugs or environmental factors it will depend on the intervention to remove exposure

GRADE and immunizations

• Can herd immunity following immunisation and indirect effects on the co-circulation of other pathogens typically be ascertained only through the use of observational epidemiological methods?
– Innovative randomized controlled trials (RCTs) using cluster-randomization are increasingly being conducted to provide this evidence
• A 94% protective effect of a live, monovalent vaccine against measles is classified as “moderate level of scientific evidence.”
– GRADE’s strength of association criteria may be applied to increase the grade by 2 levels, from “low” to “high”, which is possible in this situation
• GRADE ratings do not give credit to “gradient of effects with scale of population level impact compatible with degree of coverage.”
– GRADE’s dose-response criterion would apply to such gradients
• May anti-vaccination lobby groups abuse the ratings?
– Abuse of any system is possible: it is equally likely that increased transparency provided by the GRADE framework can strengthen, rather than undermine, the trust in vaccines and other interventions

Bradford Hill and GRADE

Bradford Hill Criteria: Consideration in GRADE

• Strength: Strength of association and imprecision in effect estimate
• Consistency: Consistency across studies, i.e. across different situations (different researchers)
• Temporality: Study design, specific study limitations; RCTs fulfil this criterion better than observational studies
• Biological gradient: Dose response gradient
• Specificity: Indirectness
• Biological Plausibility: Indirectness, publication bias
• Coherence: Indirectness
• Experiment: Study design, randomization
• Analogy: Existing association for critical outcomes will lead to not downgrading the quality

Bradford Hill Criteria

• Would agree with modifying if these criteria were not already considered

• Emphasis is on strength of recommendation, quality of evidence is only one factor

Conclusions

• Clinical practice guidelines should be based on the best available evidence to be evidence based
• GRADE combines what is known in health research methodology and provides a structured approach to improve communication
• Criteria for evidence assessment across questions and outcomes
• Criteria for moving from evidence to recommendations
• Transparent, systematic
– four categories of quality of evidence
– two grades for strength of recommendations
• Transparency in decision making and judgments is key