Tips on Critical Appraisal of Evidence

7/31/2019 Tips on Critical Appraisal of Evidence

    Tips on critical appraisal of evidence: Diagnosis

    Clinical scenario: Elderly woman with possible iron deficiency anaemia

    Are the results of this study valid?

    Returning to our clinical scenario from the question formulation tutorial:

    You admit a 75 year old woman with community-acquired pneumonia. She responds nicely to appropriate

    antibiotics but her hemoglobin remains at 100 g/l with an MCV of 80. Her peripheral blood smear shows

    hypochromia, she is otherwise well and is on no incriminating medications. You contact her family physician and

    find out that her Hgb was 105 g/l 6 months ago. She has never been investigated for anaemia. A ferritin has been

    ordered and comes back at 40 mmol/l. You admit to yourself that you're unsure how to interpret a ferritin result and

    don't know how precise and accurate it is.

    In the tutorial on clinical questions we formulated the following question: In an elderly woman with hypochromic,

    microcytic anaemia, can a low ferritin diagnose iron deficiency anaemia?

    Our search of the literature to answer this question retrieved an article from the Am J of Medicine(1990;88:205-9).

    How do we critically appraise this diagnosis paper? We'll start off by considering validity first and the following list

    outlines the questions that we need to consider when deciding if a diagnosis paper is valid.

1. Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
3. Was the reference standard applied regardless of the diagnostic test result?
4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Was there an independent, blind comparison with a reference ('gold') standard of diagnosis?

    In considering this question, we need to determine whether all patients in the study underwent both the diagnostic

    test under evaluation (in our scenario, the serum ferritin) and the reference standard (in our scenario, bone marrow

    biopsy) to show that they definitely do or do not have the target disorder. We should also ensure that those

    investigators who are applying and interpreting the reference standard do not know the results from the diagnostic

    test.

    We also need to consider if the reference standard is appropriate. Sometimes a reference standard may not be clear

    cut, (such as in the diagnosis of delirium) and in this case, we'd need to review the rationale for the choice of

    reference standard as outlined by the study authors.

    All patients in the study we found underwent serum ferritin testing and bone marrow biopsy.

Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=2178409&query_hl=3&itool=pubmed_docsum

    The study should include both patients with common presentations of the target disorder and those with conditions

    that are commonly confused with the target disorder of interest. If the study only includes patients with severe

    symptoms of the target disorder (and who would be very obvious to diagnose) it is not likely to be useful to us. We

    need to find out if patients with varying severity of the disease were included in the study and also whether it

    includes patients with target disorders that are often confused with this one. For example, anaemic patients can be

    symptomatic or asymptomatic and the anaemia can result from a number of causes - we would want to ensure that

    the study we retrieved included patients with a variety of presentations and symptoms.

    Reviewing the ferritin study, it included consecutive patients over the age of 65 who were admitted with anaemia to a

    university-affiliated hospital in Canada. It excluded patients from institutions and patients who were too ill or who had severe

    dementia. No details are provided on the definitions used for 'too ill' or 'severe dementia'.

    Was the reference standard applied regardless of the diagnostic test result?

    We need to check to see that even if a patient's serum ferritin was normal, the study investigators performed the

    reference standard. Sometimes if the reference standard is invasive, it may be considered unethical to perform it on

    patients with a negative test result. For example, if a patient with chest pain is suspected to be at low risk of a

pulmonary embolism and has a negative V/Q scan, an investigator (who is performing a study looking at the accuracy of the V/Q scan in diagnosing pulmonary embolism) may not want to subject the patient to pulmonary

    angiography which is not without morbidity and mortality. Indeed, this was what the investigators did in

    the PIOPED study - if patients were considered to be at a low risk of a pulmonary embolism and had a negative V/Q

    scan, rather than undergoing a pulmonary angiogram, they were followed up clinically for several months, without

    receiving antithrombotic therapy to see if an event occurred.

    In the ferritin study, all patients received both the diagnostic test and the reference standard.

Was the test (or cluster of tests) validated in a second, independent group of patients?

    The tests should be assessed in an independent 'test' set of patients. This question is important in studies looking at

    multiple diagnostic elements.

    If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of

    the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we

    found satisfies all of the above criteria and we will proceed to assessing it for importance.

    Are the results of this study important?

    Let's begin by drawing a 2x2 table, using the results from the study that we identified:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2332918&dopt=Abstract

                                 Target Disorder (iron deficiency anaemia)
                                 Present          Absent           Totals

    Diagnostic test result
    (serum ferritin)
      Test positive (≤45 mmol/l)   70  (a)          15  (b)          85  (a + b)
      Test negative (>45 mmol/l)   15  (c)         135  (d)         150  (c + d)

    Totals                         85  (a + c)     150  (b + d)     235  (a + b + c + d)

    Our patient's serum ferritin comes back at 40 mmol/l and looking at the Table, we can see that she fits in somewhere

    in the top row (either cell 'a' or cell 'b'). From the Table we can also see that 82% (70/85) of people who have iron

    deficiency anaemia have a serum ferritin in the same range as our patient - this is called the sensitivity of a test.

    And, 10% (15/150) of people without this diagnosis have a serum ferritin in the same range as our patient - this is

    the complement of the specificity (1-specificity). The specificity is the proportion of people without iron deficiency

    anemia who have a negative or normal test result. We're interested in how likely a serum ferritin of 40 mmol/l is in a

    patient with iron deficiency anaemia as compared to someone without this target disorder. Our patient's serum

    ferritin is 8 (82%/10%) times as likely to occur in a patient with iron deficiency than in someone without iron

deficiency anaemia - this is called the likelihood ratio for a positive test. We can now use this likelihood ratio to calculate our patient's posttest probability of having iron deficiency anaemia.

    Our patient's posttest probability of having iron deficiency anaemia is obtained by calculating:

    posttest odds/(posttest odds + 1)

    where

    posttest odds = pretest odds x likelihood ratio

The pretest odds are calculated as pretest probability/(1 - pretest probability). We judge our patient's pretest probability of having iron deficiency anaemia as being similar to that of the patients in this study ((a+c)/(a+b+c+d) = 85/235 = 36%) and therefore:

    pretest odds = 0.36/(1 - 0.36)

    = 0.56

    Using this we can calculate


    posttest odds = 0.56 x 8

    = 4.5

    And, finally,

    posttest probability = 4.5/5.5

    = 82%

    With this information, we can conclude that based on our patient's serum ferritin, it is very likely that she has iron deficiency

    anaemia (posttest probability > 80%) and that our posttest probability is sufficiently high that we would want to work our

    patient up for causes of this target disorder.

    Instead of doing all of the above calculations, we could simply use the likelihood ratio nomogram. Considering that

    our patient's pretest probability of iron deficiency anaemia was 36%, and that the likelihood ratio for a serum

    ferritin of 40 mmol/l was 8, we can see that her posttest probability of iron deficiency anaemia is just over 80%.
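The whole chain from the 2x2 counts to the posttest probability can be sketched in a few lines. This is a minimal illustration using the counts from the table above (function and variable names are mine, not the study's); the small differences from the quoted figures of 8 and 82% are rounding:

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# 2x2 counts from the ferritin study table
a, b, c, d = 70, 15, 15, 135

sensitivity = a / (a + c)                      # 70/85  ~ 0.82
specificity = d / (b + d)                      # 135/150 = 0.90
lr_positive = sensitivity / (1 - specificity)  # ~ 8
pretest = (a + c) / (a + b + c + d)            # 85/235 ~ 0.36

print(round(posttest_probability(pretest, lr_positive), 2))  # 0.82
```

Using the rounded likelihood ratio of 8 instead of the exact 8.24 gives essentially the same answer, which is why the nomogram shortcut works well in practice.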

    Multilevel tests

In the paper we found, the serum ferritin results are divided into 3 levels: ≤45 mmol/l, 46-100 mmol/l and >100 mmol/l. We can see that more information about the diagnostic test is available when results are presented in multiple levels:

    Diagnostic test result   Target Disorder (iron deficiency anaemia)   Likelihood ratio
                             Present     Absent

    ≤45 mmol/l               70/85        15/150                         8
    46-100 mmol/l             7/85        27/150                         0.4
    >100 mmol/l               8/85       108/150                         0.1

If our patient's serum ferritin was 110 mmol/l (and using her pretest probability of 36% and the likelihood ratio of

    0.1), her posttest probability of iron deficiency anaemia would be about 5%, virtually ruling out the possibility of

    this diagnosis. However, if her serum ferritin came back at 65, her posttest probability would be about 18% and we'd have

    to decide if this was sufficiently low to stop testing or if we needed to do further investigations.
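Each level's likelihood ratio is simply the proportion of diseased patients with that result divided by the proportion of non-diseased patients with it. A quick sketch using the counts above (the table quotes the ratios rounded; exact values come out as roughly 8.24, 0.46 and 0.13):

```python
# counts per ferritin level: (of the 85 with the disease, of the 150 without)
levels = {
    "<=45 mmol/l": (70, 15),
    "46-100 mmol/l": (7, 27),
    ">100 mmol/l": (8, 108),
}

for level, (present, absent) in levels.items():
    lr = (present / 85) / (absent / 150)
    print(f"{level}: LR = {lr:.2f}")
```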

Can you apply this valid, important evidence about a diagnostic test in caring for your patient?

    Is the diagnostic test available, affordable, accurate, and precise in your setting?


Can you generate a clinically sensible estimate of your patient's pre-test probability (from personal experience, prevalence statistics, practice databases, or primary studies)?

    Are the study patients similar to your own? Is it unlikely that the disease possibilities or probabilities have changed since the evidence was gathered?

Will the resulting post-test probabilities affect your management and help your patient?

    Could it move you across a test-treatment threshold? Would your patient be a willing partner in carrying it out?

    Would the consequences of the test help your patient?


Tips on critical appraisal of evidence: Therapy - Single Trials

    Clinical scenario: man with stroke, moderate carotid stenosis

    Are the results of this study valid?

    Returning to our clinical scenario from the question formulation tutorial:

    You admit a 65 year old man with a stroke. On examination you find that he has mild weakness of the right arm

    and right leg and bilateral carotid bruits. You send the patient for carotid doppler ultrasonography and

    subsequently receive the report that he has moderate stenosis (50-69% by NASCET criteria) of the ipsilateral carotid

    artery. You've noticed in the pile of journals that is accumulating in your office that there has been some recent

    literature addressing surgical versus medical therapy for patients with symptomatic carotid stenosis but you are

    unsure of what the results of these studies indicate.

    In the tutorial on clinical questions, we formulated the following question: In a 65 year old man with stroke and

    moderate carotid stenosis, can carotid endarterectomy decrease the risk of stroke compared with medical therapy?

Our search of the literature found an article in Best Evidence (1999;130:33).

    How do we critically appraise this therapy paper? We'll start off by considering validity first and the following list

    outlines the questions that we need to consider when deciding if a therapy paper is valid.

1. Was the assignment of patients to treatment randomized? And, was the randomization list concealed?
2. Was follow-up of patients sufficiently long and complete?
3. Were all patients analyzed in the groups to which they were randomized?

    And some less important points:

4. Were patients and clinicians kept blind to treatment?
5. Were groups treated equally, apart from the experimental therapy?
6. Were the groups similar at the start of the trial?

Was the assignment of patients to treatment randomized? And, was the randomization list concealed?

    Randomisation helps ensure that patients in treatment groups are identical at the study onset in their risk of the

    event we are hoping to prevent. It balances groups for prognostic factors (good or bad) that if they were unequally

    distributed amongst the groups, could increase, decrease or nullify the effect of the therapy.

    We need to check if the randomisation list has been concealed from the clinicians who entered patients into the trial.

    This is done so that the clinicians won't be aware of which treatment the next patient would receive.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9811916&dopt=Abstract

    The study that we found was randomised (which is one of the inclusion criteria for a therapy article in Best Evidence). From the

    original article we can see that the randomisation list was concealed and details on the randomisation process were also

    provided.

    Was follow-up of patients sufficiently long and complete?

    We'd want to see that the duration of follow-up was sufficiently long to see the outcomes of interest. It is also

    important that the investigators provide details on the number of patients followed up and if possible, on the

    outcomes of patients who dropped out of the study. If we are unsure of what effect the dropouts may have on the

    study result, we can perform a 'sensitivity analysis' for a 'worst case scenario'. For the group that did better, assume

    that all the people who were lost to follow-up did poorly. For the group that did worse, assume all the people who

were lost to follow-up fared well. If the result still supports the original conclusion, then the follow-up was

    sufficiently complete. It would be unusual for a study to be able to withstand more than a 20% loss to follow-up and

    therefore most journals of secondary publication (including ACP Journal Club and EBM) use this as an exclusion

    criterion for article selection.
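The worst-case check described above can be sketched as a small function. This is an illustration under my own naming and with hypothetical numbers, not figures from any study mentioned here:

```python
def worst_case_rates(events_better, n_better, lost_better,
                     events_worse, n_worse, lost_worse):
    """Worst-case sensitivity analysis for losses to follow-up.

    For the group that did better, assume every patient lost to
    follow-up had the bad outcome; for the group that did worse,
    assume every lost patient did well.
    """
    rate_better = (events_better + lost_better) / (n_better + lost_better)
    rate_worse = events_worse / (n_worse + lost_worse)
    return rate_better, rate_worse

# Hypothetical trial: 10/100 events (5 lost) vs 20/100 events (5 lost)
better, worse = worst_case_rates(10, 100, 5, 20, 100, 5)
# If 'better' is still below 'worse', the conclusion survives the worst case.
print(round(better, 3), round(worse, 3))
```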

From the abstract we identified in Best Evidence, 99.7% of patients were followed up for 5 years.

    Were all patients analyzed in the groups to which they were randomized?

    Anything that happens after randomisation can affect the chance that a study patient has an outcome event.

    Therefore, we need to see if the investigators analysed the patients in the groups to which they were randomised,

    even if they crossed over to the other treatment group. This 'intention to treat' analysis preserves the value of

    randomisation.

    An intention to treat analysis was done in the study that we identified. (This information was provided in the abstract available

    on Best Evidence.)

Were patients and clinicians kept blind to treatment? And, were groups treated equally, apart from the experimental therapy?

    Blinding of clinicians and patients helps to prevent additional treatment. The provision of treatment (received in

    addition to the experimental treatment) to just one of the groups is called cointervention. If either the patients or the

clinicians weren't blinded, the reporting of symptoms, or the interpretation of those symptoms, could be

    affected by suspicions about the effectiveness of the treatment under investigation.

    In the NASCET study, all patients received antiplatelet therapy (this was usually ASA and the dose was left to the discretion of

the neurologist at each study centre), and when indicated they received antihypertensive and/or antilipidemic medications.

    Blinding is not always possible (such as in surgery trials) and in these situations we should check to see if outcome

events were assessed by blinded investigators. For example, in NASCET, outcome events were assessed by 4 groups:

    the participating neurologist and surgeon; the neurologist at the study centre; 'blinded' members of the steering

    committee; and 'blinded' external adjudicators.

    Were the groups similar at the start of the trial?

    This is usually reported in the 'Table 1' of the article. If the groups aren't similar, we need to see if there was an

    adjustment made for the potentially important prognostic factors.


    The medical and surgical groups were similar in NASCET. For example, the percentages of patients who were prescribed

    antihypertensive or antilipidemic medications were similar.

    If the study fails any of the above criteria, we need to decide if the flaw is significant and threatens the validity of

    the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we

    found satisfies all the above criteria and we will proceed to assessing it for importance.

    Are the results of this study important?

    What is the magnitude of the treatment effect?

    There are several ways that information about treatment effects can be presented. This discussion will be illustrated

    using the results of NASCET (for any stroke at 5 years) as shown in the first row of numbers in the table below.

    Control Event   Experimental Event   Relative Risk   Absolute Risk   Number Needed
    Rate            Rate                 Reduction       Reduction       to Treat

    0.264           0.198                25%             0.066           15
    0.000000264     0.000000198          25%             0.000000066     15,000,000

    The control event rate (CER) is the proportion of patients in the control group (in this study, the group that received

    medical care) that had the outcome event of interest (in our scenario, this would be any stroke). The experimental

    event rate (EER) is the proportion of patients in the experimental group (patients in the carotid endarterectomy

    group) that had the outcome of interest.

    The relative risk reduction (RRR) is one way of describing the treatment effects and is calculated as:

    RRR = |EER-CER|/CER

    = |0.198-0.264|/0.264

    = 25%

    Applying this, we can say that if we treat people who have moderate carotid stenosis with carotid endarterectomy

    we can decrease their risk of future stroke by 25% compared to those people who receive medical therapy only.

    If the experimental treatment increases the risk of a good event, we can use this same equation to calculate the

relative benefit increase (RBI). Similarly, if the experimental treatment increases the risk of an adverse event, we can use the equation to calculate the relative risk increase (RRI).

    The RRR has limitations. Consider the second row of numbers in the table above - when the CER was incredibly

    small (0.000000264) the RRR remains at 25%. The RRR is unable to discriminate between small treatment effects and

    large ones and doesn't reflect the baseline risk of the event.

    One measure that overcomes this is the absolute difference between the CER and EER or the absolute risk reduction

    (ARR). It is calculated as:


    ARR = |EER-CER|

    = |0.198-0.264|

    = 0.066

    If the experimental treatment increased the risk of a good event, we can use this same equation to calculate the

    absolute benefit increase (ABI). Or, if the experimental treatment increases the risk of an adverse event, we can use

    the equation to calculate the absolute risk increase (ARI).

    Returning to the data in the table, we can see that the ARR reflects the baseline risk of the event and that it

    discriminates between small and large treatment effects. However, because it is not a whole number, it is often

    difficult to remember and to translate to patients.

    To overcome these difficulties, we can take the inverse of the ARR which tells us the number of patients that we'd

    need to treat with the experimental therapy in order to prevent one additional bad event. This is called the number

    needed to treat (NNT) and in our example, the NNT is 15. We can see from the table that the NNT (like the ARR) is

    able to differentiate between small and large treatment effects - in the second row of the table, when the CER and

    EER are very small, the NNT is over 15 million!

    When the treatment increases the risk of adverse events, we can calculate the number of patients that we'd need to

    treat with this therapy to cause one additional bad event and this term is called the number needed to harm (NNH).

    The NNH is calculated as 1/ARI.
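The three measures discussed above follow directly from the two event rates. A minimal sketch, using the NASCET figures from the first row of the table (function and variable names are mine):

```python
def treatment_effects(cer, eer):
    """Relative risk reduction, absolute risk reduction and number
    needed to treat, from control and experimental event rates."""
    arr = abs(eer - cer)   # absolute risk reduction
    rrr = arr / cer        # relative risk reduction
    nnt = 1 / arr          # number needed to treat
    return rrr, arr, nnt

# NASCET, any stroke at 5 years (first row of the table above)
rrr, arr, nnt = treatment_effects(cer=0.264, eer=0.198)
print(round(rrr, 2), round(arr, 3), round(nnt))  # 0.25 0.066 15
```

Running the same function on the second row's tiny rates leaves the RRR at 25% but pushes the NNT past 15 million, which is exactly the discrimination the ARR and NNT provide and the RRR does not.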

    How big should an NNT be for us to be impressed? Consider some examples. We'd need to treat 40 people who

    have suspected MI with aspirin to prevent 1 additional death. And, we'd only need to treat 20 people who have

    suspected MI with aspirin and thrombolysis to prevent 1 additional death. If you want to see more examples of

    NNTs, please click here.

    What is the precision of the treatment effect?

    The confidence interval around the NNT can be calculated as the inverse of the confidence interval for the ARR. The

    smaller the number of patients who have the event of interest, the wider the confidence interval.
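One common way to compute that interval is the normal approximation to the standard error of the ARR; the NNT interval is then the inverse of the ARR interval with the bounds swapped. The sketch below assumes that approach, and the group sizes are illustrative, not NASCET's:

```python
import math

def arr_confidence_interval(cer, n_control, eer, n_experimental, z=1.96):
    """95% CI for the ARR via the normal approximation; the NNT CI
    is the inverse of the ARR CI (bounds swap)."""
    arr = abs(eer - cer)
    se = math.sqrt(cer * (1 - cer) / n_control +
                   eer * (1 - eer) / n_experimental)
    low, high = arr - z * se, arr + z * se
    return (low, high), (1 / high, 1 / low)

# Illustrative group sizes of 600 per arm:
(arr_lo, arr_hi), (nnt_lo, nnt_hi) = arr_confidence_interval(0.264, 600, 0.198, 600)
```

Note that with smaller groups the lower ARR bound can cross zero, in which case the NNT interval includes infinity; this is the "wider confidence interval" the text warns about when few patients have the event.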

Can you apply this valid, important evidence about therapy in caring for your patient?

    Do these results apply to your patient?

    Is your patient so different from those in the study that its results cannot apply?

    Is the treatment feasible in your setting?

    What are your patient's potential benefits and harms from the therapy?

http://ktclearinghouse.ca/cebm/glossary/nnt

Method I: f = risk of the outcome in your patient, relative to patients in the trial, expressed as a decimal: ______

    NNT/f = ______ / ______ = ______ (NNT for patients like yours)

    Method II: 1/(PEER x RRR)

    Your patient's expected event rate if they received the control treatment (PEER) = ______

    1/(PEER x RRR) = 1/________ = ______ (NNT for patients like yours)
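Both worksheet methods are one-line calculations; a sketch with hypothetical numbers (the patient values are placeholders for whatever you would fill into the blanks above):

```python
def nnt_method1(nnt_trial, f):
    """Method I: divide the trial NNT by f, the patient's risk of the
    outcome relative to the trial patients (f > 1 means higher risk)."""
    return nnt_trial / f

def nnt_method2(peer, rrr):
    """Method II: 1/(PEER x RRR), where PEER is the patient's expected
    event rate on the control treatment."""
    return 1 / (peer * rrr)

# Hypothetical patient at twice the trial risk, with the trial NNT of 15:
print(nnt_method1(15, 2))       # 7.5
# Hypothetical PEER of 0.5 and RRR of 0.25:
print(nnt_method2(0.5, 0.25))   # 8.0
```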

Are your patient's values and preferences satisfied by the regimen and its consequences?

    Do your patient and you have a clear assessment of their values and preferences?

    Are they met by this regimen and its consequences?


    Tips on critical appraisal of evidence: Harm

    Clinical scenario: Man with extrasystoles on sotalol

    Are the results of this study valid?

    Evidence about harm can come from a number of different study types. Ideally we'd like to see a high quality

    systematic review of randomised trials but these aren't easy to find because RCTs aren't always feasible for issues of

    harm. As a result, we usually find evidence about harm in cohort studies (groups of patients who are and aren't

    exposed to the treatment are followed up for the outcome of interest) and case-control studies (patients with the

    outcome of interest are matched with patients without the outcome and investigators look retrospectively to

    determine exposure). Case-control studies are useful when the outcome of interest is rare or when the required

    follow-up is long. The strength of inference that can be drawn from a case-control study is limited because they are

    more susceptible to bias.

    Returning to our clinical scenario from the question formulation tutorial:

    You see a 50 year old man who asks for a repeat prescription of sotalol which he has been taking for extrasystoles

    for several years. He has a remote history of an MI. You haven't seen him previously and are concerned about the

    proarrhythmic properties of sotalol given what is known about other antiarrhythmics.

    During the tutorial on clinical questions we formulated the question: In a man with extrasystoles and a remote

    history of MI, does treatment with sotalol increase his risk of death?

Searching the literature we found an RCT in the Lancet (1996;348:7-12).

    How do we critically appraise this harm paper? We'll start off by considering validity first and the following list

    outlines the questions that we need to consider when deciding if a harm paper is valid.

1. Were there clearly defined groups of patients, similar in all important ways other than exposure to the treatment or other cause?
2. Were treatments/exposures and clinical outcomes measured in the same ways in both groups? (Was the assessment of outcomes either objective or blinded to exposure?)
3. Was the follow-up of the study patients sufficiently long (for the outcome to occur) and complete?
4. Do the results of the harm study fulfil some of the diagnostic tests for causation?
       o Is it clear that the exposure preceded the onset of the outcome?
       o Is there a dose-response gradient?
       o Is there any positive evidence from a 'dechallenge-rechallenge' study?
       o Is the association consistent from study to study?
       o Does the association make biological sense?

Were there clearly defined groups of patients, similar in all important ways other than exposure to the treatment or other cause?

http://www.thelancet.com/newlancet/reg/issues/vol348no9019/menu_NOD3.html

    Consider the following table:

                                                Adverse Event
                                                Present (Case)   Absent (Control)   Totals

    Exposure to treatment (RCT or cohort)       a                b                  a + b
    No exposure to treatment (RCT or cohort)    c                d                  c + d

    Totals                                      a + c            b + d              a + b + c + d

1. This first question is easy to answer if we've been able to find an RCT during our search. Randomisation should make the 2 groups of patients similar for all causes of the outcome that we are interested in. In an

    RCT, patients in the experimental treatment group would be in cells a or b in the table above and patients in

    the control group would be in cells c or d.

2. Returning to our clinical scenario, we have been fortunate in our search and have managed to find an RCT and are satisfied that patients are similar in all important ways other than exposure to sotalol.

3. However, there's not always an RCT available to answer our questions and indeed more frequently we find cohort or case control studies to answer our questions about harm and etiology. In a cohort study, 2 groups

    of patients are followed - one group with the exposure to the treatment (a+b in the table) and one group

    without the exposure (c+d) - for the development of the outcome of interest (either a or c ). Because the

    decision about who receives treatment is not randomised, exposed patients may differ from nonexposed

    patients for important determinants of the outcome (these determinants are called confounders).

    Investigators should document characteristics of patients and either show that they are similar or adjust for

    the confounders that they identify. This is limited by the fact that investigators can only adjust for

    confounders that are known and that have been measured.

4. In case control studies, people with the outcome of interest (cases = a+c) are identified along with those without it (controls = b+d). The proportion of each group who were exposed to the putative agent is

    assessed. Case control studies are susceptible to more bias than cohort studies because confounders that are

    transient or that lead to early death won't get measured. We also need to ensure when reading a case control

    study, that people in the control group had the same opportunity for exposure as people in the case group.

    For example, if we found a case control study looking at the association between sotalol and sudden cardiac

    death and its investigators assembled people with sudden cardiac death as the cases but excluded patients

    with atrial fibrillation from the control group, we'd be concerned that the association found between sotalol

    and sudden cardiac death could be spurious.

    Were treatments/exposures and clinical outcomes measured in the same ways in both groups? (Was the assessment of outcomes either objective or blinded to exposure?)

    The application of explicit criteria for the outcomes of interest, a discussion of how they were applied and evidence that they were applied without knowledge of which group the patient was in is important. Blinding is crucial if any judgment is required to assess the outcome (in RCTs and cohort studies) or the exposure (in case control studies). For example, an unblinded investigator may search more aggressively for outcomes in people with exposure to the putative agent. Similarly, people with the adverse outcome may be more likely to have brooded about their situation and may have greater incentive to recall possible exposure. Therefore we would want patients and interviewers to be blind to the study hypothesis.


    In the RCT that we retrieved, the outcome was death, which was measured the same way in both groups.

    Was the follow-up of the study patients sufficiently long (for the outcome to occur) and complete?

    If follow-up is short, it may be that too few study patients will have the outcome of interest, thus providing little information of use to a patient. For example, if investigators were looking at the association between cancer and a particular agent and the follow-up time was 1 month, this would be too short for the investigators to see a clinically important effect.

    The more people who are unavailable for follow-up, the less accurate the estimate of the risk of the outcome is.

    Losses may occur because patients are too ill (or too well) to be followed or may have died, and the failure to

    document these losses threatens the validity of the study.

    The RCT that we found was stopped early because an increased risk of death was noted.

    Do the results of the harm study fulfil some of the diagnostic tests for causation?

    Is it clear that the exposure preceded the onset of the outcome?

    We'd want to make sure that the exposure occurred before the outcome and that it wasn't just a marker that the

    outcome was already underway. With an RCT, the exposure clearly precedes the outcome as with the trial that we

    found. If it's a case control study, this question becomes more difficult to answer, and more important to ascertain.

    Is there a dose-response gradient?

    With larger doses of the agent, was there an increased risk of the outcome event? In the study we retrieved, this

    wasn't tested since the investigators looked at one dose of sotalol.

    Is there positive evidence from a dechallenge-rechallenge study?

    This occurs when the outcome event disappears (or decreases in intensity) when the putative agent is withdrawn

    and reappears when it is reinstituted. This couldn't be done in the RCT we found because the outcome was death.

    Is the association consistent from study to study?

    Or, is this the only study where the association has been identified? We would be happy to see that several studies

    have looked at this question and have come to the same conclusion (or even better, if there was a systematic review

    of the topic). Only 1 RCT has had sufficient power to look at the use of sotalol and the risk of death.

    Does the association make biological sense?

    If the association between outcome and exposure makes biological sense, a causal relationship is more plausible.

    The results of the sotalol RCT are consistent with findings from studies that have looked at other antiarrhythmics (e.g. CAST).


    If the study fails any of the above criteria, we need to decide if the flaw is significant and threatens the validity of

    the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we

    found satisfies all of the above criteria and we will proceed to assessing it for importance.

    Are the results of this study important?

    What is the magnitude and precision of the association between the exposure and the outcome?

    Let's begin by drawing a 2x2 table using the data from the RCT that we found.

                                 Adverse Event
                        Present (Case)   Absent (Control)   Totals
    Experimental group
    (d-sotalol)            78   (a)         1471   (b)      1549   (a+b)
    Control group
    (placebo)              48   (c)         1524   (d)      1572   (c+d)
    Totals                126   (a+c)       2995   (b+d)    3121   (a+b+c+d)

    For RCTs and cohort studies, we look at the risk of the event in the treatment group relative to the risk of the event in the untreated patients. This 'relative risk' is calculated as:

    RR = [ a/(a+b) ] / [ c/(c+d) ]

    Using the values in the table, the relative risk for death in patients receiving d-sotalol is:

    RR = [ 78/1549 ] / [ 48/1572 ]

    = 1.65

    Case control studies sample outcomes, not exposures, and therefore we can't calculate the relative risk. Instead, the

    strength of association is estimated indirectly using the odds ratio = ad/bc.
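    These calculations are easy to script as a sanity check. The sketch below (variable names are mine) recomputes the relative risk from the 2x2 table above and, for comparison, the odds ratio that a case control design would estimate from the same counts:

    ```python
    # Cell counts from the 2x2 table above (d-sotalol RCT)
    a, b = 78, 1471   # experimental group: event, no event
    c, d = 48, 1524   # control group: event, no event

    # Relative risk: RR = [ a/(a+b) ] / [ c/(c+d) ]
    rr = (a / (a + b)) / (c / (c + d))
    print(f"RR = {rr:.2f}")   # -> RR = 1.65

    # Odds ratio: OR = ad/bc (what a case control study would report)
    odds_ratio = (a * d) / (b * c)
    print(f"OR = {odds_ratio:.2f}")   # -> OR = 1.68
    ```

    Note that the OR (1.68) is close to, but not identical with, the RR (1.65); the two diverge as the event becomes more common.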

    How big should the relative risk (RR) or odds ratio (OR) be for us to be impressed by it? OR and RR > 1 indicate that

    there is an increased risk of the adverse outcome with the exposure. Because cohort studies and case control studies

    are susceptible to many biases, we need to ensure that the OR/RR is greater than that which could occur from bias

    alone. We also need to look at the confidence interval around the OR and RR to see how precise the estimate is.

    A more clinically useful measure than the OR and RR is the number of patients that we'd need to treat with the

    putative agent in order to cause 1 additional harmful event (number needed to harm or NNH). Using the OR, the

    NNH can be calculated as:


    NNH = [ PEER (OR-1) + 1 ] / [ PEER (OR-1) x (1-PEER) ]

    Where PEER = the patient's expected event rate

    Alternatively, we can refer to the tables below for this information. We can see from these tables that for different

    PEER, the same OR can generate very different NNHs.
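    Rather than reading the tables, the NNH can be computed directly from the formula above. A minimal sketch (the helper name is my own; table entries appear to be rounded up to whole patients):

    ```python
    import math

    def nnh_from_or(odds_ratio, peer):
        """NNH = [ PEER(OR-1) + 1 ] / [ PEER(OR-1) x (1-PEER) ], for OR > 1."""
        x = peer * (odds_ratio - 1)
        return (x + 1) / (x * (1 - peer))

    # Reproduce a few cells of the OR > 1 table for OR = 2
    for peer in (0.05, 0.10, 0.20):
        print(peer, math.ceil(nnh_from_or(2.0, peer)))   # -> 23, 13, 8
    ```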

    When OR < 1:

    For Odds Ratios LESS than 1 (adapted from John Geddes, 1999)

    PEER    OR: 0.9   0.8   0.7   0.6   0.5   0.4   0.3
    0.05        209   104    69    52    41    34    29
    0.10        110    54    36    27    21    18    15
    0.20         61    30    20    14    11    10     8
    0.30         46    22    14    10     8     7     5
    0.40         40    19    12     9     7     6     4
    0.50         38    18    11     8     6     5     4
    0.70         44    20    13     9     6     5     4
    0.90        101    46    27    18    12     9     4

    When OR > 1:

    For Odds Ratios GREATER than 1 (adapted from John Geddes, 1999)

    PEER    OR: 1.1  1.25   1.5  1.75     2  2.25   2.5
    0.05        212    86    44    30    23    18    16
    0.10        113    46    24    16    13    10     9
    0.20         64    27    14    10     8     7     6
    0.30         50    21    11     8     7     6     5
    0.40         44    19    10     8     6     5     5
    0.50         42    18    10     8     6     5     4
    0.70         51    23    13    10     9     8     7
    0.90        121    55    33    25    22    19    18

    We can also convert the RR to an NNT/NNH using the following equations:

    For RR < 1: NNT = 1 / [ (1-RR) x PEER ]

    For RR > 1: NNT (or NNH) = 1 / [ (RR-1) x PEER ]

    Using the PEER (3.1%) from the study we found and the RR (1.65) that we calculated, the NNH for death from d-sotalol in the study is:

    NNH = 1 / [ (1.65 - 1) x 0.031 ] ≈ 50

    Therefore we would need to treat 50 people with d-sotalol to cause 1 additional death. We can also calculate the

    confidence interval around this estimate using the inverse of the confidence interval for the absolute risk increase.
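    The same arithmetic, scripted as a check (a sketch; the placebo event rate from the trial serves as the PEER):

    ```python
    # Counts from the d-sotalol trial
    a, b = 78, 1471   # d-sotalol: deaths, survivors
    c, d = 48, 1524   # placebo: deaths, survivors

    peer = c / (c + d)            # control event rate, about 3.1%
    rr = (a / (a + b)) / peer     # relative risk, about 1.65

    # NNH for RR > 1: 1 / [ (RR - 1) x PEER ]
    nnh = 1 / ((rr - 1) * peer)
    print(round(nnh))   # -> 50

    # Equivalently, NNH is the inverse of the absolute risk increase
    ari = a / (a + b) - c / (c + d)
    print(round(1 / ari))   # -> 50
    ```

    The two routes agree because (RR - 1) x PEER is algebraically the same quantity as the absolute risk increase.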

    Should these valid, potentially important results change the treatment

    of your patient?

    Is your patient so different from those in the study that its results don't apply?

    What are your patient's risks of the adverse event?

    To calculate the NNH (number of patients you need to treat to harm one of them) for any odds ratio (OR) and your patient's expected event rate for this adverse event if they were not exposed to this treatment (PEER):

    NNH = [ PEER (OR-1) + 1 ] / [ PEER (OR-1) x (1-PEER) ]


    What are your patient's preferences, concerns and expectations from this treatment?

    What alternative treatments are available?


    Tips on critical appraisal of evidence: Prognosis

    Clinical scenario: Man with a history of a stroke who is concerned about his risk of seizure

    Are the results of this study valid?

    Information about prognosis can come from a variety of study types. Cohort studies (investigators follow 1 or more

    groups of individuals over time and monitor for the occurrence of the outcome of interest) are the best source of

    evidence about prognosis. Randomised control trials can also provide information about prognosis although trial

    participants may not be representative of the population with the disorder. Case-control studies (investigators

    retrospectively determine prognostic factors by defining the exposure of cases who have already experienced the

    outcome of interest and of controls who haven't) are useful when the outcome of interest is rare or when the

    required follow-up is long. The strength of inference that can be drawn from a case-control study is limited because

    they are more susceptible to bias.

    Returning to our clinical scenario from the question formulation tutorial:

    You see a 70 year old man in your outpatient clinic 3 months after he was discharged from your service with an

    ischemic stroke. He is in sinus rhythm, has mild residual left-sided weakness but is otherwise well. His only

    medication is ASA and he has no allergies. He recently saw an article on the BMJ website describing the risk of

    seizure after a stroke and is concerned that this will happen to him.

    In the tutorial on clinical questions, we formulated the following question: In a 70 year old man does a history of

    stroke increase his risk for seizure?

    Our search of the literature to answer this question retrieved an article from the BMJ(1997;315:1582-7).

    How do we critically appraise this prognosis paper? We'll start by considering validity first and the following list outlines the questions that we need to consider when deciding if a prognosis paper is valid.

    1. Was a defined, representative sample of patients assembled at a common (usually early) point in the course of their disease?
    2. Was patient follow-up sufficiently long and complete?
    3. Were objective outcome criteria applied in a "blind" fashion?
    4. If subgroups with different prognoses are identified:
       o Was there adjustment for important prognostic factors?
       o Was there validation in an independent group of "test-set" patients?

    Was a defined, representative sample of patients assembled at a common

    (usually early) point in the course of their disease?

    We hope to find that the individuals included in the study are representative of the underlying population (and

    reflect the spectrum of illness). But, from what point in the target disorder should patients be followed? Above, we

    state 'usually early' implying an inception cohort (a group of people who are assembled at an early point in their

    disease), but clinicians may want information about prognosis in later stages of a target disorder. Thus, a study that


    assembled patients at a later point in the disease may provide useful information. However, if observations are

    made at different points in the course of disease for various people in the cohort, the relative timing of outcome

    events would be difficult to interpret. Thus, the ideal cohort is one in which participants are all at a similar stage in

    the course of the same disease.

    Returning to the paper we found, the study included patients who were entered after their first stroke. Further details on entry

    procedures aren't included in the study.

    Was patient follow-up sufficiently long and complete?

    Ideally, we'd like to see a follow-up period for a study that lasts until every patient recovers or has one of the other

    outcomes of interest, or until the elapsed time of observation is of clinical interest to clinicians or patients. If follow-

    up is short, it may be that too few study patients will have the outcome of interest, thus providing little information

    of use to a patient.

    The more patients who are unavailable for follow-up, the less accurate the estimate of the risk of the outcome.

    Losses may occur because patients are too ill (or too well) to be followed or may have died, and the failure to

    document these losses threatens the validity of the study. Sometimes, however, losses to follow-up are unavoidable and unrelated to prognosis. Although an analysis showing that the baseline demographics of these patients are similar to those followed up provides some reassurance that certain types of participants were not selectively lost, such an analysis is limited by those characteristics that were measured at baseline. Investigators cannot control for unmeasured traits that may be important prognostically, and that may have been more or less prevalent in the lost participants than in the followed-up participants. Most evidence-based journals of secondary publication (like ACP Journal Club and Evidence Based Medicine) require at least 80% follow-up for a prognosis study to be considered valid.

    In the study we retrieved, follow-up was sufficiently complete and patients were followed from 2 to 6.5 years.

    Were objective outcome criteria applied in a "blind" fashion?

    We need to assess whether and how explicit criteria for each outcome of interest were applied and if there is

    evidence that they were applied without knowledge of the prognostic factors under consideration. Blinding is

    crucial if any judgement is required to assess the outcome because unblinded investigators may search more

    aggressively for outcomes in people with the characteristic(s) felt to be of prognostic importance than in other

    individuals. Blinding may be unnecessary if the assessments are preplanned for all patients and/or are unequivocal,

    such as total mortality. However, judging the underlying cause of death is difficult and requires blinding to the

    presence of the risk factor to ensure that it is unbiased.

    In the study we identified, patients were asked at follow-up if they had a seizure and if they said "yes", a study neurologist

    subsequently assessed them. It is unclear if the study neurologist was "blind".

    If subgroups with different prognoses are identified, was there adjustment for

    important prognostic factors and was there validation in an independent, "test

    set" of patients?

    We often want to know if patients with certain characteristics will have a different prognosis. For example, are

    patients with an intracranial hemorrhage at increased risk of seizure? Demographic, disease-specific or comorbid


    variables that are associated with the outcome of interest are called prognostic factors. They need not be causal but

    must be strongly enough associated with the development of an outcome to predict its occurrence.

    The identification of a prognostic factor for the first time could be the result of a chance difference in its distribution

    between patients with different prognoses. Therefore, the initial patient group in which the variable was identified

    as a prognostic factor may be considered to be a training set or a hypothesis generation set. Indeed, if investigators

    were to search for multiple potential prognostic factors in the same data set, a few would likely emerge on the basis

    of chance alone. Ideally, therefore, data from a second independent patient group, or a "test set", would be required to confirm the importance of a prognostic factor. Although this degree of evidence has often not been collected in the past, an increasing number of reports are describing a second, independent study validating the predictive power of prognostic factors. If a second, independent study validates these prognostic factors, it can be called a clinical prediction guide.

    In the study we found, the investigators looked at patients with different stroke types and identified that patients in these

    groups had different risks of seizures. This was not tested in an independent group of patients to see if it holds true.

    If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of

    the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we

    found satisfies all of the above criteria and we will proceed to assessing it for importance.

    Are the results of this study important?

    How likely are the outcomes over time?

    Typically, results of prognosis studies are reported in one of three ways: as a percentage of the outcome of interest

    at a particular point in time (e.g. 1 year survival rates), as median time to the outcome (e.g. the length of follow-up

    by which 50% of patients have died) or as event curves (e.g. survival curves) that illustrate, at each point in time, the

    proportion of the original study sample who have not yet had a specified outcome.

    From the study we found, the risk of seizure after any type of stroke is 5.7% at 1 year.

    How precise is this prognostic estimate?

    The precision of the estimate is best reflected by its 95% confidence interval; the range of values within which we

    can be 95% sure that the population value lies. The narrower the confidence interval, the more precise is the

    estimate. If survival over time is the outcome of interest, earlier follow-up periods usually include results from more

    patients than later periods, so that survival curves are more precise (i.e. have narrower confidence intervals) earlier

    in follow-up.

    To calculate the 95% confidence interval for the study we identified, we can use the following equation:

    95% Confidence Interval = p +/- 1.96 x SE

    where:

    Standard Error (SE) = √[ p x (1-p) / n ]


    where 'p' is the proportion of people with the outcome of interest and 'n' is the sample size.

    From the study, n = 675 and p = 0.057

    SE = √[ 0.057 x (1 - 0.057) / 675 ] = 0.009

    Therefore the 95% CI is:

    0.057 +/- 1.96 x 0.009 = 3.9% to 7.5%
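    This confidence interval is easy to verify in a few lines (a sketch using the study's numbers and the usual normal approximation; the quoted 3.9% to 7.5% comes from rounding the SE to 0.009 before multiplying):

    ```python
    import math

    n = 675     # sample size from the study
    p = 0.057   # proportion with a seizure at 1 year

    se = math.sqrt(p * (1 - p) / n)   # standard error of a proportion
    lower, upper = p - 1.96 * se, p + 1.96 * se
    print(f"SE = {se:.3f}")           # -> SE = 0.009
    print(f"95% CI: {lower:.3f} to {upper:.3f}")   # close to 3.9% to 7.5%
    ```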

    Can you apply this valid, important evidence about prognosis in

    caring for your patient?

    Were the study patients similar to your own?

    Will this evidence make a clinically important

    impact on your conclusions about what to offer

    or tell your patient?

    Source: http://ktclearinghouse.ca/cebm/practise/ca
