
Open access, freely available online

Why Most Published Research Findings Are False

John P. A. Ioannidis

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.

PLoS Medicine | www.plosmedicine.org

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1-3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6-8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9-11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. "Negative" research is also very useful. "Negative" is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of true relationships to no relationships among those tested in the field. R

is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 - β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 - β)R/(R - βR + α). A research finding is thus

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Competing Interests: The author has declared that no competing interests exist.

DOI: 10.1371/journal.pmed.0020124

August 2005 | Volume 2 | Issue 8 | e124


more likely true than false if (1 - β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 - β)R > 0.05.

Table 1. Research Findings and True Relationships

Research Finding | True Relationship: Yes | No | Total
Yes | c(1 - β)R/(R + 1) | cα/(R + 1) | c(R + α - βR)/(R + 1)
No | cβR/(R + 1) | c(1 - α)/(R + 1) | c(1 - α + βR)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c

What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been research findings, but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 - β]R + uβR)/(R + α - βR + u - uα + uβR), and PPV decreases with increasing u, unless 1 - β ≤ α, i.e., 1 - β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.

Table 2. Research Findings and True Relationships in the Presence of Bias

Research Finding | True Relationship: Yes | No | Total
Yes | c([1 - β]R + uβR)/(R + 1) | c(α + u[1 - α])/(R + 1) | c(R + α - βR + u - uα + uβR)/(R + 1)
No | (1 - u)cβR/(R + 1) | (1 - u)c(1 - α)/(R + 1) | c(1 - u)(1 - α + βR)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c

Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to bury significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also, reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 - β^n)/(R + 1 - [1 - α]^n - Rβ^n) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 - β < α, i.e., typically 1 - β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term β^n is replaced by the product of the terms βi for i = 1 to n, but inferences are similar.

Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 - β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15].

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3-20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1-1.5) [7]. Modern epidemiology is increasingly obliged to target smaller


effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs,

such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV.
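The three PPV expressions used so far (the simple form, the bias-adjusted form, and the multiple-teams form) can be sketched numerically. A minimal Python sketch; the function names are mine, not from the paper:

```python
def ppv(R, power, alpha=0.05):
    """Post-study probability a claimed finding is true:
    PPV = (1 - beta)R / (R - beta*R + alpha), with power = 1 - beta."""
    beta = 1 - power
    return (1 - beta) * R / (R - beta * R + alpha)

def ppv_bias(R, power, u, alpha=0.05):
    """PPV when a proportion u of would-be 'negative' analyses get
    reported as findings: ([1-b]R + ubR) / (R + a - bR + u - ua + ubR)."""
    beta = 1 - power
    num = (1 - beta) * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

def ppv_n_teams(R, power, n, alpha=0.05):
    """PPV when n equal-power independent studies probe the same question
    and at least one claims significance:
    R(1 - b^n) / (R + 1 - (1-a)^n - R*b^n)."""
    beta = 1 - power
    return R * (1 - beta**n) / (R + 1 - (1 - alpha)**n - R * beta**n)

# A finding is more likely true than false only when (1 - beta)R > alpha:
print(ppv(R=1.0, power=0.8))    # 1:1 pre-study odds, high power
print(ppv(R=0.01, power=0.8))   # long-odds exploratory field
```

Both bias and repeated testing pull the PPV down relative to the simple form whenever power exceeds alpha, matching the inequalities stated in the text.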

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be "negative" results into "positive" results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18-20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia

outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only "best" results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.

Table 3. Research Findings and True Relationships in the Presence of Multiple Studies

Research Finding | True Relationship: Yes | No | Total
Yes | cR(1 - β^n)/(R + 1) | c(1 - [1 - α]^n)/(R + 1) | c(R + 1 - [1 - α]^n - Rβ^n)/(R + 1)
No | cRβ^n/(R + 1) | c(1 - α)^n/(R + 1) | c([1 - α]^n + Rβ^n)/(R + 1)
Total | cR/(R + 1) | c/(R + 1) | c


Figure 1. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, u. Panels correspond to power of 0.20, 0.50, and 0.80. DOI: 10.1371/journal.pmed.0020124.g001

This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive "positive" results. "Negative" results may

become attractive for dissemination only if some other team has found a "positive" association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly



Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available "data mining" packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken.
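The first two figures in Box 1 can be reproduced directly from the formulas in the text; a short sketch (assuming, as in the box, α = 0.05 and 60% power):

```python
R = 10 / 100_000          # 10 true associations among 100,000 polymorphisms
alpha, power = 0.05, 0.60
beta = 1 - power

# Post-study probability for a bare p < 0.05 finding:
ppv = (1 - beta) * R / (R - beta * R + alpha)
print(ppv)                # about 1.2e-3, i.e., ~12-fold the pre-study 1e-4

# Same study, but with bias u = 0.10:
u = 0.10
ppv_u = ((1 - beta) * R + u * beta * R) / (
    R + alpha - beta * R + u - u * alpha + u * beta * R)
print(ppv_u)              # about 4.4e-4
```

Even the "12-fold increase" case leaves the claim overwhelmingly likely to be false, which is the point of the box.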

Figure 2. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, n. Panels correspond to power of 0.20, 0.50, and 0.80.

alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field

is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to "correct" the low power of single studies, is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31],

PPV for each claimed relationship is extremely low, even with considerable



standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a "null field," one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between "null fields," the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.
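The null-field argument can be illustrated with a toy simulation; all numbers below (the bias shift, the per-study noise) are illustrative assumptions of mine, not values from the essay. Sixty truly null log-relative-risks are drawn with sampling noise plus a common bias shift; the average claimed effect then recovers the bias, not any real effect.

```python
import random

random.seed(1)

true_log_rr = 0.0    # a null field: no nutrient truly affects risk
bias_shift = 0.25    # net bias pushed into the literature (illustrative)
se = 0.05            # sampling noise per study (illustrative)

# Each "published" effect = true effect (zero) + bias + chance variation.
observed = [true_log_rr + bias_shift + random.gauss(0, se) for _ in range(60)]
mean_claimed = sum(observed) / len(observed)
print(round(mean_claimed, 2))   # lands near bias_shift, not near the null
```

Averaging across the field cancels the chance variation, so what remains is an estimate of the prevailing bias itself.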

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have

Table 4. PPV of Research Findings for Various Combinations of Power (1 - β), Ratio of True to Not-True Relationships (R), and Bias (u)

1 - β | R | u | Practical Example | PPV
0.80 | 1:1 | 0.10 | Adequately powered RCT with little bias and 1:1 pre-study odds | 0.85
0.95 | 2:1 | 0.30 | Confirmatory meta-analysis of good-quality RCTs | 0.85
0.80 | 1:3 | 0.40 | Meta-analysis of small inconclusive studies | 0.41
0.20 | 1:5 | 0.20 | Underpowered, but well-performed phase I/II RCT | 0.23
0.20 | 1:5 | 0.80 | Underpowered, poorly performed phase I/II RCT | 0.17
0.80 | 1:10 | 0.30 | Adequately powered exploratory epidemiological study | 0.20
0.20 | 1:10 | 0.30 | Underpowered exploratory epidemiological study | 0.12
0.20 | 1:1,000 | 0.80 | Discovery-oriented exploratory research with massive testing | 0.0010
0.20 | 1:1,000 | 0.20 | As in previous example, but with more limited bias (more standardized) | 0.0015
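The PPV column above follows from the bias-adjusted formula in the text; a quick consistency check for a few of the tabulated scenarios (α = 0.05 throughout, as in the paper's simulations):

```python
def ppv_with_bias(power, R, u, alpha=0.05):
    """PPV = ([1-b]R + ubR) / (R + a - bR + u - ua + ubR), with b = 1 - power."""
    b = 1 - power
    return ((1 - b) * R + u * b * R) / (
        R + alpha - b * R + u - u * alpha + u * b * R)

rows = [  # (power, R, u, PPV as tabulated)
    (0.95, 2.0,      0.30, 0.85),    # confirmatory meta-analysis of good RCTs
    (0.20, 1 / 5,    0.20, 0.23),    # underpowered but well-performed trial
    (0.80, 1 / 10,   0.30, 0.20),    # adequately powered exploratory epi study
    (0.20, 1 / 1000, 0.80, 0.0010),  # discovery-oriented massive testing
]
for power, R, u, tabulated in rows:
    print(round(ppv_with_bias(power, R, u), 4), tabulated)
```

Each computed value matches the tabulated PPV to the precision shown in the table.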


spent their careers is a null field. However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure "gold" standard is unattainable. However, there are several approaches to improve the post-study probability.

Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown "gold" standard. However, large studies may still have biases and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32-34].

    Second, most research questionsare addressed by many teams, andit is misleading to emphasize thestatistically significant findings ofany single team. What matters is the



totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values (the pre-study odds) where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established "classics" will fail the test [36].

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

References

1. Ioannidis JP, Haidich AB, Lau J (2001) Any casualties in the clash of randomised and observational evidence? BMJ 322: 879-880.
2. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S (2004) Those confounded vitamins: What can we learn from the differences between observational versus

randomised trial evidence? Lancet 363: 1724-1727.
3. Vandenbroucke JP (2004) When are observational studies as credible as randomised trials? Lancet 363: 1728-1731.
4. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365: 488-492.
5. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG (2001) Replication validity of genetic association studies. Nat Genet 29: 306-309.
6. Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361: 865-872.
7. Ioannidis JP (2003) Genetic associations: False or true? Trends Mol Med 9: 135-138.
8. Ioannidis JPA (2005) Microarrays and molecular research: Noise discovery? Lancet 365: 454-455.
9. Sterne JA, Davey Smith G (2001) Sifting the evidence - What's wrong with significance tests. BMJ 322: 226-231.
10. Wacholder S, Chanock S, Garcia-Closas M, El ghormli L, Rothman N (2004) Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96: 434-442.
11. Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405: 847-856.
12. Kelsey JL, Whittemore AS, Evans AS, Thompson WD (1996) Methods in observational epidemiology, 2nd ed. New York: Oxford University Press. 432 p.
13. Topol EJ (2004) Failing the public health - Rofecoxib, Merck, and the FDA. N Engl J Med 351: 1707-1709.
14. Yusuf S, Collins R, Peto R (1984) Why do we need some large, simple randomized trials? Stat Med 3: 409-422.
15. Altman DG, Royston P (2000) What do we mean by validating a prognostic model? Stat Med 19: 453-473.
16. Taubes G (1995) Epidemiology faces its limits. Science 269: 164-169.
17. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
18. Moher D, Schulz KF, Altman DG (2001) The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357: 1191-1194.
19. Ioannidis JP, Evans SJ, Gotzsche PC, O'Neill RT, Altman DG, et al. (2004) Better reporting of harms in randomized trials: An extension of the CONSORT statement. Ann Intern Med 141: 781-788.
20. International Conference on Harmonisation E9 Expert Working Group (1999) ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials. Stat Med 18: 1905-1942.
21. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354: 1896-1900.
22. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, et al. (2000) Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis of Observational Studies in Epidemiology (MOOSE) group. JAMA 283: 2008-2012.
23. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. (2000) Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry 176: 249-252.
24. Altman DG, Goodman SN (1994) Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions. JAMA 272: 129-132.
25. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291: 2457-2465.
26. Krimsky S, Rothenberg LS, Stott P, Kyle G (1998) Scientific journals and their authors' financial interests: A pilot study. Psychother Psychosom 67: 194-201.
27. Papanikolaou GN, Baltogianni MS, Contopoulos-Ioannidis DG, Haidich AB, Giannakakis IA, et al. (2001) Reporting of conflicts of interest in guidelines of preventive and therapeutic interventions. BMC Med Res Methodol 1: 3.
28. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC (1992) A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268: 240-248.
29. Ioannidis JP, Trikalinos TA (2005) Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 58: 543-549.
30. Ntzani EE, Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: An empirical assessment. Lancet 362: 1439-1444.
31. Ransohoff DF (2004) Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4: 309-314.
32. Lindley DV (1957) A statistical paradox. Biometrika 44: 187-192.
33. Bartlett MS (1957) A comment on D. V. Lindley's statistical paradox. Biometrika 44: 533-534.
34. Senn SJ (2001) Two cheers for P-values? J Epidemiol Biostat 6: 193-204.
35. De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. (2004) Clinical trial registration: A statement from the International Committee of Medical Journal Editors. N Engl J Med 351: 1250-1251.
36. Ioannidis JPA (2005) Contradicted and initially stronger effects in highly cited clinical research. JAMA 294: 218-228.
37. Hsueh HM, Chen JJ, Kodell RL (2003) Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J Biopharm Stat 13: 675-689.
