An Empirical Study of Bias in

Randomized Controlled Trials and

Non-randomized Studies of Surgical Interventions

by

Lakhbir Sandhu

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy in Clinical Epidemiology

Institute of Health Policy, Management & Evaluation, University of Toronto

© Copyright by Lakhbir Sandhu, 2013

Abstract

Objectives: The aim of this dissertation was to examine bias in randomized controlled trials

(RCTs) and non-randomized studies (NRS) in surgery using the literature evaluating

laparoscopy and conventional (i.e. open) surgery for the treatment of colon cancer as a case

study. The objectives were 1) to develop a conceptual framework for bias in comparative

NRS; 2) to compare effect estimates from NRS with those from RCTs at low risk of bias;

3) to explore the impact of NRS-design attributes on estimates of treatment effect.

Methods: The methods included a modified framework synthesis, systematic review of the

literature, random-effects meta-analyses, and frequentist and Bayesian meta-regression. The

Cochrane Risk of Bias Tool was used to classify trials as Strong RCTs (i.e. low risk of bias)

or Typical RCTs (i.e. unclear or high risk of bias).
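As an illustration of the random-effects pooling named in the Methods, the following is a minimal sketch of a DerSimonian-Laird estimate. It assumes this particular estimator, which the abstract does not specify, and the function name `dersimonian_laird` and the input values are hypothetical, chosen only to show the mechanics; they are not data from this dissertation.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log odds ratios) under a random-effects model.

    Assumption: DerSimonian-Laird moment estimator of between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical log odds ratios and their variances (illustration only)
hypothetical_log_or = [-0.9, -0.1, 0.6]
hypothetical_var = [0.04, 0.04, 0.04]
pooled, se, tau2 = dersimonian_laird(hypothetical_log_or, hypothetical_var)
print(f"pooled OR = {math.exp(pooled):.2f}, tau^2 = {tau2:.3f}")
```

When the study effects are identical, the between-study variance is estimated as zero and the result reduces to the fixed-effect estimate.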

Results: A conceptual framework for bias in comparative NRS was developed and it

contains 37 individual sources of bias or “items”. These items were organized within 6

overarching “domains”: selection bias, information bias, performance bias, detection bias,

attrition bias, and selective reporting bias. Our analyses revealed that NRS were associated

with more extreme estimates of benefit for laparoscopy than Strong RCTs when examining

subjective outcomes. The odds ratios from NRS were 36% smaller (i.e. demonstrating more

benefit for laparoscopy) than those from Strong RCTs for the outcome post-operative

complications (Ratio of Odds Ratios, ROR 0.64, [0.42, 0.97], p=0.04). Similar exaggerated

benefit was seen among NRS when assessing length of stay (Difference in Mean

Differences, DMD -2.15 days, [-4.08, -0.21], p=0.03). This pattern was not observed with the

objective outcomes peri-operative mortality and number of lymph nodes harvested. Analyses

adjusted for period effects and between-study case-mix yielded similar findings. Finally,

effect estimates in NRS did not consistently vary according to the presence or absence of

nine design characteristics identified from the conceptual framework.

Conclusions: We have demonstrated that the results of surgical NRS can be significantly

biased compared with those of RCTs at low risk of bias when evaluating subjective

outcomes. However, none of the nine NRS-design characteristics examined was consistently

associated with biased effect estimates.
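The arithmetic behind the reported effect sizes can be checked with a short sketch. It assumes the bracketed intervals are 95% Wald-type intervals (symmetric on the log scale for the ROR and on the natural scale for the DMD), which the abstract does not state; the helper names `p_from_ratio_ci` and `p_from_difference_ci` are hypothetical. Under that assumption, the back-calculated p-values round to the reported 0.04 and 0.03, and an ROR of 0.64 corresponds to odds ratios that are 36% smaller.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_from_ratio_ci(estimate, lower, upper):
    """Assumes the interval is log(estimate) +/- 1.96 * SE (an assumption)."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return two_sided_p(math.log(estimate) / se)

def p_from_difference_ci(estimate, lower, upper):
    """Assumes the interval is estimate +/- 1.96 * SE (an assumption)."""
    se = (upper - lower) / (2 * Z_95)
    return two_sided_p(estimate / se)

# Values as reported in the abstract
ror_p = p_from_ratio_ci(0.64, 0.42, 0.97)            # post-operative complications
dmd_p = p_from_difference_ci(-2.15, -4.08, -0.21)    # length of stay, days
percent_smaller = (1 - 0.64) * 100                   # ROR expressed as a percentage

print(f"ROR p = {ror_p:.3f}, DMD p = {dmd_p:.3f}, {percent_smaller:.0f}% smaller")
```

This is only a consistency check on the published summary numbers; it does not reproduce the meta-regression models themselves.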

Acknowledgments

I would like to express my gratitude to members of my thesis committee for their insight,

support, and guidance throughout this endeavor. Drs. Erin Kennedy and Nancy Baxter, your

feedback has been invaluable. Dr. George Tomlinson, you have spent countless hours

teaching me the finer points of quantitative analysis and simultaneously shared your

enthusiasm for science and discovery. My research supervisor, Dr. David Urbach, has

provided especially unwavering support and encouragement — this dissertation work would

not have been possible without your expertise.

I am thankful to the Surgeon Scientist Training Program, the Division of General Surgery

and the Department of Surgery for supporting my dissertation work and career. In particular,

I would like to thank Drs. Najma Ahmed, Andrew Smith, Zane Cohen, Lorne Rotstein and

James Rutka for their direction.

Thank you to my colleagues and friends who have supported me every step of the way:

Dr. Barbara Haas, Dr. Robert Bisken, Dr. Claire Trottier, Aron Klein, Dr. Lorie Kloda,

Dr. Anna Shawyer, Dr. Anna Bendzak, Dr. Marvin Hsiao, Dr. Charles de Mestral,

Dr. Boris Zevin, Dr. Anusha Jegatheeswaran, Dr. Sapna Rawal, Marina Englesakis and

Dr. Anna Gagliardi. A very special thank you to Suna Girn for her encouragement. Finally,

thank you to my family for their love and encouragement.

Funding

This research would not have been possible without the generous support of the Surgeon

Scientist Training Program, the Department of Surgery, Division of General Surgery and

Faculty of Medicine at the University of Toronto. This work was also supported by the

Post-MD Fellowship Program of the National Cancer Institute of Canada/Canadian Cancer

Society (Grant # 20019) and the Johnson & Johnson Medical Products/Surgeon Scientist

Training Program Fellowship.

I would also like to express my sincere gratitude to Val Cabral and Nancy Condo for their

guidance with identifying funding opportunities.

Table of Contents

Acknowledgments
Funding
Table of Contents
List of Abbreviations
List of Tables
List of Figures
List of Appendices
Thesis Overview

Chapter 1 Literature Review
  1.1 The hierarchy of study design
    1.1.1 The limitations of RCTs
  1.2 The infrequency of surgical trials
  1.3 Barriers to the conduct of surgical trials
    1.3.1 Issues with patient and physician accrual
    1.3.2 Funding surgical research
  1.4 Challenges in surgical trials
    1.4.1 Blinding in surgical RCTs
    1.4.2 Standardizing surgical technique
  1.5 The Balliol Collaboration
  1.6 Do NRS and RCTs yield comparable results?
    1.6.1 Empirical comparisons of effect estimates from NRS and RCTs
  1.7 Study characteristics of RCTs associated with bias
    1.7.1 The methodological shortcomings of surgical RCTs
  1.8 Study characteristics of NRS associated with bias
  1.9 Summary of gaps in knowledge
  1.10 Dissertation rationale
  1.11 Research aims

Chapter 2 Laparoscopic colon surgery – an opportunity to study bias

Chapter 3 Development of a conceptual framework for bias in non-randomized studies: results of a modified framework synthesis
  3.1 Summary
  3.2 Introduction
  3.3 Methods
    3.3.1 Search strategy
    3.3.2 Data collection
    3.3.3 Analytic approach
    3.3.4 Framework refinement
  3.4 Results
    3.4.1 Included studies
    3.4.2 Conceptual framework
    3.4.3 Excluded items
  3.5 Discussion
  3.6 Conclusion

Chapter 4 Common Methods for Chapters 5 & 6
  4.1 Overview
  4.2 Literature search
  4.3 Data abstraction and management
  4.4 Categorizing studies as RCTs or NRS
  4.5 Outcome selection and definition
    4.5.1 Subjective versus objective outcomes
    4.5.2 Summary effect measures
  4.6 Handling multiple publications of the same cohort
  4.7 Approach to missing data for continuous outcomes
  4.8 Identifying a referent group – Strong RCTs
    4.8.1 Why categorize RCTs as Typical versus Strong?
    4.8.2 Cochrane Risk of Bias Tool
    4.8.3 Validating risk of bias assessments
  4.9 Statistical analyses
  4.10 Results
    4.10.1 Data cohort
    4.10.2 Strong RCTs
  4.11 Risk of bias assessment summary

Chapter 5 Comparing effect estimates from non-randomized studies and randomized controlled trials
  5.1 Summary
  5.2 Introduction
  5.3 Methods
    5.3.1 Statistical analyses
  5.4 Results
    5.4.1 Included studies
    5.4.2 Binary outcomes
    5.4.3 Continuous outcomes
  5.5 Discussion
  5.6 Conclusion

Chapter 6 Empirically identifying the study attributes of non-randomized studies associated with bias: a meta-epidemiology study
  6.1 Summary
  6.2 Introduction
  6.3 Methods
    6.3.1 Included studies
    6.3.2 NRS study characteristics
    6.3.3 Statistical analyses
  6.4 Results
    6.4.1 Included studies
    6.4.2 Subjective outcomes
    6.4.3 Objective outcomes
  6.5 Discussion
  6.6 Conclusion

Chapter 7 General Discussion and Future Directions
  7.1 Summary of findings
  7.2 Implications
    7.2.1 Implications for the meta-analysis of surgical RCTs
    7.2.2 Implications for the interpretation of surgical NRS
    7.2.3 Implications for future meta-epidemiological studies of NRS study characteristics
  7.3 Limitations
    7.3.1 Limitations of available data
    7.3.2 Limitations of data analysis
    7.3.3 Limitations of generalizability
  7.4 Future Directions
    7.4.1 Outcome Reporting in NRS and RCTs
    7.4.2 Investigating the relationship between reporting and actual RCT quality
    7.4.3 Ongoing evaluations of NRS study characteristics
  7.5 Conclusions

References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G

List of Abbreviations

ARR – absolute risk reduction

DMD – difference in mean differences

DVT – deep vein thrombosis

IQR – interquartile range

LAP – laparoscopy

LOS – length of stay

MD – mean difference

NRS – non-randomized studies

OPEN – open (i.e. conventional) surgery

OR – odds ratio

RCTs – randomized controlled trials

RoM – ratio of means

ROR – ratio of odds ratios

RD – risk difference

RR – relative risk

SID – study identification number

SMD – standardized mean difference

List of Tables

Table 1.1 Oxford Centre for Evidence-based Medicine – Levels of Evidence for studies of therapy/prevention/aetiology/harm (2009)

Table 1.2 Results of meta-analyses of NRS and RCTs appearing in Concato et al

Table 1.3 Meta-epidemiological studies of RCT-study attributes

Table 1.4 Meta-analyses of meta-epidemiological studies

Table 3.1 Definitions of key constructs

Table 3.2 Characteristics of included studies

Table 3.3 Bias domains extracted from systematic reviews of quality assessment tools for NRS

Table 3.4 Bias domains in the conceptual framework

Table 3.5 Frequency of included items

Table 3.6 Items abstracted from reviews but not related to bias

Table 4.1 Definitions for abstracted variables

Table 4.2 Cochrane Risk of Bias Tool

Table 4.3 Approach for summary assessments of risk of bias for an item, within a study and within a meta-analysis

Table 4.4 Characteristics of included studies

Table 4.5 Non-randomized studies meeting inclusion criteria

Table 4.6 Randomized controlled trials meeting inclusion criteria

Table 4.7 Summary of risk of bias item responses for RCTs reporting post-operative complications

Table 4.8 Summary of risk of bias item responses for RCTs reporting peri-operative mortality

Table 4.9 Summary of risk of bias item responses for RCTs reporting peri-operative mortality

Table 4.10 Summary of risk of bias item responses for RCTs reporting number of lymph nodes harvested

Table 5.1 Characteristics of included studies

Table 5.2 Random-effects meta-analysis results for studies reporting post-operative complications

Table 5.3 Results of Strong Randomized Controlled Trials

Table 5.4 Meta-regression results comparing effect estimates for post-operative complications from different study designs

Table 5.5 Random-effects meta-analysis results for studies reporting peri-operative mortality

Table 5.6 Meta-regression results comparing effect estimates for peri-operative mortality from different study designs

Table 5.7 Random-effects meta-analysis results for studies reporting length of stay (days)

Table 5.8 Meta-regression results comparing effect estimates for length of stay from different study designs

Table 5.9 Random-effects meta-analysis results for studies reporting number of lymph nodes harvested

Table 5.10 Meta-regression results comparing effect estimates for number of lymph nodes harvested from different study designs

Table 5.11 Median year of publication and baseline event rates in studies reporting the outcomes of interest

Table 5.12 Bayesian meta-regression results comparing effect estimates for post-operative complications from different study designs, adjusted for year of publication and baseline event rate

Table 5.13 Bayesian meta-regression results comparing effect estimates for peri-operative mortality from different study designs, adjusted for year of publication and baseline event rate

Table 5.14 Bayesian meta-regression results comparing effect estimates for length of stay from different study designs, adjusted for year of publication and baseline event rate

Table 5.15 Bayesian meta-regression results comparing effect estimates for number of lymph nodes harvested from different study designs, adjusted for year of publication and baseline event rate

Table 6.1 NRS study characteristics – definitions and relationship to the conceptual framework for bias in NRS

Table 6.2 Measures of inter-rater agreement

Table 6.3 Characteristics of included studies

Table 6.4 Distribution of study attributes among NRS reporting post-operative complications (n=79)

Table 6.5 Study characteristics patterns across NRS reporting post-operative complications (n=79 studies)

Table 6.6 Random-effects meta-analyses results among NRS reporting post-operative complications (n=79)

Table 6.7 Univariable meta-regression results among NRS reporting post-operative complications

Table 6.8 Random-effects meta-analysis results across outcomes of interest and different study designs

Table 6.9 Univariable meta-regression results comparing NRS with or without study characteristics with Strong RCTs

Table 6.10 Distribution of study attributes among NRS reporting length of stay (n=106)

Table 6.11 Study characteristics patterns across NRS reporting length of stay (n=106 studies)

Table 6.12 Random-effects meta-analysis results among NRS reporting length of stay (n=106)

Table 6.13 Univariable meta-regression results among NRS reporting length of stay

Table 6.14 Univariable meta-regression results comparing NRS with or without study characteristics with Strong RCTs

Table 6.15 Distribution of study attributes among NRS reporting peri-operative mortality (n=79)

Table 6.16 Study characteristics patterns across NRS reporting peri-operative mortality (n=79 studies)

Table 6.17 Random-effects meta-analysis results among NRS reporting peri-operative mortality (n=79)

Table 6.18 Univariable meta-regression results among NRS reporting peri-operative mortality

Table 6.19 Univariable meta-regression results comparing NRS with or without study characteristics with Strong RCTs

Table 6.20 Distribution of study attributes among NRS reporting number of lymph nodes harvested (n=59)

Table 6.21 Study characteristics patterns across NRS reporting number of lymph nodes harvested (n=59 studies)

Table 6.22 Random-effects meta-analysis results among NRS reporting number of lymph nodes (n=59)

Table 6.23 Univariable meta-regression results among NRS reporting number of lymph nodes harvested

Table 6.24 Univariable meta-regression results comparing NRS with or without study characteristics with Strong RCTs

List of Figures

Figure 1.1 Results from Wood et al

Figure 3.1 Flow diagram of included studies

Figure 3.2 Conceptual framework for bias in non-randomized studies

Figure 4.1 Database Structure

Figure 4.2 Flow diagram for the identification of eligible studies

Figure 5.1 Relationship between observed data, true study effects and the common treatment effect in fixed and random-effects meta-analysis

Figure 5.2 Relationship between the overall true effect (µ), the true effect in a given study (θ) and the observed effect (Yi)

Figure 5.3 Forest plot of meta-analysis results for studies reporting post-operative complications

Figure 5.4 Forest plot of ratios of odds ratios (ROR) from meta-regression analysis comparing study designs

Figure 5.5 Forest plot of meta-analysis results for studies reporting peri-operative mortality

Figure 5.6 Forest plot of ratios of odds ratios (ROR) from meta-regression analysis comparing study designs

Figure 5.7 Forest plot of meta-analysis results for studies reporting length of stay

Figure 5.8 Forest plot of difference in mean differences (DMD) from meta-regression analysis comparing study designs

Figure 5.9 Forest plot of meta-analysis results for studies reporting number of lymph nodes harvested

Figure 5.10 Forest plot of difference in mean differences (DMD) from meta-regression analysis comparing study designs

Figure 5.11 Baseline event rates over time

Figure 5.12 Baseline event rates over time

Figure 5.13 Funnel plots

Figure 6.1 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome post-operative complications

Figure 6.2 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome length of stay

Figure 6.3 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome peri-operative mortality

Figure 6.4 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome number of lymph nodes harvested

List of Appendices

Appendix A – Literature Search Strategy for the Development of a Conceptual Framework of Bias in Non-Randomised Studies

Appendix B – Literature Search Strategy for the Identification of Comparative Studies Evaluating Laparoscopy Versus Conventional Surgery For Colon Cancer

Appendix C – The Cochrane Risk of Bias Tool

Appendix D – Studies of Laparoscopy versus Conventional Surgery for Colon Cancer Meeting a priori Exclusion Criteria

Appendix E – Comparative Studies of Laparoscopy versus Conventional Surgery for Colon Cancer Meeting a priori Inclusion Criteria

Appendix F – Bayesian Models

Appendix G – Bayesian Meta-Analysis Results

Thesis Overview

Chapter 1. Literature Review

The merits of randomized controlled trials (RCTs) and non-randomized studies (NRS) are reviewed.

The challenges of conducting surgical RCTs are discussed.

The literature comparing effect estimates from NRS and RCTs is described. The limitations

of these comparisons are explored. Since significant strides have been made in identifying

the attributes of study design associated with bias among RCTs, I propose that similar studies

are required for NRS.

Chapter 2. Laparoscopic colon surgery – an opportunity to study bias

The proposed case study of bias required the abundance of both NRS and RCTs evaluating a

single surgical intervention. Studies examining the surgical treatment of colon cancer met

these criteria. In this Chapter, I review the history of laparoscopic colon surgery and explain

how case-reports of port site metastases led to controversy in the surgical community. This in

turn led to numerous high-quality, multi-national, publicly funded RCTs in the area. There

are few surgical interventions that have been as thoroughly studied. The literature in this area

was used to conduct the study of bias in this dissertation.

Chapter 3. A conceptual framework for bias in non-randomized studies: results from a modified framework synthesis

As there is no comprehensive framework for bias in NRS, I conducted a modified framework

synthesis to develop one. Sources of bias were extracted from systematic reviews of quality

assessment tools for NRS. These sources of bias were analyzed thematically and organized

into a framework for bias in comparative NRS.

Chapter 4. Common Methods for Chapters 5 & 6

A common data set is used for the analyses in Chapters 5 and 6. Chapter 4 outlines the

literature search strategy, inclusion/exclusion criteria and approaches to data abstraction and

outcome selection. The Cochrane Risk of Bias Tool was employed to categorize RCTs as

“Strong” (i.e. at low risk of bias) versus “Typical” (i.e. unclear and high risk of bias). The

strengths and weaknesses of this approach are described.

Chapter 5. Comparing effect estimates from non-randomized studies and randomized controlled trials

Combined effect estimates from NRS were compared with those from i) all RCTs, ii) Typical

RCTs and iii) Strong RCTs evaluating laparoscopic and open colon surgery. The impact of

period effects and between-study case-mix was explored using Bayesian meta-regression

methods.

Chapter 6. Empirically identifying the study attributes of non-randomized studies associated with bias: a meta-epidemiology study

Using the conceptual framework developed in Chapter 3, a meta-epidemiology study was

conducted to examine the relationship between NRS-design characteristics and effect

estimates. Effect estimates were compared across NRS with and without specific design

characteristics. These estimates were in turn compared with the results of Strong RCTs.

Chapter 7. General Discussion and Future Directions

In this chapter, I summarize the main findings of this dissertation. The methodological

strengths and limitations of this work are described. The implications of this dissertation are

discussed and opportunities for future research are explored.

Chapter 1 Literature Review

1.1 The hierarchy of study design

The evidence from randomized controlled trials (RCTs) is the gold-standard against which

all other study designs are compared because RCTs are considered inherently less biased

than non-randomized studies (NRS) (Sackett and Sackett 1991). The process of

randomization ensures that all patients have an equal probability of receiving the

intervention. In contrast, treatment decisions in NRS are seldom determined by a study

protocol. Instead, patients and their physicians weigh the pros and cons of receiving

treatment. Physicians recommend therapy based on a patient’s likelihood of success.

Numerous patient characteristics that sway treatment decisions may also influence outcome.

Such variables are referred to as confounders; a confounder i) is associated with the exposure,

ii) is an independent determinant of outcome, and iii) is not an intermediate in the causal

pathway (Fletcher and Fletcher 2005). For example, consider a hypothetical NRS

comparing treatment A with treatment B. More deaths are observed in the first group. If

older patients were more likely to get treatment A, did age confound the relationship between

the exposure and the outcome (death)? Randomization balances both known and unknown

confounders in RCTs. In NRS however, treatment assignment is not random and so NRS are

more prone to bias arising from confounding.
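
The hypothetical comparison of treatments A and B can be made concrete with a small numerical sketch. The counts below are invented purely for illustration (they do not come from any study cited here): older patients are assumed to die more often regardless of treatment and to receive treatment A more often. Under those assumptions, the crude comparison suggests that A is harmful even though the risk of death is identical for A and B within each age group.

```python
# Hypothetical (deaths, patients) counts, chosen only to illustrate confounding by age.
STRATA = {
    "older":   {"A": (16, 80), "B": (4, 20)},   # 20% mortality in both arms
    "younger": {"A": (1, 20),  "B": (4, 80)},   # 5% mortality in both arms
}


def risk(deaths, patients):
    """Proportion of patients who died."""
    return deaths / patients


def stratum_risk_ratios(strata):
    """Risk ratio (A versus B) computed separately within each age stratum."""
    return {
        name: risk(*arms["A"]) / risk(*arms["B"])
        for name, arms in strata.items()
    }


def crude_risk_ratio(strata):
    """Risk ratio (A versus B) after pooling all patients and ignoring age."""
    pooled = {}
    for arm in ("A", "B"):
        deaths = sum(arms[arm][0] for arms in strata.values())
        patients = sum(arms[arm][1] for arms in strata.values())
        pooled[arm] = risk(deaths, patients)
    return pooled["A"] / pooled["B"]


if __name__ == "__main__":
    print("Crude risk ratio (A vs B):", round(crude_risk_ratio(STRATA), 3))
    print("Age-specific risk ratios:", stratum_risk_ratios(STRATA))
```

With these invented counts the crude risk ratio is 17/8 (about 2.1), whereas both age-specific risk ratios equal 1.0. Randomization would be expected to distribute older and younger patients evenly between arms, so that the crude and stratified comparisons agree on average.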

NRS are accordingly regarded as an inferior study design in all systems rating quality of

evidence (Hadorn et al. 1996; Evans 2003). The best known of these systems, the Oxford

Centre for Evidence-based Medicine Evidence Hierarchy, was developed in 1998 and

updated in 2009 (Oxford Centre for Evidence-Based Medicine 2009). In this tool, systematic

reviews of RCTs (level 1a) appear at the top of the hierarchy (Table 1.1). At the other end of

the spectrum, level 5 evidence is based on expert opinion, studies of physiology or “first

principles.”

Table 1.1 Oxford Centre for Evidence-based Medicine – Levels of Evidence for studies of therapy/prevention/ aetiology/harm (2009).

Level   Evidence
1a      Systematic review (with homogeneity§) of RCTs
1b      Individual RCT (with narrow confidence interval)
1c      All or none case-series†
2a      Systematic review (with homogeneity§) of cohort studies
2b      Individual cohort study (including low quality RCT; e.g., <80% follow-up)
2c      “Outcomes” research; ecological studies
3a      Systematic review (with homogeneity§) of case-control studies
3b      Individual case-control study
4       Case-series (and poor-quality cohort and case-control studies)
5       Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”

§ A review that is free of heterogeneity in the directions and degrees of results between individual studies.

† Met when all patients died before the treatment became available, but some now survive on it; or when some patients died before the treatment became available, but none now die on it.

Adapted from www.cebm.net

Whereas evidence hierarchies traditionally focus on study design, strength of evidence

systems (e.g. the GRADE guidelines) also incorporate other considerations such as the

quantity of evidence, the consistency of results and precision (Owens et al. 2010). These

systems also regard high-quality RCTs as the best source of evidence for the evaluation of

interventions (Atkins et al. 2004; Guyatt et al. 2008).

1.1.1 The limitations of RCTs

The aim of any RCT is to generate an estimate of treatment effect that is both accurate and

precise. However, there are a number of limitations of this study design. First, not all clinical

questions can be investigated via RCTs. This is especially true when studying exposure to

harms such as smoking. RCTs also yield limited information about adverse events –

generally, sample sizes are too small and follow-up too short to detect important adverse

outcomes (Ernst and Pittler 2001). Most importantly, the stringent inclusion criteria of RCTs

can often lead to results with limited external validity. Van Spall et al. found that among

RCTs published in high impact-factor journals between 1994 and 2006, common medical

conditions formed the grounds for exclusion in 81.3% of trials (Van Spall et al. 2007). Sex

and age formed the basis for exclusion in 72.1% and 39.2% of RCTs, respectively. Konrat

and colleagues have also demonstrated that patients aged 65 years and older are poorly

represented in RCTs of drugs they are likely to receive (Konrat et al. 2012). Additional

studies in cardiology (Gurwitz, Col, and Avorn 1992; Lee et al. 2001; Masoudi et al. 2003)

and oncology (Hutchins et al. 1999; Lewis et al. 2003) have demonstrated similar exclusions

of the elderly. While such exclusions often augment internal validity, this occurs at the

expense of generalizability. In contrast, NRS often have less strict inclusion criteria.

Interventions in these studies are also delivered in a variety of “real-world” settings.

Accordingly, NRS may produce more pragmatic estimates of treatment effect.

1.2 The infrequency of surgical trials

RCTs are under-represented in the surgical literature. Solomon et al. have shown that among

articles published in three leading surgical journals in 1990, 7% reported the outcomes of

RCTs. Moreover, only 17% of studies were comparative, underscoring how case-series

dominate the literature landscape in surgery (Solomon and McLeod 1993). According to

more recent assessments, the frequency of RCTs and other comparative studies has remained

unchanged (Wente et al. 2003; Chang, Matsen, and Simpkins 2006; Panesar et al. 2006).

Some may argue that surgical RCTs are more likely to appear in higher impact-factor,

general medical journals. However, a broader survey of articles indexed in MEDLINE

(1966-2000) demonstrated that only 15% of published RCTs were surgical (Wente et al. 2003).

Moreover, surgical trials often evaluated medical therapies in surgical patients as opposed to

head-to-head comparisons of surgical technique; 55.9% of RCTs investigated peri-operative

analgesia, antibiotics or neo-/adjuvant chemotherapy (Wente et al. 2003). Why are RCTs

involving surgical interventions so rare?

1.3 Barriers to the conduct of surgical trials

1.3.1 Issues with patient and physician accrual

In a cohort study of RCTs funded by the UK Medical Research Council and the Health

Technology Assessment Programme between 1994 and 2002, only 31% of trials achieved

their original accrual target (McDonald et al. 2006). In a systematic review of studies

examining barriers to participation, clinicians cited time constraints, insufficient training,

lack of research personnel, loss of autonomy, worry about patients and the impact on the

doctor-patient relationship as obstacles (Ross et al. 1999). Difficulty with the consent

procedure was also a hurdle for physicians, as was the lack of rewards or recognition for

recruiting patients. Patients cited the uncertainty associated with treatment as a prominent

reason for not enrolling. Other patient barriers included the demands associated with

participation (e.g. additional procedures/ appointments, travel and cost) and preference for a

particular treatment.

Strong beliefs about treatment options can also impede the recruitment of surgeons (Mills et

al. 2003; Campbell et al. 2010). For example, in a study by Harrison et al., five different

treatments for locally advanced rectal cancer were presented to patients, colorectal surgeons,

and medical and radiation oncologists. The treatment options included pre-operative

radiotherapy, post-operative radiotherapy, chemotherapy, combined chemo-radiotherapy or

surgery (i.e. an abdominoperineal resection). Participants were then asked about their

willingness to enter an RCT evaluating the five treatments. Whereas 31% of patients would

enter a trial of pre- or post-operative radiotherapy, only 19% would agree to participate in the surgical

RCT. Even fewer surgeons would allow their patients to be involved in a surgical trial (16%),

whereas radiation and medical oncologists were the most enthusiastic (23% and 31%, respectively).

Some argue that perhaps the surgical community does not hold RCTs in high regard.

However, it has been shown that surgeons and other consultant physicians have equally

positive attitudes towards RCTs (McCulloch et al. 2005). In the same study, surgeons were

found to be more intolerant of uncertainty. This discomfort with uncertainty might be the

reason why some surgeons decline to participate in trials.

1.3.2 Funding surgical research

Whereas RCTs in internal medicine are often industry-sponsored, an analogous funding

infrastructure is lacking in surgery. Drug companies must produce phase I-III trial data

before regulatory agencies will approve new medications (McLeod 1999). These private

entities thus use their vast resources, in collaboration with internists, to bring their product to

market. In contrast, surgical techniques do not require regulatory approval (Cook 2009). In

the absence of industry funding, surgeons must rely on operating grants to fund trials. These

funding opportunities, however, are far more readily captured by departments of medicine

(Jackson et al. 2004). In a review of National Institutes of Health funding between 1992 and

1999, funding to departments of medicine increased roughly seven times more quickly than funding to

departments of surgery (21.2% per year or $73 million US per year for medicine versus 3.1%

or $5.8 million US per year for surgery). The relative lack of both private and public funding

places surgery at a distinct disadvantage.

1.4 Challenges in surgical trials

Investigators conducting RCTs in surgery must also overcome a number of challenges

related to blinding and intervention delivery. Some mistake the challenges outlined below for

barriers; whereas challenges add to the complexity of conducting trials, barriers instead make

it unlikely that RCTs will take place at all (Garas et al. 2012).

1.4.1 Blinding in surgical RCTs

The best trials employ blinding of participants, clinicians, data collectors, outcome

adjudicators and data analysts (Karanicolas, Farrokhyar, and Bhandari 2010). In

pharmacological trials, a placebo resembling active treatment in appearance, taste and

consistency can be delivered to achieve effective blinding. Blinding is unquestionably more

difficult in trials of surgical interventions. Boutron et al. examined 110 RCTs evaluating

pharmacological and non-pharmacological interventions in patients with hip or knee

osteoarthritis (Boutron et al. 2004). They examined these trials not for the occurrence but for

the feasibility of blinding. They found that blinding patients, providers and outcome

assessors could be achieved in 96%, 96% and 98% of pharmacological trials, respectively. In

comparison, only 12% of patients and 34% of health care providers could be blinded in

non-pharmacological RCTs. The blinding of outcome assessors was similarly difficult (42%).

When comparing surgery with non-surgical interventions, blinding can be achieved with

sham surgery; surgeons make the same incisions for both groups of patients but while those

in the active group receive the intervention, those in the control group do not. The first

instances of sham surgery appeared in RCTs of internal mammary artery ligation in the

1950s (Cobb et al. 1959; Dimond, Kittle, and Crockett 1960). Sham surgery raised ethical

concerns because making non-therapeutic incisions was seen as contravening the principle of

"do no harm" (Wolf and Buckwalter 2006). Not surprisingly, trials making use of sham

surgery remain rare (Freed et al. 2001; Moseley et al. 2002; Swank et al. 2003; McRae et al.

2004; Kallmes et al. 2009; Shikora et al. 2009). Notably, all RCTs employing sham surgery

failed to show benefit associated with the active treatment. Therefore, one should not

underestimate the importance of the placebo effect when comparing surgery with non-

surgical interventions.

1.4.2 Standardizing surgical technique

Standardizing interventions is also a necessary step in any RCT. In pharmacological trials,

the dose and timing of a drug can be protocolized so that patients will receive the

intervention in the same way. However, it is more challenging to ensure that complex

interventions, like surgery, are delivered in a uniform manner (Meakins 2002). The surgical

encounter is a multifaceted process involving numerous steps that collectively make up the

surgical intervention. Patients will receive medications in the pre-operative area, undergo a

multi-step surgical procedure and afterwards, be cared for in the post-operative area and the

clinical ward. At what point does the surgical “intervention” begin and end? Some argue that

it encompasses only what occurs in the operating room. Others would broaden this definition

to include the care provided immediately before and after surgery. Moreover, how does a

surgeon’s skill influence study outcomes? Boutron et al. examined RCTs of either

pharmacological or non-pharmacological interventions for knee osteoarthritis and found

that the care provider’s skill level could influence treatment effects in 84% of non-pharmacological

RCTs versus 23.3% of pharmacological trials (Boutron et al. 2003). Achieving

standardization in a surgical RCT therefore requires specifying how surgery should be

performed and how patients should be cared for in the peri-operative period.

When a new surgical technique emerges, a learning curve is often observed. For example,

Yamamoto et al. examined surgeons who had differing levels of experience with performing

left-sided, laparoscopic colon surgery (Yamamoto et al. 2013). They found that surgeons

achieved proficiency once 30 procedures had been completed. Up to this point, operations

lasted longer and were associated with more blood loss. Patients of surgeons who had performed

30 or more procedures resumed a solid diet earlier and returned home sooner. Therefore, effect

estimates obtained in an RCT of laparoscopic surgery might be influenced by whether the

trial takes place early on in the learning curve or later on. This consideration applies to all

surgical RCTs (Farrokhyar et al. 2010). Learning curves have also been demonstrated with

inguinal hernia repair (Neumayer et al. 2005); hernias recur more frequently when surgery is

performed by inexperienced surgeons. In addition to individual learning curves, many

surgical procedures are associated with better outcomes when performed in high-volume

centers (Urbach and Baxter 2004). Moreover, surgical techniques continue to evolve over

time and so period effects can be prominent in surgical RCTs (Barkun et al. 2009; Lassen,

Høye, and Myrmel 2012).

Therefore, designing a rigorous surgical trial requires i) standardizing operative procedures

and peri-operative care, ii) recruiting surgeons who have achieved proficiency with the

operation in question and iii) involving centers that meet certain volume thresholds. These

hurdles are not insurmountable but do add to the complexity of conducting RCTs in surgery.

1.5 The Balliol Collaboration

Between 2007 and 2009, a group of surgeons and methodologists took part in three

conferences on the topic of surgical innovation and evaluation at Oxford University. This

international group of renowned experts named themselves the Balliol Collaboration. Their

primary goal was to draft a special series of articles for The Lancet that would describe the

relationship between innovation and clinical practice in surgery. The first article in the series

focused on the process of innovation and assessment of novel surgical interventions (Barkun

et al. 2009). The second article described the challenges faced by those designing RCTs and

NRS evaluating surgical interventions (Ergina et al. 2009). In the third article, a paradigm

was proposed that outlines the “timely and appropriate assessment of surgical innovation

along its different stages” (McCulloch et al. 2009). This paradigm, the IDEAL

recommendations, divides the stages of surgical innovation into 5 phases: 1) idea;

2a) development; 2b) exploration; 3) assessment and 4) long-term study. These stages

progress from the proof of concept phase of an innovation through to surveillance once it has

been widely accepted. The authors suggest that “research database” NRS in conjunction with

explanatory or feasibility RCTs are the study designs of choice for Stage 2b evaluation. This

recommendation, however, is predicated on the assumption that the results of surgical NRS

are generally valid and perhaps even comparable to the results of RCTs. Evidence

in support of this position is currently lacking.

1.6 Do NRS and RCTs yield comparable results?

As a result of all the challenges and barriers outlined in Sections 1.3 and 1.4, the surgical

community has relied heavily on evidence from NRS and is likely to continue to do so. Does

relying on NRS, however, lead to misleading conclusions? This question has been raised by

the medical community on a number of occasions - especially when the results of NRS have

been later contradicted by the findings of RCTs. For example, consider the controversy

surrounding the use of hormone-replacement therapy (HRT) among post-menopausal

women. Two large NRS suggested that HRT could reduce the risk of coronary heart

disease (Grodstein et al. 1996; Varas-Lorenzo et al. 2000). HRT was also associated with

fewer fractures in post-menopausal women. However, the Women’s Health Initiative trial

later demonstrated that HRT may instead increase the risk of cardiac events (Harrison et al.

2007). This RCT was the first large-scale, placebo-controlled study of HRT. The results of

the study were so alarming that the trial was stopped three years early. There was strong

reaction to these findings by patients and health care practitioners. The use of HRT

subsequently declined rapidly (Krieger et al. 2005).

Discordant results between NRS and RCTs have also been encountered in investigations of

activated protein C (Baillie 2007; Marti-Carvajal et al. 2012) and pulmonary-artery

catheterization (National Heart Lung Blood Institute Acute Respiratory Distress Syndrome

Clinical Trials Network et al. 2006; Frazier and Skinner 2008). Comparable controversy has

arisen in surgery as well. For example, consider the literature evaluating arthroscopic knee

interventions. Many patients with osteoarthritis of the knee would undergo surgery when

medical therapy failed to control their pain. Surgery involved making small incisions

(<1 cm) around the knee and using a camera to visualize the movement of specialized

instruments in the joint space. Then surgeons would lavage or “wash” the joint space with

10 liters of fluid to remove loose debris and degenerated joint fragments. Some patients also

received debridement, which entails shaving joint cartilage (i.e. chondroplasty) and trimming

and smoothing the tissue that cushions the knee (i.e. meniscus). Multiple NRS demonstrated

pain relief with lavage and debridement (Baumgaertner et al. 1990; Gross et al. 1991;

McLaren et al. 1991) and another showed it to be superior to medical therapy (Livesley et al.

1991). However, in an RCT by Moseley et al., the authors reached a remarkably different

conclusion (Moseley et al. 2002). Patients were randomized to receive arthroscopic lavage,

arthroscopic debridement or a sham procedure. The efforts taken by investigators to blind

patients in the sham arm are noteworthy; patients had three 1 cm incisions made around the

knee, surgeons called out for instruments and saline was splashed to simulate the sounds of

lavage. Whereas surgeons were aware of the treatment assignment, patients and nurses

providing post-operative care were not. This study followed patients for 2 years and showed

no benefit with active surgery at any of the time points evaluated. Without a sham surgery arm, the

study would have failed to control for the placebo effect.

These examples underscore the importance of generating reliable and valid evidence.

Without the emergence of RCTs, the results of earlier NRS would not have been cast in

doubt. Empirical comparisons of NRS and RCTs are necessary, however, to determine

whether the aforementioned discrepancies are outliers or truly representative of the average

relationship between NRS and RCTs.

1.6.1 Empirical comparisons of effect estimates from NRS and RCTs

In 2000, the New England Journal of Medicine published a seminal article by Concato et al.

that questioned the superiority of RCTs (Concato, Shah, and Horwitz 2000). The authors identified

meta-analyses of RCTs or NRS published between 1991 and 1995, comprising 99 articles that addressed

the following five clinical topics: i) Bacille Calmette–Guérin vaccine and active tuberculosis,

ii) screening mammography and mortality from breast cancer iii) cholesterol levels and death

due to trauma among men, iv) treatment of hypertension and stroke among men, and v)

treatment of hypertension and coronary heart disease among men. Summary estimates for

NRS and RCTs were separately generated using random-effects meta-analysis. A consistent

trend was observed; the combined point estimates and 95% confidence intervals (CI) for

each study design were remarkably similar (Table 1.2).

Table 1.2 Results of meta-analyses of NRS and RCTs appearing in Concato et al.

Clinical topic                                          Studies            Odds ratio (95% CI)
Bacille Calmette–Guérin vaccine and tuberculosis        13 RCTs            0.49 (0.34–0.70)
                                                        10 Case–control    0.50 (0.39–0.65)
Mammography and mortality from breast cancer            8 RCTs             0.79 (0.71–0.88)
                                                        4 Case–control     0.61 (0.49–0.77)
Cholesterol levels and death due to trauma              6 RCTs             1.42 (0.94–2.15)
                                                        14 Cohort          1.40 (1.14–1.66)
Treatment of hypertension and stroke                    14 RCTs            0.58 (0.50–0.67)
                                                        7 Cohort           0.62 (0.60–0.65)
Treatment of hypertension and coronary heart disease    14 RCTs            0.86 (0.78–0.96)
                                                        9 Cohort           0.77 (0.75–0.80)
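
For readers unfamiliar with how summary estimates of this kind are produced, the following is a minimal sketch of random-effects pooling on the log odds ratio scale. It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice but is not stated in the text above to be the estimator used by Concato et al.; the function names and the example inputs are hypothetical.

```python
import math


def ci_to_log_variance(lower, upper, z=1.96):
    """Approximate var(log OR) from a 95% CI reported on the odds ratio scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return se ** 2


def random_effects_pool(odds_ratios, variances, z=1.96):
    """Pool study odds ratios with the DerSimonian-Laird random-effects method.

    odds_ratios: per-study odds ratios
    variances:   per-study variances of log(OR)
    """
    y = [math.log(o) for o in odds_ratios]
    w = [1.0 / v for v in variances]          # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's own variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical study-level results, not taken from any cited meta-analysis.
    ors = [0.45, 0.60, 0.52]
    variances = [ci_to_log_variance(0.30, 0.68),
                 ci_to_log_variance(0.42, 0.86),
                 ci_to_log_variance(0.33, 0.82)]
    print(random_effects_pool(ors, variances))
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights; as it grows, smaller studies receive relatively more weight and the confidence interval widens.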

The authors concluded that the results of high-quality NRS are generally similar to those of

RCTs. The qualification of “high-quality” is especially important because the results of this

study may only apply to a subset of all RCTs and NRS. First, meta-analyses were identified

from among the highest ranking journals in clinical medicine (Annals of Internal Medicine,

the British Medical Journal, the Journal of the American Medical Association, the Lancet, and

the New England Journal of Medicine). Articles that appear in these journals may be of a

different caliber than the vast majority of RCTs and NRS indexed in MEDLINE. Second,

Concato et al. did not seek out all of the NRS or RCTs published for a specific clinical topic

but instead, relied on the inclusion/exclusion criteria employed by the authors of the meta-

analyses. Conducting a meta-analysis involves selecting articles based on certain criteria.

This process may lead to the inclusion of primary articles of higher quality. The conclusions

in this study were also drawn from comparisons involving few RCTs and NRS and none

examining surgical technique. For these reasons, Concato’s findings may not apply to

comparisons of NRS and RCTs in surgery.

Another study also compared effect estimates from NRS and RCTs (Benson and Hartz 2000).

Benson and colleagues examined 83 RCTs and 53 NRS spanning 19 topics. They examined a

diverse set of outcomes including mortality, stroke, infection, residual stones, pregnancy,

percent change in lumbar bone density, recurrent otitis and so on. They found that combined

effect estimates for NRS were very similar to those for RCTs for all topics. The authors

cautioned, however, that “there were insufficient data to exclude the possibility of clinically

important differences between the results of the two types of study.”

While these studies compared NRS with RCTs for mostly non-surgical interventions, Shikata

et al. focused exclusively on the field of surgery (Shikata et al. 2006). Meta-analyses of

RCTs in digestive surgery were identified from searches of PubMed (1996 to April 2004),

EMBASE (1986 to April 2004) and the Cochrane Database of Systematic Reviews (Issue 2,

2004). Thereafter, data for NRS were identified from published meta-analyses or if none

were available, the authors conducted their own meta-analysis. A total of 276 original

articles (96 RCTs and 180 NRS) were selected for inclusion in this study. These articles

spanned 18 surgical topics and a variety of outcomes including mortality, morbidity, wound

infection, etc. Shikata and colleagues performed fixed-effect and random-effects meta-analyses

for each topic. The combined effect estimates for RCTs were compared to those

from NRS using Z scores. They found significant discrepancies between RCTs and NRS for

4 of 16 primary outcomes. Heterogeneity was also more frequent in meta-analyses of NRS as

compared with meta-analyses of RCTs.
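
A Z score comparison of two pooled estimates can be sketched as follows. This is the standard test for the difference between two independent log odds ratios; the exact formula used by Shikata et al. is not given in the text above, so the implementation, the function names and the example inputs are assumptions made for illustration.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% CI on the odds ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def z_test_pooled(or_rct, ci_rct, or_nrs, ci_nrs):
    """Compare two independent pooled odds ratios on the log scale.

    Returns the Z score and its two-sided p-value under a normal approximation.
    """
    diff = math.log(or_nrs) - math.log(or_rct)
    # The variance of a difference of independent estimates is the sum of variances.
    se_diff = math.hypot(log_se_from_ci(*ci_rct), log_se_from_ci(*ci_nrs))
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


if __name__ == "__main__":
    # Hypothetical pooled results for one topic (not from any cited study).
    z, p = z_test_pooled(or_rct=0.80, ci_rct=(0.64, 1.00),
                         or_nrs=0.50, ci_nrs=(0.40, 0.625))
    print(f"Z = {z:.2f}, two-sided p = {p:.4f}")
```

The test treats the RCT and NRS estimates as independent and ignores any clinical differences between the underlying study populations, which is one reason a non-significant Z score does not establish that the two designs agree.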

The study by Shikata et al. stirred debate in the surgical community over the comparability

of NRS and RCTs. One of the strengths of the study is the number of surgical topics

assessed. However, because authors relied on meta-analyses of RCTs and NRS in a manner

similar to Concato et al., selection bias might influence the findings. Moreover, the largest

comparison involved 5 RCTs and 25 NRS, but the average number of studies in any

comparison was 4 RCTs and 6 NRS. Again, it appears that a subset of NRS was compared to

a selected group of RCTs.

While Concato et al. and Benson et al. found NRS and RCTs to be comparable, Shikata and

colleagues found important differences in 25% of comparisons. Studies comparing a single

RCT with an individual NRS have also found a range of results from RCTs showing more

benefit (Reimold et al. 1992; Shapiro and Recht 1994; Nicolaides et al. 1994) to NRS

showing a larger benefit (1984; Jha et al. 1995; Pyorala, Huttunen, and Uhari 1995). Others

have instead found that NRS and RCTs reached opposite conclusions (Antman et al. 1985;

1994; Yamamoto et al. 1992). Two meta-analyses have combined these individual studies

and produced interesting findings. In the meta-analysis by Britton et al., authors compared

the results of RCTs with those of prospective NRS (Britton et al. 1998). Eighteen studies met

their inclusion criteria and seven found no significant difference between RCTs and NRS. In

another seven studies, effect estimates were in the same direction but significantly different

in magnitude. The results of NRS and RCTs reached opposite conclusions in four studies. A

more recent meta-analysis by Kunz et al. described the results of 15 studies comparing RCTs

and NRS for the same intervention (Kunz, Vist, and Oxman 2007). They identified 35

comparisons within these studies. In 22 of 35 comparisons, effect estimates were larger in

NRS. Kunz et al. found that control groups in NRS had a poorer prognosis than the controls

in RCTs. They hypothesized that differences in patient case-mix between studies may have

contributed to the observed differences between NRS and RCTs. In general, this literature

suggests that there are notable differences in effect estimates from NRS and RCTs and that

these differences may be influenced by factors other than study design.

One of the limitations of these meta-analyses is the inclusion of older RCTs, published in the

1980s and 1990s. The bias arising from inadequate random-sequence generation, allocation

concealment and the lack of double-blinding was demonstrated in studies published after

1998. RCTs with these methodological shortcomings are becoming less common (Wang et

al. 2011). Therefore, comparisons involving older studies have likely contrasted the results of

NRS with RCTs of varying methodological quality. Moreover, most comparisons have

focused on non-surgical interventions. None of the analyses have accounted for between-

study heterogeneity stemming from differences in case-mix; NRS and RCTs can vary in the

types of patients studied, with some enrolling older patients or those with more advanced

disease, and so results between NRS and RCTs might have differed for this reason.

Therefore, the comparability of NRS and RCTs in surgery remains unclear.

1.7 Study characteristics of RCTs associated with bias

In 1995, Schulz et al. presented the results of a study that markedly changed the way we

evaluate RCTs. They found empirical evidence of an association between inadequate trial

methodology and biased effect estimates (Schulz et al. 1995). Using meta-analyses indexed

in the Cochrane Pregnancy and Childbirth Database, Schulz et al. conducted an observational

study to determine the association between estimates of treatment effect and various study

attributes. This study included 250 trials from 33 meta-analyses covering a broad range of

interventions during pregnancy, preterm labor and delivery, induction of labor, labor and

delivery, cesarean delivery, puerperium and the early neonatal period. Authors did not focus

on a single outcome measure but instead included any binary outcome reported by all studies

within any one meta-analysis. Data were abstracted so that an odds ratio (OR) < 1 indicated

benefit. Schulz and colleagues chose to focus their analysis on four study attributes:

randomization sequence, allocation concealment, exclusions after randomization and double

blinding. The main study outcome was a ratio of odds ratios (ROR) for each attribute:

$$\mathrm{ROR} = \frac{\text{combined OR}_{\text{studies \textbf{without} characteristic}}}{\text{combined OR}_{\text{studies \textbf{with} characteristic}}}$$

For example, the combined effect estimate from studies without adequate allocation

concealment was compared to the combined effect estimate from studies with adequate

concealment. A ROR < 1.0 indicates that trials that were methodologically inferior

(i.e. lacking the study attribute) had, on average, yielded larger estimates of treatment effect

as compared with the referent group (i.e. studies where the attribute is present). Conversely, a

ROR > 1.0 indicates that studies without the study characteristic have, on average, smaller

estimates of treatment effect compared with the referent group (i.e. methodologically

superior studies).
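
The ROR definition above can be illustrated with a short numerical sketch. It assumes, for simplicity, that the two pooled odds ratios come from independent sets of trials, so that the variances of their logarithms add; this is not necessarily how Schulz et al. fitted their model. The function names and all input values are hypothetical and chosen only for illustration.

```python
import math

def ratio_of_odds_ratios(or_without, or_with):
    """ROR = pooled OR of trials WITHOUT the attribute / pooled OR of trials WITH it."""
    return or_without / or_with

def ror_confidence_interval(or_without, se_log_without, or_with, se_log_with, z=1.96):
    """Approximate 95% CI for the ROR, computed on the log scale.

    Assumes the two pooled estimates are independent (a simplification).
    """
    log_ror = math.log(or_without) - math.log(or_with)
    se_log_ror = math.sqrt(se_log_without ** 2 + se_log_with ** 2)
    return math.exp(log_ror - z * se_log_ror), math.exp(log_ror + z * se_log_ror)

# Hypothetical pooled estimates (not taken from any study cited here).
or_without = 0.60  # trials lacking the attribute
or_with = 0.80     # referent group: trials with the attribute

ror = ratio_of_odds_ratios(or_without, or_with)
ci_low, ci_high = ror_confidence_interval(or_without, 0.10, or_with, 0.08)

# Because OR < 1 indicates benefit, ROR < 1 means the trials lacking the
# attribute reported the larger apparent benefit.
print(f"ROR = {ror:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With these illustrative inputs the ROR falls below 1.0, which under the convention described above would be read as exaggerated benefit in the methodologically inferior trials.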

While authors did not find a statistically significant trend with appropriate random-sequence

generation, adequate allocation concealment appeared to protect against bias. Schulz et al.

divided studies into the following three categories: adequately concealed allocation,

inadequately concealed allocation and unclearly concealed allocation. Studies with

inadequate (n=21 trials) or unclear allocation concealment (n=150 trials) were in turn

compared with the referent group, studies with adequate allocation concealment (n=79).

Exaggerated estimates of benefit were found with both; inadequate allocation concealment

(ROR 0.59 [95% CI, 0.48 to 0.73]) was worse than having unclear allocation concealment

(ROR 0.67 [95% CI, 0.60 to 0.75]). Studies without double-blinding also appeared to be

susceptible to bias (ROR 0.83 [95% CI, 0.71 to 0.96]). Since 73% of RCTs in this study had

adequate double-blinding, Schulz’s results more readily apply to pharmacological trials than

surgical RCTs where double-blinding is often impossible. The study by Schulz is referred to

as a meta-epidemiological study: a study that uses meta-analyses to explore the impact of

study attributes on bias. Other meta-epidemiological studies have examined a variety of

RCT-design characteristics (Table 1.3).
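
Because RORs are typically published with a confidence interval rather than a standard error, it can help to see how the two relate. The sketch below recovers the standard error of the log ROR from the interval quoted above for inadequate allocation concealment, assuming a symmetric 95% Wald interval on the log scale (an assumption; the source does not describe how the interval was constructed). It also expresses the ROR as a percentage exaggeration of the odds ratio, computed as 1 minus the ROR. The helper names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error on the log scale implied by a symmetric 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def percent_exaggeration(ror):
    """Relative exaggeration of the odds ratio implied by an ROR below 1."""
    return (1 - ror) * 100

# Figures quoted in the text for inadequate allocation concealment.
ror, lower, upper = 0.59, 0.48, 0.73

se_log_ror = se_from_ci(lower, upper)
z_score = math.log(ror) / se_log_ror
exaggeration = percent_exaggeration(ror)

print(f"SE(log ROR) = {se_log_ror:.3f}, z = {z_score:.2f}, "
      f"exaggeration = {exaggeration:.0f}%")
```

The recovered standard error is what a later meta-analysis of meta-epidemiological studies would need in order to weight this estimate.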

Table 1.3 Meta-epidemiological studies of RCT study attributes.

| Year | Author | Studies | Attributes | Outcome | Findings |
|---|---|---|---|---|---|
| 1995 | Schulz et al. (Schulz et al. 1995) | 33 meta-analyses, 250 RCTs | Random-sequence generation; allocation concealment; double-blinding; exclusions | Binary outcomes | RCTs with inadequate or unclear allocation concealment showed more benefit than trials with adequate allocation concealment, ROR 0.59 [95% CI, 0.48 to 0.73] and ROR 0.67 [95% CI, 0.60 to 0.75], respectively. Trials without double-blinding also showed more benefit, ROR 0.83 [95% CI, 0.71 to 0.96]. No systematic bias detected in RCTs with inadequate random-sequence generation (p=0.58) or exclusions after randomization (p=0.01). |
| 1998 | Moher et al. (Moher et al. 1998) | 11 meta-analyses, 127 RCTs | Random-sequence generation; allocation concealment; double-blinding | Binary outcomes | While no effect was seen with inadequate random-sequence generation or double-blinding, studies with inadequate concealment of allocation showed exaggerated benefit, ROR 0.63 [95% CI, 0.45-0.88]. |
| 2001 | Kjaergard et al. (Kjaergard, Villumsen, and Gluud 2001) | | Random-sequence generation; allocation concealment; blinding; description of dropouts and withdrawals | Binary outcomes | In small RCTs (<1000 participants), more benefit was seen in trials with inadequate random-sequence generation, ROR 0.46 [95% CI, 0.25 to 0.83]; inadequate allocation concealment, ROR 0.49 [95% CI, 0.27 to 0.86]; and without double-blinding, ROR 0.52 [95% CI, 0.28 to 0.96]. A similar effect was not seen in trials that failed to describe dropouts or withdrawals (p=0.2). When comparing trials with adequate random-sequence generation, there was no difference in estimates between smaller and larger RCTs (p>0.2). Similar results were obtained for RCTs with adequate allocation concealment (p>0.2), double-blinding (p>0.2) and follow-up (p>0.2). |
| 2002 | Balk et al. (Balk et al. 2002) | | 24 quality measures including random-sequence generation, allocation concealment, double-blinding, intention-to-treat analysis, power calculations, description of drop-outs, etc. | Binary outcomes (mortality in cardiovascular studies. For remaining studies, only outcomes with heterogeneous treatment effects). | Examined impact of study characteristics across 4 medical areas (cardiovascular disease, infectious disease, pediatrics and surgery). No consistent patterns of association across the medical areas. |
| 2003 | Als-Nielsen et al. (Als-Nielsen et al. 2003) | | Funding | Binary outcomes | In RCTs funded solely by for-profit organizations, conclusions were more likely to recommend experimental drugs than RCTs funded by non-profit organizations, OR 5.3 [95% CI, 2.0-14.4]. Bias was not detected in RCTs not specifying funding (p=0.10), or those with mixed non-profit and for-profit funding (p=0.09). |
| 2005 | Tierney et al. (Tierney and Stewart 2005) | | Exclusions | Survival | Studies that included all patients were less likely to show benefit than RCTs with exclusions (p=0.03). |
| 2007 | Pildal et al. (Pildal et al. 2007) | | Allocation concealment; double-blinding | Binary outcomes | RCTs with unclear or inadequate allocation concealment showed more benefit, ROR 0.90 [95% CI, 0.81 to 1.01]. A similar trend was seen with the absence of double-blinding, ROR 0.94 [95% CI, 0.80 to 1.10]. |
| 2009 | Nüesch et al. (Nuesch, Reichenbach, et al. 2009) | | Allocation concealment; double-blinding | Continuous outcome | Effect sizes tended to be more beneficial in RCTs with inadequate or unclear allocation concealment, standardized mean difference -0.15 [95% CI, -0.31 to 0.02]. The exaggerated effects seen with blinding disappeared after accounting for allocation concealment. |
| 2009 | Nüesch et al. (Nuesch, Trelle, et al. 2009) | | Exclusions after randomization | Continuous outcome | Restricting meta-analyses to RCTs without exclusions resulted in smaller estimates of treatment effect, large p values and notable decreases in between-trial heterogeneity. |
| 2010 | Bassler et al. (Bassler et al. 2010) | 545 RCTs | Early stopping RCTs vs trial completion | Binary outcomes | RCTs stopped early showed more benefit than matching non-truncated RCTs, ratio of relative risks 0.71 [95% CI, 0.65-0.77]. |
| 2010 | Nüesch et al. (Nuesch et al. 2010) | | Small RCTs (<100 patients per arm) vs large RCTs | Continuous outcomes | Small trials showed more benefit than large RCTs, standardized mean difference -0.21 [95% CI, -0.34 to -0.08]. Effect persistent when adjusting for concealment of allocation, blinding of patients and intention-to-treat analysis. |
| 2011 | Dechartres et al. (Dechartres et al. 2011) | | Single centre vs multi-centre | Objective binary outcomes (e.g. all-cause mortality or result of a biological test) | Effect estimates are larger in single-centre studies, ROR 0.73 [95% CI, 0.64 to 0.83]. |
| 2012 | Bafeta et al. (Bafeta et al. 2012) | | Single centre vs multi-centre | Continuous outcomes | Single-centre trials showed more benefit than multi-centre RCTs, standardized mean difference -0.17 [95% CI, -0.17 to -0.01]. |
| 2012 | Hrobjartsson et al. (Hrobjartsson et al. 2012) | 8 RCTs | Blinded outcome assessment | Binary outcomes | Non-blinded assessors exaggerated treatment effects by 36%, ROR 0.64 [95% CI, 0.43 to 0.96]. |
| 2013 | Hrobjartsson et al. (Hrobjartsson et al. 2013) | 24 RCTs | Blinded outcome assessment | Continuous outcomes | Non-blinded assessors exaggerated the pooled effect size by 68% [95% CI, 14% to 230%]. |

Some of these individual studies have been combined in meta-analyses, so-called

“meta-meta-epidemiological studies.” One of the most impactful of these studies was conducted by

Wood et al. (Table 1.4). They too examined the influence of certain study characteristics on

bias but did so separately for objective and subjective outcomes (Wood et al. 2008). They

classified all-cause mortality and standardized laboratory procedure measures

(e.g. hemoglobin concentration) as objective outcomes. Subjective outcomes included

patient-reported outcomes and physician-assessed disease outcomes (e.g. wound infection, pneumonia

and other complications). Fifty-three percent of the extracted outcomes were objective. The

remaining subjective outcomes were divided between patient-reported (11%), physician-

assessed (29%), a combination of both (0.7%), or patient withdrawal (6%). While bias was

detected with each of these study attributes for subjective outcomes (Figure 1.1), the same

could not be said for objective outcomes.

Figure 1.1 Results from Wood et al. Inadequate or unclear allocation concealment compared with adequate allocation concealment among objective outcomes (A) and subjective outcomes (B). Studies without blinding compared with studies with adequate blinding among objective outcomes (C) and subjective outcomes (D). ROR, ratio of odds ratios. A ROR<1.0 implies bias associated with the absence of a characteristic.

Table 1.4 Meta-analyses of meta-epidemiological studies.

| Year | Author | Studies | Attributes | Outcomes |
|---|---|---|---|---|
| 2008 | Wood et al. (Wood et al. 2008) | 146 meta-analyses, 1346 RCTs. Included the following individual studies: Schulz et al. (1995), Kjaergard et al. (2001), Egger et al. (2003) | Allocation concealment; blinding (non-blinded vs any blinding) | Objective binary outcomes examined separately from subjective outcomes |
| 2012 | Savović et al. (Savovic et al. 2012) | 234 meta-analyses, 1973 RCTs. Included the following individual studies: Schulz et al. (1995), Kjaergard et al. (2001), Egger et al. (2003), Balk et al. (2002), Als-Nielsen et al. (2003), Contopoulos et al. (2005), Pildal et al. (2007) | Randomization sequence generation; allocation concealment; double-blinding | Objective binary outcomes examined separately from subjective outcomes |

Similar results were found by Savović et al. (Table 1.4). This literature as a whole has

established that inappropriate random-sequence generation, inadequate allocation

concealment, the absence of double-blinding, stopping trials early, for-profit funding and not

describing exclusions all contribute to bias. This evidence has changed how we both conduct

and evaluate RCTs. Furthermore, this literature has provided the evidence for the

development of a risk of bias tool for RCTs endorsed by the Cochrane Collaboration.

1.7.1 The methodological shortcomings of surgical RCTs

As outlined above, a number of study characteristics have been associated with less biased

estimates of treatment effect for RCTs. Trials in surgery, however, often lack some of these

characteristics or fail to report them. A survey of 364 surgical RCTs published between

1988 and 1994 demonstrated that only 27% provided a description of the randomization

technique (Hall et al. 1996). However, the results of this study may not apply to more recent

surgical trials because Hall et al. examined RCTs published before the emergence of the

Consolidated Standards of Reporting Trials (CONSORT) statement. This consensus

statement outlines what should be included in a published report of an RCT (Moher et al.

2001). While reporting quality has moderately improved (Gray et al. 2012), surgical RCTs

continue to have major methodological shortcomings (Walter et al. 2007). For example, it

has been shown that rates of adequate randomization (30.4%), allocation concealment (35.3%), and

blinding of patients (29.2%), health care providers (45.8%) and outcome assessors (30.6%)

remain low (Als-Nielsen et al. 2003). Notably, authors reviewed 69 RCTs published in

surgical journals (Annals of Surgery, British Journal of Surgery, World Journal of Surgery,

Journal of Surgery, Journal of the American College of Surgeons and the American Journal

of Surgery) and prestigious medical journals (British Medical Journal, Lancet, Journal of the

American Medical Association, and the New England Journal of Medicine). Similar

suboptimal conduct has been documented in the areas of pediatric surgery (Curry, Reeves,

and Stringer 2003), urology (Ross et al. 1999), and orthopaedics (Jacquier et al. 2006;

Chaudhry et al. 2008).

Since RCTs are infrequent in surgery and many of these studies are methodologically flawed,

perhaps RCTs should not be automatically held in higher regard than NRS. Some argue that

a well-designed NRS may provide more accurate and less biased estimates for treatment

effect than poorly conducted RCTs (Grossman and Mackenzie 2005). Ultimately, studies

should be assessed on the basis of conduct and not study design labels (Hannan 2008).

However, while numerous studies have examined the sources of bias in RCTs, there is a

scarcity of similar studies for NRS.

1.8 Study characteristics of NRS associated with bias

While numerous meta-epidemiological studies have evaluated the association between study

design and bias for RCTs, there have been no comparable studies for NRS. However, there

have been two groups that examined NRS-design characteristics and performed subgroup

meta-analyses to determine if the presence or absence of specific characteristics yielded

results that are similar to those of RCTs. In a study by Bhandari et al., the results of NRS

evaluating arthroplasty versus internal fixation for hip fracture were compared with the

results of RCTs (Bhandari et al. 2004). For the outcome mortality, 13 NRS overestimated the

relative risk (RR) associated with arthroplasty by 40% (RR 1.44 versus 1.04) as compared

with 12 RCTs. On the other hand, the RR for risk of revision surgery was 0.38 among NRS

and 0.23 among RCTs. Thus, for the outcome risk of revision surgery, NRS underestimated

the benefit associated with arthroplasty by 15%. Further analysis suggested that there were

four NRS with point estimates of relative risk for mortality similar to the combined effect

estimate for RCTs. All four of these NRS had analyses that controlled for patient age,

gender, and fracture displacement.
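
The percentages quoted for the Bhandari et al. comparison appear to correspond to absolute differences between the pooled relative risks rather than to their ratio. This reading is an inference from the reported figures, not something the source states. A brief sketch of both calculations, using only the values given above (the function name is hypothetical):

```python
def compare_relative_risks(rr_nrs, rr_rct):
    """Contrast two pooled relative risks on the absolute and the ratio scale."""
    return {
        "difference": rr_nrs - rr_rct,  # absolute gap between the two RRs
        "ratio": rr_nrs / rr_rct,       # ratio of relative risks
    }

# Pooled relative risks reported in the text (NRS first, then RCTs).
mortality = compare_relative_risks(1.44, 1.04)
revision = compare_relative_risks(0.38, 0.23)

for outcome, result in (("mortality", mortality), ("revision surgery", revision)):
    print(f"{outcome}: difference = {result['difference']:.2f}, "
          f"ratio = {result['ratio']:.2f}")
```

The absolute differences reproduce the 0.40 and 0.15 gaps, whereas on the ratio scale the same figures give roughly 1.38 and 1.65, so the choice of scale changes the apparent size of the discrepancy.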

In a study by Abraham et al., separate meta-analyses were conducted of NRS and RCTs

comparing laparoscopy with open surgery for the treatment of colon cancer (Abraham et al.

2010). They concluded that the OR for three dichotomous outcomes (mortality, morbidity

and reoperation rates) overlapped widely between the NRS and RCTs. For both study

designs, laparoscopy was associated with a statistically significant reduction in post-

operative morbidity. Authors then performed subgroup meta-analyses of NRS, stratifying

their analyses according to the presence or absence of certain design characteristics. NRS

with fewer than 50 patients per study arm, aims that were “not well defined,” end points that

were “not well defined,” non-consecutive patients, non-contemporaneous controls or

“inadequate” controls all failed to detect a statistically significant difference between

laparoscopy and open surgery with regards to morbidity. Authors also cautioned that there

was a possibility of a Type II error driven by the rather small sample size of the groups being

compared.

Along with the study by Shikata et al., these two innovative studies represent some of the

first attempts to assess the comparability of NRS and RCTs in the surgical literature.

Moreover, both groups have attempted to identify which NRS yield results that are

comparable to those of RCTs. However, the analyses in these studies treated the pooled

effect estimates from RCTs as the referent or gold-standard value and overlooked the

important issue of methodological heterogeneity among these surgical trials. It is likely that

the analyses by Bhandari et al. and Abraham and colleagues included RCTs at

unclear or high risk of bias. Therefore, additional studies are necessary to determine which

aspects of NRS design are associated with biased estimates of treatment effect — studies that

use low risk of bias RCTs as the referent group.

1.9 Summary of gaps in knowledge

In summary, there are important gaps in our understanding of bias in surgical NRS and

RCTs. These gaps include:

(1) The degree to which NRS and RCTs yield similar estimates of treatment effect is

uncertain. Comparisons of NRS and RCTs to date have rarely examined surgical studies and

most analyses have involved studies conducted in the 1980s and 1990s. These comparisons

have also not accounted for variation in RCT study quality, period effects or differences in

case-mix between individual studies.

(2) The relationship between study characteristics and effect estimates in NRS is unknown.

Whereas numerous meta-epidemiological studies have examined the association between

design attributes and bias in RCTs, there are no analogous studies for NRS.

1.10 Dissertation rationale

RCTs are rare in surgery and this trend has been consistent over time. Surgical trials are

uncommon because of the lack of an established funding infrastructure. The uncertainty

associated with randomization, on the part of both patients and physicians, also impedes

recruitment to surgical RCTs. Moreover, investigators must overcome challenges with

blinding and standardizing surgical technique when conducting a surgical trial. For these

reasons, NRS heavily inform the evidence base in surgery and will continue to do so. Should

this reliance be a cause for concern? After all, numerous studies have found that NRS yield

findings similar to those in RCTs. However, other studies have shown the opposite or been

inconclusive. These comparisons have been limited by including selections of NRS and

comparing them with RCTs of varying quality. Comparisons have not accounted for period

effects or patient case-mix. Most comparisons have also evaluated non-surgical interventions

and studies mostly conducted in the 1980s and 1990s. Thus, the comparability of NRS and

RCTs in surgery remains unclear.

Multiple meta-epidemiological studies have identified study characteristics associated with

bias for RCTs. The Cochrane Collaboration has used this empirical data to guide the

development of a risk of bias tool for RCTs. Similar work is needed in the field of NRS.

Identifying the attributes of NRS that are associated with bias will help those conducting,

reviewing and meta-analyzing NRS. Such advancements are necessary to understand how to

best use the evidence from NRS. To study bias in NRS, we have focused on the surgical

treatment of colorectal cancer due to the abundance of both NRS and RCTs in this area.

1.11 Research aims

Specific Aims

(1) To develop a conceptual framework for bias in comparative NRS.

(2) To compare effect estimates from NRS with those from RCTs at low risk of bias.

(3) To explore the impact of NRS-design attributes on estimates of treatment effect.

As there is no comprehensive framework for bias in NRS, a modified framework synthesis

was conducted to develop one. Sources of bias were identified from previously published

systematic reviews of quality assessment tools (e.g. scales and checklists) for NRS and

synthesized thematically into a framework.

In Chapter 5, pooled effect estimates from NRS were compared with those from i) all RCTs,

ii) Typical RCTs and iii) Strong RCTs evaluating laparoscopy and open colon surgery.

Random-effects meta-analysis and meta-regression methods were used to determine the

impact of study design. The Cochrane Risk of Bias Tool was used to categorize RCTs as

Strong or Typical.
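
As a rough illustration of random-effects pooling, the sketch below implements the DerSimonian-Laird estimator, one common choice. The text here does not say which estimator was used, so this is an assumption, and the log odds ratios and variances are hypothetical values chosen only to exercise the code.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log odds ratios) with DerSimonian-Laird weights.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    scale = total_w - sum(w ** 2 for w in weights) / total_w
    # Method-of-moments estimate of tau^2, truncated at zero.
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2

# Hypothetical log odds ratios and within-study variances (illustration only).
log_ors = [-0.45, -0.10, -0.30, 0.05]
variances = [0.04, 0.09, 0.02, 0.12]

pooled, se, tau2 = dersimonian_laird(log_ors, variances)
print(f"pooled OR = {math.exp(pooled):.2f}, tau^2 = {tau2:.3f}")
```

A meta-regression extends the same idea by adding a study-level covariate, such as an indicator for study design, to explain part of the between-study variance.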

In Chapter 6, a meta-epidemiological study was conducted to examine the relationship between

NRS-design characteristics and effect estimates. The conceptual framework developed in

Chapter 3 was used to identify potential study characteristics. Effect estimates were

compared across NRS with and without specific design characteristics. These estimates were

in turn compared with the results of Strong RCTs.

Chapter 2 Laparoscopic colon surgery – an opportunity to study bias

Colorectal cancer is the third most common malignancy in men and women worldwide. In

2008, there were over 1.2 million new cases of colorectal cancer and 609,000 deaths (Ferlay

et al. 2010). In Canada, colorectal cancer is the second leading cause of cancer mortality

(Canadian Cancer Society’s Steering Committee on Cancer Statistics 2012). Surgery is the

cornerstone of treatment; in Ontario, 87% of patients diagnosed with colorectal cancer

undergo surgery (Rahima Nenshi et al. 2008). Prior to the early 1990s, surgical removal of

malignancies in the colon involved making a large abdominal incision. This traditional

approach, the “open” technique, allows surgeons to directly visualize and manipulate intra-

abdominal organs. However, in the late 1980s, surgeons began to consider laparoscopy for

the surgical management of colon cancer (Martel and Boushey 2006). Laparoscopy involves

inserting a fibre-optic laparoscope, essentially a camera, into the abdominal cavity via a small

incision at the umbilicus. Thus, intra-abdominal organs are visualized indirectly - on

television monitors in the operating room. Additional small incisions, each less than 1.5 cm,

are made in the abdominal wall to allow the passage of specialized instruments. These small

incisions are referred to as “port sites.” At the end of the operation, the tumor is removed

through an incision that is much smaller than the one made with the “open” technique.

Laparoscopy was looked upon favourably because it was less invasive than open surgery and

thus was associated with less post-operative pain and a shorter post-operative hospital stay

(Schwenk et al. 2005).

The goal of any colon cancer operation is to remove i) the tumour, ii) a certain amount of

normal tissue adjacent to the tumour and iii) lymph nodes draining the tumour. While the

rationale of removing the tumour is obvious, the reasons for removing normal tissue and

lymph nodes are less so. The amount of adjacent normal tissue removed along with the tumor

is referred to as the margin. For colon cancer, surgeons aim to resect 5 cm of normal tissue

on either side of the tumor. With rectal cancer, they aim for a 2 cm distal margin (in the

fresh specimen) (Smith et al. 2010). Removing this normal tissue reduces the risk of local

recurrence.

All patients with colon cancer undergo pre-operative imaging (e.g. computerized tomography

scans of the abdomen and pelvis) to determine the extent to which the cancer has grown or

spread. The American Joint Committee on Cancer (AJCC) TNM pathology classification

scheme is commonly used to describe the extent of disease progression in cancer patients and

it consists of three components: T-category, the depth of tumor invasion through the colon

wall, N-category, the number of involved lymph nodes and M-category, the presence or

absence of distant metastases (Edge and American Joint Committee on Cancer. 2010).

Surgeons therefore aim to remove a minimum of 12 lymph nodes to allow for proper nodal

staging (i.e. N-category) of colon cancers (Compton et al. 2000; Nelson et al. 2001).

Combinations of T, N and M categories are in turn used to define disease stage (e.g. stage I

to IV). With increasing levels of stage, the overall prognosis worsens and the probability of

recurrent disease increases. Staging information is used to determine prognosis and identify

which patients will benefit from adjuvant treatment (i.e. chemotherapy) to prevent disease

recurrence.

The first case reports of laparoscopic colon surgery appeared in 1991 (Fowler and White

1991; Jacobs, Verdeja, and Goldstein 1991). Up to this point, laparoscopy had typically been

performed for the treatment of benign conditions including appendicitis and cholecystitis (i.e.

inflammation of the gallbladder). With the advent of laparoscopic colon surgery, many

questioned whether this new technique could be used to remove colon cancer. Multiple early

NRS suggested that laparoscopy was associated with fewer post-operative complications,

shorter hospital length of stay and less pain. Proponents of the “open” technique, however,

cautioned that despite these benefits, laparoscopy may be an inferior surgical approach for

cancer patients; during conventional, open colon surgery, surgeons could feel tumours and

periodically confirm that sufficient tissue was being removed. Laparoscopy could not allow

for such tactile feedback. Thus, many were concerned that laparoscopy may be associated

with inadequate margins and insufficient removal of lymph nodes.

Concern about laparoscopic colon surgery grew further with the publication of case reports

detailing cancer recurrence at port sites or wounds. In 1993, Alexander et al. described the

clinical course of a 67-year-old woman with right-sided colon cancer (Alexander, Jaques, and

Mitchell 1993). She unfortunately experienced recurrence at one of her wound sites at

3 months following laparoscopic surgery. In the two years following this initial report, an

additional 35 cases of port site recurrences were documented. In 1995, Wexner et al.

conducted a review of all published case series and found a recurrence rate of 6.3%

(CI 1.5 to 21%) (Wexner and Cohen 1995). In contrast, there were only 11 recurrences

among a series of 1,711 patients (0.64%, CI 0.32 to 1.15%) undergoing traditional surgery

between 1986 and 1989 (Reilly et al. 1996). Laparoscopy appeared to be associated with a

nearly 10-fold increase in disease recurrence. Since laparoscopy was also associated with

higher operating room costs, the benefits of using this technology for the treatment of colon

cancer were questioned by the surgical community.
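
The proportion and interval quoted for the open-surgery series can be checked with a short calculation. The sketch below assumes an exact (Clopper-Pearson) binomial interval, which the source does not specify, and uses only the counts and percentages given above. The function names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def upper_tail(k, n, p):
    """P(X >= k); increases monotonically with p."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

def solve_increasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(events, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    lower = 0.0 if events == 0 else solve_increasing(
        lambda p: upper_tail(events, n, p), alpha / 2)
    upper = 1.0 if events == n else solve_increasing(
        lambda p: upper_tail(events + 1, n, p), 1 - alpha / 2)
    return lower, upper

# Counts reported for the open-surgery series: 11 recurrences in 1,711 patients.
events, n = 11, 1711
rate = events / n
ci_low, ci_high = clopper_pearson(events, n)

# Ratio of the 6.3% laparoscopic rate to the open-surgery rate.
fold_increase = 6.3 / (100 * rate)

print(f"rate = {100 * rate:.2f}% "
      f"(95% CI {100 * ci_low:.2f}% to {100 * ci_high:.2f}%), "
      f"fold increase = {fold_increase:.1f}")
```

Under these assumptions the interval should land close to the reported 0.32% to 1.15%, and the ratio of the two rates sits just under 10, in line with the "nearly 10-fold" description.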

Due to mounting concerns over the oncologic safety of the procedure, the American Society

of Colon and Rectal Surgeons recommended that laparoscopic colon cancer surgery should

only be performed within the context of a prospective trial (American Society of Colon and

Rectal Surgeons 1995). Over the following 15 years, multiple NRS and RCTs were

conducted to determine if there were appreciable differences between the two techniques in

terms of peri-operative mortality, post-operative complications, pain, length of stay, and so

on. Of note, a number of high-quality, publicly-funded, rigorous RCTs were conducted – a

relative rarity in surgery. Since 1998, fifty reviews (i.e. systematic reviews or meta-analyses)

have been published comparing laparoscopy with open surgery for the treatment of colon

cancer. This literature includes two separate Cochrane reviews of the short-term and long-

term outcomes following these operations. The first review found that laparoscopy was

associated with longer operative time but less total morbidity and shorter length of stay

(Schwenk et al. 2005). There was also evidence for less post-operative pain with

laparoscopy. An analysis of long-term outcomes found similar rates of port site/wound

recurrence and cancer-related mortality (Kuhry et al. 2008). However, the quality of

included RCTs has been noted to “greatly” vary (Kuhry et al. 2008). Most quality

assessments were undertaken using instruments that have not been validated. For example,

the Cochrane review by Schwenk et al. used the modified Evans and Pollock Questionnaire

to assess quality – this questionnaire has not been evaluated for face or content validity, or for

intra- or inter-observer reliability (Evans and Pollock 1985). Including studies of varying

quality probably increases heterogeneity but may also lead to aggregate estimates that are

biased. Furthermore, authors did not explore how differences in case-mix between studies

may have impacted the results of the meta-analyses.

Since the early 1990s, laparoscopic colon surgery has evolved from an experimental

technique to the preferred approach for the surgical treatment of colon cancer. The numerous

NRS and RCTs evaluating laparoscopic and open colon surgery facilitated this evolution.

The breadth of studies comparing the two techniques also provides for a unique opportunity

to compare the results of NRS with those of RCTs. There is also an opportunity to examine

the impact of quality, case-mix and period effects on observed results. To achieve these

goals, a case study of bias was undertaken.

Chapter 3 Development of a conceptual framework for bias in non-randomized studies: results of a modified framework synthesis

3.1 Summary

Objective

The evaluation of any non-randomized study (NRS) requires a thorough assessment of bias.

However, there is no consensus on the sources of bias in studies lacking randomization.

Therefore, our objective was to develop a conceptual framework for bias in NRS.

Study Design and Setting

A modified framework synthesis was conducted: i) an a priori framework was developed,

ii) a systematic search identified reviews of quality assessment instruments (e.g. scales and

checklists) for NRS, and iii) sources of bias were extracted, analyzed thematically and

organized into a framework.

Results

Of the 7 reviews identified, 4 were published in peer-reviewed journals and the remaining

were produced by publicly-funded, scientific agencies. The final framework contains 37

sources of bias or “items”. These items were organized within 6 overarching “domains”:

selection bias, information bias, performance bias, detection bias, attrition bias, and selective

reporting bias.

Conclusion

The sources of bias in NRS have been arranged into 6 main domains. This framework can

facilitate the study of bias in NRS and help those designing or reviewing NRS.

3.2 Introduction

Evaluating the merits of any study should involve a thorough consideration of internal and

external validity. Internal validity is the extent to which the findings of a study are

representative of the true association between exposure and outcome. Bias, precision and

confounding are components of internal validity (Grimes and Schulz 2002). Bias is defined

as “systematic error or deviation in results or inferences from the truth” (Agabegi and Stern

2008). Although bias cannot be totally eliminated from studies, the goal is to minimize it. For

randomized controlled trials (RCTs), several empirical studies have shown that bias arises

with inadequate randomization, allocation concealment and blinding (Schulz et al. 1995;

Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001; Balk et al. 2002; Als-Nielsen et

al. 2003; Tierney and Stewart 2005; Pildal et al. 2007; Nuesch, Reichenbach, et al. 2009;

Nuesch, Trelle, et al. 2009; Nuesch et al. 2010; Bassler et al. 2010; Dechartres et al. 2011;

Bafeta et al. 2012; Hrobjartsson et al. 2012, 2013; Wood et al. 2008; Savovic et al. 2012).

These studies have helped guide researchers, physicians, and policy makers in assessing the

risk of bias in RCTs. However, comparable evidence is not available to guide the appraisal of

NRS. At best, study-design labels (e.g. case-control study) are used as crude markers for the

extent of bias in any study. Bias assessments should instead involve a full consideration of

study methodology.

Those assessing risk of bias in NRS might be tempted to use any one of the over 190 tools

(i.e. scales and checklists) available (Deeks 2002). Unfortunately, many were developed

without adhering to the principles of measurement theory (Sanderson, Tatt, and Higgins

2007). Moreover, many focus on the broader concept “quality” and were not specifically

designed to evaluate bias. Others, such as the Newcastle-Ottawa Scale, have been shown to

have low reliability between individual reviewers (Hartling et al. 2012). The concepts of bias

and quality are often used interchangeably but the two terms represent distinct constructs;

quality includes not only bias but also considerations of external validity, ethical conduct and

reporting standards (Table 3.1). While the components of quality are undoubtedly

interrelated, they should nevertheless be evaluated independently of one another. Indeed, the major

limitation of many available instruments for the appraisal of NRS is the emphasis placed on

reported study methods and not actual study conduct (Sanderson, Tatt, and Higgins 2007).


Table 3.1 Definitions of key constructs.

| Quality Components | Description* |
| --- | --- |
| Internal Validity§ | The extent to which study design, implementation, and data analysis have minimized or eliminated bias and that the findings are representative of the true association between exposure and outcome. |
| Internal Validity – Bias | A systematic error or deviation in results or inferences from the truth. |
| Internal Validity – Precision | A measure of the likelihood of random errors in the results of a study, meta-analysis or measurement. The greater the precision, the less random error. Confidence intervals around the estimate of effect from each study are one way of expressing precision, with a narrower confidence interval meaning more precision. |
| Internal Validity – Confounding | A factor that is associated with both an intervention (or exposure) and the outcome of interest but does not lie in the causal pathway. A confounder distorts the relationship between the intervention (or exposure) and the outcome. |
| External Validity | The extent to which results provide a correct basis for generalizations to other circumstances. |
| Ethics† | A branch of philosophy systematizing, defending, and recommending concepts of right and wrong conduct. |
| Reporting Standards | A prescriptive standard outlining what should be included in the report of a study (e.g. CONSORT, STROBE). These standards do not evaluate study conduct. |

* Unless otherwise specified, construct definitions adapted from the Cochrane Glossary, www.cochrane.org/glossary
§ Adapted from “Identifying and avoiding bias in research,” (Pannucci and Wilkins 2010)
† Adapted from “Ethics and science: an introduction,” (Briggle and Mitcham)

To best use the evidence from NRS, an approach to systematically deconstruct and evaluate

bias is required. A conceptual framework for bias in NRS would not only help those

appraising individual studies but could facilitate the study of bias in NRS. Reviewing the

available literature failed to identify such a framework. Therefore, the objective of this study

was to develop a conceptual framework for bias in NRS. A conceptual framework is a

representation that “explains, either graphically or in narrative form, the main things to be

studied – the key factors, concepts, or variables – and the presumed relationships among

them” (Miles and Huberman 1994). Conceptual frameworks have been widely used for the

study of complex phenomena including knowledge translation (Graham et al. 2006), shared

decision-making (Legare et al. 2008), and guideline implementation (Gagliardi et al. 2011).

Since our primary aim was to develop a conceptual framework for bias in NRS of

interventions, such a framework would not reflect the biases that could arise in studies of

prognosis or diagnosis.


3.3 Methods

To develop a conceptual framework for bias in NRS, we sought to first aggregate a list of

potential sources of bias. Data in the form of comprehensive lists of bias or classification

schemes for bias in NRS were required. Accordingly, we focused on published systematic

reviews of quality assessment tools for NRS. Many of these tools evaluate the

methodological rigor of NRS. By focusing on systematic reviews of NRS assessment tools,

we hoped to gain insight on how others had conceptualized and organized the content of

available tools. These analyses would likely contain descriptions of the potential sources of

bias in NRS. Therefore, our objective was to review these organizational approaches and

extract all potential sources of bias. Extracted biases were then organized into a conceptual

framework. Even though these reviews focused on tools assessing quality and not

specifically bias, this approach was deemed the most broad and inclusive of potential sources

of bias in NRS.

Our aim was to construct a hierarchical framework with “items” or individual sources of bias

organized thematically under “domains” or overarching themes. For example, while

“blinding of outcome assessors” would be considered an item, it would fall under the domain

of “detection bias.” A modified framework synthesis approach was used for this study

(Barnett-Page and Thomas 2009). Framework synthesis is used to synthesize qualitative data

across multiple studies to develop an overarching framework. Instead of using qualitative

studies, we used systematic reviews as the primary data source in this study. The other

principles of framework synthesis were closely observed: i) developing an a priori

framework, ii) extracting and mapping data onto the evolving framework in an iterative

manner, iii) organizing the final synthetic product so that associations between themes

became apparent. Domains in the a priori framework were chosen and defined using

background reading material (Sackett 1979; Kleinbaum, Morgenstern, and Kupper 1981;

Grimes and Schulz 2002; Fletcher and Fletcher 2005; Choi and Noseworthy 1992; Delgado-

Rodriguez and Llorca 2004) and team discussions. This initial framework contained the

following domains: selection bias, information bias, performance bias and attrition bias. A


search strategy was designed with the assistance of an information specialist to identify

pertinent reviews.

3.3.1 Search strategy

A MEDLINE search (2000-2011) was conducted to identify systematic reviews of quality

assessment or risk of bias tools for NRS. The search strategy was structured to include terms

related to four main concepts: i) quality, bias, validity and critical appraisal (with appropriate

truncations), ii) instruments (tools, scales, checklists, and related terms), iii) non-randomized

study design (including cohort, case-control, observational and appropriate permutations)

and iv) systematic review. Titles and abstracts were assessed for eligibility by a single

reviewer.
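
The four-concept structure described above can be sketched as a Boolean query in which synonyms within a concept are joined with OR and the concepts themselves are joined with AND. This is only an illustration: the term lists and truncation symbols below are assumptions chosen to mirror the concepts named in the text, and they are not the actual MEDLINE strategy used in this study.

```python
# Hypothetical term lists: the text names the four concepts, not the exact search syntax.
CONCEPTS = {
    "appraisal": ["quality", "bias", "validity", "critical appraisal"],
    "instrument": ["tool*", "scale*", "checklist*"],
    "design": ["non-randomized", "cohort", "case-control", "observational"],
    "review": ["systematic review"],
}


def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Join the concept blocks with AND so that a record must match every concept."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


query = build_query(CONCEPTS)
print(query)
```

Requiring all four concepts narrows retrieval to reviews of appraisal instruments for NRS, which is one reason the supplementary citation-based searches described next are useful for methodological topics.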

Given the challenges of searching for methodological articles, we supplemented our search

strategy in four ways. First, we reviewed the references of eligible articles. Second, the

“Related Citations” feature of PubMed® was used in turn for each of the eligible articles to

identify additional reviews. Third, Web of Knowledge® was used to identify articles citing

the eligible studies and these titles and abstracts were assessed for eligibility. Fourth, the

“Related Articles” feature of Google Scholar® was used in conjunction with each eligible

study to find any additional reviews meeting the eligibility criteria.

Studies meeting the following inclusion criteria were analyzed: i) review of quality

assessment tools for NRS, ii) content of tools analyzed and grouped thematically,

iii) publication in English, iv) publication in a peer-reviewed journal or report of an

independent scientific association or government agency. Exclusion criteria included the

following: i) review of tools for NRS evaluating diagnostic/prognostic questions,

ii) reporting guidelines or reviews of reporting guidelines and iii) reviews published before

2000. Reporting guidelines were excluded because these focus on which elements should be

reported and do not evaluate actual study conduct. Systematic reviews published before

2000 were excluded so that the framework would be based on contemporaneous data


sources. However, it is likely that the reviews included many instruments themselves

developed prior to 2000.

3.3.2 Data collection

The following information was abstracted from each review: year of publication, number of

tools reviewed, number of content domains constructed, and the number of items or sources

of bias identified. In these reviews, investigators had organized the content of NRS tools and

summarized this content. We examined these organizational approaches and abstracted all

domains and specific sources of bias listed. These data were then analyzed thematically.

3.3.3 Analytic approach

In this study, abstracted items and domains were tabulated and coded using the a priori

framework. Open coding involved line by line analysis of abstracted sources of bias. Codes

were then combined into inter-related categories or domains using axial coding. This

iterative process involved a constant comparative approach where domains were compared

with existing ones and consensus was achieved across three study members (LS, DRU, GT).

Thus, data analysis was initially a deductive process. Deductive reasoning involves moving

from the broad to the more specific; a theory is devised (i.e. an a priori framework) and is

supported or refuted using the available data (Miles and Huberman 1994). Since the a priori

framework informed initial coding, a broad structure was used to inform data analysis.

Whenever the emerging framework could not accommodate a new domain related to bias,

the framework was expanded (Barnett-Page and Thomas 2009). Inductive analysis thus

occurred whenever domains emerged solely from the data; inductive reasoning involves

using collected data to form the basis for theory or broad conclusions. Therefore, thematic

analysis included both a deductive and inductive phase, as is customary in framework

synthesis (Barnett-Page and Thomas 2009).
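
The interplay between the deductive and inductive phases can be sketched in a few lines. The example below is a simplified illustration only: the abstracted items and the domain labels attached to them are hypothetical coder decisions, and the actual analysis relied on consensus among study members rather than on any automated rule.

```python
# A priori domains named in the Methods; every list starts empty.
framework = {
    "selection bias": [],
    "information bias": [],
    "performance bias": [],
    "attrition bias": [],
}

# Hypothetical coder decisions: (abstracted item, domain label assigned by the coders).
coded_items = [
    ("comparability of groups at baseline", "selection bias"),
    ("source of outcome data", "information bias"),
    ("losses to follow-up", "attrition bias"),
    ("outcomes specified a priori", "selective reporting bias"),
]


def map_items(framework, coded_items):
    """Deductive step: file each item under an existing domain.
    Inductive step: expand the framework when no existing domain fits."""
    added = []
    for item, domain in coded_items:
        if domain not in framework:
            framework[domain] = []
            added.append(domain)
        framework[domain].append(item)
    return added


new_domains = map_items(framework, coded_items)
print(new_domains)
```

In this sketch the first three items are accommodated by the a priori framework, whereas the fourth forces the framework to expand, which corresponds to the inductive phase described above.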


Data relevant to bias were included in the framework whereas excluded items, those relating

to other facets of quality (i.e. ethics, external validity, reporting standards and/or precision),

were organized separately. We synthesized these latter elements in a separate table so that

the final framework could be compared and contrasted with these extraneous elements.

3.3.4 Framework refinement

Five scientists with extensive experience in clinical epidemiology and health services

research were approached to informally review the clarity and face validity of the

framework. They were asked: i) does this framework reflect the biases in NRS, ii) are any

sources of bias missing, iii) is the wording of any items ambiguous or unclear?

3.4 Results

3.4.1 Included studies

Seven reviews met the inclusion criteria (Figure 3.1). Four were published in peer-reviewed

journals (Saunders et al. 2003; Katrak et al. 2004; Sanderson, Tatt, and Higgins 2007; Crowe

and Sheppard 2011) whereas 3 were produced by publicly-funded agencies (the Agency for

Healthcare Research and Quality (West et al. 2002; Viswanathan and Berkman 2011) and the

National Institute for Health Research-Health Technology Assessment Programme (Deeks et

al. 2003)). Five reviews focused solely on NRS and two evaluated instruments for both RCTs

and NRS (West et al. 2002; Crowe and Sheppard 2011) but presented data separately for

each design. The characteristics of eligible studies are summarized in Table 3.2.

Reviews differed in the number of instruments evaluated (range, 19-193). Each study

presented a scheme organizing the content of assessment tools. The number of domains

(range, 5-12), sub-domains (range, 0-22) and items (range, 11-54) varied across the reviews.

Table 3.3 outlines the constructs identified as domains. Authors clearly varied in the number

of domains constructed and in many instances, domains that appear in one review were not


mentioned in others. For example, West et al. included funding as a bias domain but others

did not. Crowe and Sheppard broached the concept in their review, but nested the finding

under “Ethical matters.” In many instances, the domains identified by authors were not

specific

Figure 3.1 Flow diagram of included studies.

* Deeks et al. (2003), Sanderson et al. (2007)
** West et al. (2002)
§ Saunders et al. (2003), Katrak et al. (2004), Viswanathan & Berkman (2011)
† Crowe et al. (2011)

Table 3.2 Characteristics of included studies.

| Author | Year | Number of Tools Reviewed | Number of Domains | Number of Sub-domains | Number of Items |
| --- | --- | --- | --- | --- | --- |
| West et al. | 2002 | 19* | 9 | - | 29 |
| Deeks et al. | 2003 | 193 | 12 | - | 45 |
| Saunders et al. | 2003 | 18 | 4 | - | 27 |
| Katrak et al. | 2004 | 19§ | - | - | 11 |
| Sanderson et al. | 2007 | 86 | 6 | - | 10 |
| Crowe and Sheppard | 2011 | 44 | 8 | 22 | 54 |
| Viswanathan and Berkman | 2011 | NA† | 5 | 12 | 29 |

* 106 instruments identified and 19 specific to non-randomized studies
§ 121 critical appraisal tools identified and 19 specific to non-randomized studies
† Authors reviewed 1429 questions collated from i) published instruments and ii) 84 Agency for Healthcare Research and Quality (AHRQ)-sponsored systematic reviews. These questions were analyzed and reduced to 29 final items by an expert panel


Table 3.3 Bias domains extracted from systematic reviews of quality assessment tools for NRS.

| Review | Domains |
| --- | --- |
| West et al. | Study question; Study population; Comparability of subjects; Exposures/Interventions; Outcome measures; Statistical analyses; Results; Discussion; Funding |
| Deeks et al. | Background/context; Sample definition and selection; Interventions; Outcomes; Creation of treatment groups; Blinding; Soundness of information; Follow-up; Analysis: comparability; Analysis: outcome; Interpretation; Presentation and reporting |
| Saunders et al. | Subjects; Interventions; Outcomes; Statistical Analyses |
| Katrak et al. | Nil |
| Sanderson et al. | Methods for selecting study population; Methods for measuring exposure and outcome variables; Design-specific sources of bias; Methods to control confounding; Statistical Methods; Conflicts of interest |
| Crowe and Sheppard | Preamble; Introduction; Research and design; Sampling; Ethical Matters; Data collection; Results; Discussion |
| Viswanathan and Berkman | Selection bias and confounding; Performance bias; Attrition bias; Detection bias; Reporting bias |


types of bias but instead broad labels such as “outcomes” (West et al. 2002; Deeks et al.

2003; Saunders et al. 2003). Our analyses therefore progressed in a classical qualitative

manner wherein inter-related concepts were collapsed onto underlying constructs. The

conceptual framework that emerged is described below.

3.4.2 Conceptual framework

The final framework contains six overarching bias domains: selection, information,

performance, detection, attrition and selective reporting bias (Figure 3.2, Table 3.4).

Figure 3.2 Conceptual framework for bias in non-randomized studies.


Table 3.4 Bias domains in the conceptual framework.

| Domain | Definition* |
| --- | --- |
| Selection Bias | Bias arising when members of the intervention/exposure group differ from the comparator (i.e. control) group in ways aside from the exposure of interest |
| Attrition Bias | Systematic differences between groups in withdrawals from a study |
| Information Bias | Bias arising from measurement errors of exposure, covariate or outcome variables |
| Detection Bias | Systematic differences between groups in how outcomes are determined and thus, a type of information bias |
| Performance Bias | Systematic differences between the groups in the care that is provided, or in exposure to factors other than the interventions of interest |
| Selective Reporting Bias | Systematic differences between reported and unreported findings |

* Adapted from the Cochrane Glossary, www.cochrane.org/glossary

Compared with the a priori framework, there were two significant expansions. First, selective

reporting bias was added. Second, the domain detection bias was added and nested under

information bias. The framework contains 37 specific items or sources of bias (Table 3.5). A

description of the framework domains and items is provided below.
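
The hierarchical organization of the framework, with items grouped under domains and some domains nested under others, can be sketched as a simple data structure. The snippet below is illustrative only: it includes a handful of the 37 items, the variable names are hypothetical, and the nesting of attrition bias under selection bias follows the description given in the Attrition Bias subsection.

```python
# Nested domains: detection bias sits under information bias, and attrition bias
# is treated as a type of selection bias (as described in the text of this chapter).
PARENT = {"detection": "information", "attrition": "selection"}

# A small subset of framework items: item number -> (label, domain).
ITEMS = {
    1: ("Outcomes specified a priori", "selective reporting"),
    22: ("Blinding of participants/personnel", "performance"),
    23: ("Blinding of outcome assessors", "detection"),
    27: ("Source of outcome data", "information"),
    32: ("Losses to follow-up", "attrition"),
    35: ("Handling of known confounding", "selection"),
}


def top_level(domain):
    """Walk up the nesting until a domain with no parent is reached."""
    while domain in PARENT:
        domain = PARENT[domain]
    return domain


for number, (label, domain) in sorted(ITEMS.items()):
    print(number, label, "->", domain, "/", top_level(domain))
```

Keeping the nesting separate from the item list means that an item is assigned to one domain only, while its relationship to a broader domain can still be recovered when needed.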

Selection Bias

Selection bias is often cited as the most significant shortcoming of NRS. Within NRS,

selection bias refers to any process that leads groups to differ from one another, aside from

the exposure of interest. Randomization determines group allocation in RCTs, whereas in

many NRS, groups are often formed by virtue of the intervention received during the course

of routine care. When the characteristics of patients are related to both receipt of treatment

and the development of the outcome, confounding occurs. Investigators anticipating such

bias can employ a number of strategies throughout the course of a study, from assembling the

cohort (e.g. matching) through to analysis (e.g. stratification and regression analyses), to

diminish selection bias.



Table 3.5 Frequency of included items.

Reviews assessed: West, Deeks, Saunders, Katrak, Sanderson, Crowe & Sheppard, Viswanathan & Berkman. The final column gives the number of these reviews marked (X) for each item.

| Item # | Item | Bias | Reviews marked X |
| --- | --- | --- | --- |
| 1 | Outcomes specified a priori | Selective Reporting | 1 |
| 2 | Analyses specified a priori | Selective Reporting | 1 |
| 3 | Attempt to balance groups on known confounders by design | Selection | 3 |
| 4 | Inclusion/Exclusion: Explicitly defined | Selection | 5 |
| 5 | Inclusion/Exclusion: Identical criteria across groups | Selection | 3 |
| 6 | Inclusion/Exclusion: Measured using valid and reliable instruments/approach | Information | 1 |
| 7 | Inclusion/Exclusion: Source of data | Information | 2 |
| 8 | Participant recruitment consistent across groups | Selection | 1 |
| 9 | Comparability of groups at baseline | Selection | 5 |
| 10 | Confounders: Explicitly identified/defined | Selection | 2 |
| 11 | Confounders: Measured using valid and reliable instruments/approach | Information | 1 |
| 12 | Confounders: Source of data | Information | 2 |
| 13 | Confounders: Systematic determination of confounders across groups | Information | 1 |
| 14 | | | |
| 15 | Intervention/Exposure: Explicitly identified/defined | Performance | 6 |
| 16 | Intervention/Exposure: Measured using valid and reliable instruments/approach | Information | 4 |
| 17 | Intervention/Exposure: Source of data | Information | 2 |
| 18 | Intervention/Exposure: Systematic determination of intervention/exposure | Information | 3 |
| 19 | Intervention/Exposure: Intervention delivery and adherence | Performance | 3 |
| 20 | Contamination | Performance | 3 |
| 21 | Concurrent treatment/co-interventions consistent across groups | Performance | 3 |
| 22 | Blinding of participants/personnel | Performance | 4 |
| 23 | Blinding of outcome assessors | Detection | 6 |
| 24 | Outcomes: Explicitly defined | Detection | 5 |
| 25 | Outcomes: Objective definition | Detection | 1 |
| 26 | Outcomes: Measured using valid and reliable instruments/approach | Information | 5 |
| 27 | Outcomes: Source of data | Information | 2 |
| 28 | Outcomes: Systematic determination of outcome across groups | Information | 1 |
| 29 | Follow-up: Adequately long follow-up (to capture outcome events) | Detection | 4 |
| 30 | Follow-up: Equal duration across groups | Attrition | 2 |
| 31 | Follow-up: Equal intensity across groups | Attrition | 1 |
| 32 | Follow-up: Losses to follow-up | Attrition | 5 |
| 33 | Analysis methods: Intention to treat analysis | Attrition | 3 |
| 34 | Analysis methods: Missing data (e.g. addressed via imputation or sensitivity analyses) | Attrition | 2 |
| 35 | Analysis methods: Handling of known confounding | Selection | 6 |
| 36 | Analysis methods: Multiple comparisons | Selective Reporting | |
| 37 | Funding | Selective Reporting | 4 |


When inclusion/exclusion criteria are not applied in a systematic way, bias may

arise. Ultimately, these criteria are used to create homogeneous groups of participants;

groups should be as similar as possible, aside from receipt of the intervention under study.

Placing limits on age, severity of illness and the existence of comorbid illness can help to

construct groups that are more comparable to one another. Inclusion/exclusion criteria need

to be clearly defined (e.g. stage IV cancer as opposed to “advanced cancer”) and applied

uniformly across groups to prevent bias. Participants destined to be classified as the

“exposed” or “controls” should also be drawn from the same source population. Consider a

hypothetical study of robotic surgery for the treatment of prostate cancer. Since high-volume

centers are more likely to acquire robot technology, any study assessing this new technique

should draw control subjects from the same institution, or an institution with similar

characteristics. Otherwise, increased hospital volume could confound the relationship

between the exposure (i.e. robotic surgery) and the outcome (e.g. mortality). In this example,

hospital surgical volume is a confounder; it is i) associated with the exposure, ii) is an

independent determinant of outcome, and iii) is not an intermediate in the causal pathway

(Fletcher and Fletcher 2005). Given the importance of confounding in NRS, the framework

contains multiple items related to this source of bias. Confounding was considered at the

outset of a study (items 3, 9 and 10) as well as during the analysis phase of a study (item 35).
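
The robotic surgery example can be made concrete with a small numerical sketch. All counts below are hypothetical and were chosen so that the surgical approach has no effect on mortality within either hospital-volume stratum. The Mantel-Haenszel estimator is used here simply as one common form of stratified analysis; it is an illustrative choice and not a method applied in this chapter.

```python
# Hypothetical counts per stratum: (deaths_robotic, n_robotic, deaths_open, n_open).
# Robotic surgery is concentrated in high-volume hospitals, where mortality is lower.
strata = {
    "high_volume": (16, 800, 4, 200),
    "low_volume": (12, 200, 48, 800),
}


def crude_risk_ratio(strata):
    """Risk ratio after pooling all patients, ignoring hospital volume."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Risk ratio adjusted for the stratifying variable (Mantel-Haenszel weights)."""
    num = den = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den


print("crude RR:", round(crude_risk_ratio(strata), 3))
print("adjusted RR:", round(mantel_haenszel_risk_ratio(strata), 3))
```

In this constructed example the crude comparison suggests that robotic surgery nearly halves mortality, whereas the stratified estimate shows no difference, because hospital volume is associated with the exposure and independently determines the outcome.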

Information Bias

Conducting RCTs ideally necessitates the development and registration of trial protocols

including detailed information about the measurement of patient characteristics and

outcomes. These data are thereafter collected on standardized forms, by trained study

personnel and entered into databases with mechanisms to detect inaccurate data entry. In

contrast, many NRS make use of information gathered from hospital charts or administrative

processes (e.g. billing). As this information was not collected in the course of a study,

questions can arise about the accuracy, completeness and other purposes of such information.

Consequently, there are 11 items in the framework related to information bias – bias


stemming from the errors in the measurement of exposures, confounders and outcomes.

Detection bias (or ascertainment bias) applies specifically to outcomes and is therefore

classified as a subset of information bias.

Bias can arise in a NRS if investigators rely on source data of questionable quality (items 7,

12, 17 and 27). For example, consider a case-control study of antibiotic use; asking patients

to recall their drug history over the past 10 years is far less reliable than using hospital charts

or provincial drug-plan databases to obtain this information. In turn, those studying anti-

retroviral therapy may refer to hospital records to obtain CD4 blood levels, but could have

acquired more data points if a research protocol had dictated the frequency of sample

collection. Therefore, the adequacy of source data in NRS is context dependent, but such data

will generally be poorer in quality than data collected in the course of a trial.

Effect measures can thus deviate from the truth if the source data are themselves flawed or

incomplete in a systematic way.

Bias may also arise if investigators rely on instruments that have not been validated or tested

for reliability (items 6, 11, 16 and 26). Even when valid and reliable instruments are used,

unless they are used in a systematic manner, bias may arise. Accordingly, the framework

contains items related to the systematic determination of confounders (item 13),

interventions/exposures (item 18) and outcomes (item 28). Moreover, blinding outcome

assessors (item 23) specifically serves to diminish detection bias and in its absence, bias may

arise because the conscious or subconscious beliefs of investigators may influence how

outcomes are recorded. For example, consider a trial comparing routine care versus incentive

spirometry (i.e. the use of a breathing device) for the prevention of post-operative pneumonia.

An assessor who believes strongly in the benefit of incentive spirometry may review chest

radiographs of patients using the device and be less likely to diagnose a pneumonia.

Diagnosing a pneumonia on a chest radiograph can require judgment and is thus subject to

interpretation.


Performance Bias

In an ideal study, participants in the active and control arms of a study are similar except for

the exposure of interest. Blinding both study subjects and investigators ensures that all

participants are treated alike. However, such blinding is not always possible (Boutron et al.)

and in its absence, performance bias arises. Performance bias has five main components:

i) explicit definition of the intervention/exposure (item 15), ii) consistent delivery and

adherence to the intervention (item 19), iii) similar co-interventions across groups (item 21),

iv) contamination (item 20) and v) blinding of participants and study personnel (item 22).

Bias can arise in a study if there is variation in the definition of an intervention or in its

standardization. For example, in a hypothetical study of surgery versus surgery and

chemotherapy for gastric cancer, surgeons can differ in the extent of lymphadenectomy

(i.e. removal of lymph nodes) performed intra-operatively. Some surgeons remove less

surrounding tissue, and thus fewer nodes, whereas others may do a more extensive

lymphadenectomy. If the extent of surgery differs between patients, will the summary effect

estimate truly reflect the difference between the two treatment strategies? Whereas in

pharmacological studies, standardizing the intervention can mean defining a dose, frequency

of delivery and using a specific pharmaceutical source, achieving this type of standardization

is more difficult in studies of non-pharmacological interventions. Within RCTs, investigators

will often insist on surgeons having achieved a certain degree of proficiency with a technique

before involving them in a study. For example, in the trial by Nelson and colleagues on

laparoscopic colon surgery, procedure-volume thresholds were established and

videotapes of operations were reviewed to ensure participating surgeons were delivering the

intervention consistently (Nelson et al. 2004). A similar approach could be adopted in NRS

but would add to the complexity of studies. Without explicitly defining and standardizing

interventions, the results of a NRS may not reflect the impact of an intervention as it was

intended.


Patients can also fail to adhere to their intended intervention via non-compliance or by

switching over to the alternative therapy/intervention (i.e. contamination). This can be

common in pharmacological studies. As there are often meaningful reasons why people

switch, investigators should explore the underlying reasons why these changes occur. Non-

random switching can introduce bias that affects estimates from studies with significant non-

adherence or contamination. Some have argued that these effect estimates instead represent

“real-world” conditions.

Co-interventions can also introduce bias by similar mechanisms in NRS and RCTs. Consider

the following example which applies equally to both study types; investigators are evaluating

the rate of wound infection following two surgical procedures A and B. At the end of the

study, lower rates of post-operative infection are observed in patients receiving A. If

antibiotic prophylaxis was provided more frequently to patients undergoing A than those

undergoing B, the observed effect might be attributable to antibiotic prophylaxis rather than

to the surgical procedure. In RCTs, blinding of study personnel and participants helps to

minimize this bias but this is rarely encountered in NRS. Therefore, differential application

of co-interventions can introduce bias in NRS.

Attrition Bias

Attrition bias arises when there is incomplete data collection (item 34) or differential follow-

up (item 32). These processes lead to imbalances at the end of a study. Attrition bias

therefore diminishes the comparability of groups and was categorized as a type of selection

bias. Five of the reviews identified losses to follow up as a source of bias in NRS (Saunders

et al. 2003; Katrak et al. 2004; Sanderson, Tatt, and Higgins 2007; Crowe and Sheppard

2011; Viswanathan and Berkman 2011). Since NRS are often observational studies making

use of data collected during routine care, attrition poses a particular challenge. Bias can arise

if one group is followed more intensely than the other. There is more opportunity to record

information from patients that are seen more often or undergo tests more frequently. For


example, it is possible that disadvantaged populations are less likely to comply with follow-

up. If socioeconomic status is related to the outcome, seeing such patients infrequently will

not allow for complete capture of outcomes. Bias therefore arises if follow-up is of unequal

intensity (item 31) across groups.

Moreover, when new techniques or interventions are compared with more established ones,

there may only be a few years of follow-up available with the newer approach. If the patients

who received the conventional technique have been followed for over a decade, the

inequality in duration of follow-up (item 30) could make the newer technique falsely appear

superior to the older approach. Both groups may be followed with equal intensity (e.g. same

number of blood tests or computed tomography scans) but by virtue of being followed for a

shorter amount of time, bias may be introduced in the NRS.
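
The effect of unequal follow-up duration can be illustrated with a simple calculation. The counts below are hypothetical and assume, for the sake of argument, that both techniques carry the same true recurrence rate and that every patient contributes the full follow-up period for their group.

```python
# Hypothetical cohorts: identical true recurrence rate (0.05 events per person-year),
# but 10 years of follow-up for the conventional technique and 3 years for the new one.
cohorts = {
    "conventional": {"patients": 100, "person_years": 1000, "recurrences": 50},
    "new": {"patients": 100, "person_years": 300, "recurrences": 15},
}


def crude_proportion(cohort):
    """Share of patients with a recorded recurrence, ignoring length of follow-up."""
    return cohort["recurrences"] / cohort["patients"]


def rate_per_person_year(cohort):
    """Recurrences divided by the total time patients were observed."""
    return cohort["recurrences"] / cohort["person_years"]


for name, cohort in cohorts.items():
    print(name, crude_proportion(cohort), rate_per_person_year(cohort))
```

Comparing crude proportions makes the newer technique look far better (15% versus 50% recurrence), yet the rates per person-year are identical, which shows how unequal duration of follow-up alone can produce an apparent advantage.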

The concept of intention-to-treat analysis is often classified within attrition bias for RCTs. A

similar approach was adopted in this framework for bias in NRS. In RCTs, intention-to-treat

(ITT) analysis implies analyzing participants according to the group they were randomly

assigned to, irrespective of the treatment received. Understanding this concept within the

context of NRS is more difficult. Since patients are assigned to the treatment or exposure arm

of a study based on the treatment received, how does intention-to-treat analysis retain its

meaning? Taking a closer look at studies comparing laparoscopy with conventional surgery

for colon cancer can help to illustrate the meaning of ITT analysis in NRS. Laparoscopy or

“key-hole” surgery involves operating through small incisions (e.g. <1 cm) instead of the

much larger incision (>20 cm) used in conventional surgery. Minimally-invasive instruments

are inserted through the small incisions while the operative field is displayed on monitors in

the operating room. If surgeons encounter bleeding or adhesions that impede the progress of

the procedure, they may abandon laparoscopy in favor of the conventional approach. Later

analyzing these “converted” patients in the conventional surgery group would be a violation

of the principles of ITT analysis. More generally, ITT analysis in NRS requires using

strategies (e.g. last observation carried forward) to make sure that participants with some

missing data are not wholly removed from the analysis. Failure to do so could introduce bias


by removing patients with missing data — reasons for missingness may be related to the

outcome of interest.
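
The consequence of reclassifying converted patients can be shown with a small worked example. The patient records below are hypothetical: 10 patients were intended for laparoscopy (2 of whom were converted to open surgery and both had complications) and 10 were intended for conventional surgery.

```python
from collections import defaultdict

# Hypothetical records: intended approach, approach actually completed, complication.
patients = (
    [{"intended": "lap", "received": "lap", "complication": i < 1} for i in range(8)]
    + [{"intended": "lap", "received": "open", "complication": True} for _ in range(2)]
    + [{"intended": "open", "received": "open", "complication": i < 3} for i in range(10)]
)


def complication_rate(patients, group_key):
    """Complication rate per group, grouping by 'intended' or by 'received'."""
    totals, events = defaultdict(int), defaultdict(int)
    for p in patients:
        group = p[group_key]
        totals[group] += 1
        events[group] += p["complication"]
    return {group: events[group] / totals[group] for group in totals}


itt = complication_rate(patients, "intended")         # converted cases stay with laparoscopy
as_treated = complication_rate(patients, "received")  # converted cases counted as open
print("intention to treat:", itt)
print("as treated:", as_treated)
```

Under the intention-to-treat grouping the two approaches have the same complication rate, whereas moving the converted patients into the conventional group makes laparoscopy appear safer, because the difficult cases are transferred to the comparator.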

Selective Reporting Bias

There are 5 items in the conceptual framework related to selective reporting bias. Whenever

investigators systematically choose to include certain findings but not others in the published

report of a study, selective reporting bias occurs. This bias is further divided into two types;

selective outcome reporting or selective analysis reporting (Norris et al. 2012). The former

arises when a subset of outcomes is reported based on: i) their direction or statistical

significance, ii) whether they are consistent with the hypotheses of the investigator or

funding source, or iii) if the findings support a paradigm shift (Chan et al. 2004). Selective

outcome reporting is mediated through a number of mechanisms which include omitting an

entire outcome, reporting only favorable aspects of an outcome (e.g. at 6 months of follow-

up but not 1-year), or providing insufficient detail (e.g. p>0.05) (Kirkham et al. 2010).

Selective analysis reporting involves including only a portion of the analyses performed,

using multiple approaches to missing data but reporting a subset, or turning continuously

measured variables into categorical ones (Norris et al. 2012).

Selective reporting bias can arise in NRS if there is selective reporting of outcomes and

analyses. Many have suggested that NRS should require protocol registration in a manner

akin to RCTs (Williams et al. 2010; Swaen, Carmichael, and Doe 2011) — doing so would

encourage investigators to refrain from “cherry-picking” outcomes and analyses (Mathieu et

al. 2009). Assessing the potential for selective reporting bias in NRS remains a challenge,

however, in the absence of such protocols.


3.4.3 Excluded items

There were 24 items not included in the final conceptual framework (Table 3.6). The

majority were related to reporting standards (n=9), and the remainder to external validity

(n=8), ethics (n=4) and precision (n=3) (Table 3.6). While “precision” is considered a

component of internal validity (Higgins et al. 2011), the other three domains were

categorized as facets of “quality” (Sanderson, Tatt, and Higgins 2007).

A distinction should be drawn between the items related to selective outcome reporting

(items 1, 2, 33, 34, 36 and 38) of the framework and the “selection of outcomes for relevance

and importance.” Investigators should measure outcomes that are important to patients,

health care providers, administrators and policy makers alike. Which outcomes one chooses

to study is a different consideration from choosing which ones to publish among those

studied. Accordingly, the “selection of outcomes for relevance and importance” is a

reflection of the relevance and external validity of a study.


Table 3.6 Items abstracted from reviews but not related to bias.

Excluded Items

Ethics

- Ethics approval
- Informed consent
- Privacy/confidentiality maintained throughout the study
- Appropriate comparison group (given standard of care)

External Validity

- Question relevant to practice
- Participants representative of those seen in clinical practice
- Recruitment/participation rate
- Participants comparable to non-participants
- Feasibility of intervention
- Selection of outcomes for relevance and importance
- Clinical importance of findings

Reporting Standards

- Background information provided
- Clearly stated question
- Hypotheses described
- Study design adequately described
- Description of study population
- Statistical presentation/reporting of findings
- Conclusions supported by results/limitations
- Description of implications/applications

Internal Validity – Precision

- Sample size calculation
- Power calculation


3.5 Discussion

A comprehensive framework for bias in comparative NRS has been developed and includes

37 individual sources of bias, organized within 6 domains. No single review included all

identified items and many included items related to the larger construct of quality. The

appearance of items related to reporting standards, generalizability and ethics within reviews

underscores how many available instruments were not specifically designed to evaluate bias.

The domains of bias in the final framework are similar to those in the Cochrane Risk of Bias

Tool for RCTs. Whereas the Cochrane Risk of Bias Tool has five domains (selection,

performance, detection, attrition and reporting bias), the framework for NRS places detection

bias within the larger construct of information bias. The inclusion of information bias as a

stand-alone domain in the current framework highlights how NRS and RCTs fundamentally

differ in data acquisition; RCTs are by definition prospective studies carried out by dedicated

personnel following explicit protocols, collecting data in a formalized way. Whether

explanatory or pragmatic, RCTs are planned and structured experiments where specific

attempts are made to answer a clinical question. Most NRS, however, make use of

information collected during routine care and this has notable implications for data quality.

Patients answer questions and undergo tests at intervals that are dictated by care goals, not

research protocols. For these reasons, many items that emerged in the framework related

directly to information bias – namely the quality of the data available, and the measures used

to determine exposure, confounders and outcome.

Some reviewing the framework might question why publication bias, a type of reporting

bias, does not appear in the framework. The biases addressed in the framework all function

within a study. In contrast, publication bias, a metabias (Goodman and Dickersin 2011),

occurs at the level of the entire study; a study reporting statistically significant results is more

likely to be published, often more quickly, more than once and ultimately gets cited more

often (Dwan et al. 2008). This bias is of particular concern to those performing systematic


reviews and meta-analyses where identifying all pertinent studies is critically important.

Since publication bias becomes apparent when aggregating studies, it does not apply to

within-study processes. The tendency to publish some outcomes in lieu of others is more

formally captured by the domain selective outcome reporting, in part, to highlight this very

distinction.

Considering the indexing of methodological articles is highly variable, an augmented search

strategy was used to identify relevant reviews of quality assessment tools for NRS. This is

one of the strengths of the current study, as is the use of a formal approach for the synthesis

and development of the framework. In addition, by dividing the concept “follow-up” into

four components (adequacy of duration, equality of duration and intensity, and losses to

follow-up), the framework highlights facets of follow-up that have been previously

overlooked. While the informal face validity exercise did not lead to additions to the

framework, it did help refine the language used. The excluded items also help to illustrate

how issues of reporting have historically been intertwined with assessments of conduct.

As the framework applies to NRS more broadly and has not been tailored for specific study

designs, such as retrospective cohort studies or case-control studies, some may argue this

represents a limitation of the current work. Indeed, various instruments for the evaluation of

NRS, including the Newcastle-Ottawa Scale, have variations that apply to specific study

designs (Stang 2010). A design-specific approach was not adopted because most of the

source material did not make an analogous distinction when reviewing the instruments

evaluating bias. Introducing such a distinction during the course of framework development

would have required an inferential leap. Moreover, since the current framework can be

applied to any NRS, end-users do not have to identify the type of NRS being evaluated with

the framework. Previous studies have demonstrated that there can be a great deal of

ambiguity with study design labels for NRS (Furlan 2006; Hartling et al. 2011).

Conceptual frameworks facilitate research by providing a structure for understanding a

phenomenon. The current framework focuses on sources of bias but it is important to note


that any bias has an associated direction and magnitude that is context dependent. Will a lack

of blinded outcome assessors always bias effect estimates away from the null? That depends

on the hypotheses being tested and the subconscious motivations of the researchers or study

personnel. For those using the framework to study and evaluate studies, it is important to

recognize how any given source of bias may arise in a particular study and, if possible,

speculate on the direction of this bias. For RCTs, the Cochrane Risk of Bias tool does not

focus solely on what was done but requires reviewers to make a judgment about the

associated risk of bias. For example, if outcome assessors were not blinded, is bias

introduced when evaluating an objective outcome such as mortality? There is empirical

evidence to support that objective outcomes are insulated from the effects of unblinded

assessment in RCTs (Wood et al. 2008; Hrobjartsson et al. 2012, 2013). Until there are

empirical studies of bias for NRS, theoretical considerations will be used in determining the

risk of bias associated with any given item or domain in the framework.

Many of the items and domains in the framework are well-known to methodologists,

researchers and the end-users of NRS. Some of these include the blinding of outcome

assessors, minimizing losses to follow-up and adjusting analyses for known confounding.

This framework helps to shed light on many of the important sources of bias that are

infrequently discussed but are nonetheless important to consider. For example, the

framework highlights not only the duration of follow-up as a source of bias but also the

intensity of follow-up. Both of these concepts are set aside from the concept “losses to

follow-up.” The comprehensiveness of the current framework is indeed one of its strengths.

Moreover, the sources of bias identified in this current framework include many of those

identified in the textbook “Clinical Epidemiology: The Essentials, 4th edition” (Fletcher and

Fletcher 2005). In fact, we identified an additional 12 sources of bias.

Ultimately, the framework for bias in comparative NRS may be used by individuals

designing a NRS or by those reviewing a proposed protocol or published studies. Additional

research and guidance are necessary, however, to fully operationalize this framework for these

purposes.


3.6 Conclusion

In conclusion, a conceptual framework for bias in NRS containing six overarching domains

was developed. It contains 37 distinct items or sources of bias that were synthesized from

seven reviews of quality assessment tools for NRS.


Chapter 4 Common Methods for Chapters 5 & 6

4.1 Overview

The overarching objective of this thesis was to study bias in surgical NRS and RCTs. The

advent of laparoscopy provided us with such an opportunity; unlike most surgical

procedures, laparoscopic colon surgery has been thoroughly evaluated using both

randomized and non-randomized study designs (Howes et al. 1997; Wente et al. 2003). In the

1990s, clinical equipoise existed between laparoscopy and conventional surgery for the

treatment of colon cancer. NRS comparing the two surgical approaches first appeared in

1993 (Falk et al. 1993). Numerous early case reports, however, detailed the development of

port-site metastases following laparoscopic resections (Nduka et al. 1994; Wexner and

Cohen 1995; Montorsi et al. 1995; Kazemier et al. 1995; Zmora, Gervaz, and Wexner 2001).

These reports led many to believe that laparoscopy was inferior to “open” or conventional

surgery. Others, however, disagreed. By 1995, the results of the first RCT comparing

laparoscopy with open surgery were published (Lacy et al. 1995). Numerous NRS and RCTs

have since established the safety of laparoscopy for the treatment of colon cancer (Abraham

et al. 2007; Schwenk et al. 2005; Kuhry et al. 2008). However, our aim was not to derive

clinical conclusions about the adequacy of laparoscopy but instead, to study bias. Therefore,

this surgical procedure was chosen for a case study of bias because many NRS and RCTs

specifically comparing laparoscopy with conventional surgery are available.


Specific Aims #2 and #3 have been reiterated below to place the following methods in context:

Specific Aim #2 (Chapter 5)

To compare effect estimates from NRS with those from RCTs at low risk of bias.

Specific Aim #3 (Chapter 6)

To explore the impact of NRS-design attributes on estimates of treatment effect.

4.2 Literature search

All studies comparing laparoscopic resections with conventional surgery for colon cancer

were identified in Medline (1950-2010) and EMBASE (1980-2010). Comprehensive search

strategies were separately devised for each database with the assistance of an information

specialist (Appendix B). Search terms included “colon cancer,” “laparotomy,” “open

surgery” and “laparoscopy.” The reference lists of articles retrieved were also manually

searched to identify relevant studies. Titles and abstracts were reviewed for eligibility using

EndNote™ bibliographic management software (Thomson Reuters, New York, NY, USA).

Studies were included if they fulfilled the following a priori inclusion criteria: i) surgery

undertaken in an elective setting (versus emergency operations); ii) publication in a peer-

reviewed journal and iii) publication in English. Patients who undergo emergency operations

often have a lower baseline functional status and higher comorbidity burden than elective

surgery patients (Ingraham et al. 2012). Not surprisingly, emergency colon surgery is

associated with higher rates of post-operative complications (Kirchhoff, Clavien, and

Hahnloser 2010), mortality (Leung et al. 2009) and longer hospital admission (Kelly et al.

2012). To minimize between-study clinical heterogeneity, only articles describing the

outcomes of elective procedures were included. Publications in English were reviewed for

pragmatic reasons; notably, Juni et al. have shown that excluding non-English studies from

meta-analyses can have minimal impact on summary effect estimates (Juni et al. 2002).


Studies that met the following exclusion criteria were not considered further: i) no clinical

outcomes reported (e.g. studies reporting biochemical outcomes only); ii) animal studies and

iii) systematic reviews or meta-analyses. For all abstracts that met inclusion/exclusion

criteria or were potentially eligible, full articles were retrieved.

4.3 Data abstraction and management

Data were abstracted using a pretested and standardized data collection form. DRU and LS

pilot tested the form using articles comparing laparoscopy with open surgery for the

treatment of diverticulitis, another disease of the colon. The form was revised in an iterative

manner to eliminate ambiguity.

Data abstracted from each article included the following (Table 4.1): i) study design (NRS or

RCT); ii) article attributes (year of publication, journal, length); iii) author characteristics

(number, involvement of a consortium or methodological expert); iv) study attributes

(country where the study took place, academic versus community setting); v) outcomes

reported; and vi) unadjusted outcome data (number of events and group size, mean and

standard deviation).

For RCTs, seven risk of bias items (random sequence generation, allocation concealment,

blinding of participants and personnel, blinding of outcome assessment, incomplete outcome

data, selective outcome reporting and other sources of bias) were abstracted. These seven

items collectively form the Cochrane Risk of Bias Tool (Higgins et al. 2011) (see Section

4.8.2 for additional information). For NRS, nine study characteristics (primary outcome,

prospective data collection, sample size calculation, concurrent controls, matched controls,

standardized concurrent therapy, systematic outcome assessment, blinded outcome

assessment, intention to treat analysis) were abstracted. Additional information about the selection, definition and

validation of study characteristics is available in Section 6.3.2.

Table 4.1 Definitions for abstracted variables.

Category, then Variable: Definition/Guidance

Study Design

- RCT: A clinical trial in which participants were randomly assigned to either the experimental group (i.e. laparoscopy) or the control group (i.e. open surgery)
- NRS: A study in which patients received the experimental intervention (i.e. laparoscopy) or conventional treatment (i.e. open surgery) in a non-random manner

Article Attributes

- Year of publication: Calendar year in which the article was published
- Journal: Name of the peer-reviewed periodical in which the article appears
- Length: Total number of pages

Author Characteristics

- Number: Number of named authors
- Consortium: The involvement of an organization or association as a named author
- Methodological expert: An author was identified as a methodological expert if he/she had an affiliation with a department of biostatistics, clinical epidemiology, health policy, public health or statistics

Study Attributes

- Country: If the country where the study took place was not specified in the “Methods,” the address of the corresponding author was used. If more than one country was specified in the “Methods,” the study was categorized as “Multinational”
- Academic versus community: A study was classified as taking place in an “Academic” setting if:
  - for single institution studies, the hospital involved was affiliated with a university
  - for multi-institutional studies, the primary investigator was affiliated with a university

Outcomes reported

- A list of all outcomes reported within the body of an article were abstracted

Outcome data

- Binary outcomes, peri-operative mortality:
  - Number of deaths respectively in the laparoscopy and open surgery groups
  - Number of participants in the laparoscopy and open groups
- Binary outcomes, post-operative complications:
  - Number of patients experiencing a complication in the laparoscopy and open surgery groups
  - Number of participants in the laparoscopy and open surgery groups
- Continuous outcomes, length of stay:
  - Mean number of days a patient remained hospitalized following surgery in the laparoscopy and open groups
  - Standard deviation of the mean for each group
- Continuous outcomes, number of lymph nodes harvested:
  - Mean number of lymph nodes found within the surgical specimen in the laparoscopy and open surgery groups
  - Standard deviation of the mean for each group

All data were directly entered into a hierarchical, relational database constructed using

Microsoft Access™ (Microsoft, Seattle, Washington, USA) (Figure 4.1). Each study meeting

the inclusion/exclusion criteria was assigned a unique “study identification number” (SID)

that was used to link all nested, second-order forms and tables. Logic rules and data checks

were used to ensure data accuracy.

Figure 4.1 Database Structure. SID, study identification number. *Post-operative complications, peri-operative mortality, length of stay and number of lymph nodes harvested. See Section 4.5 for more detail about outcome selection.

[Figure 4.1 diagram labels: SID; Study-level Characteristics; Outcomes of Interest*; Other Outcomes; relationship cardinalities 1:1, 1:4 and 1:∞]

4.4 Categorizing studies as RCTs or NRS

Many study design classification tools are limited by fair inter-rater reliability and low

accuracy (Furlan 2006; Hartling et al. 2010; Hartling et al. 2011). For example, in a study by

Furlan and colleagues, the kappa statistic was 0.53 (95% CI, 0.49-0.67) among senior

reviewers classifying a given study as a RCT, controlled-clinical trial, prospective cohort

study, retrospective cohort study, case-control study, cross-sectional study, case series, or

case report (Furlan 2006). The kappa statistic was 0.46 (95% CI, 0.37-0.47) among junior

reviewers. Hartling and colleagues similarly found fair inter-rater reliability (kappa=0.45)

among six reviewers using a modified version of the Cochrane Non-Randomized Studies

Methods Group (NRSMG) design algorithm (Hartling et al. 2010). When reviewers’

classifications were compared with a reference standard, the assessments had low accuracy

(Hartling et al. 2011). It is possible that low inter-rater reliability and low accuracy of these

approaches is due to the

In light of these findings, we chose to classify the design of comparative studies in our data

set as follows: i) RCTs or ii) NRS. A RCT was defined as a clinical trial in which

participants were randomly assigned to either the experimental group (i.e. laparoscopy) or

the control group (i.e. open surgery). In contrast, a NRS was defined as a study in which

patients received the experimental intervention (i.e. laparoscopy) or conventional treatment

(i.e. open surgery) in a non-random manner (Reeves et al. 2011). We believe this

classification scheme is far simpler than the approach identified by Furlan (Furlan 2006)

or any of the 10 classification tools evaluated by Hartling et al. (Hartling et al. 2010; Hartling

et al. 2011). Moreover, Furlan and colleagues have found that for RCTs, it is “relatively

simple to assign a label. It is based on direct observation, i.e. if the word “randomized” (or its

variations) appears in the study suggesting that subjects were assigned at random to the

intervention and control groups.” Therefore, we believe our approach to study classification

is likely to be simple and accurate.

4.5 Outcome selection and definition

All outcomes reported in NRS and RCTs were abstracted and tabulated. For the current case

study of bias, we chose to focus our attention on outcomes that were both of clinical

significance and frequently reported. Of the over 152 outcomes reported, we selected four

outcomes for our analyses: i) post-operative complications; ii) peri-operative mortality;


iii) length of stay and iv) number of lymph nodes harvested. “Post-operative complications”

was defined as the number of patients in either the control (i.e. open surgery) or intervention

(i.e. laparoscopic surgery) arm experiencing a complication within the first 30 days following

surgery. Similarly, the number of patients who died in each group within the first 30 post-

operative days contributed to the definition of peri-operative mortality. “Length of stay” was

considered the amount of time the patient remained hospitalized following surgery. The

“number of lymph nodes harvested” was defined as the number of nodes found within the

surgical specimen when examined by a pathologist.

4.5.1 Subjective versus objective outcomes

In a previous study examining the association between study design attributes and bias in

RCTs, Wood et al. categorized outcomes as objective or subjective (Wood et al. 2008).

They classified mortality and standardized laboratory procedure measures (e.g. hemoglobin

concentration) as objective outcomes whereas subjective outcomes included patient-reported

outcomes and physician-assessed disease outcomes (e.g. wound infection, pneumonia and

other complications). They found a number of design attributes (e.g. allocation concealment)

were associated with biased effect estimates among subjective but not objective outcomes. In

a larger meta-epidemiological study, Savović et al. reached the same conclusion.

Accordingly, we chose to categorize the outcomes of interest (i.e. post-operative

complications, peri-operative mortality, length of stay and number of lymph nodes

harvested) as subjective or objective as well, using the same criteria employed by

Wood et al. and Savović et al. Consensus was also achieved among our research group

regarding these categorizations.

For our analyses, an outcome can be considered “subjective” for three distinct reasons.

First, if multiple definitions of the outcome exist, subjectivity is introduced when choosing

one definition over another. For example, the Centers for Disease Control definition of

pneumonia (NHSN Patient Safety Component Manual. National Healthcare Safety Network.

Centers for Disease Control and Prevention) differs from the ACS National Surgical Quality

Improvement Program definition (ACS NSQIP - Classic Variables and Definitions, Chapter

4). In a hypothetical study of intensive care (ICU) patients, the physicians at hospital X

might use the former definition whereas the physicians at hospital Y, the latter.

Consequently, a judgment is being made as to which definition to use, making “pneumonia”

a subjective outcome. Height (in cm) on the other hand is an objective outcome since the

definition of a centimeter is an internationally standardized measure (Wandmacher and

Johnson 1995).

Second, an outcome may be deemed subjective due to variation in the processes used to

eventually assess its occurrence. For example, consider a study of post-operative patients

where the outcome of interest is the development of a clot in an extremity (i.e. a deep vein

thrombosis, DVT). Some patients with a DVT have associated redness and swelling in their

extremity which prompts the treating physician to order a test confirming the presence or

absence of a DVT. However, many patients have DVTs that are subclinical and have no

associated symptoms (Kelly et al. 2001). This does not mean these patients do not have

DVTs - if the appropriate test was ordered, the results would be positive. Ordering the test to

verify the presence of a DVT however, is subject to the judgment or discretion of the treating

physician. The opportunity to assess the outcome is dependent on judgment.

Third, once the opportunity to assess the outcome arises, if the assessment itself can be

influenced by personal interpretation or opinion, then the outcome is subjective. Consider the

following scenario: in a study of peri-operative antibiotics, the definition of wound infection,

the primary outcome, might be standardized to include the presence of skin erythema (i.e.

redness). Physician A might regard the redness around patient X’s wound as evidence of an

infection. Alternatively, physician B might look at the same redness and believe that an

allergic reaction to the overlying tape has occurred. Even though both physicians are using

the identical definition of a wound infection and are looking at the same phenomenon (i.e. the

redness), they are interpreting the situation in two different ways. Consequently, physician A


might record a positive outcome whereas physician B would not. Again, judgment plays a

central role in determining whether the outcome has occurred.

In summary, an outcome is subjective if i) there are multiple definitions available;

ii) judgment determines the opportunity to assess the outcome or iii) discretion/interpretation

is implicit in the application of the definition. Therefore, post-operative complications and

length of stay were considered subjective outcomes since both can be influenced by

physicians’ discretion. On the other hand, peri-operative mortality was classified as an

objective outcome, as was the number of lymph nodes harvested.

4.5.2 Summary effect measures

A priori decisions were made to express binary outcomes as odds ratios (OR) and continuous

outcomes as mean differences (MD). When combining binary outcomes across multiple

studies, one can choose from among the following three summary effect measures: risk

difference (RD, also referred to as the absolute risk reduction, ARR), the risk ratio (RR, also

called the relative risk) and the OR (Walter 2000). The first is considered an absolute

measure of effect whereas the latter two are relative measures.

Risk is the probability with which an outcome will occur (Altman, Egger, and Smith 2001).

If the risk of an event occurring is 0.1 or 10%, then 10 out of 100 people will experience the

outcome. The RR is the risk of the outcome in the exposed or treatment group divided by the

risk in the unexposed or control group. A RR of 0.2 implies that the risk of the outcome in

the treatment group is 20% of the risk in the control group. The RD is simply the difference

in risk between the treatment group and the control group. The odds of an event, however, equal

the probability an event will occur divided by the probability that it will not occur (Walter

2000). If the odds of an event are 0.33, then for every one person experiencing an outcome,

three will not.
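The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `effect_measures` and the trial counts are hypothetical and do not come from the thesis data.

```python
def effect_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Return the RD, RR and OR for a 2x2 table of event counts and group sizes."""
    risk_trt = events_trt / n_trt                 # risk = events / participants
    risk_ctl = events_ctl / n_ctl
    odds_trt = events_trt / (n_trt - events_trt)  # odds = events / non-events
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return {
        "RD": risk_trt - risk_ctl,   # absolute measure
        "RR": risk_trt / risk_ctl,   # relative measure
        "OR": odds_trt / odds_ctl,   # relative measure
    }


# Hypothetical trial: 10 of 100 treated patients and 25 of 100 controls have the event.
measures = effect_measures(10, 100, 25, 100)
print(measures)
```

With these made-up counts the RD is -0.15, the RR is 0.4 and the OR is one third, so the three measures describe the same table quite differently.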


When choosing a summary statistic, it is important to weigh the consistency and the

mathematical properties of the measure with the ease of its interpretation (Altman, Egger,

and Smith 2001). If the heterogeneity in a meta-analysis increases because of the summary

measure chosen, the measure has low consistency. Empirical work by Deeks et al. has

demonstrated that meta-analyses using the RD have higher heterogeneity than when the RR

is computed for the same meta-analyses (Deeks 2002). When comparing RR with OR,

comparable levels of heterogeneity were found. Accordingly, the RR and OR are generally

regarded as more consistent than the RD.

The OR or log(OR) have superior mathematical properties as compared with the RR. In

particular, the OR and its log are considered symmetrical summary effect measures (Walter

2000). For example, if the OR for death in a study is 0.2/0.4=0.50, then the OR for survival

is 2.0. The latter OR is simply the reciprocal of the former. Moreover, if the log(OR) is used,

only a change in sign is required.
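The symmetry property can be checked numerically. The sketch below assumes that the values 0.2 and 0.4 in the example above are the odds of death in the two groups (the text does not say so explicitly); the variable names are illustrative only.

```python
import math

# Assumption: 0.2 and 0.4 are the odds of death in the treatment and control groups.
odds_death_trt, odds_death_ctl = 0.2, 0.4

or_death = odds_death_trt / odds_death_ctl                  # 0.5
or_survival = (1 / odds_death_trt) / (1 / odds_death_ctl)   # odds of survival = 1 / odds of death


def risk(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)


rr_death = risk(odds_death_trt) / risk(odds_death_ctl)
rr_survival = (1 - risk(odds_death_trt)) / (1 - risk(odds_death_ctl))

print(or_death, or_survival)                      # reciprocals of one another
print(math.log(or_death), math.log(or_survival))  # same magnitude, opposite sign
print(rr_death, rr_survival)                      # not reciprocals
```

Switching the outcome from death to survival inverts the OR exactly, whereas the two RRs are not reciprocals of one another.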

Several studies have demonstrated that OR are often misinterpreted as RR and that of the

measures discussed, the OR is the most difficult to intuit (Montreuil, Bendavid, and Brophy

2005; Davies, Crombie, and Tavakoli 1998; Grimes and Schulz 2008; Sinclair and Bracken

1994). With rare events, the difference between the OR and the RR is small but as events

become more common, this gap widens (Montreuil, Bendavid, and Brophy 2005). For

example, consider the following experimental scenario where the outcome was rare

(i.e. ≤10%):

Group          Outcome: Yes   Outcome: No   Total   Odds
Experimental   1              20            21      1:20 or 0.05
Control        1              25            26      1:25 or 0.04

The OR in this instance is 0.05/0.04 = 1.25. The RR is (1/21)/(1/26)=1.24. Alternatively, one

could use the following formula (Sinclair and Bracken 1994) to calculate the RR:

$$RR = \frac{OR}{1 + I_c(OR - 1)} \tag{3.1}$$

$I_c$ is the incidence of the event in the control group. In another experiment, events (e.g. death)

were common:

Group          Outcome: Yes   Outcome: No   Total   Odds
Experimental   5              1             6       5:1 or 5.0
Control        10             1             11      10:1 or 10.0

In this example, the OR is 5.0/10.0=0.5. However, the RR is equal to (5/6)/(10/11)=0.92.

When the event became more common, interpreting the OR as a RR would have

considerably overestimated the benefit associated with treatment.
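As a check on the two worked examples, the conversion in equation 3.1 can be written as a small function. The name `rr_from_or` is an illustrative choice; the counts are those of the two tables above.

```python
def rr_from_or(odds_ratio, control_incidence):
    """Equation 3.1: RR = OR / (1 + Ic * (OR - 1))."""
    return odds_ratio / (1 + control_incidence * (odds_ratio - 1))


# Rare-event table: 1/21 events (experimental) versus 1/26 events (control).
or_rare = (1 / 20) / (1 / 25)
rr_rare = rr_from_or(or_rare, 1 / 26)

# Common-event table: 5/6 events (experimental) versus 10/11 events (control).
or_common = (5 / 1) / (10 / 1)
rr_common = rr_from_or(or_common, 10 / 11)

print(round(or_rare, 2), round(rr_rare, 2))      # OR and RR nearly coincide
print(round(or_common, 2), round(rr_common, 2))  # OR and RR diverge
```

The formula reproduces the directly calculated risk ratios in both tables, and the gap between OR and RR is visible only when the event is common.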

The analyses in this thesis were designed to provide insight into the relationship between

study design and bias. The superior clinical intuitiveness of the RR was therefore non-

contributory. Binary outcomes (i.e. post-operative complications and peri-operative

mortality) were thus expressed as OR because this effect measure has superior consistency

and symmetry as compared with the RD and RR.

For continuous outcomes, the MD was chosen as the summary effect measure instead of the

standardized mean difference (SMD) or the ratio of means (RoM). The MD is the difference

in means between two groups whereas the SMD is equal to this difference divided by the

pooled standard deviation. The RoM is equal to the mean in the experimental group divided

by the mean in the control group. SMD is the measure of choice when the outcome is likely

to be measured in different ways (e.g. using different psychometric scales) across included

studies (Higgins, Green, and Cochrane Collaboration 2011). The SMD, MD and RoM have

similar consistency (Friedrich, Adhikari, and Beyene 2011) but the MD is the most intuitive

of the three measures. Since the outcomes length of stay and number of lymph nodes

harvested were likely to be measured in days and integers respectively, there was no

advantage in using the SMD. Ultimately, the MD was chosen because the statistical software


used (R, version 2.15.0, R Foundation for Statistical Computing, Vienna, Austria) has the

graphing capacity to generate funnel plots with MD but not RoM (Viechtbauer 2010).
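The three candidate measures for continuous outcomes can be compared with a short sketch. The function name and the length-of-stay figures are hypothetical, and the SMD shown is the plain difference divided by the pooled standard deviation, without any small-sample correction.

```python
import math


def continuous_effects(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Return the MD, SMD and RoM for an experimental and a control group."""
    md = mean_e - mean_c
    pooled_sd = math.sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )
    return {
        "MD": md,                # in the outcome's own units (e.g. days)
        "SMD": md / pooled_sd,   # unitless
        "RoM": mean_e / mean_c,  # unitless ratio
    }


# Hypothetical length of stay: laparoscopy 6 (SD 2) days, open surgery 8 (SD 2) days.
effects = continuous_effects(6.0, 2.0, 50, 8.0, 2.0, 50)
print(effects)
```

Only the MD keeps the original units, which is why it is the easiest of the three to interpret when every study reports the outcome on the same scale.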

4.6 Handling multiple publications of the same cohort

We anticipated encountering multiple publications providing results for the same cohort of

study subjects. In these instances, articles were combined into a data group. Article

attributes, author characteristics and study attributes were abstracted from the earliest

publication in a data group. Outcome data for all-cause mortality, post-operative

complications, length of stay (LOS) and number of lymph nodes (LN) harvested were separately abstracted from the

publication that provided the most complete information (i.e. for the largest number of

patients).
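The rule for data groups can be sketched as follows. The `Article` record, its field names and the example publications are hypothetical, and the sketch simplifies the procedure by using a single patient count per article rather than one count per outcome.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Article:
    cohort_id: str    # hypothetical label for the shared patient cohort
    year: int         # year of publication
    n_patients: int   # patients with outcome data in this article
    label: str


def build_data_groups(articles):
    """Group articles by cohort and pick the source article for each kind of data."""
    groups = defaultdict(list)
    for article in articles:
        groups[article.cohort_id].append(article)
    summary = {}
    for cohort_id, members in groups.items():
        summary[cohort_id] = {
            # article, author and study attributes: earliest publication
            "attributes_from": min(members, key=lambda a: a.year),
            # outcome data: publication covering the largest number of patients
            "outcomes_from": max(members, key=lambda a: a.n_patients),
        }
    return summary


articles = [
    Article("cohort-A", 2002, 120, "early report"),
    Article("cohort-A", 2008, 219, "long-term follow-up"),
    Article("cohort-B", 2005, 80, "single report"),
]
groups = build_data_groups(articles)
print(groups["cohort-A"]["attributes_from"].label)
print(groups["cohort-A"]["outcomes_from"].label)
```

In practice the most complete publication may differ from one outcome to the next, so a fuller version would select the source article separately for each outcome.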

4.7 Approach to missing data for continuous outcomes

Measures of central tendency for continuous variables include the mean, median and mode.

Authors will report either the mean and standard deviation when data are normally

distributed or the median and interquartile range (IQR) if data are skewed (Fletcher and Fletcher 2005). During

data abstraction, it was often noted that either the mean or the median was reported for LOS and

number of LN harvested. To combine results across studies, means and associated standard

deviations were required (Egger et al. 2003). In instances where only medians and ranges

were reported, medians were treated as means and standard deviations (σ) were calculated

using the following formulae (Hozo, Djulbegovic, and Hozo 2005):

$$\sigma = \frac{\mathrm{IQR}}{1.35} \tag{3.2}$$

$$\sigma = \frac{\mathrm{range}}{4} \tag{3.3}$$

IQR is the inter-quartile range. The range is the highest value minus the lowest value.
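Formulae (3.2) and (3.3) translate directly into code. The sketch below is illustrative only; the function names and the example quartiles and extremes are hypothetical.

```python
def sd_from_iqr(q1, q3):
    """Approximate the standard deviation from the quartiles, as in (3.2)."""
    return (q3 - q1) / 1.35


def sd_from_range(lowest, highest):
    """Approximate the standard deviation from the range, as in (3.3)."""
    return (highest - lowest) / 4


# Hypothetical study reporting a median LOS of 7 days.
sd_a = sd_from_iqr(5.0, 9.05)    # IQR of 4.05 days
sd_b = sd_from_range(2.0, 14.0)  # range of 12 days
```

Both calls return approximately 3 days, which would then be paired with the median (treated as the mean) in the meta-analysis.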

Occasionally the means for both groups (laparoscopy and open surgery) were provided with

no measure of dispersion. If the p-value for the difference of means was reported, the

missing standard deviation (σ) was calculated using the following steps, assuming that the p-

value arose from a two-tailed, equal variances, Student’s t-test comparison; first, the absolute

value of the t-statistic (t), that corresponds to the two-sided p-value (p) from groups of size n1

and n2, was obtained:

$$t = \left| t^{-1}_{df = n_1 + n_2 - 2}\left(\frac{p}{2}\right) \right| \tag{3.4}$$

When the t-statistic, sample sizes and group means (x̄1 and x̄2) are known, then only the σ is
unknown in the following formula:

$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \tag{3.5}$$

Solving for the standard deviation (σ):

$$\sigma = \frac{|\bar{x}_1 - \bar{x}_2|}{t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \tag{3.6}$$
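Steps (3.4) to (3.6) can be sketched in Python using only the standard library. The numerical approach below (Simpson's rule for the t distribution and bisection to invert it) is an assumption chosen to keep the example self-contained; in practice a statistical package's t quantile function would normally be used. All function names are hypothetical, and the sketch is intended for ordinary reported p-values rather than extremely small ones.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for |t|, integrating the density on [0, |t|]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights
    return 1.0 - 2.0 * area * h / 3


def t_from_p(p, df):
    """Absolute t-statistic matching a two-sided p-value, as in (3.4)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while two_sided_p(hi, df) > p and hi < 1024.0:
        hi *= 2.0  # widen the bracket until it contains the answer
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sd_from_p(mean1, mean2, n1, n2, p):
    """Common standard deviation implied by the p-value, as in (3.6)."""
    t = t_from_p(p, n1 + n2 - 2)
    return abs(mean1 - mean2) / (t * math.sqrt(1 / n1 + 1 / n2))
```

Because (3.6) divides by t, the method cannot be applied when a study reports only a threshold such as "p<0.05" rather than an exact p-value.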

For studies that did not adhere to the intention to treat principle, laparoscopy patients whose

operations were converted to open procedures occasionally had their outcomes reported

separately. Alternatively, outcomes were reported for the laparoscopy and open surgery

arms of a study across two time periods. In instances where means were combined across

two or more groups, the following calculations were employed to generate the weighted

mean (µ):

$$\mu = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} \tag{3.7}$$


ni represents the number of patients in a given group and x̄i the mean in this group. To
calculate the weighted standard deviation (σcalculated) (3.11), the following formulae were
used to calculate, in turn, the sum of squares between groups (SSbetween) (3.8), the sum of
squares within groups (SSwithin) (3.9) and the combined sum of squares (SSTotal)
(3.10) (Pagano and Gauvreau 2000):

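$$SS_{between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \mu)^2 \tag{3.8}$$

$$SS_{within} = \sum_{i=1}^{k} (n_i - 1)\,\sigma_i^2 \tag{3.9}$$

$$SS_{Total} = SS_{between} + SS_{within} \tag{3.10}$$

$$\sigma_{calculated} = \sqrt{\frac{SS_{Total}}{\sum_{i=1}^{k} (n_i - 1)}} \tag{3.11}$$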
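A minimal Python sketch of formulae (3.7) to (3.11) is given below. The function name and the example values are hypothetical, and the divisor in the last step follows (3.11) as printed above.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) triples into one weighted mean and SD."""
    total_n = sum(n for n, _, _ in groups)
    mu = sum(n * mean for n, mean, _ in groups) / total_n            # (3.7)
    ss_between = sum(n * (mean - mu) ** 2 for n, mean, _ in groups)  # (3.8)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)        # (3.9)
    ss_total = ss_between + ss_within                                # (3.10)
    sd = math.sqrt(ss_total / sum(n - 1 for n, _, _ in groups))      # (3.11)
    return mu, sd


# Hypothetical example: completed laparoscopic cases and converted cases
# reported separately, each as (n, mean LOS in days, SD).
mu, sd = combine_groups([(10, 5.0, 2.0), (20, 8.0, 3.0)])
```

Here the weighted mean is 7.0 days and the combined standard deviation is the square root of 267/28, roughly 3.09 days.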

Finally, in instances where none of the aforementioned methods could be used to calculate

missing standard deviations, simple imputation was employed (Allison 2002). Using formula

(3.12), the value for the missing standard deviation (σmissing) was calculated from the standard

deviations (σi) and sample sizes (ni) available:

| Study | n | σ |
|---|---|---|
| 1 | n1 | σ1 |
| 2 | n2 | σ2 |
| 3 | n3 | σmissing |
| … | … | … |
| k | ni | σi |

𝜎𝑚𝑖𝑠𝑠𝑖𝑛𝑔 = �∑ (𝑛𝑖−1)(σ𝑖)2𝑘𝑖=1

(𝑛𝑖−1)2 (3.12)

The ni in this instance is the number of patients in a given arm of a study – either the

laparoscopy or open surgery arm.


4.8 Identifying a referent group – Strong RCTs

4.8.1 Why categorize RCTs as Typical versus Strong?

The analyses of Chapters 5 and 6 required the identification of an appropriate referent group.

The objective of Specific Aim #2 (Chapter 5) was to compare the results of NRS with RCTs.

However, it has been shown that RCTs without appropriate allocation concealment (Schulz
et al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001; Pildal et al. 2007;
Nuesch, Reichenbach, et al. 2009) or blinding (Schulz et al. 1995; Moher et al. 1998;
Kjaergard, Villumsen, and Gluud 2001; Pildal et al. 2007; Nuesch, Reichenbach, et al. 2009;
Hrobjartsson et al. 2012, 2013), and RCTs with selective outcome reporting (Chan et al. 2004), are likely

biased. If we compared the results of NRS with those from all the RCTs, we would have

been using a heterogeneous referent group of studies at varying risks of bias. Instead, we

compared summary effect estimates from NRS with those from the low risk of bias RCTs

(i.e. Strong RCTs). The Cochrane Risk of Bias Tool was used to identify Strong RCTs

according to the methods outlined in Section 4.8.2.

The goal of Specific Aim #3 (Chapter 6) was to model the bias associated with NRS design

attributes. Comparing effect estimates from NRS without a design attribute (e.g. concurrent

controls) to those from NRS with the attribute would assume that the effect estimate from the

latter is the most accurate one available – the one closest to the proverbial “truth.” The

estimates of treatment effect from Strong RCTs were instead chosen as the referent group for
two reasons. First, there is empirical evidence to support the selection of certain RCTs as less biased
than others (Schulz et al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001;
Egger et al. 2003; Als-Nielsen et al. 2003; Tierney and Stewart 2005; Pildal et al. 2007;
Nuesch, Reichenbach, et al. 2009; Nuesch, Trelle, et al. 2009; Nuesch et al. 2010; Bassler et
al. 2010; Dechartres et al. 2011; Bafeta et al. 2012; Hrobjartsson et al. 2012, 2013). Second,
an RCT at low risk of bias is likely to provide an estimate of treatment effect that is
closest to the “truth.”


4.8.2 Cochrane Risk of Bias Tool

Strong RCTs were identified using the Cochrane Risk of Bias Tool. This tool is endorsed by

the Cochrane Collaboration for the evaluation of internal validity of RCTs (Higgins, Green,

and Cochrane Collaboration. 2011). In 2005, members of the Cochrane Bias Methods Group

and the Cochrane Statistical Methods Group developed the first version of the tool. The

development process involved the compilation of potential sources of bias, a review of the

available empirical evidence, informal consensus for the selection and operationalization of

bias domains, the incorporation of feedback in an iterative manner and pilot testing. An
additional three-stage project involving focus groups, online surveys and a consensus
meeting was undertaken in 2009 to evaluate the original tool. An updated version of the

instrument was released in 2011 (Higgins et al. 2011).

This instrument is composed of seven items classified under six domains of bias (Table 4.2).

The domains include selection bias, performance bias, detection bias, attrition bias, reporting

bias and other bias. Items appear in the first column of Table 4.2. Of note, some items are

assessed at the study-level whereas others require a separate assessment for each outcome.

Table 4.2 Cochrane Risk of Bias Tool

| Item | Support for judgement | Domain |
|---|---|---|
| Random sequence generation | Describe the method used to generate the allocation sequence in sufficient detail to allow an assessment of whether it should produce comparable groups. | Selection bias (biased allocation to interventions) due to inadequate generation of a randomised sequence. |
| Allocation concealment | Describe the method used to conceal the allocation sequence in sufficient detail to determine whether intervention allocations could have been foreseen in advance of, or during, enrolment. | Selection bias (biased allocation to interventions) due to inadequate concealment of allocations prior to assignment. |
| Blinding of participants and personnel. Assessments should be made for each main outcome (or class of outcomes). | Describe all measures used, if any, to blind study participants and personnel from knowledge of which intervention a participant received. Provide any information relating to whether the intended blinding was effective. | Performance bias due to knowledge of the allocated interventions by participants and personnel during the study. |
| Blinding of outcome assessment. Assessments should be made for each main outcome (or class of outcomes). | Describe all measures used, if any, to blind outcome assessors from knowledge of which intervention a participant received. Provide any information relating to whether the intended blinding was effective. | Detection bias due to knowledge of the allocated interventions by outcome assessors. |
| Incomplete outcome data. Assessments should be made for each main outcome (or class of outcomes). | Describe the completeness of outcome data for each main outcome, including attrition and exclusions from the analysis. State whether attrition and exclusions were reported, the numbers in each intervention group (compared with total randomized participants), reasons for attrition/exclusions where reported, and any re-inclusions in analyses performed by the review authors. | Attrition bias due to amount, nature or handling of incomplete outcome data. |
| Selective reporting | State how the possibility of selective outcome reporting was examined by the review authors, and what was found. | Reporting bias due to selective outcome reporting. |
| Other sources of bias | State any important concerns about bias not addressed in the other domains in the tool. If particular questions/entries were pre-specified in the review’s protocol, responses should be provided for each question/entry. | Bias due to problems not covered elsewhere in the table. |

Adapted from Table 8.5.a, Higgins JPT, Altman DG, Sterne JAC (editors). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011. Available from www.cochrane-handbook.org.

Each item in the tool is assessed in two steps; summarizing known facts followed by a
judgment. First, free text descriptions or pertinent quotes about what took place are
compiled from published reports, protocols or correspondence with authors. Providing this
supporting information augments the transparency of the assessment process. Second,
specific and detailed criteria are used to assign a judgment of low/unclear/high risk of bias
for each item. These criteria are outlined in the Cochrane Handbook for Systematic Reviews
of Interventions (Appendix C) (Higgins, Green, and Cochrane Collaboration. 2011). The
judgments for each item are used to determine if the study on the whole is at
low/unclear/high risk of bias according to the approach outlined in Table 4.3. Study
assessments are in turn used to determine if a meta-analysis of the studies is at
low/unclear/high risk of bias. The role of judgment is clearly central to the Cochrane Risk of
Bias Tool. It is for this reason the Collaboration suggests that judgments be made
independently by at least two people, with discrepancies resolved via discussion.



Table 4.3 Approach for summary assessments of risk of bias for an item, within a study and within a meta-analysis

| Risk of bias | Interpretation | Judgement within a study | Judgement in a meta-analysis (across studies) |
|---|---|---|---|
| Low risk | Plausible bias unlikely to seriously alter the results. | Low risk of bias for all key domains. | Most information is from studies at low risk of bias. |
| Unclear risk | Plausible bias that raises some doubt about the results. | Unclear risk of bias for one or more key domains. | Most information is from studies at low or unclear risk of bias. |
| High risk | Plausible bias that seriously weakens confidence in the results. | High risk of bias for one or more key domains. | The proportion of information from studies at high risk of bias is sufficient to affect the interpretation of results. |

Adapted from Table 8.7.a, Higgins JPT, Altman DG, Sterne JAC (editors). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011. Available from www.cochrane-handbook.org.

The Cochrane Risk of Bias Tool was used to identify Strong RCTs instead of other
instruments for three reasons. First, the Cochrane tool contains items used to evaluate the

internal validity of a study whereas other tools often have elements relating to external

generalizability, precision, ethics, or reporting (Katrak et al. 2004). For example, the Jadad

scale awards one point to trials that are “described as randomized” (Jadad et al. 1996). This

item does not focus on whether appropriate methods of randomization were used but instead

on whether the authors employed the terms randomly, random or randomization. In other

words, while the Cochrane Risk of Bias Tool focuses specifically on bias, other instruments

occasionally evaluate other aspects of quality. Second, the Cochrane Risk of Bias Tool was

developed using rigorous methods and pilot tested using studies drawn from multiple areas

of medicine (Higgins et al. 2011). In contrast, the Jadad Scale was developed using trials

reporting pain outcomes or analgesic interventions for outcomes other than pain (i.e. adverse

event profile). Third, the Cochrane Risk of Bias Tool does not produce a summary numerical

score in contrast to other available instruments. It has been demonstrated that the use of

summary scores often leads to inconsistent identification of low risk of bias RCTs (Juni et al.

1999; Herbison, Hay-Smith, and Gillespie 2006). This may be attributable to the fact that

many scales assign weights to included dimensions in an ad hoc manner (Juni et al. 1999).




The main limitation of the Cochrane Risk of Bias Tool is variable inter-rater reliability;

weighted kappas range from 0.13 (95% CI, -0.05 to 0.31) for selective outcome reporting to

0.74 (95% CI, 0.64 to 0.85) for sequence generation (Hartling et al. 2012). However,

Hartling and colleagues examined an earlier version of the Cochrane Risk of Bias Tool. In

the interim, the instrument has been revised to diminish ambiguity and has more detailed

guidance for making individual domain assessments (Higgins et al. 2011). This more recent

version of the tool was used in this study.

Strong RCTs were defined a priori as trials that were rated at a low risk of bias for five of the
seven bias items within the Cochrane Risk of Bias Tool: random sequence generation,
allocation concealment, incomplete outcome data, selective outcome reporting, and other
sources of bias. Because blinding is rare in studies of non-pharmacological interventions
(Boutron et al. 2004), the two blinding items were not included in the definition of “Strong”
RCTs for our surgical data set. Since the tool was used to discriminate
between Typical and Strong RCTs, an item that is consistently lacking among all
studies would not help divide trials into those that are Strong versus Typical. “Other” sources

of bias was defined a priori to include violating intention to treat principles; studies that

erroneously included converted laparoscopic patients with those in the open surgery group

were considered at high risk of bias. Typical RCTs were defined as those not meeting the

criteria for Strong RCTs. Assessments were made using published reports, protocols and by

contacting authors when information was unavailable.
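The a priori definition above amounts to a simple decision rule, sketched here in Python. The item labels, the dictionary layout and the treatment of a missing judgment as "not low" are assumptions made for illustration only.

```python
# The five items that define a Strong RCT; the two blinding items are
# deliberately left out, as described in the text.
KEY_ITEMS = (
    "random sequence generation",
    "allocation concealment",
    "incomplete outcome data",
    "selective outcome reporting",
    "other sources of bias",
)


def classify_rct(judgements):
    """Return 'Strong' if every key item is rated 'low', else 'Typical'.

    Assumption: an item with no recorded judgment counts as not low.
    """
    if all(judgements.get(item) == "low" for item in KEY_ITEMS):
        return "Strong"
    return "Typical"


# Hypothetical trial: unblinded, but low risk on all five key items.
trial = {item: "low" for item in KEY_ITEMS}
trial["blinding of participants and personnel"] = "high"
trial["blinding of outcome assessment"] = "high"
label = classify_rct(trial)  # "Strong"
```

A trial with an unclear rating on any one of the five key items, for example selective outcome reporting when no protocol is available, would be labelled Typical by this rule.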

4.8.3 Validating risk of bias assessments

All risk of bias assessments for RCTs reporting the outcomes of interest were performed by
LS. A second individual (DRU) independently repeated the risk of bias assessments for
RCTs reporting post-operative complications (n=20 trials). A Cohen’s kappa statistic was
generated for the agreement between the two raters for classification of studies as Typical or
Strong. There was perfect agreement (κ=1.00).
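For reference, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance. The Python sketch below is a generic implementation; the function name and the rating vectors are illustrative and are not the actual study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Illustrative ratings for 20 trials with identical classifications.
rater_1 = ["Strong"] * 4 + ["Typical"] * 16
rater_2 = ["Strong"] * 4 + ["Typical"] * 16
kappa = cohens_kappa(rater_1, rater_2)  # perfect agreement gives 1.0
```

Kappa falls below 1 as soon as the raters disagree on any trial, and it equals 0 when agreement is no better than chance.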

4.9 Statistical analyses

Descriptive statistics were calculated to compare NRS and RCTs in terms of year of

publication, number of participants, academic versus community setting, presence of a

consortium among authors, number of named authors, methodological expertise among

authors, length of articles and baseline event rate (or mean) in control groups. Absolute and

relative frequencies were measured for discrete variables and where appropriate, medians

and IQRs were calculated for continuous variables with a non-normal distribution. Medians

were compared using the Mann-Whitney U test and categorical variables using the Chi-

square or Fisher’s exact test, as appropriate (Pagano and Gauvreau 2000). A p-value <0.05

was considered significant. All data were analyzed using R, version 2.15.0 (R Foundation for

Statistical Computing, Vienna, Austria).
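The choice between the Chi-square and Fisher's exact test for a 2×2 table can be sketched as follows. The rule used here (Fisher's exact test when any expected cell count is below 5, otherwise an uncorrected Chi-square test) is a common convention and an assumption on our part, since the text says only "as appropriate". The function names and example counts are hypothetical, and the sketch assumes that no row or column total is zero.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):  # hypergeometric probability of k in the top-left cell
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    p_obs = prob(a)
    k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, k_values) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def chi_square_p(a, b, c, d):
    """Uncorrected Chi-square p-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, df = 1


def compare_proportions(a, b, c, d):
    """Return (test name, p-value) using the expected-count convention."""
    n = a + b + c + d
    expected = [r * col / n for r in (a + b, c + d) for col in (a + c, b + d)]
    if min(expected) < 5:
        return "fisher", fisher_exact_two_sided(a, b, c, d)
    return "chi-square", chi_square_p(a, b, c, d)
```

For example, `compare_proportions(3, 1, 1, 3)` falls back to Fisher's exact test because every expected count is 2, whereas `compare_proportions(10, 20, 20, 10)` uses the Chi-square test.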

4.10 Results

4.10.1 Data cohort

Once duplicates were removed, 7528 distinct abstracts remained (Figure 4.2). A further 7203

abstracts were excluded. Of the 325 abstracts focusing on laparoscopy versus open surgery

for colon cancer, 133 were excluded; 50 were systematic reviews or meta-analyses, 43 were

studies that provided only biochemical outcomes (e.g. pre- and post-operative IL-6 levels),

and 40 were written in a language other than English (Appendix D). The remaining 192

studies met the a priori inclusion criteria (Table 4.4).


Figure 4.2 Flow diagram for the identification of eligible studies.

A total of 144 NRS and 48 RCTs met the inclusion criteria (Appendix E). Included NRS are

described in Table 4.5. Table 4.6 describes included RCTs. Once multiple articles of the

same cohort were combined, 141 NRS (1,179,792 patients) and 26 RCT (14,843 patients)

data groups remained. The data groups were published across 50 journals and between 1993

and 2010. The number of authors varied from 1 to 14 (median, 5) but few studies had ≥1

author affiliated with a department of biostatistics, public health, health policy or

epidemiology (16%). Studies took place in 22 countries; the most frequently represented nations
were the United States (30%), Italy (10%) and the United Kingdom (10%). The majority
of studies were conducted in academic settings (81%). The median number of study subjects
for NRS was 121 (IQR 54-262, range 14-643,700) and RCTs enrolled a median of 116
patients (IQR 60-326, range 29-1082).


| | NRS (N=141) | RCTs (N=26) | p-value• |
|---|---|---|---|
| Year of publication* | 2006 (2000-2009) | 2004 (2001-2007) | 0.27† |
| Participants* | 121 (54-262) | 116 (60-326) | 0.89† |
| Authors | | | |
| Number* | 5 (4-7) | 6 (5-8) | **0.03**† |
| Consortium among authors§ | 2 (1.4) | 5 (19.2) | **<0.001**♦ |
| Methodological expertise§ | 16 (0.7) | 9 (34.6) | **<0.001**◊ |
| Academic setting§ | 106 (75.2) | 23 (88.5) | 0.22♦ |
| Number of pages* | 6 (5-8) | 7 (6-9) | **0.02**† |

* Median (interquartile range, IQR). § Number (percentage). † Medians compared using the Mann-Whitney U test. ♦ Frequencies compared using Fisher’s exact test. ◊ Frequencies compared using Chi-square test. • Statistically significant p values (<0.05) indicated in bold.


Table 4.5 Non-randomized studies meeting inclusion criteria

| First Author | Year | Country | Journal | LAP* | OPEN* |
|---|---|---|---|---|---|
| Senagore, A | 1993 | United States | American Surgeon | 38 | 102 |
| Tate, J | 1993 | Hong Kong | British Journal of Surgery | 11 | 14 |
| Falk, PM | 1993 | United States | Diseases of the Colon & Rectum | 54 | 42 |
| Peters, W | 1993 | United States | Diseases of the Colon & Rectum | 24 | 33 |
| Gray, D | 1994 | United States | Journal of Surgical Oncology | 22 | 35 |
| Van Ye, T | 1994 | United States | Surgical Laparoscopy & Endoscopy | 14 | 20 |
| Musser, D | 1994 | United States | Surgical Laparoscopy & Endoscopy | 24 | 24 |
| Hoffman, G | 1994 | United States | Annals of Surgery | 80 | 53 |
| Franklin, M | 1995 | United States | Surgical Endoscopy | 84 | 84 |
| Ramos, J | 1995 | United States | Diseases of the Colon & Rectum | 95 | 105 |
| Saba, A | 1995 | United States | Annals of Surgery | 25 | 25 |
| Ou, H | 1995 | United States | Diseases of the Colon & Rectum | 12 | 12 |
| Konishi, F | 1996 | Japan | Japanese Journal of Surgery | 20 | 47 |
| Begos, D | 1996 | United States | Surgical Endoscopy | 50 | 34 |
| Franklin, M | 1996 | United States | Diseases of the Colon & Rectum | 191 | 224 |
| Bokey, E | 1996 | Australia | Diseases of the Colon & Rectum | 28 | 33 |
| Fleshman, J | 1996 | United States | Diseases of the Colon & Rectum | 54 | 35 |
| Gellman, L | 1996 | United States | Surgical Endoscopy | 24 | 33 |
| Hotokezaka, M | 1996 | United States | Surgical Endoscopy | 7 | 7 |
| Goh, Y | 1997 | Singapore | Diseases of the Colon & Rectum | 20 | 20 |
| Leung, K | 1997 | Hong Kong | Archives of Surgery | 50 | 50 |
| Khalili, T | 1998 | United States | Diseases of the Colon & Rectum | 80 | 90 |
| Psaila, J | 1998 | United Kingdom | British Journal of Surgery | 25 | 29 |
| Bouvet, M | 1998 | United States | American Journal of Surgery | 91 | 57 |
| Leung, K | 1999 | Hong Kong | Journal of Surgical Oncology | 28 | 56 |
| Santoro, E | 1999 | Italy | Hepato-Gastroenterology | 36 | 36 |
| Stewart, B. T | 1999 | Australia | British Journal of Surgery | 42 | 35 |
| Schwandner, O | 1999 | Germany | International Journal of Colorectal Disease | 32 | 32 |
| Kakisako, K | 2000 | Japan | Surgical Laparoscopy, Endoscopy & Percutaneous Techniques | 20 | 23 |
| Lezoche, E | 2000 | Italy | Hepato-Gastroenterology | 150 | 160 |
| Chen, W | 2000 | China | Formosan Journal of Surgery | 27 | 42 |
| Marubashi, S | 2000 | Japan | Surgery Today | 40 | 28 |
| Stocchi, L | 2000 | United States | Diseases of the Colon & Rectum | 42 | 42 |
| Hartley, J. E | 2000 | United Kingdom | Annals of Surgery | 53 | 41 |
| Chen, H | 2000 | United States | Diseases of the Colon & Rectum | 83 | 83 |
| Nishiguchi, K | 2001 | Japan | Diseases of the Colon & Rectum | 15 | 12 |
| Lezoche, E | 2001 | Italy | Journal of the Laparoendoscopic & Advanced Surgical Techniques | 207 | 153 |
| Hong, D | 2001 | Canada | Diseases of the Colon & Rectum | 98 | 219 |
| Yamamoto, S | 2001 | Japan | Hepato-Gastroenterology | 43 | 43 |
| Mall, J. W | 2001 | Germany | British Journal of Surgery | 32 | 52 |
| Law, WL | 2002 | Hong Kong | Journal of the American College of Surgeons | 65 | 89 |
| Lezoche, E | 2002 | Italy | Surgical Endoscopy | 140 | 107 |
| Braga, M | 2002 | Italy | Surgical Endoscopy | 26 | 26 |
| Vasilev, K | 2002 | Bulgaria | Acta Chirurgica Iugoslavica | 31 | 36 |
| Lujan, H | 2002 | United States | Diseases of the Colon & Rectum | 102 | 233 |
| Feliciotti, F | 2002 | Italy | Surgical Endoscopy | 74 | 75 |
| Feliciotti, F | 2002 | France | Surgical Laparoscopy & Endoscopy | 74 | 83 |
| Kasparek, MS | 2003 | Germany | Journal of Gastrointestinal Surgery | 11 | 9 |
| Kayser, J | 2003 | Luxembourg | Bulletin de la Societe des Sciences Medicales du Grand-Duche de Luxembourg | 76 | 27 |
| Inoue, Y | 2003 | Japan | Surgical Endoscopy | 32 | 30 |
| Ma, H | 2003 | China | Formosan Journal of Surgery | 58 | 31 |
| Lezoche, E | 2003 | Italy | Minerva Chirurgica | 310 | 159 |
| Sklow, B | 2003 | United States | Surgical Endoscopy | 77 | 77 |
| Patankar, S. K | 2003 | United States | Diseases of the Colon & Rectum | 172 | 172 |
| Senagore, A | 2003 | United States | Archives of Surgery | 231 | 245 |
| Delaney, C | 2003 | United States | Annals of Surgery | 150 | 150 |
| Adachi, Y | 2003 | Japan | Hepato-Gastroenterology | 26 | 87 |
| Kojima, M | 2004 | Japan | Surgery Today | 118 | 163 |
| Baker, RP | 2004 | United Kingdom | Diseases of the Colon & Rectum | 33 | 66 |
| Neri, V | 2004 | Italy | Annali Italiani di Chirurgia | 7 | 10 |
| Capussotti, L | 2004 | Italy | Surgical Endoscopy | 74 | 181 |
| Zheng, M | 2005 | China | World Journal of Gastroenterology | 30 | 34 |
| Delaney, C | 2005 | United States | Diseases of the Colon & Rectum | 94 | 94 |
| Vignali, A | 2005 | Italy | Diseases of the Colon & Rectum | 61 | 61 |
| Sahakitrungruang, C | 2005 | Thailand | Journal of the Medical Association of Thailand | 24 | 25 |
| Pokala, N | 2005 | United States | Surgical Endoscopy | 34 | 34 |
| Law, WL | 2006 | Hong Kong | Diseases of the Colon & Rectum | 98 | 167 |
| Lezoche, E | 2006 | Italy | Surgical Endoscopy | 85 | 64 |
| Del Rio | 2006 | Italy | Minerva Chirurgica | 27 | 25 |
| Aboulian, A | 2006 | Italy | Minerva Chirurgica | 147 | 25 |
| Nakamura, T | 2006 | Japan | Hepato-Gastroenterology | 59 | 59 |
| Feng, B | 2006 | China | Aging - Clinical and Experimental Research | 51 | 102 |
| MacKay, G | 2006 | United Kingdom | Colorectal Disease | 22 | 58 |
| Wahl, P | 2006 | Switzerland | ANZ Journal of Surgery | 187 | 215 |
| Sample, CB | 2006 | Canada | Surgical Endoscopy | 21 | 21 |
| Gonzalez, R | 2006 | United States | Diseases of the Colon & Rectum | 238 | 260 |
| Ng, SSM | 2006 | Hong Kong | Surgical Endoscopy | 6 | 12 |
| Salloum, RM | 2006 | United States | Journal of the American College of Surgeons | 14 | 54 |
| Law, WL | 2007 | Hong Kong | Annals of Surgery | 255 | 401 |
| Napolitano, L | 2007 | Italy | Giornale di Chirurgia | 73 | 141 |
| Boni, L | 2007 | Italy | Surgical Oncology | 88 | 75 |
| Choi, Y | 2007 | Korea | Surgery Today | 26 | 41 |
| Osarogiagbon, RU | 2007 | United States | Clinical Colorectal Cancer | 39 | 55 |
| McCloskey, CA | 2007 | United States | Surgery | 23 | 22 |
| Tong, DKH | 2007 | Hong Kong | Journal of the Society of Laparoendoscopic Surgeons | 77 | 105 |
| Salimath, J | 2007 | United States | Journal of the Society of Laparoendoscopic Surgeons | 68 | 179 |
| Noblett, SE | 2007 | United Kingdom | Surgical Endoscopy | 30 | 30 |
| Lordan, JT | 2007 | United Kingdom | Colorectal Disease | 109 | 44 |
| Guo, D | 2007 | Australia | ANZ Journal of Surgery | 50 | 33 |
| Lohsiriwat V | 2007 | Thailand | World Journal of Surgery | 13 | 21 |
| Hinojosa, MW | 2007 | United States | Journal of Gastrointestinal Surgery | 190 | 3185 |
| Park, JS | 2007 | Korea | World Journal of Surgery | 116 | 81 |
| Law, WL | 2008 | Hong Kong | Annals of Surgical Oncology | 77 | 123 |
| Mirza, MS | 2008 | United Kingdom | Journal of Laparoendoscopic Surgery | 116 | 117 |
| Kemp, JA | 2008 | United States | Surgical Innovation | 27930 | 615722 |
| Andersen, LPH | 2008 | Denmark | Surgical Endoscopy | 58 | 143 |
| Bilimoria, KY | 2008 | United States | Journal of Gastrointestinal Surgery | 837 | 2222 |
| Cermak, K | 2008 | Belgium | Hepato-Gastroenterology | 45 | 120 |
| Varela, JE | 2008 | United States | American Surgeon | 3353 | 47090 |
| Seitz, G | 2008 | Germany | Surgical Laparoscopy, Endoscopy & Percutaneous Techniques | 39 | 38 |
| Steele, SR | 2008 | United States | Journal of Gastrointestinal Surgery | 3296 | 95627 |
| Imai, E | 2008 | Japan | American Journal of Infection Control | 231 | 75 |
| Delaney, C | 2008 | United States | Annals of Surgery | 11044 | 21689 |
| Buchanan, GN | 2008 | United Kingdom | British Journal of Surgery | 230 | 135 |
| Ihedioha, U | 2008 | United Kingdom | Surgical Endoscopy | 32 | 61 |
| Bilimoria, KY | 2008 | United States | Archives of Surgery | 11038 | 231381 |
| Nakamura, T | 2008 | Japan | World Journal of Surgery | 101 | 43 |
| Gameiro, M | 2008 | Germany | Surgical Innovation | 45 | 25 |
| Kim, H | 2009 | Korea | Surgical Endoscopy | 37 | 50 |
| Zmora, O | 2009 | Israel | Surgical Endoscopy | 227 | 103 |
| Chikkappa, M | 2009 | United Kingdom | International Journal of Colorectal Disease | 57 | 49 |
| Faiz, O | 2009 | United Kingdom | Colorectal Disease | 191 | 50 |
| Yin, W | 2009 | China | Hepato-Gastroenterology | 32 | 30 |
| Wilks, J | 2009 | United States | American Journal of Surgery | 60 | 60 |
| Tan, W | 2009 | Singapore | International Journal of Colorectal Disease | 37 | 40 |
| Scarpa, M | 2009 | Italy | Surgical Endoscopy | 21 | 21 |
| Ptok, H | 2009 | Germany | European Journal of Surgical Oncology | 346 | 8307 |
| Poon, J | 2009 | Hong Kong | Annals of Surgery | 296 | 715 |
| Faiz, O | 2009 | United Kingdom | Diseases of the Colon & Rectum | 1095 | 60851 |
| Shabbir, A | 2009 | Singapore | ANZ Journal of Surgery | 32 | 32 |
| Kennedy, GD | 2009 | United States | Annals of Surgery | 2869 | 4800 |
| Park, J | 2009 | Korea | Surgical Laparoscopy, Endoscopy & Percutaneous Techniques | 119 | 145 |
| Tei, M | 2009 | Japan | Surgical Laparoscopy, Endoscopy & Percutaneous Techniques | 78 | 51 |
| Nakamura, T | 2009 | Japan | Surgery Today | 100 | 100 |
| Lin, JH | 2009 | United States | Surgical Innovation | 99 | 70 |
| Kiran, R | 2010 | United States | Archives of Surgery | 143 | 143 |
| Marshall, C | 2010 | United States | American Journal of Surgery | 33 | 17 |
| Maeda, T | 2010 | Japan | Surgical Endoscopy | 32 | 43 |
| Abdel-Halim, M | 2010 | United Kingdom | Annals of the Royal College of Surgeons of England | 22 | 34 |
| Akiyoshi, T | 2010 | Japan | Journal of Gastrointestinal Surgery | 253 | 39 |
| Balentine, C | 2010 | United States | Journal of Surgical Research | 42 | 113 |
| da Luz Moreira, A | 2010 | United States | Surgical Endoscopy | 231 | 231 |
| El-Gazzaz, G | 2010 | United States | Surgical Endoscopy | 243 | 486 |
| Fujii, S | 2010 | Japan | International Journal of Colorectal Disease | 258 | 258 |
| Han, K | 2010 | Korea | Journal of the Korean Society of Coloproctology | 35 | 55 |
| Hemandas, A | 2010 | United Kingdom | Annals of Surgery | 224 | 200 |
| Jiang, J | 2010 | China | International Journal of Colorectal Disease | 20 | 19 |
| Kiran, R | 2010 | United States | Journal of the American College of Surgeons | 3414 | 7565 |
| Kurian, A | 2010 | United States | Journal of Surgical Education | 150 | 95 |
| Lian, L | 2010 | United States | Surgical Endoscopy | 97 | 97 |
| Lloyd, G | 2010 | Multinational | Surgical Endoscopy | 97 | 97 |
| Madbouly, K | 2010 | Egypt | British Journal of Surgery | 20 | 10 |
| El-Gazzaz, G | 2010 | United States | Surgical Endoscopy | 1516 | 3528 |
| Morris, E | 2011 | United Kingdom | British Journal of Surgery | 238 | 470 |

* Number of study subjects with colon cancer in each arm of the study


Table 4.6 Randomized controlled trials meeting inclusion criteria

| First Author | Year | Country | Journal | LAP* | OPEN* |
|---|---|---|---|---|---|
| Lacy, A | 1995 | Spain | Surgical Endoscopy | 25 | 26 |
| Ortiz, H | 1996 | Spain | International Journal of Colorectal Disease | 20 | 20 |
| Milsom, J | 1997 | United States | Journal of Surgical Research | 55 | 54 |
| Stage, J | 1997 | Denmark | British Journal of Surgery | 15 | 14 |
| Lacy, A | 1998 | Spain | Surgical Endoscopy | 31 | 40 |
| Schwenck | 1998 | Germany | Surgical Endoscopy | 30 | 30 |
| Schwenk, W | 1998 | Germany | Langenbecks Archives of Surgery | 30 | 30 |
| Schwenk, W | 1999 | Germany | Archives of Surgery | 30 | 30 |
| Delgado, S | 2000 | Spain | Surgical Endoscopy | 129 | 126 |
| Curet, M | 2000 | United States | Surgical Endoscopy | 25 | 18 |
| Lacy, A | 2002 | Spain | Lancet | 111 | 108 |
| Braga, M | 2002 | Italy | Annals of Surgery | 136 | 133 |
| Weeks, J | 2002 | United States | Journal of the American Medical Association | 228 | 221 |
| Winslow, E | 2002 | United States | Surgical Endoscopy | 37 | 46 |
| Hasegawa, H | 2003 | Japan | Surgical Endoscopy | 24 | 26 |
| Basse, L | 2003 | Denmark | Surgical Endoscopy | 16 | 16 |
| Janson, M | 2004 | Sweden | British Journal of Surgery | 98 | 112 |
| Kang, J | 2004 | China | Surgical Endoscopy | 30 | 30 |
| Kaiser, A | 2004 | United States | Journal of the Laparoendoscopic & Advanced Surgical Techniques | 28 | 20 |
| Leung, K | 2004 | Hong Kong | Lancet | 203 | 200 |
| Vignali, A | 2004 | Italy | Diseases of the Colon & Rectum | 190 | 194 |
| Nelson, H | 2004 | Multinational | New England Journal of Medicine | 435 | 428 |
| Braga, M | 2005 | Italy | Diseases of the Colon & Rectum | 190 | 201 |
| Guillou, P | 2005 | Multinational | Lancet | 140 | 273 |
| Basse, L | 2005 | Denmark | Annals of Surgery | 30 | 30 |
| Veldkamp, R | 2005 | Multinational | Lancet | 536 | 546 |
| Braga, M | 2005 | Italy | Annals of Surgery | 258 | 259 |
| King, P | 2006 | United Kingdom | British Journal of Surgery | 41 | 19 |
| Franks, P | 2006 | United Kingdom | British Journal of Cancer | 452 | 230 |
| Liang, J | 2007 | China | Annals of Surgical Oncology | 135 | 134 |
| Jayne, D | 2007 | United Kingdom | Surgical Innovation | 526 | 268 |
| Braga, M | 2007 | Italy | Annals of Surgery | 113 | 113 |
| Chung, C | 2007 | Hong Kong | Annals of Surgery | 41 | 40 |
| Janson, M | 2007 | Sweden | Surgical Endoscopy | 130 | 155 |
| Fleshman, J | 2007 | Multinational | Annals of Surgery | 435 | 428 |
| King, P | 2008 | United Kingdom | International Journal of Colorectal Disease | 41 | 19 |
| Lacy, A | 2008 | Spain | Annals of Surgery | 106 | 102 |
| Frasson, M | 2008 | Italy | Diseases of the Colon & Rectum | 268 | 267 |
| Hewett, P | 2008 | Multinational | Annals of Surgery | 294 | 298 |
| Gonzalez, I | 2008 | Spain | International Journal of Colorectal Disease | 59 | 57 |
| COLOR Study Group | 2009 | Multinational | Lancet | 534 | 542 |
| Ng, S | 2009 | Hong Kong | Diseases of the Colon & Rectum | 76 | 77 |
| Neudecker, J | 2009 | Germany | British Journal of Surgery | 250 | 222 |
| Pascual, M | 2010 | Spain | British Journal of Surgery | 60 | 60 |
| Taylor, G | 2010 | United Kingdom | Formosan Journal of Surgery | 280 | 131 |
| Allardyce, R | 2010 | Multinational | British Journal of Cancer | 294 | 298 |
| Braga, M | 2010 | Italy | British Journal of Surgery | 134 | 134 |
| Jayne, D | 2010 | United Kingdom | British Journal of Surgery | 212 | 549 |

* Number of study subjects with colon cancer in each arm of the study


4.10.2 Strong RCTs

4.10.2.1 Post-operative complications

Twenty RCTs reported post-operative complications. Most of these trials were at low risk of
bias for the items incomplete outcome data and other bias. However, only 75% of RCTs
(n=15) were at low risk of bias for random sequence generation and fewer were at low risk of
bias for allocation concealment (n=13, 65%). A minority of trials were at low risk of bias for
selective outcome reporting (n=4, 20%).

Table 4.7 Summary of risk of bias item responses for RCTs reporting post-operative complications.

| Risk of Bias | Low n (%) | Unclear n (%) | High n (%) |
|---|---|---|---|
| Randomization sequence generation | 15 (75) | 5 (25) | 0 (0) |
| Allocation concealment | 13 (65) | 10 (50) | 0 (0) |
| Blinding of participants and personnel | 1 (5) | 0 (0) | 19 (95) |
| Blinding of outcome assessment | 1 (5) | 0 (0) | 19 (95) |
| Incomplete outcome data | 19 (100) | 0 (0) | 1 (5) |
| Selective outcome reporting | 4 (20) | 16 (80) | 0 (0) |
| Other bias | 19 (100) | 0 (0) | 1 (5) |

Individual item assessments were used to classify RCTs as either Typical (i.e. unclear or high

risk of bias) or Strong (i.e. low risk of bias) according to the guidance in Table 4.3. Four

trials were categorized as Strong RCTs (Guillou 2005, Hewett 2008, Nelson 2004, Veldkamp

2005). These four studies were at low risk of bias for five of seven bias domains

(randomization sequence generation, allocation concealment, incomplete outcome data,

selective reporting and “other” bias). The remaining 19 RCTs were at an unclear risk of bias

for selective outcome reporting as none had published protocols. A number of these studies

were at unclear risk of bias for random sequence generation and allocation concealment.

Accordingly, these 19 studies were classified as Typical RCTs.

4.10.2.2 Peri-operative mortality

Seventeen RCTs reported 30-day peri-operative mortality. A third of these trials had unclear

random sequence generation and unclear allocation concealment (Table 4.8). All studies

were at low risk of bias for blinding of outcome assessment since mortality is considered an

objective outcome by the Cochrane Collaboration. A minority of trials were at low risk of

bias for selective outcome reporting (n=4, 24%).

Table 4.8 Summary of risk of bias item responses for RCTs reporting peri-operative mortality



Four trials were identified as Strong RCTs (Guillou 2005, Hewett 2008, Nelson 2004,

Veldkamp 2005). These four studies were rated at low risk of bias for all seven bias domains

of the Cochrane Risk of Bias Tool. The remaining 13 trials were classified as Typical RCTs

because of deficits with regards to randomization, allocation concealment, and the possibility

of selective outcome reporting.

4.10.2.3 Length of stay

A total of twenty-two RCTs reported length of stay. Approximately one-fifth of these studies

were at unclear risk of bias for random sequence generation and allocation concealment.

Only one study employed blinding and the vast majority of trials were at unclear risk of bias

for blinding of participants/personnel and blinding of outcome assessment. Four RCTs were

at low risk of bias for selective outcome reporting since these trials were registered studies

with published protocols.

Table 4.9 Summary of risk of bias item responses for RCTs reporting length of stay



While 22 RCTs reported length of stay, only four were identified as Strong RCTs. These four

studies were at low risk of bias for five of seven bias domains (randomization sequence

generation, allocation concealment, incomplete outcome data, selective reporting and “other”

bias). These trials were the same four studies identified as Strong for the previous two

outcomes.

4.10.2.4 Number of lymph nodes harvested

A total of seventeen RCTs reported the number of lymph nodes found within the surgical

specimen. A notable proportion of studies were at an unclear risk of bias with regards to

random sequence generation and allocation concealment (n=6, 35%). Blinding was a rarity in

these trials. Four studies had published protocols and were thus at low risk of bias for

selective outcome reporting.

Table 4.10 Summary of risk of bias item responses for RCTs reporting number of lymph nodes harvested



Of these trials, four were identified as Strong RCTs. These four studies were at low risk of

bias for five of seven bias domains (randomization sequence generation, allocation

concealment, incomplete outcome data, selective reporting and “other” bias). Again, these

four trials were the same four identified as least biased for the previous three outcomes, post-

operative complications, peri-operative mortality, and length of stay.

4.11 Risk of bias assessment summary

Four studies were consistently identified as Strong RCTs (Nelson 2004, Guillou 2005,

Veldkamp 2005, Hewett 2008) across the four outcomes of interest. The remaining trials, the

Typical RCTs, were often at unclear risk of bias for randomization sequence generation and

allocation concealment. All of the Typical RCTs were at unclear risk of bias for selective

outcome reporting; the absence of published protocols among these trials precluded the

assessment of this item. This finding is not unexpected since guidance within the Cochrane

Handbook suggests that most studies are expected to be rated at an unclear risk for this

domain precisely for this reason. It was truly the presence of published protocols that set the

Strong RCTs apart from the rest of the Typical RCTs. Additionally, the Strong RCTs were

publicly funded, multi-center trials that had sample sizes of over 400 patients. Trials with

these attributes have been shown to be less susceptible to bias (Als-Nielsen et al. 2003;

Dechartres et al. 2011; Bafeta et al. 2012).

Chapter 5 Comparing effect estimates from

non-randomized studies and randomized controlled trials

5.1 Summary

Background

Multiple studies suggest that effect estimates from NRS are comparable to those from RCTs.

However, it has also been shown that biased effect estimates arise in RCTs in the absence of

certain study attributes. Comparisons of NRS and RCTs to date have likely compared NRS

with a heterogeneous group of RCTs.

Objectives

To compare the results of NRS with those of RCTs at low risk of bias. Studies comparing

laparoscopy and conventional (open) surgical treatment of colon cancer were used for this

case study.

Methods

All studies comparing laparoscopy with conventional surgery for the management of colon

cancer were identified. Random-effects meta-analysis was separately performed for two

subjective outcomes (post-operative complications and length of stay [LOS]) and two

objective outcomes (mortality and number of lymph nodes harvested). Meta-analysis was

performed for i) All Studies, ii) NRS, iii) RCTs, iv) Typical RCTs and v) Strong RCTs. The

Cochrane Risk of Bias Tool was used to classify studies as “Strong” (low risk of bias) or

“Typical” (unclear or high risk of bias). Meta-regression was conducted with study design as

a predictor variable. Bayesian meta-regression sensitivity analyses assessed the impact of

period effects and between-study case-mix (i.e. baseline event rate) in addition to study

design.

Results

A total of 144 studies reported the outcomes of interest (NRS=121, RCT=23). For post-

operative complications, the odds ratios from NRS were 36% smaller (i.e. demonstrating

more benefit) than those from Strong RCTs (ROR 0.64, [0.42, 0.97], p=0.04). The same

exaggerated benefit among NRS was seen when assessing LOS, (Difference in Mean

Differences, -2.15, [-4.08, -0.21], p=0.03). This pattern was not observed for the objective

outcomes (mortality, ROR 0.74, [0.38, 1.44], p=0.38, and number of LN harvested, DMD

0.49, [-1.43, 2.42], p=0.62). For both subjective outcomes, Typical RCTs also had more

extreme estimates of benefit as compared with Strong RCTs (post-operative complications,

ROR 0.63, [0.42,0.96], p=0.03 and LOS, DMD -1.40, [-2.76, -0.04], p=0.04). Multivariable

meta-regression results, adjusted for period effects and case-mix between studies, were

similar to the unadjusted meta-regression analyses.

Conclusions

When evaluating subjective outcomes, effect estimates from NRS were associated with

larger estimates of benefit for laparoscopy than Strong RCTs. Typical RCTs also had more

extreme estimates of benefit for laparoscopy as compared with Strong RCTs. Similar trends

were not observed among objective outcomes (mortality and number of lymph nodes

harvested).

5.2 Introduction

Randomized controlled trials (RCTs) are considered the gold standard for assessing the

efficacy of therapeutic interventions. Accordingly, systematic reviews and meta-analyses of

RCTs are placed at the top of the evidence hierarchy. In the absence of RCTs, meta-analyses

of non-randomized studies (NRS) may be conducted and this practice is becoming

increasingly commonplace. However, while some studies suggest that effect estimates from

NRS are comparable to those from RCTs (Concato, Shah, and Horwitz 2000; Benson and

Hartz 2000), others have found important differences (Britton et al. 1998; Shikata et al. 2006;

Kunz, Vist, and Oxman 2007). These comparisons have often included studies performed

over multiple decades, with prominent differences between patients and clinical settings. It

remains unclear how the conclusions of these studies may have been influenced by period

effects and clinical heterogeneity.

Meta-epidemiological studies have shown that RCTs without appropriate random-sequence

generation, allocation concealment and double-blinding yield biased estimates of treatment

effect (Schulz et al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001; Pildal

et al. 2007; Wood et al. 2008; Nuesch, Reichenbach, et al. 2009; Hrobjartsson et al. 2012;

Savovic et al. 2012; Hrobjartsson et al. 2013). Some studies have also suggested that

objective outcomes, such as mortality are not influenced by the presence or absence of these

study characteristics (Wood et al. 2008; Savovic et al. 2012) whereas subjective outcomes,

such as pain or complications, may instead be more susceptible to bias. Previous

comparisons of NRS and RCTs have not distinguished between RCTs at high or low risk of

bias. The agreement between RCTs at low risk of bias and NRS remains unknown.

Our primary objective was to compare effect estimates from RCTs at low risk of bias with

those from NRS, across objective and subjective outcomes. Our secondary aim was to

evaluate how comparisons were influenced by period effects and differences in baseline

event rate in the control groups — a measure of underlying risk in enrolled patients.

5.3 Methods

We focused our case study of bias on studies evaluating laparoscopy and conventional

(i.e. open) surgery for colon cancer. These two surgical techniques have been directly

compared via numerous NRS and RCTs. A systematic review was undertaken to identify all

comparative studies (Chapter 4, Section 4.2). Only those studies providing sufficient

information to generate a summary effect estimate for the outcomes of interest (post-

operative complications, peri-operative mortality, length of stay and number of lymph nodes

harvested) were used for the analyses that follow. Post-operative complications and length of

stay were categorized as subjective outcomes whereas peri-operative mortality and number

of lymph nodes harvested were considered objective outcomes (Chapter 4, Section 4.5.1).

These studies were grouped into NRS, Typical RCTs or Strong RCTs according to the

methods outlined in Chapter 4 (Section 4.4 and 4.8).

5.3.1 Statistical analyses

5.3.1.1 Descriptive statistics

Descriptive statistics were calculated to compare NRS and RCTs in terms of year of



authors, length of articles and baseline event rate (or mean) in control groups. Absolute and

relative frequencies were measured for discrete variables and where appropriate, medians

and IQRs were calculated for continuous variables with a non-normal distribution. Medians

were compared using the Mann-Whitney U test and categorical variables using the Chi-

square or Fisher’s exact test, as appropriate (Pagano and Gauvreau 2000). A p-value <0.05

was considered significant. All data were analyzed using R, version 2.15.0 (R Foundation for

Statistical Computing, Vienna, Austria).
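As an illustration of the categorical comparison described above, the following is a minimal Python sketch of a two-sided Fisher's exact test for a 2×2 table. The thesis analyses were run in R; the function name `fisher_exact_two_sided` and the example counts are hypothetical and are not taken from the study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: a study feature present/absent in two study designs.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With cell counts this small, a Chi-square approximation would be unreliable, which is the usual reason for falling back to the exact test.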

5.3.1.2 Meta-analysis

5.3.1.2.1 Justification for model selection

One of the aims of a meta-analysis is to produce an overall or combined effect estimate.

Either a fixed-effect or random-effects model may be used. Whereas the fixed-effect model

assumes there is one underlying effect shared by all studies, random-effects models assume

studies are estimating different underlying effects (Figure 5.1) (Altman, Egger, and Smith

2001).

Figure 5.1 Relationship between observed data, true study effects and the common treatment effect in fixed and random-effects meta-analysis. $\sigma_k^2$ = observed standard error. $\tau_b^2$ = between-study variance in common (true treatment effect).

Sampling error is assumed to be the sole source of variation when estimating the combined

effect in a fixed-effect model. Thus, the observed variation between individual treatment

effects is attributed solely to chance. In contrast, the true study effect could vary from study

to study in a random-effects model due to factors related to the patient population,

intervention delivery, and study methodology (i.e. clinical and methodological

heterogeneity). The observed study effects in random-effects meta-analysis are each

considered to have been sampled from a distribution of possible true effects. The mean of

this distribution is the combined effect in a random-effects model. Therefore, there are two

levels of sampling leading to two sources of variation in random-effects models: individual

patients in studies are sampled from the population of possible study subjects (sampling

error) and studies are each drawn from the distribution of all possible studies (Viechtbauer

2010).

The random-effects modelling approach was chosen for the analyses that follow for two

reasons. First, fixed-effects models generate confidence intervals that are too narrow by

failing to incorporate between-study heterogeneity when it exists (Altman, Egger, and Smith

2001). Second, random-effects modelling is considered standard within the meta-analysis

community (Borenstein 2009).

5.3.1.2.2 Random-effects meta-analysis

When 𝑌𝑖 is the estimate of the effect size in a given study and θ𝑖 is the true effect in that

study, then 𝑌𝑖 is expressed as:

𝑌𝑖 = θ𝑖 + 𝜀𝑖 (5.1)

𝜀𝑖 is the sampling error with which 𝑌𝑖 estimates θ𝑖 (Sutton et al. 1998). This equation can be

further expanded by replacing θ𝑖:

𝑌𝑖 = 𝜇 + ζ𝑖 + 𝜀𝑖 (5.2)

𝜇 is the true effect (mean of the distribution of possible effects)

ζ𝑖 is the difference between θ𝑖 and 𝜇 (Figure 5.2) and represents systematic error or heterogeneity

𝜀𝑖 represents random error (sampling error)
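Equation 5.2 can also be written in distributional form. The block below is a sketch under the conventional assumptions that the two error terms are independent and normally distributed; the symbols $v_i$ (within-study variance) and $\tau^2$ (between-study variance) follow the notation used for the weights later in this section.

$$\varepsilon_i \sim N(0, v_i), \qquad \zeta_i \sim N(0, \tau^2) \quad\Longrightarrow\quad Y_i \sim N(\mu,\; v_i + \tau^2)$$

Under these assumptions the total variance of an observed effect is the sum of the two components, which is why the random-effects weight for a study is the inverse of $v_i + \tau^2$ rather than of $v_i$ alone.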

Figure 5.2 Relationship between the overall true effect (µ), the true effect in a given study (θ) and the observed effect (Yi).

The general equation for estimating the combined effect size from 𝑘 studies is presented

below where 𝑤𝑖 is the weight of an individual study:

$$\bar{Y} = \frac{\sum_{i=1}^{k} w_i Y_i}{\sum_{i=1}^{k} w_i} \tag{5.3}$$

Fixed-effect and random-effects models differ computationally in how weights are

calculated. For fixed effects, the following equation is used:

$$w_i = \frac{1}{v_i} \tag{5.4}$$

where 𝑣𝑖 represents within-study variance. Weights in fixed effects models are often simply

equal to the inverse of within-study variance. Therefore, large studies that have small

variance (i.e. more precision) are weighted more heavily than smaller studies. In random-effects models, between-study variation ($\hat{\tau}^2$) is also incorporated:

$$w_i = \frac{1}{v_i + \hat{\tau}^2} \tag{5.5}$$
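Equations 5.4 and 5.5 can be compared numerically. The following is a minimal Python sketch; the three within-study variances and the value of the between-study variance are hypothetical and chosen only to show how the weights change.

```python
def relative_weights(variances, tau2=0.0):
    """Normalised inverse-variance weights.

    tau2 = 0 gives the fixed-effect weights of equation 5.4;
    tau2 > 0 gives the random-effects weights of equation 5.5.
    """
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical within-study variances: a large, a medium and a small study.
variances = [0.01, 0.04, 0.25]
fixed = relative_weights(variances)
random_effects = relative_weights(variances, tau2=0.05)  # assumed tau^2

for v, wf, wr in zip(variances, fixed, random_effects):
    print(f"v={v:.2f}  fixed={wf:.3f}  random={wr:.3f}")
```

Adding the same between-study variance to every denominator pulls the weights toward each other, so the largest study loses relative weight and the smallest gains.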

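In random-effects models, the weights are not as dispersed as with fixed-effects modelling

and large studies have less influence on the overall estimate ($\bar{Y}$). Between-study variation

($\hat{\tau}^2$) is calculated using Cochran’s Q statistic and the degrees of freedom ($df = k - 1$), where the weights $w_i$ in equations 5.6 and 5.7 are the fixed-effect weights from equation 5.4:

$$Q = \sum_{i=1}^{k} w_i (Y_i - \bar{Y})^2 \tag{5.6}$$

$$\hat{\tau}^2 = \max\left[0,\ \frac{Q - (k-1)}{\sum_{i=1}^{k} w_i - \dfrac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}}\right] \tag{5.7}$$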
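The steps in equations 5.3 to 5.7 can be sketched in a few lines of Python. This is an illustration rather than the code used in the thesis (which was run in R): the function name `dersimonian_laird` is hypothetical, the inputs are the rounded odds ratios and confidence limits printed for the four Strong RCTs in Table 5.3, and the standard errors are back-calculated on the assumption that each interval is symmetric on the log scale.

```python
from math import exp, log, sqrt


def dersimonian_laird(effects, variances):
    """Random-effects pooling following equations 5.3 to 5.7."""
    k = len(effects)
    w = [1.0 / v for v in variances]                                   # eq. 5.4
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)      # eq. 5.3
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))    # eq. 5.6
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                                 # eq. 5.7
    w_re = [1.0 / (v + tau2) for v in variances]                       # eq. 5.5
    y_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)   # eq. 5.3
    se_re = sqrt(1.0 / sum(w_re))
    return y_re, se_re, tau2, q


# Post-operative complications, Strong RCTs (COST, CLASICC, COLOR, ALCCaS),
# as printed in Table 5.3.
odds_ratios = [1.06, 1.05, 1.05, 0.73]
ci_bounds = [(0.67, 1.67), (0.76, 1.43), (0.77, 1.39), (0.53, 1.02)]

log_or = [log(o) for o in odds_ratios]
# Assumption: SE = CI width on the log scale divided by 2 * 1.96.
var_log_or = [((log(hi) - log(lo)) / (2 * 1.96)) ** 2 for lo, hi in ci_bounds]

y, se, tau2, q = dersimonian_laird(log_or, var_log_or)
pooled_or = exp(y)
ci_low, ci_high = exp(y - 1.96 * se), exp(y + 1.96 * se)
print(f"pooled OR = {pooled_or:.2f} ({ci_low:.2f}, {ci_high:.2f}), tau^2 = {tau2:.4f}")
```

Run as written, the sketch should land close to the Strong RCT row of Table 5.2 (OR 0.96, 95% CI 0.80 to 1.15); small discrepancies are expected because the inputs are rounded published values.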

Random-effects meta-analysis was separately performed for i) all studies, ii) NRS,

iii) RCTs, iv) Typical RCTs and v) Strong RCTs for each of the outcomes of interest

according to the methods described by DerSimonian and Laird (DerSimonian and Laird

1986). Heterogeneity between studies was assessed by calculating $I^2$ for each meta-analysis:

$$I^2 = 100\% \times \left(\frac{Q - df}{Q}\right) \tag{5.8}$$

𝐼2 is a quantity that describes the percentage of total variation across studies that is due to

heterogeneity instead of chance (Fletcher 2007). The Cochrane Handbook for Systematic

Reviews recommends using the following approach for interpreting I2 values: 0-40%,

heterogeneity “may not be important”; 30-60%, heterogeneity may be moderate; 50-90%,

heterogeneity may be substantial; 75-100%, heterogeneity is “considerable” (Higgins, Green,

and Cochrane Collaboration. 2011). Publication bias was assessed using visual inspection of

funnel plots (Sterne et al. 2011). Axes for plots were chosen according to the principles

outlined by Sterne and Egger (Sterne and Egger 2001). All analyses were performed using R,

version 2.15.0 (R Foundation for Statistical Computing, Vienna, Austria).
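Equation 5.8 and the overlapping interpretation bands quoted above can be sketched as follows. The function names are hypothetical, the example value of Q is invented, and truncating negative values at zero is an assumption (a common convention that equation 5.8 does not state).

```python
def i_squared(q, df):
    """I^2 (percent) from Cochran's Q and its degrees of freedom (eq. 5.8).

    Assumption: negative values are truncated at zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Overlapping interpretation bands quoted from the Cochrane Handbook.
BANDS = [
    (0, 40, "may not be important"),
    (30, 60, "may be moderate"),
    (50, 90, "may be substantial"),
    (75, 100, "considerable"),
]


def interpret(i2):
    """Return every Handbook label whose band contains i2."""
    return [label for low, high, label in BANDS if low <= i2 <= high]


i2 = i_squared(q=10.0, df=5)  # hypothetical Q from a meta-analysis of 6 studies
print(i2, interpret(i2))
```

Because the bands overlap, a single value can carry two labels; returning a list keeps that ambiguity visible instead of silently picking one.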

5.3.1.3 Meta-regression

Meta-regression was used to compare effect estimates across groups of studies. Meta-

regression can be either a linear or a logistic regression model. The unit of analysis in these

models is the study. Predictors in the model are also study-level covariates (e.g. study

design) (Morton et al. 2004).

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \zeta_i + \varepsilon_i \tag{5.10}$$

$Y_i$ represents the effect estimate in a particular study

$\beta_0$ is the intercept (average true effect when the value of all predictor variables is equal to zero)

$\beta_p$ denotes how the average true effect changes for a one unit increase in $x_{ip}$

$p$ represents the number of covariates in the model

$\zeta_i$ represents remaining heterogeneity between studies (not explained by covariates in the model)

$\varepsilon_i \sim N(0, v_i)$ represents random sampling error

Thus, the meta-regression approach uses regression analysis to determine the influence of

independent (predictor) variables on the effect size (dependent variable) in a study (Sterne et

al. 2002; Higgins and Thompson 2004). Meta-analysis can be considered a special case of

meta-regression where part of the between-study heterogeneity is explained by study-level

covariates; when there are no predictor variables in equation 5.10, it reduces to the general

equation 5.2 for 𝑌𝑖 (the effect size in a given study).
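For a single binary covariate such as study design, equation 5.10 reduces to a weighted straight-line fit. The following Python sketch shows that special case only. The function name, the toy log odds ratios and the choice to treat the between-study variance as known are all assumptions for illustration; a full mixed-effects meta-regression would estimate that variance from the data.

```python
from math import exp, sqrt


def meta_regression_binary(effects, variances, x, tau2=0.0):
    """Weighted least squares fit of Y_i = b0 + b1 * x_i (equation 5.10, p = 1).

    Weights are 1 / (v_i + tau2); tau2 is taken as given here.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    se_b1 = sqrt(1.0 / sxx)
    return b0, b1, se_b1


# Hypothetical log odds ratios; x = 1 for NRS, x = 0 for RCTs.
log_or = [0.0, 0.2, -0.4, -0.6]
var = [0.04, 0.04, 0.04, 0.04]
design = [0, 0, 1, 1]

b0, b1, se_b1 = meta_regression_binary(log_or, var, design)
ror = exp(b1)  # ratio of odds ratios, NRS relative to RCTs
print(f"b0={b0:.2f}  b1={b1:.2f}  se(b1)={se_b1:.2f}  ROR={ror:.2f}")
```

With a 0/1 covariate the intercept is the pooled effect in the reference group and the slope is the difference between groups, so exponentiating the slope gives a ratio of odds ratios for binary outcomes.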

Logistic regression models were developed for binary outcomes (post-operative

complications and mortality) and linear models for continuous outcomes (length of stay and

number of lymph nodes harvested). The coefficients in binary models were exponentiated to

generate ratios of odds ratios (ROR);

$$\mathrm{ROR} = e^{\beta} \tag{5.11}$$

$$\mathrm{ROR} = \frac{\text{combined OR}_{x=\text{study design A}}}{\text{combined OR}_{x=\text{study design B}}} \tag{5.12}$$

An OR less than one indicates that laparoscopy is more beneficial than conventional (open)

surgery. An OR closer to zero denotes more benefit for laparoscopy. A ROR less than one

indicates that the combined OR in the numerator of equation 5.12 was smaller than the

combined OR in the denominator. For example, consider a comparison of NRS with RCTs

for post-operative complications. If the aggregate effect estimate for NRS was 0.75 and 1.25

for RCTs, then the meta-regression results comparing these study designs would generate an

ROR roughly equal to 0.75/1.25 or 0.60. This would imply that NRS estimates showed 40%

more benefit (i.e. smaller OR) than RCTs.

Meta-regressions for linear outcomes (i.e. length of stay and number of lymph nodes) instead

yield differences in mean differences (DMD):

$$\mathrm{DMD} = \text{combined mean difference}_{\text{study design A}} - \text{combined mean difference}_{\text{study design B}} \tag{5.13}$$

For example, consider a comparison of NRS and RCTs for length of stay. At the study-level,

outcomes are expressed as a mean difference (MD):

$$\mathrm{MD} = \text{mean}_{\text{laparoscopy}} - \text{mean}_{\text{open surgery}} \tag{5.14}$$

A meta-analysis of NRS yields an aggregate mean difference (MD) of -1.50, indicating that

the length of stay for laparoscopy was 1.5 days shorter than the length of stay for patients

undergoing open surgery. The MD for RCTs was -0.25. The results of a meta-regression

comparing NRS with RCTs would thus be equal to -1.50 – (-0.25) or -1.25. This DMD

indicates that the difference in length of stay between laparoscopy and open surgery in NRS

was 1.25 days larger on average than in RCTs. Note however, that a negative MD for length of

stay favors laparoscopy but for number of lymph nodes harvested, favors open surgery.
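The two worked examples above (ROR for binary outcomes, DMD for continuous outcomes) can be checked with a short Python sketch. The function names are hypothetical; the numbers are the illustrative values used in the text, not study results.

```python
from math import exp, log


def ratio_of_odds_ratios(or_a, or_b):
    """ROR for design A versus design B (equation 5.12)."""
    return or_a / or_b


def difference_in_mean_differences(md_a, md_b):
    """DMD for design A versus design B (equation 5.13)."""
    return md_a - md_b


# Values from the worked examples in the text (NRS = design A, RCTs = design B).
ror = ratio_of_odds_ratios(0.75, 1.25)
dmd = difference_in_mean_differences(-1.50, -0.25)

# Equation 5.11: the same ROR is the exponentiated difference in log odds ratios.
beta = log(0.75) - log(1.25)
print(f"ROR = {ror:.2f}, exp(beta) = {exp(beta):.2f}, DMD = {dmd:.2f}")
```

The ratio of the two combined estimates is only an approximation to the meta-regression result, because the regression model re-weights studies when estimating the coefficient.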

All meta-regression analyses were performed according to the methods outlined by

Thompson and Higgins (Thompson and Higgins 2002) using R, version 2.15.0 (R

Foundation for Statistical Computing, Vienna, Austria).

5.3.1.4 Sensitivity analysis

NRS comparing laparoscopy and open surgery were first published in the early 1990s and

high-quality RCTs appeared nearly 15 years later. It is likely that these surgical techniques

and peri-operative processes (e.g. imaging and anaesthesia techniques, prophylactic

antibiotic guidelines and enhanced recovery pathways focusing on early feeding and

mobilization) evolved during this time. It is possible the comparisons of effect estimates

across different groups of studies (i.e. NRS, RCTs, Typical RCTs and Strong RCTs) could be

confounded by period effects.

Additionally, individual studies likely differed in the types of patients included with some

evaluating patients with more advanced disease or a higher frequency of comorbidities, which

would impact the risk of developing post-operative complications or death. Moreover,

individual institutions have also been shown to differ in their capacity to rescue patients with

complications (i.e. prevent mortality) and this could lead to important differences in

mortality between institutions or individual studies (Ghaferi, Birkmeyer, and Dimick 2009).

These sources of clinical and methodological heterogeneity could confound the relationship

between study design and observed effect estimates. The best possible method for exploring

such heterogeneity employs the use of individual patient and provider data to explore the

impact of various covariates on the treatment effect. It is uncommon for those conducting

meta-analyses and meta-regression analyses to have access to such data (Sharp and

Thompson 2000). Alternatives include using baseline event rate, which is a covariate

measured at the study level, to adjust analyses for important between-study differences

(Sharp, Thompson, and Altman 1996; Thompson, Smith, and Sharp 1997).

A sensitivity analysis incorporating period effects (i.e. year of publication) and baseline

event rate (i.e. event rate in control groups) was therefore undertaken. Baseline event rates

are considered reflective of differences in the underlying risk of patients (Barza, Trikalinos,

and Lau 2009) and “can be interpreted as a summary of a number of unmeasured patient

characteristics” (Sharp, Thompson, and Altman 1996). Studies with higher rates of post-

operative complications or mortality might differ not only in terms of patient case-mix but

also with regards to institutional processes of care. This single measure was therefore used to

incorporate between-study differences in patient case-mix and institutional practice.

For binary outcomes, the baseline event rate was equal to the proportion of patients in the

control group experiencing the outcome (i.e. either a post-operative complication or death).

For continuous outcomes (i.e. length of stay or number of lymph nodes harvested), the mean

in the control group was equal to the baseline event rate. In both instances, the baseline event

rate is also used to calculate the overall effect estimate in a study. Therefore, frequentist

regression methods could not be used because one of the covariates (i.e. baseline event rate)

would be correlated with the dependent variable (i.e. effect estimate). The phenomenon of

regression to the mean can occur; “a high baseline event rate, observed entirely by chance,

will, on average, give rise to a higher than expected effect estimate, and vice versa”

(Higgins, Green, and Cochrane Collaboration. 2011). It is recommended that a Bayesian

analysis should be used because this approach allows for a separate posterior probability to

be calculated for the covariate — one that is unrelated to the posterior probability for the

overall effect estimate (McIntosh 1996; Thompson, Smith, and Sharp 1997; Sharp and

Thompson 2000; van Houwelingen, Arends, and Stijnen 2002). Bayesian hierarchical models

were developed according to this guidance (Thompson, Turner, and Warn 2001) and are

available in Appendix F.
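The dependence described above can be seen directly in the arithmetic: the control-arm counts enter both the baseline event rate and the odds ratio. The following Python sketch uses the post-operative complication counts printed in Table 5.3 for two of the Strong RCTs; the function name is hypothetical, and the crude odds ratio computed here is only an approximation to the values reported in that table.

```python
def odds_ratio_and_baseline(events_lap, n_lap, events_open, n_open):
    """Crude odds ratio (laparoscopy vs open) and control-arm event rate."""
    odds_lap = events_lap / (n_lap - events_lap)
    odds_open = events_open / (n_open - events_open)
    baseline_rate = events_open / n_open  # open surgery is the control arm
    return odds_lap / odds_open, baseline_rate


# Post-operative complication counts as printed in Table 5.3:
# (events LAP, N LAP, events OPEN, N OPEN).
trials = {
    "MRC CLASICC": (172, 526, 85, 268),
    "ALCCaS": (111, 294, 135, 298),
}

results = {name: odds_ratio_and_baseline(*counts) for name, counts in trials.items()}
for name, (odds_ratio, baseline) in results.items():
    print(f"{name}: OR = {odds_ratio:.2f}, baseline event rate = {baseline:.2f}")
```

A chance excess of events in the open-surgery arm raises the baseline rate and lowers the odds ratio at the same time, which is why the covariate cannot be treated as measured independently of the outcome.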

5.3.1.4.1 Model estimation

Bayesian analyses were performed using OpenBUGS, version 3.2.2. OpenBUGS employs

Markov Chain Monte Carlo methods and the Gibbs sampler to estimate posterior probability

distributions for quantities of interest (Lunn et al. 2009). After a burn-in of 20,000 updates,

100,000 iterations were performed. Three simultaneous chains were run and convergence

was assessed by examining Gelman-Rubin convergence plots (Gelman and Rubin 1996).

Initial values for each unknown parameter of each chain were randomly generated from a

normal distribution in R, version 2.10.1 (R Foundation for Statistical Computing, Vienna,

Austria). Non-informative prior distributions were used for all model parameters. Given the

non-informative nature of these priors and our large number of studies, we did not perform

sensitivity analyses on the choice of prior distributions. Results are reported according to the

ROBUST guidelines (Sung et al. 2005).
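The idea behind the Gelman-Rubin diagnostic mentioned above can be sketched in Python as the basic potential scale reduction factor, which compares between-chain and within-chain variance. This is a simplified illustration with hypothetical toy chains; it is not the exact statistic plotted by OpenBUGS, whose implementation may differ in detail.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


# Toy draws (hypothetical, far shorter than the chains used in the thesis).
mixed = [[0.1, 0.3, 0.2, 0.4], [0.2, 0.4, 0.1, 0.3], [0.3, 0.1, 0.4, 0.2]]
stuck = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]

print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Values close to 1 suggest the chains are sampling the same distribution, while values well above 1 (as for the `stuck` chains) indicate that the chains have not yet mixed.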

5.4 Results


A subgroup of the data set described in Chapter 4 was used for the following analyses. One-

hundred and forty-four studies reported the outcomes of interest (Table 5.1). These

comparative studies involved a total of 1,177,740 participants (NRS n=1,171,524, RCT

n= 6,216). The earliest comparative studies were NRS, published in 1993. The first RCTs

appeared 3 years later. The majority of studies were affiliated with an academic center,

however, a notable proportion of NRS were conducted in a community setting (21.7%). Both

consortiums and authors with methodological expertise were more common among RCTs.

The reports of NRS were shorter (median 6 versus 7 pages) and authored by fewer

investigators.


NRS (N=121)

RCTs (N=23) p-value•


2004 (2001-2006) 0.27†

Participants 129 (67-265)

116 (60-397) 0.95†

Authors

Number* 5 (4-7)

7 (5-8) 0.03†


5 (21.7) <0.001♦


8 (34.8) 0.02◊


20 (87.0) 0.56♦


7 (6-9) 0.02†

* Median, (Interquartile Range, IQR).
§ Number (percentage).
† Medians compared using the Mann-Whitney U test.
♦ Frequencies compared using Fisher’s exact test.
◊ Frequencies compared using Chi-square test.
• Statistically significant p values (<0.05) indicated in bold.

5.4.2 Binary outcomes


Ninety-nine studies (NRS=79, RCTs=20) reported the frequency of post-operative

complications. The results of random-effects meta-analysis are outlined in Table 5.2.

Separate analyses of all studies, NRS, all RCTs and Typical RCTs suggest that laparoscopy

was associated with fewer post-operative complications. However, a meta-analysis of Strong

RCTs did not find laparoscopy to be superior to open surgery (OR 0.96, 95% CI 0.80 to 1.15,

p=0.65) (Figure 5.3). Strong RCTs were the least heterogeneous group of studies (I2=15.4%);

the results of individual Strong RCTs are outlined in Table 5.3. Typical RCTs and all RCTs

were moderately heterogeneous (44.8% and 52.9%, respectively). The NRS were the most

diverse group of studies (I2=79.9%).

Table 5.2 Random-effects meta-analysis results for studies reporting post-operative complications.

| | # of Studies | OR* | 95% CI | p-value | I2♦ | 95% CI |
|---|---|---|---|---|---|---|
| All Studies | 99 | 0.65 | 0.60, 0.71 | <0.0001 | 77.2 | 72.5, 81.1 |
| NRS | 79 | 0.63 | 0.57, 0.70 | <0.0001 | 79.9 | 75.4, 83.6 |
| RCTs | 20 | 0.72 | 0.58, 0.90 | 0.0045 | 52.9 | 21.7, 71.7 |
| Typical RCTs | 16 | 0.60 | 0.45, 0.82 | 0.0012 | 44.8 | 0.90, 69.3 |
| Strong RCTs | 4 | 0.96 | 0.80, 1.15 | 0.65 | 15.4 | 0.00, 87.1 |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer post-operative complications as compared with open surgery.
♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Table 5.3 Results of Strong Randomized Controlled Trials

| Trial | Author | Year | N | Events LAP§ | Events OPEN† | Event Rate LAP§ | Event Rate OPEN† | Odds Ratio* | 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| A. Post-Operative Complications | | | | | | | | | |
| COST | Nelson et al. | 2004 | 863 | 92/432 | 85/428 | 0.21 | 0.20 | 1.06 | 0.67, 1.67 |
| MRC CLASICC | Guillou et al. | 2005 | 794 | 172/526 | 85/268 | 0.33 | 0.32 | 1.05 | 0.76, 1.43 |
| COLOR | Veldkamp et al. | 2005 | 1082 | 111/536 | 110/546 | 0.21 | 0.20 | 1.05 | 0.77, 1.39 |
| ALCCaS | Hewett et al. | 2008 | 592 | 111/294 | 135/298 | 0.38 | 0.45 | 0.73 | 0.53, 1.02 |
| B. Mortality | | | | | | | | | |
| COST | Nelson et al. | 2004 | 863 | 2/435 | 4/428 | 0.005 | 0.009 | 0.49 | 0.09, 2.69 |
| MRC CLASICC | Guillou et al. | 2005 | 794 | 21/526 | 13/268 | 0.040 | 0.049 | 0.82 | 0.40, 1.66 |
| COLOR | Veldkamp et al. | 2005 | 1082 | 6/536 | 10/546 | 0.011 | 0.018 | 0.60 | 0.22, 1.68 |
| ALCCaS | Hewett et al. | 2008 | 592 | 4/294 | 2/298 | 0.014 | 0.007 | 2.04 | 0.37, 11.23 |

| Trial | Author | Year | N* | Mean LAP§ | Mean OPEN† | Mean Difference* | 95% CI |
|---|---|---|---|---|---|---|---|
| C. Length of Stay | | | | | | | |
| COST | Nelson et al. | 2004 | 863 | 5 | 6 | -1.0 | -1.20, -0.80 |
| MRC CLASICC | Guillou et al. | 2005 | 794 | 9 | 9 | 0.0 | -0.25, 0.25 |
| COLOR | Veldkamp et al. | 2005 | 1082 | 8.2 | 9.3 | -1.1 | -1.93, -0.27 |
| ALCCaS | Hewett et al. | 2008 | 592 | 9.5 | 10.6 | -1.1 | -2.28, 0.08 |
| D. Number of Lymph Nodes | | | | | | | |
| COST | Nelson et al. | 2004 | 863 | 12 | 12 | 0 | -0.56, 0.56 |
| MRC CLASICC | Guillou et al. | 2005 | 794 | 12 | 13.5 | -1.5 | -1.88, -1.12 |
| COLOR | Veldkamp et al. | 2005 | 1082 | 10 | 10 | 0 | -1.24, 1.24 |
| ALCCaS | Hewett et al. | 2008 | 592 | 13 | 13 | 0 | -2.52, 2.52 |

* Odds ratios, mean differences and 95% CI calculated according to methods outlined in Section 5.3.1.2.2.
§ LAP = laparoscopic surgery
† OPEN = open surgery

Figure 5.3 Forest plot of meta-analysis results for studies reporting post-operative complications. Squares indicate odds ratios and error bars indicate 95% confidence intervals.

Random-effects meta-regression models were used to compare effect estimates across groups

and the results are summarized in Table 5.4. NRS estimated a benefit for laparoscopy that

was 36% larger on average than in Strong RCTs (ROR 0.64, 95% CI 0.42 to 0.97, p=0.04).

Typical RCTs also estimated a larger benefit with laparoscopy than Strong RCTs (ROR 0.63,

95% CI 0.42 to 0.96, p=0.03). This pattern was not observed when comparing NRS with all

RCTs. The effect estimates from NRS and Typical RCTs were similar (Figure 5.4).

Table 5.4 Meta-regression results comparing effect estimates for post-operative complications from different study designs.

| Comparison | ROR* | 95% CI† | p-value |
|---|---|---|---|
| NRS/RCTs | 0.85 | 0.65, 1.13 | 0.28 |
| NRS/Typical RCTs | 1.01 | 0.73, 1.41 | 0.93 |
| NRS/Strong RCTs | 0.64 | 0.42, 0.97 | 0.04 |
| Typical RCTs/Strong RCTs | 0.63 | 0.42, 0.96 | 0.03 |

* Ratio of odds ratios. A ROR < 1 indicates that the study in the numerator showed more benefit than studies in the denominator.

Figure 5.4 Forest plot of ratios of odds ratios (ROR) from meta-regression analysis comparing study designs. Squares indicate ROR and error bars indicate 95% confidence intervals.


Ninety-six studies (NRS=79, RCTs=17) examined the association between surgical approach

(laparoscopic or open colon surgery) and mortality. The effect estimates from all studies and

NRS both suggest that laparoscopy is associated with fewer deaths than open surgery

(p<0.0001) (Table 5.5). Combining all RCTs however, did not demonstrate an advantage

with laparoscopy (Figure 5.5). Typical and Strong RCTs similarly suggest that there is no

benefit associated with laparoscopy. All groups had low between-study heterogeneity.

Table 5.5 Random-effects meta-analysis results for studies reporting peri-operative mortality.

| | # of Studies | OR* | 95% CI | p-value | I2♦ | 95% CI |
|---|---|---|---|---|---|---|
| All Studies | 96 | 0.62 | 0.51, 0.75 | <0.0001 | 25.8 | 3.8, 42.8 |
| NRS | 79 | 0.59 | 0.47, 0.74 | <0.0001 | 33.9 | 12.7, 50.0 |
| RCTs | 17 | 0.83 | 0.55, 1.26 | 0.39 | 0 | 0.0, 0.0 |
| Typical RCTs | 13 | 0.92 | 0.46, 1.83 | 0.82 | 0 | 0.0, 0.0 |
| Strong RCTs | 4 | 0.78 | 0.46, 1.32 | 0.36 | 0 | 0.0, 73.9 |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer deaths.
♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Figure 5.5 Forest plot of meta-analysis results for studies reporting peri-operative mortality. Squares indicate odds ratios and error bars indicate 95% confidence intervals.

Meta-regression results are outlined in Table 5.6. While there was a suggestion that NRS

over-estimate benefit associated with laparoscopy (ROR ranging from 0.63 to 0.74), none of

these comparisons were statistically significant (Figure 5.6).

Table 5.6 Meta-regression results comparing effect estimates for peri-operative mortality from different study designs.

| Comparison | ROR* | 95% CI† | p-value |
|---|---|---|---|
| NRS/RCTs | 0.69 | 0.41, 1.16 | 0.16 |
| NRS/Typical RCTs | 0.63 | 0.30, 1.34 | 0.24 |
| NRS/Strong RCTs | 0.74 | 0.38, 1.44 | 0.38 |
| Typical RCTs/Strong RCTs | 1.16 | 0.45, 3.03 | 0.76 |

* Ratio of odds ratios. A ROR < 1 indicates that the study in the numerator showed more benefit than studies in the denominator.

Figure 5.6 Forest plot of ratios of odds ratios (ROR) from meta-regression analysis comparing study designs. Squares indicate ratios of odds ratios (ROR) and error bars indicate 95% confidence intervals.

5.4.3 Continuous outcomes


Estimates for length of stay were reported in 128 studies (NRS=106, RCTs=22). All groups

demonstrated a benefit associated with laparoscopy (Table 5.7). While Strong RCTs found

that length of stay was 0.70 days shorter for those treated with laparoscopy (95% CI, -1.23 to

-0.17), NRS demonstrated a benefit of nearly 3 days with laparoscopy (MD -2.95, 95% CI

-3.39 to -2.50) (Figure 5.7). Notably, I2 values for all groups were over 90%.

Table 5.7 Random-effects meta-analysis results for studies reporting length of stay (days).

Group           # of Studies   MD*     95% CI (MD)     p-value   I2♦    95% CI (I2)
All Studies     128            -2.74   -3.13, -2.36    <0.0001   97.3   97.0, 97.5
NRS             106            -2.95   -3.39, -2.50    <0.0001   97.3   97.0, 97.6
RCTs            22             -1.82   -2.45, -1.18    <0.0001   95.9   94.8, 96.8
Typical RCTs    18             -2.16   -2.89, -1.44    <0.0001   90.4   86.4, 93.2
Strong RCTs     4              -0.70   -1.23, -0.17    0.01      92.3   83.4, 96.4

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with a shorter length of stay.

♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Figure 5.7 Forest plot of meta-analysis results for studies reporting length of stay. Squares indicate mean differences and error bars indicate 95% confidence intervals.

Meta-regression results are summarized in Table 5.8. NRS estimates for benefit were larger

by more than one day as compared with all RCTs (DMD -1.27, 95% CI -2.30 to -0.25,

p=0.01) (Figure 5.8). The difference between NRS and Strong RCTs estimates was over 2

days (DMD -2.15, 95% CI -4.08 to -0.21, p=0.03). Typical RCTs also ascribed more benefit

to laparoscopy than Strong RCTs (DMD -1.40, 95% CI -2.76 to -0.04, p=0.04).

Table 5.8 Meta-regression results comparing effect estimates for length of stay from different study designs.

Comparison                 DMD*    95% CrI†       p-value
NRS:RCTs                   -1.27   -2.30, -0.25   0.01
NRS:Typical RCTs           -0.81   -1.90, 0.29    0.15
NRS:Strong RCTs            -2.15   -4.08, -0.21   0.03
Typical RCTs:Strong RCTs   -1.40   -2.76, -0.04   0.04

* Differences in Mean Differences. DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in the comparison column as study design 1:study design 2. A negative mean difference indicates that laparoscopy is associated with a shorter length of stay.

Figure 5.8 Forest plot of difference in mean differences (DMD) from meta-regression analysis comparing study designs. Squares indicate DMDs and error bars indicate 95% confidence intervals. DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in labels as study design 1:study design 2. A negative mean difference indicates that laparoscopy is associated with a shorter length of stay.


Seventy-six studies reported the number of lymph nodes harvested (NRS=59, RCTs=17).

Meta-analyses across all groups revealed that a comparable number of lymph nodes were

found in specimens from laparoscopic and open surgical procedures (Table 5.9). The effect

estimate from Strong RCTs was the least favourable to laparoscopy (MD -0.55, 95% CI

-1.37 to 0.26, p=0.18) (Figure 5.9). Heterogeneity between studies was significant and ranged

from 50.6 to 93.4%. NRS were the most diverse group of studies.

Table 5.9 Random-effects meta-analysis results for studies reporting number of lymph nodes harvested.

Group           # of Studies   MD*     95% CI (MD)     p-value   I2♦    95% CI (I2)
All Studies     76             -0.02   -0.50, 0.46     0.93      92.0   90.6, 93.2
NRS             59             0.07    -0.53, 0.67     0.81      93.4   92.2, 94.5
RCTs            17             -0.35   -0.93, 0.23     0.24      68.3   47.7, 80.8
Typical RCTs    13             -0.23   -1.03, 0.57     0.58      50.6   6.7, 73.8
Strong RCTs     4              -0.55   -1.37, 0.26     0.18      86.3   66.7, 94.4

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen.

♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Figure 5.9 Forest plot of meta-analysis results for studies reporting number of lymph nodes harvested. Squares indicate mean differences and error bars indicate 95% confidence intervals.

Table 5.10 outlines the results of the meta-regression modelling with studies reporting

number of lymph nodes harvested. The DMD between all comparisons was smaller than 0.5

(i.e. half a lymph node) and none were statistically significant.

Table 5.10 Meta-regression results comparing effect estimates for number of lymph nodes harvested from different study designs.

Comparison                 DMD*   95% CI†       p-value
NRS:RCTs                   0.38   -0.76, 1.53   0.51
NRS:Typical RCTs           0.34   -0.98, 1.66   0.61
NRS:Strong RCTs            0.49   -1.43, 2.42   0.62
Typical RCTs:Strong RCTs   0.15   -2.04, 2.35   0.89

* Differences in Mean Differences. DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in the comparison column as study design 1:study design 2. A negative mean difference indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen.

Figure 5.10 Forest plot of difference in mean differences (DMD) from meta-regression analysis comparing study designs. Squares indicate DMDs and error bars indicate 95% confidence intervals. DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in labels as study design 1:study design 2. A negative mean difference indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen.

5.4.3.3 Sensitivity analysis

The median year of publication for included studies and baseline event rates among control

groups are reported in Table 5.11. The median year of publication differs by at most two

years between NRS and RCTs. Baseline event rates are also similar. Figure 5.11 presents the

distribution of baseline event rates among NRS and RCTs. For subjective outcomes

(i.e. post-operative complications and length of stay), there were more outlying studies

among NRS; for post-operative complications, 10.1% of NRS had a baseline event rate

>45%, the highest baseline event rate observed among RCTs. For length of stay, 17.9% of

NRS had a baseline mean >14 days, the upper limit for RCTs.

Table 5.11 Median year of publication and baseline event rates in studies reporting the outcomes of interest.

                               NRS                   RCTs
                               Median (IQR)          Median (IQR)
Post-Operative Complications
  Studies                      79                    20
  Year                         2006 (2002-2008)      2004 (2000-2006)
  Baseline Event Rate          0.26 (0.20-0.34)      0.25 (0.20-0.30)
Mortality
  Studies                      79                    17
  Year                         2006 (2002-2008)      2004 (2000-2004)
  Baseline Event Rate          0.01 (0.00-0.03)      0.01 (0.00-0.02)
Length of Stay
  Studies                      106                   22
  Year                         2006 (2000-2004)      2004 (2002-2004)
  Mean in Control Group*       9.40 (7.82-10.83)     9.00 (7.32-11.39)
Number of LN harvested
  Studies                      59                    17
  Year                         2005 (1998-2004)      2004 (2002-2004)
  Mean in Control Group*       13.80 (9.68-14.35)    13.00 (10.50-16.00)

* Control Group = Open Surgery Group

Figure 5.11 Event rates in control groups across included studies. Rates in studies reporting A) Post-Operative Complications and B) Mortality expressed as percent rates. Rates in studies reporting C) Length of Stay and D) Number of Lymph Nodes Harvested expressed as means. Black – Randomized controlled trials. Gray – Non-randomized studies.

We examined the relationship between baseline event rate and publication year visually

(Figure 5.12) for each of the outcomes of interest. There was no common pattern evident

across outcomes and notable variation between study designs.

Figure 5.12 Baseline event rates over time. Event rates in control groups across included studies. Rates in studies reporting A) Post-operative complications and B) Peri-operative mortality expressed as percent rates. Rates in studies reporting C) Length of stay and D) Number of lymph nodes harvested expressed as means. Black – Randomized controlled trials. Gray – Non-randomized studies.

The results of univariable and multivariable Bayesian meta-regression analyses for studies

reporting post-operative complications are outlined in Table 5.12. The results of the adjusted

and unadjusted analyses remain consistent with those of the primary analysis (Table 5.3);

both NRS and Typical RCTs were associated with more extreme estimates of benefit for

laparoscopy as compared with Strong RCTs. Moreover, as the baseline event rate increased

in a given study, laparoscopy was associated with fewer post-operative complications. For

instance, in the comparison of NRS with Strong RCTs, the ROR for baseline event rate is

0.67. This indicates that for a one logit increase in the baseline event rate, the odds ratio for

post-operative complications decreases by 33%. As a demonstrative example, an increase in

the baseline event rate from 0.25 to 0.35 (equal to 0.21 logits) would result in a decrease of

the odds ratio by 8%. Therefore, the benefit of laparoscopy appears to be more pronounced

in studies where patients in the control group were more likely to experience a complication.
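The arithmetic behind this example can be checked with a short sketch. A ROR of 0.67 per one-logit increase means the odds ratio is multiplied by 0.67 raised to the change in log-odds of the baseline rate. The quoted change of 0.21 logits for a rise from 0.25 to 0.35 matches a base-10 log-odds difference; on the natural-log scale the same rise is about 0.48 logits. This section does not state which scale the model used, so both are shown, and the variable names are illustrative.

```python
import math

def logit(p, base=math.e):
    """Log-odds of a proportion p, in the given logarithm base."""
    return math.log(p / (1 - p), base)

ror_baseline = 0.67  # ROR per one-logit increase in baseline event rate (NRS vs Strong RCTs)

# Change in log-odds when the baseline event rate rises from 0.25 to 0.35
delta_log10 = logit(0.35, 10) - logit(0.25, 10)   # base-10 scale
delta_ln = logit(0.35) - logit(0.25)              # natural-log scale

# Proportional decrease in the odds ratio implied by each scale
change_log10 = 1 - ror_baseline ** delta_log10
change_ln = 1 - ror_baseline ** delta_ln
```

On the base-10 scale the decrease is about 8%, as stated in the text; on the natural-log scale the same ROR would imply a decrease of roughly 17%.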

A similar trend for baseline event rate was observed among studies reporting peri-operative

mortality; as the baseline rate of deaths in a study increased, the odds ratio for death

decreased (Table 5.13). As with the primary analysis (Table 5.6), effect estimates did not

appear to differ across different study designs in both unadjusted and adjusted analyses.

Among studies reporting length of stay, NRS were again associated with more extreme

estimates of treatment effect as compared with all RCTs and Strong RCTs (Table 5.14).

Effect estimates in Typical RCTs were also more extreme as compared with those from

Strong RCTs. Multivariable analyses, adjusting for year of publication and baseline event

rate, revealed similar findings. The results for this outcome were again similar to those from

primary analyses (Table 5.8). Moreover, differences in length of stay between laparoscopy

and open surgery increased as the baseline mean increased by one day. For example, in the

comparison of NRS and Strong RCTs, as the mean length of stay increased by one day, the

difference between laparoscopy and open surgery increased by 0.39 days. A similar trend

was not observed among studies reporting number of lymph nodes harvested.

The results of the Bayesian unadjusted and adjusted analyses for studies reporting number of

lymph nodes harvested (Table 5.15) were similar to primary analyses (Table 5.10); effect

Table 5.12 Bayesian meta-regression results comparing effect estimates for post-operative complications from different study designs, adjusted for year of publication and baseline event rate.

                           Unadjusted Analysis    Multivariable Analysis
                           Design                 Design                 Year                   Baseline Event Rate
Comparison                 ROR     95% CrI†       ROR*    95% CrI†       ROR♦    95% CrI†       ROR§    95% CrI†
NRS/RCTs                   0.85    0.63, 1.15     0.87    0.67, 1.16     0.99    0.96, 1.02     0.65    0.56, 0.77
NRS/Typical RCTs           1.00    0.70, 1.41     1.06    0.78, 1.47     0.98    0.96, 1.01     0.65    0.55, 0.78
NRS/Strong RCTs            0.58    0.37, 0.93     0.60    0.37, 0.94     0.98    0.95, 1.01     0.67    0.58, 0.79
Typical RCTs/Strong RCTs   0.62    0.38, 0.99     0.57    0.36, 0.88     1.07    1.00, 1.15     0.40    0.32, 0.64

Statistically significant values in bold (i.e. credible intervals do not include unity).

* Ratio of odds ratios. A ROR < 1 indicates that the study in the numerator showed more benefit than studies in the denominator.

† Credible Interval.

♦ ROR for year can be interpreted as the change in the overall study effect (i.e. odds ratio) for a one unit increase in year. For example, a ROR of 0.95 indicates that publication of an article one year later would be associated with a decrease in the odds ratio by 5%.

§ ROR for baseline event rates are expressed for a one logit increase in baseline event rate.

Table 5.13 Bayesian meta-regression results comparing effect estimates for peri-operative mortality from different study designs, adjusted for year of publication and baseline event rate.


                           Unadjusted Analysis    Multivariable Analysis
                           Design                 Design                 Year                   Baseline Event Rate
Comparison                 ROR     95% CrI†       ROR*    95% CrI†       ROR♦    95% CrI†       ROR§    95% CrI†
NRS/RCTs                   0.65    0.35, 1.10     0.94    0.52, 1.62     0.96    0.91, 1.02     0.38    0.37, 0.39
NRS/Typical RCTs           0.66    0.26, 1.42     1.11    0.49, 2.21     0.96    0.91, 1.02     0.37    0.37, 0.37
NRS/Strong RCTs            0.69    0.49, 1.41     0.89    0.33, 2.03     0.96    0.89, 1.02     0.37    0.37, 0.37
Typical RCTs/Strong RCTs   1.26    0.32, 3.46     0.93    0.22, 2.64     1.01    0.84, 1.22     0.37    0.36, 0.40

Statistically significant values in bold (i.e. credible intervals do not include unity).

* Ratio of odds ratios. A ROR < 1 indicates that the study in the numerator showed more benefit than studies in the denominator.

† Credible Interval.

♦ ROR for year can be interpreted as the change in the overall study effect (i.e. odds ratio) for a one unit increase in year. For example, a ROR of 0.95 indicates that publication of an article one year later would be associated with a decrease in the odds ratio by 5%.

§ ROR for baseline event rates are expressed for a one logit increase in baseline event rate.

Table 5.14 Bayesian meta-regression results comparing effect estimates for length of stay from different study designs, adjusted for year of publication and baseline event rate.


                           Unadjusted Analysis    Multivariable Analysis
                           Design                 Design                 Year                   Baseline Event Rate
Comparison                 DMD*    95% CrI†       DMD*    95% CrI†       DMD♦    95% CrI†       DMD§    95% CrI†
NRS/RCTs                   -1.07   -1.98, -0.02   -0.76   -1.38, -0.11   -0.01   -0.06, 0.04    -0.39   -0.45, -0.33
NRS/Typical RCTs           -0.85   -1.94, 0.23    -0.43   -1.14, 0.31    -0.01   -0.07, 0.04    -0.39   -0.45, -0.33
NRS/Strong RCTs            -1.74   -3.25, -0.26   -1.52   -2.71, -0.32   -0.02   -0.07, 0.04    -0.39   -0.46, -0.33
Typical RCTs/Strong RCTs   -1.32   -2.43, -0.21   -1.07   -1.83, -0.33   0.03    -0.17, 0.23    -0.31   -0.59, -0.02

Statistically significant values in bold (i.e. credible intervals do not include zero).

* DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in the comparison column as study design 1:study design 2.

† Credible Interval.

♦ DMD for year can be interpreted as the change in the overall study effect (i.e. mean difference) for a one unit increase in year. For example, a DMD of -0.50 indicates that if an article was published one year later, the mean difference becomes more negative by 0.50 days.

§ Baseline Event Rate = mean in the control (open) group. DMD for baseline event rate can be interpreted as the change in the overall study effect (i.e. mean difference) for a one logit increase in baseline mean.

Table 5.15 Bayesian meta-regression results comparing effect estimates for number of lymph nodes harvested from different study designs, adjusted for year of publication and baseline event rate.


                           Unadjusted Analysis    Multivariable Analysis
                           Design                 Design                 Year                   Baseline Event Rate
Comparison                 DMD*    95% CrI†       DMD*    95% CrI†       DMD♦    95% CrI†       DMD§    95% CrI†
NRS/RCTs                   0.42    -0.74, 1.58    0.43    -0.78, 1.70    -0.02   -0.13, 0.09    -0.05   -0.16, 0.04
NRS/Typical RCTs           0.32    -1.07, 1.70    0.36    -0.94, 1.72    -0.02   -0.13, 0.10    -0.06   -0.17, 0.04
NRS/Strong RCTs            0.44    -1.45, 2.46    0.37    -1.68, 2.34    -0.04   -0.16, 0.08    -0.03   -0.14, 0.08
Typical RCTs/Strong RCTs   0.23    -1.34, 1.71    0.92    -0.30, 2.81    0.13    -0.11, 0.25    0.06    -0.19, 0.08

Statistically significant values in bold (i.e. credible intervals do not include zero).

* DMD = Mean difference(study design 1) - Mean difference(study design 2). Studies are ordered in the comparison column as study design 1:study design 2.

† Credible Interval.

♦ DMD for year can be interpreted as the change in the overall study effect (i.e. mean difference) for a one unit increase in year. For example, a DMD of -0.50 indicates that if an article was published one year later, the mean difference becomes more negative by 0.50 days.

§ Baseline Event Rate = mean in the control (open) group. DMD for baseline event rate can be interpreted as the change in the overall study effect (i.e. mean difference) for a one logit increase in baseline mean.

estimates did not statistically differ across different study designs. Baseline event rate also did not

appear to influence estimates of mean differences.

5.4.3.3.1 Publication bias

Due to significant heterogeneity across all four outcomes, formal tests for publication bias

were not undertaken (Ioannidis and Trikalinos 2007). Funnel plots were instead examined

visually (Figure 5.13). The funnel plot for post-operative complications appeared to have

some minor asymmetry; there were approximately 5 small NRS near the bottom of the funnel

favoring laparoscopy that were not balanced by similarly-sized studies favoring open

surgery. Minor asymmetry was also noted with the funnel plot for length of stay.
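As a rough illustration of what the visual inspection looks for, the sketch below computes the pseudo 95% limits that form the sides of a funnel plot and counts how the less precise (smaller) studies fall on either side of the pooled estimate. This is only an informal aid under stated assumptions: the effect sizes and standard errors are hypothetical, the median-based cutoff for a "small" study is an arbitrary choice, and it is not a formal test for publication bias, which was not undertaken here.

```python
from statistics import median

def funnel_limits(pooled, se, z=1.96):
    """Pseudo 95% limits of the funnel at a given standard error."""
    return pooled - z * se, pooled + z * se

def small_study_balance(effects, ses, pooled):
    """Count less precise studies (SE above the median) on each side of the pooled effect."""
    cutoff = median(ses)
    small = [(y, s) for y, s in zip(effects, ses) if s > cutoff]
    left = sum(1 for y, _ in small if y < pooled)    # favouring laparoscopy when effects are log ORs
    right = sum(1 for y, _ in small if y > pooled)   # favouring open surgery
    return left, right

# Hypothetical log odds ratios and their standard errors
effects = [-0.9, -0.7, -0.8, -0.3, -0.25, -0.35, -0.2, -0.4, 0.1]
ses = [0.60, 0.55, 0.50, 0.15, 0.10, 0.12, 0.20, 0.18, 0.45]
pooled = -0.3

left, right = small_study_balance(effects, ses, pooled)
lower, upper = funnel_limits(pooled, 0.5)
```

In this made-up example three small studies sit on the side favouring laparoscopy and only one on the other side, which is the kind of imbalance described above for post-operative complications.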

5.5 Discussion

This study comparing effect estimates from RCTs with those from NRS has three main

findings. First, among subjective outcomes, NRS had more extreme estimates of benefit for

laparoscopy than Strong RCTs. For the outcome post-operative complications, NRS

attributed 36% more benefit to laparoscopy than Strong RCTs (ROR 0.64, 95% CI 0.42-0.97).

Laparoscopy was also associated with a length of stay that was 2 days shorter in NRS

as compared with Strong RCTs. A similar pattern was not observed with the objective

outcomes mortality and number of lymph nodes harvested. The observed differences

between NRS and Strong RCTs persisted after adjusting for period effects and differences in

baseline event rates between studies. Second, among subjective outcomes, effect estimates

from Typical RCTs were similar to those from NRS. Like NRS, Typical RCTs were also

associated with larger estimates of benefit; combined odds ratios for post-operative

complications were 37% smaller (i.e. more benefit attributed to laparoscopy) in Typical

RCTs than Strong RCTs. Differences in length of stay between laparoscopy and

Figure 5.13 Funnel plots for A) Post-operative complications, 79 non-randomized studies (black) and 20 RCTs (white); B) Mortality, 79 non-randomized studies (black) and 17 RCTs (white); C) Length of stay, 106 non-randomized studies (black) and 22 RCTs (white); D) Number of lymph nodes, 59 non-randomized studies (black) and 17 RCTs (white).

conventional surgery were -0.70 days in Strong RCTs (favouring laparoscopy) but were -2.16

days in Typical RCTs. Third, there was significant between-study heterogeneity across all

four outcomes, and NRS were more heterogeneous than Typical or Strong RCTs.

A previous study has compared the findings of NRS with those of RCTs evaluating

laparoscopy and conventional surgery for the management of colon cancer (Abraham et al.

2010). Unlike our study, this study concluded that the effect estimates across these two study

designs are generally comparable. Our study has a number of advantages including having

identified a larger cohort of studies comparing laparoscopy with open surgery for colon

cancer (n=144 studies versus n=61 studies). Our analyses also assessed comparability across

objective and subjective outcomes; recent studies have demonstrated that bias associated

with RCT design attributes (e.g. allocation concealment) is more pronounced with subjective

outcomes (Wood et al. 2008; Savovic et al. 2012). The findings of this study support that

similar bias or exaggerated estimates of benefit were notable among subjective but not

objective outcomes. While Abraham and colleagues measured and acknowledged the

variation in methodological quality among RCTs in their study, they nonetheless aggregated

effect estimates across all RCTs. We instead chose to handle the variability in trial quality by

dividing RCTs into Strong and Typical studies. By isolating rigorously performed RCTs, we

combined effect estimates from a population of studies at the lowest risk of bias. Our results

suggest that the findings of Strong RCTs are more conservative (i.e. closer to the null) than

those of Typical RCTs. Hartling et al. have also found similar results in a study comparing

pediatric efficacy trials at low, unclear or high risk of bias (Hartling et al. 2009). In this

study, effect estimates from RCTs at low risk of bias were closest to the null.

Other studies have examined the comparability of NRS and RCTs across other interventions.

While some have found the results of NRS and RCTs to be generally comparable

(Concato, Shah, and Horwitz 2000; Benson and Hartz 2000), others have found important

differences (Britton et al. 1998; Shikata et al. 2006; Kunz, Vist, and Oxman 2007). We found

that NRS attributed 36% more benefit to laparoscopy than Strong RCTs when examining

subjective outcomes. Whereas we found that NRS overestimated benefit with laparoscopy,

others have found that surgical NRS can overestimate harm (Bhandari et al. 2004). In a study

by Bhandari et al., the results of NRS evaluating arthroplasty and internal fixation for hip

fracture were compared with the results of RCTs. They identified 13 NRS and 14 RCTs.

Mortality data were available in 13 NRS and 12 RCTs. The relative risk for mortality with

arthroplasty as compared with internal fixation in NRS was 40% larger than the estimate in

RCTs; the RR was 1.44 in NRS (95% CI 1.13,1.85) versus 1.04 in RCTs (95% CI,

0.84,1.29). It is interesting that the magnitude of bias in our study is similar to the bias

detected by Bhandari et al., but the direction of bias is not.
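The two magnitudes being compared here can be put side by side with a few lines of arithmetic. The sketch uses only the point estimates quoted above; how the 40% figure was derived is not stated, so both the ratio and the simple difference of the two relative risks are shown, and the variable names are illustrative.

```python
# Relative risks for mortality quoted from Bhandari et al.
rr_nrs, rr_rct = 1.44, 1.04

ratio_of_rr = rr_nrs / rr_rct          # NRS estimate relative to the RCT estimate
relative_excess = ratio_of_rr - 1      # proportional excess harm in NRS
absolute_difference = rr_nrs - rr_rct  # simple difference of the two relative risks

# ROR for post-operative complications in this study (NRS versus Strong RCTs)
ror_complications = 0.64
excess_benefit = 1 - ror_complications  # proportional excess benefit in NRS
```

The ratio gives an excess of about 38% and the difference is 0.40, both close to the 36% excess benefit implied by the ROR of 0.64, but pointing in the opposite direction.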

One of the strengths of our study over previous comparisons includes incorporating RCT

quality into comparisons of NRS and trials. Previous attempts to compare effect estimates

from NRS and RCTs treated the latter as a homogeneous group of high-quality studies.

Important methodological differences between individual RCTs may have been overlooked.

Moreover, many of the studies comparing NRS and RCTs included studies conducted in the

1980s and 1990s. Less than 17% of studies in this cohort were published before 2000.

Therefore, our study represents a more contemporaneous comparison of NRS and RCTs.

The results of this study are limited by a reliance on reported study methods to categorize

RCTs as Typical or Strong. For example, it is possible that even though RCTs did not report

adequate random sequence generation, appropriate methods for randomizing patients may

have been employed. Accordingly, there may have been misclassification of RCTs because

assessments were made using reported methods instead of actual study conduct. To limit the

possibility of such misclassification, study protocols were reviewed and authors were

contacted to collect additional information. Moreover, the Cochrane Risk of Bias Tool,

which was used to classify RCTs, has been criticized for being a subjective instrument

heavily influenced by judgment; Hartling et al. have demonstrated low inter-rater reliability

for the domains blinding, incomplete data and selective reporting (Hartling et al. 2012).

However, Hartling and colleagues examined an earlier version of the Cochrane Risk of Bias

Tool. In the interim, the instrument has been revised to diminish ambiguity and has more

detailed guidance for making individual domain assessments (Higgins et al. 2011). This more

recent version of the tool was used in this study. We also evaluated the subjectivity of our

assessments by examining RCTs reporting post-operative complications in duplicate. There

was perfect agreement between the two assessors. The same four RCTs were identified as

being at lowest risk of bias, or “Strong”, across all four outcomes of interest. Notably, these RCTs were

registered, multi-centered, large RCTs that were publicly funded. RCTs with these attributes

have been shown to be less susceptible to bias (Als-Nielsen et al. 2003; Nuesch et al. 2010;

Dechartres et al. 2011).

Our analyses were also limited by using baseline event rate to control for between-study

heterogeneity. Without access to patient and provider-level data, baseline event rate was used

as a measure of aggregate underlying risk. If such data had been available, our analyses

could have been adjusted for differences in age, cancer stage or physician-experience with

laparoscopy — variables that may have influenced between study differences in post-

operative complications, mortality, length of stay and number of lymph nodes harvested.

Using baseline event rate is instead a more indirect measure of these attributes but still

represents an attempt to adjust comparisons of NRS and RCTs for between-study clinical

heterogeneity.

Our analysis focused on a single intervention in surgery. It remains to be seen if our results

are generalizable to other surgical interventions or interventions in other areas of medicine.

Additional studies will be necessary to determine if NRS routinely overestimate the benefit

associated with a novel intervention. Moreover, while we demonstrated a pattern of

exaggerated benefit among NRS for the subjective outcomes post-operative complications

and length of stay, this finding may not apply to other subjective outcomes. We had hoped to

analyze pain as an additional outcome; however, it was too inconsistently and infrequently

reported to do so.

The results of this study raise some important questions for the meta-analysis community;

how should effect estimates from RCTs of varying quality be combined? Incorporating

quality scores into meta-regression analyses has been previously discouraged (Juni et al.

1999; Greenland and O'Rourke 2001). Currently, subgroup analyses are used to explore

heterogeneity in a meta-analysis. However, performing a separate analysis of Strong RCTs,

even in the absence of heterogeneity, may reveal important differences among trials that can

nuance the interpretation of random-effects meta-analysis.

The results of this study may fuel the ongoing debate over the utility of NRS for decision

making in health care. While meta-analyses of RCTs will continue to be considered the most

reliable source of evidence for evaluating interventions, NRS may provide important insights

when evaluating objective outcomes. It is important to note though that while this study did

not demonstrate a difference in effect estimates between NRS and Strong RCTs for objective

outcomes, an absence of a difference is not evidence of “no difference.” If our findings are

replicated in other disease areas and with other interventions, perhaps the meta-analyses of

NRS for objective outcomes could be placed higher in most evidence hierarchies.

There is increasing interest in evaluating risk of bias in NRS. Just as Strong RCTs yield

results that differ from other RCTs, perhaps the same is true of Strong NRS. A valid and

reliable tool, however, is required to identify these NRS. Accordingly, empirical evidence is

necessary to determine which aspects of NRS study design are associated with bias.

Numerous meta-epidemiological studies of RCTs helped to establish which aspects of RCT

design are important when assessing risk of bias — we believe that similar studies are

required for NRS. The choice of referent group for meta-epidemiological studies of NRS,

however, is less clear. Should NRS without a characteristic (e.g. matched controls) be

compared with NRS where controls were matched? Or should the effect estimates from this

former group be compared with RCTs? Doing so would overlook the differences in quality

among RCTs. Instead, the results of this study would suggest that aggregate effect estimates

from Strong RCTs should serve as the referent group for future meta-epidemiological studies

of NRS. Care should also be taken to make a distinction between subjective and objective

outcomes when performing these proposed studies.

5.6 Conclusion

When evaluating subjective outcomes, effect estimates from NRS were associated with

larger estimates of benefit for laparoscopy than Strong RCTs. Typical RCTs (i.e. at unclear

and high risk of bias) also had more extreme estimates of benefit for laparoscopy as

compared with Strong RCTs. Similar trends were not observed among objective outcomes

(mortality and number of lymph nodes harvested).

Chapter 6 Empirically identifying the study attributes of non-randomized studies associated with bias: a meta-epidemiology study

6.1 Summary

Objective

Numerous studies suggest that aspects of RCT study design are associated with biased

intervention effect estimates. Comparable empirical evidence is lacking for NRS. The

objective of this study was to explore the relationship between NRS-design attributes and

estimates of treatment effect.

Methods

A systematic review identified all comparative studies evaluating laparoscopy and

conventional surgery for the management of colon cancer. NRS reporting four outcomes of

interest (post-operative complications, peri-operative mortality, length of stay and number of

lymph nodes harvested) were selected. Nine NRS study characteristics were abstracted as

binary variables: (i) whether the outcome of interest was the primary outcome, (ii) presence

of a sample size calculation, (iii) prospective data collection, (iv) concurrent (versus

historical) controls, (v) matched controls, (vi) standardized concurrent therapy (i.e. post-

operative care), (vii) systematic outcome assessment, (viii) blinded outcome assessment and

(ix) intention to treat analysis. Random-effects meta-analyses were conducted to pool

summary effect estimates across NRS with and without study characteristics. Mixed-effects

meta-regression models were used to compare effect estimates across subgroups. The effect

estimates from NRS without study characteristics were compared with effect estimates from

NRS with study characteristics. Effect estimates from NRS with and without study

characteristics were each compared with the results of RCTs at low risk of bias.
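One way to picture the abstraction described above is as a record of nine yes/no flags per study, from which "with" and "without" subgroups are formed for each characteristic in turn. The sketch below is a hypothetical data structure for illustration only: the class, field names and study identifiers are assumptions and do not describe the data collection form actually used.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class NRSCharacteristics:
    """The nine binary study characteristics abstracted for each NRS."""
    outcome_is_primary: bool
    sample_size_calculation: bool
    prospective_data_collection: bool
    concurrent_controls: bool
    matched_controls: bool
    standardized_concurrent_therapy: bool
    systematic_outcome_assessment: bool
    blinded_outcome_assessment: bool
    intention_to_treat_analysis: bool

def split_by_characteristic(studies, name):
    """Return study ids with and without the named characteristic."""
    with_attr = [sid for sid, ch in studies.items() if getattr(ch, name)]
    without_attr = [sid for sid, ch in studies.items() if not getattr(ch, name)]
    return with_attr, without_attr

# Hypothetical studies, not taken from the review
studies = {
    "study_A": NRSCharacteristics(False, False, True, True, False, False, False, False, True),
    "study_B": NRSCharacteristics(False, False, False, True, True, False, False, False, True),
    "study_C": NRSCharacteristics(True, True, True, False, False, True, True, False, False),
}

characteristic_names = [f.name for f in fields(NRSCharacteristics)]
prospective, retrospective = split_by_characteristic(studies, "prospective_data_collection")
```

Each pair of subgroups would then be pooled separately and compared, both with each other and with the Strong RCTs.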

Results

A total of 121 NRS reported the outcomes of interest. Most NRS had retrospective data

collection, concurrent controls and intention to treat analysis but lacked sample size calculations,

matched controls, standardized concurrent therapy (i.e. standardized post-operative care),

blinded outcome assessors or systematic outcome assessment. Effect estimates generally did

not differ across NRS with or without study characteristics except for the outcome peri-

operative mortality; NRS with retrospective data collection had more extreme estimates of

benefit for laparoscopy than NRS with prospective data collection (ROR 0.62, 95% CI 0.44,

0.87, p-value=0.01). In addition, effect estimates were closer to the null (i.e. less in favour of

laparoscopy) in NRS where the primary outcome was peri-operative mortality as opposed to

NRS in which post-operative death was a secondary outcome (ROR 1.68, 95% CI 1.11,

2.52). However, when effect estimates from NRS subgroups were compared with the results

of Strong RCTs, none proved to be statistically significant.

Conclusions

Effect estimates did not consistently vary according to the presence or absence of NRS

design characteristics among studies comparing laparoscopy and open surgery for the

treatment of colon cancer. Additional studies are necessary to identify the attributes of NRS-

design associated with bias.

6.2 Introduction

Non-randomized studies are regarded as an important source of information for the efficacy

of interventions, especially in instances where RCTs are not possible or rarely undertaken

(Reeves et al. 2013a). NRS are the main source of evidence for organizational, public health

(Higgins et al. 2013) and surgical interventions (Wente et al. 2003). Moreover, NRS often

provide the sole information about the long-term outcomes of interventions, rare events or

adverse events (Loke et al. 2007). The lack of randomization though renders most NRS at a

heightened risk of selection bias. We have previously shown that for subjective outcomes,

NRS are associated with more extreme estimates of benefit as compared with Strong RCTs

(Chapter 5). It is possible, however, that there may be a subgroup of rigorous NRS that yield

results comparable to these high quality RCTs.

Previous meta-epidemiological studies have established that certain aspects of RCT design

are associated with biased effect estimates. For example, RCTs lacking appropriate random

sequence generation (Wood et al. 2008; Savovic et al. 2012), allocation concealment (Schulz

et al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001; Pildal et al. 2007;

Wood et al. 2008), blinding (Schulz et al. 1995; Kjaergard, Villumsen, and Gluud 2001;

Pildal et al. 2007; Wood et al. 2008; Hrobjartsson et al. 2012; Savovic et al. 2012;

Hrobjartsson et al. 2013) or those with exclusions after randomization (Tierney and Stewart

2005; Nuesch, Trelle, et al. 2009) have been associated with bias. Similar empirical evidence

is lacking for NRS. Identifying the attributes of NRS that are associated with bias could help

those reviewing and meta-analyzing NRS to isolate a subgroup of rigorous studies. Such

advancements are necessary to understand how to best use the evidence from NRS,

especially in domains such as surgery where NRS far outnumber RCTs.

The objective of this study was to explore the relationship between NRS-design attributes

and estimates of treatment effect. The literature comparing laparoscopy with open surgery

for colon cancer was used for this case study of bias.


6.3 Methods


All NRS and RCTs comparing laparoscopy with conventional (i.e. open) surgery for the

treatment of colon cancer were identified using the search strategy outlined in Chapter 4

(Section 4.2). Only those studies reporting the outcomes of interest (post-operative

complications, peri-operative mortality, length of stay and number of lymph nodes

harvested) were used in the analyses that follow. Post-operative complications and length of

stay were considered subjective outcomes whereas mortality and number of lymph nodes

harvested were classified as objective outcomes (Chapter 4, Section 4.5.1). It has been

demonstrated that bias associated with RCT design attributes (e.g. allocation concealment) is

more pronounced with subjective outcomes (Wood et al. 2008; Savovic et al. 2012). We

therefore chose to analyze both subjective and objective outcomes in this study. Strong RCTs

were identified according to the approach outlined in Chapter 4 (Section 4.8).

6.3.2 NRS study characteristics

For each NRS, the following nine study characteristics were abstracted as binary variables:

(i) whether the outcome of interest was the primary outcome, (ii) presence of a sample size

calculation, (iii) prospective data collection, (iv) concurrent (versus historical) controls,

(v) matched controls, (vi) standardized concurrent therapy, (vii) systematic outcome

assessment, (viii) blinded outcome assessment and (ix) intention to treat analysis (Table 6.1).

Characteristics were primarily chosen from the conceptual framework described in Chapter 3

via informal consensus among investigators. Of note, characteristics i and ii did not stem

from the conceptual framework but were deemed important to analyze. We included

“outcome of interest as primary outcome” since it was hypothesized that NRS designed to

detect a difference in a particular outcome may yield different results from studies where the

Table 6.1 NRS study characteristics – definitions and relationship to the conceptual framework for bias in NRS.

| Characteristic | Framework domain & item | Definition: present | Definition: absent |
|---|---|---|---|
| Outcome of interest as primary outcome | N/A | Outcome of interest (i.e. post-operative complications, peri-operative mortality, LOS or number of LN harvested) was identified by study investigators as a primary outcome of the study. | Outcome of interest not specified as a primary outcome or no primary outcomes specified. |
| Sample size calculation | N/A | Investigators state that a sample size calculation (to detect a specified minimum clinically important difference) or power calculation was performed. | No such specification provided. |
| Prospective data collection | Information Bias - Source of data | Data collection was initiated before the occurrence of outcomes among any members of the cohort under study. | Data collection was initiated after the occurrence of outcomes among those under study. |
| Concurrent controls | Selection Bias - Comparability of groups at baseline | Controls (i.e. patients in the conventional/open surgery group) treated during the same time period as patients in the intervention (i.e. laparoscopy) group. | Patients in the open surgery group treated in a time period that pre-dates the treatment of laparoscopy patients. |
| Matched controls | Selection Bias - Comparability of groups at baseline | Investigators state patients in the laparoscopy group were “matched” to those in the control group. | No indication of matching patients in the intervention (i.e. laparoscopy) group to those in the control (i.e. open) surgery group. |
| Standardized concurrent therapy | Performance Bias - Concurrent treatment/co-interventions | Investigators specify that post-operative care was standardized or mention a specific “enhanced recovery pathway” protocol. | Post-operative care was not standardized and surgeons each treated patients according to “standard principles of post-operative care,” or no specific mention is made of post-operative care. |
| Systematic outcome assessment | Detection Bias - Systematic determination of outcome | Investigators specify that outcomes were assessed according to a standardized protocol and/or by trained abstractors. | No mention of a standardized protocol and/or trained abstractors for assessing outcomes. |
| Blinded outcome assessment | Detection Bias - Blinded outcome assessment | Outcomes assessed by an individual blinded to treatment allocation. | Outcomes assessed by study personnel aware of treatment allocation or no specification of outcome assessor status. |
| Intention to treat analysis | Attrition Bias - Intention to treat analysis | Converted patients, those whose surgeries were initiated as laparoscopic procedures but were completed as open surgery, analyzed as part of the laparoscopy (i.e. intervention) group. | Converted patients analyzed as part of the open (i.e. control) group. |


outcome of interest was a secondary outcome. It was also postulated that the description of a

sample size calculation or post-hoc power calculation may have been a marker for higher

methodological rigor. The remaining characteristics emerge from five of six domains in the

conceptual framework; as there are no registered protocols for NRS, aspects of selective

reporting bias could not be assessed.

6.3.2.1 Validation of study characteristic abstraction

A second reviewer abstracted NRS study characteristics from a random subset of NRS

reporting post-operative complications (38 of 79 studies). Crude agreement was 94.7% or higher

for all study characteristics (Table 6.2). Cohen’s kappa coefficients ranged from 0.77 to

1.00, indicating “substantial” to “almost perfect” inter-rater agreement (Landis and Koch 1977).

Table 6.2 Measures of inter-rater agreement.

| Study characteristic | Crude agreement | Cohen’s kappa |
|---|---|---|
| Outcome of interest as primary outcome | 36/38 (94.7%) | 0.89 |
| Sample size calculation | 38/38 (100%) | 1.00 |
| Prospective data collection | 37/38 (97.4%) | 0.95 |
| Concurrent controls | 36/38 (94.7%) | 0.87 |
| Matched controls | 37/38 (97.4%) | 0.91 |
| Standardized concurrent therapy | 38/38 (100%) | 1.00 |
| Systematic outcome assessment | 36/38 (94.7%) | 0.77 |
| Blinded outcome assessment | 38/38 (100%) | 1.00 |
| Intention to treat analysis | 37/38 (97.4%) | 0.84 |
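As a minimal sketch of how the two agreement measures reported in Table 6.2 can be computed for a single binary characteristic, the Python below derives crude agreement and Cohen's kappa from two reviewers' ratings. The function names and the example ratings are hypothetical; they are not the thesis data, although the example is built so that the reviewers agree on 36 of 38 studies.

```python
def crude_agreement(rater_1, rater_2):
    """Proportion of studies on which both reviewers gave the same rating."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters and a binary (0/1) characteristic."""
    n = len(rater_1)
    observed = crude_agreement(rater_1, rater_2)
    # Chance agreement: both say "present" or both say "absent" by chance.
    p1 = sum(rater_1) / n
    p2 = sum(rater_2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1.0:
        return 1.0  # both raters used a single category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 38 studies (1 = characteristic present).
rater_1 = [1] * 10 + [0] * 28
rater_2 = [1] * 9 + [0] + [1] + [0] * 27

print(f"crude agreement = {crude_agreement(rater_1, rater_2):.3f}")
print(f"Cohen's kappa   = {cohens_kappa(rater_1, rater_2):.3f}")
```

Kappa is lower than crude agreement because it discounts the agreement expected by chance, which is large when one category (here, "absent") dominates.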

6.3.3 Statistical analyses

Descriptive statistics were calculated to compare NRS and Strong RCTs in terms of year of




authors, and length of articles. Medians were compared using the Mann-Whitney U test and

categorical variables using the Pearson’s Chi-square or Fisher’s exact test, as appropriate

(Pagano and Gauvreau 2000). A p-value <0.05 was considered significant.

Absolute and relative frequencies were calculated for each of the NRS study characteristics.

Random-effects meta-analyses were conducted to pool summary effect estimates across NRS

with and without study characteristics (Altman, Egger, and Smith 2001). Inverse variance

weighting was used to combine studies (Sutton et al. 1998). Between-study variance (tau

squared, τ²) was estimated using the method of DerSimonian and Laird

(DerSimonian and Laird 1986). I² values were calculated to describe the degree of

between-study heterogeneity, with values of 0-40% considered low, 30-60% moderate,

50-90% substantial and 75-100% considerable (Higgins, Green, and Cochrane Collaboration

2011).
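The pooling steps described above (inverse variance weights, the DerSimonian and Laird estimate of τ², and I²) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used for the thesis, which was run in R: the function name `dersimonian_laird`, the dictionary keys and the four example studies are all hypothetical.

```python
import math


def dersimonian_laird(effects, variances, z=1.959964):
    """Random-effects pooling of study effects (e.g. log odds ratios)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    k = len(effects)
    w = [1.0 / v for v in variances]          # fixed-effect (inverse variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)             # method-of-moments between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2,
            "ci": (pooled - z * se, pooled + z * se)}


# Hypothetical odds ratios and log-scale variances for four studies.
log_ors = [math.log(x) for x in (0.55, 0.70, 0.62, 0.90)]
log_variances = [0.04, 0.09, 0.02, 0.12]

result = dersimonian_laird(log_ors, log_variances)
low, high = (math.exp(b) for b in result["ci"])
print(f"pooled OR = {math.exp(result['pooled']):.2f} (95% CI {low:.2f}, {high:.2f})")
print(f"tau^2 = {result['tau2']:.3f}, I^2 = {result['i2']:.1f}%")
```

Binary outcomes are pooled on the log odds ratio scale and back-transformed for reporting; mean differences can be passed to the same function without transformation.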

Mixed-effects meta-regression models were generated to compare pooled effect estimates

across subgroups (Thompson and Higgins 2002). Between-study variance (tau squared, τ²)

was estimated using the restricted maximum likelihood estimator (Viechtbauer 2010). For

binary outcomes (post-operative complications and peri-operative mortality), meta-

regression modeling yielded ratios of odds ratios for predictor variables:

\[
\mathrm{ROR} = \frac{\text{pooled odds ratio}_{\text{group A}}}{\text{pooled odds ratio}_{\text{group B}}} \tag{6.4}
\]

As an example, a ROR<1.0 would suggest that the pooled odds ratio in group A is smaller

than the pooled odds ratio for group B. If group A represents studies with retrospective data

collection and group B studies with prospective data collection, a ROR of 0.80 indicates

that odds ratios were 20% smaller for group A, on average, than in group B. A ROR of 1.20

would suggest the opposite. For continuous outcomes (length of stay and number of lymph

nodes harvested), meta-regression modeling produced differences in mean differences

(DMDs) for predictor variables:

\[
\mathrm{DMD} = \text{combined mean difference}_{\text{group A}} - \text{combined mean difference}_{\text{group B}} \tag{6.5}
\]

For example, consider the outcome length of stay. In each individual study, a mean

difference (MD) was calculated for the length of stay, with negative values indicating that

laparoscopy was associated with a shorter length of stay than open surgery.

\[
\mathrm{MD} = \text{mean}_{\text{laparoscopy}} - \text{mean}_{\text{open surgery}} \tag{6.3}
\]

A DMD of -1.50 would suggest that the MD was 1.5 days more in favor of laparoscopy in

group A than in group B.
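One way to see why a meta-regression coefficient can be read as a ratio of odds ratios or a difference in mean differences is to write the model out. The derivation below is a sketch that assumes a univariable mixed-effects model with a single binary indicator; the symbols \(x_i\), \(\beta_0\), \(\beta_1\), \(\gamma_0\), \(\gamma_1\), \(u_i\), \(\varepsilon_i\) and \(v_i\) are illustrative notation and do not appear in the original text.

```latex
% Assumed model: study i has within-study variance v_i and belongs to group A (x_i = 1) or group B (x_i = 0).
\begin{align*}
\log \mathrm{OR}_i &= \beta_0 + \beta_1 x_i + u_i + \varepsilon_i,
  \qquad u_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i) \\
\mathrm{ROR} &= \frac{\exp(\beta_0 + \beta_1)}{\exp(\beta_0)} = \exp(\beta_1),
  \qquad \text{e.g. } \beta_1 = \ln 0.80 \approx -0.223 \;\Rightarrow\; \mathrm{ROR} = 0.80 \\
\mathrm{MD}_i &= \gamma_0 + \gamma_1 x_i + u_i + \varepsilon_i
  \quad\Rightarrow\quad \mathrm{DMD} = (\gamma_0 + \gamma_1) - \gamma_0 = \gamma_1
\end{align*}
```

Under this reading, a test of \(\beta_1 = 0\) (or \(\gamma_1 = 0\)) is a test of whether pooled estimates differ between the two groups of studies.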

Mixed-effects meta-regression modeling was first used to compare effect estimates from

NRS with and without study characteristics. These models generated RORs for binary

outcomes and DMDs for continuous outcomes:

\[
\mathrm{ROR} = \frac{\text{combined effect estimate}_{\text{NRS without study characteristic}}}{\text{combined effect estimate}_{\text{NRS with study characteristic}}} \tag{6.1}
\]

\[
\mathrm{DMD} = \text{combined mean difference}_{\text{NRS without study characteristic}} - \text{combined mean difference}_{\text{NRS with study characteristic}} \tag{6.2}
\]

Subsequently, effect estimates from NRS with or without study characteristics were each

compared with the results of Strong RCTs.

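i) NRS with the study characteristic compared with Strong RCTs:

\[
\mathrm{ROR} = \frac{\text{combined effect estimate}_{\text{NRS with study characteristic}}}{\text{combined effect estimate}_{\text{Strong RCTs}}} \tag{6.3}
\]

\[
\mathrm{DMD} = \text{combined mean difference}_{\text{NRS with study characteristic}} - \text{combined mean difference}_{\text{Strong RCTs}} \tag{6.4}
\]

ii) NRS without the study characteristic compared with Strong RCTs:

\[
\mathrm{ROR} = \frac{\text{combined effect estimate}_{\text{NRS without study characteristic}}}{\text{combined effect estimate}_{\text{Strong RCTs}}} \tag{6.3}
\]

\[
\mathrm{DMD} = \text{combined mean difference}_{\text{NRS without study characteristic}} - \text{combined mean difference}_{\text{Strong RCTs}} \tag{6.4}
\]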

Effect estimates from Strong RCTs were considered the “gold-standard” summary estimate

when comparing laparoscopy with open colon surgery. All analyses were performed using R,

version 2.15.0 (R Foundation for Statistical Computing, Vienna, Austria). A p-value <0.05

was considered significant.
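For intuition about what a ratio of odds ratios against Strong RCTs represents, the sketch below computes one directly from two pooled odds ratios and their 95% confidence intervals, treating the two pooled estimates as independent. This is a deliberately simplified substitute for the mixed-effects meta-regression with restricted maximum likelihood described above, so it will not reproduce the confidence intervals or p-values reported in this chapter. The function names are hypothetical; the example inputs are the pooled odds ratios for post-operative complications from Chapter 5.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ratio_of_odds_ratios(or_a, ci_a, or_b, ci_b):
    """ROR = OR_a / OR_b with a Wald 95% CI and two-sided p-value."""
    log_ror = math.log(or_a) - math.log(or_b)
    # Independence assumption: variances of the two log odds ratios add.
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    lower = math.exp(log_ror - Z_95 * se)
    upper = math.exp(log_ror + Z_95 * se)
    p_value = math.erfc(abs(log_ror / se) / math.sqrt(2))
    return math.exp(log_ror), lower, upper, p_value


# Pooled ORs for post-operative complications: NRS versus Strong RCTs.
ror, lower, upper, p_value = ratio_of_odds_ratios(
    0.63, (0.57, 0.70), 0.96, (0.80, 1.15)
)
print(f"ROR = {ror:.2f} (95% CI {lower:.2f}, {upper:.2f}), p = {p_value:.4f}")
```

A ROR below 1.0 here means the NRS estimate favors laparoscopy more strongly than the Strong RCT estimate. The interval from this shortcut is narrower than one from a meta-regression because it ignores the residual between-study variance estimated across all studies.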

6.4 Results


A total of 121 NRS reported the outcomes of interest (Table 6.3). Four trials were

categorized as Strong RCTs (Nelson et al. 2004; Guillou et al. 2005; Hewett et al. 2008).

Most NRS and all Strong RCTs were conducted in an academic setting. NRS articles were

shorter and authored by fewer investigators. On average, Strong RCTs enrolled more patients

than NRS. All Strong RCTs had at least one author with statistical expertise. Consortiums

were rarely involved in the execution of NRS.


| | NRS (N=121) | Strong RCTs (N=4) | p-value• |
|---|---|---|---|
| | | 2005 (2005-2006) | 0.98† |
| Participants | 129 (67-265) | 828.5 (743.5-917.8) | **<0.001**† |
| Authors: Number* | 5 (4-7) | 8.5 (5.3-10.5) | 0.22† |
| | | 4 (100.0) | **<0.001**♦ |
| | | 4 (100.0) | **<0.001**♦ |
| | | 4 (100.0) | 0.58♦ |
| | | 8.8 (8.8-10.25) | **<0.001**† |

* Median (interquartile range, IQR). § Number (percentage). † Medians compared using the Mann-Whitney U test. ♦ Frequencies compared using Fisher’s exact test. • Statistically significant p values (<0.05) indicated in bold.


6.4.2 Subjective outcomes


Seventy-nine NRS reported the frequency of post-operative complications. These

comparative studies involved a total of 1,086,216 participants (n=281,385 undergoing

laparoscopy and n=804,831 open surgery). There was no difference in the number of patients

assigned to either laparoscopy or open surgery (median=74.0, IQR=33.5-150.0 versus

median=75.0, IQR=34.0-154.5, p-value=0.66).

Table 6.4 summarizes the frequency of study characteristics across these studies. Post-

operative complications was the primary outcome of 15.2% (n=12) of studies. Retrospective

data collection was more common than prospective data collection. Very few studies used

historical controls (n=7, 8.9%) and nearly a third employed matched controls to overcome

selection bias (n=24, 30.4%). Post-operative care was rarely standardized in these studies

(n=4, 5.1%). None of the NRS utilized blinded outcome assessment and a small minority

standardized the assessment of outcomes (n=10, 12.7%).

Table 6.4 Distribution of study attributes among NRS reporting post-operative complications (n=79).

Attribute Present N (%)

Absent N (%)

Primary outcome 12 (15.2) 67 (84.8) Sample size calculation performed 3 (3.8) 76 (96.2) Prospective data collection 34 (43.0) 45 (56.9) Concurrent controls 72 (91.1) 7 (8.9) Matched controls 24 (30.4) 55 (69.6) Standardized concurrent therapy 4 (5.1) 75 (94.9) Systematic outcome assessment 10 (12.7) 69 (87.3) Blinded outcome assessment 0 (0.0) 79 (100.0) Intention to treat analysis 68 (86.1) 11 (13.9)

There were 2⁹, or 512, possible combinations of study characteristics across NRS since we

examined nine binary study characteristics. However, half of the NRS adhered to one

of three patterns (Table 6.5). A total of 22.8% of studies had retrospective data collection,

concurrent controls and an intention to treat analysis but lacked sample size calculations,

matched controls, standardized concurrent therapy (i.e. standardized post-operative care),

blinded outcome assessment or systematic outcome assessment. The frequency of post-

operative complications was not the primary outcome of these studies. The second most

common pattern (n=14 studies, 17.8%) differed from the first in that data collection was

prospective. Pattern 3 (n=8 studies, 10.1%) combined retrospective data collection with matched

controls but was otherwise identical to Pattern 1.
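Tabulating patterns of this kind amounts to treating each study as a tuple of nine present/absent flags and counting identical tuples. The sketch below illustrates the idea; the field names, the `make_study` helper and the six example studies are hypothetical and do not reproduce the abstraction sheet or data used in the thesis.

```python
from collections import Counter

# Nine binary characteristics, in the column order used for the pattern tables.
CHARACTERISTICS = (
    "primary_outcome",
    "sample_size_calculation",
    "prospective_data_collection",
    "matched_controls",
    "concurrent_controls",
    "standardized_concurrent_therapy",
    "systematic_outcome_assessment",
    "blinded_outcome_assessment",
    "intention_to_treat",
)


def make_study(*present):
    """Build one study record; any characteristic not named is absent."""
    unknown = set(present) - set(CHARACTERISTICS)
    if unknown:
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    return {name: name in present for name in CHARACTERISTICS}


def pattern_frequencies(studies):
    """Return (pattern, count, percent) tuples, most frequent first."""
    counts = Counter(tuple(s[name] for name in CHARACTERISTICS) for s in studies)
    total = len(studies)
    return [(p, c, 100.0 * c / total) for p, c in counts.most_common()]


def as_signs(pattern):
    """Render a pattern as '+'/'-' marks, one per characteristic."""
    return " ".join("+" if present else "-" for present in pattern)


# Hypothetical studies: three share one pattern, two another, one a third.
studies = (
    [make_study("concurrent_controls", "intention_to_treat")] * 3
    + [make_study("prospective_data_collection", "concurrent_controls",
                  "intention_to_treat")] * 2
    + [make_study("matched_controls", "concurrent_controls", "intention_to_treat")]
)

print(f"possible patterns: {2 ** len(CHARACTERISTICS)}")
for pattern, count, percent in pattern_frequencies(studies):
    print(f"{as_signs(pattern)}  {count}/{len(studies)} ({percent:.1f}%)")
```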

The results of subgroup random-effects meta-analyses are outlined in Table 6.6 and

Figure 6.1. Laparoscopy was associated with fewer post-operative complications than open

surgery for all subgroup analyses, except in instances where a sample size calculation had

been performed (n=3 studies, OR 0.80, 95% CI 0.31, 2.04), historical controls were employed

(n=7 studies, OR 0.77, 95% CI 0.48, 1.25) and outcomes had been assessed according to a

standardized protocol (n=10 studies, OR 0.75, 95% CI 0.57, 0.70). Only two subgroups had I²

values below 40% (i.e. NRS where a sample size calculation was performed and NRS

without concurrent controls).

Mixed-effects meta-regression models were used to compare effect estimates across

subgroups and the results are summarized in Table 6.7. The pooled effect estimates for NRS

without a characteristic were each compared with the pooled effect estimate for NRS with

the study characteristic. The ratios of odds ratios ranged from 0.82 to 1.29 for these

comparisons and none were statistically significant.

Table 6.8 presents selected results from Chapter 5; combined effect estimates are separately

outlined for NRS and Strong RCTs. These estimates were previously compared with one

another; for subjective outcomes (post-operative complications and length of stay),

NRS attributed more benefit to laparoscopy than Strong RCTs. Summary effect estimates

from NRS with and without study characteristics were compared with the results of Strong

RCTs (Table 6.9). An inconsistent pattern emerged where the absence of a NRS study

characteristic was occasionally associated with more extreme benefits for laparoscopy;


Table 6.5 Study characteristics patterns across NRS reporting post-operative complications (n=79 studies).

| Pattern* | N (%) | Primary outcome | Sample size calculation | Prospective data collection | Matched controls | Concurrent controls | Standardized concurrent therapy | Systematic outcome assessment | Blinded outcome assessment | Intention to treat analysis |
|---|---|---|---|---|---|---|---|---|---|---|
| Pattern 1 | 18/79 (22.8%) | - | - | - | - | + | - | - | - | + |
| Pattern 2 | 14/79 (17.8%) | - | - | + | - | + | - | - | - | + |
| Pattern 3 | 8/79 (10.1%) | - | - | - | + | + | - | - | - | + |
| Pattern 4 | 5/79 (6.3%) | - | - | + | + | + | - | - | - | + |
| Pattern 5 | 4/79 (5.1%) | - | - | + | - | + | - | - | - | - |
| Pattern 6 | 2/79 (2.5%) | - | + | + | + | + | - | - | - | + |
| Pattern 7 | 2/79 (2.5%) | + | - | + | - | + | - | + | - | + |
| Pattern 8 | 2/79 (2.5%) | - | - | + | - | + | - | + | - | + |
| Pattern 9 | 2/79 (2.5%) | + | - | - | + | + | - | - | - | + |
| Pattern 10 | 2/79 (2.5%) | + | - | - | - | + | - | + | - | + |

* Patterns are listed in order of decreasing frequency. The ten most frequent patterns are described, and represent 74.6% of NRS reporting post-operative complications.


Table 6.6 Random-effects meta-analyses results among NRS reporting post-operative complications (n=79).

| Attribute | Present: N | Present: OR* (95% CI) | Present: I²♦ (95% CI) | Absent: N | Absent: OR* (95% CI) | Absent: I²♦ (95% CI) |
|---|---|---|---|---|---|---|
| Primary outcome specified | 12 | 0.65 (0.56, 0.75) | 84.7 (74.9-90.7) | 67 | 0.61 (0.52, 0.71) | 78.6 (73.2-82.9) |
| Sample size calculation performed | 3 | 0.80 (0.31, 2.04) | 0.0 (0.0-87.2) | 76 | 0.63 (0.57, 0.70) | 80.6 (76.2-84.2) |
| Prospective data collection | 34 | 0.65 (0.55, 0.77) | 53.4 (31.2-68.5) | 45 | 0.62 (0.55, 0.70) | 84.4 (79.9-87.9) |
| Concurrent controls | 72 | 0.63 (0.57, 0.69) | 81.4 (77.1-84.9) | 7 | 0.77 (0.48, 1.25) | 6.3 (0.0-72.6) |
| Matched controls | 24 | 0.56 (0.43, 0.75) | 52.0 (23.4-69.9) | 55 | 0.65 (0.58, 0.72) | 83.8 (79.6-87.1) |
| Standardized concurrent therapy | 4 | 0.56 (0.23, 1.39) | 72.3 (21.6-90.2) | 75 | 0.64 (0.57, 0.70) | 80.2 (75.6-83.9) |
| Systematic outcome assessment | 10 | 0.75 (0.63, 0.89) | 95.8 (93.8-97.1) | 69 | 0.59 (0.52, 0.67) | 48.3 (31.5-60.9) |
| Blinded outcome assessment | 0 | - | - | 79 | 0.63 (0.57, 0.70) | 79.9 (75.4-83.6) |
| Intention to treat analysis | 68 | 0.63 (0.57, 0.70) | 81.8 (77.4-85.3) | 11 | 0.63 (0.41, 0.97) | 48.8 (0.0-74.4) |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer post-operative complications as compared with open surgery. ♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Table 6.7 Univariable meta-regression results among NRS reporting post-operative complications.

| Characteristic | ROR* (95% CI) | p-value |
|---|---|---|
| Primary outcome | 1.01 (0.72, 1.41) | 0.97 |
| Sample size calculation | 0.82 (0.28, 2.49) | 0.74 |
| Prospective data collection | 0.90 (0.69, 1.17) | 0.44 |
| Matched controls | 1.09 (0.81, 1.48) | 0.56 |
| Concurrent controls | 1.29 (0.72, 2.29) | 0.39 |
| Standardized concurrent therapy | 1.11 (0.60, 2.08) | 0.74 |
| Systematic outcome assessment | 0.82 (0.60, 1.11) | 0.20 |
| Blinded outcome assessment§ | - | - |
| Intention to treat analysis | 1.03 (0.68, 1.56) | 0.90 |

* Ratios of odds ratios. Summary effect estimates from NRS without characteristics were compared with summary effect estimates from NRS with study characteristics. A ROR<1.0 indicates that NRS without a study characteristic yield combined effect estimates that are more extreme than in NRS with the study characteristic.


Figure 6.1 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome post-operative complications. Squares indicate odds ratios and error bars indicate 95% confidence intervals.


Table 6.8 Random-effects meta-analysis results across outcomes of interest and different study designs.

| Outcome | NRS: OR or MD (95% CI) | NRS: I²♦ (95% CI) | Strong RCTs: OR or MD (95% CI) | Strong RCTs: I²♦ (95% CI) | NRS compared with Strong RCTs: RORa or DMDb (95% CI) | p-value |
|---|---|---|---|---|---|---|
| Post-operative complications* (OR, ROR) | 0.63 (0.57, 0.70) | 77.2 (72.5, 81.1) | 0.96 (0.80, 1.15) | 15.4 (0.0, 87.1) | 0.64 (0.42, 0.97) | 0.04 |
| Peri-operative mortality◊ (OR, ROR) | 0.59 (0.47, 0.74) | 33.9 (12.7, 50.0) | 0.78 (0.46, 1.32) | 0.0 (0.0, 73.9) | 0.74 (0.38, 1.44) | 0.38 |
| Length of stay† (MD, DMD) | -2.95 (-3.39, -2.50) | 97.3 (97.0, 97.6) | -0.70 (-1.23, -0.17) | 92.3 (83.4, 96.4) | -2.15 (-4.08, -0.21) | 0.03 |
| Number of lymph nodes harvested§ (MD, DMD) | 0.07 (-0.53, 0.67) | 93.4 (92.2, 94.5) | -0.55 (-1.37, 0.26) | 86.3 (66.7, 94.4) | 0.49 (-1.43, 2.42) | 0.62 |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer post-operative complications as compared with open surgery. ◊ Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer deaths. † Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with a shorter length of stay. § Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen. ♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance. a Ratios of odds ratios. b Difference in mean differences.


studies where post-operative complications were not the primary outcome, sample size calculations were

absent, data were retrospectively collected, controls were concurrent, systematic outcome

assessment was absent and where intention to treat analysis was performed had statistically

significant ratios of odds ratios < 1.0. Ratios of odds ratios for the remaining

comparisons were also less than 1.0 and thus, the absence of a study characteristic was not

consistently related to more extreme estimates among NRS as compared with Strong RCTs.

Table 6.9 Univariable meta-regression results comparing NRS with or without study characteristics with Strong RCTs.

| Characteristic | Group | ROR* (95% CI) | p-value§ |
|---|---|---|---|
| Primary outcome | Characteristic absent | 0.64 (0.41, 0.97) | **0.04** |
| | Characteristic present | 0.63 (0.39, 1.04) | 0.07 |
| | Strong RCTs | Ref | - |
| Sample size calculation | Strong RCTs | Ref | - |
| Prospective data collection | Strong RCTs | Ref | - |
| Matched controls | Strong RCTs | Ref | - |
| Concurrent controls | Strong RCTs | Ref | - |
| Standardized concurrent therapy | Strong RCTs | Ref | - |
| Systematic outcome assessment | Strong RCTs | Ref | - |
| Blinded outcome assessment | Characteristic absent | 0.64 (0.42, 0.96) | **0.03** |
| | Strong RCTs | Ref | - |
| Intention to treat analysis | Characteristic absent | 0.65 (0.37, 1.13) | 0.12 |
| | Characteristic present | 0.63 (0.42, 0.97) | **0.03** |
| | Strong RCTs | Ref | - |

* Ratios of odds ratios. § Statistically significant p values (<0.05) indicated in bold.



Estimates for length of stay were reported in 106 NRS. These NRS involved 917,990

patients (n=57,900 having laparoscopic surgery and n=860,090 open surgery). Laparoscopy

and open groups did not differ in size (median=55.5, IQR=28.0-103.0 versus median=56.5,

IQR=30.2-142.5, p-value=0.56).

Table 6.10 describes the frequency of study characteristics across these NRS. Length of stay

was rarely the primary outcome (n=5, 4.7%). Sample size calculations (n=2,

1.9%), standardized post-operative care (n=9, 8.5%) and systematic outcome assessment

(n=11, 10.4%) were similarly infrequent. No studies employed blinded outcome assessment.

Just over a quarter of studies (n=28, 26.4%) used matching to mitigate the effects of selection

bias. Retrospective data collection was again common.

Table 6.10 Distribution of study attributes among NRS reporting length of stay (n=106).

| Attribute | Present, N (%) | Absent, N (%) |
|---|---|---|
| Primary outcome* | 5 (4.7) | 101 (95.3) |
| Sample size calculation performed | 2 (1.9) | 104 (98.1) |
| Prospective data collection | 45 (42.5) | 61 (57.5) |
| Concurrent controls | 94 (88.7) | 12 (11.3) |
| Matched controls | 28 (26.4) | 78 (73.6) |
| Standardized concurrent therapy | 9 (8.5) | 97 (91.5) |
| Systematic outcome assessment | 11 (10.4) | 95 (89.6) |
| Blinded outcome assessment | 0 (0.0) | 106 (100.0) |
| Intention to treat analysis | 94 (88.7) | 12 (11.3) |

Patterns of study characteristics across NRS were analyzed and the ten most common

patterns are outlined in Table 6.11. Approximately half of these studies adhered to one of

two patterns; 25.5% of studies had retrospective data collection, concurrent controls and

intention to treat analysis but did not feature sample size calculations, matched controls,

standardized concurrent therapy (i.e. standardized post-operative care), blinded outcome

assessment or systematic outcome assessment. Length of stay was not the primary outcome

of these studies. An additional 21.7% were similar except that data had been collected

prospectively. Pattern 3 (n=12 studies, 11.3%) was identical to Pattern 1 except for the use of

matched controls.

Table 6.12 and Figure 6.2 outline the results of subgroup meta-analyses; effect estimates

were combined for NRS when specific study characteristics were present or absent.

Laparoscopy was associated with a shorter length of stay than open surgery for all subgroup

meta-analyses with two exceptions: NRS lacking concurrent controls did not show a benefit

for laparoscopy (MD -2.84, 95% CI -3.65, 1.85) and NRS with a sample size calculation

(n=2 studies, MD -9.56, 95% CI -20.02, 0.90) similarly did not favor laparoscopy. Notably,

only 2 studies with 149 patients contributed to this last subgroup and confidence intervals are

accordingly wide. In one of these studies, the mean length of stay among patients undergoing

open surgery (n=62 patients) was 35.8 days versus 18.7 days for laparoscopy patients (n=45)

(Marubashi et al. 2000).
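To make the influence of this study concrete, its mean difference can be worked out from the two means quoted above, using the study-level definition of MD given in the Methods. This is simple arithmetic on the reported figures, not a reanalysis.

```latex
\[
\mathrm{MD} = \text{mean}_{\text{laparoscopy}} - \text{mean}_{\text{open surgery}}
            = 18.7 - 35.8 = -17.1 \text{ days}
\]
```

A single-study difference of this size is several times larger than the pooled mean differences of roughly -3 days seen in the other subgroups of Table 6.12, which helps explain the extreme point estimate in a subgroup of only two studies.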

Mixed-effects meta-regression modeling was used to compare summary effect estimates

across subgroups. Table 6.13 outlines the results of comparing NRS without a characteristic

to NRS with a specific characteristic. The effect estimate from studies lacking a sample size

calculation (n=104) was statistically different from effect estimates from studies with a stated

sample size calculation (n=2). However, this comparison should be interpreted with caution

given only two small studies contributed to the referent group (i.e. NRS with a sample size

calculation) and one of these studies had very long lengths of stay for enrolled patients. None

of the remaining comparisons were statistically significant and thus, effect estimates did not

differ according to the presence or absence of a specific study characteristic.


Table 6.11 Study characteristics patterns across NRS reporting length of stay (n=106 studies).

| Pattern* | N (%) | Primary outcome | Sample size calculation | Prospective data collection | Matched controls | Concurrent controls | Standardized concurrent therapy | Systematic outcome assessment | Blinded outcome assessment | Intention to treat analysis |
|---|---|---|---|---|---|---|---|---|---|---|
| Pattern 1 | 27/106 (25.5%) | - | - | - | - | + | - | - | - | + |
| Pattern 2 | 23/106 (21.7%) | - | - | + | - | + | - | - | - | + |
| Pattern 3 | 12/106 (11.3%) | - | - | - | + | + | - | + | - | + |
| Pattern 4 | 5/106 (4.7%) | - | - | + | - | + | - | - | - | + |
| Pattern 5 | 5/106 (4.7%) | - | - | - | - | - | - | - | - | + |
| Pattern 6 | 4/106 (3.8%) | - | - | + | + | + | - | + | - | + |
| Pattern 7 | 4/106 (3.8%) | - | - | + | - | + | - | - | - | - |
| Pattern 8 | 3/106 (2.8%) | - | - | - | - | + | - | - | - | + |
| Pattern 9 | 3/106 (2.8%) | - | - | - | - | + | - | - | - | - |
| Pattern 10 | 2/106 (1.9%) | - | - | + | - | + | + | - | - | + |

* Patterns are listed in order of decreasing frequency. The ten most frequent patterns are described, and represent 83.0% of NRS reporting length of stay.


Table 6.12 Random-effects meta-analysis results among NRS reporting length of stay (n=106).

| Attribute | Present: N | Present: MD* (95% CI) | Present: I²♦ (95% CI) | Absent: N | Absent: MD* (95% CI) | Absent: I²♦ (95% CI) |
|---|---|---|---|---|---|---|
| Primary outcome specified | 5 | -2.98 (-3.97, -1.99) | 98.7 (98.1-99.1) | 101 | -2.94 (-3.41, -2.48) | 96.9 (96.5-97.2) |
| Sample size calculation performed | 2 | -9.56 (-20.02, 0.90) | 98.9* | 104 | -2.79 (-3.14, -2.44) | 97.2 (96.9-97.5) |
| Prospective data collection | 45 | -2.84 (-3.37, -2.31) | 87.4 (84.1-90.1) | 61 | -2.99 (-3.65, -2.32) | 98.3 (98.1-98.5) |
| Concurrent controls | 94 | -2.94 (-3.43, -2.46) | 97.6 (97.3-97.8) | 12 | -2.84 (-3.83, -1.85) | 80.3 (66.4-88.4) |
| Matched controls | 28 | -3.10 (-3.78, -2.42) | 97.2 (96.5-97.7) | 78 | -2.90 (-3.45, -2.35) | 97.3 (97.0-97.6) |
| Standardized concurrent therapy | 9 | -3.47 (-5.22, -1.72) | 93.7 (90.1-96.0) | 97 | -2.89 (-3.33, -2.44) | 97.4 (97.1-97.6) |
| Systematic outcome assessment | 11 | -2.42 (-2.88, -1.96) | 98.8 (98.5-99.1) | 95 | -3.01 (-3.52, -2.52) | 96.9 (96.6-97.2) |
| Blinded outcome assessment | 0 | - | - | 106 | -2.95 (-3.39, -2.50) | 97.3 (97.0-97.6) |
| Intention to treat analysis | 94 | -2.87 (-3.24, -2.49) | 97.5 (97.2-97.7) | 12 | -3.34 (-5.87, -0.80) | 95.1 (93.0-96.6) |

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with a shorter length of stay. ♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.

Table 6.13 Univariable meta-regression results among NRS reporting length of stay.

| Characteristic | DMD* (95% CI) | p-value |
|---|---|---|
| Primary outcome | 0.07 (-2.05, 2.18) | 0.95 |
| Sample size calculation | 6.87 (3.89, 9.84) | <0.001 |
| Prospective data collection | -0.09 (-0.99, 0.81) | 0.84 |
| Matched controls | 0.21 (-0.81, 1.24) | 0.69 |
| Concurrent controls | -0.04 (-1.49, 1.42) | 0.96 |
| Standardized concurrent therapy | 0.62 (-0.96, 2.20) | 0.44 |
| Systematic outcome assessment | -0.55 (-1.88, 0.79) | 0.43 |
| Blinded outcome assessment | - | - |
| Intention to treat analysis | -0.50 (-1.93, 0.94) | 0.50 |

* Differences in mean differences. A negative DMD indicates that NRS without a study characteristic yielded combined effect estimates that are more extreme than NRS with the study characteristic.


Figure 6.2 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome length of stay. Squares indicate mean differences and error bars indicate 95% confidence intervals.


An inconsistent pattern again emerged when comparing effect estimates from NRS with or

without study characteristics to the gold-standard (i.e. effect estimates from Strong RCTs)

(Table 6.14). Differences in mean differences were statistically significant when NRS lacked

a study characteristic (length of stay was not the primary outcome, prospective data

collection was absent, concurrent therapy was not standardized, systematic outcome

assessment was lacking) as well as when concurrent controls were used. For the

characteristics sample size calculation, matched controls and intention to treat analysis,

DMDs were statistically significant whether the study characteristic was present or absent.

The negative DMDs observed indicate that NRS generally attributed shorter lengths of stay

to laparoscopy than Strong RCTs; however, there was no clear pattern of more extreme

estimates of benefit attributable to the absence of NRS study characteristics.

6.4.3 Objective outcomes


Seventy-nine NRS examined the association between surgical approach (laparoscopic or

open colon surgery) and mortality. A total of 1,078,369 patients were included in these NRS

(n=52,485 had laparoscopy and n=1,025,884 open surgery). On average, laparoscopy and

open surgery groups were equal in size (median=61.0, IQR=36.0-138.5 versus median=61.0,

IQR 34.0-147.5, p-value=0.64).

Peri-operative mortality was the primary outcome of 11.4% of studies (Table 6.15). Again,

retrospective data collection was common (n=43, 54.4%) as was the use of concurrent

controls (n=71, 89.9%). Post-operative care was rarely standardized (10.1%) and outcome

collection was similarly not standardized in most studies (n=70, 88.6%). Matched controls

were used in 29.1% (n=23 studies).



| Characteristic | Group | DMD* (95% CI) | p-value§ |
|---|---|---|---|
| Primary outcome | Characteristic absent | -2.15 (-4.20, -0.10) | **0.04** |
| | Characteristic present | -2.22 (-5.06, 0.63) | 0.13 |
| | Strong RCTs | Ref | - |
| Sample size calculation | Characteristic absent | -2.02 (-3.82, -0.21) | **0.03** |
| | Characteristic present | -8.89 (-12.28, -5.50) | **<0.001** |
| | Strong RCTs | Ref | - |
| Prospective data collection | Strong RCTs | Ref | - |
| Matched controls | Characteristic absent | -2.10 (-4.16, -0.04) | **0.046** |
| | Characteristic present | -2.31 (-4.49, -0.13) | **0.04** |
| | Strong RCTs | Ref | - |
| Concurrent controls | Characteristic absent | -2.18 (-4.60, 0.24) | 0.08 |
| | Characteristic present | -2.15 (-4.20, -0.10) | **0.04** |
| | Strong RCTs | Ref | - |
| Standardized concurrent therapy | Characteristic absent | -2.10 (-4.14, -0.06) | **0.04** |
| | Characteristic present | -2.73 (-5.21, -0.25) | **0.03** |
| | Strong RCTs | Ref | - |
| Systematic outcome assessment | Strong RCTs | Ref | - |
| Blinded outcome assessment | Characteristic absent | -2.15 (-4.19, -0.12) | **0.04** |
| | Strong RCTs | Ref | - |
| Intention to treat analysis | Characteristic absent | -2.60 (-5.00, -0.20) | **0.04** |
| | Characteristic present | -2.10 (-4.15, -0.05) | **0.04** |
| | Strong RCTs | Ref | - |

* Differences in mean differences. § Statistically significant p values (<0.05) indicated in bold.


Table 6.15 Distribution of study attributes among NRS reporting peri-operative mortality (n=79).


Absent N (%)


Table 6.16 outlines the most commonly observed patterns of study characteristics across
NRS reporting peri-operative mortality. Just over half of these studies adhered to Pattern 1, 2
or 3. A total of 24.1% of studies (Pattern 1) had retrospective data collection, concurrent
controls and intention to treat analysis but did not match controls, standardize concurrent
therapy (i.e. post-operative care), blind outcome assessors, use systematic outcome
assessment or perform sample size calculations. Another 20.3% of studies (Pattern 2) had
prospective data collection and were otherwise identical to Pattern 1. Patterns 1 and 3
differed only in that the latter (n=7 studies, 8.9%) employed matched controls.

Subgroup meta-analyses were performed and studies were divided according to the presence

or absence of NRS study characteristics (Table 6.17 and Figure 6.3). Laparoscopy was

associated with lower peri-operative mortality in most subgroups except when a sample size

calculation had been performed, concurrent therapy standardized, historical controls used and

when intention to treat analysis was absent.

Table 6.18 outlines the results of mixed-effects meta-regression modeling which allowed for

the comparison of effect estimates across NRS subgroups. NRS with retrospective data

collection attributed more benefit to laparoscopy than NRS with prospective data collection

(ROR 0.62, 95% CI 0.44, 0.87, p-value=0.01). NRS whose primary outcome had been peri-


Table 6.16 Study characteristics patterns across NRS reporting peri-operative mortality (n=79 studies).

| Pattern* | N (%) | Primary outcome | Sample size calculation | Prospective data collection | Matched controls | Concurrent controls | Standardized concurrent therapy | Systematic outcome assessment | Blinded outcome assessment | Intention to treat analysis |
|---|---|---|---|---|---|---|---|---|---|---|
| Pattern 1 | 19/79 (24.1%) | - | - | - | - | + | - | - | - | + |
| Pattern 2 | 16/79 (20.3%) | - | - | + | - | + | - | - | - | + |
| Pattern 3 | 7/79 (8.9%) | - | - | - | + | + | - | - | - | + |
| Pattern 4 | 4/79 (5.1%) | - | - | + | + | + | - | - | - | + |
| Pattern 5 | 3/79 (2.5%) | - | - | + | - | + | - | - | - | + |
| Pattern 6 | 3/79 (2.5%) | - | - | + | - | + | - | - | - | + |
| Pattern 7 | 2/79 (2.5%) | - | - | + | + | + | + | + | - | + |
| Pattern 8 | 2/79 (2.5%) | - | - | + | - | + | + | + | - | + |
| Pattern 9 | 2/79 (2.5%) | - | - | - | + | - | - | - | - | + |
| Pattern 10 | 2/79 (2.5%) | - | - | - | - | + | - | + | - | + |

*Patterns are listed in order of decreasing frequency. The ten most frequent patterns are described, and represent 88.4% of NRS reporting peri-operative mortality.


Table 6.17 Random-effects meta-analysis results among NRS reporting peri-operative mortality (n=79).

| Study characteristic | Present: N | Present: OR* (95% CI) | Present: I²♦ (95% CI) | Absent: N | Absent: OR* (95% CI) | Absent: I²♦ (95% CI) |
|---|---|---|---|---|---|---|
| Primary outcome | 9 | 0.36 (0.32, 0.42) | 0.0 (0.0-57.4) | 70 | 0.74 (0.67, 0.83) | 0.0 (0.0-2.8) |
| Sample size calculation performed | 1 | 2.16 (0.09, 55.08) | - | 78 | 0.59 (0.47, 0.73) | 34.4 (13.3-50.4) |
| Prospective data collection | 36 | 0.78 (0.70, 0.88) | 0.0 (0.0-13.5) | 43 | 0.39 (0.34, 0.45) | 0.0 (0.0-14.8) |
| Concurrent controls | 71 | 0.59 (0.46, 0.74) | 39.0 (18.7-54.3) | 8 | 0.70 (0.24, 2.10) | 0.0 (0.0-27.2) |
| Matched controls | 23 | 0.61 (0.43, 0.85) | 0.0 (0.0-0.0) | 56 | 0.58 (0.43, 0.76) | 49.5 (31.2-63.0) |
| Standardized concurrent therapy | 8 | 0.47 (0.19, 1.15) | 0.0 (0.0-55.4) | 71 | 0.60 (0.47, 0.76) | 37.9 (17.1-53.5) |
| Systematic outcome assessment | 9 | 0.51 (0.35, 0.76) | 57.7 (11.4-79.8) | 70 | 0.76 (0.68, 0.85) | 0.0 (0.0-0.0) |
| Blinded outcome assessment | 0 | - | - | 79 | 0.59 (0.47, 0.74) | 33.9 (12.7-50.0) |
| Intention to treat analysis | 70 | 0.58 (0.45, 0.74) | 39.6 (19.3-54.8) | 9 | 0.80 (0.35, 1.79) | 0.0 (0.0-11.7) |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer deaths. ♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance.
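The pooled odds ratios and I-squared values in Table 6.17 come from random-effects meta-analysis. The source does not state which between-study variance estimator was used, so the following sketch assumes the DerSimonian-Laird estimator, one common choice. The function names and the 2x2 counts are hypothetical illustrations and are not data from the included NRS.

```python
import math


def log_odds_ratio(events_lap, n_lap, events_open, n_open):
    """Return (log OR, variance) for one study from its 2x2 counts.

    Assumes no zero cells; no continuity correction is applied.
    """
    a, b = events_lap, n_lap - events_lap      # laparoscopy: deaths, survivors
    c, d = events_open, n_open - events_open   # open surgery: deaths, survivors
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf's large-sample variance
    return log_or, variance


def dersimonian_laird(effects, variances):
    """Pool study effects; returns (pooled effect, SE, tau^2, I^2 in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]           # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical studies: (deaths, patients) for laparoscopy, then open surgery.
studies = [(2, 100, 5, 100), (1, 60, 4, 65), (3, 150, 6, 140)]
effects, variances = zip(*(log_odds_ratio(*s) for s in studies))
pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"OR {math.exp(pooled):.2f} (95% CI {low:.2f}, {high:.2f}), I2 = {i2:.1f}%")
```

Pooling is done on the log odds ratio scale and back-transformed for reporting, which is why the confidence intervals in the table are asymmetric around the odds ratio.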

Table 6.18 Univariable meta-regression results among NRS reporting peri-operative mortality.

| Study characteristic | ROR* (95% CI) | p-value§ |
|---|---|---|
| Primary Outcome | 1.68 (1.11, 2.52) | **0.01** |
| Sample size calculation | 0.27 (0.01, 7.30) | 0.44 |
| Prospective data collection | 0.62 (0.44, 0.87) | **0.01** |
| Matched controls | 0.85 (0.50, 1.43) | 0.54 |
| Concurrent controls | 1.22 (0.39, 3.81) | 0.73 |
| Standardized concurrent therapy | 1.23 (0.46, 3.25) | 0.68 |
| Systematic outcome assessment | 1.33 (0.89, 1.99) | 0.17 |
| Blinded Outcome Assessment | - | - |
| Intention to treat analysis | 1.41 (0.59, 3.38) | 0.44 |

* Ratios of odds ratios. Summary effect estimates from NRS without characteristics were compared with summary effect estimates from NRS with study characteristics. A ROR<1.0 indicates that NRS without a study characteristic yield combined effect estimates that are more extreme than in NRS with the study characteristic. § Statistically significant p values (<0.05) indicated in bold.
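To make the ROR concrete, the sketch below contrasts two subgroup summary odds ratios directly: the log odds ratios are subtracted, and each standard error is recovered from the published 95% CI. This is a simplified approximation that treats the subgroups as independent, not the mixed-effects meta-regression fitted in the thesis, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper):
    """Recover the standard error of a log odds ratio from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def ratio_of_odds_ratios(or_absent, ci_absent, or_present, ci_present):
    """Contrast two independent subgroup ORs; returns (ROR, lower, upper, p)."""
    log_ror = math.log(or_absent) - math.log(or_present)
    se = math.hypot(se_from_ci(*ci_absent), se_from_ci(*ci_present))
    z = log_ror / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return (math.exp(log_ror),
            math.exp(log_ror - Z975 * se),
            math.exp(log_ror + Z975 * se),
            p)


# Concurrent controls row of Table 6.17:
# characteristic absent 0.70 (0.24, 2.10), characteristic present 0.59 (0.46, 0.74).
ror, low, high, p = ratio_of_odds_ratios(0.70, (0.24, 2.10), 0.59, (0.46, 0.74))
print(f"ROR {ror:.2f} (95% CI {low:.2f}, {high:.2f}), p = {p:.2f}")
```

For this row the direct contrast gives an ROR of roughly 1.19, close to the 1.22 reported in Table 6.18. Exact agreement is not expected, because the meta-regression is fitted to study-level data rather than to the two subgroup summaries.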


Figure 6.3 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome peri-operative mortality. Squares indicate odds ratios and error bars indicate 95% confidence intervals. The dashed black line indicates that the confidence interval extends beyond the plot area.


operative mortality (n=9 studies) had effect estimates closer to the null, and thus were less in

favour of laparoscopy than NRS where peri-operative mortality was a secondary outcome

(ROR 1.68; 95% CI 1.11, 2.52). The absence of the remaining study characteristics was not

associated with more extreme estimates of benefit for laparoscopy.

Table 6.19 outlines the results of comparing effect estimates from NRS with or without a

study characteristic with the results of Strong RCTs. The presence or absence of NRS study

characteristics was not associated with more extreme estimates of benefit for laparoscopy.


Fifty-nine NRS reported the number of lymph nodes harvested. These studies involved

252,482 participants (n=15,302 underwent laparoscopy and n= 237,180 open surgery).

Laparoscopy and open surgery groups had a roughly equal number of patients (median 50,

IQR 27.0-94.5 versus median 55, IQR 30.5-132.5, p-value=0.46).

Number of lymph nodes harvested was not the primary outcome of any NRS (Table

6.20). A minority of studies employed blinding of pathologists (n=3, 5.1%). Only one study

mentioned standardizing the processing and assessment of surgical specimens (1.7%) and

thus was considered to have standardized outcome assessment. Data collection was more

often retrospective than prospective and concurrent controls were more common than

historical ones. Matched controls were used in 23.7% of studies.



| Study characteristic | Characteristic Absent: ROR* (95% CI) | p-value§ | Characteristic Present: ROR* (95% CI) | p-value§ |
|---|---|---|---|---|
| Primary Outcome | | | | |
| Sample size calculation | | | | |
| Prospective data collection | | | | |
| Matched controls | | | | |
| Concurrent controls | | | | |
| Standardized concurrent therapy | | | | |
| Systematic outcome assessment | | | | |
| Blinded Outcome Assessment | 0.74 (0.38, 1.44) | 0.38 | - | - |
| Intention to treat analysis | 1.10 (0.35, 3.51) | 0.87 | 0.78 (0.34, 1.77) | 0.56 |

* Ratios of odds ratios. § Statistically significant p values (<0.05) indicated in bold. Strong RCTs were the referent group for each comparison.


Table 6.20 Distribution of study attributes among NRS reporting number of lymph nodes harvested (n=59).


Absent N (%)


Table 6.21 describes the most common patterns of study characteristics across NRS reporting
number of lymph nodes harvested. As with previous outcomes, most studies adhered to one
of three patterns. A total of 28.8% of studies (Pattern 1, n=17) had retrospective data
collection, concurrent controls and intention to treat analysis but had not employed matched
controls, standardized concurrent therapy (i.e. post-operative care), blinded outcome
assessors or systematic outcome assessment. Pattern 2 (n=16 studies, 27.1%) differed only
with regard to the use of prospective data collection. Pattern 3 (n=5 studies, 8.5%) instead
featured the use of matched controls but was otherwise identical to Pattern 1.

Subgroup meta-analyses were performed with NRS stratified by the presence or absence of
study characteristics (Table 6.22 and Figure 6.4). Mean differences ranged from -0.65 to 1.40
and none were statistically significant. Studies in most strata were highly heterogeneous, with
I² values greater than 50% for 13 of 15 strata. Table 6.23 outlines the results of mixed-effects
meta-regression comparing effect estimates between NRS with and without each study
characteristic. None of the differences in mean differences were statistically significant, so
the absence of NRS study characteristics was not associated with more extreme estimates of
benefit for laparoscopy.

Mixed-effects meta-regression modeling was used to compare the results of NRS with or

without study characteristics with the summary effect estimate from Strong RCTs (Table

6.24). As with the previous three outcomes of interest, no clear pattern emerged. Differences


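Table 6.21 Study characteristics patterns across NRS reporting number of lymph nodes harvested (n=59 studies).

| Pattern* | N (%) | Primary outcome | Sample size calculation | Prospective data collection | Matched controls | Concurrent controls | Standardized concurrent therapy | Systematic outcome assessment | Blinded outcome assessment | Intention to treat analysis |
|---|---|---|---|---|---|---|---|---|---|---|
| Pattern 1 | 17/59 (28.8%) | - | - | - | - | + | - | - | - | + |
| Pattern 2 | 16/59 (27.1%) | - | - | + | - | + | - | - | - | + |
| Pattern 3 | 5/59 (8.5%) | - | - | - | + | + | - | - | - | + |
| Pattern 4 | 3/59 (5.1%) | - | - | + | + | + | - | - | - | + |
| Pattern 5 | 3/59 (5.1%) | - | - | - | - | - | - | - | - | - |
| Pattern 6 | 2/59 (3.4%) | - | - | + | - | + | - | - | - | + |
| Pattern 7 | 2/59 (3.4%) | - | - | - | + | + | + | - | - | + |
| Pattern 8 | 2/59 (3.4%) | - | - | - | - | - | - | - | - | + |
| Pattern 9 | 1/59 (1.2%) | - | - | + | + | + | + | - | - | + |
| Pattern 10 | 1/59 (1.2%) | - | - | + | + | - | - | - | - | + |

*Patterns are listed in order of decreasing frequency. The ten most frequent patterns are described, and represent 74.6% of NRS reporting number of lymph nodes harvested.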


Table 6.22 Random-effects meta-analysis results among NRS reporting number of lymph nodes (n=59).

| Study characteristic | Present: N | Present: MD* (95% CI) | Present: I²♦ (95% CI) | Absent: N | Absent: MD* (95% CI) | Absent: I²♦ (95% CI) |
|---|---|---|---|---|---|---|
| Primary outcome | 0 | - | - | 59 | 0.07 (-0.53, 0.67) | 93.4 (92.2-94.5) |
| Sample size calculation performed | 0 | - | - | 59 | 0.07 (-0.53, 0.67) | 93.4 (92.2-94.5) |
| Prospective data collection | 25 | -0.11 (-0.92, 0.69) | 90.5 (87.3-93.0) | 34 | 0.25 (-0.63, 1.14) | 94.7 (93.4-95.7) |
| Concurrent controls | 51 | -0.10 (-0.72, 0.53) | 93.6 (92.4-94.7) | 8 | 1.26 (-0.27, 2.80) | 67.3 (31.0-84.5) |
| Matched controls | 14 | -0.15 (-1.08, 0.78) | 49.4 (6.3-72.6) | 45 | 0.18 (-0.53, 0.89) | 94.6 (93.5-95.5) |
| Standardized concurrent therapy | 5 | 0.26 (-1.17, 1.69) | 61.2 (0.0-85.4) | 54 | 0.07 (-0.56, 0.71) | 93.6 (92.4-94.6) |
| Systematic outcome assessment† | 1 | 1.40 (0.45, 2.35) | - | 58 | 0.04 (-0.57, 0.65) | 93.4 (92.1-94.4) |
| Blinded outcome assessment | 3 | -0.65 (-3.54, 2.24) | 98.5 (97.3-99.1) | 56 | 0.14 (-0.46, 0.73) | 87.6 (84.7-90.0) |
| Intention to treat analysis | 53 | 0.04 (-0.62, 0.71) | 94.1 (92.9-95.0) | 6 | 0.22 (-0.72, 1.16) | 0.0 (0.0-55.5) |

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen. ♦ I-squared describes the percentage of total variation across studies that is due to heterogeneity instead of chance. † Since only one NRS reported systematic outcome assessment, there is no measure of between-study heterogeneity.

Table 6.23 Univariable meta-regression results among NRS reporting number of lymph nodes harvested.

| Study characteristic | DMD* (95% CI) | p-value |
|---|---|---|
| Primary Outcome | - | - |
| Sample size calculation | - | - |
| Prospective data collection | 0.37 (-0.85, 1.60) | 0.55 |
| Matched controls | 0.50 (-0.99, 1.99) | 0.51 |
| Concurrent controls | 1.31 (-0.47, 3.08) | 0.15 |
| Standardized concurrent therapy | -0.02 (-2.22, 2.19) | 0.99 |
| Systematic outcome assessment | -1.36 (-5.31, 2.59) | 0.50 |
| Blinded Outcome Assessment | 0.84 (-1.54, 3.23) | 0.49 |
| Intention to treat analysis | 0.30 (-1.70, 2.30) | 0.77 |

* Differences in mean differences. A negative DMD indicates that NRS without a study characteristic yielded combined effect estimates that are more extreme than NRS with the study characteristic.


Figure 6.4 Forest plot of meta-analysis results, stratified according to the presence or absence of specific NRS study characteristics for the outcome number of lymph nodes harvested. Squares indicate mean differences and error bars indicate 95% confidence intervals.



| Study characteristic | Characteristic Absent: DMD* (95% CI) | p-value§ | Characteristic Present: DMD* (95% CI) | p-value§ |
|---|---|---|---|---|
| Primary Outcome | 0.49 (-1.52, 2.50) | 0.63 | - | - |
| Sample size calculation | 0.49 (-1.52, 2.50) | 0.63 | - | - |
| Prospective data collection | 0.50 (-1.62, 2.62) | 0.64 | 0.14 (-2.02, 2.28) | 0.90 |
| Matched controls | 0.44 (-1.64, 2.51) | 0.68 | -0.05 (-2.40, 2.29) | 0.96 |
| Concurrent controls | | | | |
| Standardized concurrent therapy | | | | |
| Systematic outcome assessment | | | | |
| Blinded Outcome Assessment | 0.39 (-1.65, 2.44) | 0.71 | -0.47 (-3.42, 2.48) | 0.76 |
| Intention to treat analysis | | | | |

* Differences in mean differences. § Statistically significant p values (<0.05) indicated in bold. Strong RCTs were the referent group for each comparison.

in mean differences were not statistically significant when comparing effect estimates from

NRS without study characteristics and Strong RCTs.


6.5 Discussion

The objective of this study was to explore the relationship between study characteristics and

effect estimates in NRS comparing laparoscopy with open surgery for the treatment of colon

cancer. Our overarching aim was to identify specific study characteristics associated with

more extreme estimates of benefit for laparoscopy as compared with Strong RCTs. The

relative frequency of NRS study characteristics is largely unknown and this study sheds light

on how NRS in surgery have been performed: sample size calculations, historical controls,

standardized post-operative care (i.e. standardized concurrent therapy), and blinded or systematic

outcome assessment were all rare among NRS. Retrospective data collection was common

and matching was used in approximately a quarter of NRS to overcome selection bias. The

frequency of these characteristics may vary among NRS in other areas of medicine but this

study nonetheless provides some insight into how NRS have been conducted in surgery.

We used mixed-effects meta-regression modeling to evaluate how effect estimates from NRS

with or without certain study characteristics compare. For the outcomes post-operative

complications, length of stay and number of lymph nodes harvested, none of the effect

estimates differed across NRS subgroups. For the outcome peri-operative mortality, NRS

with retrospective data collection had more extreme estimates of benefit for laparoscopy than

NRS with prospective data collection (ROR 0.62, 95% CI 0.44, 0.87, p-value=0.01). In

addition, effect estimates were closer to the null (i.e. less in favour of laparoscopy) in NRS

where the primary outcome was peri-operative mortality as opposed to NRS in which post-

operative death was a secondary outcome (ROR 1.68, 95% CI 1.11, 2.52). However, when

the effect estimates from these subgroups were compared with the results of Strong RCTs,

none proved to be statistically significant. Indeed, across all four outcomes of interest, we did

not observe a consistent pattern of more extreme benefit for laparoscopy with the absence of

a particular NRS study characteristic.

Many of the NRS study characteristics examined appear frequently in popular NRS quality

assessment tools. For example, the Downs and Black checklist assesses whether sample size


calculations were performed, concurrent controls used, and if outcome assessors were

blinded, among other criteria (Downs and Black 1998). The Newcastle-Ottawa scale also

includes a consideration of blinded outcome assessment (GA Wells). Furthermore, Wells et

al. have recently described a checklist that evaluates NRS with regards to the use of matched

controls (Wells et al. 2013). Our results suggest that these study characteristics may not help

distinguish between NRS at low or high risk of bias. Indeed, the absence of sample size

calculations, concurrent controls, blinded outcome assessment and matched controls was not

associated with more extreme estimates of benefit. NRS with or without these study

characteristics yielded summary effect measures that did not statistically differ from those of

Strong RCTs. Moreover, since many of the aforementioned characteristics were rarely

present, they are unlikely to be helpful in discriminating NRS at high and low risk of bias.

While we did not identify a relationship between NRS study characteristics and effect

estimates in this study, it is possible one might exist. There are a number of reasons why a

Type II error may have occurred. First, we relied on reported study methods to determine the

presence or absence of study characteristics. For example, even though many articles did not

mention systematic outcome assessment, it is possible some investigators had standardized

the collection of outcome data. Indeed, it has been shown that there can be significant

discrepancies between reported study methods and actual study conduct for RCTs (Mhaskar

et al. 2012). Mhaskar et al. found that even though 39% of RCT protocols specified adequate

methods for randomization, only 23% of articles included this information. A similar

discrepancy between published reports and NRS methods may exist. Our study was also

limited by the relative infrequency of many study characteristics. NRS with sample size

calculations, historical controls, standardized post-operative care, systematic outcome

assessment and blinded outcome assessment were relatively rare. Moreover, we had limited

power to detect important effects as we used only 121 NRS for the current analysis. The

strengths of this study include the use of mixed-effects meta-regression modeling to

determine the comparability of effect estimates across subgroups. Furthermore, our analyses

used summary effect estimates from Strong RCTs as the “gold-standard” for comparisons;


we chose to handle the known variability in surgical trial quality by pooling effect estimates

from RCTs at the lowest risk of bias.

Multiple meta-epidemiological studies have established that RCTs with inadequate random-

sequence generation, allocation concealment, and double-blinding yield biased estimates of

treatment effect (Schulz et al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud

2001; Balk et al. 2002; Als-Nielsen et al. 2003; Tierney and Stewart 2005; Pildal et al. 2007;

Nuesch, Reichenbach, et al. 2009; Nuesch, Trelle, et al. 2009; Nuesch et al. 2010; Bassler et

al. 2010; Dechartres et al. 2011; Bafeta et al. 2012; Hrobjartsson et al. 2012, 2013; Wood et

al. 2008; Savovic et al. 2012). The analyses in these studies routinely involved >300 trials,

across multiple interventions. Perhaps a similarly large cohort of NRS, examining multiple

interventions, is required to definitively identify which aspects of NRS-design are associated

with biased treatment effects.

6.6 Conclusion

Effect estimates did not consistently vary according to the presence or absence of NRS

design characteristics among studies comparing laparoscopy and open surgery for the

treatment of colon cancer. Additional studies are necessary to identify the attributes of NRS-

design associated with bias.


Chapter 7 General Discussion and Future Directions

7.1 Summary of findings

This thesis focused on examining bias in RCTs and NRS of surgical interventions. NRS

remain far more common in surgery than RCTs even though the latter are considered the

most reliable source of evidence when evaluating therapeutic interventions (Sackett and

Sackett 1991). There are a number of reasons why surgical RCTs remain rare. First, surgical

devices and interventions do not require regulatory approval in the same manner as novel

drugs (McLeod 1999). Accordingly, there is far less funding available to conduct surgical

RCTs. Second, recruitment to surgical trials is hindered by the uncertainty associated with

randomization — such uncertainty affects both patients and physicians (Mills et al. 2003;

Campbell et al. 2010). Third, investigators must also overcome the logistical challenges of

standardizing surgical technique (Meakins 2002). For these reasons, NRS heavily inform the

evidence base in surgery and are likely to continue to do so. However, an important question

remained unanswered: do surgical NRS and RCTs yield similar estimates of treatment

effect? This question was the starting point for this thesis work.

Before we could study bias in NRS, a conceptual framework of this phenomenon was

required. Because no existing framework was identified to guide our analyses, we proceeded

to develop a novel one. In Chapter 3, we conducted a modified framework synthesis to

develop such a framework. Sources of bias were identified from systematic reviews of NRS

quality assessment tools and analyzed thematically. This process yielded a hierarchical

framework with six overarching domains (selection bias, information bias, performance bias,

detection bias, attrition bias, and selective reporting bias), with 37 individual sources of bias

nested under these domains.


In Chapter 5, we compared effect estimates from NRS with those from i) All RCTs;

ii) Typical RCTs (i.e. at unclear or high risk of bias) and iii) Strong RCTs (i.e. low risk of

bias trials). Studies evaluating laparoscopy and conventional (open) surgery for the treatment

of colon cancer were used for this case study of bias. Among subjective outcomes (post-

operative complications and length of stay), NRS were associated with more extreme

estimates of benefit for laparoscopy than Strong RCTs. For the outcome post-operative

complications, NRS attributed 36% more benefit to laparoscopy than Strong RCTs.

Laparoscopy was associated with reductions in length of stay that were exaggerated

three-fold in NRS as compared with Strong RCTs. A similar pattern was not observed with

the objective outcomes peri-operative mortality and number of lymph nodes harvested. The

observed differences between NRS and Strong RCTs persisted after adjusting for period

effects and differences in baseline event rates between studies. Moreover, Typical RCTs

were also associated with larger estimates of benefit for laparoscopy as compared to Strong

RCTs. Odds ratios for post-operative complications were 37% smaller (i.e. more benefit) in

Typical than Strong RCTs. Laparoscopy was associated with a reduction in length of stay

that was two-fold larger in Typical than Strong RCTs.

Our findings suggest that surgical NRS are associated with more extreme estimates of benefit

as compared with Strong RCTs. Previous meta-epidemiological studies had identified a

number of design characteristics associated with biased effect estimates in RCTs (Schulz et

al. 1995; Moher et al. 1998; Kjaergard, Villumsen, and Gluud 2001; Balk et al. 2002; Als-

Nielsen et al. 2003; Tierney and Stewart 2005; Pildal et al. 2007; Nuesch, Reichenbach, et al.

2009; Nuesch, Trelle, et al. 2009; Nuesch et al. 2010; Bassler et al. 2010; Dechartres et al.

2011; Bafeta et al. 2012; Hrobjartsson et al. 2012, 2013; Wood et al. 2008; Savovic et al.

2012). We hypothesized that study characteristics may be similarly associated with bias

among NRS. Identifying these characteristics could explain part of the variation observed

between NRS and Strong RCTs. We therefore examined the relationship between NRS

design attributes and estimates of treatment effect. In Chapter 6, nine study characteristics

were examined: (i) whether the outcome of interest was the primary outcome; (ii) presence of

a sample size calculation; (iii) prospective data collection; (iv) concurrent (versus historical)


controls; (v) matched controls; (vi) standardized concurrent therapy (i.e. standardized post-

operative care); (vii) systematic outcome assessment; (viii) blinded outcome assessment and

(ix) intention to treat analysis. The majority of NRS were retrospective, had concurrent

controls and applied intention to treat principles. Few studies used standardized concurrent

therapy (i.e. standardized post-operative care), blinded outcome assessment or systematic

outcome assessment. Mixed-effects meta-regression models were used to compare effect

estimates across subgroups of NRS (i.e. those with or without a study characteristic) and

between NRS and Strong RCTs. We did not observe a consistent pattern of more extreme

benefit for laparoscopy with the absence of any particular NRS study characteristic.

The findings of this thesis represent novel contributions to the field of methodological

research in surgery. Many have hypothesized that the results of NRS differ from those of

well-conducted RCTs. Our results provide empirical evidence supporting this assertion; NRS

appear to be associated with notable bias when evaluating subjective outcomes. Moreover,

the results of the most rigorous RCTs appear to differ from those of more Typical RCTs.

This thesis work has broad implications for how we interpret evidence from surgical studies.

7.2 Implications

The work presented in this dissertation has three main implications.

(1) Implications for the meta-analysis of surgical RCTs.

(2) Implications for the interpretation of surgical NRS.

(3) Implications for future meta-epidemiological studies of NRS study characteristics.


7.2.1 Implications for the meta-analysis of surgical RCTs

Physicians and policy-makers rely on the synthesis of the best available evidence to inform

decision-making. Generally, the meta-analysis of RCT data is considered the strongest

source of evidence for the evaluation of interventions. In Chapter 5, however, we observed

that a minority of RCTs comparing laparoscopy with open surgery for the treatment of colon

cancer were low risk of bias trials (i.e. Strong RCTs) — the remaining 80% (i.e. Typical

RCTs) had methodological shortcomings. Our results provide empirical support for a

difference in effect estimates between trials at low risk of bias versus those at high/unclear

risk of bias when examining subjective outcomes. Furthermore, the results of unclear/high

risk of bias trials (i.e. Typical RCTs) were very similar to the results of NRS. It is possible

that the trial conduct in Typical RCTs more closely resembles what is encountered in NRS;

RCTs generally differ from NRS not only in how patients are allocated to interventions but

also in other respects, such as study registration, detailed protocols and data safety

monitoring boards. While Typical RCTs indeed involved randomizing patients to treatment, they

generally lacked these other features. We therefore recommend that the meta-analysis of

surgical RCTs should routinely incorporate both risk of bias assessments and subgroup

analyses of low risk of bias trials. In instances where no low risk of bias RCTs are available,

we advise that authors should be wary of drawing conclusions from Typical RCTs.

Moreover, we recommend that RCTs and NRS should be analyzed separately in the course

of a surgical meta-analysis. Considering there was a notable discrepancy between the results

of NRS and Strong RCTs, including NRS in the meta-analysis of surgical RCTs could lead to

misleading conclusions.

7.2.2 Implications for the interpretation of surgical NRS

In Chapter 5, we demonstrated that the results of NRS were 36% more extreme than those of

Strong RCTs when evaluating subjective, binary outcomes. For subjective continuous


outcomes, NRS overestimated the benefit associated with laparoscopy between two- and

three-fold. These results suggest that NRS may not be a reliable source of evidence for

decision-making in surgery. Therefore, the meta-analysis of NRS may provide surgeons and

policy-makers with misleading results; the accumulation of numerous studies often leads to

narrow confidence intervals but effect estimates may be nonetheless biased. Others have

previously cautioned investigators about the pitfalls of meta-analyzing NRS (Reeves et al.

2013b). Whereas such caution has been previously advised on theoretical grounds, the

findings of this thesis provide empirical support for such apprehension. We therefore

recommend that investigators be cautious when drawing conclusions from the

meta-analysis of surgical NRS, especially when analyzing subjective outcomes. However,

inferences from the meta-analysis of objective outcomes in surgical NRS do not appear to be

biased.
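The point that pooling many studies narrows the confidence interval without removing bias can be shown with a small worked sketch. It makes the deliberately simple assumption that every study has the same standard error and shares the same systematic bias; the function name and all numbers are hypothetical and are not estimates from this thesis.

```python
import math


def pooled_interval(true_effect, bias, se_per_study, k, z=1.96):
    """95% CI of a fixed-effect pooled estimate from k equally precise studies
    that all share the same systematic bias (a deliberate simplification)."""
    estimate = true_effect + bias       # pooling does not remove shared bias
    se = se_per_study / math.sqrt(k)    # but it does shrink the standard error
    return estimate - z * se, estimate + z * se


# Hypothetical values on the log odds ratio scale.
TRUE_EFFECT, BIAS, SE = -0.10, -0.20, 0.50
for k in (4, 25, 100):
    low, high = pooled_interval(TRUE_EFFECT, BIAS, SE, k)
    covered = low <= TRUE_EFFECT <= high
    print(f"k={k:3d}: CI ({low:.2f}, {high:.2f}) covers true effect: {covered}")
```

Under these assumptions the interval from 4 studies still contains the true effect, whereas the intervals from 25 and 100 studies are narrow enough to exclude it, so the larger meta-analysis is more precise but more misleading.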

7.2.3 Implications for future meta-epidemiological studies of NRS study characteristics

Unfortunately, we were not able to identify NRS study characteristics associated with biased

estimates of treatment effect. This may be due to our limited power to detect important

effects as we used only 121 NRS for the current analysis. We were also limited by the

relative infrequency of many of the study characteristics assessed. However, it is also

possible that design characteristics of NRS may not be associated with bias in a predictable

way. For example, the absence of blinded outcome assessment may bias results towards the

null in one study and away from the null in another. Alternatively, there may be interactions

between study characteristics that act in equally unpredictable ways. For example, systematic

outcome assessment may be associated with biased effect estimates only in studies where

outcome assessors were unblinded. Accordingly, in instances where blinded outcome

assessment is employed, the presence or absence of systematic outcome assessment may not

be related to bias. Therefore, while it is likely that NRS study characteristics are associated


with bias, such bias may be difficult to quantify and detect. Our work has nonetheless

established that the referent group for future meta-regression analyses trying to isolate NRS-

design attributes associated with bias should be low risk of bias trials and not all RCTs for a

given intervention.

7.3 Limitations

7.3.1 Limitations of available data

The analyses in Chapter 5 relied heavily on reported methods to determine the risk of bias in

RCTs. It is possible, however, that actual study conduct may have differed from what was

reported in publications. Indeed, it has been shown that adequate random sequence

generation, allocation concealment and blinding are often unreported in trial publications

(Hill, LaValley, and Felson 2002; Mhaskar et al. 2012). This limitation was partially

overcome by contacting authors and referring to study protocols when assessing risk of bias.

It is nonetheless possible that some RCTs may have been rigorously performed but

misclassified as Typical RCTs. Moreover, various NRS were found to lack sample size

calculations, standardized concurrent therapy, blinded outcome assessment and systematic

outcome assessment. Some of these study characteristics may have been present but simply

not mentioned in publications of NRS. Studies examining the concordance between reported

design characteristics and actual study conduct in NRS would be helpful in assessing the

impact of this limitation.

Another limitation of using published literature for analysis is that data abstraction is limited

to those outcomes which authors chose to report. Selective outcome reporting may have

impacted the findings in this dissertation. Study authors may have chosen to report certain

outcomes based on whether effect estimates favoured laparoscopy or open surgery. If effect

estimates favouring open surgery were specifically omitted, it is possible our findings

overestimate the bias associated with NRS and Typical RCTs. However, an examination of funnel

plots revealed only minor asymmetry for the outcomes post-operative complications and

length of stay. Therefore, the impact of selective outcome reporting is perhaps minimal.
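
Funnel plot asymmetry of the kind described above is often quantified with Egger's regression test, in which each study's standardized effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error). An intercept far from zero suggests small-study effects. The Python sketch below is illustrative only and is not the analysis performed in this dissertation: the function name egger_intercept is hypothetical, and the inputs are made-up numbers rather than study data.

```python
from statistics import mean


def egger_intercept(effects, std_errors):
    """Return (intercept, slope) of Egger's regression.

    The standardized effect (effect / SE) is regressed on precision (1 / SE)
    by ordinary least squares. The slope estimates the underlying effect and
    the intercept measures funnel plot asymmetry.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical log effect estimates that grow with the standard error,
# i.e. smaller studies report larger effects (an asymmetric funnel).
ses = [0.1, 0.2, 0.3, 0.4]
asymmetric = [0.5 + 1.0 * s for s in ses]
intercept, slope = egger_intercept(asymmetric, ses)
```

In practice the intercept would be accompanied by a standard error and a significance test, which this sketch omits for brevity.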

Our analyses may also have been subject to measurement error. Authors of NRS and RCTs

rarely provided definitions for the outcome post-operative complications. It is entirely

possible that these definitions varied between studies. While some authors calculated the

frequency of post-operative complications that delayed discharge, others calculated the

frequency of all post-operative complications. This variation in outcome definition could

have influenced the effect estimates generated in these studies and thus influenced our

findings.

Our ability to detect an association between NRS-design attributes and effect estimates was

also limited by our sample size: the systematic search strategy identified only 144 NRS

comparing laparoscopy with open surgery for the management of colon cancer. It is possible

that a relationship between specific study characteristics and effect estimates exists, but that a

much larger cohort of NRS is required to detect it.

7.3.2 Limitations of data analysis

One of the limitations of our analyses was the use of the Cochrane Risk of Bias tool to

categorize RCTs as Typical or Strong trials. Other instruments for assessing RCT quality

were considered but could not be used because they lacked rigorous development, focused

heavily on blinding (Jadad et al. 1996) or had unknown validity and reliability (Olivo et al.

2008). In contrast, the Cochrane Risk of Bias tool has been developed using rigorous

methods, pilot tested and recently revised to diminish ambiguity (Higgins et al. 2011).

However, the inter-rater reliability for the most recent iteration of the instrument is unknown.

We partially overcame this limitation by having two assessors evaluate RCTs reporting post-

operative complications. There was perfect agreement between assessors (κ=1.00).

Moreover, Strong RCTs were large, multi-center and publicly funded studies. Trials with

these attributes have been shown to be less susceptible to bias (Als-Nielsen et al. 2003;

Dechartres et al. 2011; Bafeta et al. 2012). It is nonetheless possible that some trials

may have been classified as Typical instead of Strong RCTs. If effect estimates in these

studies were in favour of laparoscopy, the results of NRS would appear less extreme in

comparison. However, there is no reason to assume that effect estimates in misclassified

Typical RCTs systematically favoured laparoscopy.
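
The agreement statistic reported earlier in this subsection (κ=1.00) corrects the raw proportion of agreement for the agreement expected by chance. The Python sketch below shows the calculation, assuming the reported κ is Cohen's kappa for two raters. The function name cohens_kappa and the ratings are hypothetical and do not reproduce the assessments made in this dissertation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications of four trials by two assessors.
assessor_1 = ["Strong", "Typical", "Typical", "Strong"]
assessor_2 = ["Strong", "Typical", "Typical", "Strong"]
kappa = cohens_kappa(assessor_1, assessor_2)  # identical ratings give 1.0
```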

A fundamental assumption was made when designing the analyses of Chapters 5 and 6: that

effect estimates from Strong RCTs are a reasonable surrogate for the truth. However, it is

possible that these rigorous RCTs may have themselves been biased in ways we were not

able to appreciate.

In Chapter 6, we explored the relationship between NRS study characteristics and effect

estimates. However, it is likely that NRS differed not only with respect to study design but

also patient case-mix. Moreover, there may have been differences in surgeon skill between

studies as well. Unfortunately, we could not adjust our analyses for these sources of clinical

heterogeneity as we did not have access to individual patient or provider data. Had these data been

available, patient and provider characteristics might have explained part of the between-study

heterogeneity in our regression models. The residual error in these models would have

accordingly decreased and we would have had more power to detect a difference in effect

estimates with the presence or absence of NRS study characteristics. Unfortunately, we not

only lacked individual patient and provider data but also aggregate characteristics; group-

level patient characteristics were inconsistently reported across included NRS. Studies varied

with respect to which patient information was provided and how it was presented. For

example, some authors provided measures of disease severity (e.g. cancer stage) whereas

others did not. Moreover, cancer stage was at times reported according to the Dukes

classification system and in other articles, investigators used the American Joint Committee

on Cancer (AJCC) TNM staging system. Such variation in reporting precluded the use of

aggregate patient-level characteristics in the mixed-effects meta-regression models in

Chapter 6.
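
The reasoning above, that covariates explaining between-study differences would reduce residual heterogeneity, can be illustrated with a small numerical sketch. It assumes the DerSimonian-Laird moment estimator of the between-study variance, which may differ from the estimator used in the meta-regression models of Chapter 6. The function name dl_tau_squared, the effect estimates and the grouping are hypothetical.

```python
def dl_tau_squared(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance (tau squared)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    df = len(effects) - 1
    return max(0.0, (q - df) / c)


# Hypothetical log effect estimates from four studies with equal variances.
# Suppose the first two studies share one case-mix and the last two another.
effects = [0.0, 0.0, 1.0, 1.0]
variances = [0.04, 0.04, 0.04, 0.04]

tau2_all = dl_tau_squared(effects, variances)              # ignores case-mix
tau2_group_a = dl_tau_squared(effects[:2], variances[:2])  # within one stratum
tau2_group_b = dl_tau_squared(effects[2:], variances[2:])
```

In this contrived example all heterogeneity lies between the two strata, so the within-stratum estimates fall to zero once the grouping is taken into account. Real data would rarely separate so cleanly.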

7.3.3 Limitations of generalizability

Comparative NRS and RCTs of a single surgical procedure were used for the case study of

bias presented in this dissertation. The overestimation of treatment effect in NRS and Typical

RCTs may be specific to studies examining laparoscopy and conventional surgery for the

treatment of colon cancer. Furthermore, our findings may apply only to surgical studies and

may not reflect the relationship between study design and bias in other areas of medicine. We

also demonstrated that more extreme estimates of benefit are obtained when evaluating the

subjective outcomes post-operative complications and length of stay. Our findings may apply

only to these particular outcomes and not to other subjective outcomes such as pain or patient

satisfaction. Additional studies are necessary to determine whether the findings of this

dissertation are indeed reproducible. This additional evidence would augment the

generalizability of our findings.

7.4 Future Directions

7.4.1 Outcome reporting in NRS and RCTs

In Chapter 4, we observed that many outcomes were infrequently reported in NRS and

RCTs. Examples of these outcomes included margin status, number of lymph nodes

harvested, and quality of life. Definitions of pertinent outcomes, such as post-operative

complications, were also provided in a minority of NRS. In other instances, outcomes were

reported in a variety of ways; for example, studies reporting overall survival sometimes

provided 2-year overall survival whereas others reported 3- and 5-year survival. Others too

have found variation in outcome reporting in surgical studies. For example, in a study by

Blencowe et al., authors examined which complications were reported in articles of

esophageal cancer surgery. They identified 105 NRS and 17 RCTs published between 2005

and 2009. They found that no single complication was reported in all papers. Five studies

(5.1%) categorized complications with a validated grading system. Anastomotic leak was the

most commonly reported complication and was defined in 28.3% (n=28) of studies but 22

different definitions were used. They concluded that outcome reporting is “heterogeneous

and inconsistent, and it lacks methodological rigor” (Blencowe et al. 2012). We observed

similar variation in outcome reporting.

Efforts are required to establish core outcome sets for various surgical interventions to

facilitate the reporting, comparison and combination of results. Efforts are currently

underway by the COMET (Core Outcome Measures in Effectiveness Trials) Initiative to

standardize outcome reporting in studies of colon cancer surgery. Specific knowledge

translation strategies are necessary, potentially in conjunction with national surgical

societies, to facilitate the uptake of these outcome sets.

7.4.2 Investigating the relationship between reporting and actual RCT quality

A minority of surgical RCTs were found to be at low risk of bias. It is unclear whether these

assessments reflect deficits in reporting or suboptimal trial execution. Further investigation is

required to determine if there is indeed discordance between reported and actual study

quality in surgical RCTs. If such a discrepancy is identified, it would be interesting to

determine why authors are omitting important methodological detail. Are authors unaware of

the CONSORT statement? Do word limits imposed by journal editors limit their ability to

appropriately describe their trials? Alternatively, RCT reports may reflect the real absence of

adequate random sequence generation, allocation concealment or other methodological

safeguards against bias. If inferior trial quality is confirmed, specific educational strategies

should be devised by national surgical societies to improve the methods of surgical RCTs.

7.4.3 Ongoing evaluations of NRS study characteristics

In Chapter 6, we did not observe a consistent pattern of more extreme benefit for laparoscopy

with the absence of a particular NRS study characteristic. Further research is required,

examining other interventions and other characteristics, to determine which attributes of NRS

study design are associated with biased estimates of treatment effect. The possibility

remains, however, that unlike for RCTs, it may not be possible to identify which characteristics of

NRS design are associated with bias. The findings of this dissertation suggest that the

referent group for meta-regression analyses should be low risk of bias RCTs. Until empirical

evidence is available, expert consensus will be used to develop a risk of bias tool for NRS.

Indeed, the Cochrane Collaboration is currently developing an extension to the Cochrane

Risk of Bias tool for NRS and we are involved in the Collaboration’s efforts. These efforts

are necessary because NRS can be an important study design to evaluate the effectiveness, as

opposed to the efficacy, of surgical interventions. Once the risk of bias tool for NRS is

complete, additional research will be required to determine the validity and reliability of the

instrument.

7.5 Conclusions

In summary, we have demonstrated that the results of surgical NRS can be significantly

biased as compared with those of low risk of bias RCTs when evaluating subjective

outcomes. However, none of the nine NRS-design characteristics examined was consistently

associated with biased effect estimates. Additional research is necessary to determine which

NRS-design attributes, if any, are associated with bias.

References

Abraham, N. S., C. J. Byrne, J. M. Young, and M. J. Solomon. 2010. "Meta-analysis of well-designed nonrandomized comparative studies of surgical procedures is as good as randomized controlled trials." Journal of Clinical Epidemiology no. 63 (3):238-45.

Abraham, N. S., C. M. Byrne, J. M. Young, and M. J. Solomon. 2007. "Meta-analysis of non-randomized comparative studies of the short-term outcomes of laparoscopic resection for colorectal cancer." ANZ Journal of Surgery no. 77 (7):508-16.

ACS NSQIP - Classic Variables and Definitions, Chapter 4. 2012. [cited November 15 2012]. Available from http://nsqip.healthsoftonline.com/lib/Documents/Ch_4_Variables_Definitions_062810.pdf.

Agabegi, S. S., and P. J. Stern. 2008. "Bias in research." American Journal of Orthopedics (Belle Mead, N.J.) no. 37 (5):242-8.

Alexander, R. J., B. C. Jaques, and K. G. Mitchell. 1993. "Laparoscopically assisted colectomy and wound recurrence." Lancet no. 341 (8839):249-50.

Allison, Paul David. 2002. Missing data, Sage university papers Quantitative applications in the social sciences. Thousand Oaks, Calif.: Sage Publications.

Als-Nielsen, B., W. Chen, C. Gluud, and L. L. Kjaergard. 2003. "Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events?" JAMA no. 290 (7):921-8.

Altman, Douglas G., Matthias Egger, and George Davey Smith. 2001. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ.

American Society of Colon and Rectal Surgeons. 1995. "Position statement on laparoscopic colectomy." Diseases of the Colon and Rectum no. 35:5A.

Antman, K., D. Amato, W. Wood, J. Carson, H. Suit, K. Proppe, R. Carey, J. Greenberger, R. Wilson, and E. Frei, 3rd. 1985. "Selection bias in clinical trials." Journal of Clinical Oncology no. 3 (8):1142-7.

Atkins, D., M. Eccles, S. Flottorp, G. H. Guyatt, D. Henry, S. Hill, A. Liberati, D. O'Connell, A. D. Oxman, B. Phillips, H. Schunemann, T. T. Edejer, G. E. Vist, J. W. Williams, Jr., and Grade Working Group. 2004. "Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches The GRADE Working Group." BMC Health Services Research no. 4 (1):38.

Bafeta, A., A. Dechartres, L. Trinquart, A. Yavchitz, I. Boutron, and P. Ravaud. 2012. "Impact of single centre status on estimates of intervention effects in trials with continuous outcomes: meta-epidemiological study." BMJ no. 344:e813.

Baillie, J. K. 2007. "Activated protein C: controversy and hope in the treatment of sepsis." Curr Opin Investig Drugs no. 8 (11):933-8.

Balk, E. M., P. A. Bonis, H. Moskowitz, C. H. Schmid, J. P. Ioannidis, C. Wang, and J. Lau. 2002. "Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials." JAMA no. 287 (22):2973-82.

Barkun, J. S., J. K. Aronson, L. S. Feldman, G. J. Maddern, S. M. Strasberg, D. G. Altman, J. M. Blazeby, I. C. Boutron, W. B. Campbell, P. A. Clavien, J. A. Cook, P. L. Ergina, D. R. Flum, P. Glasziou, J. C. Marshall, P. McCulloch, J. Nicholl, B. C. Reeves, C. M. Seiler, J. L. Meakins, D. Ashby, N. Black, J. Bunker, M. Burton, M. Campbell, K. Chalkidou, I. Chalmers, M. de Leval, J. Deeks, A. Grant, M. Gray, R. Greenhalgh, M. Jenicek, S. Kehoe, R. Lilford, P. Littlejohns, Y. Loke, R. Madhock, K. McPherson, P. Rothwell, B. Summerskill, D. Taggart, P. Tekkis, M. Thompson, T. Treasure, U. Trohler, and J. Vandenbroucke. 2009. "Evaluation and stages of surgical innovations." Lancet no. 374 (9695):1089-96.

Barnett-Page, E., and J. Thomas. 2009. "Methods for the synthesis of qualitative research: a critical review." BMC Medical Research Methodology no. 9:59.

Barza, M., T. A. Trikalinos, and J. Lau. 2009. "Statistical considerations in meta-analysis." Infectious Disease Clinics of North America no. 23 (2):195-210, Table of Contents.

Bassler, Dirk, Matthias Briel, Victor M. Montori, Melanie Lane, Paul Glasziou, Qi Zhou, Diane Heels-Ansdell, Stephen D. Walter, Gordon H. Guyatt, Stopit- Study Group, David N. Flynn, Mohamed B. Elamin, Mohammad Hassan Murad, Nisrin O. Abu Elnour, Julianna F. Lampropulos, Amit Sood, Rebecca J. Mullan, Patricia J. Erwin, Clare R. Bankhead, Rafael Perera, Carolina Ruiz Culebro, John J. You, Sohail M. Mulla, Jagdeep Kaur, Kara A. Nerenberg, Holger Schunemann, Deborah J. Cook, Kristina Lutz, Christine M. Ribic, Noah Vale, German Malaga, Elie A. Akl, Ignacio Ferreira-Gonzalez, Pablo Alonso-Coello, Gerard Urrutia, Regina Kunz, Heiner C. Bucher, Alain J. Nordmann, Heike Raatz, Suzana Alves da Silva, Fabio Tuche, Brigitte Strahm, Benjamin Djulbegovic, Neill K. J. Adhikari, Edward J. Mills, Femida Gwadry-Sridhar, Haresh Kirpalani, Heloisa P. Soares, Paul J. Karanicolas, Karen E. A. Burns, Per Olav Vandvik, Fernando Coto-Yglesias, Pedro Paulo M. Chrispim, and Tim Ramsay. 2010. "Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis." JAMA no. 303 (12):1180-7.

Baumgaertner, M. R., W. D. Cannon, Jr., J. M. Vittori, E. S. Schmidt, and R. C. Maurer. 1990. "Arthroscopic debridement of the arthritic knee." Clinical Orthopaedics and Related Research (253):197-202.

Benson, K., and A. J. Hartz. 2000. "A comparison of observational studies and randomized, controlled trials." New England Journal of Medicine no. 342 (25):1878-86.

Bhandari, M., P. Tornetta, 3rd, T. Ellis, L. Audige, S. Sprague, J. C. Kuo, and M. F. Swiontkowski. 2004. "Hierarchy of evidence: differences in results between non-randomized studies and randomized trials in patients with femoral neck fractures." Archives of Orthopaedic and Trauma Surgery no. 124 (1):10-6.

Blencowe, N. S., A. G. McNair, C. R. Davis, S. T. Brookes, and J. M. Blazeby. 2012. "Standards of outcome reporting in surgical oncology: a case study in esophageal cancer." Annals of Surgical Oncology no. 19 (13):4012-8.

Borenstein, Michael. 2009. Introduction to meta-analysis. Chichester, U.K.: John Wiley & Sons.

Boutron, I., F. Tubach, B. Giraudeau, and P. Ravaud. 2003. "Methodological differences in clinical trials evaluating nonpharmacological and pharmacological treatments of hip and knee osteoarthritis." JAMA no. 290 (8):1062-70.

———. 2004. "Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials." Journal of Clinical Epidemiology no. 57 (6):543-50.

Briggle, Adam, and Carl Mitcham. Ethics and science : an introduction, Cambridge applied ethics.

Britton, A., M. McKee, N. Black, K. McPherson, C. Sanderson, and C. Bain. 1998. "Choosing between randomised and non-randomised studies: a systematic review." Health Technology Assessment no. 2 (13):i-iv, 1-124.

Campbell, Angela J., Anita Bagley, Ann Van Heest, and Michelle A. James. 2010. "Challenges of randomized controlled surgical trials." Orthopedic Clinics of North America no. 41 (2):145-55.

Canadian Cancer Society’s Steering Committee on Cancer Statistics. 2012. Canadian Cancer Statistics 2012. Toronto, ON: Canadian Cancer Society.

CASS. 1984. "Coronary artery surgery study (CASS): a randomized trial of coronary artery bypass surgery. Comparability of entry characteristics and survival in randomized patients and nonrandomized patients meeting randomization criteria." Journal of the American College of Cardiology no. 3 (1):114-28.

Chan, A. W., A. Hrobjartsson, M. T. Haahr, P. C. Gotzsche, and D. G. Altman. 2004. "Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles." JAMA no. 291 (20):2457-65.

Chang, D. C., S. L. Matsen, and C. E. Simpkins. 2006. "Why should surgeons care about clinical research methodology?" Journal of the American College of Surgeons no. 203 (6):827-30.

Chaudhry, H., R. Mundi, I. Singh, T. A. Einhorn, and M. Bhandari. 2008. "How good is the orthopaedic literature?" Indian Journal of Orthopaedics no. 42 (2):144-9.

Choi, B. C., and A. L. Noseworthy. 1992. "Classification, direction, and prevention of bias in epidemiologic research." Journal of Occupational Medicine no. 34 (3):265-71.

Cobb, L. A., G. I. Thomas, D. H. Dillard, K. A. Merendino, and R. A. Bruce. 1959. "An evaluation of internal-mammary-artery ligation by a double-blind technic." New England Journal of Medicine no. 260 (22):1115-8.

Compton, C. C., L. P. Fielding, L. J. Burgart, B. Conley, H. S. Cooper, S. R. Hamilton, M. E. Hammond, D. E. Henson, R. V. Hutter, R. B. Nagle, M. L. Nielsen, D. J. Sargent, C. R. Taylor, M. Welton, and C. Willett. 2000. "Prognostic factors in colorectal cancer. College of American Pathologists Consensus Statement 1999." Archives of Pathology and Laboratory Medicine no. 124 (7):979-94.

Concato, J., N. Shah, and R. I. Horwitz. 2000. "Randomized, controlled trials, observational studies, and the hierarchy of research designs." New England Journal of Medicine no. 342 (25):1887-92.

Cook, J. A. 2009. "The challenges faced in the design, conduct and analysis of surgical randomised controlled trials." Trials no. 10:9.

Crowe, M., and L. Sheppard. 2011. "A review of critical appraisal tools show they lack rigor: Alternative tool structure is proposed." Journal of Clinical Epidemiology no. 64 (1):79-89.

Curry, J. I., B. Reeves, and M. D. Stringer. 2003. "Randomized controlled trials in pediatric surgery: could we do better?" Journal of Pediatric Surgery no. 38 (4):556-9.

Davies, H. T., I. K. Crombie, and M. Tavakoli. 1998. "When can odds ratios mislead?" BMJ no. 316 (7136):989-91.

Dechartres, Agnes, Isabelle Boutron, Ludovic Trinquart, Pierre Charles, and Philippe Ravaud. 2011. "Single-center trials show larger treatment effects than multicenter trials: evidence from a meta-epidemiologic study." Annals of Internal Medicine no. 155 (1):39-51.

Deeks, J. J. 2002. "Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes." Statistics in Medicine no. 21 (11):1575-600.

Deeks, J. J., J. Dinnes, R. D'Amico, A. J. Sowden, C. Sakarovitch, F. Song, M. Petticrew, D. G. Altman, Group International Stroke Trial Collaborative, and Group European Carotid Surgery Trial Collaborative. 2003. "Evaluating non-randomised intervention studies." Health Technology Assessment no. 7 (27):iii-x.

Delgado-Rodriguez, M., and J. Llorca. 2004. "Bias." Journal of Epidemiology and Community Health no. 58 (8):635-41.

DerSimonian, R., and N. Laird. 1986. "Meta-analysis in clinical trials." Controlled Clinical Trials no. 7 (3):177-88.

Dimond, E. G., C. F. Kittle, and J. E. Crockett. 1960. "Comparison of internal mammary artery ligation and sham operation for angina pectoris." American Journal of Cardiology no. 5:483-6.

Downs, S. H., and N. Black. 1998. "The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions." Journal of Epidemiology and Community Health no. 52 (6):377-84.

Dwan, K., D. G. Altman, J. A. Arnaiz, J. Bloom, A. W. Chan, E. Cronin, E. Decullier, P. J. Easterbrook, E. Von Elm, C. Gamble, D. Ghersi, J. P. Ioannidis, J. Simes, and P. R. Williamson. 2008. "Systematic review of the empirical evidence of study publication bias and outcome reporting bias." PloS One no. 3 (8):e3081.

Edge, Stephen B., and American Joint Committee on Cancer. 2010. AJCC cancer staging manual. 7th ed. New York: Springer.

Egger, M., P. Juni, C. Bartlett, F. Holenstein, and J. Sterne. 2003. "How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study." Health Technology Assessment no. 7 (1):1-76.

Ergina, Patrick L., Jonathan A. Cook, Jane M. Blazeby, Isabelle Boutron, Pierre-Alain Clavien, Barnaby C. Reeves, and Christoph M. Seiler. 2009. "Challenges in evaluating surgical innovation." The Lancet no. 374 (9695):1097-1104.

Ernst, E., and M. H. Pittler. 2001. "Assessment of therapeutic safety in systematic reviews: literature review." BMJ no. 323 (7312):546.

Evans, D. 2003. "Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions." Journal of Clinical Nursing no. 12 (1):77-84.

Evans, M., and A. V. Pollock. 1985. "A score system for evaluating random control clinical trials of prophylaxis of abdominal surgical wound infection." British Journal of Surgery no. 72 (4):256-60.

Falk, P. M., R. W. Beart Jr, S. D. Wexner, A. G. Thorson, D. G. Jagelman, I. C. Lavery, O. B. Johansen, and R. J. Fitzgibbons Jr. 1993. "Laparoscopic colectomy: A critical appraisal." Diseases of the Colon and Rectum no. 36 (1):28-34.

Farrokhyar, F., P. J. Karanicolas, A. Thoma, M. Simunovic, M. Bhandari, P. J. Devereaux, M. Anvari, A. Adili, and G. Guyatt. 2010. "Randomized controlled trials of surgical interventions." Annals of Surgery no. 251 (3):409-16.

Ferlay, J., H. R. Shin, F. Bray, D. Forman, C. Mathers, and D. M. Parkin. 2010. "Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008." International Journal of Cancer no. 127 (12):2893-917.

Fletcher, J. 2007. "What is heterogeneity and is it important?" BMJ no. 334 (7584):94-6.

Fletcher, Robert H., and Suzanne W. Fletcher. 2005. Clinical epidemiology: the essentials. 4th ed. Philadelphia: Lippincott Williams & Wilkins.

Fowler, D. L., and S. A. White. 1991. "Laparoscopy-assisted sigmoid resection." Surgical Laparoscopy and Endoscopy no. 1 (3):183-8.

Frazier, S. K., and G. J. Skinner. 2008. "Pulmonary artery catheters: state of the controversy." Journal of Cardiovascular Nursing no. 23 (2):113-21; quiz 122-3.

Freed, C. R., P. E. Greene, R. E. Breeze, W. Y. Tsai, W. DuMouchel, R. Kao, S. Dillon, H. Winfield, S. Culver, J. Q. Trojanowski, D. Eidelberg, and S. Fahn. 2001. "Transplantation of embryonic dopamine neurons for severe Parkinson's disease." New England Journal of Medicine no. 344 (10):710-9.

Friedrich, J. O., N. K. Adhikari, and J. Beyene. 2011. "Ratio of means for analyzing continuous outcomes in meta-analysis performed as well as mean difference methods." Journal of Clinical Epidemiology no. 64 (5):556-64.

Furlan, Andrea D. 2006. Non-randomized studies: an evaluation of search strategies, taxonomy and comparative effectiveness with randomized trials in the field of low-back pain [dissertation], University of Toronto.

GA Wells, B Shea, D O'Connell, J Peterson, V Welch, M Losos, P Tugwell. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp. Accessed July 2, 2011.

Gagliardi, A. R., M. C. Brouwers, V. A. Palda, L. Lemieux-Charles, and J. M. Grimshaw. 2011. "How can we improve guideline use? A conceptual framework of implementability." Implementation science : IS no. 6:26.

Garas, G., A. Ibrahim, H. Ashrafian, K. Ahmed, V. Patel, K. Okabayashi, P. Skapinakis, A. Darzi, and T. Athanasiou. 2012. "Evidence-based surgery: barriers, solutions, and the role of evidence synthesis." World Journal of Surgery no. 36 (8):1723-31.

Gelman, A., and D. B. Rubin. 1996. "Markov chain Monte Carlo methods in biostatistics." Statistical Methods in Medical Research no. 5 (4):339-55.

Ghaferi, A. A., J. D. Birkmeyer, and J. B. Dimick. 2009. "Variation in hospital mortality associated with inpatient surgery." New England Journal of Medicine no. 361 (14):1368-75.

Goodman, S., and K. Dickersin. 2011. "Metabias: a challenge for comparative effectiveness research." Annals of Internal Medicine no. 155 (1):61-2.

Graham, I. D., J. Logan, M. B. Harrison, S. E. Straus, J. Tetroe, W. Caswell, and N. Robinson. 2006. "Lost in knowledge translation: time for a map?" Journal of Continuing Education in the Health Professions no. 26 (1):13-24.

Gray, R., M. Sullivan, D. G. Altman, and A. N. Gordon-Weeks. 2012. "Adherence of trials of operative intervention to the CONSORT statement extension for non-pharmacological treatments: a comparative before and after study." Annals of the Royal College of Surgeons of England no. 94 (6):388-94.

Greenland, S., and K. O'Rourke. 2001. "On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions." Biostatistics no. 2 (4):463-71.

Grimes, D. A., and K. F. Schulz. 2002. "Bias and causal associations in observational research." Lancet no. 359 (9302):248-52.

———. 2008. "Making sense of odds and odds ratios." Obstetrics and Gynecology no. 111 (2 Pt 1):423-6.

Grodstein, F., M. J. Stampfer, J. E. Manson, G. A. Colditz, W. C. Willett, B. Rosner, F. E. Speizer, and C. H. Hennekens. 1996. "Postmenopausal estrogen and progestin use and the risk of cardiovascular disease." New England Journal of Medicine no. 335 (7):453-61.

Gross, D. E., S. L. Brenner, I. Esformes, and M. L. Gross. 1991. "Arthroscopic treatment of degenerative joint disease of the knee." Orthopedics no. 14 (12):1317-21.

Grossman, J., and F. J. Mackenzie. 2005. "The randomized controlled trial: gold standard, or merely standard?" Perspectives in Biology and Medicine no. 48 (4):516-34.

Guillou, P. J., P. Quirke, H. Thorpe, J. Walker, D. G. Jayne, A. M. H. Smith, R. M. Heath, and J. M. Brown. 2005. "Short-term endpoints of conventional versus laparoscopic-assisted surgery in patients with colorectal cancer (MRC CLASICC trial): Multicentre, randomised controlled trial." Lancet no. 365 (9472):1718-1726.

Gurwitz, J. H., N. F. Col, and J. Avorn. 1992. "The exclusion of the elderly and women from clinical trials in acute myocardial infarction." JAMA no. 268 (11):1417-22.

Guyatt, G. H., A. D. Oxman, R. Kunz, G. E. Vist, Y. Falck-Ytter, H. J. Schunemann, and Grade Working Group. 2008. "What is "quality of evidence" and why is it important to clinicians?" BMJ no. 336 (7651):995-8.

Hadorn, D. C., D. Baker, J. S. Hodges, and N. Hicks. 1996. "Rating the quality of evidence for clinical practice guidelines." Journal of Clinical Epidemiology no. 49 (7):749-54.

Hall, J. C., B. Mills, H. Nguyen, and J. L. Hall. 1996. "Methodologic standards in surgical trials." Surgery no. 119 (4):466-72.

Hannan, E. L. 2008. "Randomized clinical trials and observational studies: guidelines for assessing respective strengths and limitations." JACC: Cardiovascular Interventions no. 1 (3):211-7.

Harrison, J. D., M. J. Solomon, J. M. Young, A. Meagher, G. Hruby, G. Salkeld, and S. Clarke. 2007. "Surgical and oncology trials for rectal cancer: who will participate?" Surgery no. 142 (1):94-101.

Hartling, L., K. Bond, K. Harvey, P. L. Santaguida, M. Viswanathan, and D. M. Dryden. 2010. Developing and Testing a Tool for the Classification of Study Designs in Systematic Reviews of Interventions and Exposures. Edited by AHRQ. Rockville MD.

Hartling, L., M. P. Hamm, A. Milne, B. Vandermeer, P. L. Santaguida, M. Ansari, A. Tsertsvadze, S. Hempel, P. Shekelle, and D. M. Dryden. 2012. "Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs." Journal of Clinical Epidemiology.

Hartling, Lisa, Kenneth Bond, P. Lina Santaguida, Meera Viswanathan, and Donna M. Dryden. 2011. "Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy." Journal of Clinical Epidemiology no. 64 (8):861-71.

Hartling, Lisa, Maria Ospina, Yuanyuan Liang, Donna M. Dryden, Nicola Hooton, Jennifer Krebs Seida, and Terry P. Klassen. 2009. "Risk of bias versus quality assessment of randomised controlled trials: cross sectional study." BMJ no. 339:b4012.

Herbison, P., J. Hay-Smith, and W. J. Gillespie. 2006. "Adjustment of meta-analyses on the basis of quality scores should be abandoned." Journal of Clinical Epidemiology no. 59 (12):1249-56.

Hewett, Peter J., Randall A. Allardyce, Philip F. Bagshaw, Christopher M. Frampton, Francis A. Frizelle, Nicholas A. Rieger, J. Shona Smith, Michael J. Solomon, Jacqueline H. Stephens, and Andrew R. L. Stevenson. 2008. "Short-term outcomes of the Australasian randomized clinical study comparing laparoscopic and conventional open surgical treatments for colon cancer: the ALCCaS trial." Annals of Surgery no. 248 (5):728-38.

Higgins, J. P., D. G. Altman, P. C. Gotzsche, P. Juni, D. Moher, A. D. Oxman, J. Savovic, K. F. Schulz, L. Weeks, J. A. Sterne, Group Cochrane Bias Methods, and Group Cochrane Statistical Methods. 2011. "The Cochrane Collaboration's tool for assessing risk of bias in randomised trials." BMJ no. 343 (oct18 2):d5928.

Higgins, J. P., and S. G. Thompson. 2004. "Controlling the risk of spurious findings from meta-regression." Statistics in Medicine no. 23 (11):1663-82.

Higgins, Julian P. T., Sally Green, and Cochrane Collaboration. 2011. Cochrane handbook for systematic reviews of interventions, Cochrane book series. Chichester, England ; Hoboken, NJ: Wiley-Blackwell.

Higgins, Julian PT, Craig Ramsay, Barnaby C Reeves, Jonathan J Deeks, Beverley Shea, Jeffrey C Valentine, Peter Tugwell, and George Wells. 2013. "Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions." Research Synthesis Methods no. 4 (1):12-25.

Hill, C. L., M. P. LaValley, and D. T. Felson. 2002. "Discrepancy between published report and actual conduct of randomized clinical trials." Journal of Clinical Epidemiology no. 55 (8):783-6.

Howes, N., L. Chagla, M. Thorpe, and P. McCulloch. 1997. "Surgical practice is evidence based." British Journal of Surgery no. 84 (9):1220-3.

Hozo, S. P., B. Djulbegovic, and I. Hozo. 2005. "Estimating the mean and variance from the median, range, and the size of a sample." BMC Medical Research Methodology no. 5:13.

Hrobjartsson, A., A. S. Thomsen, F. Emanuelsson, B. Tendal, J. Hilden, I. Boutron, P. Ravaud, and S. Brorson. 2012. "Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors." BMJ no. 344:e1119.

———. 2013. "Observer bias in randomized clinical trials with measurement scale outcomes: a systematic review of trials with both blinded and nonblinded assessors." CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne no. 185 (4):E201-11.

Hutchins, L. F., J. M. Unger, J. J. Crowley, C. A. Coltman, Jr., and K. S. Albain. 1999. "Underrepresentation of patients 65 years of age or older in cancer-treatment trials." New England Journal of Medicine no. 341 (27):2061-7.

Ingraham, A. M., B. Haas, M. E. Cohen, C. Y. Ko, and A. B. Nathens. 2012. "Comparison of hospital performance in trauma vs emergency and elective general surgery: implications for acute care surgery quality improvement." Archives of Surgery no. 147 (7):591-8.

Ioannidis, J. P., and T. A. Trikalinos. 2007. "The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey." CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne no. 176 (8):1091-6.

Jackson, H. H., J. D. Jackson, S. J. Mulvihill, M. A. Firpo, and R. E. Glasgow. 2004. "Trends in research support and productivity in the changing environment of academic surgery." Journal of Surgical Research no. 116 (2):197-201.

Jacobs, M., J. C. Verdeja, and H. S. Goldstein. 1991. "Minimally invasive colon resection (laparoscopic colectomy)." Surgical Laparoscopy and Endoscopy no. 1 (3):144-50.

Jacquier, I., I. Boutron, D. Moher, C. Roy, and P. Ravaud. 2006. "The reporting of randomized clinical trials using a surgical intervention is in need of immediate improvement: a systematic review." Annals of Surgery no. 244 (5):677-83.

Jadad, A. R., R. A. Moore, D. Carroll, C. Jenkinson, D. J. Reynolds, D. J. Gavaghan, and H. J. McQuay. 1996. "Assessing the quality of reports of randomized clinical trials: is blinding necessary?" Controlled Clinical Trials no. 17 (1):1-12.

Jha, P., M. Flather, E. Lonn, M. Farkouh, and S. Yusuf. 1995. "The antioxidant vitamins and cardiovascular disease. A critical review of epidemiologic and clinical trial data." Annals of Internal Medicine no. 123 (11):860-72.

Juni, P., F. Holenstein, J. Sterne, C. Bartlett, and M. Egger. 2002. "Direction and impact of language bias in meta-analyses of controlled trials: empirical study." International Journal of Epidemiology no. 31 (1):115-23.

Juni, P., A. Witschi, R. Bloch, and M. Egger. 1999. "The hazards of scoring the quality of clinical trials for meta-analysis." JAMA no. 282 (11):1054-60.

Kallmes, D. F., B. A. Comstock, P. J. Heagerty, J. A. Turner, D. J. Wilson, T. H. Diamond, R. Edwards, L. A. Gray, L. Stout, S. Owen, W. Hollingworth, B. Ghdoke, D. J. Annesley-Williams, S. H. Ralston, and J. G. Jarvik. 2009. "A randomized trial of vertebroplasty for osteoporotic spinal fractures." New England Journal of Medicine no. 361 (6):569-79.

Karanicolas, P. J., F. Farrokhyar, and M. Bhandari. 2010. "Practical tips for surgical research: blinding: who, what, when, why, how?" Canadian Journal of Surgery no. 53 (5):345-8.

Katrak, P., A. E. Bialocerkowski, N. Massy-Westropp, S. Kumar, and K. A. Grimmer. 2004. "A systematic review of the content of critical appraisal tools." BMC Medical Research Methodology no. 4:22.

Kazemier, G., H. J. Bonjer, F. J. Berends, and J. F. Lange. 1995. "Port site metastases after laparoscopic colorectal surgery for cure of malignancy." British Journal of Surgery no. 82 (8):1141-2.

Kelly, J., A. Rudd, R. R. Lewis, and B. J. Hunt. 2001. "Screening for subclinical deep-vein thrombosis." QJM no. 94 (10):511-9.

Kelly, M., L. Sharp, F. Dwane, T. Kelleher, and H. Comber. 2012. "Factors predicting hospital length-of-stay and readmission after colorectal resection: a population-based study of elective and emergency admissions." BMC Health Services Research no. 12:77.

Kirchhoff, P., P. A. Clavien, and D. Hahnloser. 2010. "Complications in colorectal surgery: risk factors and preventive strategies." Patient Safety in Surgery no. 4 (1):5.

Kirkham, J. J., K. M. Dwan, D. G. Altman, C. Gamble, S. Dodd, R. Smyth, and P. R. Williamson. 2010. "The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews." BMJ no. 340:c365.

Kjaergard, L. L., J. Villumsen, and C. Gluud. 2001. "Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses." Annals of Internal Medicine no. 135 (11):982-9.

Kleinbaum, D. G., H. Morgenstern, and L. L. Kupper. 1981. "Selection bias in epidemiologic studies." American Journal of Epidemiology no. 113 (4):452-63.

Konrat, C., I. Boutron, L. Trinquart, G. R. Auleley, P. Ricordeau, and P. Ravaud. 2012. "Underrepresentation of elderly people in randomised controlled trials. The example of trials of 4 widely prescribed drugs." PloS One no. 7 (3):e33559.

Krieger, N., I. Lowy, R. Aronowitz, J. Bigby, K. Dickersin, E. Garner, J. P. Gaudilliere, C. Hinestrosa, R. Hubbard, P. A. Johnson, S. A. Missmer, J. Norsigian, C. Pearson, C. E. Rosenberg, L. Rosenberg, B. G. Rosenkrantz, B. Seaman, C. Sonnenschein, A. M. Soto, J. Thornton, and G. Weisz. 2005. "Hormone replacement therapy, cancer, controversies, and women's health: historical, epidemiological, biological, clinical, and advocacy perspectives." Journal of Epidemiology and Community Health no. 59 (9):740-8.

Kuhry, E., W. F. Schwenk, R. Gaupset, U. Romild, and H. J. Bonjer. 2008. "Long-term results of laparoscopic colorectal cancer resection." Cochrane Database of Systematic Reviews (2):CD003432.

Kunz, R., G. Vist, and A. D. Oxman. 2007. "Randomisation to protect against selection bias in healthcare trials." Cochrane Database of Systematic Reviews (2):MR000012.

Lacy, A. M., J. C. Garcia-Valdecasas, J. M. Pique, S. Delgado, E. Campo, J. M. Bordas, P. Taura, L. Grande, J. Fuster, J. L. Pacheco, et al. 1995. "Short-term outcome analysis of a randomized study comparing laparoscopic vs open colectomy for colon cancer." Surgical Endoscopy no. 9 (10):1101-5.

Landis, J. R., and G. G. Koch. 1977. "The measurement of observer agreement for categorical data." Biometrics no. 33 (1):159-74.

Lassen, K., A. Hvarphiye, and T. Myrmel. 2012. "Randomised trials in surgery: the burden of evidence." Reviews on Recent Clinical Trials no. 7 (3):244-8.

Lee, P. Y., K. P. Alexander, B. G. Hammill, S. K. Pasquali, and E. D. Peterson. 2001. "Representation of elderly persons and women in published randomized trials of acute coronary syndromes." JAMA no. 286 (6):708-13.

Legare, F., D. Stacey, I. D. Graham, G. Elwyn, P. Pluye, M. P. Gagnon, D. Frosch, M. B. Harrison, J. Kryworuchko, S. Pouliot, and S. Desroches. 2008. "Advancing theories, models and measurement for an interprofessional approach to shared decision making in primary care: a study protocol." BMC Health Services Research no. 8:2.

Leung, E., A. M. Ferjani, N. Stellard, and L. S. Wong. 2009. "Predicting post-operative mortality in patients undergoing colorectal surgery using P-POSSUM and CR-POSSUM scores: a prospective study." International Journal of Colorectal Disease no. 24 (12):1459-64.

Lewis, J. H., M. L. Kilgore, D. P. Goldman, E. L. Trimble, R. Kaplan, M. J. Montello, M. G. Housman, and J. J. Escarce. 2003. "Participation of patients 65 years of age or older in cancer clinical trials." Journal of Clinical Oncology no. 21 (7):1383-9.

Livesley, P. J., M. Doherty, M. Needoff, and A. Moulton. 1991. "Arthroscopic lavage of osteoarthritic knees." Journal of Bone and Joint Surgery (British Volume) no. 73 (6):922-6.

Loke, Y. K., D. Price, A. Herxheimer, and Cochrane Adverse Effects Methods Group. 2007. "Systematic reviews of adverse effects: framework for a structured approach." BMC Medical Research Methodology no. 7:32.

Lunn, D., D. Spiegelhalter, A. Thomas, and N. Best. 2009. "The BUGS project: Evolution, critique and future directions." Statistics in Medicine no. 28 (25):3049-67.

Martel, G., and R. P. Boushey. 2006. "Laparoscopic colon surgery: past, present and future." Surgical Clinics of North America no. 86 (4):867-97.

Marti-Carvajal, A. J., I. Sola, D. Lathyris, and A. F. Cardona. 2012. "Human recombinant activated protein C for severe sepsis." Cochrane Database of Systematic Reviews no. 3:CD004388.

Marubashi, S., H. Yano, T. Monden, T. Hata, H. Takahashi, S. Fujita, T. Kanoh, T. Iwazawa, S. Matsui, Y. Nakano, H. Tateishi, M. Kinuta, S. Takiguchi, and J. Okamura. 2000. "The usefulness, indications, and complications of laparoscopy-assisted colectomy in comparison with those of open colectomy for colorectal carcinoma." Surgery Today no. 30 (6):491-6.

Masoudi, F. A., E. P. Havranek, P. Wolfe, C. P. Gross, S. S. Rathore, J. F. Steiner, D. L. Ordin, and H. M. Krumholz. 2003. "Most hospitalized older persons do not meet the enrollment criteria for clinical trials in heart failure." American Heart Journal no. 146 (2):250-7.

Mathieu, S., I. Boutron, D. Moher, D. G. Altman, and P. Ravaud. 2009. "Comparison of registered and published primary outcomes in randomized controlled trials." JAMA no. 302 (9):977-84.

McCulloch, P., D. G. Altman, W. B. Campbell, D. R. Flum, P. Glasziou, J. C. Marshall, J. Nicholl, J. K. Aronson, J. S. Barkun, J. M. Blazeby, I. C. Boutron, P. A. Clavien, J. A. Cook, P. L. Ergina, L. S. Feldman, G. J. Maddern, B. C. Reeves, C. M. Seiler, S. M. Strasberg, J. L. Meakins, D. Ashby, N. Black, J. Bunker, M. Burton, M. Campbell, K. Chalkidou, I. Chalmers, M. de Leval, J. Deeks, A. Grant, M. Gray, R. Greenhalgh, M. Jenicek, S. Kehoe, R. Lilford, P. Littlejohns, Y. Loke, R. Madhock, K. McPherson, J. Meakins, P. Rothwell, B. Summerskill, D. Taggart, P. Tekkis, M. Thompson, T. Treasure, U. Trohler, and J. Vandenbroucke. 2009. "No surgical innovation without evaluation: the IDEAL recommendations." Lancet no. 374 (9695):1105-12.

McCulloch, P., A. Kaul, G. F. Wagstaff, and J. Wheatcroft. 2005. "Tolerance of uncertainty, extroversion, neuroticism and attitudes to randomized controlled trials among surgeons and physicians." British Journal of Surgery no. 92 (10):1293-7.

McDonald, A. M., R. C. Knight, M. K. Campbell, V. A. Entwistle, A. M. Grant, J. A. Cook, D. R. Elbourne, D. Francis, J. Garcia, I. Roberts, and C. Snowdon. 2006. "What influences recruitment to randomised controlled trials? A review of trials funded by two UK funding agencies." Trials no. 7:9.

McIntosh, M. W. 1996. "The population risk as an explanatory variable in research synthesis of clinical trials." Statistics in Medicine no. 15 (16):1713-28.

McLaren, A. C., C. P. Blokker, P. J. Fowler, J. N. Roth, and M. G. Rock. 1991. "Arthroscopic debridement of the knee for osteoarthrosis." Canadian Journal of Surgery no. 34 (6):595-8.

McLeod, R. S. 1999. "Issues in surgical randomized controlled trials." World Journal of Surgery no. 23 (12):1210-4.

McRae, C., E. Cherin, T. G. Yamazaki, G. Diem, A. H. Vo, D. Russell, J. H. Ellgring, S. Fahn, P. Greene, S. Dillon, H. Winfield, K. B. Bjugstad, and C. R. Freed. 2004. "Effects of perceived treatment on quality of life and medical outcomes in a double-blind placebo surgery trial." Archives of General Psychiatry no. 61 (4):412-20.

Meakins, J. L. 2002. "Innovation in surgery: the rules of evidence." American Journal of Surgery no. 183 (4):399-405.

Mhaskar, R., B. Djulbegovic, A. Magazin, H. P. Soares, and A. Kumar. 2012. "Published methodological quality of randomized controlled trials does not reflect the actual quality assessed in protocols." Journal of Clinical Epidemiology no. 65 (6):602-9.

Miles, Matthew B., and A. M. Huberman. 1994. Qualitative data analysis : an expanded sourcebook. 2nd ed. Thousand Oaks: Sage Publications.

Mills, N., J. L. Donovan, M. Smith, A. Jacoby, D. E. Neal, and F. C. Hamdy. 2003. "Perceptions of equipoise are crucial to trial participation: a qualitative study of men in the ProtecT study." Controlled Clinical Trials no. 24 (3):272-82.

Moher, D., B. Pham, A. Jones, D. J. Cook, A. R. Jadad, M. Moher, P. Tugwell, and T. P. Klassen. 1998. "Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses?" Lancet no. 352 (9128):609-13.

Moher, D., K. F. Schulz, D. Altman, and CONSORT Group. 2001. "The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials." JAMA no. 285 (15):1987-91.

Montorsi, M., U. Fumagalli, R. Rosati, S. Bona, B. Chella, and C. Huscher. 1995. "Early parietal recurrence of adenocarcinoma of the colon after laparoscopic colectomy." British Journal of Surgery no. 82 (8):1036-7.

Montreuil, B., Y. Bendavid, and J. Brophy. 2005. "What is so odd about odds?" Canadian Journal of Surgery no. 48 (5):400-8.

Morton, S. C., J. L. Adams, M. J. Suttorp, and P. G. Shekelle. 2004. Meta-regression Approaches: What, Why, When, and How? Rockville MD.

Moseley, J. B., K. O'Malley, N. J. Petersen, T. J. Menke, B. A. Brody, D. H. Kuykendall, J. C. Hollingsworth, C. M. Ashton, and N. P. Wray. 2002. "A controlled trial of arthroscopic surgery for osteoarthritis of the knee." New England Journal of Medicine no. 347 (2):81-8.

National Heart, Lung, and Blood Institute Acute Respiratory Distress Syndrome Clinical Trials Network, A. P. Wheeler, G. R. Bernard, B. T. Thompson, D. Schoenfeld, H. P. Wiedemann, B. deBoisblanc, A. F. Connors, Jr., R. D. Hite, and A. L. Harabin. 2006. "Pulmonary-artery versus central venous catheter to guide treatment of acute lung injury." New England Journal of Medicine no. 354 (21):2213-24.

Nduka, C. C., J. R. Monson, N. Menzies-Gow, and A. Darzi. 1994. "Abdominal wall metastases following laparoscopy." British Journal of Surgery no. 81 (5):648-52.

Nelson, H., N. Petrelli, A. Carlin, J. Couture, J. Fleshman, J. Guillem, B. Miedema, D. Ota, D. Sargent, and National Cancer Institute Expert Panel. 2001. "Guidelines 2000 for colon and rectal cancer surgery." Journal of the National Cancer Institute no. 93 (8):583-96.

Nelson, H., D. J. Sargent, H. S. Wieand, J. Fleshman, M. Anvari, S. J. Stryker, R. W. Beart Jr, M. Hellinger, R. Flanagan Jr, W. Peters, and D. Ota. 2004. "A Comparison of Laparoscopically Assisted and Open Colectomy for Colon Cancer." New England Journal of Medicine no. 350 (20):2050-2059+2114.

Neumayer, L. A., A. A. Gawande, J. Wang, A. Giobbie-Hurder, K. M. Itani, R. J. Fitzgibbons, Jr., D. Reda, O. Jonasson, and C. S. P. Investigators. 2005. "Proficiency of surgeons in inguinal hernia repair: effect of experience and age." Annals of Surgery no. 242 (3):344-8; discussion 348-52.

NHSN Patient Safety Component Manual. National Healthcare Safety Network. Centers for Disease Control and Prevention. 2012. [cited November 15 2012]. Available from http://www.cdc.gov/nhsn/toc_pscmanual.html.

Nicolaides, K., L. Brizot Mde, F. Patel, and R. Snijders. 1994. "Comparison of chorionic villus sampling and amniocentesis for fetal karyotyping at 10-13 weeks' gestation." Lancet no. 344 (8920):435-9.

Norris, S. L., H. K. Holmer, L. A. Ogden, R. Fu, A. M. Abou-Setta, M. S. Viswanathan, and M. L. McPheeters. 2012. Selective Outcome Reporting as a Source of Bias in Reviews of Comparative Effectiveness. Rockville MD.

Nuesch, Eveline, Stephan Reichenbach, Sven Trelle, Anne W. S. Rutjes, Katharina Liewald, Rebekka Sterchi, Douglas G. Altman, and Peter Juni. 2009. "The importance of allocation concealment and patient blinding in osteoarthritis trials: a meta-epidemiologic study." Arthritis and Rheumatism no. 61 (12):1633-41.

Nuesch, Eveline, Sven Trelle, Stephan Reichenbach, Anne W. S. Rutjes, Elizabeth Burgi, Martin Scherer, Douglas G. Altman, and Peter Juni. 2009. "The effects of excluding patients from the analysis in randomised controlled trials: meta-epidemiological study." BMJ no. 339:b3244.

Nuesch, Eveline, Sven Trelle, Stephan Reichenbach, Anne W. S. Rutjes, Beatrice Tschannen, Douglas G. Altman, Matthias Egger, and Peter Juni. 2010. "Small study effects in meta-analyses of osteoarthritis trials: meta-epidemiological study." BMJ no. 341:c3515.

Olivo, S. A., L. G. Macedo, I. C. Gadotti, J. Fuentes, T. Stanton, and D. J. Magee. 2008. "Scales to assess the quality of randomized controlled trials: a systematic review." Physical Therapy no. 88 (2):156-75.

Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang, and M. Helfand. 2010. "AHRQ series paper 5: grading the strength of a body of evidence when comparing medical interventions--agency for healthcare research and quality and the effective health-care program." Journal of Clinical Epidemiology no. 63 (5):513-23.

Oxford Centre for Evidence-Based Medicine. 2011. Oxford Centre for Evidence-Based Medicine - Levels of evidence (March 2009) 2009 [cited March 1 2011]. Available from http://www.cebm.net/?o=1025

Pagano, Marcello, and Kimberlee Gauvreau. 2000. Principles of biostatistics. 2nd ed. 1 vols. Pacific Grove, CA: Duxbury.

Panesar, S. S., R. Thakrar, T. Athanasiou, and A. Sheikh. 2006. "Comparison of reports of randomized controlled trials and systematic reviews in surgical journals: literature review." Journal of the Royal Society of Medicine no. 99 (9):470-2.

Pannucci, C. J., and E. G. Wilkins. 2010. "Identifying and avoiding bias in research." Plastic and Reconstructive Surgery no. 126 (2):619-25.

Pildal, J., A. Hrobjartsson, K. J. Jorgensen, J. Hilden, D. G. Altman, and P. C. Gotzsche. 2007. "Impact of allocation concealment on conclusions drawn from meta-analyses of randomized trials.[Erratum appears in Int J Epidemiol. 2008 Apr;37(2):422]." International Journal of Epidemiology no. 36 (4):847-57.

Pyorala, S., N. P. Huttunen, and M. Uhari. 1995. "A review and meta-analysis of hormonal treatment of cryptorchidism." Journal of Clinical Endocrinology and Metabolism no. 80 (9):2795-9.

Rahima Nenshi, Nancy Baxter, Erin Kennedy, Susan E. Schultz, Nadia Gunraj, Andrew S. Wilton, David R. Urbach, and Marko Simunovic. 2008. "Surgery for Colon Cancer." In Cancer Surgery in Ontario: ICES Atlas, edited by M. Simunovic, D. R. Urbach, and S. E. Schultz. Toronto: Institute for Clinical Evaluative Sciences.

Reeves B.C., Deeks J.J., Higgins J.P.T., and Wells G.A. 2011. "Chapter 13: Including non-randomized studies. In: Higgins JPT, Green S (editors), Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011)." In: The Cochrane Collaboration. www.cochrane-handbook.org.

Reeves, B.C., J.P.T. Higgins, C. Ramsay, B. Shea, P. Tugwell, and G.A. Wells. 2013a. "An introduction to methodological issues when including non-randomised studies in systematic reviews on the effects of interventions." Research Synthesis Methods no. 4 (1):1-11.

Reeves, Barnaby C, Julian PT Higgins, Craig Ramsay, Beverley Shea, Peter Tugwell, and George A. Wells. 2013b. "An introduction to methodological issues when including non-randomised studies in systematic reviews on the effects of interventions." Research Synthesis Methods no. 4 (1):1-11.

Reilly, W. T., H. Nelson, G. Schroeder, H. S. Wieand, J. Bolton, and M. J. O'Connell. 1996. "Wound recurrence following conventional treatment of colorectal cancer. A rare but perhaps underestimated problem." Diseases of the Colon and Rectum no. 39 (2):200-7.

Reimold, S. C., T. C. Chalmers, J. A. Berlin, and E. M. Antman. 1992. "Assessment of the efficacy and safety of antiarrhythmic therapy for chronic atrial fibrillation: observations on the role of trial design and implications of drug-related mortality." American Heart Journal no. 124 (4):924-32.

RMITG. 1994. "Worldwide collaborative observational study and meta-analysis on allogenic leukocyte immunotherapy for recurrent spontaneous abortion. Recurrent Miscarriage Immunotherapy Trialists Group." American Journal of Reproductive Immunology no. 32 (2):55-72.

Ross, S., A. Grant, C. Counsell, W. Gillespie, I. Russell, and R. Prescott. 1999. "Barriers to participation in randomised controlled trials: a systematic review." Journal of Clinical Epidemiology no. 52 (12):1143-56.

Sackett, D. L. 1979. "Bias in analytic research." Journal of Chronic Diseases no. 32 (1-2):51-63.

Sackett, David L. 1991. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston: Little, Brown.

Sanderson, S., I. D. Tatt, and J. P. Higgins. 2007. "Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography." International Journal of Epidemiology no. 36 (3):666-76.

Saunders, L. D., G. M. Soomro, J. Buckingham, G. Jamtvedt, and P. Raina. 2003. "Assessing the methodological quality of nonrandomized intervention studies." Western Journal of Nursing Research no. 25 (2):223-37.

Savovic, J., H. E. Jones, D. G. Altman, R. J. Harris, P. Juni, J. Pildal, B. Als-Nielsen, E. M. Balk, C. Gluud, L. L. Gluud, J. P. Ioannidis, K. F. Schulz, R. Beynon, N. J. Welton, L. Wood, D. Moher, J. J. Deeks, and J. A. Sterne. 2012. "Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials." Annals of Internal Medicine no. 157 (6):429-38.

Schulz, K. F., I. Chalmers, R. J. Hayes, and D. G. Altman. 1995. "Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials." JAMA no. 273 (5):408-12.

Schwenk, W., O. Haase, J. Neudecker, and J. M. Muller. 2005. "Short term benefits for laparoscopic colorectal resection." Cochrane Database of Systematic Reviews (3):CD003145.

Shapiro, C. L., and A. Recht. 1994. "Late effects of adjuvant therapy for breast cancer." Journal of the National Cancer Institute. Monographs (16):101-12.

Sharp, S. J., and S. G. Thompson. 2000. "Analysing the relationship between treatment effect and underlying risk in meta-analysis: comparison and development of approaches." Statistics in Medicine no. 19 (23):3251-74.

Sharp, S. J., S. G. Thompson, and D. G. Altman. 1996. "The relation between treatment benefit and underlying risk in meta-analysis." BMJ no. 313 (7059):735-8.

Shikata, S., T. Nakayama, Y. Noguchi, Y. Taji, and H. Yamagishi. 2006. "Comparison of effects in randomized controlled trials with observational studies in digestive surgery." Annals of Surgery no. 244 (5):668-76.

Shikora, S. A., R. Bergenstal, M. Bessler, F. Brody, G. Foster, A. Frank, M. Gold, S. Klein, R. Kushner, and D. B. Sarwer. 2009. "Implantable gastric stimulation for the treatment of clinically severe obesity: results of the SHAPE trial." Surgery for Obesity and Related Diseases no. 5 (1):31-7.

Sinclair, J. C., and M. B. Bracken. 1994. "Clinically useful measures of effect in binary analyses of randomized trials." Journal of Clinical Epidemiology no. 47 (8):881-9.

Smith, A. J., D. K. Driman, K. Spithoff, A. Hunter, R. S. McLeod, M. Simunovic, B. Langer, and Expert Panel on Colon and Rectal Cancer Surgery and Pathology. 2010. "Guideline for optimization of colorectal cancer surgery and pathology." Journal of Surgical Oncology no. 101 (1):5-12.

Solomon, M. J., and R. S. McLeod. 1993. "Clinical studies in surgical journals--have we improved?" Diseases of the Colon and Rectum no. 36 (1):43-8.

Stang, A. 2010. "Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses." European Journal of Epidemiology no. 25 (9):603-5.

Sterne, J. A., and M. Egger. 2001. "Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis." Journal of Clinical Epidemiology no. 54 (10):1046-55.

Sterne, J. A., P. Juni, K. F. Schulz, D. G. Altman, C. Bartlett, and M. Egger. 2002. "Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research." Statistics in Medicine no. 21 (11):1513-24.

Sterne, J. A., A. J. Sutton, J. P. Ioannidis, N. Terrin, D. R. Jones, J. Lau, J. Carpenter, G. Rucker, R. M. Harbord, C. H. Schmid, J. Tetzlaff, J. J. Deeks, J. Peters, P. Macaskill, G. Schwarzer, S. Duval, D. G. Altman, D. Moher, and J. P. Higgins. 2011. "Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials." BMJ no. 343 (jul22 1):d4002.

Sung, L., J. Hayden, M. L. Greenberg, G. Koren, B. M. Feldman, and G. A. Tomlinson. 2005. "Seven items were identified for inclusion when reporting a Bayesian analysis of a clinical study." Journal of Clinical Epidemiology no. 58 (3):261-8.

Sutton, A. J., K. R. Abrams, D. R. Jones, T. A. Sheldon, and F. Song. 1998. "Systematic reviews of trials and other studies." Health Technology Assessment no. 2 (19):1-276.

Swaen, G. M., N. Carmichael, and J. Doe. 2011. "Strengthening the reliability and credibility of observational epidemiology studies by creating an Observational Studies Register." Journal of Clinical Epidemiology no. 64 (5):481-6.

Swank, D. J., S. C. Swank-Bordewijk, W. C. Hop, W. F. van Erp, I. M. Janssen, H. J. Bonjer, and J. Jeekel. 2003. "Laparoscopic adhesiolysis in patients with chronic abdominal pain: a blinded randomised controlled multi-centre trial." Lancet no. 361 (9365):1247-51.

Thompson, S. G., T. C. Smith, and S. J. Sharp. 1997. "Investigating underlying risk as a source of heterogeneity in meta-analysis." Statistics in Medicine no. 16 (23):2741-58.

Thompson, S. G., R. M. Turner, and D. E. Warn. 2001. "Multilevel models for meta-analysis, and their application to absolute risk differences." Statistical Methods in Medical Research no. 10 (6):375-92.

Thompson, S.G., and J. Higgins. 2002. "How should meta-regression analyses be undertaken and interpreted?" Statistics in Medicine no. 21 (11):1559-1573.

Tierney, Jayne F., and Lesley A. Stewart. 2005. "Investigating patient exclusion bias in meta-analysis." International Journal of Epidemiology no. 34 (1):79-87.

Urbach, D. R., and N. N. Baxter. 2004. "Does it matter what a hospital is 'high volume' for? Specificity of hospital volume-outcome associations for surgical procedures: analysis of administrative data." BMJ no. 328 (7442):737-40.

van Houwelingen, H. C., L. R. Arends, and T. Stijnen. 2002. "Advanced methods in meta-analysis: multivariate approach and meta-regression." Statistics in Medicine no. 21 (4):589-624.

Van Spall, H. G., A. Toren, A. Kiss, and R. A. Fowler. 2007. "Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review." JAMA no. 297 (11):1233-40.

Varas-Lorenzo, C., L. A. Garcia-Rodriguez, S. Perez-Gutthann, and A. Duque-Oliart. 2000. "Hormone replacement therapy and incidence of acute myocardial infarction. A population-based nested case-control study." Circulation no. 101 (22):2572-8.

Viechtbauer, W. 2010. "Conducting meta-analyses in R with the metafor package." Journal of Statistical Software no. 36 (3).

Viswanathan, M., and N. D. Berkman. 2011. Development of the RTI Item Bank on Risk of Bias and Precision of Observational Studies. Rockville MD.

Walter, C. J., J. C. Dumville, C. E. Hewitt, K. C. Moore, D. J. Torgerson, P. J. Drew, and J. R. Monson. 2007. "The quality of trials in operative surgery." Annals of Surgery no. 246 (6):1104-9.

Walter, S. D. 2000. "Choice of effect measure for epidemiological data." Journal of Clinical Epidemiology no. 53 (9):931-9.

Wandmacher, Cornelius, and A. I. Johnson. 1995. Metric units in engineering--going SI: how to use the international systems of measurement units (SI) to solve standard engineering problems. Rev. ed. New York, N.Y.: ASCE Press.

Wang, J. L., T. T. Sun, Y. W. Lin, R. Lu, and J. Y. Fang. 2011. "Methodological reporting of randomized controlled trials in major hepato-gastroenterology journals in 2008 and 1998: a comparative study." BMC Medical Research Methodology no. 11:110.

Wells, George A., Beverley Shea, Julian PT Higgins, Jonathan Sterne, Peter Tugwell, and Barnaby C Reeves. 2013. "Checklists of methodological issues for review authors to consider when including non-randomized studies in systematic reviews." Research Synthesis Methods no. 4 (1):63-77.

Wente, M. N., C. M. Seiler, W. Uhl, and M. W. Buchler. 2003. "Perspectives of evidence-based surgery." Digestive Surgery no. 20 (4):263-9.

West, S., V. King, T. S. Carey, K. N. Lohr, N. McKoy, S. F. Sutton, and L. Lux. 2002. "Systems to rate the strength of scientific evidence." Evid Rep Technol Assess (Summ) (47):1-11.

Wexner, S. D., and S. M. Cohen. 1995. "Port site metastases after laparoscopic colorectal surgery for cure of malignancy." British Journal of Surgery no. 82 (3):295-8.

Williams, R. J., T. Tse, W. R. Harlan, and D. A. Zarin. 2010. "Registration of observational studies: is it time?" CMAJ: Canadian Medical Association Journal no. 182 (15):1638-42.

Wolf, B. R., and J. A. Buckwalter. 2006. "Randomized surgical trials and 'sham' surgery: relevance to modern orthopaedics and minimally invasive surgery." Iowa Orthopaedic Journal no. 26:107-11.

Wood, Lesley, Matthias Egger, Lise Lotte Gluud, Kenneth F. Schulz, Peter Juni, Douglas G. Altman, Christian Gluud, Richard M. Martin, Anthony J. G. Wood, and Jonathan A. C. Sterne. 2008. "Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study." BMJ no. 336 (7644):601-5.

Yamamoto, H., R. W. Hughes, Jr., K. W. Schroeder, T. R. Viggiano, and E. P. DiMagno. 1992. "Treatment of benign esophageal stricture by Eder-Puestow or balloon dilators: a comparison between randomized and prospective nonrandomized trials." Mayo Clinic Proceedings no. 67 (3):228-36.

Yamamoto, M., J. Okuda, K. Tanaka, K. Kondo, K. Asai, H. Kayano, S. Masubuchi, and K. Uchiyama. 2013. "Evaluating the learning curve associated with laparoscopic left hemicolectomy for colon cancer." American Surgeon no. 79 (4):366-71.

Zmora, O., P. Gervaz, and S. D. Wexner. 2001. "Trocar site recurrence in laparoscopic surgery for colorectal cancer: Myth or real concern?" Surgical Endoscopy no. 15 (8):788-793.

Appendix A

Literature Search Strategy for the Development of a Conceptual Framework of Bias in Non-Randomised Studies

Ovid MEDLINE(R) 1946 to January Week 4 2012

#   Searches                                                             Results
1   scale*.mp.                                                           387231
2   checklist*.mp.                                                       16212
3   check-list*.mp.                                                      2000
4   critic* apprais*.mp.                                                 4672
5   tool*.mp.                                                            306344
6   or/1-5                                                               688068
7   valid*.mp.                                                           332006
8   quality.mp.                                                          560181
9   ((bias* or confounding) and (assess* or measure* or evaluat*)).mp.   67288
10  7 or 8 or 9                                                          904895
11  6 and 10                                                             136206
12  observational stud*.mp.                                              32638
13  exp Cohort Studies/                                                  1217162
14  cohort stud*.mp.                                                     171787
15  exp case-control studies/                                            579877
16  case control* stud*.mp.                                              169638
17  Cross-Sectional Studies/                                             149188
18  cross sectional stud*.mp.                                            159276
19  followup stud*.mp.                                                   654
20  follow-up stud*.mp.                                                  470558
21  (nonrandom* adj2 stud*).mp.                                          2775
22  (non-random* adj2 stud*).mp.                                         2304
23  or/12-22                                                             1508781
24  11 and 23                                                            28183
25  limit 24 to systematic reviews                                       1416
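
The way the numbered lines in this strategy build on one another can be illustrated with a minimal sketch. It assumes that each search line returns a set of record identifiers, that "or" corresponds to a set union, and that "and" corresponds to a set intersection. The record identifiers below are hypothetical toy values rather than MEDLINE data, the helper `or_range` is an illustrative name and not part of Ovid, and the final "limit to systematic reviews" filter (line 25) is not modelled.

```python
def or_range(hits, first, last):
    """Union of the result sets for lines first..last (mirrors 'or/first-last')."""
    combined = set()
    for line in range(first, last + 1):
        combined |= hits[line]
    return combined


# Hypothetical record IDs retrieved by each term line (toy values only).
hits = {
    # Lines 1-5: terms for scales, checklists, and tools
    1: {1, 2, 3}, 2: {3, 4}, 3: {4}, 4: {5}, 5: {2, 6},
    # Lines 7-9: terms for validity, quality, and bias assessment
    7: {1, 4, 7}, 8: {2, 5, 8}, 9: {6, 9},
    # Lines 12-22: terms for non-randomized study designs
    12: {1, 9}, 13: {2, 4}, 14: {4}, 15: {5, 10}, 16: {10}, 17: {6},
    18: {6, 11}, 19: set(), 20: {12}, 21: {13}, 22: {13},
}

hits[6] = or_range(hits, 1, 5)           # line 6:  or/1-5
hits[10] = hits[7] | hits[8] | hits[9]   # line 10: 7 or 8 or 9
hits[11] = hits[6] & hits[10]            # line 11: 6 and 10
hits[23] = or_range(hits, 12, 22)        # line 23: or/12-22
hits[24] = hits[11] & hits[23]           # line 24: 11 and 23

print(sorted(hits[24]))
```

Because the same record can match several terms, a combined "or" line returns fewer records than the sum of its parts. In the strategy above, lines 1 to 5 sum to 716459 results, whereas line 6 returns 688068.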

Appendix B

Literature Search Strategy for the Identification of Comparative Studies Evaluating Laparoscopy versus Conventional Surgery for Colon Cancer

i) MEDLINE

Ovid MEDLINE(R) 1950 to January 31, 2011.

# Searches

Colon, Colonic Diseases, & Colon Cancer Component

1 exp Colon/

2 exp Colonic Diseases/

3 exp Colorectal Neoplasms/

4 (Adenocarcinom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

5 (Adenom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

6 (Cancer: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

7 (Carcinom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

8 (Malignan: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

9 (Neoplas: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

10 (Tumor: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

11 (Tumour: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

12 or/1-11

13 exp Intestines/

14 exp neoplasms by histologic type/ or exp neoplasms, hormone-dependent/ or exp neoplasms, multiple primary/ or exp neoplasms, post-traumatic/ or exp neoplasms, radiation-induced/ or exp neoplasms, second primary/ or exp neoplastic processes/ or exp neoplastic syndromes, hereditary/

15 13 and 14


16 12 or 15

Laparotomy / Open Surgery Component

17 surgical procedures, elective/

18 exp colectomy/

19 Ostomy/

20 enterostomy/

21 cecostomy/

22 colostomy/

23 ileostomy/

24 jejunostomy/

25 proctocolectomy, restorative/

26 surgically-created structures/

27 colonic pouches/

28 surgical stomas/

29 Laparotomy/

30 laparotom*.mp.

31 minilaparotom*.mp.

32 mini-laparotom*.mp.

33 (open adj3 surg*).mp.

34 (operative adj2 therap*).mp.

35 exp colorectal surgery/

36 General surgery/

37 conventional*.mp.

38 convert*.mp.

39 conversion*.mp.

40 reoperation/

41 suture techniques/

42 surgical stapling/

43 anastomos*.mp.

44 cecostom*.mp.

45 colectom*.mp.

46 coloanal pouch*.mp.

47 colo-anal pouch*.mp.

48 colocolonic.mp.


49 colo-colonic.mp.

50 colostom*.mp.

51 diversion?.mp.

52 enterostom*.mp.

53 hartmann*.mp.

54 hemicolectom*.mp.

55 hemi-colectom*.mp.

56 ileocolic*.mp.

57 ileostom*.mp.

58 "j-pouch*".mp.

59 jejunostom*.mp.

60 (mesorectal* adj2 excis*).mp.

61 ostom*.mp.

62 proctectom*.mp.

63 proctocolectom*.mp.

64 rectosigmoidectom*.mp.

65 recto-sigmoidectom*.mp.

66 sigmoidectom*.mp.

67 (surgical* adj2 approach*).mp.

68 (surgical* adj2 therap*).mp.

69 (digestive adj2 (surgic* or surger*)).mp.

70 surgeon*.mp.

71 su.fs.

72 traditional*.mp.

73 conservative*.mp.

74 or/17-73

Laparoscopy & Related Terms Component

75 exp Laparoscopy/

76 exp laparoscopes/

77 laparoscop:.mp.

78 celioscop*.mp.

79 coelioscop*.mp.

80 peritoneoscop*.mp.

81 Laparoendoscop*.mp.


82 Laparo-endoscop*.mp.

83 Minimal* invasive*.mp.

84 Surgical Procedures, Minimally Invasive/

85 Video-assisted Surgery/

86 or/75-85

87 16 and 74 and 86 Colon + Open Surgery + Laparoscopic Surgery

Limits:

88 limit 87 to humans

89 limit 88 to english language

90 remove duplicates from 89


ii) EMBASE

EMBASE 1980 to January 31, 2011.

# Searches

Colon, Colonic Diseases, & Colon Cancer Component

1 exp Colon/

2 exp Colon Diseases/

3 exp Rectum Cancer/

4 exp Colon Cancer/

5 (Adenocarcinom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

6 (Adenom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

7 (Cancer: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

8 (Carcinom: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

9 (Malignan: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

10 (Metasta* adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

11 (Neoplas: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

12 (Tumor: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

13 (Tumour: adj3 (colorect: or colon: or rect: or intestine: or large bowel: or bowel: or anal or anus or perianal or peri-anal or circumanal or sigmoid:)).mp.

14 or/1-13

15 exp Intestine/

16 exp Neoplasm/

17 15 and 16

18 14 or 17

Laparotomy / Open Surgery Component

19 Elective Surgery/

20 exp Colon Surgery/

21 exp Enterostomy/

22 exp Colorectal Surgery/

23 (surgically-created adj2 structure*).mp.

24 Surgical Approach/

25 Laparotomy/


26 laparotom*.mp.

27 minilaparotom*.mp.

28 mini-laparotom*.mp.

29 (open adj3 surg*).mp.

30 (operative adj2 therap*).mp.

31 General surgery/

32 conventional*.mp.

33 convert*.mp.

34 conversion*.mp.

35 reoperation/

36 suturing method/

37 surgical stapling/

38 anastomos*.mp.

39 cecostom*.mp.

40 colectom*.mp.

41 coloanal pouch*.mp.

42 colo-anal pouch*.mp.

43 colocolonic.mp.

44 colo-colonic.mp.

45 colostom*.mp.

46 diversion?.mp.

47 enterostom*.mp.

48 hartmann*.mp.

49 hemicolectom*.mp.

50 hemi-colectom*.mp.

51 ileocolic*.mp.

52 ileostom*.mp.

53 "j-pouch*".mp.

54 jejunostom*.mp.

55 (mesorectal* adj2 excis*).mp.

56 ostom*.mp.

57 proctectom*.mp.

58 proctocolectom*.mp.

59 rectosigmoidectom*.mp.

60 recto-sigmoidectom*.mp.

61 sigmoidectom*.mp.

62 (surgical* adj2 approach*).mp.


63 (surgical* adj2 therap*).mp.

64 (digestive adj2 (surgic* or surger*)).mp.

65 surgeon*.mp.

66 su.fs.

67 traditional*.mp.

68 conservative*.mp.

69 or/19-68

Laparoscopy & Related Terms Component

70 exp Laparoscopy/

71 exp laparoscope/

72 exp Laparoscopic Surgery/

73 Endoscopic Surgery/

74 laparoscop:.mp.

75 celioscop*.mp.

76 coelioscop*.mp.

77 peritoneoscop*.mp.

78 Laparoendoscop*.mp.

79 Laparo-endoscop*.mp.

80 Minimal* invasive*.mp.

81 Minimally Invasive Surgery/

82 (video-assisted adj2 surg*).mp.

83 (videoassisted adj2 surg*).mp.

84 or/70-83

85 18 and 69 and 84 Colon + Open Sx + Laparoscopic Sx

86 limit 85 to humans

87 remove duplicates from 86

88 limit 87 to english language


Appendix C

Criteria for judging risk of bias in the Cochrane Risk of Bias Tool

RANDOM SEQUENCE GENERATION

Selection bias (biased allocation to interventions) due to inadequate generation of a randomised sequence.

Criteria for a judgement of ‘Low risk’ of bias.

The investigators describe a random component in the sequence generation process such as:

• Referring to a random number table;
• Using a computer random number generator;
• Coin tossing;
• Shuffling cards or envelopes;
• Throwing dice;
• Drawing of lots;
• Minimization*.

*Minimization may be implemented without a random element, and this is considered to be equivalent to being random.

Criteria for the judgement of ‘High risk’ of bias.

The investigators describe a non-random component in the sequence generation process. Usually, the description would involve some systematic, non-random approach, for example:

• Sequence generated by odd or even date of birth;
• Sequence generated by some rule based on date (or day) of admission;
• Sequence generated by some rule based on hospital or clinic record number.

Other non-random approaches happen much less frequently than the systematic approaches mentioned above and tend to be obvious. They usually involve judgement or some method of non-random categorization of participants, for example:

• Allocation by judgement of the clinician;
• Allocation by preference of the participant;
• Allocation based on the results of a laboratory test or a series of tests;
• Allocation by availability of the intervention.


ALLOCATION CONCEALMENT

Selection bias (biased allocation to interventions) due to inadequate concealment of allocations prior to assignment.

Criteria for a judgement of ‘Low risk’ of bias.

Participants and investigators enrolling participants could not foresee assignment because one of the following, or an equivalent method, was used to conceal allocation:

• Central allocation (including telephone, web-based and pharmacy-controlled randomization);

• Sequentially numbered drug containers of identical appearance;

• Sequentially numbered, opaque, sealed envelopes.

Criteria for the judgement of ‘High risk’ of bias.

Participants or investigators enrolling participants could possibly foresee assignments and thus introduce selection bias, such as allocation based on:

• Using an open random allocation schedule (e.g. a list of random numbers);

• Assignment envelopes were used without appropriate safeguards (e.g. if envelopes were unsealed or nonopaque or not sequentially numbered);

• Alternation or rotation;

• Date of birth;

• Case record number;

• Any other explicitly unconcealed procedure.

Criteria for the judgement of ‘Unclear risk’ of bias.

Insufficient information to permit judgement of ‘Low risk’ or ‘High risk’. This is usually the case if the method of concealment is not described or not described in sufficient detail to allow a definite judgement – for example if the use of assignment envelopes is described, but it remains unclear whether envelopes were sequentially numbered, opaque and sealed.

BLINDING OF PARTICIPANTS AND PERSONNEL

Performance bias due to knowledge of the allocated interventions by participants and personnel during the study.

Criteria for a judgement of ‘Low risk’ of bias.

Any one of the following:

• No blinding or incomplete blinding, but the review authors judge that the outcome is not likely to be influenced by lack of blinding;
• Blinding of participants and key study personnel ensured, and unlikely that the blinding could have been broken.

Criteria for the judgement of ‘High risk’ of bias.

Any one of the following:

• No blinding or incomplete blinding, and the outcome is likely to be influenced by lack of blinding;
• Blinding of key study participants and personnel attempted, but likely that the blinding could have been broken, and the outcome is likely to be influenced by lack of blinding.

Criteria for the judgement of ‘Unclear risk’ of bias.

Any one of the following:

• Insufficient information to permit judgement of ‘Low risk’ or ‘High risk’;
• The study did not address this outcome.

BLINDING OF OUTCOME ASSESSMENT

Detection bias due to knowledge of the allocated interventions by outcome assessors.

Criteria for a judgement of ‘Low risk’ of bias.

Any one of the following:

• No blinding of outcome assessment, but the review authors judge that the outcome measurement is not likely to be influenced by lack of blinding;
• Blinding of outcome assessment ensured, and unlikely that the blinding could have been broken.

Criteria for the judgement of ‘High risk’ of bias.

Any one of the following:

• No blinding of outcome assessment, and the outcome measurement is likely to be influenced by lack of blinding;
• Blinding of outcome assessment, but likely that the blinding could have been broken, and the outcome measurement is likely to be influenced by lack of blinding.

Criteria for the judgement of ‘Unclear risk’ of bias.

Any one of the following:

• Insufficient information to permit judgement of ‘Low risk’ or ‘High risk’;
• The study did not address this outcome.

INCOMPLETE OUTCOME DATA

Attrition bias due to amount, nature or handling of incomplete outcome data.

Criteria for a judgement of ‘Low risk’ of bias.

Any one of the following:

• No missing outcome data;
• Reasons for missing outcome data unlikely to be related to true outcome (for survival data, censoring unlikely to be introducing bias);
• Missing outcome data balanced in numbers across intervention groups, with similar reasons for missing data across groups;
• For dichotomous outcome data, the proportion of missing outcomes compared with observed event risk not enough to have a clinically relevant impact on the intervention effect estimate;
• For continuous outcome data, plausible effect size (difference in means or standardized difference in means) among missing outcomes not enough to have a clinically relevant impact on observed effect size;
• Missing data have been imputed using appropriate methods.

Criteria for the judgement of ‘High risk’ of bias.

Any one of the following:

• Reason for missing outcome data likely to be related to true outcome, with either imbalance in numbers or reasons for missing data across intervention groups;
• For dichotomous outcome data, the proportion of missing outcomes compared with observed event risk enough to induce clinically relevant bias in intervention effect estimate;
• For continuous outcome data, plausible effect size (difference in means or standardized difference in means) among missing outcomes enough to induce clinically relevant bias in observed effect size;
• ‘As-treated’ analysis done with substantial departure of the intervention received from that assigned at randomization;
• Potentially inappropriate application of simple imputation.

Criteria for the judgement of ‘Unclear risk’ of bias.

Any one of the following:

• Insufficient reporting of attrition/exclusions to permit judgement of ‘Low risk’ or ‘High risk’ (e.g. number randomized not stated, no reasons for missing data provided);
• The study did not address this outcome.

SELECTIVE REPORTING

Reporting bias due to selective outcome reporting.

Criteria for a judgement of ‘Low risk’ of bias.

Any of the following:

• The study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way;
• The study protocol is not available but it is clear that the published reports include all expected outcomes, including those that were pre-specified (convincing text of this nature may be uncommon).

Criteria for the judgement of ‘High risk’ of bias.

Any one of the following:

• Not all of the study’s pre-specified primary outcomes have been reported;
• One or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. subscales) that were not pre-specified;
• One or more reported primary outcomes were not pre-specified (unless clear justification for their reporting is provided, such as an unexpected adverse effect);
• One or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta-analysis;
• The study report fails to include results for a key outcome that would be expected to have been reported for such a study.

Criteria for the judgement of ‘Unclear risk’ of bias.

Insufficient information to permit judgement of ‘Low risk’ or ‘High risk’. It is likely that the majority of studies will fall into this category.


OTHER BIAS

Bias due to problems not covered elsewhere in the table.

Criteria for a judgement of ‘Low risk’ of bias.

The study appears to be free of other sources of bias.

Criteria for the judgement of ‘High risk’ of bias.

There is at least one important risk of bias. For example, the study:

• Had a potential source of bias related to the specific study design used; or
• Has been claimed to have been fraudulent; or
• Had some other problem.

Criteria for the judgement of ‘Unclear risk’ of bias.

There may be a risk of bias, but there is either:

• Insufficient information to assess whether an important risk of bias exists; or
• Insufficient rationale or evidence that an identified problem will introduce bias.

Adapted from Table 8.5.d, Higgins JPT, Altman DG, Sterne JAC (editors). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011. Available from www.cochrane-handbook.org.
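
How the domain-level judgements in this table might be combined into a single trial-level label can be sketched in Python. The rule used here (a trial counts as a Strong RCT only when every domain is judged ‘Low risk’, and as a Typical RCT otherwise) is an assumption made for illustration; the table itself defines only the per-domain criteria, and the function and variable names are hypothetical.

```python
# Domains of the Cochrane Risk of Bias Tool as listed in this appendix.
DOMAINS = (
    "random sequence generation",
    "allocation concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessment",
    "incomplete outcome data",
    "selective reporting",
    "other bias",
)
VALID_JUDGEMENTS = {"low", "high", "unclear"}


def classify_trial(judgements):
    """Return 'Strong RCT' or 'Typical RCT' from per-domain judgements.

    ASSUMPTION (not stated in the table): a trial is 'Strong' only if
    every domain is judged 'low'; any 'high' or 'unclear' makes it 'Typical'.
    """
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing judgement for: {missing}")
    invalid = {d: j for d, j in judgements.items() if j not in VALID_JUDGEMENTS}
    if invalid:
        raise ValueError(f"invalid judgement values: {invalid}")
    if all(judgements[d] == "low" for d in DOMAINS):
        return "Strong RCT"
    return "Typical RCT"


# Hypothetical trial: every domain low except selective reporting.
example = {d: "low" for d in DOMAINS}
example["selective reporting"] = "unclear"
print(classify_trial(example))
```

Under a different aggregation rule (for example, one restricted to a subset of key domains), only the final condition would change.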




Appendix D

Comparative Studies of Laparoscopy versus Conventional Surgery for Colon Cancer Meeting a priori Exclusion Criteria

i) Systematic Reviews and Meta-Analyses

1. Abraham NS, Byrne CJ, Young JM, Solomon MJ. Meta-analysis of well-designed nonrandomized comparative studies of surgical procedures is as good as randomized controlled trials. J Clin Epidemiol. Mar 2010;63(3):238-245.

2. Abraham NS, Byrne CM, Young JM, Solomon MJ. Meta-analysis of non-randomized comparative studies of the short-term outcomes of laparoscopic resection for colorectal cancer.[see comment]. ANZ Journal of Surgery. 2007;77(7):508-516.

3. Abraham NS, Young JM, Solomon MJ. Meta-analysis of short-term outcomes after laparoscopic resection for colorectal cancer.[see comment]. British Journal of Surgery. 2004;91(9):1111-1124.

4. Angst E, Hiatt JR, Gloor B, Reber HA, Hines OJ. Laparoscopic surgery for cancer: A systematic review and a way forward. J Am Coll Surg. September 2010;211(3):412-423.

5. Bai HL, Chen B, Zhou Y, Wu XT. Five-year long-term outcomes of laparoscopic surgery for colon cancer. World Journal of Gastroenterology. October 21 2010;16(39):4992-4997.

6. Bonjer HJ, Hop WCJ, Nelson H, et al. Laparoscopically assisted vs open colectomy for colon cancer: A meta-analysis. Archives of Surgery. Mar 2007;142(3):298-303.

7. Chan M. Erratum: Systematic review on the short term outcome of laparoscopic resection for colon and rectosigmoid cancer [5]. Colorectal Disease. Mar 2008;10(3):305-306.

8. Chapman AE, Levitt MD, Hewett P, Woods R, Sheiner H, Maddern GJ. Laparoscopic-assisted resection of colorectal malignancies: a systematic review. Annals of Surgery. 2001;234(5):590-606.

9. Coratti F, Coratti A, Malatesti R, Testi W, Tani F. [Laparoscopic versus open resection for colorectal cancer: meta-analysis of the chief trials]. G Chir. Aug-Sep 2009;30(8-9):377-384.

10. Dowson HM, Cowie AS, Ballard K, Gage H, Rockall TA. Systematic review of quality of life following laparoscopic and open colorectal surgery. Colorectal Disease. 2008;10(8):757-768.

11. Dowson HM, Huang A, Soon Y, Gage H, Lovell DP, Rockall TA. Systematic review of the costs of laparoscopic colorectal surgery. Dis Colon Rectum. 2007;50(6):908-919.

12. Fingerhut A, Ata T, Chouillard E, Alexakis N, Veyrie N. Laparoscopic approach to colonic cancer: critical appraisal of the literature. Digestive Diseases. 2007;25(1):33-43.

13. Gervaz P, Pikarsky A, Utech M, et al. Converted laparoscopic colorectal surgery. Surg Endosc. 2001;15(8):827-832.


14. Gujral S, Avery KNL, Blazeby JM. Quality of life after surgery for colorectal cancer: clinical implications of results from randomised trials. Supportive Care in Cancer. 2008;16(2):127-132.

15. Hayes JL, Hansen P. Is laparoscopic colectomy for cancer cost-effective relative to open colectomy? ANZ Journal of Surgery. 2007;77(9):782-786.

16. Hernandez RA, De Verteuil RM, Fraser CM, Vale LD. Systematic review of economic evaluations of laparoscopic surgery for colorectal cancer. Colorectal Disease. 2008;10(9):859-868.

17. Hildebrandt U, Kreissler-Haag D, Lindemann W. [Laparoscopy-assisted colorectal resections: morbidity, conversions, outcomes of a decade]. Zentralbl Chir. 2001;126(4):323-332.

18. Jackson TD, Kaplan GG, Arena G, Page JH, Rogers Jr SO. Laparoscopic Versus Open Resection for Colorectal Cancer: A Metaanalysis of Oncologic Outcomes. J Am Coll Surg. Mar 2007;204(3):439-446.

19. Jun L. Systematic review of laparoscopic versus open surgery for colorectal cancer (Br J Surg 2006; 93: 921-928).[comment]. British Journal of Surgery. 2007;94(2):250; author reply 250.

20. Kahnamoui K, Cadeddu M, Farrokhyar F, Anvari M. Laparoscopic surgery for colon cancer: a systematic review. Can J Surg. 2007;50(1):48-57.

21. Kaido T. Current evidence supporting indications for laparoscopic surgery in colorectal cancer. Hepato-Gastroenterology. 2008;55(82-83):438-441.

22. Kehlet H. Systematic review of laparoscopic versus open surgery for colorectal cancer (Br J Surg 2006; 93: 921-928).[comment]. British Journal of Surgery. 2006;93(11):1434-1435.

23. Korolija D, Tadic S, Simic D. Extent of oncological resection in laparoscopic vs. open colorectal surgery: meta-analysis. Langenbecks Arch Surg. 2003;387(9-10):366-371.

24. Kuhry E, Schwenk W, Gaupset R, Romild U, Bonjer J. Long-term outcome of laparoscopic surgery for colorectal cancer: a Cochrane systematic review of randomised controlled trials. Cancer Treatment Reviews. 2008;34(6):498-504.

25. Kuhry E, Schwenk WF, Gaupset R, Romild U, Bonjer HJ. Long-term results of laparoscopic colorectal cancer resection. Cochrane Database of Systematic Reviews. 2008(2):CD003432.

26. Kurian MS, Patterson E, Andrei VE, Edye MB. Hand-assisted laparoscopic surgery: an emerging technique. Surg Endosc. 2001;15(11):1277-1281.

27. Li J, Ding K-f, Zhang S-z. [Meta-analysis of short-term efficacy and safety after laparoscopic resection for colorectal cancer]. Chung Hua I Hsueh Tsa Chih. 2006;86(35):2485-2490.

28. Liang Y, Li G, Chen P, Yu J. Laparoscopic versus open colorectal resection for cancer: a meta-analysis of results of randomized controlled trials on recurrence. European Journal of Surgical Oncology. 2008;34(11):1217-1224.

29. Liang Y-c, Li G-x, Chen P-y, Yu J, Zhang C. [Laparoscopic versus conventional open resection for colorectal cancer: a meta-analysis on recurrence]. Zhonghua Wei Chang Wai Ke Za Zhi. Sep 2008;11(5):414-420.


30. Lourenco T, Murray A, Grant A, McKinley A, Krukowski Z, Vale L. Laparoscopic surgery for colorectal cancer: Safe and effective? - A systematic review. Surg Endosc. May 2008;22(5):1146-1160.

31. Lovett BE, Taylor I. Randomized controlled trials in colorectal disease; a review of recent trials. Colorectal Disease. 2001;3(1):58-64.

32. Luck A, Hensman C, Hewett P. Laparoscopic colectomy for cancer: a review. Australian & New Zealand Journal of Surgery. 1998;68(5):318-327.

33. Manterola C, Pineda V, Vial M. [Open versus laparoscopic resection in non-complicated colon cancer. A systematic review]. Cir Esp. 2005;78(1):28-33.

34. Martel G, Boushey RP, Marcello PW. Results of the Laparoscopic Colon Cancer Randomized Trials: An Evidence-Based Review. Seminars in Colon and Rectal Surgery. Dec 2007;18(4):210-219.

35. Maxwell-Armstrong CA, Robinson MH, Scholefield JH. Laparoscopic colorectal cancer surgery. American Journal of Surgery. 2000;179(6):500-507.

36. McLeod RS, Stern H, McKenzie ME. Canadian Association of General Surgeons Evidence Based Reviews in Surgery. 10. Laparoscopy-assisted colectomy versus open colectomy for treatment of nonmetastatic colon cancer: A randomized trial. Can J Surg. Jun 2004;47(3):209-211.

37. Moloo H, Haggar F, Grimshaw JM, et al. Hand assisted laparoscopic surgery versus conventional laparoscopy for colorectal surgery. Cochrane Database of Systematic Reviews. 2007(3):CD006585.

38. Murray A, Lourenco T, de Verteuil R, et al. Clinical effectiveness and cost-effectiveness of laparoscopic surgery for colorectal cancer: systematic reviews and economic evaluation. Health Technology Assessment. 2006;10(45):1-141, iii-iv.

39. Noel JK, Fahrbach K, Estok R, et al. Minimally Invasive Colorectal Resection Outcomes: Short-term Comparison with Open Procedures. J Am Coll Surg. Feb 2007;204(2):291-307.

40. Reza MM, Blasco JA, Andradas E, Cantero R, Mayol J. Systematic review of laparoscopic versus open surgery for colorectal cancer.[see comment]. British Journal of Surgery. 2006;93(8):921-928.

41. Sammour T, Kahokehr A, Chan S, Booth RJ, Hill AG. The humoral response after laparoscopic versus open colorectal surgery: a meta-analysis. J Surg Res. Nov 2010;164(1):28-37.

42. Sammour T, Kahokehr A, Connolly AB, Bissett IP, Hill AG. Does laparoscopic colectomy have a higher intraoperative complication rate than open colectomy? Annals of Surgery. March 2010;251(3):577-578.

43. Sato K, Adachi Y, Kitano S. [Trends in laparoscopic surgery for colorectal cancer: 10-year experience worldwide]. Nippon Geka Gakkai Zasshi [Journal of Japan Surgical Society]. 2001;102(2):236-242.

44. Schaeff B, Paolucci V, Thomopoulos J. Port site recurrences after laparoscopic surgery. A review. Digestive Surgery. 1998;15(2):124-134.


45. Schwenk W, Haase O, Neudecker J, Muller JM. Short term benefits for laparoscopic colorectal resection. Cochrane Database of Systematic Reviews. 2005(3):CD003145.

46. Stocchi L, Nelson H. Wound recurrences following laparoscopic-assisted colectomy for cancer. Archives of Surgery. 2000;135(8):948-958.

47. Tilney HS, Lovegrove RE, Purkayastha S, Heriot AG, Darzi AW, Tekkis PP. Laparoscopic vs open subtotal colectomy for benign and malignant disease. Colorectal Disease. Jun 2006;8(5):441-450.

48. Tjandra JJ, Chan MKY. Systematic review on the short-term outcome of laparoscopic resection for colon and rectosigmoid cancer. Colorectal Disease. Jun 2006;8(5):375-388.

49. Vlug MS, Wind J, van der Zaag E, Ubbink DT, Cense HA, Bemelman WA. Systematic review of laparoscopic vs open colonic surgery within an enhanced recovery programme. Colorectal Disease. 2009;11(4):335-343.

50. Yamamoto S, Fujita S, Ishiguro S, Akasu T, Moriya Y. Wound infection after a laparoscopic resection for colorectal cancer. Surgery Today. 2008;38(7):618-622.

ii) Biochemical Outcomes

1. Belizon A, Balik E, Feingold DL, et al. Major abdominal surgery increases plasma levels of vascular endothelial growth factor: Open more so than minimally invasive methods. Annals of Surgery. Nov 2006;244(5):792-798.

2. Bessa X, Castells A, Lacy AM, et al. Laparoscopic-assisted vs. open colectomy for colorectal cancer: influence on neoplastic cell mobilization. J Gastrointest Surg. 2001;5(1):66-73.

3. Bono A, Bianchi PP, Locatelli A, et al. Angiogenic cells, macroparticles and RNA transcripts in laparoscopic vs. open surgery for colorectal cancer. Cancer Biology and Therapy. 01 Oct 2010;10(7):682-685.

4. Braga M, Vignali A, Zuliani W, et al. Metabolic and functional results after laparoscopic colorectal surgery: a randomized, controlled trial. Dis Colon Rectum. 2002;45(8):1070-1077.

5. Brokelman W, Holmdahl L, Falk P, Klinkenbijl J, Reijnen M. The peritoneal fibrinolytic response to conventional and laparoscopic colonic surgery. J Laparoendosc Adv Surg Tech A. Aug 2009;19(4):489-493.

6. Buchmann P, Christen D, Moll C, Flury R. [Intraperitoneal tumor seeding in colorectal carcinoma surgery--a comparison of laparoscopic versus open procedures in a longitudinal study]. Langenbecks Archiv fur Chirurgie - Supplement - Kongressband. 1996;113:573-576.

7. Buchmann P, Christen D, Moll C, Flury R, Sartoretti C. [Tumor cells in peritoneal irrigation fluid in conventional and laparoscopic surgery for colorectal carcinoma]. Swiss Surgery. 1996;Suppl 4:45-49.

8. Delgado S, Lacy AM, Filella X, et al. Acute phase response in laparoscopic and open colectomy in colon cancer: randomized study. Dis Colon Rectum. 2001;44(5):638-646.


9. Evans C, Galustian C, Kumar D, et al. Impact of surgery on immunologic function: comparison between minimally invasive techniques and conventional laparotomy for surgical resection of colorectal tumors. American Journal of Surgery. 2009;197(2):238-245.

10. Fukushima R, Kawamura YJ, Saito H, et al. Interleukin-6 and stress hormone responses after uncomplicated gasless laparoscopic-assisted and open sigmoid colectomy. Dis Colon Rectum. 1996;39(10 Suppl):S29-34.

11. Gelpi JR, Dorsey-Tyler K, Luchtefeld MA, Senagore AJ. Prospective comparison of gastric emptying after laparoscopic-aided colectomy versus open colectomy. American Surgeon. 1996;62(7):594-596; discussion 596-597.

12. Hammer JH, Basse L, Svendsen MN, et al. Impact of elective resection on plasma TIMP-1 levels in patients with colon cancer. Colorectal Disease. 2006;8(3):168-172.

13. Han S-A, Lee WY, Park C-M, Yun SH, Chun H-K. Comparison of immunologic outcomes of laparoscopic vs open approaches in clinical stage III colorectal cancer. International Journal of Colorectal Disease. May 2010;25(5):631-638.

14. Hewitt PM, Ip SM, Kwok SPY, et al. Laparoscopic-assisted vs. open surgery for colorectal cancer: Comparative study of immune effects. Diseases of the Colon and Rectum. Jul 1998;41(7):901-909.

15. Hildebrandt U, Kessler K, Plusczyk T, Pistorius G, Vollmar B, Menger MD. Comparison of surgical stress between laparoscopic and open colonic resections. Surg Endosc. 2003;17(2):242-246.

16. Hu X, Li H-z, Zhang J, An W-d, Zhang C-j. [Evaluation of the minimal invasiveness of laparoscopic operation for colorectal carcinoma]. Zhonghua Wei Chang Wai Ke Za Zhi. Sep 2005;8(5):404-406.

17. Jingli C, Rong C, Rubai X. Influence of colorectal laparoscopic surgery on dissemination and seeding of tumor cells. Surg Endosc. 2006;20(11):1759-1761.

18. Kim SH, Milsom JW, Gramlich TL, et al. Does laparoscopic vs. conventional surgery increase exfoliated cancer cells in the peritoneal cavity during resection of colorectal cancer? Dis Colon Rectum. 1998;41(8):971-978.

19. Kirman I, Cekic V, Poltaratskaia N, et al. Plasma from patients undergoing major open surgery stimulates in vitro tumor growth: Lower insulin-like growth factor binding protein 3 levels may, in part, account for this change. Surgery. 2002;132(2):186-192.

20. Kirman I, Cekic V, Poltaratskaia N, et al. The percentage of CD31+ T cells decreases after open but not laparoscopic surgery. Surg Endosc. 01 2003;17(5):754-757.

21. Kirman I, Cekic V, Poltoratskaia N, et al. Open surgery induces a dramatic decrease in circulating intact IGFBP-3 in patients with colorectal cancer not seen with laparoscopic surgery. Surg Endosc. 2005;19(1):55-59.

22. Kirman I, Poltaratskaia N, Cekic V, et al. Depletion of circulating insulin-like growth factor binding protein 3 after open surgery is associated with high interleukin-6 levels. Dis Colon Rectum. 2004;47(6):911-917; discussion 917-918.


23. Leung KL, Lai PB, Ho RL, et al. Systemic cytokine response after laparoscopic-assisted resection of rectosigmoid carcinoma: A prospective randomized trial. Annals of Surgery. 2000;231(4):506-511.

24. Leung KL, Tsang KS, Ng MHL, et al. Lymphocyte subsets and natural killer cell cytotoxicity after laparoscopically assisted resection of rectosigmoid carcinoma. Surg Endosc. 2003;17(8):1305-1310.

25. Mehigan BJ, Hartley JE, Drew PJ, et al. Changes in T cell subsets, interleukin-6, and C-reactive protein after laparoscopic and open colorectal resection for malignancy. Surg Endosc. 2001;15(11):1289-1293.

26. Neudecker J, Junghans T, Raue W, Ziemer S, Schwenk W. Fibrinolytic capacity in peritoneal fluid after laparoscopic and conventional colorectal resection: data from a randomized controlled trial.[see comment]. Langenbecks Arch Surg. 2005;390(6):523-527.

27. Neudecker J, Junghans T, Ziemer S, Raue W, Schwenk W. Effect of laparoscopic and conventional colorectal resection on peritoneal fibrinolytic capacity: a prospective randomized clinical trial. International Journal of Colorectal Disease. 2002;17(6):426-429.

28. Neudecker J, Junghans T, Ziemer S, Raue W, Schwenk W. Prospective randomized trial to determine the influence of laparoscopic and conventional colorectal resection on intravasal fibrinolytic capacity. Surg Endosc. 2003;17(1):73-77.

29. Neudecker J, Neudecker BA, Raue W, Stern R, Schwenk W. Hyaluronan levels during laparoscopic versus open colonic resections. Surg Endosc. 2008;22(3):660-663.

30. Ordemann J, Jacobi CA, Schwenk W, Stosslein R, Muller JM. Cellular and humoral inflammatory response after laparoscopic and conventional colorectal resections. Surg Endosc. 2001;15(6):600-608.

31. Ozawa A, Konishi F, Nagai H, Okada M, Kanazawa K. Cytokine and hormonal responses in laparoscopic-assisted colectomy and conventional open colectomy. Surgery Today. 2000;30(2):107-111.

32. Schwenk W, Jacobi C, Mansmann U, Bohm B, Muller JM. Inflammatory response after laparoscopic and conventional colorectal resections - results of a prospective randomized trial. Langenbecks Arch Surg. 2000;385(1):2-9.

33. Sietses C, Havenith CEG, Eijsbouts QAJ, et al. Laparoscopic surgery preserves monocyte-mediated tumor cell killing in contrast to the conventional approach. Surg Endosc. May 2000;14(5):456-460.

34. Svendsen MN, Werther K, Christensen IJ, Basse L, Nielsen HJ. Influence of open versus laparoscopically assisted colectomy on soluble vascular endothelial growth factor (sVEGF) and its soluble receptor 1 (sVEGFR1). Inflammation Research. 2005;54(11):458-463.

35. Tan M, Xu FF, Peng JS, et al. Changes in the level of serum liver enzymes after laparoscopic surgery. World Journal of Gastroenterology. 2003;9(2):364-367.

36. Tang CL, Eu KW, Tai BC, Soh JGS, Machin D, Seow-Choen F. Randomized clinical trial of the effect of open versus laparoscopically assisted colectomy on systemic immunity in patients with colorectal cancer. British Journal of Surgery. 2001;88(6):801-807.

37. Vignali A, Di Palo S, Orsenigo E, Ghirardelli L, Radaelli G, Staudacher C. Effect of prednisolone on local and systemic response in laparoscopic vs. open colon surgery: a randomized, double-blind, placebo-controlled trial. Dis Colon Rectum. 2009;52(6):1080-1088.

38. Voloshin T, Gingis-Velitski S, Shaked Y. The angiogenic profile of colorectal cancer patients following open or laparoscopic colectomy. Cancer Biology and Therapy. 01 Oct 2010;10(7):686-688.

39. Whelan RL, Franklin M, Holubar SD, et al. Postoperative cell mediated immune response is better preserved after laparoscopic vs open colorectal resection in humans. Surg Endosc. 2003;17(6):972-978.

40. Wichmann MW, Huttl TP, Winter H, et al. Immunological effects of laparoscopic vs open colorectal surgery. A prospective clinical study. Archives of Surgery. Jul 2005;140(7):692-697.

41. Wu FPK, Hoekman K, Sietses C, et al. Systemic and peritoneal angiogenic response after laparoscopic or conventional colon resection in cancer patients: a prospective, randomized trial. Dis Colon Rectum. 2004;47(10):1670-1674.

42. Wu FPK, Sietses C, von Blomberg BME, van Leeuwen PAM, Meijer S, Cuesta MA. Systemic and peritoneal inflammatory response after laparoscopic or conventional colon resection in cancer patients: a prospective, randomized trial. Dis Colon Rectum. 2003;46(2):147-155.

43. Zhao G, Xiao G, Huang M-x, Long H-k. [Effect of laparoscopic radical operation on systemic immunity in patients with colorectal cancer]. Zhonghua Wei Chang Wai Ke Za Zhi. Sep 2005;8(5):407-409.

i) Non-English Articles

1. Baccari P, Di Palo S, Redaelli A, Carlucci M, Staudacher C. [Laparoscopic versus conventional surgery in the treatment of colorectal diseases]. Chir Ital. 2000;52(1):17-27.

2. Bohm B, Schwenk W, Grundel K, Junghans T, Muller JM. [Value of laparoscopic technique in primary colorectal carcinoma]. Chirurg. 1997;68(3):231-236.

3. Brummer S, Sohr D, Ruden H, Gastmeier P. [Surgical site infection rates using a laparoscopic approach: results of the German national nosocomial infections surveillance system]. Chirurg. 2007;78(10):910-914.

4. Buchmann P, Bischofberger U, De Lorenzi D, Christen D. [Early postoperative nutrition after laparoscopic and open colorectal resection]. Swiss Surgery. 1998;4(3):146-155.

5. Buchmann P, Christen D, Buschta G, Sartoretti C. [Intraperitoneal tumor seeding in colorectal carcinoma surgery--follow-up of a comparison of laparoscopic versus open procedure]. Langenbecks Archiv fur Chirurgie - Supplement - Kongressband. 1997;114:1122-1124.

6. Buchmann P, Christen D, Flury R, Luthy A, Bischofberger U. [Does laparoscopic colonic carcinoma surgery satisfy the radicality criteria of open surgery?]. Schweizerische Medizinische Wochenschrift. Journal Suisse de Medecine. 1995;125(39):1825-1829.

7. Chi P, Lin H-m, Chen Y-c, Xu Z-b. [Feasibility of lymphadenectomy with skeletonization in extended right hemicolectomy by hand-assisted laparoscopic surgery]. Zhonghua Wei Chang Wai Ke Za Zhi. Sep 2005;8(5):410-412.

8. Chi P, Lin H-m, Xu Z-b. [Comparison of surgical complication rate between laparoscopic and open radical resection for colorectal cancer]. Zhonghua Wei Chang Wai Ke Za Zhi. May 2006;9(3):221-224.

9. Frasson M, Braga M, Vignali A, et al. [Laparoscopic-assisted versus open surgery for colorectal cancer: postoperative morbidity in a single center randomized trial]. Minerva Chir. 2006;61(4):283-292.

10. Gong T, Wang T. Laparoscopic surgery for colorectal cancer. [Chinese]. World Chinese Journal of Digestology. 18 Jul 2010;18(20):2121-2126.

11. Gutt CN, Hanisch E. [Laparoscopic resection in comparison with open resection of adenocarcinoma of the colon]. Zeitschrift fur Gastroenterologie. 1998;36(5):471-473.

12. Habr-Gama A, de Silva e Souza Junior AH, Araujo SE. [Videolaparoscopic access in the surgical treatment of colorectal cancer: critical analysis]. Revista Da Associacao Medica Brasileira. 1997;43(4):352-356.

13. Innocenti P, Aceto L, Di Bartolomeo N, et al. [Conventional versus laparoscopic surgery in tumors of the colon]. Supplementi di Tumori: Official Journal of Societa Italiana di Cancerologia. 2002;1(3):S1-4.

14. Junghans T, Raue W, Haase O, Neudecker J, Schwenk W. [Value of laparoscopic surgery in elective colorectal surgery with "fast-track"-rehabilitation]. Zentralbl Chir. 2006;131(4):298-303.

15. Kohler L, Eypasch E, Holthausen U, Troidl H. [Laparoscopic colon resection in carcinoma--beneficial or not?]. Langenbecks Archiv fur Chirurgie - Supplement - Kongressband. 1996;113:577-579.

16. Kohler L, Holthausen U, Troidl H. [Laparoscopic colorectal surgery--attempt at evaluating a new technology].[see comment]. Chirurg. 1997;68(8):794-800; discussion 800.

17. Konishi F, Nagai H, Kanazawa K. Laparoscopic colectomy for colorectal carcinomas. [Japanese]. Japanese Journal of Gastroenterological Surgery. 1999;32(8):2172-2176.

18. Kruger IM, Nilius J, Krings F, Bullermann C. [Analysis of the cost-income ratio for open and laparoscopic sigmoid resection]. Zentralbl Chir. 2004;129(4):285-290.

19. Kube R, Ptok H, Steinert R, et al. [Clinical value of laparoscopic surgery for colon cancer]. Chirurg. 2008;79(12):1145-1150.

20. Kube R, Ptok H, Steinert R, et al. Clinical value of laparoscopic surgery for colon cancer. [German]. Chirurg. December 2008;79(12):1145-1150.

21. Kuhry E, Saetnan E, Graeslie H, Gaupset R. [Laparoscopic surgery for colorectal cancer]. Tidsskrift for Den Norske Laegeforening. 2007;127(22):2946-2949.

22. Mao Z-h, Chen H-z, Li J-w, et al. [Comparison of inflammatory response after laparoscopic and conventional surgery for colorectal carcinoma]. Zhonghua Wei Chang Wai Ke Za Zhi. Jul 2006;9(4):297-300.

23. Martinek L, Dostalik J, Gunka I, Gunkova P, Vavra P. [Comparison of oncological outcomes between laparoscopic and open procedures in non-metastazing colonic carcinomas]. Rozhl Chir. Dec 2009;88(12):725-729.

24. Martinek L, Dostalik J, Vavra P, Gunikova P, Gunka I. [Implementation of POSSUM scoring system in assessing morbidity after laparoscopic colorectal surgery]. Rozhl Chir. 2008;87(1):26-31.

25. Mou Y-p, Yang P, Yan J-f, et al. [Clinical evaluation of laparoscopic radical resection of colon cancer]. Chung Hua Wai Ko Tsa Chih. 2006;44(9):581-583.

26. Pommergaard H-C, Olsen JA, Burgdorf SK, Achiam MP. [Laparoscopic versus right-sided hemicolectomy in cancer of colon therapy]. Ugeskr Laeger. Mar 29 2010;172(13):1034-1038.

27. Procacciante F, Flati D, Diamantini G, et al. [Severe postoperative complications in colorectal surgery for cancer. Incidence related to the techniques employed: open versus laparoscopic colectomy]. Chir Ital. 2008;60(3):329-336.

28. Ptok H, Steinert R, Meyer F, et al. [Long-term oncological results after laparoscopic, converted and primary open procedures for rectal carcinoma. Results of a multicenter observational study]. Chirurg. 2006;77(8):709-717.

29. Qian L-y, Wu J-h, Chen D-j, Li X-r, Wei Y-s. [Comparative study on long-term results of laparoscopic and open radical resection for colorectal carcinoma]. Zhonghua Wei Chang Wai Ke Za Zhi. Jul 2006;9(4):294-296.

30. Ramacciato G, D'Angelo F, Aurello P, et al. [Right hemicolectomy for colon cancer: a prospective randomised study comparing laparoscopic vs. open technique]. Chir Ital. 2008;60(1):1-7.

31. Sazhin VP, Gostkin PA, Soboleva VI, Siatkin DA, Sazhin IV, Bublikov ID. [Complex approach to the complicated forms of colorectal cancer]. Khirurgiia (Mosk). 2010(7):15-19.

32. Sazhin VP, Savel'ev VM, Pigin AS, Malashenko PA. [Role and perspectives of the use of laparoscopic surgery in colo-proctology]. Khirurgiia (Mosk). 1995(5):25-27.

33. Schneider C, Scheidbach H, Scheuerlein H, Kockerling F. [Prospective multicenter study of laparoscopic colorectal surgery. Quality assurance during introduction of new methods]. Zentralbl Chir. 2000;125 Suppl 2:164-168.

34. Schwenk W, Raue W, Haase O, Junghans T, Muller JM. ["Fast-track" colonic surgery-first experience with a clinical procedure for accelerating postoperative recovery]. Chirurg. 2004;75(5):508-514.

35. Shamsia RA. [Quality of life in patients after laparoscopic and open interventions for colonic tumors]. Klinicheskaia Khirurgiia. 2005(1):24-28.

36. Siani LM, Ferranti F, Marzano M, De Carlo A, Quintiliani A. [Five-year oncological results of laparoscopic versus open left hemicolectomy]. Chir Ital. Sep-Dec 2009;61(5-6):579-583.

37. Siani LM, Ferranti F, Marzano M, De Carlo A, Quintiliani A. [Laparoscopic versus open right hemicolectomy: 5-year oncology results]. Chir Ital. Sep-Dec 2009;61(5-6):573-577.

38. Smedh K, Strand E, Jansson P, et al. [Rapid recovery after colonic resection. Multimodal rehabilitation by means of Kehlet's method practiced in Vasteras]. Lakartidningen. 2001;98(21):2568-2574.

39. Wang Z-d, Wu Z-y, Li Y, Wu W-l, Lin F. [Clinical efficacy comparison between laparoscopy and open radical resection for 191 advanced colorectal cancer patients]. Zhonghua Wei Chang Wai Ke Za Zhi. Jul 2009;12(4):368-370.

40. Xu M, Yang XB, Shi LG, Wang YJ. Effect of laparoscopic resection on systemic stress responses in colorectal cancer patients. [Chinese]. Journal of Dalian Medical University. 2009;31(3):328-330.

Appendix E Comparative Studies of Laparoscopy versus Conventional Surgery for Colon Cancer Meeting a priori Inclusion Criteria

1. Lohsiriwat V, Lohsiriwat D, Chinswangwatanakul V, Akaraviputh T, Lert-Akyamanee N. Comparison of short-term outcomes between laparoscopically-assisted vs. transverse-incision open right hemicolectomy for right-sided colon cancer: a retrospective study. World Journal of Surgical Oncology. 2007;5:49.

2. Ng SSM, Leung KL, Lee JFY, Yiu RYC, Li JCM, Hon SSF. Long-term morbidity and oncologic outcomes of laparoscopic-assisted anterior resection for upper rectal cancer: ten-year results of a prospective, randomized trial. Dis Colon Rectum. 2009;52(4):558-566.

3. Lin JH, Whelan RL, Sakellarios NE, et al. Prospective study of ambulation after open and laparoscopic colorectal resection. Surgical Innovation. 2009;16(1):16-20.

4. Shabbir A, Roslani AC, Wong K-S, Tsang CBS, Wong H-B, Cheong W-K. Is laparoscopic colectomy as cost beneficial as open colectomy?[see comment]. ANZ Journal of Surgery. 2009;79(4):265-270.

5. Scarpa M, Erroi F, Ruffolo C, et al. Minimally invasive surgery for colorectal cancer: quality of life, body image, cosmesis, and functional results. Surg Endosc. 2009;23(3):577-582.

6. Kennedy GD, Heise C, Rajamanickam V, Harms B, Foley EF. Laparoscopy decreases postoperative complication rates after abdominal colectomy: results from the national surgical quality improvement program. Annals of Surgery. 2009;249(4):596-601.

7. Zmora O, Hashavia E, Munz Y, et al. Laparoscopic colectomy is associated with decreased postoperative gastrointestinal dysfunction. Surg Endosc. 2009;23(1):87-89.

8. Kemp JA, Finlayson SRG. Outcomes of laparoscopic and open colectomy: a national population-based comparison. Surgical Innovation. 2008;15(4):277-283.

9. Bilimoria KY, Bentrem DJ, Merkow RP, et al. Laparoscopic-assisted vs. open colectomy for cancer: comparison of short-term outcomes from 121 hospitals. J Gastrointest Surg. 2008;12(11):2001-2009.

10. Imai E, Ueda M, Kanao K, et al. Surgical site infection risk factors identified by multivariate analysis for patient undergoing laparoscopic, open colon, and gastric surgery. American Journal of Infection Control. 2008;36(10):727-731.

11. Mirza MS, Longman RJ, Farrokhyar F, Sheffield JP, Kennedy RH. Long-term outcomes for laparoscopic versus open resection of nonmetastatic colorectal cancer. J Laparoendosc Adv Surg Tech A. 2008;18(5):679-685.

12. Andersen LPH, Klein M, Gogenur I, Rosenberg J. Incisional hernia after open versus laparoscopic sigmoid resection. Surg Endosc. 2008;22(9):2026-2029.

13. Hewett PJ, Allardyce RA, Bagshaw PF, et al. Short-term outcomes of the Australasian randomized clinical study comparing laparoscopic and conventional open surgical treatments for colon cancer: the ALCCaS trial. Annals of Surgery. 2008;248(5):728-738.

14. Bilimoria KY, Bentrem DJ, Nelson H, et al. Use and outcomes of laparoscopic-assisted colectomy for cancer in the United States. Archives of Surgery. 2008;143(9):832-839; discussion 839-840.

15. Varela JE, Asolati M, Huerta S, Anthony T. Outcomes of laparoscopic and open colectomy at academic centers. American Journal of Surgery. 2008;196(3):403-406.

16. Lacy AM, Delgado S, Castells A, et al. The long-term results of a randomized clinical trial of laparoscopy-assisted versus open surgery for colon cancer.[see comment]. Annals of Surgery. 2008;248(1):1-7.

17. Buchanan GN, Malik A, Parvaiz A, Sheffield JP, Kennedy RH. Laparoscopic resection for colorectal cancer. British Journal of Surgery. 2008;95(7):893-902.

18. Nakamura T, Mitomi H, Ihara A, et al. Risk factors for wound infection after surgery for colorectal cancer. World Journal of Surgery. 2008;32(6):1138-1141.

19. Delaney CP, Chang E, Senagore AJ, Broder M. Clinical outcomes and resource utilization associated with laparoscopic and open colectomy using a large national database. Annals of Surgery. 2008;247(5):819-824.

20. Seitz G, Seitz EM, Kasparek MS, Konigsrainer A, Kreis ME. Long-term quality-of-life after open and laparoscopic sigmoid colectomy. Surg Laparosc Endosc Percutan Tech. 2008;18(2):162-167.

21. Lordan JT, Tilney HS, Shirol S, Jourdan I, Gudgeon AM. Does the laparoscopic colorectal surgery learning curve adversely affect the results of colorectal cancer resection? A 3-year prospective study in a district general hospital. Colorectal Disease. 2008;10(4):363-369.

22. Law WL, Fan JKM, Poon JTC, Choi HK, Lo OSH. Laparoscopic bowel resection in the setting of metastatic colorectal cancer. Annals of Surgical Oncology. 2008;15(5):1424-1428.

23. Ihedioha U, Mackay G, Leung E, Molloy RG, O'Dwyer PJ. Laparoscopic colorectal resection does not reduce incisional hernia rates when compared with open colorectal resection.[see comment]. Surg Endosc. 2008;22(3):689-692.

24. Frasson M, Braga M, Vignali A, Zuliani W, Di Carlo V. Benefits of laparoscopic colorectal resection are more pronounced in elderly patients. Dis Colon Rectum. 2008;51(3):296-300.

25. Steele SR, Brown TA, Rush RM, Martin MJ. Laparoscopic vs open colectomy for colon cancer: results from a large nationwide population-based analysis. J Gastrointest Surg. 2008;12(3):583-591.

26. Braga M, Frasson M, Vignali A, Zuliani W, Di Carlo V. Open right colectomy is still effective compared to laparoscopy: results of a randomized trial. Annals of Surgery. 2007;246(6):1010-1014; discussion 1014-1015.

27. Chung CC, Ng DCK, Tsang WWC, et al. Hand-assisted laparoscopic versus open right colectomy: a randomized controlled trial. Annals of Surgery. 2007;246(5):728-733.

28. McCloskey CA, Wilson MA, Hughes SJ, Eid GM. Laparoscopic colorectal surgery is safe in the high-risk patient: a NSQIP risk-adjusted analysis.[erratum appears in Surgery. 2008 Feb;143(2):301]. Surgery. 2007;142(4):594-597; discussion 597.e1-2.

29. Hinojosa MW, Murrell ZA, Konyalian VR, Mills S, Nguyen NT, Stamos MJ. Comparison of laparoscopic vs open sigmoid colectomy for benign and malignant disease at academic medical centers. J Gastrointest Surg. 2007;11(11):1423-1429; discussion 1429-1430.

30. Park J-S, Kang S-B, Kim S-W, Cheon G-N. Economics and the laparoscopic surgery learning curve: comparison with open surgery for rectosigmoid cancer. World Journal of Surgery. 2007;31(9):1827-1834.

31. Osarogiagbon RU, Ogbeide O, Ogbeide E, George RK. Hand-assisted laparoscopic colectomy compared with open colectomy in a nontertiary care setting. Clinical Colorectal Cancer. 2007;6(8):588-592.

32. Tong DKH, Law WL. Laparoscopic versus open right hemicolectomy for carcinoma of the colon. J Soc Laparoendosc Surg. 2007;11(1):76-80.

33. Salimath J, Jones MW, Hunt DL, Lane MK. Comparison of return of bowel function and length of stay in patients undergoing laparoscopic versus open colectomy. J Soc Laparoendosc Surg. 2007;11(1):72-75.

34. Napolitano L, Waku M, De Nicola P, et al. Laparoscopic colectomy in colon cancer. A single-center clinical experience. G Chir. 2007;28(4):126-133.

35. Janson M, Lindholm E, Anderberg B, Haglind E. Randomized trial of health-related quality of life after open and laparoscopic surgery for colon cancer. Surg Endosc. 2007;21(5):747-753.

36. MacKay G, Ihedioha U, McConnachie A, Serpell M, Molloy RG, O'Dwyer PJ. Laparoscopic colonic resection in fast-track patients does not enhance short-term recovery after elective surgery.[see comment]. Colorectal Disease. 2007;9(4):368-372.

37. Noblett SE, Horgan AF. A prospective case-matched comparison of clinical and financial outcomes of open versus laparoscopic colorectal resection. Surg Endosc. 2007;21(3):404-408.

38. Law WL, Lee YM, Choi HK, Seto CL, Ho JW. Impact of laparoscopic resection for colorectal cancer on operative outcomes and survival.[see comment]. Annals of Surgery. 2007;245(1):1-7.

39. Liang J-T, Huang K-C, Lai H-S, Lee P-H, Jeng Y-M. Oncologic results of laparoscopic versus conventional open surgery for stage II or III left-sided colon cancers: a randomized controlled trial. Annals of Surgical Oncology. 2007;14(1):109-117.

40. Del Rio P, Dell'Abate P, Soliani P, Tacci S, Arcuri MF, Sianesi M. Standardized laparoscopic right hemicolectomy technique for colon cancer. Minerva Chir. 2006;61(4):293-297.

41. Ng SSM, Li JCM, Lee JFY, Yiu RYC, Leung KL. Laparoscopic total colectomy for colorectal cancers: a comparative study. Surg Endosc. 2006;20(8):1193-1196.

42. Law WL, Lee YM, Choi HK, Seto CL, Ho JWC. Laparoscopic and open anterior resection for upper and mid rectal cancer: an evaluation of outcomes. Dis Colon Rectum. 2006;49(8):1108-1115.

43. Nakamura T, Mitomi H, Ohtani Y, et al. Comparison of long-term outcome of laparoscopic and conventional surgery for advanced colon and rectosigmoid cancer. Hepato-Gastroenterology. 2006;53(69):351-353.

44. Lezoche E, Guerrieri M, De Sanctis A, et al. Long-term results of laparoscopic versus open colorectal resections for cancer in 235 patients with a minimum follow-up of 5 years. Surg Endosc. 2006;20(4):546-553.

45. King PM, Blazeby JM, Ewings P, et al. Randomized clinical trial comparing laparoscopic and open surgery for colorectal cancer within an enhanced recovery programme. British Journal of Surgery. 2006;93(3):300-308.

46. Wahl P, Hahnloser D, Chanson C, Givel J-C. Laparoscopic and open colorectal surgery in everyday practice: retrospective study. ANZ Journal of Surgery. 2006;76(1-2):20-27.

47. Gonzalez R, Smith CD, Mason E, et al. Consequences of conversion in laparoscopic colorectal surgery. Dis Colon Rectum. 2006;49(2):197-204.

48. Salloum RM, Bulter DC, Schwartz SI. Economic evaluation of minimally invasive colectomy.[see comment]. J Am Coll Surg. 2006;202(2):269-274.

49. Sample CB, Watson M, Okrainec A, Gupta R, Birch D, Anvari M. Long-term outcomes of laparoscopic surgery for colorectal cancer. Surg Endosc. 2006;20(1):30-34.

50. Sahakitrungruang C, Pattana-arun J, Tantiphlachiva K, Rojanasakul A. Laparoscopic versus open surgery for rectosigmoid and rectal cancer. J Med Assoc Thai. 2005;88 Suppl 4:S59-64.

51. Braga M, Frasson M, Vignali A, Zuliani W, Civelli V, Di Carlo V. Laparoscopic vs. open colectomy in cancer patients: long-term complications, quality of life, and survival. Dis Colon Rectum. 2005;48(12):2217-2223.

52. Vignali A, Di Palo S, Tamburini A, Radaelli G, Orsenigo E, Staudacher C. Laparoscopic vs. open colectomies in octogenarians: a case-matched control study. Dis Colon Rectum. 2005;48(11):2070-2075.

53. Veldkamp R, Kuhry E, Hop WCJ, et al. Laparoscopic surgery versus open surgery for colon cancer: short-term outcomes of a randomised trial. Lancet Oncol. 2005;6(7):477-484.

54. Pokala N, Delaney CP, Senagore AJ, Brady KM, Fazio VW. Laparoscopic vs open total colectomy: a case-matched comparative study. Surg Endosc. 2005;19(4):531-535.

55. Neri V, Ambrosi A, Fersini A, Valentino TP. Right colectomy for cancer: validity of laparoscopic approach. Annali Italiani di Chirurgia. 2004;75(6):649-653.

56. Kaiser AM, Kang J-C, Chan LS, Vukasin P, Beart RW, Jr. Laparoscopic-assisted vs. open colectomy for colon cancer: a prospective randomized trial. J Laparoendosc Adv Surg Tech A. 2004;14(6):329-334.

57. Kojima M, Konishi F, Okada M, Nagai H. Laparoscopic colectomy versus open colectomy for colorectal carcinoma: a retrospective analysis of patients followed up for at least 4 years. Surgery Today. 2004;34(12):1020-1024.

58. Vignali A, Braga M, Zuliani W, Frasson M, Radaelli G, Di Carlo V. Laparoscopic colorectal surgery modifies risk factors for postoperative morbidity. Dis Colon Rectum. 2004;47(10):1686-1693.

59. Baker RP, Titu LV, Hartley JE, Lee PWR, Monson JRT. A case-control study of laparoscopic right hemicolectomy vs. open right hemicolectomy. Dis Colon Rectum. 2004;47(10):1675-1679.

60. Capussotti L, Massucco P, Muratore A, Amisano M, Bima C, Zorzi D. Laparoscopy as a prognostic factor in curative resection for node positive colorectal cancer: results for a single-institution nonrandomized prospective trial. Surg Endosc. 2004;18(7):1130-1135.

61. Kang JC, Chung MH, Chao PC, et al. Hand-assisted laparoscopic colectomy vs open colectomy: a prospective randomized study. Surg Endosc. 2004;18(4):577-581.

62. Janson M, Bjorholt I, Carlsson P, et al. Randomized clinical trial of the costs of open and laparoscopic surgery for colonic cancer.[see comment]. British Journal of Surgery. 2004;91(4):409-417.

63. Kiran RP, Delaney CP, Senagore AJ, Millward BL, Fazio VW. Operative blood loss and use of blood products after laparoscopic and conventional open colorectal operations. Archives of Surgery. 2004;139(1):39-42.

64. Kayser J, Faber C, Bisdorff J, et al. Review of laparoscopic and open colorectal surgery in the "Zitha" Hospital (Luxembourg) in the year 2002. Bulletin de la Societe des Sciences Medicales du Grand-Duche de Luxembourg. 2003(1):7-16.

65. Inoue Y, Kimura T, Noro H, et al. Is laparoscopic colorectal surgery less invasive than classical open surgery? Quantitation of physical activity using an accelerometer to assess postoperative convalescence. Surg Endosc. 2003;17(8):1269-1273.

66. Basse L, Madsen JL, Billesbolle P, Bardram L, Kehlet H. Gastrointestinal transit after laparoscopic versus open colonic resection. Surg Endosc. 2003;17(12):1919-1922.

67. Kasparek MS, Muller MH, Glatzle J, et al. Postoperative colonic motility in patients following laparoscopic-assisted and open sigmoid colectomy. J Gastrointest Surg. 2003;7(8):1073-1081; discussion 1081.

68. Adachi Y, Sato K, Kakisako K, Inomata M, Shiraishi N, Kitano S. Quality of life after laparoscopic or open colonic resection for cancer. Hepato-Gastroenterology. 2003;50(53):1348-1351.

69. Sklow B, Read T, Birnbaum E, Fry R, Fleshman J. Age and type of procedure influence the choice of patients for laparoscopic colectomy. Surg Endosc. 2003;17(6):923-929.

70. Patankar SK, Larach SW, Ferrara A, et al. Prospective comparison of laparoscopic vs. open resections for colorectal adenocarcinoma over a ten-year period. Dis Colon Rectum. 2003;46(5):601-611.

71. Hasegawa H, Kabeshima Y, Watanabe M, Yamamoto S, Kitajima M. Randomized controlled trial of laparoscopic versus open colectomy for advanced colorectal cancer. Surg Endosc. 2003;17(4):636-640.

72. Senagore AJ, Madbouly KM, Fazio VW, Duepree HJ, Brady KM, Delaney CP. Advantages of laparoscopic colectomy in older patients. Archives of Surgery. 2003;138(3):252-256.

73. Vasilev K, Ivanov P, Gurbev G. Laparoscopic versus conventional colorectal surgery--a comparative trial. Acta chir. 2002;49(2):77-78.

74. Law WL, Chu KW, Tung PHM. Laparoscopic colorectal resection: a safe option for elderly patients. J Am Coll Surg. 2002;195(6):768-773.

75. Braga M, Vignali A, Gianotti L, et al. Laparoscopic versus open colorectal surgery: a randomized trial on short-term outcome. Annals of Surgery. 2002;236(6):759-766; discussion 767.

76. Winslow ER, Fleshman JW, Birnbaum EH, Brunt LM. Wound complications of laparoscopic vs open colectomy. Surg Endosc. 2002;16(10):1420-1425.

77. Lezoche E, Feliciotti F, Paganini AM, Guerrieri M, De Sanctis A, Campagnacci R. Laparoscopic colonic resection. J Laparoendosc Adv Surg Tech A. 2001;11(6):401-408.

78. Hong D, Tabet J, Anvari M. Laparoscopic vs. open resection for colorectal adenocarcinoma. Dis Colon Rectum. 2001;44(1):10-18; discussion 18-19.

79. Yamamoto S, Watanabe M, Hasegawa H, Kitajima M. Oncologic outcome of laparoscopic versus open surgery for advanced colorectal cancer. Hepato-Gastroenterology. 2001;48(41):1248-1251.

80. Nishiguchi K, Okuda J, Toyoda M, Tanaka K, Tanigawa N. Comparative evaluation of surgical stress of laparoscopic and open surgeries for colorectal carcinoma. Dis Colon Rectum. 2001;44(2):223-230.

81. Mall JW, Schwenk W, Rodiger O, Zippel K, Pollmann C, Muller JM. Blinded prospective study of the incidence of deep venous thrombosis following conventional or laparoscopic colorectal resection. British Journal of Surgery. 2001;88(1):99-100.

82. Curet MJ, Putrakul K, Pitcher DE, Josloff RK, Zucker KA. Laparoscopically assisted colon resection for colon carcinoma: perioperative results and long-term outcome. Surg Endosc. 2000;14(11):1062-1066.

83. Lezoche E, Feliciotti F, Paganini AM, Guerrieri M, Campagnacci R, De Sanctis A. Laparoscopic colonic resections versus open surgery: a prospective non-randomized study on 310 unselected cases. Hepato-Gastroenterology. 2000;47(33):697-708.

84. Hartley JE, Mehigan BJ, MacDonald AW, Lee PW, Monson JR. Patterns of recurrence and survival after laparoscopic and conventional resections for colorectal carcinoma. Annals of Surgery. 2000;232(2):181-186.

85. Marubashi S, Yano H, Monden T, et al. The usefulness, indications, and complications of laparoscopy-assisted colectomy in comparison with those of open colectomy for colorectal carcinoma. Surgery Today. 2000;30(6):491-496.

86. Kakisako K, Sato K, Adachi Y, Shiraishi N, Miyahara M, Kitano S. Laparoscopic colectomy for Dukes A colon cancer. Surg Laparosc Endosc Percutan Tech. 2000;10(2):66-70.

87. Stocchi L, Nelson H, Young-Fadok TM, Larson DR, Ilstrup DM. Safety and advantages of laparoscopic vs. open colectomy in the elderly: matched-control study. Dis Colon Rectum. 2000;43(3):326-332.

88. Delgado S, Lacy AM, Garcia Valdecasas JC, et al. Could age be an indication for laparoscopic colectomy in colorectal cancer? Surg Endosc. 2000;14(1):22-26.

89. Stewart BT, Stitz RW, Lumley JW. Laparoscopically assisted colorectal surgery in the elderly. British Journal of Surgery. 1999;86(7):938-941.

90. Leung KL, Meng WC, Lee JF, Thung KH, Lai PB, Lau WY. Laparoscopic-assisted resection of right-sided colonic carcinoma: a case-control study. J Surg Oncol. 1999;71(2):97-100.

91. Santoro E, Carlini M, Carboni F, Feroce A. Colorectal carcinoma: laparoscopic versus traditional open surgery. A clinical trial. Hepato-Gastroenterology. 1999;46(26):900-904.

92. Schwenk W, Bohm B, Witt C, Junghans T, Grundel K, Muller JM. Pulmonary function following laparoscopic or conventional colorectal resection: a randomized controlled evaluation. Archives of Surgery. 1999;134(1):6-12; discussion 13.

93. Bouvet M, Mansfield PF, Skibber JM, et al. Clinical, pathologic, and economic parameters of laparoscopic colon resection for cancer. American Journal of Surgery. 1998;176(6):554-558.

94. Schwenk W, Bohm B, Muller JM. Postoperative pain and fatigue after laparoscopic or conventional colorectal resections. A prospective randomized trial. Surg Endosc. 1998;12(9):1131-1136.

95. Lacy AM, Delgado S, Garcia-Valdecasas JC, et al. Port site metastases and recurrence after laparoscopic colectomy. A randomized trial. Surg Endosc. 1998;12(8):1039-1042.

96. Khalili TM, Fleshner PR, Hiatt JR, et al. Colorectal cancer: comparison of laparoscopic with open approaches. Dis Colon Rectum. 1998;41(7):832-838.

97. Milsom JW, Bohm B, Hammerhofer KA, Fazio V, Steiger E, Elson P. A prospective, randomized trial comparing laparoscopic versus conventional techniques in colorectal cancer surgery: a preliminary report.[see comment]. J Am Coll Surg. 1998;187(1):46-54; discussion 54-55.

98. Psaila J, Bulley SH, Ewings P, Sheffield JP, Kennedy RH. Outcome following laparoscopic resection for colorectal cancer.[see comment]. British Journal of Surgery. 1998;85(5):662-664.

99. Schwenk W, Bohm B, Haase O, Junghans T, Muller JM. Laparoscopic versus conventional colorectal resection: a prospective randomised study of postoperative ileus and early postoperative feeding. Langenbecks Arch Surg. 1998;383(1):49-55.

100. Leung KL, Kwok SP, Lau WY, et al. Laparoscopic-assisted resection of rectosigmoid carcinoma. Immediate and medium-term results. Archives of Surgery. 1997;132(7):761-764; discussion 765.

101. Goh YC, Eu KW, Seow-Choen F. Early postoperative results of a prospective series of laparoscopic vs. open anterior resections for rectosigmoid cancers. Dis Colon Rectum. 1997;40(7):776-780.

102. Philipson BM, Bokey EL, Moore JW, Chapuis PH, Bagge E. Cost of open versus laparoscopically assisted right hemicolectomy for cancer. World Journal of Surgery. 1997;21(2):214-217.

103. Ortiz H, Armendariz P, Yarnoz C. Early postoperative feeding after elective colorectal surgery is not a benefit unique to laparoscopy-assisted procedures.[see comment]. International Journal of Colorectal Disease. 1996;11(5):246-249.

104. Begos DG, Arsenault J, Ballantyne GH. Laparoscopic colon and rectal surgery at a VA hospital. Analysis of the first 50 cases. Surg Endosc. 1996;10(11):1050-1056.

105. Gellman L, Salky B, Edye M. Laparoscopic assisted colectomy. Surg Endosc. 1996;10(11):1041-1044.

106. Franklin ME, Jr., Rosenthal D, Abrego-Medina D, et al. Prospective comparison of open vs. laparoscopic colon surgery for carcinoma. Five-year results. Dis Colon Rectum. 1996;39(10 Suppl):S35-46.

107. Bokey EL, Moore JW, Chapuis PH, Newland RC. Morbidity and mortality following laparoscopic-assisted right hemicolectomy for cancer. Dis Colon Rectum. 1996;39(10 Suppl):S24-28.

108. Hotokezaka M, Dix J, Mentis EP, Minasi JS, Schirmer BD. Gastrointestinal recovery following laparoscopic vs open colon surgery. Surg Endosc. 1996;10(5):485-489.

109. Fleshman JW, Fry RD, Birnbaum EH, Kodner IJ. Laparoscopic-assisted and minilaparotomy approaches to colorectal diseases are similar in early outcome. Dis Colon Rectum. 1996;39(1):15-22.

110. Lacy AM, Garcia-Valdecasas JC, Pique JM, et al. Short-term outcome analysis of a randomized study comparing laparoscopic vs open colectomy for colon cancer. Surg Endosc. 1995;9(10):1101-1105.

111. Franklin ME, Jr., Rosenthal D, Norem RF. Prospective evaluation of laparoscopic colon resection versus open colon resection for adenocarcinoma. A multicenter study. Surg Endosc. 1995;9(7):811-816.

112. Saba AK, Kerlakian GM, Kasper GC, Hearn AT. Laparoscopic assisted colectomies versus open colectomy. Journal of Laparoendoscopic Surgery. 1995;5(1):1-6.

113. Ramos JM, Beart RW, Jr., Goes R, Ortega AE, Schlinkert RT. Role of laparoscopy in colorectal surgery. A prospective evaluation of 200 cases. Dis Colon Rectum. 1995;38(5):494-501.

114. Van Ye TM, Cattey RP, Henry LG. Laparoscopically assisted colon resections compare favorably with open technique. Surgical Laparoscopy & Endoscopy. 1994;4(1):25-31.

115. Senagore AJ, Luchtefeld MA, Mackeigan JM, Mazier WP. Open colectomy versus laparoscopic colectomy: are there differences? American Surgeon. 1993;59(8):549-553; discussion 553-544.

116. Poon JT, Law WL, Wong IW, et al. Impact of laparoscopic colorectal resection on surgical site infection. Annals of Surgery. January 2009;249(1):77-81.

117. Faiz O, Brown T, Colucci G, Kennedy RH. A cohort study of results following elective colonic and rectal resection within an enhanced recovery programme. Colorectal Disease. 2009;11(4):366-372.

118. Survival after laparoscopic surgery versus open surgery for colon cancer: long-term outcome of a randomised clinical trial. The Lancet Oncology. January 2009;10(1):44-52.

119. Park JS, Kang SB, Kim DW, Lee KH, Kim YH. Laparoscopic versus open resection without splenic flexure mobilization for the treatment of rectum and sigmoid cancer: A study from a single institution that selectively used splenic flexure mobilization. Surgical Laparoscopy, Endoscopy and Percutaneous Techniques. February 2009;19(1):62-68.

120. Chikkappa MG, Jagger S, Griffith JP, Ausobsky JR, Steward MA, Davies JB. In-house colorectal laparoscopic preceptorship: A model for changing a unit's practice safely and efficiently. International Journal of Colorectal Disease. 2009;24(7):771-776.

121. Gameiro M, Eichler W, Schwandner O, et al. Patient mood and neuropsychological outcome after laparoscopic and conventional colectomy. Surgical Innovation. 2008;15(3):171-178.

122. Cermak K, Thill V, Simoens CH, Smets D, Ngongang CH, Mendes Da Costa P. Surgical resection for colon cancer: Laparoscopic assisted vs. open colectomy. Hepato-Gastroenterology. Mar 2008;55(82-83):412-417.

123. King PM, Blazeby JM, Ewings P, Kennedy RH. Detailed evaluation of functional recovery following laparoscopic or open surgery for colorectal cancer within an enhanced recovery programme. International Journal of Colorectal Disease. Aug 2008;23(8):795-800.

124. Boni L, Di Giuseppe M, Bertoglio C, et al. Preliminary results of laparoscopic colorectal resections: Does surgeon's age influences outcomes? Surgical Oncology. Dec 2007;16:57-60.

125. Gonzalez IA, Fernandez EMLT, Pinero YH, et al. Effectiveness of colorectal laparoscopic surgery on patients at high anesthetic risk: An intervention cohort study. International Journal of Colorectal Disease. Jan 2008;23(1):101-106.

126. Fleshman J, Sargent DJ, Green E, et al. Laparoscopic colectomy for cancer is not inferior to open surgery based on 5-year data from the COST Study Group trial. Annals of Surgery. Oct 2007;246(4):655-662.

127. Jayne DG, Guillou PJ, Thorpe H, et al. Randomized trial of laparoscopic-assisted resection of colorectal carcinoma: 3-Year results of the UK MRC CLASICC trial group. Journal of Clinical Oncology. 20 2007;25(21):3061-3068.

128. Choi YS, Lee SI, Lee TG, Kim SW, Cheon G, Kang SB. Economic outcomes of laparoscopic versus open surgery for colorectal cancer in Korea. Surgery Today. Feb 2007;37(2):127-132.

129. Guo DY, Eteuati J, Hung Nguyen M, Lloyd D, Ragg JL. Laparoscopic assisted colectomy: Experience from a rural centre. ANZ Journal of Surgery. Apr 2007;77(4):283-286.

130. Feng B, Zheng MH, Mao ZH, et al. Clinical advantages of laparoscopic colorectal cancer surgery in the elderly. Aging - Clinical and Experimental Research. Jun 2006;18(3):191-195.

131. Franks PJ, Bosanquet N, Thorpe H, et al. Short-term costs of conventional vs laparoscopic assisted surgery in patients with colorectal cancer (MRC CLASICC trial). British Journal of Cancer. 03 2006;95(1):6-12.

132. Delaney CP, Pokala N, Senagore AJ, et al. Is laparoscopic colectomy applicable to patients with body mass index >30? A case-matched comparative study with open colectomy. Diseases of the Colon and Rectum. May 2005;48(5):975-981.

133. Guillou PJ, Quirke P, Thorpe H, et al. Short-term endpoints of conventional versus laparoscopic-assisted surgery in patients with colorectal cancer (MRC CLASICC trial): Multicentre, randomised controlled trial. Lancet. 14 2005;365(9472):1718-1726.


134. Zheng MH, Feng B, Lu AG, et al. Laparoscopic versus open right hemicolectomy with curative intent for colon carcinoma. World Journal of Gastroenterology. 21 2005;11(3):323-326.

135. Nelson H, Sargent DJ, Wieand HS, et al. A Comparison of Laparoscopically Assisted and Open Colectomy for Colon Cancer. New England Journal of Medicine. 13 2004;350(20):2050-2059+2114.

136. Leung KL, Kwok SPY, Lam SCW, et al. Laparoscopic resection of rectosigmoid carcinoma: Prospective randomised trial. Lancet. 10 2004;363(9416):1187-1192.

137. Delaney CP, Kiran RP, Senagore AJ, Brady K, Fazio VW. Case-Matched Comparison of Clinical and Financial Outcome after Laparoscopic or Open Colorectal Surgery. Annals of Surgery. Jul 2003;238(1):67-72.

138. Lezoche E, Feliciotti F, Guerrieri M, et al. Laparoscopic versus open hemicolectomy. [Italian, English]. Minerva Chir. Aug 2003;58(4):491-507.

139. Ma HF, Wang HM. Comparison between complications of laparoscopic anterior resection and conventional anterior resection for sigmoid colon cancer. Formosan Journal of Surgery. Jul 2003;36(4):166-172.

140. Feliciotti F, Paganini AM, Guerrieri M, De Sanctis A, Campagnacci R, Lezoche E. Results of laparoscopic vs open resections for colon cancer in patients with a minimum follow-up of 3 years. Surg Endosc. 2002;16(8):1158-1161.

141. Lacy AM, Garcia-Valdecasas JC, Delgado S, et al. Laparoscopy-assisted colectomy versus open colectomy for treatment of non-metastatic colon cancer: A randomised trial. Lancet. 29 2002;359(9325):2224-2229.

142. Champault GG, Barrat C, Raselli R, Elizalde A, Catheline JM. Laparoscopic versus open surgery for colorectal carcinoma: A prospective clinical trial involving 157 cases with a mean follow-up of 5 years. Surgical Laparoscopy, Endoscopy and Percutaneous Techniques. 2002;12(2):88-95.

143. Lujan HJ, Plasencia G, Jacobs M, Viamonte IM, Hartmann RF. Long-term survival after laparoscopic colon resection for cancer: Complete five-year follow-up. Diseases of the Colon and Rectum. 2002;45(4):491-501.

144. Lezoche E, Feliciotti F, Paganini AM, et al. Laparoscopic vs open hemicolectomy for colon cancer: Long-term outcome. Surg Endosc. 2002;16(4):596-602.

145. Weeks JC, Nelson H, Gelber S, Sargent D, Schroeder G. Short-term quality-of-life outcomes following laparoscopic-assisted colectomy vs open colectomy for colon cancer: A randomized trial. Journal of the American Medical Association. 16 2002;287(3):321-328.

146. Braga M, Vignali A, Zuliani W, et al. Training period in laparoscopic colorectal surgery: A case-matched comparative study with open surgery. Surg Endosc. 2002;16(1):31-35.

147. Chen WTL, Chen HC, Chiu CM, Lai YC, Hsu GH, Huang TM. Laparoscopic resection of colorectal cancer. Formosan Journal of Surgery. 2000;33(5):215-220.

148. Chen HH, Wexner SD, Iroatulam AJN, et al. Laparoscopic colectomy compares favorably with colectomy by laparotomy for reduction of postoperative ileus. Diseases of the Colon and Rectum. Jan 2000;43(1):61-65.


149. Schwandner O, Schiedeck THK, Killaitis C, Bruch HP. A case-control-study comparing laparoscopic versus open surgery for rectosigmoidal and rectal cancer. International Journal of Colorectal Disease. Aug 1999;14(3):158-163.

150. Stage JG, Schulze S, Moller P, et al. Prospective randomized study of laparoscopic versus open colonic resection for adenocarcinoma. British Journal of Surgery. 1997;84(3):391-396.

151. Ou H. Laparoscopic-assisted mini laparotomy with colectomy. Diseases of the Colon and Rectum. 1995;38(3):324-326.

152. Gray D, Lee H, Schlinkert R, Beart Jr RW. Adequacy of lymphadenectomy in laparoscopic-assisted colectomy for colorectal cancer: A preliminary report. J Surg Oncol. 1994;57(1):8-10.

153. Musser DJ, Boorse RC, Madera F, Reed IJF. Laparoscopic colectomy: At what cost? Surgical Laparoscopy and Endoscopy. 1994;4(1):1-5.

154. Tate JJT, Kwok S, Dawson JW, Lau WY, Li AKC. Prospective comparison of laparoscopic and conventional anterior resection. British Journal of Surgery. 1993;80(11):1396-1398.

155. Peters WR, Bartels TL. Minimally invasive colectomy: Are the potential benefits realized? Diseases of the Colon and Rectum. 1993;36(8):751-756.

156. Falk PM, Beart Jr RW, Wexner SD, et al. Laparoscopic colectomy: A critical appraisal. Diseases of the Colon and Rectum. 1993;36(1):28-34.

157. Wilks JA, Balentine CJ, Berger DH, et al. Establishment of a minimally invasive program at a Veterans' Affairs Medical Center leads to improved care in colorectal cancer patients. American Journal of Surgery. Nov 2009;198(5):685-692.

158. Taylor GW, Jayne DG, Brown SR, et al. Adhesions and incisional hernias following laparoscopic versus open surgery for colorectal cancer in the CLASICC trial. British Journal of Surgery. Jan 2010;97(1):70-78.

159. Allardyce RA, Bagshaw PF, Frampton CM, et al. Australasian Laparoscopic Colon Cancer Study shows that elderly patients may benefit from lower postoperative complication rates following laparoscopic versus open resection. British Journal of Surgery. Jan 2010;97(1):86-91.

160. Neudecker J, Klein F, Bittner R, et al. Short-term outcomes from a prospective randomized trial comparing laparoscopic and open surgery for colorectal cancer. British Journal of Surgery. Dec 2009;96(12):1458-1467.

161. Ptok H, Kube R, Schmidt U, et al. Conversion from laparoscopic to open colonic cancer resection - associated factors and their influence on long-term oncological outcome. European Journal of Surgical Oncology. Dec 2009;35(12):1273-1279.

162. Yin W-Y, Wei C-K, Tseng K-C, et al. Open colectomy versus laparoscopic-assisted colectomy supported by hand-assisted laparoscopic colectomy for resectable colorectal cancer: a comparative study with minimum follow-up of three years. Hepato-Gastroenterology. Jul-Aug 2009;56(93):998-1006.

163. Kim HJ, Lee IK, Lee YS, et al. A comparative study on the short-term clinicopathologic outcomes of laparoscopic surgery versus conventional open surgery for transverse colon cancer. Surg Endosc. Aug 2009;23(8):1812-1817.


164. Tan WS, Chew MH, Ooi BS, et al. Laparoscopic versus open right hemicolectomy: A comparison of short-term outcomes. International Journal of Colorectal Disease. 2009;24(11):1333-1339.

165. Faiz O, Warusavitarne J, Bottle A, Tekkis PP, Darzi AW, Kennedy RH. Laparoscopically assisted vs. open elective colonic and rectal resection: A comparison of outcomes in English National Health Service trusts between 1996 and 2006. Diseases of the Colon and Rectum. October 2009;52(10):1695-1704.

166. Konishi F, Okada M, Nagai H, Ozawa A, Kashiwagi H, Kanazawa K. Laparoscopic-assisted colectomy with lymph node dissection for invasive carcinoma of the colon. Surg Today. 1996;26(11):882-889.

167. Hoffman GC, Baker JW, Fitchett CW, Vansant JH. Laparoscopic-assisted colectomy. Initial experience. Annals of Surgery. Jun 1994;219(6):732-740; discussion 740-733.

168. Abdel-Halim MRE, Moore HM, Cohen P, Dawson P, Buchanan GN. Impact of laparoscopic right hemicolectomy for colon cancer. Ann R Coll Surg Engl. Apr 2010;92(3):211-217.

169. Akiyoshi T, Kuroyanagi H, Fujimoto Y, et al. Short-term outcomes of laparoscopic colectomy for transverse colon cancer. J Gastrointest Surg. May 2010;14(5):818-823.

170. Balentine CJ, Marshall C, Robinson C, et al. Obese patients benefit from minimally invasive colorectal cancer surgery. J Surg Res. Sep 2010;163(1):29-34.

171. Braga M, Frasson M, Zuliani W, Vignali A, Pecorelli N, Di Carlo V. Randomized clinical trial of laparoscopic versus open left colonic resection. British Journal of Surgery. Aug 2010;97(8):1180-1186.

172. da Luz Moreira A, Kiran RP, Kirat HT, et al. Laparoscopic versus open colectomy for patients with American Society of Anesthesiology (ASA) classifications 3 and 4: the minimally invasive approach is associated with significantly quicker recovery and reduced costs. Surg Endosc. Jun 2010;24(6):1280-1286.

173. El-Gazzaz G, Geisler D, Hull T. Risk of clinical leak after laparoscopic versus open bowel anastomosis. Surg Endosc. Aug 2010;24(8):1898-1903.

174. El-Gazzaz G, Hull T, Hammel J, Geisler D. Does a laparoscopic approach affect the number of lymph nodes harvested during curative surgery for colorectal cancer? Surg Endosc. Jan 2010;24(1):113-118.

175. Fujii S, Ota M, Ichikawa Y, et al. Comparison of short, long-term surgical outcomes and mid-term health-related quality of life after laparoscopic and open resection for colorectal cancer: a case-matched control study. International Journal of Colorectal Disease. Nov 2010;25(11):1311-1323.

176. Han KS, Choi GS, Park JS, Kim HJ, Park SY, Jun SH. Short-term outcomes of a laparoscopic left hemicolectomy for descending colon cancer: Retrospective comparison with an open left hemicolectomy. Journal of the Korean Society of Coloproctology. October 2010;26(5):347-353.

177. Hemandas AK, Abdelrahman T, Flashman KG, et al. Laparoscopic colorectal surgery produces better outcomes for high risk cancer patients compared to open surgery. Annals of Surgery. Jul 2010;252(1):84-89.


178. Jayne DG, Thorpe HC, Copeland J, Quirke P, Brown JM, Guillou PJ. Five-year follow-up of the Medical Research Council CLASICC trial of laparoscopically assisted versus open surgery for colorectal cancer. British Journal of Surgery. Nov 2010;97(11):1638-1645.

179. Jiang JK, Chen WS, Wang SJ, Lin JK. A novel lifting system for minimally accessed surgery: A prospective comparison between "Laparo-V" gasless and CO2 pneumoperitoneum laparoscopic colorectal surgery. International Journal of Colorectal Disease. August 2010;25(8):997-1004.

180. Kiran RP, El-Gazzaz GH, Vogel JD, Remzi FH. Laparoscopic approach significantly reduces surgical site infections after colorectal surgery: data from national surgical quality improvement program. J Am Coll Surg. Aug 2010;211(2):232-238.

181. Kurian AA, Suryadevara S, Vaughn D, et al. Laparoscopic colectomy in octogenarians and nonagenarians: a preferable option to open surgery? J Surg Educ. May-Jun 2010;67(3):161-166.

182. Lian L, Kalady M, Geisler D, Kiran RP. Laparoscopic colectomy is safe and leads to a significantly shorter hospital stay for octogenarians. Surgical Endoscopy and Other Interventional Techniques. August 2010;24(8):2039-2043.

183. Lloyd GM, Kirby R, Hemingway DM, Keane FB, Miller AS, Neary P. The RAPID protocol enhances patient recovery after both laparoscopic and open colorectal resections. Surg Endosc. Jun 2010;24(6):1434-1439.

184. Madbouly KM, Senagore AJ, Delaney CP. Endogenous morphine levels after laparoscopic versus open colectomy.[Erratum appears in Br J Surg. 2010 Aug;97(8):1314]. British Journal of Surgery. May 2010;97(5):759-764.

185. Maeda T, Tan KY, Konishi F, et al. Accelerated learning curve for colorectal resection, open versus laparoscopic approach, can be attained with expert supervision. Surgical Endoscopy and Other Interventional Techniques. November 2010;24(11):2850-2854.

186. Marshall CL, Chen GJ, Robinson CN, et al. Establishment of a minimally invasive surgery program leads to decreased inpatient cost of care in veterans with colon cancer. American Journal of Surgery. Nov 2010;200(5):632-635.

187. Morris EJA, Jordan C, Thomas JD, et al. Comparison of treatment and outcome information between a clinical trial and the National Cancer Data Repository. British Journal of Surgery. Feb 2011;98(2):299-307.

188. Nakamura T, Onozato W, Mitomi H, et al. Retrospective, matched case-control study comparing the oncologic outcomes between laparoscopic surgery and open surgery in patients with right-sided colon cancer. Surgery Today. 2009;39(12):1040-1045.

189. Pascual M, Alonso S, Pares D, et al. Randomized clinical trial comparing inflammatory and angiogenic response after open versus laparoscopic curative resection for colonic cancer. British Journal of Surgery. Jan 2011;98(1):50-59.

190. Tei M, Ikeda M, Haraguchi N, et al. Postoperative complications in elderly patients with colorectal cancer: comparison of open and laparoscopic surgical procedures. Surg Laparosc Endosc Percutan Tech. Dec 2009;19(6):488-492.


191. Basse L, Jakobsen DH, Bardram L, et al. Functional recovery after open versus laparoscopic colonic resection: a randomized, blinded study. Annals of Surgery. Mar 2005;241(3):416-423.

192. Braga M, Vignali A, Zuliani W, Frasson M, Di Serio C, Di Carlo V. Laparoscopic versus open colorectal surgery: cost-benefit analysis in a single-center randomized trial. Annals of Surgery. Dec 2005;242(6):890-895, discussion 895-896.


Appendix F Bayesian Models

A) Meta-Analysis Model (Binary Outcome)

model{
  for (i in 1:TrialNum){
    #Trial[i]~dnorm(0,0.0001)
    #StudyType[i]~dnorm(0,0.0001)

    #Likelihood
    ControlOC[i] ~ dbin(pControl[i], ControlTotal[i])
    TreatLAP[i] ~ dbin(pTreat[i], TreatTotal[i])

    #Linear model for logit of probability
    logit(pControl[i]) <- mu[i]
    logit(pTreat[i]) <- mu[i] + delta[i]

    #Prior on baseline logit
    mu[i] ~ dnorm(0, 0.0001)

    #Prior on log-odds-ratios
    delta[i] ~ dnorm(d, tau)
  }

  #Prior on hyperparameter: mean of log-odds ratio
  d ~ dnorm(0, 0.00001)
  populationOR <- exp(d)

  #Prior for random effects variance
  tau <- 1/(sd*sd)
  sd ~ dunif(0,3)

  #Generate predictive interval
  deltaNew ~ dnorm(d,tau)
  ORnew <- exp(deltaNew)
  ProbORnewLT1 <- step(1-ORnew)
}
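Written in conventional notation, Model A corresponds to the hierarchy below. This is a restatement of the code, not an additional model. The symbols r, n, p and σ are shorthand introduced here for ControlOC/TreatLAP, ControlTotal/TreatTotal, pControl/pTreat and sd. It relies on the BUGS convention that the second argument of dnorm is a precision (1/variance) and that step(x) equals 1 when x ≥ 0 and 0 otherwise.

```latex
\begin{align*}
r_{Ci} &\sim \mathrm{Binomial}(n_{Ci},\, p_{Ci}), &
r_{Ti} &\sim \mathrm{Binomial}(n_{Ti},\, p_{Ti}), \\
\mathrm{logit}(p_{Ci}) &= \mu_i, &
\mathrm{logit}(p_{Ti}) &= \mu_i + \delta_i, \\
\delta_i &\sim \mathcal{N}(d,\, \sigma^2), &
\tau &= 1/\sigma^2, \\
\mu_i &\sim \mathcal{N}(0,\, 10^{4}), &
d &\sim \mathcal{N}(0,\, 10^{5}), \qquad \sigma \sim \mathrm{Uniform}(0,\, 3), \\
\mathrm{OR}_{\mathrm{pop}} &= e^{d}, &
\delta_{\mathrm{new}} &\sim \mathcal{N}(d,\, \sigma^2), \qquad
\mathrm{OR}_{\mathrm{new}} = e^{\delta_{\mathrm{new}}}, \\
\texttt{ProbORnewLT1} &= \mathbf{1}\!\left[\mathrm{OR}_{\mathrm{new}} \le 1\right].
\end{align*}
```

The variances 10^4 and 10^5 are the reciprocals of the precisions 0.0001 and 0.00001 in the code, that is, prior standard deviations of 100 and roughly 316 on the log-odds scale. The posterior mean of ProbORnewLT1 estimates the probability that the odds ratio predicted for a new study is at most 1.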


B) Meta-Analysis Model (Continuous Outcome)

model{
  for (i in 1:TrialNum){
    #Likelihood
    ControlOC[i] ~ dnorm(muControl[i], precisionControl[i])
    TreatLAP[i] ~ dnorm(muTreat[i], precisionTreat[i])
    precisionControl[i] <- nC[i]/(sdC[i]*sdC[i])
    precisionTreat[i] <- nT[i]/(sdT[i]*sdT[i])

    #Linear model for mean
    muTreat[i] <- muControl[i] + delta[i]

    #Prior on baseline mean
    muControl[i] ~ dnorm(0, 0.0001)

    #Prior on difference in LOS
    delta[i] ~ dnorm(d, tau)
  }

  #Prior on hyperparameter: mean of delta (treatment effect)
  d ~ dnorm(0, 0.00001)

  #Prior for random effects variance
  tau <- 1/(sd*sd)
  sd ~ dunif(0,15)

  #Generate new predictive interval
  deltaNew ~ dnorm(d,tau)

  #Probability that LOS is -1 or more negative (favorable to LAP)
  DiffProb <- 1-step(d+1)

  #Probability that LOS is -2 or more negative (favorable to LAP)
  #DiffProb2 <- 1-step(d+2)
}
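The likelihood and the probability statement in Model B can be read as follows. The notation is introduced here for illustration: the bar-y terms stand for ControlOC and TreatLAP, and the reading assumes (as the precision formula n/sd² suggests) that these hold the observed arm means and that sdC and sdT are the arm-level standard deviations.

```latex
\begin{align*}
\bar{y}_{Ci} &\sim \mathcal{N}\!\left(\mu_{Ci},\ \frac{s_{Ci}^{2}}{n_{Ci}}\right), &
\bar{y}_{Ti} &\sim \mathcal{N}\!\left(\mu_{Ti},\ \frac{s_{Ti}^{2}}{n_{Ti}}\right), \\
\mu_{Ti} &= \mu_{Ci} + \delta_i, &
\delta_i &\sim \mathcal{N}(d,\ \sigma^{2}), \qquad \sigma \sim \mathrm{Uniform}(0,\ 15), \\
\texttt{DiffProb} &= 1 - \mathrm{step}(d + 1) = \mathbf{1}\!\left[d < -1\right].
\end{align*}
```

Because BUGS parameterizes dnorm by precision, n/sd² is the reciprocal of the squared standard error of each arm mean. The posterior mean of DiffProb estimates the probability that the pooled mean difference d is below -1. Note that this indicator is computed on d, whereas Model A computes its indicator on the predicted effect for a new study.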


C) Sensitivity Analysis Model (Binary Outcome)

model{
  for (i in 1:TrialNum){
    #Trial[i]~dnorm(0,0.0001)
    #StudyType[i]~dnorm(0,0.0001)

    #Likelihood
    ControlOC[i] ~ dbin(pControl[i], ControlTotal[i])
    TreatLAP[i] ~ dbin(pTreat[i], TreatTotal[i])

    #Linear model for logit of probability
    logit(pControl[i]) <- mu[i]
    logit(pTreat[i]) <- mu[i] + delta[i]

    #Prior on baseline logit
    mu[i] ~ dnorm(0, 0.0001)

    #Prior on log-odds-ratios
    delta.star[i] ~ dnorm(d, tau)

    #Model DESIGN + Baseline Rate + Year
    delta[i] <- delta.star[i] + beta*(DESIGN[i]) + beta.rate*(mu[i]-mean.mu) + beta.year*(year[i]-mean(year[]))

    #Dummy Statements
    #DESIGN[i]~dnorm(0,1)
    #year[i]~dnorm(0,1)
  }

  #mean.mu, mean(mu[])
  mean.mu <- -4.2

  #Prior on hyperparameter: mean of log-odds ratio
  d ~ dnorm(0, 0.00001)

  #Prior for random effects variance
  tau <- 1/(sd*sd)
  sd ~ dunif(0,3)
  v <- sd*sd

  #Prior for betas
  beta ~ dnorm(0,0.1)
  beta.year ~ dnorm(0,0.1)
  beta.rate ~ dnorm(0,0.1)

  PopulationOR <- exp(d)
  ROR.year <- exp(beta.year)
  ROR.rate <- exp(beta.rate)
  ROR.design <- exp(beta)
}
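The meta-regression step in Model C can be summarized as below. The symbols x_i and t_i are shorthand introduced here for DESIGN[i] and year[i]; how DESIGN is coded is not stated in this appendix, so interpreting exp(β) as a ratio of odds ratios between design categories is an assumption based on the variable name ROR.design.

```latex
\begin{align*}
\delta_i &= \delta^{*}_i
  + \beta\, x_i
  + \beta_{\mathrm{rate}}\,(\mu_i - m)
  + \beta_{\mathrm{year}}\,(t_i - \bar{t}\,), \\
\delta^{*}_i &\sim \mathcal{N}(d,\ \sigma^{2}), \qquad m = -4.2, \qquad
\bar{t} = \frac{1}{N}\sum_{j=1}^{N} t_j, \\
\beta,\ \beta_{\mathrm{rate}},\ \beta_{\mathrm{year}} &\sim \mathcal{N}(0,\ 10), \\
\mathrm{ROR}_{\mathrm{design}} &= e^{\beta}, \qquad
\mathrm{ROR}_{\mathrm{rate}} = e^{\beta_{\mathrm{rate}}}, \qquad
\mathrm{ROR}_{\mathrm{year}} = e^{\beta_{\mathrm{year}}}.
\end{align*}
```

Here N is TrialNum, and the prior variance of 10 is the reciprocal of the precision 0.1 used in the code. Centering the baseline logit at the fixed value m and the year at its mean lets exp(d) be read as the odds ratio at the reference design, a baseline logit of -4.2, and the mean publication year.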


D) Sensitivity Analysis Model (Continuous Outcome)

model{
  for (i in 1:TrialNum){
    #Likelihood
    ControlOC[i] ~ dnorm(muControl[i], precisionControl[i])
    TreatLAP[i] ~ dnorm(muTreat[i], precisionTreat[i])
    precisionControl[i] <- nC[i]/(sdC[i]*sdC[i])
    precisionTreat[i] <- nT[i]/(sdT[i]*sdT[i])

    #Linear model for mean
    muTreat[i] <- muControl[i] + delta[i]

    #Prior on baseline mean
    muControl[i] ~ dnorm(0, 0.0001)

    #Prior on difference in LOS
    deltastar[i] ~ dnorm(d, tau)

    #Treatment effects at the average control group LOS
    delta[i] <- deltastar[i] + beta*(DESIGN[i]) + beta.rate*(muControl[i]-mean.mu) + beta.year*(year[i]-mean(year[]))
    year[i] ~ dnorm(0,1)
  }

  mean.mu <- 4.2

  #Prior on hyperparameter: mean of delta (treatment effect)
  d ~ dnorm(0, 0.00001)

  #Prior for random effects variance
  tau <- 1/(sd*sd)
  sd ~ dunif(0,15)

  #Prior for the beta
  beta ~ dnorm(0,0.1)
  beta.rate ~ dnorm(0,0.1)
  beta.year ~ dnorm(0,0.1)
}


Appendix G Bayesian Meta-Analysis Results

A) Bayesian Meta-Analysis Results

Table G.1 Bayesian random-effects meta-analysis results for studies reporting post-operative complications.

| | # of Studies | OR* | 95% CrI† | sd§ | Probability OR<1♦ |
|---|---|---|---|---|---|
| All Studies | 99 | 0.61 | 0.54, 0.69 | 0.44 | 0.87 |
| NRS | 79 | 0.59 | 0.51, 0.68 | 0.46 | 0.87 |
| RCTs | 20 | 0.71 | 0.53, 0.91 | 0.42 | 0.80 |
| Typical RCTs | 16 | 0.60 | 0.41, 0.84 | 0.46 | 0.87 |
| Strong RCTs | 4 | 0.99 | 0.64, 1.44 | 0.31 | 0.58 |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with fewer post-operative complications.
† Credible Interval
§ Standard deviation (between-study heterogeneity)
♦ Probability that the OR is less than 1 (favoring laparoscopy).

Table G.2 Bayesian random-effects meta-analysis results for studies reporting mortality.

| | # of Studies | OR* | 95% CrI† | sd§ | Probability OR<1♦ |
|---|---|---|---|---|---|
| All Studies | 96 | 0.56 | 0.44, 0.70 | 0.40 | 0.93 |
| NRS | 79 | 0.51 | 0.39, 0.66 | 0.44 | 0.93 |
| RCTs | 17 | 0.84 | 0.48, 1.39 | 0.32 | 0.72 |
| Typical RCTs | 13 | 0.99 | 0.30, 2.50 | 0.80 | 0.59 |
| Strong RCTs | 4 | 0.95 | 0.00, 2.52 | 0.66 | 0.66 |

* Odds Ratio. OR<1 indicates that laparoscopy is associated with lower mortality.
† Credible Interval
§ Standard deviation (between-study heterogeneity)
♦ Probability that the OR is less than 1 (favoring laparoscopy).


Table G.3 Bayesian random-effects meta-analysis results for studies reporting length of stay.

| | # of Studies | MD* | 95% CrI† | sd§ | Probability MD≤-1♦ |
|---|---|---|---|---|---|
| All Studies | 128 | -2.74 | -3.13, -2.35 | 1.97 | 1.00 |
| NRS | 106 | -2.94 | -3.40, -2.49 | 2.07 | 1.00 |
| RCTs | 22 | -1.82 | -2.53, -1.12 | 1.39 | 0.99 |
| Typical RCTs | 18 | -2.16 | -2.98, -1.32 | 1.42 | 1.00 |
| Strong RCTs | 4 | -0.74 | -2.30, 0.73 | 1.17 | 0.28 |

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with a shorter length of stay.
† Credible Interval
§ Standard deviation (between-study heterogeneity)
♦ Probability that the MD is -1 or more negative (favoring laparoscopy).

Table G.4 Bayesian random-effects meta-analysis results for studies reporting number of lymph nodes harvested.

| | # of Studies | MD* | 95% CrI† | sd§ | Probability MD≤-1♦ |
|---|---|---|---|---|---|
| All Studies | 76 | -0.02 | -0.52, 0.48 | 1.76 | 0.00 |
| NRS | 59 | 0.08 | -0.55, 0.08 | 1.98 | 0.00 |
| RCTs | 17 | -0.34 | -1.03, 0.39 | 1.00 | 0.03 |
| Typical RCTs | 13 | -0.25 | -1.29, 0.75 | 1.20 | 0.06 |
| Strong RCTs | 4 | -0.47 | -2.64, 1.85 | 1.68 | 0.23 |

* Mean Difference, MD = mean(laparoscopy) - mean(open). A MD<0 indicates that laparoscopy is associated with finding fewer lymph nodes in the surgical specimen.
† Credible Interval
§ Standard deviation (between-study heterogeneity)
♦ Probability that the MD is -1 or more negative (favoring open surgery).