Marginal structural models demonstrate treatment …...Marginal structural models demonstrate...

Marginal structural models demonstrate treatment time effect on mortality in multi-system trauma patients with femoral shaft fracture

by

Saam Morshed

A dissertation submitted in partial satisfaction of the

Requirements for the degree of

Doctor of Philosophy

in

Epidemiology

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor John M. Colford, Chair

Professor Alan Hubbard

Professor Tony M. Keaveny

Spring 2010


© 2010

by Saam Morshed

i

Table of Contents

CHAPTER 1: OPTIMAL TIMING OF FRACTURE STABILIZATION AMONG MULTISYSTEM TRAUMA PATIENTS: LITERATURE REVIEW 1

REVIEW OF CLINICAL STUDIES 1 METHODOLOGICAL LIMITATIONS 3 STUDY QUESTION 5

CHAPTER 2 : CONVENTIONAL AND COUNTERFACTUAL APPROACHES TO CONFOUNDING AND COVARIATE ADJUSTMENT IN OBSERVATIONAL STUDIES 6

BASICS PRINCIPLES OF STATISTICAL INFERENCE 6 BIAS, CONFOUNDING, AND COUNTERFACTUALS 7 CONTROLLING CONFOUNDING 8 COMMENTS 15

CHAPTER 3: APPLICATION OF MARGINAL STRUCTURAL MODELS TO ESTIMATE CAUSAL EFFECTS OF TIMING OF TREATMENT OF FEMORAL SHAFT FRACTURE 24

STUDY COHORT AND DATA SOURCE 24 TREATMENT VARIABLES 25 MAIN OUTCOME MEASURE 25 STATISTICAL METHODS 25

CHAPTER 4: TESTING THE EXPERIMENTAL TREATMENT ASSIGNMENT ASSUMPTION 31

GRAPHICAL EVALUATION OF THE EXPERIMENTAL TREATMENT ASSIGNMENT ASSUMPTION 32 MONTE CARLO SIMULATION APPROACH 32 STANDARDIZED RISK RATIO APPROACH 33

CHAPTER 5: RESULTS 38

INVERSE PROBABILITY OF TREATMENT-WEIGHTED ANALAYSIS 38 LOGISTIC REGRESSION ANALYSIS 39 STANDARDIZED RISK RATIOS ANALYSIS 39

CHAPTER 6: DISCUSSION 50

CLINICAL COMMENTS 50 METHODOLOGICAL COMMENTS 52 FINAL COMMENTS 54

REFERENCES: 55

ii

For Nooshin, Kea and Darya who allowed this endeavor, I am grateful.

I love you.

1

Abstract


by

Saam Morshed

Doctor of Philosophy in Epidemiology

University of California, Berkeley

Professor John M. Colford, Chair

Context: Fractures of the femoral shaft are common and have potentially serious consequences in patients with multiple traumatic injuries. The appropriate timing of fracture repair is controversial. Objective: To assess the effect of timing of internal fixation on mortality in multi-system trauma patients Design: Retrospective cohort. Setting: Data from public and private Trauma Centers throughout the United States reported to the National Trauma Data Bank (version 5.0 for 2000-2005). Patients: 3069 multi-system trauma patients (Injury Severity Score >15) who underwent internal fixation of femoral shaft fractures. Intervention: Time to treatment was defined in categories as the time from admission to internal fixation: t0<12 hours; 12<t1<24; 24<t2<48; 48<t3<120; 120<t4. Main Outcome Measure: The relative risk of in-hospital mortality comparing the four later periods to the earliest one was estimated using logistic regression, multivariate standardized risk ratios, and Inverse Probability of Treatment Weighting (IPTW). Subgroups with serious head, chest, abdominal and additional extremity injury were investigated. Results: The mortality risk estimated by IPTW was significantly lower in several time categories: t1 (RR=0.45, 95%CI: 0.15 – 0.98, p=0.03), t2 (RR=0.83, 95%CI: 0.43 – 1.44, p=0.49), t3 (RR=0.58, 95%CI: 0.28 – 0.93, p=0.03) and t4 (RR=0.43, 95%CI: 0.097 – 0.94, p=0.03). Those with serious abdominal trauma (Abbreviated Injury Score, AIS>3) experienced the greatest benefit from delay of internal fixation beyond 12 hours (RR=0.82 for AIS<3, 95%CI: 0.54 – 1.35; versus RR=0.36 for AIS>3, 95%CI: 0.13 – 0.87; p-value for interaction =0.09). The results of the logistic regression and multivariate standardized risk ratios agreed

2

with IPTW estimates. Qualitative and quantitative testing of modeling assumptions supported the validity of the marginal IPTW estimates. Conclusion: Delayed repair of femoral shaft fractures beyond 12 hours in multi-system trauma patients significantly reduced mortality for three of the four treatment periods studied. Patients with serious abdominal injury benefit most from delayed treatment. This work applies causal inference methodology and assumption testing that is well suited to orthopaedic therapeutic studies when randomization is not possible or ethical.

Chapter 1: Optimal timing of fracture stabilization among multisystem trauma patients: Literature review

Surgical treatment of orthopaedic injuries in severely injured patients has been one of the most significant advances in trauma care in the past century. This achievement has been exemplified by internal fixation of femoral shaft fractures which markedly reduced patient discomfort and the morbidity of protracted skeletal traction and immobilization1-3. While femoral shaft fractures may be surgically stabilized using a variety of methods including external fixation (trans-cutaneous pins linked by bars) and plate osteosynthesis (plate fixation using bone screws), medullary fixation (metal rod inserted in the long-bone medullary canal) has been the most successful method with heeling rates approaching 100% with minimal complications4-6. First performed by Kuntcher in 19397, the medullary nail was brought to North America by repatriated prisoners of war and the technique slowly followed with the first clinical series published by Clawson and Hansen8. Since that time this technique has become the “gold standard” definitive surgical treatment for fracture of the femoral shaft. The clinical result from observational studies have been so dramatic and reproducible that no randomized controlled trial has ever been undertaken to establish superiority of operative versus non-operative management of femoral shaft fracture. However, the optimal timing of surgical management of long-bone fractures in multi-system trauma patients is controversial. This controversy has persisted because of inconsistency in way that the question has been framed as well as methodological inadequacies in how researchers have attempted to answer it.

Review of Clinical Studies Several observational studies1, 9-15 and one randomized study 16 have suggested clinical benefits from early stabilization of major orthopaedic injuries—those of the shaft of the femur being most common—in reducing the incidence of pulmonary complications and mortality. Riska1 was one of the first authors to show a reduction in fat embolism syndrome (FES) in multi-system trauma patients undergoing early (time not defined) fixation of long-bone, pelvic and spine fracture versus a historical conservatively (non-operative) treated control group with similar injury characteristics (4.5% versus 22%). The endpoint of this and many other similar studies was FES, an inconsistently defined clinical syndrome thought to be due to the embolization of marrow contents to the lungs manifested by: “snow-storm” infiltration on chest radiograph, arterial partial pressure of oxygen less than 60mmHg, skin petechial hemorrhages, aberrance of consciousness, progressive anemia, tachypnea, and thrombocytopenia. Given that the validity or reliability of this definition has never been proven and that acute lung injury in the post trauma setting is now believed to be secondary to many other factors, references to FES have mostly disappeared from the literature. A more current, but still flawed definition of pulmonary end-organ injury is Adult Respiratory Distress Syndrome (ARDS)17 which is determined on the basis of: 1) acuity, 2) bilateral infiltrates on chest radiographs without a cardiogenic cause (i.e. no left heart failure), 3) an arterial partial pressure oxygen to fraction of inspired oxygen ratio of less than 200. Again, this definition has been shown to have limited intra-observer reliability but has served to define pulmonary end-organ injury in the majority of studies assessing the importance of timing of fracture fixation among multi-system trauma patients. Johnson10, in a retrospective observational study of patients with multi-system trauma

1

and at least two major long-bone fractures, found that early operative stabilization was associated with a five fold increase in the odds of ARDS when controlling for the Injury Severity Score.

The Injury Severity Score (ISS)18, 19 is one of the first and most commonly used of many systems of grading overall injury severity and was borne from efforts to improving triaging of patient during the early stages of regionalization of trauma systems in the Unites States. It has since become an instrumental tool in risk adjustment for research on trauma patients. It is comprised of the Abbreviated Injury Score (AIS)20 which assigns a severity grade from minor (0) to unsurvivable (6) to each of a patient’s injuries. The body is divided into six regions—head and neck, face, chest, abdomen, extremity and pelvis, skin—and the highest AIS graded injury from the three most severely involved regions are summed to give the ISS. The ISS has been criticized for it’s non-additivity, non-parametric distribution, and failure to account for more than one severe injury in a given region21-23. While improvements to the ISS have been suggested such as the New Injury Severity Score (NISS)19, which sums the squares of he highest AIS regardless of body region, the use of the ISS is ubiquitous in injury research pertaining to optimal timing of fracture care.

As definitive internal fixation techniques were popularized, enthusiasm for their apparent ability to allow patients who would have otherwise been immobilized for weeks or months to mobilize earlier expanded into a belief that earlier surgery was causally responsible for improving outcomes. Behrman13 performed a retrospective cohort study of multi-system trauma patients with femur fractures and found that those treated early (less than 48 hours from admission) had reduced pulmonary complications included ARDS and pneumonia. Despite the observational nature of the study and lack of any formal confounding adjustment, these authors concluded, “Delayed femur fixation increased the incidence of pulmonary complications.” Similar unfounded causal assertions were made by Charash14 who found few complications and decreased total hospital and intensive care unit length of stay in patients treated less than 24 hours from admission with medullary nails.

Bone16 performed the first and only published randomized controlled trial testing the hypothesis that early definitive management of femoral fractures reduced morbidity. They randomized 178 patients with femoral fractures, 83 of which had an ISS of greater than 18 (one of several thresholds defining the multi-system trauma patient). Despite small sample size, they were able to show a significantly higher rate of the composite endpoint of ARDS, pulmonary dysfunction, or FES in patients randomized to late (more than 48 hours from admission) treatment. However, there was a ten-fold higher rate of chest injury at the time of admission that was not balanced by the randomization or adjusted for in the analysis. These initial chest injuries may have led to higher rates of the pulmonary complications in the late treatment group.

While it has been shown that patients with isolated femoral shaft fracture have reduced morbidity with early treatment 24 and perhaps those with multiple injuries and higher ISS do as well as suggested by the aforementioned papers, the question has been posed as to whether certain associated injuries patterns may define subgroups of patients that do poorly with early treatment. Pape25 studied 106 patients with mid-shaft femoral fractures and an ISS greater than 18. Patients were treated either prior to or later than 24 hours from admission and stratified by presence or absence of chest trauma (AIS greater than or equal to 2). Those with chest injury treated early had a higher incidence of ARDS (p<0.05) and mortality (not statistically significant) than those without severe chest injury that were treated early, though a formal test of

2

interaction was not reported. These authors and others26, 27 have found higher incidence of inflammatory deregulation (primarily increased serum interleukin-6) and organ dysfunction (ARDS and pneumonia) with treatment up to five days from admission. Similar susceptibility to ongoing injury from early surgery has been suggested among multi-system trauma patients with severe head injuries (AIS greater than or equal to 2)28 though how acute physiologic perturbations translate into long-term functional effects has been questioned29.

Studies that suggest adverse effects of early treatment among the most badly injured patient such as those with associated severe head or chest injury attempt to establish coherence with the “two-hit” model for post traumatic organ failure where systemic hypo-perfusion30 and inflammation31-33 may increase susceptibility to end-organ injury and multiple organ failure if further stress (iatrogenic in the case of fracture surgery that typically involves large volume blood loss and systemic embolization of marrow contents) is prematurely undertaken. Several different pathways have been implicated in the development of end-organ injury and MOF in multi-system trauma patients, yet no central paradigm of causation has emerged. The degree and duration of hypo-perfusion, or inadequate resuscitation (with fluids, electrolytes, and/or blood products), is known to be an important risk factor in predicting subsequent complications among the critically injured34, 35. Hypo-perfusion is typically measured by global measures of oxygen delivery, namely arterial base deficit or lactate36. Several investigators have found that admission base deficit and lactate correlated with increased need for blood transfusion, length of hospitalization, and end-organ injury such as ALI37, 38. Because the optimal method for the assessment of the adequacy of resuscitation is not clear36 it is believed that hypo-perfusion often remains occult and unrecognized prompting increased emphasis on early and aggressive resuscitative efforts prior to major secondary surgery such as internal fixation of long-bones. Still, how long-bone fractures and the care contribute to these complex metabolic processes is still poorly understood.

Methodological Limitations In an attempt to draw inferences to guide clinical decision-making based on extant

clinical literature on the topic of optimal timing for femoral fracture fixation in multisystem trauma patients, several systematic reviews have been undertaken. A meta-analysis39 found a large relative risk reduction (RR 0.30, 95% confidence interval 0.22 – 0.40) in respiratory complications with early operative fixation (usually within 24 hours). However, quantitatively pooling these studies may not be appropriate. Differences in treatment definition (both time cut-offs and type of fixation used) and outcome (mortality versus length of hospitalization versus unreliably defined clinical complications) introduce an important source of possible information or measurement bias and because of their heterogeneity, confuse the clinical utility due to lack of a clear target population. Moreover, heterogeneity of treatment definitions and outcomes assessed as well as lack of rigorous confounding adjustment make it is difficult to draw valid inferences from published summaries of these data39-41. Two systematic reviews40, 41 qualitatively summarized this literature and reported inadequate evidence to suggest difference in morbidity or mortality between early (usually less than 24 hours) versus late operative treatment of femoral shaft fractures, overall or within subgroups of patients with associated head injuries or thoracic injuries. These inconsistent findings have left clinicians debating what causal effect definitive fracture care has on adverse outcomes.

3

Problems with treatment definition are numerous. Treatment time has typically been dichotomized so as to compare fixation before to fixation after twenty-four or forty-eight hours (most commonly used cut-points). While categorization of a continuous measure such as time helps to draw clinically relevant distinctions, a simple dichotomy needlessly limits the surgeon’s decision-making possibilities when more choices to represent this continuum would better match reality. Second, the term treatment has come to mean many different things in the literature. Early studies assessing the effect of treatment time tended to compare early medullary nail fixation of femur fractures to late fixation or non-operative fixation (i.e. skeletal traction) and tended to conclude that early fixation was beneficial1, 10, 16. In the modern era of orthopaedic trauma care, non-operative management of a femoral shaft fracture is not a therapeutic option and fractures are either definitively stabilized early or late by internal fixation (plate osteosynthesis or medullary nail fixation) or external fixation. This is done to allow early mobilization in order to prevent the complications of long-period recumbence including venous thromboembolic disease, pneumonia and decubitus skin ulcerations. Therefore, the conclusions of many studies may have been mixing the effects of operative management with any independent effects from treatment time.

Outcome assessment and reporting has been equally problematic for discerning possible treatment effects of choice of timing. Measures of outcome have ranged from the objective and rare (i.e. mortality) to the subjective and unreliably reported (Acute Respiratory Distress Syndrome, fat embolus syndrome, or pneumonia). Small sample sizes have led to inadequate power to detect differences in mortality forcing the majority of published studies to follow reported complications that often lack systematic or unbiased assessment. Several investigators have turned to reporting differences in length of stay (LOS) or some component of LOS (i.e. time spent on a ventilator or in the intensive care unit) as a surrogate for morbidity caused by the timing of definitive fracture care. Unfortunately, the causal interpretation of treatment time effect on LOS is complicated by whether tabulation starts from the time the patient enters the hospital and a delay in treatment is chosen, or from the time of definitive treatment. Arguments can be made in support of either approach, though most studies have taken the former approach and looked at total hospital and intensive care unit stay. The use of LOS is also subject to vagaries of clinical and institutional practice such as insurance-related contracts for transfer of patients to sub-acute levels of care and selection bias due to censoring by mortality. Without an objective or reliable method of outcome measurement, the effect of treatment time has been difficult to fully assess.

Confounding bias has been one of the greatest threats to valid inference from the data that have been reported on this subject. In any study of the causal effects of treatment, the potential for confounding by indication for exists42, 43. In the case of multi-system trauma, patients who are treated early typically have a better prognosis given less severe injuries than those who are treated later. This has the potential to bias results towards favoring early fixation. This phenomenon of confounding by indication has either been ignored or addressed with only crude methods of statistical adjustment in most studies on this subject to date.

The literature has also struggled with how to identify those patients who may be most susceptible to the effects, positive or negative, of early treatment. This is fundamentally a question of interaction where-by an investigator strives to understand whether the treatment of interest has potentially different effects among subgroups of patients defined by the presence or

4

absence of some characteristic. Tests for homogeneity of effects across strata of such extraneous characteristics have been classically described conditional on treatment or exposures received using stratification44, 45 as well as conventional regression46. Unfortunately, with regards to this recurrent investigation of patients with and without serious chest injury or head injury, no formal assessments of interaction have been made at all. Instead, improper comparisons of incidence of an outcome of interest have been compared. For example, in the stratified analysis performed by Pape25 discussed earlier, these authors reported a significant increase in the incidence of ARDS among chest injured patients who were treated with medullary nailing early (<24 hours from admission) versus those who were not chest injured (33% versus 7.7%) and a similar but not statistically significant difference in mortality (21% versus 4%). While these authors infer interaction by presence of chest injury, their study fails to contrast the effect of treatment time between subgroups defined by this associated injury. Ultimately, even well established methods for assessing interaction are conditional on treatment received by patients. The ideal test for interaction would be a comparison of the causal effects of treatment among all patient with or without a given treatment within strata of factor of interest, a parameter that has until recently been undefined47 and will be discussed in further detail in Chapter 3.

The multidimensional nature and urgency of treatment assignment have made it logistically difficult and ethically dubious to conduct randomized clinical trials to evaluate the effect of treatment timing on patient outcomes. Observational studies are therefore the primary source of data for the investigation of this research question. Yet the appropriate means by which to analyze such data for causal effects of early treatment have not been established. The majority of published studies have either reported crude (unadjusted) estimates or stratified their analyses by one or a handful of potentially confounding factors such as ISS or age. A few studies have used conventional regression techniques such as linear and logistic regression that “adjust” for one covariate at a time and ignore the joint distribution of confounding. Moreover, these “conventional” methods offer conditional associations that are difficult to interpret in the face of numerous covariates and interaction terms and will be reviewed in the following chapter. Hence, the observational nature of reports to date and their limited consideration of confounding severely limit any causal inferences that are desired by investigators and clinicians who look to them for answers.

Study question In an attempt to address methodological limitation discussed, I conducted a study with specifically defined parameters and analytical methods for estimation of causal effects with reference to the following two specific aims:

1. To estimate the association of early definitive stabilization of the femoral shaft fracture on in-hospital mortality in multi-system trauma patients via semi-parametric, “causal inference” methods.

2. To assess the causal effect of early definitive stabilization of the femoral shaft fracture on mortality among subgroups defined by the presence or absence of associated serious head, chest or other skeletal injury in order to identify those patients that are most sensitive to the effect of treatment time.

5

Chapter 2 : Conventional and Semi-parametric Approaches to Confounding and Covariate Adjustment in Observational Studies

Properly conducted randomized controlled trails (RCT) offer the highest level of evidence from clinical research investigating therapeutic interventions. Yet there are many scenarios where RCTs are inappropriate or impossible such as studies of rare conditions or those with prolonged induction or latent periods48; moreover, the generalizability of RCTs is often limited by strict inclusion and exclusion criteria. Non-randomized or observational studies can provide an important complementary source of information when RCTs cannot or should not be undertaken, provided that the data are analyzed and interpreted in the context of the many forms of bias to which they are prone. Clinicians and researchers trying to understand the literature pertaining to questions such as the optimal timing for femoral shaft fixation are inundated with claims of association of risk factors and assertions of benefit of certain treatments for a given condition. Understanding study design strategies and statistical methods of bias reduction are essential to interpreting data and determining what constitutes useful evidence.

In this chapter I will focus on causation and confounding bias in the context of observational orthopaedic outcome studies and discuss strengths and weaknesses of various design and analysis strategies that have been undertaken. I will start by discussing basic principles including relationship between a study sample and the target population. The notion of probability distributions will be presented as a means of understanding the two primary forms of statistical analysis, estimation and hypothesis testing. The counterfactual approach to causation will be introduced as a framework to understanding threats to valid inferences and examples will be discussed from the orthopaedic literature as well my own research to illustrate concepts.

Basics Principles of Statistical Inference The analysis of any clinical study is based on the principle of taking a sample of subjects

for the purposes of drawing inferences about a larger population of similar individuals (Figure 2.1). Taking a carefully selected sample that is randomly selected or representative of the population of interest is a powerful way to make inferences about the target population. However, going from population to a sample leads to some degree of uncertainty or margin of error because we are estimating without knowledge of entire population. We rely on mathematically defined probability distributions in order to quantify this uncertainty such as the Normal distribution for continuous data and Binomial distribution for categorical data. These distributions are based on parameters such as a mean and standard deviation. If the assumption is made that the observed data are a sample from a population with a distribution that has a known theoretical form, then it is reasonable to use parameters of that distribution (those observed) to calculate probabilities of different values occurring. This parametric approach to statistics is wide-ranging and ubiquitous in medical research49, 50. However, if these distributional assumptions are not realistic, then we may end up with results that are not valid. When data deviate from a so-called “Normal” pattern, non-parametric (or semi-parametric if there are some aspects of the distribution that can be assumed known) or distribution-free methods are required51.

6

Estimation and hypothesis testing. One of the primary objectives of any clinical study is to provide some numerical value expressing risk of outcome or of relative effect associated with a specific treatment or prognostic factor. Estimation covers a broad range of statistical procedures that yield the magnitude of risk or effect as well as the precision or confidence interval of that estimate52. Hypothesis testing concerns itself with understanding the likelihood of having observed a difference or association from data if no such effect exists in the population. The ubiquitous use of this likelihood, known as the P-values, leads to treating the analysis as a process of decision-making within which it is customary to consider a statistically significant effect as real and non-significant result to indicate no effect. Notice that this value gives no information about the magnitude of interest or the uncertainty of the result. For these reasons the approach based on estimation is often considered superior50, 53.

The threshold level at which one may consider a P-value to be statistically significant also depends on how many times one sample group is compared to another. The more times that one looks for a difference between two groups, the more likely one is to find a difference there that has occurred purely by chance (i.e. the Type I error rate increases).54 This is an important consideration when two groups defined by treatment received are compared with respect to multiple outcomes of interest, or multiple subgroup analyses are performed. This is ideally done as a way of using the data to generate new hypotheses for future study, rather than trying to use the same data to try to definitively answer multiple questions. When multiple comparisons are made and multiple hypothesis tests undertaken with the same data, the cut-off P-value for statistical significance should be lowered accordingly using a Bonferroni or similar adjustment method to guard against Type I error55.

Bias, Confounding, and Counterfactuals Gathering useful information from non-randomized studies requires a clear understanding

of the role of bias in the data and how it aught to be handled. While there are multiple described forms of bias that threaten the validity of a clinical research study56, most fall into one of three categories: selection bias, information bias, and confounding53. Selection bias is defined as a distortion of estimates that results from the way in which subjects are selected into the study sample and includes factors such as differential inclusion or diagnosis of persons in the study and loss to follow-up. For example, if likelihood of censoring is influenced by which treatment or implant a patient receives, analyzing only those that followed up will bias any relative estimate of effect. Information bias arises from systematic or non-systematic difference between groups in the way that data are obtained. This is a bias that may result from a non-blinded surgeon assessing outcomes of a particular treatment and, like selection bias, is also a threat to validity no matter what study design is used. Confounding bias represents a mixing of effects between the treatments of interest and associated extraneous factors that also impact outcome, potentially obscuring or distorting the relationship of interest. Confounding is of particular concern in observational studies, and surgical research specifically when causal relationships are investigated.

Despite the importance of understanding and proving causal relationships in scientific research, a formal definition has been debated for centuries. Greenland 57 has reviewed this chronology from the early definitions of the eighteenth century Scottish philosopher, David Hume through a statistical derivation based on potential outcomes by the early twentieth century statistician, Jerzy Neyman. These authors have established a causal exposure (A) as necessary

7

for the occurrence of a measured outcome (Y) under observed background circumstances such that had the cause (A) been altered, the effect (Y) would have changed. A causal effect, then, is a contrast between the outcomes of a single unit under different treatment possibilities (yik versus yi0). However, a single unit can only receive one treatment so such a contrast (yik - yi0) can never actually be made. Still, this theoretical framework using an unobservable comparator group provides the basis for the counterfactual approach to causality.

In empirical research, investigators can only sample from observed groups (say A and B) that receive different treatments (A= 0 or A=1) and compare the observed mean outcomes µA1 = E(Y=1|A=1) in group A, and µB0 = E(Y=1|A=0) in group B (see Table 2.1). A causal interpretation from the observed comparison requires that µA1 – µB0 equal µA1 – µA0 and therefore, µA0 = µB0. This implies exchangeability in the response distributions under homogeneous treatment assignment. This assumption is satisfied generally when treatment assignment is independent of outcome, the successfully randomized controlled trial being the ideal example of such a scenario. In RCTs, the randomization process will, on average, evenly balance both known and unknown confounders, and this guarantees the validity of the statistical test used. The randomization process makes it possible to ascribe a probability distribution to the difference in outcome that is not influenced by differences in any prognostic factor other than the intervention under investigation58. The chi-squared test for 2-by-2 tables and the Student’s t-test for comparing two means can be justified on the basis of randomization alone without making further assumptions concerning the distribution of baseline variables. In the absence of randomization, additional design and analysis methods are needed to account for those sources of bias arising from a lack of comparability or exchangeability between groups. Unfortunately, most these methods only account for known sources of bias and come with there own set of assumptions that can only partially be tested.

As mentioned earlier, confounding is a particular problem in non-randomized studies. Confounding can arises when patients selected for one treatment group are fundamentally different from the other group in their pre-treatment likelihood for the outcome of interest. In the notation introduced above, confounding is present when µA0 ≠ µB0. A patient characteristic or associated factor may be considered a confounder when: 1) it can causally affect the outcome parameter within treatment groups; 2) it is distributed differently among compared populations; and 3) is absent from the causal pathway of interest. It is important to note that confounding depends not only on single covariate differences between compared groups, but in the joint distribution of confounding factors. In surgery, treatment decisions are commonly made on grounds of certain overt and subtle factors related to prognosis or severity of disease. As discussed in Chapter 1, if patients with less severe systemic injuries are able to undergo fixation of their long-bone fractures at an earlier time than those with severe multi-system trauma, there may be factors other then timing of treatment that influence the lower rates of morbid complications among those treated early. This type of bias in studies of therapies has been termed confounding by indication or confounding by severity59 and is a major threat to validity of conclusions from non-randomized studies.

Controlling Confounding Control of confounding can be undertaken either in the design phase or the analysis phase

of a study, or in both. Design-based methods (randomization, restriction, matching) attempt to

8

make µA0 = µB0 in the study sample and analysis-based methods (stratification, regression) typically assert this equality but only within strata of a confounder or regressor. These methods as well as two more advanced multivariable methods that have been increasingly used in analysis of therapeutic studies, propensity score analysis and instrumental variable analysis, are discussed below. Restriction, matching, stratification, regression, and even propensity score-based methods work through the notion of fixing the level of one, or more, factors constant in order to study the variability in outcome specifically associated with a change in the treatment or risk factor of interest. The interpretation of such analyses is therefore appropriately described as conditional on holding other known variables constant (again, with the exception of instrumental variable analysis). Achieving sufficient control of confounding for estimation of stratum-specific causal effects is contingent on exchangeability of µA0 and µB0 within strata of the confounder or confounders57. Randomization assures sufficiency of confounding control through its probabilistic balancing of known and unknown confounders as described above. In the absence of randomization, causal inference becomes contingent on the identification of a set of baseline factors whose control will be sufficient. This is based on a priori subject matter knowledge and can be systematically illustrated in causal diagrams that depict the direction of causal relationships and allow for identification of minimally sufficient subsets of variables for control60. Figure 2.2 is a potential causal diagram depicting important variable relationships for estimation of the causal effects of treatment time on mortality in multi-system trauma patients with femoral shaft fracture. Restriction is simply an attempt to enforce homogeneity within the study sample by only including those subjects with the same value of one or more potential confounders, thereby removing any potential associations with outcome. This is probably the most effective approach in removing confounding by any one variable, however it estimates a different parameter than generally desired (i.e., only within a target population defined by the restriction) but the curse of dimensionality for high-dimensional data (lots of covariates) will render such an approach unfeasible. Far more commonly used in the orthopaedic literature are the use of matching, stratification, and multivariable regression.

Matching is a conceptually straightforward strategy whereby confounders are identified and subjects in the treatment groups are matched based on these factors so that they are in the end, “the same” with regards to these factors. Matching can either be done on a one-to-one basis (matched pairs), or based on frequencies (i.e. equal percentage of subjects in each group with confounder) and subjects can be matched with respect to a single confounder or multiple confounders. Matching can be used in both prospective and retrospective non-randomized study design (including case-control studies). For example, Ciminiello61 examined the impact of small incisions (<5cm) on a variety of outcomes including blood-loss, operative time, and post-operative complications in patients undergoing primary total hip arthroplasty (THA). In order to assure that the group of patients undergoing THA with small incision and the comparator group undergoing THA with a standard size incision were as homogenous as possible, the authors used a matched-pair cohort design. They matched 60 patients in each group on a variety of potentially confounding factors including age, sex, body mass index, American Society of Anesthesiologists Score, diagnosis (osteoarthritis), prosthesis, type of fixation, anesthesia, surgical approach and positioning and were unable to identify any significant differences in outcome between the two techniques. Matching is a useful method for improving study efficiency, especially with such known confounders measured on a nominal scale.

9

While matching is an effective way of balancing multiple confounders, it has several important limitations to consider. One is that it may be difficult or impossible to find exact matches between the two groups of patients, and this difficulty increases rapidly as the number of factors that one wishes to match on increases. While a better understanding of the relationship between baseline variables, including treatment, and outcome may come from an analysis of the entire cohort, matching may sacrifice significant numbers (and therefore, power) to study the relationships of interest (in essence, it capitalizes on no smoothness in the conditional distribution of the outcome given the covariates, allowing no “borrowing” across different strata via a model). One solution to this is matching within a reasonable range (for example, increasing this “borrowing” by allowing age + 5 years) within which differences of prognostic importance are not thought to exist. Another problem is that matching generally precludes the evaluation of the underlying relations between matching variable and exposure in a prospective cohort study and matching factor and outcomes in a case-control study. This is because of the sampling schemes (based on exposure for the prospective cohort study, and outcome for the case-control) and the way that balance is forced with respect to the matching factor in each of these designs. Finally, if matching is undertaken on variables that are not true confounders, a loss of statistical power can result; moreover, in a case-control study such “overmatching” can create a new bias62. Therefore, matching should be used cautiously and only on factors that are strong risk factors for the outcome of interest and thought to be differentially distributed between treated and untreated subjects (i.e. only on true confounders).

Stratification (one could consider as a less severe form of matching) provides another means by which to control confounding. Potentially confounding variables are identified and the cohort grouped by levels of this factor. The analysis is then performed on each subgroup within which the factor remains constant, thereby removing the confounding potential of that factor. If the estimates among stratified groups are relatively homogenous, they can be averaged into an estimate unconfounded by the stratification variable, usually by the Mantel-Haenzel method45. Bosse12 undertook a study to assess the impact of reamed intra-medullary nailing versus plate fixation of patients with femoral shaft fractures on several adverse outcomes including adult respiratory distress syndrome and multiple organ failure. Because the presence of thoracic injury could cause differences in surgical approach and impact the outcome of interest, this was considered a potential confounder and stratified analysis was undertaken. Table 2.2 shows the crude or unadjusted analysis of the cohort as well as the subgroup analysis stratifying on presence or absence of thoracic injury. The stratified analysis shows that confounding by thoracic injury did not significantly alter the relative risk of complication seen in the crude analysis. Because a very fine level of stratification would lead essentially to matching (treated to untreated), the use of the term relative to matching is somewhat arbitrary.

Stratification is a useful strategy when there are only one or two risk factors or confounders that allow easy grouping. It also allows for easy visualization of interaction or a difference in treatment effect across subgroups defined by a separate factor. For, example, if there were a difference in the effect of surgical treatment on complications depending on whether or not patients had thoracic injury in the study by Bosse12, this difference itself could be tested and reported as a separate finding of interest. The problem with stratified analysis (much like occurs in matching on a large number of potential confounders) is that it quickly becomes unmanageable and difficult to interpret when there are multiple confounders with multiple levels each. In this scenario, it is better to turn to multivariable regression.

10

Multivariable Regression for the adjustment of confounding factors is one of the most common techniques used in therapeutic and prognostic observational studies. Regression analysis is based on modeling the mathematical relationships between two or more variables that give an approximate description of the observed data. Regression models should not be thought of as explaining underlying mechanisms (i.e. statistical models are not reality), but rather as simplifications that are compatible with the data and provide us with some inference as to associations seen in the data. These models are usually additive in that an observed dependent variable (i.e. outcome of interest) can be explained by a model in which the effects of different influences or independent variables (i.e. treatment of interest as well as other predictors of outcome or confounding factors) are added. Though these models are not based on theory, they should in principle be fit in a manner that “learns” from data. However, most analyses are based on some variation of an arbitrarily chosen the generalized linear model:

h(E[Y|X])= α + β1X1 + β2X2 + . . . βpXp

where a function, h, of the conditional expectation (or mean value) of Y given X (or E[Y|X]) is an additive combination of an intercept (α) and (p) explanatory independent variables multiplied by their respective coefficients (β1 through βp); h is referred to as a link function and includes the identity link (linear regression), log and logit (logistic regression). Each coefficient represents an estimate of effect or risk depending on h (Examples: mean difference for linear regression and log odds ratio for logistic regression). Multivariable analysis allows one to estimate the association between dependent and independent variables, controlling for influence of other independent variables.

The exact model that is used typically depends on the type of outcome data one has. For example, Saleh63 performed a case-control study to evaluate the predictors of surgical site infections (a binary outcome) complicating total knee and total hip replacement. These authors used multivariable logistic regression to control for several demographic, peri-operative, and postoperative factors and found post-operative hematoma formation and persistent wound drainage to be the only significant associated risk factors. In a study looking for prognostic factors associated with re-operation after operative treatment of tibial shaft fracture, Bhandari64 used a proportional hazards analysis in order to assess multiple risk factors. After adjusting for over twenty possible variables, three predicted a relative increase in re-operation (open fracture, transverse pattern, and lack of cortical continuity). This form of analysis gives an estimate of effect that is analogous to the odds ratio from logistic regression called the hazard ratio, and can similarly be interpreted as a relative measure of risk of event associated with a unit change in a given predictor, holding other factors constant.

While the results of such multivariable analysis are commonly presented and accepted as absolute, the details of how the models were selected are rarely scrutinized. Readers may be led to assume the results are accurate when they may have been derived using inappropriate models. Control for confounding solely by regression obscures the ability to visualize the adequacy of overlap or exchangeability between treatment and control groups and relies on dubious model-based extrapolations. Regression models assume that there is no effect modification, or difference of effect between different levels of a confounder as discussed earlier in the stratification example. Unless an interaction term (usually a product of two predictor variables) is inserted to represent interaction, the model may remain naïve to such a relationship and falsely

11

conclude that the effect of treatment A on outcome Y is constant across all levels of factor W (i.e. gender). Assumptions are made when a model is fit (such as that of multivariable Normality for linear regression or that the relative contribution of each factor is constant over time for a proportional hazards model) and it is important that these assumptions are verified and the overall fit of the model to the data be assessed. Model fit is determined in terms of the amount of variability in the data explained by the model and how well the model predicts individual outcomes for a given observation. There are many “diagnostic” procedures for assessing the most valid and best fitting regression model65, 66. Unfortunately, the phenomenon of “wrong model bias” is underappreciated in the orthopaedic literature as is it in epidemiological analysis in general. Given the availability of semi-parametric methods, machine learning algorithms and interesting association parameters that do not depend on the arbitrary specification of a specific regression model, there is enormous improvement possible in the standard of practice by both choosing interesting relevant parameters and leveraging the information contained in the data, while not having the results tainted by arbitrary (and often consequential) assumptions on a model with no theoretical or empirical basis.

The multidimensional nature and urgency of treatment assignment in the acute trauma setting have made it logistically difficult and ethically dubious to conduct randomized clinical trials to evaluate the effect of treatment timing of femoral fracture on patient outcomes. Observational studies are therefore the primary source of data for the investigation of this research question. Yet the appropriate means by which to analyze such data for causal effects of early treatment have not been established. Conventional regression techniques such as linear and logistic regression that “adjust” for one covariate at a time ignore the joint distribution of confounding. Moreover, these “traditional” methods offer conditional associations that are difficult to interpret in the face of numerous covariates and interaction terms. Propensity score analysis has features addressing some of these problems.

Propensity score analysis67 is another approach to controlling for confounding through the generation of a score that “summarizes” the confounding by multiple variables. The propensity score is defined as a balancing score that is a function b(X) of the covariates (X) such that the conditional distribution of X given b(X) is the same for treated (A=1) and control (A=0) units. This form of analysis requires a two-stage approach in which first, rather than modeling the outcome as a function of multiple risk factors, the probability of being treated is modeled taking into account any possible confounding variables. This probability, possibly generated by a logistic regression model, is the propensity score and ranges from 0 to 1. Once the propensity score is generated for each subject, it can be used to match them (usually within some narrow range), perform stratified analysis on levels (such as deciles) of the propensity score, or it can be inserted into multivariable regression along with the treatment variable in estimating the outcome. Propensity score analysis tends to produce unbiased estimates of the treatment effects when treatment assignment is strongly ignorable (treatment assignment is considered strongly ignorable if treatment assignment, A, and the response, Y, are know to be conditionally independent given the covariates, X). More simply stated, this is the assumption of no unmeasured confounding.

While orthopedic investigators have been slower to apply propensity score analysis than researchers in the medical specialties such as cardiology and cardiothoracic surgery fields in particular68, there are some examples. McHenry and colleagues69 used propensity score analysis

12

to control for confounding by indication for timing of treatment of surgically treated thoracic and lumbar spine fractures in order to assess risk factors for respiratory failure following operative stabilization of these injuries (i.e. this was a prognostic study but used propensity score methods to adjust for the strong likelihood of differential treatment-time assignment due to injury severity). Subjects were matched based on their propensity score for treatment before 48 hours from injury. Logistic regression was then performed on the matched set, identifying age, Injury Severity Score, Glasgow Coma Score, presence of blunt chest injury, and timing to surgery more than 2 days as independent risk factors for respiratory failure. By matching on the propensity score, these investigators were able to limit bias due to an important surgeon controlled risk factor in assessing the relative importance of multiple prognostic factors.

By estimating the treatment mechanism, propensity score analysis offers several insights into the data and theoretical advantages over conventional techniques of multivariable adjustment. First, propensity scores allow one to see the degree to which likelihood of treatment differs between two groups and allows one to assess how comparable treatment groups actually are (i.e. the two groups should have fairly similar distributions of propensity scores to make the comparison tenable). Preliminary analysis from my study, further described in Chapter 3, showed qualitatively similar distributions of the estimated propensity score among multi-system trauma patients whose femoral shaft fractures were treated before and after 12 hours from admission (Figure 2.3). Second, by matching or stratifying subjects on the basis of their likelihood of treatment, one can intuitively see how selection bias is countered because comparisons are made only among those equally likely to have received treatment, as in a randomized controlled trial. From preliminary analysis just mentioned, stratification by decile of the estimated propensity score shows remarkable similarities within strata for at least three variables thought to be strong confounders of the treatment time-mortality relationship (Table 2.3).

Other advantages of this strategy over standard regression include confounding control that is more robust to modeling assumptions because one can be less parsimonious about numbers and combinations of terms to optimize model fit in estimating the propensity score70 and propensity score analysis has shown superior performance in the setting of low numbers of events per confounder adjusted for (less than eight events per confounder)71. Unfortunately, like all but instrumental variable methods, propensity score analysis is no more immune to threats to validity caused by unknown or unmeasured confounders67, 72. Also, two recent systematic reviews have not shown significant differences in estimates from studies performing side-by-side conventional multivariable and propensity score analyses73, 74. While the use of propensity score analysis is growing quickly among many fields of research in medicine, guidelines for proper use of this methods are lacking75. In addition, because the outcome is not used in the selection of the model for the propensity score, the approach is inherently inefficient (e.g., consider a variable which is a strong predictor of treatment but unrelated to the outcome, which is sure to be selected in the propensity score model and yet damage the efficiency of the resulting estimator while not contributing to reduction in the bias). In addition, because only the untreated can be matched with a treated subject, under assumptions the propensity score estimates a specific parameter, which does not apply to the entire target population but instead the association only among the treated. Distance76 and genetic matching77 algorithms are equivalent alternatives to matching on the propensity score.

13

The instrumental variable approach to bias and confounding in medical research has been used frequently by econometricians for decades but only recently been implemented in health research78-80. Health economists now commonly use this method to examine questions about quality and distribution of care when using administrative data where many confounders are potentially missing. The theoretical advantage of instrumental variable methodology in the analysis of non-randomized therapeutic studies is that it offers the possibility of controlling of both known and unknown confounding and is therefore appealing when the threat of unobserved or unobservable confounders looms large. The idea is as follows: if a variable (the instrumental variable) can be identified that has the ability to cause variation in the treatment of interest and has no impact whatsoever on outcome (other than that which passes directly through its association with treatment), then one can estimate how much the variation in the treatment is induced by the instrument and only that induced variation affects the outcome. Figure 2.4 provides a schematic diagram of this prerequisite relationship for the identification of a useful and valid instrument. Instrumental variables can be thought of as achieving pseudo-randomization. The RCT is a special case where the random number assignment (say a fair coin toss) is the instrumental variable inducing variation in the outcome variable.

Like propensity score analysis, examples of the use of instrumental variable analysis are rare in the orthopaedic literature. McGuire and colleagues81 used a large Medicare data set to examine the controversial topic of impact of timing of fixation of hip fractures on mortality. Acknowledging the fact that there are likely prognostic factors beyond those measured in such a large administrative data set, these investigators chose day of the week grouping (Saturday through Monday vs. Tuesday through Friday) as an instrumental variable by which to pseudo-randomize the cohort. The authors site evidence showing that day of the week is a strong predictor of delay to operative hip fracture treatment and assume that day of the week that one breaks one’s hip should have no independent influence on mortality or association with other confounders such as presence of co-morbidities. The instrumental variable analysis showed an increased risk of mortality (risk difference 15%, p=0.047) among those undergoing surgery more than two days from admission. It is likely that this methodology will grow in popularity among researchers trying to draw unconfounded estimates of effects of similar health care decision from large datasets for clinical practice and policy-making reasons.

While the thought that one can get around unobserved (and therefore uncontrolled for) confounding in non-randomized studies is appealing, there are certain important limitations for using these methods to establish causality. First, indentifying an instrumental variable that meets the assumptions of no association with outcome, independent of treatment, is difficult. Because this assumption is not directly testable, there must be general consensus that the instrumental variable is tenable. In comparing instrumental variables with standard multivariable adjustment or propensity score techniques, one is trading the assumption just mentioned for the assumption of no unmeasured confounding which is also not directly testable. The other important consideration when interpreting the result of an instrumental variable analysis, assuming the instrumental variable has been chosen correctly, is that the effect measured only applies to those whose treatment was affected by the instrumental variable. In the study by McGuire81, 15% increase in risk of mortality associated with delay of surgery applies only to the patient whose treatment timing was influenced by the day of the week they were admitted. This so called marginal patient82 within a cohort study, is important to distinguish from the entire study sample to whom any average treatment effect can be inferred in a randomized controlled trial. In general,

14

there will also be other required restrictions (such as no non-compliers among the untreated) and these methods, because they are estimating association parameters in a very large model, require enormous sample sizes to be practical statistical tools.83 Thus, their popularity in econometrics where enormous populations are often measured for the outcomes and variables of interest.

Comments Non-randomized studies continue to provide an important method for clinical

investigation in orthopedic surgery in settings were randomized controlled trials are not feasible and when increased generalizability of findings is desired. Confounding bias represents a major obstacle to drawing valid inferences and has been described here from a counterfactual approach to causal inference. Design and analysis tactics for control of confounding have been elaborated on as to their respective strengths and weaknesses. Restriction, matching and stratification are useful but inefficient means when dealing with multiple confounding factors. Conventional multivariable adjustment offers the power to adjust for multiple confounders at the same time, but cannot give causation unless factors like appropriate temporal ordering of predictor and outcome are assured, and there are no unaccounted for confounders in the analysis. While propensity score analysis offers a more plausible accounting for the multivariable nature of confounding and balancing of confounding by indication, causal interpretation from such is still limited by the similar assumptions. Instrumental variable analysis depends on universal acceptance of the instrumental variable. The ensuing chapters will describe an approach novel to the orthopaedic surgery literature, using marginal structural models47 to explore causal effects of treatment time on mortality in multi-system trauma patients with femoral shaft fracture.

15

CHAPTER 2 FIGURE LEGEND:

Figure 2.1: Target population, study sample, and statistical inference. (Reprinted with permission, Journal of Bone and Joint Surgery-American Edition)84

Figure 2.2: Causal diagram depicting the relationships between treatment time of multi-system trauma patients with a femoral shaft fracture and mortality as well as known and unknown potentially confounding covariates.

Figure 2.3: Distribution of the propensity score among the early (<12 hours) and late (>12 hours) treated groups of multi-system trauma patients with femoral shaft fracture. Propensity score estimated using a logistic model including Injury Severity Score, Glasgow Coma Scale, age, time of arrival, number of serious extremity/pelvic or head/neck injuries, number of femoral fractures, presence of cardiac or cerebro-vascular co-morbid condition, hospital teaching status, American College of Surgeons Level-1 designation.

Figure 2.4: Assumptions of an appropriate instrumental variable (IV): 1) IV must be associated with treatment; 2) IV must have no association with outcome, other than through its influence on treatment. * Confounders here represent both those observed and unobserved. (Reprinted with permission, Journal of Bone and Joint Surgery-American Edition)84

16

Figure 2.1

17

Figure 2.2

18

Figure 2.3

19

Figure 2.4

Outcome

Instrumental Variable

Treatment

Confounders*

20

Table 2.1: Contingency table of observed outcome (Y) among groups of individuals defined by

treatment (A). µA1 is the mean outcome or incidence in those receiving treatment (A=1), µB0 is

the mean outcome in those not receiving treatment (A=0).

Y=1 Y=0

A=1 a b a/(a+b)= µA1

A=0 c d c/(c+d)= µB0

21

Tabl

e 2.

2: S

tratif

ied

Ana

lysi

s of N

ail v

ersu

s Pla

te F

ixat

ion

and

Dev

elop

men

t of A

dult

Res

pira

tory

Dis

tress

Syn

drom

e or

Mul

tiple

O

rgan

Fai

lure

from

Bos

se e

t al17

* Ris

k D

iffer

ence

s bet

wee

n st

rata

are

not

sign

ifica

ntly

diff

eren

t, th

at is

, no

inte

ract

ion

(Tes

t for

Het

erog

enei

ty P

-val

ue=0

.96)

. ^ G

iven

the

abse

nce

of in

tera

ctio

n, a

poo

led

sum

mar

y va

lidly

est

imat

es th

e R

isk

Diff

eren

ce a

djus

ting

for C

hest

Inju

ry.

+ P-va

lue

test

ing

the

null

hypo

thes

is o

f no

asso

ciat

ion

betw

een

treat

men

t met

hod

and

outc

ome,

stra

tifyi

ng b

y ch

est i

njur

y an

d as

sum

ing

no in

tera

ctio

n.

C

hest

Inju

ry

N

o C

hest

Inju

ry

To

tal C

ohor

t

Nai

l Pl

ate

N

ail

Plat

e

Nai

l Pl

ate

Dev

elop

ed A

dult

Res

pira

tory

Dis

tress

Sy

ndro

me

or

Mul

tiple

Org

an

Failu

re

5 2

4

1

9 3

Num

ber A

t Ris

k 11

7 10

4

118

114

23

5 21

8 R

isk

0.04

3 0.

019

0.

033

0.00

9

0.03

8 0.

014

Ris

k D

iffer

ence

(9

5% C

onfid

ence

In

terv

al)

0.02

4 (-

0.02

2, 0

.069

)*

0.02

5 (-

0.01

2, 0

.062

)*

0.02

4 (-

0.00

4, 0

.054

)

Sum

mar

y R

isk

Diff

eren

ce (9

5%

Con

fiden

ce In

terv

al)

0.02

4 (-

0.00

5, 0

.053

)^

P =

0.11

+

22

Tabl

e 2.

3: B

alan

cing

of t

hree

cov

aria

tes h

ighl

y as

soci

ated

with

bot

h tre

atm

ent t

ime

and

mor

talit

y am

ong

the

stud

y sa

mpl

e of

mul

ti-sy

stem

trau

ma

patie

nts u

nder

goin

g fe

mor

al sh

aft f

ixat

ion.

GC

S –

Gla

sgow

Com

a Sc

ale;

Age

(yea

rs);

ISS

– In

jury

Sev

erity

Sco

re.

Col

umns

repr

esen

t dec

iles o

f the

pro

pens

ity sc

ore.

Pro

pens

ity sc

ore

estim

ated

usi

ng a

logi

stic

mod

el in

clud

ing

Inju

ry S

ever

ity S

core

, G

lasg

ow C

oma

Scal

e, a

ge, t

ime

of a

rriv

al, n

umbe

r of s

erio

us e

xtre

mity

/pel

vic

or h

ead/

neck

inju

ries,

num

ber o

f fem

oral

frac

ture

s, pr

esen

ce o

f car

diac

or c

ereb

ro-v

ascu

lar c

o-m

orbi

d co

nditi

on, h

ospi

tal t

each

ing

stat

us, A

mer

ican

Col

lege

of S

urge

ons L

evel

-1

desi

gnat

ion.

E -

Early

trea

tmen

t (<1

2 ho

urs f

rom

adm

issi

on);

L –

Late

trea

tmen

t (>1

2 ho

urs f

rom

adm

issi

on).

C

ells

con

tain

the

mea

n va

lue

for r

espe

ctiv

e va

riabl

e at

that

dec

ile o

f the

pro

pens

ity sc

ore

and

treat

men

t ass

ignm

ent (

Early

or L

ate)

. Tr

eatm

ent g

roup

s co

mpa

red

for e

ach

varia

ble

at e

ach

deci

le u

sing

the

Wilc

oxon

rank

-sum

test

. H

ighl

ight

ed c

ells

repr

esen

t the

onl

y co

mpa

rison

with

a

test

stat

istic

p<0

.05.

1

2

3

4

5

6

7

8

9

10

E

L E

L

E L

E L

E L

E L

E L

E L

E L

E L

GC

S 28

95

85

14

2 96

12

7 12

9 12

3 14

0 12

8 16

5 11

1 15

7 13

8 21

5 82

22

4 82

26

5 57

Age

38

.6

39.5

35

.3

38.1

35

.4

35.9

31

.5

33.2

34

32

.2

31.9

33

.4

32.2

31

.1

30

30.4

28

.8

28.6

28

.4

25.2

ISS

36

36

29

29

27

27

26

26

26

26

25

23

24

23

23

24

22

22

21

23

23

24

Chapter 3: Estimation of marginal structural models to estimate causal effects of timing of treatment of femoral shaft fracture

The prior chapters have introduced the study question in the context of extant literature and provided an overview of confounding bias and methods currently used to minimize it in observational studies of orthopaedic interventions. In this chapter I explain an application of marginal structural models to study the effect of treatment timing on mortality in multi-system trauma patients.

Study Cohort and Data Source I derived the study cohort from the National Trauma Data Bank (NTDB) version 5.0 which draws from 567 trauma centers from around the United States85 and contains nearly one million incident trauma cases that occurred over a five year period between January 1st, 2000 and December 31st, 2004. The NTDB contains pre-hospitalization and hospitalization data compiled from medical records during admission—including injury characteristics, co-morbidities, inpatient treatments, and outcomes—that are submitted to the American College of Surgeons for quality control and maintenance.

Patients were included in the study sample if they: 1) had closed or open fracture(s) of the femoral shaft identified by an International Classification of Diseases, 9th Clinical Modification (ICD-9CM) diagnostic code of 821.0 (fracture femur shaft/not otherwise specified—closed), 821.01 (fracture femur shaft—closed), 821.1 (fracture femur shaft/not otherwise specified—open), or 821.11 (fracture femur shaft—open); and 2) had an International Classification of Diseases - derived Injury Severity Score (ISS)18, 86 of greater than or equal to 15; and 3) were 16 years of age or older; and 4) underwent a “definitive” treatment procedure involving internal fixation of the femur identified by an ICD-9CM procedure code of 78.55 (internal fixation—femur), 79.15 (closed reduction—internal fixation femur), or 79.35 (open reduction—internal fixation femur). Patients were excluded from the study sample for any of the following reasons: 1) the patient was received in transfer or not admitted on the day of injury; 2) the record lacked information on time from admission to definitive fracture fixation, mortality status, or length of hospitalization; 3) there was an associated burn; or 4) the patient did not have the fracture definitively fixed within two weeks of admission.

Because of the potential for treatment selection bias in an observational study, only subjects with complete information on potentially relevant confounding covariates were included. An a priori list of 39 baseline covariates from over 100 available in the NTDB 5.0 was generated based on the literature and clinical suspicion as potential confounders of the relationship between treatment time and mortality and included the following: age, sex, race, maximum Abbreviated Injury Score (AIS)86 for each of the six anatomic regions (head/neck, face, chest, abdomen, extremity/pelvis, and skin), New Injury Severity Score 19, 21, Glasgow Coma Score on arrival, first systolic blood pressure, blunt versus penetrating trauma, open versus closed femur fracture, number of femur fractures, number of serious extremity/pelvic injuries (AIS > 3), alcohol or drugs present at the time of admission, year of incident, time of arrival at hospital (day divided into four quarters), modified Charlson co-morbidity index87 and the presence or history of specific co-morbidities at admission (coronary artery disease including

25

heart failure; chronic obstructive pulmonary disease; stroke; hematologic disease including coagulopathy; diabetes; pregnancy; or hepato-renal disease or failure). Hospital characteristics include mean number of femoral shaft fractures admitted to the treating facility per year, teaching status, public versus private, American College of Surgeons trauma center level, and geographic region of the United States (Northeast, Midwest, Southern, Western). A subset of these covariates (Wm ) was then identified that was associated with mortality (p < 0.2) using appropriate non-parametric bivariate tests of association adjusted for multiple testing with the Benjamini—Hochberg procedure88. Observations with missing values in Wm were excluded in order to obtain a final sample with complete data in all measured potentially confounding covariates for analysis.

Treatment Variables This study asks whether the timing of definitive treatment of femoral shaft fracture affects mortality. In order to provide clinically relevant categories for treatment beyond a simple dichotomy at 12 or 24 hours, 5 time periods were chosen a priori, based on commonly used cut-points from the literature27, 39, 40: t0 < 12 hours, 12 < t1 < 24 hours, 24 < t2 < 48 hours, 48 < t3 < 120 hours, t4>120 hours. While 24 hours was the most commonly used threshold before which “early” treatment has been described in prior studies, I defined the referent group (t0) as treatment within the first 12 hours in order to represent those subjects most likely to have been inadequately resuscitated and in a state of occult hypo-perfusion30, 89. I hypothesize that an additional physiologic stress from definitive fracture surgery in such patients could activate an adverse systemic response leading to end-organ injury, multi-organ failure and excess mortality when compared to those treated later when adequate resuscitation is more likely to have been achieved.

Each procedure in the dataset is recorded with a number of days, hours and minutes after admission when it was performed. Using this information, I calculated the time from admission to definitive fracture fixation, as defined above using ICD-9 procedure codes. By excluding patients transferred from other facilities or admitted one or more days after the injury, I assumed the calculated treatment time level was a reliable surrogate for time from injury, underestimating the true elapsed time only by the time required to reach hospital.

Main Outcome Measure In-hospital mortality was the sole outcome. While other studies of treatment time effect have focused on Acute Respiratory Distress Syndrome or multi-organ failure, the reporting of such outcomes is inherently subjective and prone to measurement error90. Moreover, one could not expect that such outcomes would be validly or reliably measured over such a large number or diverse range of treating centers. Other surrogate measures such as length of hospital stay are notoriously positively skewed91, 92, making analysis of the effect of timing difficult to analyze and interpret. In the absence of patient-centered outcomes (i.e. health-related quality of life) and systematic assessment of morbid events after treatment, mortality is the most objective, and therefore likely valid, endpoint. While potentially lacking sensitivity in detection of morbid consequences of treatment, mortality can be considered a very specific surrogate for clinically important morbidity.

Statistical Methods The primary analytic strategy was based on marginal structural models (MSM) using the

inverse probability of treatment-weighted (IPTW) estimator93. This method uses the reciprocal

26

of the conditional probability of a subject receiving an assigned treatment given other covariates, as a means of confounding control in order to determine the effect of treatment. Causal effects are defined within the counterfactual framework of outcomes that would have been observed had subjects been assigned to each of the several possible treatments of interest. In order to study the causal effect of the timing of surgery on the risk of mortality, we would like to compare the risk of mortality corresponding to the hypothetical situation that each patient is assigned to receive definitive fixation after more than 12 hours P[Y(1)=1], to the risk corresponding to the hypothetical situation that each patient has definitive fixation within 12 hours P[Y(0)=1]. In the setting of a perfect experiment where patients could be perfectly randomized to treatment before or after 12 hours from admission, a possible estimate of the causal effect of interest is given by the observed relative risk of mortality, P[Y=1|A=1]/P[Y=1|A=0]. Since the relationship between the timing of surgery and subsequent clinical outcomes is confounded by baseline covariates such as age, injury severity or co-morbidities, the patients in our sample who were operated on after more than 12 hours may not be representative of the entire population of patients. Therefore, the risk of mortality among these patients, P[Y=1|A=1], cannot be used as an estimate of the true risk of mortality P[Y(1)=1] had every patient been assigned to surgery after more than 12 hours.

Robins47 introduced a class of estimators that address this problem through a straightforward weighting approach. Like propensity score methods67, 72, these inverse-probability-of-treatment-weighted (IPTW) estimators make use of an estimate of the treatment mechanism. IPTW estimators use the probability that a given subject (i) would have received her observed treatment (Ai) given her baseline covariates (Wi). They then weight each observation by the inverse of this probability, P(Ai|Wi), so that each subject’s weight is:

(1) wi = 1 / P(Ai|Wi)

This creates a new pseudo-sample in which treatment assignment is independent of the baseline covariates, making it straightforward to estimate P[Y(1)=1] and P[Y(0)=1] by fitting of a saturated logit model. In the setting of a categorically defined treatment time, t0 through t4, the following model was fit with corresponding indicator variables (Ia) in order to estimate the risk of mortality at each later interval versus the baseline (definitive fixation within 12 hours):

(2) logit(P(Y(a)=1))= B0 + B1*I(a=t1) + B2*I(a=t2) + B3*I(a=t3) + B4*I(a=t4)

In order to study the possibility of a different causal effect (interaction) of treatment time separately for subpopulations defined by a variable such as presence versus absence of a serious chest injury (Abbreviated Injury Score <3 versus >3), P[Y(1)=1]/P[Y(0)=1] could be calculated within each level of chest injury severity and compared. Interaction by baseline factors (Vj) can be assessed using the following conditional saturated weighted logit model with treatment by covariate product terms:

(3) logit(P(Y(a)=1|Vj))= B0 + B1*I(a=t1) + B2*I(a=t2) + B3*I(a=t3) + B4*I(a=t4) + B5*Vj + B6*I(a=t1)*Vj + B7*I(a=t2)*Vj + B8*I(a=t3)*Vj + B9*I(a=t4)*Vj

However, because of the complexity of clinically interpreting interaction by multi-level assigned treatment assignment, loss of statistical efficiency, and in order to test the hypothesis that delaying treatment beyond 12 hours was the main threshold beyond which the benefits in delay

27

including adequate resuscitation are realized, a second set of weights were estimated using a model of binary treatment assignment dichotomized at 12 hours. Therefore, following conditional saturated weighted logit model with a treatment by covariate product term was used for assessment of interaction:

(4) logit(P(Y(a)=1))= B0 + B1*I(a=t1-t4) + B2*Vj + B3*I(a=t1-t4)*Vj

Unlike propensity score analysis67, 72, IPTW can easily be adapted for use with multi-level categorical treatment assignment. The likelihood of treatment within one of the five time categories is estimated using data-adaptive model selection that allows for complex functions of the variables to predict treratment (via splines and their multiplicative interactions) 94 and goodness-of-fit of the model was assessed using the Hosmer-Le Cessie test 95. IPTW estimators only give consistent estimates of the parameters of interest if the treatment mechanism itself is estimated consistently. Since a mis-specified parametric model for the treatment mechanism will lead to inconsistent estimation of the treatment mechanism and thus inconsistent estimation of the causal parameters of interest, assuming an a priori functional form was avoided and instead a data-adaptive model selection algorithm was employed that chooses the functional form based on the information that is available in the data at hand. For this purpose, a model selection approach that is based on flexible polynomial spline functions was used and includes testing of all one-way interaction terms between candidate covariates94. The consistency of the estimates based on the selected model is not affected by adding additional terms for variables to this model that might in truth not be related to the treatment variable. If these variables are, however, associated with the outcome of interest, including them in the treatment mechanism model will adjust for empirical confounding by any of these variables and thus increase the efficiency of the IPTW estimates96. Based on this observation, the selected model was complemented with main-effect terms for all baseline covariates in Wm that had not already been selected by the data-adaptive model selection algorithm. The data-adaptively selected treatment models, before addition of main-effect terms for other predictors of the outcome, are given in Table 3.1 and Table 3.2 for the entire population and interaction assessments, respectively.

In order to contrast IPTW estimates with a conventional method of covariate adjustment, a logistic regression model was used to estimate the mortality risk as a function of timing of definitive fixation (coded as five indicator variables with t0 as the referent) and baseline covariates. Because, marginal structural models give population level (marginal) estimates and conditional models (such as logistic regression) provide pooled (conditional when covariates are included) estimates, direct comparisons are not appropriate. Nevertheless, because investigators often ascribe marginal inferences to these conditional estimates, this comparison was made for heuristic purposes. A backwards deletion approach was used to arrive at a parsimonious description of these relationships starting with a full model that contained main-effect terms for the treatment variable as well as Wm. The baseline covariate with the largest p-value was then eliminated and the model refitted. Since the focus of this analysis was on estimating the effect of time until fixation on mortality, treatment time indicators were not considered for deletion from the model. This step was then repeated until all of the confounders had p-values smaller than 0.05. Model goodness-of-fit was assessed using the Hosmer-Le Cessie test 95.

Confidence intervals (CI) and p-values were non-parametrically obtained using a clustered bootstrap approach97, 98 that takes into account correlations between patients treated at the same hospital. Observations from the same hospital are likely to be correlated with each

28

other. In the presence of such correlated data, standard mean estimates are still consistent even though they may no longer be efficient. Therefore, we do not need to take into account the correlation among patients from the same hospital for the purpose of obtaining point estimates. Confidence intervals provided by these standard methods, however, are no longer reliable in the presence of correlated data; they would be based on the assumption of a sample of independent observation and would thus tend to overestimate the information available in the data, resulting in confidence intervals that are too small. Therefore, confidence intervals were estimated using the following “clustered” bootstrap approach97. During each bootstrap iteration, a sample of size N was drawn with replacement from the pool of N hospitals in the dataset to obtain a bootstrap dataset that consists of all patients from the selected hospitals. This is a slight modification to the standard bootstrap approach for independently sampled observations, which would prescribe drawing samples of size n with replacement from the pool of n patients in the dataset. The same modified bootstrap approach was followed to obtain p-values by applying the general resampling-based methodology developed by Pollard and van der Laan98. Results were considered to be statistically significant at p<0.05. Interaction was assessed by obtaining bootstrap-based p-values for the hypothesis that the relative risk of mortality associated with treatment time was identical for both levels (not serious and serious) defined by each of the four potential effect modifying associated injuries discussed above. Because interactions of this kind are difficult to identify, the level at which statistical significance is considered was relaxed to p<0.2. All analyses were carried out in R version 2.3.199, 100. Data-adaptive model selection based on polynomial splines was performed using the polyspline package; multinomial regression models were fit using the multinom() function in the nnet package; the Hosmer-Le Cessie test was carried out using the resid() function of the Design package.

The results and testing of implicit assumptions of these methods are presented in Chapters 4 and 5.

29

Covariate 12-24hrs 24-48hrs 48-120hrs >120hrs

New Injury Severity Score

1.00 1.01 1.03 1.04

Arrival between 6am and 12pm

0.23 0.77 0.70 0.94

Cardiac Co-morbidity 1.34 1.70 2.49 2.14

Number of serious associated extremity injuries

0.69 0.80 0.91 0.91

Maximum AIS Head Region

1.02 1.02 1.03 0.98

Bilateral Femoral Fractures

0.29 0.77 0.57 0.81

Age 1.01 1.01 1.01 1.01

Teaching Hospital 1.53 0.91 0.65 1.07

Glasgow Coma Score 1.00 0.96 0.95 0.92

Treated at Level 1 Trauma Center

0.95 1.36 1.42 1.40

Cerebro-vascular Co-morbidity

1.24 2.51 1.04 0.00

Hospitals from Northeast Region

1.23 1.15 1.76 0.93

Table 3.1: Summary of the estimates of the treatment mechanism that are used for the categorical mortality analysis. The entries in the first column give the factor by which the relative risk of being assigned to surgery between 12 and 24 hours rather than to surgery within 12 hours changes for each unit increase in the covariate under consideration. For covariates with factors greater than 1.00, the relative risk of being assigned to the former treatment category rather than the latter one (reference category) thus increases as the value of the covariate increases. Entries in the remaining columns are interpreted accordingly. The polychotomous logistic regression model selected for the treatment mechanism consisted entirely of main-effects terms with the Hosmer – Le Cessie test showing acceptable model fit for all time frames except for > 120 hours (p-values of 0.24, 0.56, 0.24, and 0.028 respectively).

30

Covariate OR 95% CI

Arrival between 6am and 12pm

0.52 (0.43, 0.63)

New Injury Severity Score

1.03 (1.02, 1.04)

Cardiac Co-morbidity 1.69 (1.37, 2.07)

Number of serious associated extremity injuries

0.77 (0.70, 0.84)

Age 1.01 (1.01, 1.02)

Glasgow Coma Score 0.97 (0.95, 0.99)

New Injury Severity Score (knot 50)a

0.95 (0.92, 0.98)

Maximum AIS Head Region

1.01 (0.96, 1.07)

Bilateral Femoral Fractures

0.59 (0.30, 1.14)

Teaching Hospital 1.07 (0.91, 1.25)

Treated at Level 1 Trauma Center

1.18 (1.01, 1.39)

Cerebro-vascular Co-morbidity

1.54 (0.33, 7.24)

Hospitals from Northeast Region

1.25 (0.95, 1.64)

Table 3.2: Summary of the estimated treatment mechanism that is used for the binary mortality analysis (investigation of interaction). The entries give the estimated odds ratio for being assigned to surgery after 12 hours versus surgery during the reference time frame. a Apart from main-effect terms for those covariates which are found to be predictive of mortality, this model contains a spline function for the New Injury Severity Score with a knot at 50, suggesting that the dependence of treatment assignment probabilities on this variable may differ depending on whether the New Injury Severity Score is above or below 50. Hosmer – Le Cessie test showed acceptable goodness-of-fit (p=0.54).

31

Chapter 4: Testing the Experimental Treatment Assignment Assumption

There are three basic assumptions required to estimate parameters for any form of marginal structural model (inverse probability of treatment-weighted, G-computation, or double robust)101. The first is the sequential randomization assumption requiring that treatment be randomized with respect to the set of all possible counterfactual outcomes, given all prior covariates including treatments (as for time-dependent or repeated treatment regimes). This necessitates no unmeasured confounding be present. This assumptions is similar to Rosenbaum and Rubin’s “strongly ignorable treatment assignment” assumption for use of the propensity score67. To the extent that unknown confounding goes unadjusted for, or is uncorrelated with measured confounders, estimates may be biased. This assumption is ubiquitous in observational studies and cannot be empirically tested. The second is the assumption of time ordering essentially requiring that treatment precede outcome, which is safely preserved in this “point-treatment” study of treatment time effect on mortality. The third is the consistency assumption that bases valid inferences on the existence of counterfactual variables representing the set of subject specific outcomes had the subject, contrary to fact, had a treatment contrary to the one observed. This implies that the observed data are a subset of the full data; the rest can be considered missing.

Valid estimation of causal effects using inverse probability of treatment-weighted (IPTW) estimation, or any of the estimators just mentioned, obliges an additional experimental treatment assignment (ETA) assumption (this has also been called the positivity assumption)102,

103. It requires that there are no values of the baseline covariates for which treatment is assigned in a deterministic fashion. If, for example, patients with head injury scores of 3 or more are always operated on after more than 12 hours, none of the patients in the re-weighted sample who were operated on within 12 hours will have head injury scores of 3 or more, leading to a biased estimate of the corresponding risk of mortality. In this case, the comparison of interest cannot realistically be made among the population at hand because one group of patients could never realistically have received the treatment that they did not receive based on one (severity of head injury) or more characteristics. The ETA assumption is met in theory by study design: there were no conditions under which the probability of a subjects treatment time assignment within any of the five groups was set to 0 or 1. However, a situation where a certain baseline characteristic (W) or combination of W’s empirically drive the probability of treatment to (or near) 0 or 1 would constitute a practical violation of the ETA assumptions. Such practical violations of the ETA were investigated as follows. First, I performed a qualitative assessment by plotting observed treatment assignment against predicted values to assess whether for any value of the covariates (Wm), the probability of treatment assignment is 0 or 1. Second, I present results from a quantitative method of checking ETA using Monte Carlo simulation. Finally, I present an empiric method for assessing the sensitivity of IPTW estimates to the most likely violation of ETA, that those treated after 12 hours from admission could not feasibly have been treated earlier. This approach is based on the standardized risk ratio (SRR) using modified IPTW weights that, in this case, gives an estimate of the effect of delaying surgery for those subjects who were treated early. Each of these methods assesses the validity of the marginal IPTW estimates by testing, either quantitatively or qualitatively, whether subjects could have been treated both early and late.

32

Graphical Evaluation of the Experimental Treatment Assignment Assumption Mortimer104 introduced a method of graphically evaluating whether the ETA assumption is met practically, after the treatment model-fitting step has been undertaken. Using the treatment model for binary treatment dichotomized at 12 hours from admission introduced in Chapter 3 (Table 3.2), coefficients were used to calculate predicted treatment assignment for each subject. The log odds of treatment (based on coefficients derived from the treatment model) were plotted against both the observed and predicted treatment probabilities. Violations of the ETA assumption would be suggested by a probability of either treatment assignment equaling 0 or 1 for any value of covariates in the treatment model. Figure 4.1 shows this plot. It can be seen that for nearly all values of the covariates in the treatment model, treatment at both less than and greater than 12 hours occurred, and in all but one subject, the expected probability of treatment never fell below 5% or exceeded 95%. This visual representation supports the ETA assumption.

Monte Carlo Simulation Approach Wang105 proposed the following simulation-based approach (parametric bootstrap) for examining the extent to which IPTW estimators might be biased due to a violation of the ETA assumption. First an estimate of the data-generating distribution is obtained that allows one to simulate realizations of the observed data structure. For this estimated data-generating distribution, the true parameter values can be computed through G-computation. A sampling distribution of IPTW estimates can be obtained by applying the IPTW estimator to a large number of simulated realizations of the observed data structure. Since the assumption of no unmeasured confounding is trivially satisfied in this case, any discrepancy between the mean of these estimates and the true parameter value reflects a violation of the ETA assumption.

In the case of a point-treatment study, the observed data structure O=(W,A,Y) consists of baseline covariates W, treatment A, and outcome Y. An estimate of the data-generating distribution thus consists of an estimate of the marginal distribution of W, the conditional distribution of A given W, and the conditional distribution of Y given A and W. The marginal distribution of W is estimated by its empirical distribution using the data-adaptive approach described in Chapter 3 to obtain an estimate of the conditional distribution of A given W. For the mortality analysis, the conditional distribution of Y given A and W is estimated by logistic regression. This model includes main-effect terms for A as well as all baseline covariates that were found to have significant bivariate associations with the outcome. These estimates now allow one to simulate realizations of the observed data structures by using the following sequential approach. First, n realizations of W are generated by sampling with replacement from the n observed values in the data set. Next, n realizations of A are drawn from the estimated conditional distribution of A given these simulated values of W. Lastly, n realizations of Y are obtained by drawing from the estimated distribution of Y given the simulated values of A and W. The true parameter values for this data-generating distribution can be computed based on the following G-computation approach. If we want to estimate the counterfactual mortality for the scenario that every patient undergoes surgery within 12 hours of admission, for example, we first draw a large number, say N=10,000, realizations of W as above; then we generate N realizations of Y by drawing from the estimated conditional distribution of Y given A=a* and these simulated values of W, where a* represents the treatment level corresponding to surgery within 12 hours. The desired counterfactual mortality can then be estimated by simply taking the mean of these simulated Y-values.

33

Table 4.1 summarizes the results of such a simulation study for determining the extent of bias that our IPTW estimators of the marginal counterfactual mortality risks might be subject to due to a violation of the ETA assumption. Figure 4.2 shows the corresponding distributions of IPTW estimates relative to the true parameter value obtained by G-computation. In all cases, the bias due to a possible violation of the ETA assumption appears to be minimal. Since all subjects seem to have adequately large probabilities of following any one of the five categories of treatment time we are examining here, this should also be true for the two treatment categories defined by the binary version of treatment so that the corresponding parameters in the binary analysis can also be estimated without appreciable bias due to a violation of the ETA assumption.

Standardized Risk Ratio Approach Standardization deals with the issue of confounding through estimating expected numbers of events and comparing them to those observed based on chosen weights and rates. Comparisons may be made between populations or treatment groups that have causal interpretations and inferences depend on the designated target population106. Modified IPTW weights were proposed by Sato107 in order to expand causal contrasts afforded by IPTW estimates to specific components of the larger study group. Regardless of extensive confounding control, unknown or unmeasured factors may still preclude some patients from later treatment groups from receiving early treatment (a practical ETA violation). A corollary of this is that only those patients treated early could have reasonably been treated in any of the five treatment groups (t0-t4) and may represent the only patients who could ethically be randomized. Standardized risk ratios give the closest approximation to such an experiment by focusing on the patients who were treated early. All components of the analysis described in Chapter 3 were identical for the SRR analysis except that the calculation of weights was different in order to provide inferences as to the estimated effect of treatment within the early treatment group. Weights were generated using the same treatment mechanism model for the IPTW analysis to estimate the conditional probability of early treatment as the numerator and the conditional probability of receiving the treatment received as the denominator.

(5) wi = P(A=t0|W=Wi) / P(A=Ai|W=Wi)

By using the early treatment group (t0) as the standard population and the mortality experience of the entire study sample, the SRR is interpreted as the estimated risk ratio if those who were treated early where to have been treated at one of the later treatment-time periods. If subjects treated after 12 hours vary enough from those treated earlier to bias estimates of effect, the SRR—with its target population being the t0 group—should differ from the marginal IPTW estimate. Similarities between SRR and IPTW would support exchangeability of those in the early treatment group with those treated later and therefore, support the ETA assumptions and the IPTW population estimate. I present results of this approach based on standardized risk ratios side-by-side with the IPTW results in Chapter 5 for direct comparison.

34

Chapter 4 Figure Legend

Figure 4.1: Plot of the expected probability (Pr(early12)) and observed (early12) treatment assignment versus the expected log odds of treatment assignment (logoddsA). The dashed line represents 0.05 and 0.95 reference lines.

Figure 4.2: The distributions of IPTW estimates relative to the true parameter value obtained by G-computation (vertical red line).

35

Figure 4.1

36

Figure 4.2

37

<12hrs 12-24hrs 24-48hrs 48-120hrs >120hrs

G-computation truth

4.15% 2.27% 3.55% 2.68% 3.03%

Mean IPTW estimate

4.16% 2.20% 3.53% 2.64% 2.92%

Estimated bias 0.01% -0.07% -0.01% -0.03% -0.11%

Table 4.1: Estimate of ETA bias based on Monte-Carlo Simulation

38

Chapter 5: Results

The NTDB 5.0 included 18,404 subjects with at least one femoral shaft fracture who underwent internal fixation. Figure 5.1 illustrates how the study sample of 3069 subjects was selected based on inclusion and exclusion criteria. The average age was 32.7 years (SD 14.7) and there were 2,207 men (71.9%). More than half (N=1759) of the cases were treated within 12 hours of admission with decreasing numbers treated in each subsequent time period (Table 5.1). There were a total of 108 (3.52%) deaths in the study sample and the unadjusted estimate of the mortality risk was lowest in the group of patients treated between 12 and 24 hours (1.9%, 95% CI 0.7 – 3.3%) and increased for those treated later. Glasgow Coma Scale (GCS), New Injury Severity Score (NISS), and maximum AIS in the head and neck region were strongly associated (p< 0.0005) with both mortality and treatment, with a consistent trend towards more severe injury among patients treated later. 1,146 subjects (37.3%) had associated serious head or neck injuries, 1,696 (55.3%) had associated chest injuries, 808 (26.3%) had associated serious abdominal injuries, and 1,222 (39.8%) had an additional serious extremity or pelvis injury other then the femoral shaft fracture that was treated (Table 5.2). The most common serious associated injuries (AIS > 3) were closed C2 fracture, pulmonary contusion, lumbar vertebral fracture and inter-trochanteric fracture, respectively.

Inverse Probability of Treatment-Weighted Analaysis The estimated mortality risk if all subjects had been treated in any one of the five time periods is shown in Figure 5.2, controlling for measured confounders. Mortality was estimated to be highest if all subjects were treated within 12 hours of admission. If treatment was performed between 12 and 24 hours, 48 and 120 hours, or after 120 hours, mortality was estimated to be reduced to 43% to 58% of this reference mortality level (Table 5.3). A mortality risk of 83% that of the reference (t0) level was estimated for treatment within 24 to 48 hour time frame, though this finding was not statistically significant. With the exception of the last time frame, the Hosmer – Le Cessie test showed acceptable fit of the selected treatment model (p-values for goodness-of-fit: 0.24, 0.56, 0.24, and 0.028 respectively).

In order to examine interaction by the presence of serious associated injuries and because the categorical analysis showed similar mortality risk among the latter four treatment groups, treatment was dichotomized at 12 hours. The marginal relative risk of mortality for later treatment was 0.65 (95%CI: 0.45 to 1.04; p-value: 0.06). For this dichotomous comparison, the data-adaptively selected logistic regression model for the treatment mechanism again showed acceptable goodness-of-fit (Hosmer – Le Cessie test p-value: 0.54). Figure 5.3 shows the estimated marginal mortality within groups defined by the presence of serious head, chest, abdominal or additional limb or pelvis injury, controlling for measured confounders. In general, more severely injured patients appear to benefit more strongly from delaying surgery for at least 12 hours. Only when stratifying on abdominal injury, did I see sufficient statistical evidence (Table 5.4) to suggest that the marginal relative risks for the two groups are different (p-value for effect modification: 0.09). The estimated marginal mortality with delayed treatment of those with no or low severity abdominal injuries is 82% (p-value: 0.39) of what it would be had they all been treated early. This is in contrast to a reduction to 36% (p-value: 0.03) of the risk of the mortality with early treatment, had all subject with serious abdominal injuries undergone delayed

39

treatment. While there was a marked decrease in estimated mortality for subjects with serious head injuries undergoing delayed treatment (RR: 0.62; 95%CI: 0.4 – 0.99; p-value: 0.04), there was insufficient statistical evidence to show a difference in effect as compared to subjects with no or low severity head injuries. Therefore, with the exception of severity of abdominal injury, it does not appear that sub-classification by any of the other tested associated injuries identifies subjects who would experience a greater or lesser degree of mortality reduction from delayed treatment than that estimated for the entire sample.

Logistic Regression Analysis Table 5.3 summarizes the results of the estimation of the association using arbitrarily chosen multivariate logistic regression analysis giving odds ratios associated with treatment at any given time point conditional on holding other variables in the final model constant. All of the point estimates for delayed treatment times trended towards mortality reduction in comparison to treatment in under 12 hours, with the risk being lowest between 12 and 24 hours (OR: 0.52, 95%CI 0.18 – 1.19), though none reached statistical significance. Age, NISS and GCS were significant predictors of mortality. Also, both the number of femurs fractured and the total number of severe extremity or pelvic fractures were found to have a significant association with mortality. Surprisingly, being treated in a hospital from the Northeast was associated with a significant eight-fold increase in mortality versus subjects from the South. The Hosmer – Le Cessie goodness-of-fit test for this final logistic regression model suggested poor fit to the data (p=0.003). As for the IPTW interaction analysis, I worked with a dichotomized version of treatment time and found a similar difference in odd ratios for mortality among patients with low and high severity abdominal injuries (p=0.16) and no significant difference among the other subgroups tested (Table 5.5).

Standardized Risk Ratios Analysis The estimates from the multivariate standardized risk ratio analysis (Table 5.3) were very similar to IPTW estimates with only slightly decreased precision. This analysis estimates the effect of treatment delay for those patients actually treated early. This provides the most conservative estimate of treatment effect by targeting those who, given their other prognostic factors, could have most likely been treated early. Again, I worked with a dichotomized version of treatment time and found a similar difference in standardized risk ratios for mortality among patients with low and high severity abdominal injuries (p=0.20) and no significant difference among the other subgroups tested (Table 5.6).

40

Chapter 5 Figure Legend

Figure 5.1: Study sample inclusion flow chart form the aNational Trauma Data Bank version 5.0 (2000-2005). bISS=Injury Severity Score. cWm= Covariates associated with mortality (p<0.2). (Reprinted with permission, Journal of Bone and Joint Surgery-American Edition)84

Figure 5.2: Inverse Probability of Treatment Weighted mortality estimates with 95% confidence intervals (lines). (*) represents statistically significant relative risk reduction (p<0.05) versus the referent (t0<12hrs) treatment group. (Reprinted with permission, Journal of Bone and Joint Surgery-American Edition)84

Figure 5.3: Inverse-Probability-of-Treatment-Weighted mortality estimates stratified by severity of associated injury (presence of serious associated extremity/pelvis, abdominal, chest and head injury), with 95% confidence intervals (lines). (*) represents statistically significant relative risk reduction (p<0.05) versus the referent (t0) treatment group. AIS = Abbreviated Injury Score. (Reprinted with permission, Journal of Bone and Joint Surgery-American Edition)84

41

Figure 5.1

952,242

24,682

18,404

4754

2162

7865

11

106

19

82

336

3069

NTDBa 5.0 Cohort

Subjects with femoral shaft fracture

Subjects undergoing definitive care

of femoral shaft fracture

ISSb < 15

Transfer or late admission

Age < 15

Patient with severe burns

Missing time of surgery

Missing or aberrant length of stay stayoofstastaty

stay Surgery > 2 weeks from admission

Missing data in Wmc

Final Study Sample

42

Figure 5.2

43

Figure 5.3

Cov

aria

tes

(Wm)

a A

ssoc

iatio

n w

ith

treat

men

t

Ass

ocia

tion

with

m

orta

lity

Trea

tmen

t Gro

ups

Mea

n (S

D) b

p-

valu

e p-

valu

e t 0

(N=1

759)

t 1 (N

=540

)

t 2 (N

=359

)

t 3 (N

=272

) t 4

(N=1

39)

G

lasg

ow C

oma

Sco

re

<0.0

01

<0.0

01

12.6

7 (4

.22)

12

.69

(4.1

4)

11.6

6 (4

.75)

10

.77

(5.1

3)

9.68

(5.2

0)

New

Inju

ry

Sev

erity

Sco

re

<0.0

01

<0.0

01

27.3

5 (8

.97)

27

.21

(8.3

3)

29.2

0 (9

.62)

32

.31

(11.

39)

34.6

8 (1

4.05

)

Max

imum

AIS

H

ead

Reg

ion

<0.0

01

<0.0

01

1.71

(1.6

5)

1.76

(1.6

9)

2.01

(1.6

8)

2.32

(1.8

3)

2.50

(1.9

4)

Car

diac

Co-

mor

bidi

ty

<0.0

01

0.00

2 0.

13 (0

.34)

0.

17 (0

.37)

0.

20 (0

.40)

0.

28 (0

.45)

0.

27 (0

.45)

Num

ber o

f ser

ious

as

soci

ated

ex

trem

ity in

jurie

s

<0.0

01

<0.0

01

1.65

(0.9

4)

1.42

(0.7

3)

1.54

(0.8

6)

1.67

(1.0

1)

1.68

(0.9

6)

Arr

ival

bet

wee

n 6a

m a

nd 1

2pm

0.

001

0.04

0.

24 (0

.43)

0.

07 (0

.26)

0.

20 (0

.40)

0.

18 (0

.39)

0.

22 (0

.42)

Age

0.

01

0.01

31

.86

(13.

73)

33.1

2 (1

5.73

) 34

.25

(15.

87)

34.4

4 (1

6.33

) 33

.38

(15.

05)

Bila

tera

l fra

ctur

es

0.03

0.

03

0.02

(0.1

4)

0.00

(0.0

6)

0.01

(0.1

2)

0.02

(0.1

2)

0.02

(0.1

5)

Teac

hing

Hos

pita

l 0.

38

0.15

0.

56 (0

.50)

0.

67 (0

.47)

0.

58 (0

.50)

0.

53 (0

.50)

0.

63 (0

.49)

H

ospi

tals

from

N

orth

east

Reg

ion

0.54

0.

00

0.08

(0.2

7)

0.10

(0.3

0)

0.09

(0.2

9)

0.11

(0.3

2)

0.07

(0.2

6)

Trea

ted

at L

evel

1

Trau

ma

Cen

ter

0.69

0.

05

0.39

(0.4

9)

0.39

(0.4

9)

0.45

(0.5

0)

0.45

(0.5

0)

0.48

(0.5

0)

Cer

ebro

-vas

cula

r C

o-m

orbi

dity

0.

74

0.01

0.

00 (0

.04)

0.

00 (0

.04)

0.

01 (0

.08)

0.

00 (0

.06)

0.

00 (0

.00)

Mor

talit

y

(95%

Con

fiden

ce

Inte

rval

)

3.7%

(2

.3%

, 5.4

%)

1.9%

(0

.7%

, 3.3

%)

4.2%

(2

.2%

, 6.4

%)

4.4%

(2

.3%

, 6.7

%)

4.3%

(0

.8%

, 8.2

%)

Tabl

e 5.

1: S

umm

ary

tabl

e of

biv

aria

te a

ssoc

iatio

ns o

f cov

aria

tes w

ith m

orta

lity

and

treat

men

t gro

up.

Trea

tmen

t gro

ups d

efin

ed a

s: t 0

< 1

2 ho

urs,

12 <

t 1 <

24

hour

s, 24

< t 2

< 4

8 ho

urs,

48 <

t 3 <

120

hou

rs, 1

20 h

ours

< t 4

. a W

m =

Subs

et o

f co

varia

tes w

ith b

ivar

iate

ass

ocia

tions

with

mor

talit

y of

p<0

.2.

b SD =

Sta

ndar

d D

evia

tion.

44

Bod

y R

egio

n of

Ass

ocia

ted

Inju

ry

(Num

ber o

f sub

ject

s with

dia

gnos

is A

IS>3

a : re

spec

tive

num

ber o

f uni

que

diag

nose

s)

Inju

ry

(IC

D-9

CM

) Fr

eque

ncy

(n

umbe

r of d

iagn

oses

/ to

tal n

umbe

r of

regi

on sp

ecifi

c A

IS>3

dia

gnos

es)

Hea

d an

d N

eck

(N=1

146:

202

) 1.

C2

Ver

tebr

al F

ract

ure

– C

lose

d (8

05.0

2)

5.2%

(84/

1627

)

2. S

kull

Bas

e Fr

actu

re: I

ntra

-cra

nial

invo

lvem

ent a

nd lo

ss o

f co

nsci

ousn

ess –

Not

oth

erw

ise

spec

ified

(801

.00)

3.

2% (5

2/16

27)

3. T

raum

atic

Sub

arac

hnoi

d H

emor

rhag

e: lo

ss o

f co

nsci

ousn

ess –

Not

oth

erw

ise

spec

ified

(852

.00)

3.

1% (5

1/16

27)

Che

st (N

=169

6: 5

1)

1. L

ung

Con

tusi

on –

Clo

sed

(861

.21)

40

.1%

(988

/244

7)

2.

Tra

umat

ic P

neum

otho

rax

– C

lose

d (8

60.0

) 23

.5%

(574

/244

7)

3.

Dor

sal V

erte

bral

frac

ture

– C

lose

d (8

05.2

) 8.

5% (2

07/2

447)

A

bdom

en (N

=808

: 64)

1.

Lum

bar V

erte

bral

Fra

ctur

e –

Clo

sed

(805

.4)

38.4

% (3

85/1

004)

2. S

plen

ic P

aren

chym

al L

acer

atio

n (8

65.0

3)

12.3

% (1

23/1

004)

3. L

iver

Lac

erat

ion

– M

oder

ate

(864

.03)

10

.5%

(105

/100

4)

Extre

mity

and

Pel

vis (

N=1

222:

126

) 1.

Inte

r-tro

chan

teric

frac

ture

– C

lose

d (8

20.2

1)

7.3%

(127

/173

1)

2.

Tib

ia w

ith fi

bula

frac

ture

– O

pen

(823

.32)

6.

1% (1

06/1

731)

3. P

elvi

c fr

actu

re w

ith d

isru

ptio

n of

pel

vic

ring

– C

lose

d (8

08.4

3)

4.9

% (8

4/17

31)

Tabl

e 5.

2: M

ost c

omm

on a

ssoc

iate

d In

tern

atio

nal C

lass

ifica

tion

of D

isea

ses,

9th C

linic

al M

odifi

catio

n (I

CD

-9C

M) c

oded

inju

ries o

f th

e he

ad a

nd n

eck,

che

st, a

bdom

en, a

nd e

xtre

miti

es a

nd p

elvi

s.

a AIS

= A

bbre

viat

ed In

jury

Sco

re

45

Estim

ate

12-2

4hrs

24

-48h

rs

48-1

20hr

s >

120h

rs

Cru

de R

elat

ive

Ris

k 0.

50 [0

.06]

1.

13 [0

.69]

1.

19 [0

.52]

1.

17 [0

.77]

(0.1

8, 1

.01)

(0

.58,

2.0

5)

(0.6

7, 2

.03)

(0

.24,

2.6

5)

IPTW

a Rel

ativ

e R

isk

0.45

[0.0

3]

0.83

[0.4

9]

0.58

[0.0

3]

0.43

[0.0

3]

(0

.15,

0.9

8)

(0.4

3, 1

.44)

(0

.28,

0.9

3)

(0.1

0, 0

.94)

Lo

gist

ic R

egre

ssio

nb 0.

52 [0

.15]

0.

86 [0

.67]

0.

62 [0

.14]

0.

66 [0

.43]

(0.1

8, 1

.19)

(0

.40,

1.6

8)

(0.3

1, 1

.15)

(0

.14,

1.5

5)

Sta

ndar

dize

d R

isk

Rat

ioc

0.47

[0.0

7]

0.94

[0.8

5]

0.58

[0.0

9]

0.43

[0.0

5]

(0

.14,

1.1

1)

(0.4

4, 1

.76)

(0

.21,

1.0

9)

(0.0

9, 0

.94)

Tabl

e 5.

3: C

ompa

rison

of c

rude

, reg

ress

ion,

and

mar

gina

l stru

ctur

al m

odel

est

imat

es o

f eff

ect o

f tre

atm

ent t

ime

on m

orta

lity.

Poin

t est

imat

es a

re g

iven

with

p-v

alue

s in

brac

kets

[] a

nd 9

5% c

onfid

ence

inte

rval

s in

pare

nthe

ses (

). A

ll an

alys

es u

se t 0

(<12

hou

rs)

as th

e re

fere

nt tr

eatm

ent g

roup

. a In

vers

e-pr

obab

ility

-of-

treat

men

t-wei

ghte

d (I

PTW

) pop

ulat

ion

rela

tive

risk

was

der

ived

usi

ng m

odel

fo

r tre

atm

ent a

ssig

nmen

t con

trolli

ng fo

r Wm

(NIS

S, G

CS,

Nor

thea

st R

egio

n, a

ge, a

rriv

al ti

me,

num

ber o

f ser

ious

ext

rem

ity/p

elvi

s or

head

inju

ries,

num

ber o

f fem

ur fr

actu

res,

pres

ence

of c

ardi

ac o

r cer

ebro

-vas

cula

r co-

mor

bidi

ties,

teac

hing

stat

us, a

nd A

CS

Leve

l 1

desi

gnat

ion)

. b O

dds R

atio

s fro

m fi

nal m

ultiv

aria

te lo

gist

ic re

gres

sion

mod

el c

ontro

lling

for N

ISS,

GC

S, N

orth

east

Reg

ion,

age

, ar

rival

tim

e, a

nd n

umbe

r of s

erio

us (A

IS >

3) e

xtre

mity

/pel

vis i

njur

ies.

c Stan

dard

ized

Ris

k R

atio

(SR

R) u

sing

sam

e tre

atm

ent m

odel

as

IPTW

ana

lysi

s but

mod

ified

wei

ghts

to g

ive

the

estim

ated

pro

porti

onat

e ris

k ha

d th

ose

subj

ects

trea

ted

early

(t0)

rece

ived

trea

tmen

t at

a la

ter t

ime.

46

Ass

ocia

ted

Inju

ry

Low

sev

erity

(A

IS <

3)a

H

igh

seve

rity

(A

IS >

3)

R

elat

ive

Ris

k (9

5% C

I) n

Rel

ativ

e R

isk

(95%

CI)

n p-

valu

e E

xtre

mity

/ Pel

vis

Frac

ture

s 0.

58 (0

.33,

1.2

0)

1846

0.

73 (0

.39,

1.3

6)

1223

0.

63

Abd

omen

0.

82 (0

.54,

1.3

5)

2261

0.

36 (0

.13,

0.8

7)

808

0.09

C

hest

0.

67 (0

.32,

1.3

7)

1373

0.

64 (0

.39,

1.1

1)

1696

0.

90

Hea

d

0.71

(0.3

7, 1

.77)

19

23

0.62

(0.4

0, 0

.99)

11

46

0.74

Tabl

e 5.

4: In

vers

e-Pr

obab

ility

-of-

Trea

tmen

t-Wei

ghte

d es

timat

es o

f the

rela

tive

risk

of m

orta

lity

for t

reat

men

t del

ay (g

reat

er th

an 1

2 ho

urs)

by

seve

rity

of a

ssoc

iate

d in

jury

.

Rep

orte

d p-

valu

es te

st th

e eq

ualit

y of

trea

tmen

t eff

ects

bet

wee

n lo

w a

nd h

igh

seve

rity

asso

ciat

ed in

jury

sub-

grou

ps. a A

IS =

A

bbre

viat

ed In

jury

Sco

re.

47

Ass

ocia

ted

Inju

ry

Low

sev

erity

(A

IS <

3)a

H

igh

seve

rity

(A

IS >

3)

O

dds

Rat

io (9

5% C

I) n

Odd

s R

atio

(95%

CI)

n p-

valu

e E

xtre

mity

/ Pel

vis

Frac

ture

s 0.

56 (0

.28,

1.2

4)

1846

0.

77 (0

.39,

1.5

1)

1223

0.

55

Abd

omen

0.

81 (0

.48,

1.3

9)

2261

0.

39 (0

.14,

1.0

3)

808

0.16

C

hest

0.

66 (0

.30,

1.4

6)

1373

0.

62 (0

.35,

1.1

5)

1696

0.

87

Hea

d

0.79

(0.4

0, 2

.00)

19

23

0.57

(0.3

4, 1

.00)

11

46

0.47

Tabl

e 5.

5: L

ogis

tic R

egre

ssio

n es

timat

es o

f the

rela

tive

odds

of m

orta

lity

for t

reat

men

t del

ay (g

reat

er th

an 1

2 ho

urs)

by

seve

rity

of

asso

ciat

ed in

jury

.

Rep

orte

d p-

valu

es te

st th

e eq

ualit

y of

trea

tmen

t eff

ects

bet

wee

n lo

w a

nd h

igh

seve

rity

asso

ciat

ed in

jury

sub-

grou

ps. a A

IS =

A

bbre

viat

ed In

jury

Sco

re.

48

Ass

ocia

ted

Inju

ry

Low

sev

erity

(A

IS <

3)a

H

igh

seve

rity

(A

IS >

3)

R

elat

ive

Ris

k (9

5% C

I) n

Rel

ativ

e R

isk

(95%

CI)

n p-

valu

e E

xtre

mity

/ Pel

vis

Frac

ture

s 0.

59 (0

.33,

1.1

7)

1846

0.

73 (0

.34,

1.4

2)

1223

0.

66

Abd

omen

0.

79 (0

.48,

1.3

6)

2261

0.

40 (0

.12,

1.0

5)

808

0.20

C

hest

0.

68 (0

.29,

1.5

6)

1373

0.

64 (0

.35,

1.1

4)

1696

0.

87

Hea

d

0.69

(0.3

3, 1

.75)

19

23

0.64

(0.3

8, 1

.07)

11

46

0.86

Tabl

e 5.

6: S

tand

ardi

zed

Ris

k R

atio

est

imat

es o

f the

rela

tive

risk

of m

orta

lity

for t

reat

men

t del

ay (g

reat

er th

an 1

2 ho

urs)

by

seve

rity

of

asso

ciat

ed in

jury

.

Rep

orte

d p-

valu

es te

st th

e eq

ualit

y of

trea

tmen

t eff

ects

bet

wee

n lo

w a

nd h

igh

seve

rity

asso

ciat

ed in

jury

sub-

grou

ps. a A

IS =

A

bbre

viat

ed In

jury

Sco

re.

49

50

Chapter 6: Discussion

Clinical Comments I have conducted an analysis of the effect of timing of definitive femoral shaft fracture fixation among multi-system trauma subjects in the National Trauma Data Bank. Using an inverse-probability-of-treatment-weighted (IPTW) analysis to estimate the marginal (i.e. the population level) risk of mortality for categorically defined treatment time, I found treatment in all but one (24 – 48 hours) of four delayed treatment categories to significantly lower risk of mortality to about 50% of that expected with early treatment (less than 12 hours). I also showed that patients with serious abdominal injuries have greater risk reductions from delayed fixation when compared to those with less serious or no abdominal injury. These findings were consistent with, but more precisely estimated than, the results of multivariate standardized risk ratios using the early treatment group as the standard population. These findings strongly support a cautious approach to early definitive femoral shaft fracture fixation among multi-system trauma patients, and especially those with serious associated abdominal injuries.

In two large studies27, 108 assessing the effect of timing of internal fixation of femoral shaft fractures, subgroups of adult multi-system trauma patients (ISS>15) were defined with comparable demographics, associated injuries, and crude mortality rates. Fakhry108 found a significant decrease in mortality among later treated groups but only adjusted for ISS. Brundage27 did not report significant differences in mortality by treatment time but did find significant increases in complications such as pneumonia, ARDS, and length of hospital stay among patients treated between two and five days from injury. I divided the first hospital day into two periods, and found protective effects from delaying fixation even if only by twelve hours which may afford some degree of necessary resuscitation. Given that my sample was five times larger than the treated multi-system trauma subgroup in either of these studies, it is likely that increased power allowed more precise estimation of the protective effect of delayed treatment.

Several studies have asserted that patients with head or chest injuries may be more susceptible to the effects of early fracture fixation than patients without such associated trauma. Pape25 reported on a cohort of 106 multi-system trauma patients with femoral shaft fractures (ISS>18) with and without thoracic trauma (AIS>3). In a stratified analysis, these authors reported a significant increase in the incidence of ARDS among chest-injured patients who were treated with intra-medullary nailing early (<24 hours from admission) versus those who were not chest injured (33% versus 7.7%), and a similar, but not statistically significant, difference in mortality (21% versus 4%), suggesting that patients with chest injuries might benefit from delayed treatment. The notion that intra-medullary nailing potentiates the development of ARDS and further morbidity among patients who are vulnerable due to thoracic injury has been supported by human and animal studies showing embolization of marrow products during intra-medullary nailing109-113, however the clinical importance of this phenomenon continues to be debated. Several observational studies12, 15 using patients with thoracic injuries but without femoral shaft fractures as controls, have found no difference in incidence of ARDS or mortality suggesting that it is the severity of the pulmonary injury, rather than timing or mode of fixation, that determines the likelihood of adverse outcome. Among patients with associated head injuries, studies have shown hypotension and decreased cerebral perfusion pressure during early

51

fracture care28, 114, 115. Still, increased mortality has not been shown in comparing early (<24 hours) to late treatment of femoral shaft fractures27, 29, 108, 116, 117. While the relative risk for delayed treatment (beyond 12 hours) among those with serious thoracic injuries (RR 0.64%; 95%CI 0.39-1.11; p-value=0.10) approached significance and serious head injuries (RR 0.62%; 95%CI 0.40-0.99; p-value=0.04) achieved significance in my study, there was inadequate statistical evidence to support these subgroups experiencing a greater protective effect of delayed treatment than that expected in the rest of the sample.

The only factor for which there was evidence of modification of the effect of treatment timing on mortality was the presence of serious abdominal injury. For patients with serious abdominal injuries, undergoing delayed treatment had an estimated mortality risk 36% that of early treatment and this relative risk was significantly lower than for those without such injuries. Those with serious abdominal and extremity trauma may be particularly difficult to resuscitate because of significant on-going hemorrhage and potential for numerous vital organs to be injured. Several studies have shown abdominal injuries to be an important risk factor for mortality and morbidity in the multi-system trauma patient118, 119. Schulman120 reported on a series of blunt trauma patients undergoing a standardized resuscitation protocol and found only ISS, abdominal injury (AIS>3) and extremity and bony pelvis injury (AIS >3) to be significantly associated with prolonged occult hypoperfusion. White119 reported a five-fold increase in the risk of ARDS when combined extremity and abdominal injuries were present. Patients treated early in my study with associated serious abdominal injuries were likely under-resuscitated and may have been particularly vulnerable to adverse outcomes from a definitive orthopedic procedure that often involves substantial additional blood loss.

Early treatment in my study is a likely surrogate for treatment in an under-resuscitated state were cellular hypoxia, oxidative stress, inflammatory response, and altered coagulation contribute to morbidity and mortality34, 35, 121, 122. In a study of multi-system trauma patients with a femoral shaft fracture, Crowl30 showed that early intra-medullary fixation (<24 hours from admission) of femoral shaft fractures in the setting of occult hypo perfusion (serum lactate>2.5mmol/L) leads to a significant increase in post-operative complications. Hypo-perfusion resulting from trauma may prime the immune system for an inflammatory response if such treatment is undertaken prior to adequate resuscitations, and can lead to significant end organ injury 123, 124. The realization of this phenomenon has led to the description of so-called “damage-control orthopaedics” whereby definitive treatment is delayed until resuscitation of the patient has been adequately achieved25, 33, 125, 126. My findings are consistent with this approach and signify the potential benefits that may be realized with delays in treatment of just 12 hours, particularly for those with associated abdominal injuries. The current end-points used to guide resuscitation, such as blood pressure, urine output, heart rate, base deficit and serum lactate are global markers that may underestimate occult tissue hypo-perfusion. In the future, more sensitive measures of tissue oxygenation measured by polarographic or near-infrared technologies127 and markers of inflammation and coagulation26, 128 that better reveal the physiologic condition of a patient are likely to replace simple temporal distinctions. They will provide more accurate criteria by which to determine when a patient is “ready” for definitive internal fixation of a femoral shaft fracture or other major orthopaedic injuries.

52

Methodological Comments Marginal structural models with IPTW estimation were used in order to provide population level marginal estimates of effect with inference closest to what could have been achieved in the ideal setting of a randomized trial. Marginal structural models are based on the concept of counterfactual outcomes47, 96 as introduced in Chapter 2. In the clinical scenario posed here, a subject’s counterfactual (or potential outcome) represents the set of outcomes had the subject, contrary to fact, had an exposure history other than the one actually observed. While this may appear to be an abstract concept, inferences provided are analogous to classical standardization of risk ratios106. Moreover, several of the assumptions of marginal structural models such as no unmeasured confounding (sequential randomization) and temporal ordering are no different from those posed by conventional regression-based techniques of analysis of observational studies.

In Chapter 4, I introduced the assumptions necessary for valid identification and estimation of marginal structural model parameters generally, and those required for the use of IPTW specifically. Two of these assumptions are particularly important to note. The first is that there are no unmeasured confounding variables. This means that the counterfactual outcome is conditionally independent of treatment, given measured covariates. To the extent that unknown confounding goes unadjusted for or is uncorrelated with measured confounders, estimates may be biased. As discussed earlier, this assumption is ubiquitous in observational studies and cannot be empirically tested. While great care was taken to assess all possible confounders in a rich set of covariate data provided in the NTDB, we cannot be certain that some residual confounding is not present. The second assumption, called the experimental treatment assignment (ETA) assumption, requires that there are no values of the baseline covariates for which treatment is assigned in a deterministic fashion. The ETA assumption gets at the very heart of treatment selection bias that plagues therapeutic observational studies and those of surgical interventions specifically. The disparity between crude and adjusted estimates (Table 5.3), in light of higher injury severity and mortality in later treatment groups (Table 5.1), demonstrates the degree to which confounding by disease severity may threaten valid inferences in studies similar to ours. Because there is such a strong clinical tendency to prescribe a treatment based on the subject’s likelihood of outcome and danger for this to bias estimates of causal effects, I dedicated Chapter 4 to testing this assumption.

After applying stringent inclusion and exclusion criteria to selecting the study sample from the NTDB, three methods were undertaken to assess the ETA assumption. The first was qualitative evaluation using a graphical summary104 that showed that for nearly all values of covariate in the binary treatment assignment model, both treatment before and after 12 hours occurred. The second was an empirical, subject-matter-based approach based on the standardized risk ratio modification of IPTW proposed by Sato107. Because subjects treated after 12 hours tended to have slightly higher injury severity (Table 1), one might postulate that regardless of the extensive confounding control performed here, unknown or unmeasured factors may have still preclude some patients from the later treatment groups from receiving early treatment and thus constituting a practical ETA violation. The most conservative response to this would be to assume that only those patients treated early could have reasonably been treated in any of the five treatment groups (t0-t4) and that a randomized trial including only those patients treated early might give different results compared to the marginal IPTW estimate. Using the early treatment group weights and population mortality rates to estimate the

53

proportionate change in mortality had those treated early been treated in each of the later treatment groups (Table 5.3), estimates of effect were comparable to IPTW estimates supporting the notion of sufficient exchangeability between the early treatment group and the larger study sample. Hence, the most conservative counterfactual scenario examined using multivariable standardized risk ratios supports the inference from the population-level IPTW estimate. These results are also in agreement with the Monte-Carlo simulations-based approach105 that examined to what extent the estimates might be biased due to such a violation of the ETA assumption. In all cases, the bias due to a possible violation of the ETA assumption appears to be minimal since all subjects seem to have adequately large probabilities of following any one of the five categories of time until definitive fixation.

In the absence of ETA violation, there are several additional advantages to IPTW over conventional regression analysis are evident. Similar to a standardized risk ratio106, 129, IPTW provide unconditional estimates that are valid even when there is effect-measure modification. Comparing crude (unadjusted) estimates to estimates from both standard regression and IPTW shows evidence of confounding (Table 5.3). Apart from an initial relative risk reduction in the 12-24 hour range of about 50% that is consistent among all estimates, the crude estimates show a slight (though not statistically significant) increase in risk for those treated later. Both regression and IPTW show similar relative risk reduction in all groups treated after 12 hours, with IPTW estimates of effect being greater in magnitude and precision. There are several potential reasons for this. One is that as mentioned earlier, the two analytical methods typically estimate parameters that have different interpretations, regression typically being used to estimate the association conditional within sub-groups, and IPTW used typically for so-called marginal associations. While regression can also be used for marginal estimation, the advantage of IPTW is that the treatment mechanism is estimated semi-parametrically. The other reason is that while the accuracy and precision of logistic regression depends on goodness-of-fit of the outcome model—which was poor—it is the goodness-of-fit of the treatment model—which was acceptable—that matters for valid IPTW estimates. However, the approach based on IPTW may be considered inefficient because model selection is done orthogonal to the parameter of interest that has nothing to do with the treatment mechanism. Recently developed regression techniques using machine learning and targeted maximum likelihood130 optimize the model selection A parameter that is more highly related to the parameter of interest. Such techniques may afford significant gains in efficiency over the IPTW approach presented here and warrant further investigation. Still, I believe that IPTW provides estimates with interpretation that closely approximate the causal inference that clinicians seek in understanding the impact of treatment time.

Another important feature of this analysis is the assessment of potential interaction by selected associated injury patterns. Like the IPTW estimates (Table 5.4), both the regression (Table 5.5) and SRR approaches (Table 5.6) yielded evidence of interaction between those with and without severe associated abdominal injuries. The IPTW analysis yielded a slightly larger difference in effect though these results are again, not directly comparable to those from standard regression and have a different target population from the SRR approach. Just as discussed earlier for main effects estimates, assessing interaction with standard regression gives inference that is conditional on the effect modifier of interest as well as any other confounders included in the model whereas IPTW estimates are marginal within strata of the effect modifier of interest. Whereas stratification or regression adjustment are used for confounder control and detection of

54

interaction in conventional analytical methods, marginal structural models make confounding control (IPTW) distinct from methods for detection of effect modification (adding treatment covariate interaction terms to an MSM)47. The marginal structural model including serious abdominal injury reveals a significant difference in the effect of early versus delayed definitive surgery depending on whether the abdominal AIS was less than 3 or greater than or equal to three. Moreover, within strata of abdominal injury severity, the estimate of effect is marginal in the same sense as the analysis with only the main effects term. The SRR interaction analysis supports the findings of IPTW as representing the most conservative causal contrast within strata of associate injury by using the early treatment group as the standard population. As with the main effects analysis, a marginal assessment of interaction gives inferences similar to that expected from sub-group analyses from a randomized controlled trial.

Final Comments This observational study assessing the marginal effect of timing of definitive fixation of femoral shaft fractures in multi-system trauma patients showed a significant estimated relative risk reduction of about 50% among subjects treated between 12-24 hours and greater than 48 hours. Patients with severe abdominal injuries (AIS >3 ) showed significantly greater relative risk reductions than those with low abdominal AIS scores signifying interaction by this set of associated injuries. The observational nature of this study precludes absolutely ascribing causal effects to timing of treatment because the assumption of no unmeasured confounding cannot be empirically tested. In addition, there was missing information on measured covariates for subjects who would have otherwise been eligible for inclusion in the study sample (Figure 5.1) imposing a potential selection bias. While the NTDB includes a very large sample of trauma centers from all over the United States, it is still a convenience sample. However, this is the largest cohort ever used for investigation of this study question, and the national representation of academic and non-academic centers of various levels provides far-reaching generalizability of these results. The NTDB data provided extensive covariate information that was used for comprehensive covariate adjustment and robust analytical methods were used to optimize model fit and properly estimate confidence intervals in the setting of potential correlations by treatment center. Statistically efficient analytic methods were employed to address this bias and these methods provided unconditional estimates of relative risk. In order to compare marginal estimates to those given by conventional regression, IPTW results have been given side-by-side with logistic regression estimates and differences in interpretation discussed. Assessments of the experimental treatment assignment assumption gave both qualitative and quantitative support to the marginal IPTW estimates. Thus, these results provide strong empirical evidence in support of a delayed or “damage-control” approach to definitive fixation of femoral-shaft fractures among multi-system trauma patients and particularly among those with serious abdominal injuries. Further prospective randomized studies are necessary to confirm these results.

55

REFERENCES:

1. Riska EB, von Bonsdorff H, Hakkinen S, Jaroma H, Kiviluoto O, Paavilainen T. Prevention of fat embolism by early internal fixation of fractures in patients with multiple injuries. Injury. Nov 1976;8(2):110-‐116.

2. Johnson KD, Johnston DW, Parker B. Comminuted femoral-‐shaft fractures: treatment by roller traction, cerclage wires and an intramedullary nail, or an interlocking intramedullary nail. J Bone Joint Surg Am. Oct 1984;66(8):1222-‐1235.

3. Webb LX, Gristina AG, Fowler HL. Unstable femoral shaft fractures: a comparison of interlocking nailing versus traction and casting methods. J Orthop Trauma. 1988;2(1):10-‐12.

4. Hansen ST, Winquist RA. Closed intramedullary nailing of the femur. Kuntscher technique with reaming. Clin Orthop Relat Res. Jan-‐Feb 1979(138):56-‐61.

5. Tornetta P, 3rd, Tiburzi D. Antegrade or retrograde reamed femoral nailing. A prospective, randomised trial. J Bone Joint Surg Br. Jul 2000;82(5):652-‐654.

6. Wolinsky PR, McCarty E, Shyr Y, Johnson K. Reamed intramedullary nailing of the femur: 551 cases. J Trauma. Mar 1999;46(3):392-‐399.

7. Kuntscher G. The Marrow Nailing Method: Springer; 1947. 8. Clawson DK, Smith RF, Hansen ST. Closed intramedullary nailing of the femur. J Bone

Joint Surg Am. Jun 1971;53(4):681-‐692. 9. Goris RJ, Gimbrere JS, van Niekerk JL, Schoots FJ, Booy LH. Early osteosynthesis and

prophylactic mechanical ventilation in the multitrauma patient. J Trauma. Nov 1982;22(11):895-‐903.

10. Johnson KD, Cadambi A, Seibert GB. Incidence of adult respiratory distress syndrome in patients with multiple musculoskeletal injuries: effect of early operative stabilization of fractures. J Trauma. May 1985;25(5):375-‐384.

11. Seibel R, LaDuca J, Hassett JM, et al. Blunt multiple trauma (ISS 36), femur traction, and the pulmonary failure-‐septic state. Ann Surg. Sep 1985;202(3):283-‐295.

12. Bosse MJ, MacKenzie EJ, Riemer BL, et al. Adult respiratory distress syndrome, pneumonia, and mortality following thoracic injury and a femoral fracture treated either with intramedullary nailing with reaming or with a plate. A comparative study. The Journal of bone and joint surgery. Jun 1997;79(6):799-‐809.

13. Behrman SW, Fabian TC, Kudsk KA, Taylor JC. Improved outcome with femur fractures: early vs. delayed fixation. J Trauma. Jul 1990;30(7):792-‐797; discussion 797-‐798.

14. Charash WE, Fabian TC, Croce MA. Delayed surgical fixation of femur fractures is a risk factor for pulmonary failure independent of thoracic trauma. J Trauma. Oct 1994;37(4):667-‐672.

15. Boulanger BR, Stephen D, Brenneman FD. Thoracic trauma and early intramedullary nailing of femur fractures: are we doing harm? J Trauma. Jul 1997;43(1):24-‐28.

16. Bone LB, Johnson KD, Weigelt J, Scheinberg R. Early versus delayed stabilization of femoral fractures. A prospective randomized study. J Bone Joint Surg Am. Mar 1989;71(3):336-‐340.

56

17. Bernard GR, Artigas A, Brigham KL, et al. The American-‐European Consensus Conference on ARDS. Definitions, mechanisms, relevant outcomes, and clinical trial coordination. Am J Respir Crit Care Med. Mar 1994;149(3 Pt 1):818-‐824.

18. Baker SP, O'Neill B, Haddon W, Jr., Long WB. The injury severity score: a method for describing patients with multiple injuries and evaluating emergency care. J Trauma. Mar 1974;14(3):187-‐196.

19. Osler T, Baker SP, Long W. A modification of the injury severity score that both improves accuracy and simplifies scoring. J Trauma. Dec 1997;43(6):922-‐925; discussion 925-‐926.

20. Abbreviated Injury Score , 1990 revision Des Plaines, IL: Association for the Advancement of Automotive Medicine

; 1990. 21. Brenneman FD, Boulanger BR, McLellan BA, Redelmeier DA. Measuring injury

severity: time for a change? J Trauma. Apr 1998;44(4):580-‐582. 22. Rutledge R, Osler T, Emery S, Kromhout-‐Schiro S. The end of the Injury Severity

Score (ISS) and the Trauma and Injury Severity Score (TRISS): ICISS, an International Classification of Diseases, ninth revision-‐based prediction tool, outperforms both ISS and TRISS as predictors of trauma patient survival, hospital charges, and hospital length of stay. J Trauma. Jan 1998;44(1):41-‐49.

23. Lavoie A, Moore L, LeSage N, Liberman M, Sampalis JS. The Injury Severity Score or the New Injury Severity Score for predicting intensive care unit admission and hospital length of stay? Injury. Apr 2005;36(4):477-‐483.

24. Pinney SJ, Keating JF, Meek RN. Fat embolism syndrome in isolated femoral fractures: does timing of nailing influence incidence? Injury. Mar 1998;29(2):131-‐133.

25. Pape HC, Auf'm'Kolk M, Paffrath T, Regel G, Sturm JA, Tscherne H. Primary intramedullary femur fixation in multiple trauma patients with associated lung contusion-‐-‐a cause of posttraumatic ARDS? J Trauma. Apr 1993;34(4):540-‐547; discussion 547-‐548.

26. Pape HC, van Griensven M, Rice J, et al. Major secondary surgery in blunt trauma patients and perioperative cytokine liberation: determination of the clinical relevance of biochemical markers. J Trauma. Jun 2001;50(6):989-‐1000.

27. Brundage SI, McGhan R, Jurkovich GJ, Mack CD, Maier RV. Timing of femur fracture fixation: effect on outcome in patients with thoracic and head injuries. J Trauma. Feb 2002;52(2):299-‐307.

28. Jaicks RR, Cohn SM, Moller BA. Early fracture fixation may be deleterious after head injury. J Trauma. Jan 1997;42(1):1-‐5; discussion 5-‐6.

29. Wang MC, Temkin NR, Deyo RA, Jurkovich GJ, Barber J, Dikmen S. Timing of surgery after multisystem injury with traumatic brain injury: effect on neuropsychological and functional outcome. J Trauma. May 2007;62(5):1250-‐1258.

30. Crowl AC, Young JS, Kahler DM, Claridge JA, Chrzanowski DS, Pomphrey M. Occult hypoperfusion is associated with increased morbidity in patients undergoing early femur fracture fixation. J Trauma. Feb 2000;48(2):260-‐267.

31. Pape HC, Schmidt RE, Rice J, et al. Biochemical changes after trauma and skeletal surgery of the lower extremity: quantification of the operative burden. Crit Care Med. Oct 2000;28(10):3441-‐3448.

57

32. Pape HC, Grimme K, Van Griensven M, et al. Impact of intramedullary instrumentation versus damage control for femoral fractures on immunoinflammatory parameters: prospective randomized analysis by the EPOFF Study Group. J Trauma. Jul 2003;55(1):7-‐13.

33. Giannoudis PV, Smith RM, Bellamy MC, Morrison JF, Dickson RA, Guillou PJ. Stimulation of the inflammatory system by reamed and unreamed nailing of femoral fractures. An analysis of the second hit. J Bone Joint Surg Br. Mar 1999;81(2):356-‐361.

34. Abramson D, Scalea TM, Hitchcock R, Trooskin SZ, Henry SM, Greenspan J. Lactate clearance and survival following injury. J Trauma. Oct 1993;35(4):584-‐588; discussion 588-‐589.

35. Moore FA. The role of the gastrointestinal tract in postinjury multiple organ failure. Am J Surg. Dec 1999;178(6):449-‐453.

36. Tisherman SA, Barie P, Bokhari F, et al. Clinical practice guideline: endpoints of resuscitation. J Trauma. Oct 2004;57(4):898-‐912.

37. Davis JW, Parks SN, Kaups KL, Gladen HE, O'Donnell-‐Nicol S. Admission base deficit predicts transfusion requirements and risk of complications. J Trauma. Nov 1996;41(5):769-‐774.

38. Rixen D, Siegel JH. Metabolic correlates of oxygen debt predict posttrauma early acute respiratory distress syndrome and the related cytokine response. J Trauma. Sep 2000;49(3):392-‐403.

39. Robinson CM. Current concepts of respiratory insufficiency syndromes after fracture. J Bone Joint Surg Br. Aug 2001;83(6):781-‐791.

40. Rixen D, Grass G, Sauerland S, et al. Evaluation of criteria for temporary external fixation in risk-‐adapted damage control orthopedic surgery of femur shaft fractures in multiple trauma patients: "evidence-‐based medicine" versus "reality" in the trauma registry of the German Trauma Society. J Trauma. Dec 2005;59(6):1375-‐1394; discussion 1394-‐1375.

41. Dunham CM, Bosse MJ, Clancy TV, et al. Practice management guidelines for the optimal timing of long-‐bone fracture stabilization in polytrauma patients: the EAST Practice Management Guidelines Work Group. J Trauma. May 2001;50(5):958-‐967.

42. Greenland S, Morgenstern H. Confounding in health research. Annu Rev Public Health. 2001;22:189-‐212.

43. Greenland S, Neutra R. Control of confounding in the assessment of medical technology. Int J Epidemiol. Dec 1980;9(4):361-‐367.

44. Breslow NE, Day NE. Statistical Methods in Cancer Research. 1: The Analysis of Case Control Studies. Lyon: I.A.R.C.; 1980.

45. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute. Apr 1959;22(4):719-‐748.

46. Jewell NP. Statistics for Epidemiology. Boca Raton: Chapman & Hall / CRC; 2004. 47. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal

inference in epidemiology. Epidemiology. Sep 2000;11(5):550-‐560. 48. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-‐

Raven; 1998. 49. Moore DS, McCabe GP. Introduction to the Practice of Statistics. 4th ed. New York:

W.H. Freeman and Company; 2003.

58

50. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall / CRC; 1991.

51. Lehmann EL. Nonparametrics: Statistics based on ranks. New York: McGraw-‐Hill International; 1975.

52. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. In: Garnder MJ, Altman DG, eds. Statistics with confidence. Confidence intervals and statistical guidelines. London: British Medical Journal; 1989:83-‐100.

53. Rothman KJ, Greenlan S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott, Williams &Wilkins; 1998.

54. Hurwitz SR, Tornetta P, 3rd, Wright JG. An AOA critical issue; how to read the literature to change your practice: an evidence-‐based medicine approach. The Journal of bone and joint surgery. Aug 2006;88(8):1873-‐1879.

55. Bender R, Lange S. Adjusting for multiple testing-‐-‐when and how? Journal of clinical epidemiology. Apr 2001;54(4):343-‐349.

56. Sackett DL. Bias in analytic research. Journal of chronic diseases. 1979;32(1-‐2):51-‐63.

57. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. Feb 1999;14(1):29-‐46.

58. Byar DP, Simon RM, Friedewald WT, et al. Randomized clinical trials. Perspectives on some recent ideas. The New England journal of medicine. Jul 8 1976;295(2):74-‐80.

59. Salas M, Hofman A, Stricker BH. Confounding by indication: an example of variation in the use of epidemiologic terminology. Am J Epidemiol. Jun 1 1999;149(11):981-‐983.

60. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. Jan 1999;10(1):37-‐48.

61. Ciminiello M, Parvizi J, Sharkey PF, Eslampour A, Rothman RH. Total Hip Arthroplasty: Is Small Incision Better? The Journal of Arthroplasty. 2006;21(4):484-‐488.

62. Day NE, Byar DP, Green SB. Overadjustment in case-‐control studies. American journal of epidemiology. Nov 1980;112(5):696-‐706.

63. Saleh K, Olson M, Resig S, et al. Predictors of wound infection in hip and knee joint replacement: results from a 20 year surveillance program. J Orthop Res. May 2002;20(3):506-‐515.

64. Bhandari M, Tornetta P, 3rd, Sprague S, et al. Predictors of reoperation following operative management of fractures of the tibial shaft. Journal of orthopaedic trauma. May 2003;17(5):353-‐361.

65. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley & Sons; 1989.

66. Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied Regression Analysis and Other Multivariable Methods. 3rd ed. Pacific Grove: Doxbury Press; 1998.

67. Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika. April, 1983 1983;70(1):41-‐55.

59

68. Sturmer T, Schneeweiss S, Rothman KJ, Avorn J, Glynn RJ. Performance of propensity score calibration-‐-‐a simulation study. American journal of epidemiology. May 15 2007;165(10):1110-‐1118.

69. McHenry TP, Mirza SK, Wang J, et al. Risk factors for respiratory failure following operative stabilization of thoracic and lumbar spine fractures. The Journal of bone and joint surgery. May 2006;88(5):997-‐1005.

70. Cook EF, Goldman L. Performance of tests of significance based on stratification by a multivariate confounder score or by a propensity score. Journal of clinical epidemiology. 1989;42(4):317-‐324.

71. Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. American journal of epidemiology. Aug 1 2003;158(3):280-‐287.

72. Rosenbaum PR, Rubin DB. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association. September 1984 1984;79(387):516-‐524.

73. Sturmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. Journal of clinical epidemiology. May 2006;59(5):437-‐447.

74. Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. Journal of clinical epidemiology. Jun 2005;58(6):550-‐559.

75. Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiology and drug safety. Dec 2004;13(12):841-‐853.

76. Rubin DB. Bias Reduction Using Mahalanobis-‐Metric Matching. Biometrics. 1980;36(2):293-‐298.

77. Diamond A, Sekhon J. Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. eScholarship. 2006. http://www.escholarship.org/uc/item/8gx4v5qt.

78. Newhouse JP, McClellan M. Econometrics in outcomes research: the use of instrumental variables. Annual review of public health. 1998;19:17-‐34.

79. McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. Jama. Sep 21 1994;272(11):859-‐866.

80. Greenland S. An introduction to instrumental variables for epidemiologists. International journal of epidemiology. Aug 2000;29(4):722-‐729.

81. McGuire KJ, Bernstein J, Polsky D, Silber JH. The 2004 Marshall Urist award: delays until surgery after hip fracture increases mortality. Clinical orthopaedics and related research. Nov 2004(428):294-‐301.

82. Harris KM, Remler DK. Who is the marginal patient? Understanding instrumental variables estimates of treatment effects. Health services research. Dec 1998;33(5 Pt 1):1337-‐1360.

83. van der Laan M, Hubbard AE, Jewell NP. Estimation of Treatment Effects in Randomized Trials with Noncompliance and a Dichotomous Outcome. U.C. Berkeley

60

Division of Biostatistics Working Paper Series. 2004(157). http://www.bepress.com/cgi/viewcontent.cgi?article=1157&context=ucbbiostat.

84. Morshed S, Tornetta P, 3rd, Bhandari M. Analysis of observational studies: a guide to understanding statistical methods. J Bone Joint Surg Am. May 2009;91 Suppl 3:50-‐60.

85. National Trauma Data Bank Reference Manual: Background, caveats, and resources. Chicago September 2005 (based on NTDB Version 5.0) 2005.

86. MacKenzie EJ, Steinwachs DM, Shankar B. Classifying trauma severity based on hospital discharge diagnoses. Validation of an ICD-‐9CM to AIS-‐85 conversion table. Med Care. Apr 1989;27(4):412-‐422.

87. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-‐9-‐CM administrative databases. Journal of clinical epidemiology. Jun 1992;45(6):613-‐619.

88. Benjamini Y, Hochberg T. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;85:289-‐300.

89. Blow O, Magliore L, Claridge JA, Butler K, Young JS. The golden hour and the silver day: detection and correction of occult hypoperfusion within 24 hours improves outcome from major trauma. J Trauma. Nov 1999;47(5):964-‐969.

90. Ferguson ND, Frutos-‐Vivar F, Esteban A, et al. Acute respiratory distress syndrome: underrecognition by clinicians and diagnostic accuracy of three clinical definitions. Crit Care Med. Oct 2005;33(10):2228-‐2234.

91. Daly LE, Bourke GJ. Interpretation and Uses of Medical Statistics. 5th ed: Blackwell Science; 2000.

92. Lee AH, Fung WK, Fu B. Analyzing hospital length of stay: mean or median regression? Med Care. May 2003;41(5):681-‐686.

93. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran MEaB, D., ed. Statistical Models in Epidemiology: The Environment and Clinical Trials

. Vol 116: Springer Verlag; 1999:95-‐134. 94. Kooperberg C, Bose S, Stone CJ. Polychotomous Regression. Journal of the American

Statistical Association. 1997;92:117-‐127. 95. Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-‐of-‐fit

tests for the logistic regression model. Stat Med. May 15 1997;16(9):965-‐980. 96. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and

Causality: Springer Verlag; 2003. 97. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman Hall /

CRC; 1993. 98. Pollard CS, van der Laan MJ. Choice of null distribution in resampling-‐based multiple

testing. Journal of Statistical Planning and Inference. 2004;125:85-‐101. 99. Ihaka R, Gentleman R. R: A language for data analysis and graphics. J Comput Graph

Stat. 1996;5:299-‐315. 100. R: A language and environment for statistical computing [computer program].

Vienna, Austria: R Foundation for Statistical Computing; 2003.

61

101. van der Laan MJ, Robins JM. Unified Methods for censored longitudinal data and causality. New York: Springer-‐Verlag; 2003.

102. Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology. Sep 15 2008;168(6):656-‐664.

103. Messer LC, Oakes JM, Mason S. Effects of Socioeconomic and Racial Residential Segregation on Preterm Birth: A Cautionary Tale of Structural Confounding. American Journal of Epidemiology. Mar 15 2010;171(6):664-‐673.

104. Mortimer KM, Neugebauer R, van der Laan M, Tager IB. An application of model-‐fitting procedures for marginal structural models. Am J Epidemiol. Aug 15 2005;162(4):382-‐388.

105. Wang Y, Peterson ML, Bangsberg D, van der Laan MJ. Diagnosing bias in the Inverse-‐Probability-‐of-‐Treatment-‐Weighted Estimator from Violations of Experimental Treatment Assignment. U.C. Berkeley Division of Biostatistics Working Paper Series. 2006(211).

106. Miettinen OS. Standardization of risk ratios. Am J Epidemiol. Dec 1972;96(6):383-‐388.

107. Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. Nov 2003;14(6):680-‐686.

108. Fakhry SM, Rutledge R, Dahners LE, Kessler D. Incidence, management, and outcome of femoral shaft fracture: a statewide population-‐based analysis of 2805 adult patients in a rural state. J Trauma. Aug 1994;37(2):255-‐260; discussion 260-‐251.

109. Pape HC, Regel G, Dwenger A, Sturm JA, Tscherne H. Influence of thoracic trauma and primary femoral intramedullary nailing on the incidence of ARDS in multiple trauma patients. Injury. 1993;24 Suppl 3:S82-‐103.

110. Heim D, Regazzoni P, Tsakiris DA, et al. Intramedullary nailing and pulmonary embolism: does unreamed nailing prevent embolization? An in vivo study in rabbits. J Trauma. Jun 1995;38(6):899-‐906.

111. Pell AC, Christie J, Keating JF, Sutherland GR. The detection of fat embolism by transoesophageal echocardiography during reamed intramedullary nailing. A study of 24 patients with femoral and tibial fractures. J Bone Joint Surg Br. Nov 1993;75(6):921-‐925.

112. Wolinsky PR, Sciadini MF, Parker RE, et al. Effects on pulmonary physiology of reamed femoral intramedullary nailing in an open-‐chest sheep model. J Orthop Trauma. 1996;10(2):75-‐80.

113. Schemitsch EH, Jain R, Turchin DC, et al. Pulmonary effects of fixation of a fracture with a plate compared with intramedullary nailing. A canine model of fat embolism and fracture fixation. J Bone Joint Surg Am. Jul 1997;79(7):984-‐996.

114. Pietropaoli JA, Rogers FB, Shackford SR, Wald SL, Schmoker JD, Zhuang J. The deleterious effects of intraoperative hypotension on outcome in patients with severe head injuries. J Trauma. Sep 1992;33(3):403-‐407.

115. Townsend RN, Lheureau T, Protech J, Riemer B, Simon D. Timing fracture repair in patients with severe brain injury (Glasgow Coma Scale score <9). J Trauma. Jun 1998;44(6):977-‐982; discussion 982-‐973.

116. McKee MD, Schemitsch EH, Vincent LO, Sullivan I, Yoo D. The effect of a femoral fracture on concomitant closed head injury in patients with multiple injuries. J Trauma. Jun 1997;42(6):1041-‐1045.

62

117. Starr AJ, Hunt JL, Chason DP, Reinert CM, Walker J. Treatment of femur fracture with associated head injury. J Orthop Trauma. Jan 1998;12(1):38-‐45.

118. Ziran BH, Le T, Zhou H, Fallon W, Wilber JH. The impact of the quantity of skeletal injury on mortality and pulmonary morbidity. J Trauma. Dec 1997;43(6):916-‐921.

119. White TO, Jenkins PJ, Smith RD, Cartlidge CW, Robinson CM. The epidemiology of posttraumatic adult respiratory distress syndrome. J Bone Joint Surg Am. Nov 2004;86-‐A(11):2366-‐2376.

120. Schulman AM, Claridge JA, Carr G, Diesen DL, Young JS. Predictors of patients who will develop prolonged occult hypoperfusion following blunt trauma. J Trauma. Oct 2004;57(4):795-‐800.

121. Brohi K, Singh J, Heron M, Coats T. Acute traumatic coagulopathy. J Trauma. Jun 2003;54(6):1127-‐1130.

122. Hofmann S, Huemer G, Kratochwill C, et al. [Pathophysiology of fat embolisms in orthopedics and traumatology]. Orthopade. Apr 1995;24(2):84-‐93.

123. Nast-‐Kolb D, Waydhas C, Gippner-‐Steppert C, et al. Indicators of the posttraumatic inflammatory response correlate with organ failure in patients with multiple injuries. J Trauma. Mar 1997;42(3):446-‐454; discussion 454-‐445.

124. Roumen RM, Redl H, Schlag G, et al. Inflammatory mediators in relation to the development of multiple organ failure in patients after severe blunt trauma. Crit Care Med. Mar 1995;23(3):474-‐480.

125. Pape HC, Hildebrand F, Pertschy S, et al. Changes in the management of femoral shaft fractures in polytrauma patients: from early total care to damage control orthopedic surgery. J Trauma. Sep 2002;53(3):452-‐461; discussion 461-‐452.

126. Harwood PJ, Giannoudis PV, van Griensven M, Krettek C, Pape HC. Alterations in the systemic inflammatory response after early total care and damage control procedures for femoral shaft fracture in severely injured patients. J Trauma. Mar 2005;58(3):446-‐452; discussion 452-‐444.

127. Ikossi DG, Knudson MM, Morabito DJ, et al. Continuous muscle tissue oxygenation in critically injured patients: a prospective observational study. J Trauma. Oct 2006;61(4):780-‐788; discussion 788-‐790.

128. Giannoudis PV, Hildebrand F, Pape HC. Inflammatory serum markers in patients with multiple trauma. Can they predict outcome? J Bone Joint Surg Br. Apr 2004;86(3):313-‐323.

129. Greenland S. Interpretation and estimation of summary ratios under heterogeneity. Stat Med. Jul-‐Sep 1982;1(3):217-‐227.

130. Moore KL, van der Laan MJ. "Application of Time-‐to-‐Event Methods in Assessment of Safety in Clinical Trials". U.C. Berkeley Division of Biostatistics Working Paper Series. 2009;Working Paper 248. http://www.bepress.com/ucbiostat/paper248.

Marginal structural models demonstrate treatment …...Marginal structural models demonstrate...

Documents

Transcript of Marginal structural models demonstrate treatment …...Marginal structural models demonstrate...