Design and Analytic Methods for Time‐Varying Exposures in Perinatal Epidemiology

By Anthony Philip Nunes

MS, University of Massachusetts at Amherst, 2006

A Dissertation Submitted in Partial Fulfillment of the

Requirements for the Degree of Doctor of Philosophy

in the Division of Biology and Medicine

at Brown University

Providence, Rhode Island

May 2011

This dissertation by Anthony Philip Nunes is accepted in its present form

by the Division of Biology and Medicine as satisfying the

dissertation requirement for the Degree of Doctor of Philosophy.

Date________________ _________________________________

Elizabeth W. Triche, PhD, Advisor

Recommended to the Graduate Council

Date________________ _________________________________

E. Andres Houseman, ScD (Reader)

Date________________ _________________________________

Maureen G. Phipps, MD, MPH (Reader)

Date________________ _________________________________

Gregory A. Wellenius, ScD (Reader)

Approved by the Graduate Council

Date________________ _________________________________

Peter Weber, Dean of the Graduate School

Curriculum Vitae

Anthony Nunes was born in Fall River, MA in September of 1980. He attended the

University of Massachusetts at Amherst, where he received a Bachelor of Science

degree in Environmental Science with a concentration in Toxicology and Chemistry.

Anthony then received his Master of Science degree in Epidemiology from the School of

Public Health and Health Sciences at the University of Massachusetts at Amherst.

Anthony’s Master’s thesis assessed the association between stress and motor vehicle

injuries among active duty US Army personnel. Anthony then worked as a researcher

for the US Army Research Institute for Environmental Medicine, assisting in data

analysis within the substantive area of injury prevention epidemiology, and as an

associate epidemiologist for Environ International, assisting with grant writing, data

collection, data analysis, and drafting of manuscripts and expert reports.

He entered the Doctoral Program in Epidemiology in the Department of Community

Health at Brown University in September of 2007 to pursue research within the

substantive area of perinatal and reproductive epidemiology. He received funding

through a National Institute on Aging Training Grant and through a research

assistantship in the Division of Research at Women & Infants Hospital in Providence, RI.

In addition, Anthony worked as a statistical and methodological consultant for the

Community Health Clerkship rotation in the Warren Alpert Medical School. During his

graduate training, Anthony was honored by receiving an invitation to attend the NICHD

Summer Institute in Reproductive and Perinatal Epidemiology. He has contributed to

publications within the substantive areas of adolescent pregnancy, pharmacoepidemiology, and injury/environmental epidemiology. Anthony’s research has been

presented at conferences for the International Society for Pharmacoepidemiology, the

American College of Obstetricians and Gynecologists, and the Society for Epidemiologic

Research.

Acknowledgments

Throughout my graduate training at Brown, I have been blessed to have mentors who

have been as caring and kind as they have been knowledgeable. I would like to thank

Dr. Beth Triche, my advisor and mentor, for her commitment, guidance, and support

throughout this investigation. To Dr. Maureen Phipps, Dr. E. Andres Houseman, and Dr.

Gregory Wellenius, I would like to express my appreciation for the unique perspectives and insights each of them provided to help shape the methods and clinical

relevance of my research. I am grateful to my academic advisors and research mentors

Dr. Stephen Buka, Dr. Melissa Clark, Dr. Kate Lapane, Dr. Martin Weinstock, Dr. Joseph

Hogan, and Dr. Vincent Mor, each of whom has helped to develop the questions and

analytic approaches addressed in this dissertation. I would like to acknowledge Dr.

Enrique Schisterman, who served as an external reader; and Dr. Michael Bracken, Dr.

Theodore Holford, Dr. Kathleen Belanger, and Dr. Brian Leaderer from the Yale Center

for Perinatal, Pediatric, and Environmental Epidemiology for granting access to the data

utilized in this research. Lastly, none of this would have been possible without the

support of my family, friends, and colleagues. I would like to specifically thank my

parents, Anthony and Gail Nunes; my wife, Heather Nunes; and my children, Anthony

and Daniel Nunes, for the support and encouragement they have provided and the

sacrifices they have made to allow me to pursue higher education.

Table of Contents

Signature Page

Curriculum Vitae

Acknowledgments

List of Tables

List of Figures

Introduction

Overview

Analytic Solutions

Design Solution

Specific Aims

Chapter 1: TIME‐DEPENDENT BIAS OF AVERAGE AND JOINTLY MODELED EXPOSURES: EXAMPLES IN PERINATAL EPIDEMIOLOGY

Abstract

Introduction

Simulation Methods

Simulation Results

Bias Correction Methods

Bias Correction Illustration

Discussion

Chapter 2: EVALUATING MISSING DATA DESIGNS IN THE PRESENCE OF NON‐DESIGNED MISSING DATA: APPLICATIONS IN PERINATAL EPIDEMIOLOGY

Abstract

Introduction

Methods

Results

Discussion

Chapter 3: TIME DEPENDENT ASSOCIATIONS BETWEEN MATERNAL CAFFEINE CONSUMPTION AND FETAL GROWTH

Abstract

Introduction

Methods

Results

Discussion

General Discussion

References

List of Tables

Table 1.1, Average Effect Estimates for 1000 Simulations of 10,000 Pregnancies Using Time‐Invariant and Time‐Varying Methods

Table 1.2, Average Effect Estimates for 1000 Simulations of 10,000 Pregnancies Using Time‐Invariant and Time‐Varying Methods

Table 1.3, Simulation of Average Exposure and Preterm Birth: Average Effect Estimates for 1000 Simulations of 10,000 Pregnancies Using Time‐Invariant and Time‐Varying Methods

Table 1.4, Association Between Prenatal Care Initiation and Preterm Birth and Low Birth Weight, 2006 US Natality Data

Table 2.1, Sample Size and Cost Parameters Used for Data Simulations

Table 2.2, Characteristics of Study Participants Within Protocols of the Nutrition in Pregnancy Study Prior to and After Weighting

Table 2.3, Bias and Relative Efficiency of Missing Data Designs (MDD) in the Presence of Non‐Designed Missing Data Relative to Complete Ascertainment Designs (CAD), 1000 Data Simulations

Table 2.4, Cost‐Fixed Sample Size and Compliance Within Study Protocols Among Participants in the Nutrition in Pregnancy Study

Table 2.5, Association Between Measures of Smoking and Small for Gestational Age by Week of Pregnancy and Study Design Among Participants in the Nutrition in Pregnancy Study

Table 3.1, Distribution of Baseline Characteristics by Levels of First Trimester Caffeine Consumption, Health and Nutrition in Pregnancy Study, 1996‐2001

Table 3.2, Association Between Caffeine Consumption and Intrauterine Growth Retardation Among Full Term Live Births, Health and Nutrition in Pregnancy Study, 1996‐2001

Table 3.3, Associations Between Joint Effects of Self Reported Caffeine Intake and Potential Effect Modifiers and Intrauterine Growth Retardation Among Full Term Live Births, Health and Nutrition in Pregnancy Study, 1996‐2001

Table 3.4, Association Between Caffeine Consumption and Birth Weight Among Full Term Live Births, Health and Nutrition in Pregnancy Study, 1996‐2001

List of Figures

Figure 1.1, Calculation of the Incidence Rate Ratio from a 2X2 Table

Figure 1.2, Calculation of the Incidence Rate Ratio of Jointly Modeled Exposures

Figure 1.3, Dose‐Response Patterns for (a) Constant Probability of Exposure Initiation, (b) Declining Probability of Exposure Initiation, and (c) Increasing Probability of Exposure Initiation

Figure 2.1, Candidate Missing Data Designs

Figure 2.2, Distribution of the Predicted Probability of Being Assigned to the Intensive Protocol Prior to Weighting (a) and After Weighting (b)

Figure 3.1, Directed Acyclic Graph for Confounders of the Association Between Caffeine and Fetal Growth

INTRODUCTION

In epidemiological investigations of time varying exposures, repeat assessments of

exposures are necessary to accurately characterize exposed person‐time and to quantify

time‐dependent effects. Though time‐dependent effects are not unique to perinatal epidemiology, exposure effects are more sensitive to timing during pregnancy than at any other time in human development because of the physiologic changes experienced by the mother and fetus. The profound changes in maternal metabolic, hematologic, cardiovascular, and respiratory physiology are rivaled only by those experienced during the embryonic and fetal periods of development.[1] As a consequence of these maternal and fetal physiological changes, associations between perinatal exposures and adverse pregnancy outcomes are timing‐sensitive.[2, 3] That is, the same exposure may result in

different outcomes depending on the gestational age at which the exposure occurred.

This has been well documented for fetal/neonatal outcomes such as spontaneous

abortion, birth defects, low birth weight, growth restriction, and fetal programming [2,

3]. In mothers, outcomes such as preeclampsia, eclampsia, maternal hemorrhage, and

maternal mortality are sensitive to exposure timing [4]. Without assessment and evaluation of timing‐specific exposures, epidemiological investigations can neither sufficiently describe the exposure/disease association nor validly estimate the underlying causal effect.

When exposure is time‐varying while individuals are at risk of experiencing the outcome

of interest, time‐fixed analytic methods will produce biased measures of association.

Despite ample literature documenting bias resulting from ignoring exposure timing,

investigators may rely on time‐fixed analytic methods due to analytic simplicity or data

limitations. Recent examples of time‐fixed analyses of time‐varying exposures in

perinatal epidemiology include maternal weight change [5‐8], maternal infections [9,

10], illicit drug use [11, 12], medication use [13‐15], environmental exposures[16],

smoking cessation[17], and prenatal care utilization [18].

The feasibility of designs incorporating repeated measures in pregnancy is limited due to

cost and excessive subject burden within a short duration of time. Minimizing subject

burden in perinatal investigations is particularly important due to mothers’ increased reluctance to participate in invasive and non‐invasive study methodologies [19, 20]. As a

consequence, there is a need for design approaches to increase the feasibility of

collecting repeat exposure measures and analytic approaches to validly estimate

measures of association where exposure timing cannot be feasibly collected.

Analytic Solutions

Time‐fixed analyses of time‐varying exposures can lead to biased estimates of the association between exposure and outcome, variously termed time‐dependent bias, immortal time bias, and survivor treatment selection bias [21‐23]. Several studies have examined the impact of this bias

in the context of an exposure treated as a binary indicator of “ever exposed” during

follow‐up [22‐25]. Where exposure timing is available, time‐dependent bias is

eliminated through the use of time‐varying analytic methods. Time‐dependent bias is

fairly common in published cohort studies; however, it is often preventable and rarely

discussed as a cause for concern [21, 22].

The presence and direction of time‐dependent bias for “ever exposed” metrics have

been addressed in the existing literature. We add to this base of methodological

literature by expanding the mathematical proofs, further describing the pattern and

magnitude of time‐dependent bias, and presenting a method to prevent the bias.

We address the problem where exposure timing cannot be feasibly ascertained through

a missing data perspective. We consider scenarios in which it is known that some

exposure has occurred but where we are limited to some final measure of cumulative

exposure. From a missing data perspective, we then assume a functional form of the

exposure over time using observed data within our sample and prior information from

the existing literature. Once the functional form of exposure/timing is specified, we

multiply impute exposure timing and obtain corrected measures of association using

time‐varying analyses.
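The imputation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in this dissertation: the hypothetical function `impute_initiation_week` draws an initiation week for a pregnancy known only to be "ever exposed" to a single‐transition exposure, assuming a constant weekly probability of initiation truncated at delivery (one possible functional form), and the hypothetical function `rubin_pool` combines the m imputed‐data estimates, for example log hazard ratios from time‐varying models, using Rubin's rules.

```python
import random
from statistics import mean, variance


def impute_initiation_week(delivery_week, weekly_p_init, rng):
    """Hypothetical imputation of exposure timing for an 'ever exposed' pregnancy.

    Assumes a constant weekly probability of initiation (geometric timing),
    conditioned on initiation having occurred by delivery_week.
    """
    weeks = range(1, delivery_week + 1)
    # Unnormalized probability that exposure first starts in week w
    weights = [(1 - weekly_p_init) ** (w - 1) * weekly_p_init for w in weeks]
    return rng.choices(weeks, weights=weights, k=1)[0]


def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, total_var


rng = random.Random(2011)
# Five imputed initiation weeks for one pregnancy delivered at week 39
imputed_weeks = [impute_initiation_week(39, 0.05, rng) for _ in range(5)]
```

In a full analysis, each imputed timing would be used to rebuild the time‐varying exposure, the time‐varying model would be refit on each completed data set, and the m coefficient estimates and variances would then be passed to `rubin_pool`.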

Design Solution

Missing data designs, including partial questionnaire designs [26‐29], multi‐cohort longitudinal designs [30, 31], and multi‐measurement methods of construct assessment [28, 30], deliberately omit collection of some data elements within the study sample. In

doing so, they require less intensive follow‐up protocols than a comparable complete

ascertainment study design.
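As a minimal illustration of designed missingness (a hypothetical example, not one of the designs evaluated in Chapter 2), the Python sketch below assigns each participant at random to a subset of the planned assessment occasions, for instance two of three trimester‐specific exposure assessments. Because the investigator chooses the omitted assessments at random, the designed portion of the missingness is unrelated to participant characteristics. The function name and parameters are illustrative assumptions.

```python
import random


def planned_missing_mask(n_participants, n_occasions, n_assessed, rng):
    """Hypothetical planned-missingness design.

    Each participant is assessed on a random subset of n_assessed of the
    n_occasions planned occasions. Returns one list of booleans per participant
    (True = assessment collected, False = omitted by design).
    """
    mask = []
    for _ in range(n_participants):
        chosen = set(rng.sample(range(n_occasions), n_assessed))
        mask.append([occasion in chosen for occasion in range(n_occasions)])
    return mask


rng = random.Random(7)
mask = planned_missing_mask(n_participants=300, n_occasions=3, n_assessed=2, rng=rng)
collected = sum(sum(row) for row in mask)
print(f"{collected} of {300 * 3} assessments collected")  # 600 of 900
```

Each participant completes two thirds of the assessments required by a complete ascertainment design, while every occasion is still observed for roughly two thirds of the sample.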

In idealized simulation scenarios, missing data designs have been shown to improve

statistical efficiency without sacrificing validity. Though theoretically appropriate,

missing data designs are rarely used in perinatal investigations, in part due to a general

mistrust in missing data methodologies [32]. Prior methodological publications have neither assessed the performance of missing data designs implemented in the context of time‐varying exposures nor evaluated missing data designs in scenarios with non‐designed missing data due to non‐compliance and loss to follow‐up. We expand upon the existing methodological literature by introducing concepts of missing data designs for time‐varying exposures and by assessing their performance in the non‐idealized scenarios encountered in observational epidemiology, specifically the impact of non‐designed missingness on statistical efficiency and validity.

Specific Aims

This series of papers aims to demonstrate the need to obtain and analyze repeated measures of time‐varying exposures, to introduce a bias correction method for settings where exposure timing cannot be ascertained, to introduce design solutions that increase the feasibility of collecting repeated exposure measures, and to implement the identified methods to quantify the association between timing‐specific caffeine consumption and fetal growth. To demonstrate the need to obtain and analyze time‐varying exposures, we quantify the magnitude, direction, and pattern of bias introduced in scenarios where time‐varying

exposures are treated as time‐fixed. We propose and evaluate missing data designs as a

valid method for assessing time‐varying exposures while minimizing cost and subject

burden. Lastly, we implement designed missing data methods to quantify the

association between maternal caffeine consumption and fetal growth.

Chapter 1: TIME‐DEPENDENT BIAS OF AVERAGE AND JOINTLY MODELED EXPOSURES:

EXAMPLES IN PERINATAL EPIDEMIOLOGY

Abstract:

Epidemiologic studies frequently treat time‐varying exposures as if they were time‐

fixed, either for analytic simplicity or because detailed data on exposure timing was not

collected. For binary exposures this approach can lead to health effect estimates that

are on average biased in the negative direction. We performed simulation studies to

evaluate the magnitude of this potential bias in the setting of non‐binary exposures

using examples from perinatal epidemiology. Specifically, we simulated effects of

trimester‐specific and average exposures on time to event, and compared the results

from time‐fixed logistic and survival analyses with those from a time‐varying survival

analysis. Time‐fixed analyses were biased downward for all exposure metrics

considered. Moreover, when using average exposure metrics, we observed an artificial

non‐linear dose response function. We propose and illustrate a method based on

multiple‐imputation of timing‐specific exposure that can be used to avoid this bias when

data on exposure timing are unavailable. In conclusion, treating time‐varying exposures

as time‐invariant can bias health effect estimates and yield incorrect dose‐response

functions. Where timing‐specific data are not available, multiple imputation of

exposure timing may be a useful tool in obtaining unbiased effect estimates or

performing sensitivity analyses. This method may be applied to other epidemiologic

substantive areas.

Though recent publications have highlighted the appropriateness of time varying

methods [33‐35], observational studies in perinatal and reproductive epidemiology

often treat time‐varying exposures as if they were time‐invariant or time‐fixed.

Examples of exposures that are time‐varying but have been treated as time‐fixed

include maternal weight change [5‐8], maternal infections [9, 10], illicit drug use [11,

12], medication use [13‐15], environmental exposures[16], smoking cessation[17], and

prenatal care utilization [18]. Investigators may choose to treat time‐varying exposures

as time‐fixed due to lack of adequate data on exposure timing or as a means to

minimize analytic complexity. However, this approach can lead to biased estimates of the association between exposure and outcome, variously termed time‐dependent bias, immortal time bias,

and survivor treatment selection bias [21‐23]. Distinct from misclassification or collider

stratification, time‐dependent bias occurs when individuals are eligible to become

exposed while at risk of experiencing the outcome. Several studies have examined the

impact of this bias in the context of an exposure treated as a binary indicator of “ever

exposed” during follow‐up [22‐25]. In perinatal epidemiology, this would be akin to

creating a metric of ever exposed within pregnancy or within a more specific relevant

etiologic period of interest (e.g. trimester specific exposures). O’Neal et al. describe a

scenario in which time‐fixed analyses quantifying the association between any urinary

tract infection and preterm birth have produced misleading results biased in the

negative direction[36]. Others have confirmed that time‐dependent bias of a binary

exposure is expected to bias measures of association in a negative direction [23, 24]

such that protective associations appear more protective, null associations appear

protective, and causal associations appear weaker or possibly protective relative to the

unbiased association. Where exposure timing is available, time‐dependent bias is

eliminated through the use of time‐varying analytic methods. Time‐dependent bias is

fairly common in published cohort studies; however, it is often preventable and rarely

discussed as a cause for concern [21, 22].

The presence and direction of time‐dependent bias for “ever exposed” metrics have

been addressed in the existing literature; however, the magnitude of such bias in

scenarios relevant to perinatal research has not been addressed. Additionally, prior

publications have not fully evaluated the impact of the bias when using alternative time‐

fixed metrics such as average exposure or joint modeling of multiple binary indicators

(e.g. trimester specific binary indicators). Though prior studies have demonstrated that

time varying methods prevent time‐dependent bias, analytic solutions where exposure

timing was not collected have not been proposed. In this study, we quantify the

magnitude, direction, and pattern of time‐dependent bias associated with several time‐

fixed exposure metrics commonly implemented in perinatal epidemiology. In addition,

we demonstrate the validity of using time‐varying analytic approaches and propose a

method for bias correction based on time varying analysis of multiply imputed exposure

timing where exposure timing is unknown.

Time‐Dependent Bias:

Suissa (2008) presents a simple mathematical proof of the biased incidence rate ratio

for binary, single transition exposures (i.e. individuals can only transition from

unexposed to exposed) and binary non‐recurring outcomes. In our handling of time‐

dependent bias, we extend this mathematical proof to include transient exposures (i.e.

individuals may transition from unexposed to exposed and from exposed to unexposed).

We address time dependent bias of odds ratios in the appendix.

In a 2x2 time‐fixed analysis of epidemiological data (Figure 1.1), “a” represents the

number of exposed who experience the outcome, “c” represents the number of

unexposed who experience the outcome, “T+” is the total person‐time among those

with any exposure, and “T‐” is the total person‐time among those with no exposure.

Assuming a constant hazard over time, the incidence rate ratio is estimated by:

$$IRR = \frac{a/T_{+}}{c/T_{-}} = k\,\frac{a}{c} \qquad (1)$$

where k is the ratio of follow‐up time among the unexposed to that among the exposed (k = T‐/T+).

For a time‐varying exposure, this time‐fixed analysis incorrectly assumes that exposed

persons are exposed for their entire time at risk for experiencing the event. When

discussing time‐dependent bias, Suissa (2008) emphasized the mischaracterization of

exposure prior to exposure initiation. Here we define “p” as the average proportion of

time unexposed among those classified as “ever exposed”. Specifically, this is the

average ratio of time preceding exposure initiation to time in follow‐up among the

exposed. Knowing the value of “p”, Suissa demonstrated the corrected rate ratio to be

estimated as:

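$$IRR_{\text{corrected}} = \frac{a/[(1-p)\,T_{+}]}{c/(T_{-} + p\,T_{+})} = \frac{a}{c}\cdot\frac{k+p}{1-p} \qquad (2)$$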
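The arithmetic of the time‐fixed and corrected rate ratios can be sketched in Python. The function names and the numbers in the example are hypothetical. The correction assumes, as in the immortal‐time setting, that no events occur among the ever exposed before exposure initiation, so only the pre‐initiation fraction p of their person‐time is moved to the unexposed denominator.

```python
def time_fixed_irr(a, c, t_exposed, t_unexposed):
    """Rate ratio treating all person-time of the ever exposed as exposed."""
    k = t_unexposed / t_exposed            # k = T-/T+
    return k * a / c


def corrected_irr(a, c, t_exposed, t_unexposed, p):
    """Rate ratio after reclassifying the pre-initiation fraction p of the
    ever-exposed person-time as unexposed (no events assumed in that time)."""
    exposed_time = (1 - p) * t_exposed
    unexposed_time = t_unexposed + p * t_exposed
    return (a / exposed_time) / (c / unexposed_time)


# Hypothetical counts: 50 exposed and 100 unexposed events,
# 1,000 exposed and 2,000 unexposed person-weeks, 30% pre-initiation time
print(round(time_fixed_irr(50, 100, 1000, 2000), 3))      # 1.0
print(round(corrected_irr(50, 100, 1000, 2000, 0.3), 3))  # 1.643
```

In this hypothetical example, the time‐fixed analysis suggests no association, whereas reallocating the misclassified person‐time yields a rate ratio of about 1.64.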

The rate ratio may be further biased if the exposure is transient (i.e. individuals may

transition from an exposed to unexposed state). Exposures such as smoking cessation

and initiation of prenatal care are examples of single transition exposures and would not

be susceptible to this aspect of the bias. Other exposures, such as caffeine,

acetaminophen, and maternal illness, may be expected to have a limited duration of

effect, and may occur multiple times during the time period of interest. Where subjects

may transition from exposed to unexposed during follow‐up, time‐fixed analytic

approaches would inappropriately attribute some events and person‐time to an

exposed state. We define “q” as the ratio of unexposed follow‐up time following

exposure initiation to follow‐up time after exposure initiation among the exposed. The

unbiased rate ratio can be estimated by:

(3)

Given the equations for the biased and unbiased rate ratios, we can quantify the nature

of the bias to determine factors affecting the magnitude and direction of the bias.

(4)

Therefore, the magnitude and direction of the bias depend on the proportion of time unexposed among the exposed (p and q), the ratio of follow‐up time between the unexposed and exposed (k), and the incidence of the outcome in the exposed and unexposed (a and c). From the bias equation, it can be shown that no bias is present when p and q both equal zero. If either p or q is non‐zero, a bias will be present. Any non‐zero value of p will lead to a bias in the negative direction.
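For the single‐transition case only (q = 0), the direction of the bias can be checked directly. Take the time‐fixed rate ratio k·a/c and the corrected ratio obtained by moving the pre‐initiation fraction p of exposed person‐time into the unexposed denominator, (a/c)(k + p)/(1 − p), assuming no events occur before initiation among the ever exposed. The sketch below does not cover the transient‐exposure term q.

```latex
\frac{IRR_{\text{time-fixed}}}{IRR_{\text{corrected}}}
  = \frac{k\,a/c}{\dfrac{a}{c}\cdot\dfrac{k+p}{1-p}}
  = \frac{k(1-p)}{k+p} < 1
  \quad \text{for } 0 < p < 1,\; k > 0,
  \qquad \text{since } k(1-p) = k - kp < k + p .
```

For example, with k = 2 and p = 0.3, the time‐fixed estimate is 1.4/2.3, or about 0.61 times the corrected estimate.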

Time‐fixed metrics are not limited to single binary indicators. One common approach in perinatal epidemiology is to create a binary indicator of exposure within each trimester. If indicators for each trimester are

simultaneously included in a regression model, then time‐dependent bias may impact

associations observed in each trimester. In a time‐fixed analysis, we may produce a 4x2

table from which to calculate the trimester specific measures of association (Figure 1.2).

For example, the formula for a time‐fixed rate ratio for first trimester exposure is given

as follows:

(5)

For preterm birth, individuals are at risk of experiencing the exposure and outcome

during the 2nd and 3rd trimesters. Consequently, some proportion of follow‐up time

among those with 2nd or 3rd trimester exposures may be unexposed. Thus, the follow‐

up time among the unexposed is underestimated in the time‐fixed analysis. The

unbiased rate ratio for first trimester exposures can be expressed as:

(6)

where p2 and p3 represent the proportion of time unexposed among those exposed in

the 2nd and 3rd trimesters respectively. Comparing the unbiased to the biased effect

estimate, we see that this scenario would result in negative bias even though first

trimester exposures were not time‐varying during the period of time at risk of preterm

birth.

Average exposure and cumulative exposure metrics are also susceptible to time‐dependent bias. Suissa (2008) addresses the problem of immortal time prior to

initiation of exposure resulting in a bias in the negative direction [21]. Additionally,

time‐fixed average exposures may produce artificially non‐linear associations due to

variable amounts of information used to calculate the averages. An average exposure

for an individual is calculated by summing the observed exposure at all time points and

dividing by the time in follow‐up. The number of exposure assessments contributing to an individual’s average depends on the number of days in follow‐up. Averages based on shorter follow‐up times are more susceptible to extreme values and will produce

an exposure distribution with fatter tails than the distribution of exposure for those with

longer follow‐up times. Consequently, those with shorter durations of follow‐up will

have a greater probability of being classified in the lower or upper tails of the exposure

distribution. This bias will result in an artificial “U” or “J” shaped dose‐response

function.
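The tail‐widening mechanism can be illustrated with a short Python simulation. As a simplifying assumption (not the simulation design used in this chapter), each week's exposure is an independent draw with the same probability for everyone, so the only difference between the two groups is the length of follow‐up. The standard deviation of an average over n weeks is then about √(p(1 − p)/n), roughly 0.23 for 4 weeks versus 0.08 for 36 weeks when p = 0.3, so pregnancies with short follow‐up land in the extreme exposure categories far more often. The function name is hypothetical.

```python
import random
from statistics import pstdev


def average_exposures(n_people, weeks, p_week, rng):
    """Average exposure (weeks exposed / weeks of follow-up) when each week's
    exposure is an independent draw with probability p_week (simplifying assumption)."""
    return [sum(rng.random() < p_week for _ in range(weeks)) / weeks
            for _ in range(n_people)]


rng = random.Random(1)
short_fu = average_exposures(5000, weeks=4, p_week=0.3, rng=rng)
long_fu = average_exposures(5000, weeks=36, p_week=0.3, rng=rng)

for label, values in [("4 weeks", short_fu), ("36 weeks", long_fu)]:
    share_zero = sum(v == 0 for v in values) / len(values)
    print(label, round(pstdev(values), 3), round(share_zero, 3))
```

Because follow‐up length in pregnancy is tied to gestational age at delivery, the pregnancies pushed into the lowest and highest categories are disproportionately those that delivered early, which is what produces the artificial "U" or "J" shape.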

We have demonstrated how the incidence rate ratio will be biased in the presence of

time‐dependent bias. In observational studies, investigators often estimate hazard

ratios from Cox Proportional Hazards models rather than incidence rate ratios[37].

Though there are some limitations in relying on hazard ratios, they can generally be

interpreted as incidence rate ratios[38] and are susceptible to time‐dependent bias[24].

Simulation Methods

We assessed the bias under several scenarios relevant to perinatal epidemiology using

simulated data representing 10,000 pregnancies. As identified in the previous section,

the magnitude of the bias depends on the proportion of time unexposed among the

exposed (p and q), the probability of the outcome conditional on the exposure and the average follow‐up times for those with and without the outcome (k), and the incidence of the outcome in the exposed and unexposed (a and c). To capture each of these factors, we specified the following parameters: the weekly probability of exposure initiation, the continuity of exposure, the hazard function for birth, and the magnitude of the association between exposure and outcome.

The proportion of time unexposed among the exposed is a function of timing of

initiation, continuity of exposure, and duration of follow‐up. We considered weekly

probabilities of initiation, $P(E_{i+1}=1 \mid E_i=0)$, ranging from 0.01 to 0.1 under three scenarios: constant, increasing, and decreasing probability of exposure initiation over pregnancy. Continuity of exposure was specified as the probability of being exposed at time $t_{i+1}$ given exposure at time $t_i$. We considered exposures with a high degree of continuity [$P(E_{i+1}=1 \mid E_i=1) = 1$], moderate continuity [$P(E_{i+1}=1 \mid E_i=1) = 0.9$], and low continuity [$P(E_{i+1}=1 \mid E_i=1) = 0.5$]. The exposure scenarios used in our data simulations are summarized in Table 1.1.
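The two exposure parameters above define a two‐state Markov chain over weeks of pregnancy, which can be sketched in Python as follows. This is an illustrative sketch, not the simulation code used for this chapter. Three choices are assumptions: everyone starts unexposed; after stopping, the same initiation probability governs re‐initiation; and the increasing and decreasing schedules are hypothetical linear ramps between 0.01 and 0.10 over 40 weeks.

```python
import random


def simulate_exposure_path(n_weeks, p_init, p_continue, rng):
    """Weekly binary exposure path from a two-state Markov chain.

    p_init(week): P(E_{i+1}=1 | E_i=0), weekly probability of initiation
    p_continue:   P(E_{i+1}=1 | E_i=1), continuity of exposure
    Everyone starts unexposed; after stopping, the same p_init governs
    re-initiation (an assumption made for this sketch).
    """
    path, exposed = [], False
    for week in range(1, n_weeks + 1):
        prob = p_continue if exposed else p_init(week)
        exposed = rng.random() < prob
        path.append(int(exposed))
    return path


def constant_init(week):
    return 0.05


def increasing_init(week):
    # Hypothetical linear rise from 0.01 (week 1) to 0.10 (week 40)
    return 0.01 + 0.09 * (week - 1) / 39


def decreasing_init(week):
    # Hypothetical linear decline from 0.10 (week 1) to 0.01 (week 40)
    return 0.10 - 0.09 * (week - 1) / 39


rng = random.Random(3)
high_continuity = simulate_exposure_path(37, constant_init, 1.0, rng)
low_continuity = simulate_exposure_path(37, increasing_init, 0.5, rng)
```

With continuity equal to 1, a path can only move from unexposed to exposed (a single‐transition exposure). Lower continuity produces transient exposures with unexposed weeks after initiation, the situation that the parameter q describes.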

For the purpose of this simulation, we defined the outcome as preterm birth. To

simulate preterm birth, an estimation of the hazard function[39] of birth at each week

of gestation was identified using the 2006 US Natality data[40]. The 2006 US Natality

data includes all registered births in the 50 states, District of Columbia, and New York

City. The hazard of birth was estimated at each week of gestation up to the 37th week

(i=1 to 37). The hazard at the midpoint of each week was calculated as

$\hat{h}_i = d_i / (n_i - d_i/2)$, where $d_i$ is the number of births during week $i$ and $n_i$ is the number at risk at the beginning of week $i$.
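The weekly hazard calculation can be sketched in Python as below, assuming the standard actuarial (life‐table) estimator $\hat{h}_i = d_i / (n_i - d_i/2)$ with one‐week intervals. The birth counts are hypothetical placeholders, not the 2006 Natality figures, and the function names are illustrative.

```python
def at_risk_counts(total_pregnancies, births_by_week):
    """n_i: pregnancies still undelivered at the start of each week."""
    at_risk, remaining = [], total_pregnancies
    for d in births_by_week:
        at_risk.append(remaining)
        remaining -= d
    return at_risk


def weekly_hazard(births_by_week, at_risk):
    """Actuarial hazard at the week midpoint: d_i / (n_i - d_i / 2)."""
    return [d / (n - d / 2) for d, n in zip(births_by_week, at_risk)]


# Hypothetical birth counts for four consecutive weeks in a cohort of 1,000,000 pregnancies.
births = [120, 150, 180, 260]
n = at_risk_counts(1_000_000, births)
h = weekly_hazard(births, n)
```

Subtracting half of the week's births from the denominator treats births as spread evenly across the week, so the estimate refers to the week's midpoint.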


We specified the probability of being born at a specific GA to be dependent on the

identified hazard function, the timing‐specific exposure state (E|T), the timing‐specific

prevalence of exposure, and the specified magnitude of association (RR) such that the

baseline hazard function of our simulated data was representative of the hazard

function from observational data. The magnitude of the association was not dependent

on the timing of exposure. Time‐fixed and time‐varying exposure metrics were

created. Time‐fixed exposure metrics included “ever exposed” during pregnancy, “ever

exposed” within trimesters and average exposure during pregnancy. Average exposure

was calculated as the number of weeks exposed divided by the weeks in follow‐up.

Time‐varying exposure metrics included timing‐specific binary indicators of exposure

(during pregnancy and within trimesters) and average exposure. Average exposure was

calculated as the number of weeks exposed prior to time T divided by the duration of

follow‐up at time T.
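The two average‐exposure metrics differ only in their denominator. A minimal Python sketch, assuming weekly binary exposure indicators and counting week T itself as part of the follow‐up at time T (the text does not say whether week T is included); the function names are illustrative:

```python
def time_fixed_average(exposure):
    """Weeks exposed divided by total weeks of follow-up: one value per pregnancy."""
    return sum(exposure) / len(exposure)


def time_varying_average(exposure):
    """Cumulative average exposure at each week T (weeks 1..T inclusive, an assumption)."""
    averages, exposed = [], 0
    for t, e in enumerate(exposure, start=1):
        exposed += e
        averages.append(exposed / t)
    return averages


weeks = [0, 0, 1, 1]                # hypothetical exposure over four weeks
print(time_fixed_average(weeks))    # 0.5
print(time_varying_average(weeks))  # [0.0, 0.0, 0.3333333333333333, 0.5]
```

In a time‐varying model, each risk set compares pregnancies on exposure accrued up to the same gestational week, whereas the time‐fixed average mixes pregnancies followed for different lengths of time.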

Simulated data were analyzed using logistic regression, time‐fixed Cox Proportional

Hazards Models and time‐varying Cox Proportional Hazards Models[41]. For each

scenario and analytic method, we report the average effect estimate from 1000 data

simulations. For analyses of average exposures, we assessed whether there was a

departure from linearity by including higher order terms (up to the 10th power). We

obtained the average AIC among the 1000 simulations for each of the higher order

models (2nd order to 10th order). For each scenario, the model with the lowest average

AIC was selected as the final model. Where the final model included higher order


terms, we concluded that the dose‐response relationship was artificially non‐linear. We

plotted the resulting dose response functions to qualitatively describe the pattern of the

observed bias.

Simulation Results

Simulated analyses of a single transition binary exposure [P(Ei=1|Ei‐1=1)=1] were

consistent with the expected direction of time‐dependent bias (Table 1.2). That is, the

bias tended to be in the negative direction when using logistic regression or the time‐

fixed hazards model. We did not observe a notable bias for the single transition

exposure where the probability of being exposed declined over the course of pregnancy.

The lack of bias can be explained by the relatively low probability of initiating the

exposure while at risk for experiencing the outcome. For transient exposures [P(Ei=1|Ei‐1=1)=0.5 or 0.9], the direction and magnitude of the observed bias differed between

scenarios. When including binary indicators of exposure for each trimester, the

resulting bias was largest for the third trimester; however, bias was observed in the first

and second trimesters as well. For each of the assessed scenarios, the bias was in the

negative direction.

When modeling the association between average exposure and preterm birth assuming

a linear dose response relationship, time‐fixed analyses generally produced biased

effect estimates (Table 1.3). For exposures that are more common in later pregnancy,

the bias was in the negative direction for each of the assessed scenarios. For constant


and decreasing exposures, the direction of the bias was dependent on the magnitude of

the association and the consistency of the exposure. The dose‐response relationships identified with logistic regression and the time‐fixed hazards models were artificially non‐linear in all of the assessed scenarios. The pattern of the bias depended on the probability of exposure initiation, the distribution of exposure timing, and the association between the exposure and the outcome (Figure 1.3). In general, the effect estimates

were underestimated for exposures approaching 0 and overestimated for exposures

approaching 1.

Bias Correction

The above simulations demonstrate that time‐dependent bias is of concern in perinatal

epidemiology. The easiest solution is to utilize time‐varying methods when analyzing

time‐varying exposures. However, timing‐specific data is often unavailable in existing

data sources or may not be feasible to collect in ongoing studies. If data on exposure

timing is not available, we propose a bias correction method based on multiple

imputation of exposure timing. If there is sufficient prior knowledge, we can specify the

functional form representing the timing specific probability of exposure and duration of

exposure (i.e. initiation, continuity, pattern). Where there is limited prior knowledge,

we may perform sensitivity analyses by specifying plausible functional forms of timing‐

specific exposure probabilities and duration of exposure. Once specified, we propose

multiply imputing exposure timing and then analyzing the data using standard time‐varying


approaches to produce corrected effect estimates or a range of plausible effect

estimates.
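For a single‐transition exposure known only as "exposed within a window", the proposed correction amounts to drawing a plausible initiation week and then splitting each pregnancy's follow‐up into unexposed and exposed intervals for an extended Cox model. The Python sketch below assumes a discrete uniform draw within the window and a (start, stop] interval convention; both, and the function names, are illustrative assumptions rather than the exact procedure used here.

```python
import random


def impute_initiation(window_start, window_end, delivery_week, rng):
    """Draw an initiation week uniformly from [window_start, min(window_end, delivery_week)].
    Returns None if delivery occurred before the window opened."""
    upper = min(window_end, delivery_week)
    if upper < window_start:
        return None
    return rng.randint(window_start, upper)


def counting_process_rows(initiation_week, delivery_week, event):
    """Split follow-up (0, delivery_week] into (start, stop, exposed, event) rows.
    Exposure starting at or after delivery contributes no exposed time."""
    if initiation_week is None or initiation_week >= delivery_week:
        return [(0, delivery_week, 0, event)]
    rows = []
    if initiation_week > 0:
        rows.append((0, initiation_week, 0, 0))  # unexposed time, no event
    rows.append((initiation_week, delivery_week, 1, event))
    return rows


rng = random.Random(7)
week = impute_initiation(4, 12, delivery_week=35, rng=rng)
rows = counting_process_rows(week, 35, event=1)
```

Repeating the draw and refitting the model on each completed dataset gives multiple‐imputation estimates; repeating it under different window distributions gives a sensitivity analysis.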

Illustration

As an illustration of the bias and the bias correction method, we quantify the association

between initiation of prenatal care and preterm birth and low birth weight (LBW) using 2006

US Natality data[42]. Prenatal care has been presumed to help prevent preterm birth

and LBW; however, attempts to quantify this association have produced equivocal

results [43]. Previous studies looking at the association between timing of prenatal care

initiation and LBW have not confirmed the hypothesis that early prenatal care is more

beneficial than delayed prenatal care [18, 44]. Contrary to expectation, results from

prior studies indicate that delayed prenatal care is more protective than early prenatal

care [18]. This unexpected finding has generally been attributed to residual confounding;

however, time‐dependent bias may have contributed to the reported effect estimates.

An article published in 1962 attributes the findings to mothers who delay initiation of

care until the third trimester having lower risks because they are closer to reaching full

term [45]. Though this essentially describes time‐dependent bias, it was interpreted

and addressed as confounding.

In this illustration, we quantify the association between prenatal care initiation and

preterm birth. In the time‐fixed models, prenatal care was defined as early (initiation

within the first trimester), delayed (initiation after the first trimester but before the 37th

week of gestation), or no prenatal care. The time‐fixed metrics were analyzed using


logistic regression and Cox Proportional Hazards Models. For low birth weight, we also report effect estimates from a logistic regression adjusted for gestational age at birth, an approach that treats gestational age at birth as a confounder. It was not possible to adjust for gestational age in the model for

preterm delivery since preterm delivery is defined by gestational age. We created time‐varying indicators of prenatal care initiation and analyzed them using the extended Cox

Proportional Hazards Model. We report effect estimates using multiple imputation of

timing of prenatal care initiation and observed timing of prenatal care initiation. For the

imputed timing, we assumed we only knew whether prenatal care was early or delayed.

Where it was early, we sampled from a uniform distribution of initiation times ranging

from week 4 to week 12. Where it was delayed, we sampled from a uniform

distribution of initiation times ranging from week 13 to week 37 or to gestational age at delivery, whichever came first. The imputation step was repeated five times. Point estimates and confidence

intervals were quantified using SAS Proc MIAnalyze to reflect the uncertainty associated

with the imputation process [46].
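PROC MIANALYZE combines the completed‐data analyses with Rubin's rules. The pooling step can be sketched in Python as follows, applied to log hazard ratios and their squared standard errors. The input values are hypothetical, and the 95% interval uses a normal approximation for brevity, whereas PROC MIANALYZE uses a t reference distribution with the degrees of freedom computed below.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Rubin's rules for m completed-data estimates and their variances.

    Returns (pooled estimate, total variance, degrees of freedom)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * between / within     # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df


# Hypothetical log hazard ratios and variances from five imputed datasets.
log_hr = [-0.125, -0.130, -0.128, -0.122, -0.127]
var = [0.0001] * 5
est, tot, df = pool_rubin(log_hr, var)
se = math.sqrt(tot)
print(f"HR {math.exp(est):.2f} ({math.exp(est - 1.96 * se):.2f}-{math.exp(est + 1.96 * se):.2f})")
```

The between‐imputation term widens the interval to reflect uncertainty in the imputed initiation times, which is why the imputed‐timing intervals in Table 1.4 can be wider than those based on observed timing.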

The findings from our time‐fixed analysis are consistent with reported effect estimates

from previous studies[47]. That is, those who received prenatal care had approximately

one‐third the risk of preterm birth and low birth weight (Table 1.4). After adjusting for

gestational age at delivery, the magnitude of negative association observed for low birth

weight was reduced by approximately 60%. The results from the time‐varying models


were dramatically different. Early prenatal care remained significantly protective while

delayed prenatal care approached null. Effect estimates from the imputed time and

observed time models were comparable suggesting that the imputed time‐varying

method effectively addressed the time‐dependent bias in this example. It is important

to note that adjusting for gestational age at delivery did not sufficiently address the

time‐dependent bias and is fundamentally different from treating gestational age as a

time axis in a time‐varying analysis.


Discussion:

In perinatal epidemiology, time‐dependent bias has the potential to substantially impact

the validity of analyses when exposure timing is ignored. We have demonstrated that

this bias extends beyond single binary exposure metrics. Of particular importance,

analyses of trimester‐specific exposures and average exposure are susceptible to time‐

dependent bias. Recognizing that timing‐specific data may not be available in some situations, we have demonstrated the utility of imputed exposure event times to

obtain unbiased effect estimates. If we are confident in our knowledge of the functional

form of exposure event times, these corrected effect estimates can be viewed as

unbiased estimates. Where little is known about the functional form of exposure event

times, multiple assumptions can be tested to perform a sensitivity analysis.

In our simulations of average exposure, we demonstrated the bias where the

distribution of point‐in‐time exposure was binary (e.g. yes/no indicators of medication

use, maternal illness). The bias as described and simulated is also relevant to point‐in‐

time exposures that are continuous (e.g. concentrations of pollutants, blood pressure)

where exposure is repeatedly assessed throughout follow‐up. For continuous

exposures, individuals with shorter follow‐up times will have fewer measurements

contributing to their estimate of average exposure. As a result of the limited

information used to quantify average exposure among those with shorter follow‐up

times, the distribution of average exposure will be flattened with wider tails.

Consequently, those with the outcome will be more likely to have exposures in the


extremes. Similar to our findings, this would lead to an underestimation of effect

estimates at low exposures and an overestimation at higher exposures. Where the

time‐fixed average exposure is known but data for exposures at specific time points is

not available (e.g. passive monitoring of pollutants, maternal weight gain), our

imputation method would appropriately address time‐dependent bias.
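The flattening argument follows from the variance of a mean: with independent measurement error of variance σ², an average of n measurements has variance σ²/n, so pregnancies with fewer measurements have more dispersed averages. The Python simulation below illustrates this under the simplifying assumption that every woman has the same true exposure (set to 0) and that weekly measurements vary only by independent normal noise; all values are hypothetical.

```python
import random
import statistics


def sd_of_average(n_measurements, n_subjects=5000, sigma=1.0, seed=0):
    """Across-subject SD of the mean of n i.i.d. N(0, sigma^2) measurements."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n_measurements)])
             for _ in range(n_subjects)]
    return statistics.stdev(means)


# Pregnancies contributing 4 vs 36 weekly measurements (hypothetical).
short_follow_up = sd_of_average(4)    # theory: 1 / sqrt(4) = 0.5
long_follow_up = sd_of_average(36)    # theory: 1 / sqrt(36), about 0.17
```

Because pregnancies ending early, such as preterm births, contribute fewer measurements, their averages fall in both tails more often than those of term pregnancies, even when the true exposure is identical.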

For many perinatal outcomes, event times are known or can be assessed (e.g. preterm

birth, spontaneous abortion, clinical preeclampsia); however, other outcomes have

unknown event times and are only diagnosed at birth (e.g. malformations, growth

restriction). Though time‐varying methods are more appropriate than time‐fixed

methods, the validity of these models is dependent on our knowledge of when the

event actually occurred. Where the gestational age of outcome occurrence is unknown,

the bias is impacted by the ratio between the average gestational age at which the

outcome occurred and gestational age at birth. Because the timing of the event can

occur no later than the timing of birth, the ratio is always between 0 and 1. As the ratio

approaches 0, the resulting bias is in the negative direction. Though potentially still

biased, use of time‐varying methods with gestational age at birth as the event time will

more closely approximate the true association as compared to a time‐fixed approach.

Though this paper focused on immediately detectable adverse outcomes such as

preterm birth and low birth weight, the bias discussed in this paper is relevant to studies

of perinatal etiology of later life outcomes. Studies in the area of fetal programming are

also susceptible if exposure is classified as time‐fixed during pregnancy. Such outcomes can be viewed as occurring during pregnancy but not diagnosed until later in life due

to long latency. For example, when assessing the association between prenatal

exposures and asthma in the offspring, it is useful to identify the actual direct effect of

the exposure (e.g. impaired lung development). When viewed from this perspective,

individuals are at risk for impaired lung development due to a prenatal exposure from

week 18 through delivery. The period of time from birth to diagnosis of asthma can be

viewed as a latency period. Thus the time axis from week 18 to birth is critical for the

exposure effect on the outcome while the time from birth to asthma diagnosis

contributes to the sensitivity and specificity of the outcome assessment. For this

reason, the appropriate time axis in an epidemiological analysis should run from week

18 to delivery while age at diagnosis should be considered as a potential confounder.

We have demonstrated that time‐dependent bias is a concern in perinatal

epidemiology. The simplest solution is to use time‐varying methods where timing‐specific data are available. Where timing‐specific data are not available, we

demonstrated that imputation of exposure timing may be a useful tool in obtaining

unbiased effect estimates or performing sensitivity analyses. Ignoring the time‐varying

nature of an exposure is not a viable option unless the exposure is effectively invariant

while subjects are at risk for experiencing the event. Adjusting for gestational age at

delivery is not a suitable alternative to a time‐varying analysis. For time‐varying

exposures, an increased emphasis should be given to obtaining exposure assessments at

multiple time points. In addition to preventing time‐dependent bias, assessing exposure


at multiple time points enables us to detect timing‐specific effects. Whether in

a prospective or retrospective setting, the prospect of obtaining multiple exposure

assessments in a pregnant population may be difficult from the perspective of study

cost and subject burden; however, the potential impact of time‐dependent bias

warrants careful consideration. Perinatal epidemiologists should consider novel

methods such as imputation of exposure timing or more efficient study designs to

feasibly collect time‐varying exposure data. Future research should aim to further test

and develop methods for exposure timing imputation and methods for increasing the

efficiency of exposure assessment.


Appendix: Time‐Dependent Bias of the Odds Ratio

In the context of a case control analysis, the odds ratio can be shown to be biased in the

negative direction by a simple mathematical proof. Suppose a time varying exposure (E)

and case status (C) are independent random variables (i.e. E does not affect the risk of

C). The distribution of time‐varying exposure events occurring over the course of

pregnancy can be thought of as a Poisson process. That is, the number of exposure

events experienced during pregnancy follows a Poisson distribution with a daily rate of

λ.

$$\text{Daily exposure events} \sim \text{Pois}(\lambda \mid C) = \text{Pois}(\lambda \mid \bar{C}) = \text{Pois}(\lambda)$$

When calculating an OR from a 2X2 table or logistic regression, our operational

definition of exposure is “ever exposed” (E ≥ 1) within some time period of a specified

duration (T). For example, we may be interested in assessing exposures from

conception to 28 weeks when looking at stillbirth or from 20 weeks to 37 weeks when

looking at preterm birth. Exposure events over a time interval are distributed as a

Poisson distribution with rate λT. If the outcome of interest is associated with earlier

gestational age at birth, then the exposure event distributions over the specified period

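will not be equivalent ($\text{Pois}(\lambda T_C) \neq \text{Pois}(\lambda T_{\bar{C}})$, where $T_C$ and $T_{\bar{C}}$ denote the time at risk within the specified period for cases and non‐cases). Therefore, whereas daily exposure and the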

outcome are independent by definition, the probability of observing an exposure event

within a specified period is dependent on outcome status such that:

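$$P(E \geq 1 \mid C) = 1 - e^{-\lambda T_C} \qquad P(E \geq 1 \mid \bar{C}) = 1 - e^{-\lambda T_{\bar{C}}}$$

Because cases tend to deliver earlier, $T_C < T_{\bar{C}}$ and therefore $P(E \geq 1 \mid C) < P(E \geq 1 \mid \bar{C})$, so that:

$$OR = \frac{P(E \geq 1 \mid C) \,/\, [1 - P(E \geq 1 \mid C)]}{P(E \geq 1 \mid \bar{C}) \,/\, [1 - P(E \geq 1 \mid \bar{C})]} = \frac{e^{\lambda T_C} - 1}{e^{\lambda T_{\bar{C}}} - 1} < 1$$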
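The inequality can be checked numerically. The Python sketch below evaluates the "ever exposed" odds ratio implied by a Poisson exposure process for cases and non‐cases who differ only in their time at risk within the exposure window. The daily rate and window lengths are hypothetical values chosen for illustration.

```python
import math


def prob_ever_exposed(rate_per_day, days_at_risk):
    """P(at least one exposure event) for a Poisson process observed for days_at_risk days."""
    return 1.0 - math.exp(-rate_per_day * days_at_risk)


def ever_exposed_odds_ratio(rate_per_day, days_cases, days_noncases):
    """Odds ratio for 'ever exposed' when exposure is independent of case status."""
    def odds(p):
        return p / (1.0 - p)
    return odds(prob_ever_exposed(rate_per_day, days_cases)) / odds(
        prob_ever_exposed(rate_per_day, days_noncases))


# Window from week 20 to week 37 (119 days) for non-cases; cases deliver at
# week 33 on average, leaving 91 days at risk. Rate of 0.01 events per day.
or_hat = ever_exposed_odds_ratio(0.01, days_cases=91, days_noncases=119)
print(round(or_hat, 2))  # 0.65
```

Although exposure has no effect on the outcome in this example, the shorter time at risk among cases alone pushes the odds ratio below 1.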


CHAPTER 1 TABLES

Table 1.1: Exposure scenarios utilized in data simulations

| Initiation probability† | Continuity probability† | Proportion exposed | Duration of exposure event (weeks)* | Proportion of time unexposed, p* | Proportion of time unexposed, q* |
|---|---|---|---|---|---|
| Constant at 0.01 | 0.50 | 0.33 | 1.96 | 0.46 | 0.82 |
| Constant at 0.01 | 0.90 | 0.33 | 8.03 | 0.45 | 0.50 |
| Constant at 0.01 | 1.00 | 0.33 | 21.82 | 0.45 | 0.00 |
| ↓ from 0.1 to 0.001 | 0.50 | 0.61 | 2.03 | 0.15 | 0.91 |
| ↓ from 0.1 to 0.001 | 0.90 | 0.61 | 9.95 | 0.15 | 0.66 |
| ↓ from 0.1 to 0.001 | 1.00 | 0.61 | 34.07 | 0.15 | 0.00 |
| ↑ from 0.001 to 0.1 | 0.50 | 0.56 | 1.83 | 0.74 | 0.61 |
| ↑ from 0.001 to 0.1 | 0.90 | 0.56 | 5.48 | 0.74 | 0.27 |
| ↑ from 0.001 to 0.1 | 1.00 | 0.56 | 10.41 | 0.74 | 0.00 |

† Initiation probability: P(Ei=1|Ei‐1=0); continuity probability: P(Ei=1|Ei‐1=1).

p: Time to first exposure divided by total time in follow‐up.

q: Time unexposed after first exposure divided by time after first exposure.

↓: Decreasing over time. ↑: Increasing over time.

* Calculated assuming 40‐week follow‐up. Will vary with differing definitions of follow‐up and magnitude of association between exposure and outcome of interest.


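Table 1.2: Average effect estimates for 1000 simulations of 10,000 pregnancies using time‐invariant and time‐varying methods

| Exposure | Initiation probability† | Continuity probability† | OR, True RR=0.5 | HRINV, True RR=0.5 | HRTV, True RR=0.5 | OR, True RR=1 | HRINV, True RR=1 | HRTV, True RR=1 | OR, True RR=2.0 | HRINV, True RR=2.0 | HRTV, True RR=2.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Any pregnancy | Constant at 0.01 | 0.50 | 0.78 | 0.79 | 0.50 | 1.00 | 1.00 | 1.00 | 1.35 | 1.31 | 2.00 |
| Any pregnancy | Constant at 0.01 | 0.90 | 0.44 | 0.46 | 0.50 | 0.88 | 0.89 | 1.00 | 1.81 | 1.72 | 2.00 |
| Any pregnancy | Constant at 0.01 | 1.00 | 0.42 | 0.44 | 0.50 | 0.85 | 0.86 | 1.00 | 1.81 | 1.67 | 2.00 |
| Any pregnancy | ↓ from 0.1 to 0.001 | 0.50 | 1.55 | 1.51 | 0.50 | 1.89 | 1.81 | 1.00 | 2.80 | 2.56 | 2.00 |
| Any pregnancy | ↓ from 0.1 to 0.001 | 0.90 | 0.59 | 0.58 | 0.50 | 1.08 | 1.07 | 1.00 | 2.33 | 2.15 | 2.00 |
| Any pregnancy | ↓ from 0.1 to 0.001 | 1.00 | 0.50 | 0.50 | 0.50 | 0.99 | 0.99 | 1.00 | 2.00 | 2.00 | 2.00 |
| Any pregnancy | ↑ from 0.001 to 0.1 | 0.50 | 0.48 | 0.49 | 0.50 | 0.56 | 0.59 | 1.00 | 0.81 | 0.82 | 2.00 |
| Any pregnancy | ↑ from 0.001 to 0.1 | 0.90 | 0.27 | 0.27 | 0.50 | 0.48 | 0.50 | 1.00 | 0.99 | 0.92 | 2.00 |
| Any pregnancy | ↑ from 0.001 to 0.1 | 1.00 | 0.26 | 0.26 | 0.50 | 0.47 | 0.49 | 1.00 | 0.98 | 0.97 | 2.00 |
| Any 1st trimester* | Constant at 0.01 | 1.00 | 0.45 | 0.48 | 0.50 | 0.95 | 0.96 | 1.00 | 1.90 | 1.89 | 2.00 |
| Any 2nd trimester* | Constant at 0.01 | 1.00 | 0.45 | 0.48 | 0.50 | 0.94 | 0.95 | 1.00 | 1.93 | 1.89 | 2.00 |
| Any 3rd trimester* | Constant at 0.01 | 1.00 | 0.31 | 0.33 | 0.50 | 0.62 | 0.64 | 1.00 | 1.24 | 1.20 | 2.00 |
| Any 1st trimester* | ↓ from 0.1 to 0.001 | 1.00 | 0.48 | 0.50 | 0.50 | 0.99 | 0.99 | 1.00 | 2.00 | 2.00 | 2.00 |
| Any 2nd trimester* | ↓ from 0.1 to 0.001 | 1.00 | 0.48 | 0.50 | 0.50 | 0.99 | 0.99 | 1.00 | 2.00 | 2.00 | 2.00 |
| Any 3rd trimester* | ↓ from 0.1 to 0.001 | 1.00 | 0.37 | 0.41 | 0.50 | 0.78 | 0.79 | 1.00 | 1.68 | 1.58 | 2.00 |
| Any 1st trimester* | ↑ from 0.001 to 0.1 | 1.00 | 0.33 | 0.37 | 0.50 | 0.70 | 0.72 | 1.00 | 1.54 | 1.46 | 2.00 |
| Any 2nd trimester* | ↑ from 0.001 to 0.1 | 1.00 | 0.33 | 0.37 | 0.50 | 0.69 | 0.71 | 1.00 | 1.53 | 1.45 | 2.00 |
| Any 3rd trimester* | ↑ from 0.001 to 0.1 | 1.00 | 0.19 | 0.22 | 0.50 | 0.39 | 0.41 | 1.00 | 0.81 | 0.81 | 2.00 |

* Joint modeling of binary exposure indicators within each trimester.

† Initiation probability: P(Ei=1|Ei‐1=0); continuity probability: P(Ei=1|Ei‐1=1).

OR: odds ratio (logistic regression). HRINV: time‐invariant hazard ratio. HRTV: time‐varying hazard ratio.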


Table 1.3: Simulation of Average Exposure and Preterm Birth: Average effect estimates* for 1000 simulations of 10,000 pregnancies using time‐invariant and time‐varying methods

| Initiation probability† | Continuity probability† | OR, True RR=0.5 | HRINV, True RR=0.5 | HRTV, True RR=0.5 | OR, True RR=1 | HRINV, True RR=1 | HRTV, True RR=1 | OR, True RR=2.0 | HRINV, True RR=2.0 | HRTV, True RR=2.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| Constant at 0.01 | 0.50 | 0.44 | 0.47 | 0.50 | 1.00 | 1.00 | 1.00 | 2.28 | 2.15 | 2.00 |
| Constant at 0.01 | 0.90 | 0.47 | 0.48 | 0.50 | 0.99 | 1.00 | 1.00 | 2.26 | 2.20 | 2.00 |
| Constant at 0.01 | 1.00 | 0.55 | 0.55 | 0.50 | 0.98 | 0.99 | 1.00 | 2.20 | 1.95 | 2.00 |
| ↓ from 0.1 to 0.001 | 0.50 | 0.53 | 0.54 | 0.50 | 1.18 | 1.17 | 1.00 | 2.76 | 2.44 | 2.00 |
| ↓ from 0.1 to 0.001 | 0.90 | 0.49 | 0.51 | 0.50 | 1.03 | 1.03 | 1.00 | 2.59 | 2.15 | 2.00 |
| ↓ from 0.1 to 0.001 | 1.00 | 0.69 | 0.69 | 0.50 | 0.99 | 1.00 | 1.00 | 2.55 | 2.30 | 2.00 |
| ↑ from 0.001 to 0.1 | 0.50 | 0.23 | 0.24 | 0.50 | 0.59 | 0.60 | 1.00 | 1.47 | 1.92 | 2.00 |
| ↑ from 0.001 to 0.1 | 0.90 | 0.33 | 0.34 | 0.50 | 0.84 | 0.85 | 1.00 | 1.87 | 1.79 | 2.00 |
| ↑ from 0.001 to 0.1 | 1.00 | 0.39 | 0.39 | 0.50 | 0.90 | 0.91 | 1.00 | 1.90 | 1.80 | 2.00 |

* Effect estimates for simple regression assuming a linear dose‐response relationship.

† Initiation probability: P(Ei=1|Ei‐1=0); continuity probability: P(Ei=1|Ei‐1=1).


Table 1.4: Association between prenatal care initiation and preterm birth and low birth weight, 2006 US Natality Data

| Outcome | Prenatal care | OR | HRTI | ORADJ | HRTV | HRImpTV |
|---|---|---|---|---|---|---|
| Preterm birth | No prenatal care | 1.00 (--) | 1.00 (--) | NA | 1.00 (--) | 1.00 (--) |
| Preterm birth | Early | 0.36 (0.35-0.37) | 0.38 (0.38-0.39) | NA | 0.88 (0.88-0.89) | 0.88 (0.88-0.89) |
| Preterm birth | Delayed | 0.40 (0.39-0.41) | 0.43 (0.42-0.44) | NA | 0.99 (0.98-1.00) | 0.99 (0.97-1.01) |
| Low birth weight | No prenatal care | 1.00 (--) | 1.00 (--) | 1.00 (--) | 1.00 (--) | 1.00 (--) |
| Low birth weight | Early | 0.32 (0.32-0.33) | 0.33 (0.32-0.34) | 0.54 (0.52-0.57) | 0.90 (0.89-0.90) | 0.90 (0.89-0.90) |
| Low birth weight | Delayed | 0.34 (0.33-0.25) | 0.35 (0.34-0.36) | 0.55 (0.53-0.57) | 0.96 (0.95-0.97) | 0.97 (0.95-0.98) |

Time‐invariant methods: OR, HRTI, ORADJ. Time‐varying methods: HRTV, HRImpTV.

OR = Time‐invariant odds ratio (logistic regression)

HRTI = Time‐invariant hazard ratio

ORADJ = Time‐invariant odds ratio adjusted for gestational age at birth

HRTV = Time‐varying hazard ratio using observed exposure timing

HRImpTV = Time‐varying hazard ratio using imputed exposure timing


CHAPTER 1 FIGURES

| | D+ | D‐ | Person‐time |
|---|---|---|---|
| E+ | a | b | T+ |
| E‐ | c | d | T‐ |

Figure 1.1: Calculation of the incidence rate ratio from a 2X2 table

| Exposure | Preterm birth: Yes | Preterm birth: No | Time |
|---|---|---|---|
| None | a | b | T0 |
| 1st trimester | c | d | T1 |
| 2nd trimester | e | f | T2 |
| 3rd trimester | g | h | T3 |

Figure 1.2: Calculation of incidence rate ratio of jointly modeled exposures


Figure 1.3: Dose‐response patterns for (a) constant probability of exposure initiation, (b) declining probability of exposure initiation, (c) increasing probability of exposure initiation.

[Panels (a) to (c): odds ratio (y‐axis, log scale) versus dose (x‐axis, 0 to 1) for p(E|E)=0.9 with RR=0.5, RR=1, and RR=2.0.]


Chapter 2: EVALUATING MISSING DATA DESIGNS IN THE PRESENCE OF NON‐DESIGNED MISSING DATA: APPLICATIONS IN PERINATAL EPIDEMIOLOGY


Abstract

The feasibility of designs incorporating repeated exposure measures in pregnancy is

limited due to cost and excessive subject burden within a short duration of time. Prior

methodological work has identified designed missingness as an efficient and valid tool

for longitudinal assessment of outcomes. The goal of this paper is to introduce concepts

of missing data designs for prospective assessment of exposures and to assess

performance of missing data designs in scenarios encountered in observational

epidemiology. Study designs with designed missing data were compared to the

traditional cohort design with intended complete exposure ascertainment. We use

simulated data to quantify bias and relative efficiency under several scenarios

representing a range of non‐designed missing data due to non‐compliance or loss to

follow‐up. We further evaluate the performance of missing data designs using an

observational dataset implementing multiple unique patterns of designed missing data.

We observed that study designs with designed missing data were unbiased relative to

the comparative traditional cohort study. Efficiency of the missing data designs was

dependent on the between‐time correlation of the true exposure, the within‐time

correlation between the proxy exposure and the true exposure, and the proportion of

observations with non‐designed missing data. Missing data designs were more

susceptible to a loss of precision in the presence of non‐designed missing data.


Within the observational dataset, we observed that participant compliance was

strongest among the missing data designs. In conclusion, missing data designs are a

viable option for prospective assessment of exposures. Intensive studies should

consider missing data designs as a means to improve efficiency, reduce subject burden,

and reduce selection bias.


Introduction:

In perinatal epidemiology, exposures of interest are often time‐varying and may have

narrow yet unknown relevant etiologic periods, thus requiring repeated assessments to

be validly measured [2, 3, 48‐50]. The feasibility of designs incorporating repeated

measures in pregnancy is limited due to cost and excessive subject burden within a

short duration of time. Minimizing subject burden in perinatal investigations is

particularly important due to mothers’ increased resistance to invasive and non‐invasive

study methodologies [19, 20]. Recognizing constraints on financial cost and subject

burden, designing studies of repeatedly measured exposures can be presented as a

balancing act between small and thick or large and thin [51]. That is, designs may

sacrifice sample size to maximize exposure data or sacrifice exposure data to maximize

sample size. Without constraints on cost, time, or subject burden, ideal studies would

be large and thick. Because such constraints exist in practice, design methodologies have been developed to

maximize the amount of information collected at a fixed cost (i.e. attempt to approach

the validity of a thick study at the cost of a thin study).

Missing data designs, including partial questionnaire designs [26‐29], multi‐cohort

longitudinal designs [30, 31], and multi‐measurement methods of construct assessment

[28, 30], deliberately introduce data that is either missing at random (MAR) or missing

completely at random (MCAR) so as to validly utilize missing data methods when making

inference. In doing so, they require less intensive follow‐up protocols than a


comparable complete ascertainment study design. Unbiased estimates can be

obtained by ignoring (MCAR), multiply imputing (MCAR and MAR), or by maximum

likelihood based approaches (MCAR and MAR)[28]. In idealized simulation scenarios,

missing data designs have been shown to improve statistical efficiency without

sacrificing validity.
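The distinction between MCAR and MAR drawn above can be illustrated with a small simulation. In the Python sketch below, a hypothetical, fully observed variable X (for example, a proxy exposure) predicts a partially missing variable Y whose true mean is 0. Ignoring missing values recovers the mean when Y is MCAR but not when missingness depends on X (MAR); a single stochastic regression imputation from X, used here as a simplified stand‐in for multiple imputation, removes the MAR bias. All values and names are illustrative.

```python
import random
import statistics

rng = random.Random(42)
n = 20000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.8 * x + 0.6 * rng.gauss(0.0, 1.0) for x in xs]  # Y ~ N(0, 1), corr(X, Y) = 0.8

# Observed-value indicators. MCAR: each Y observed with probability 0.5.
observed_mcar = [rng.random() >= 0.5 for _ in xs]
# MAR: Y is usually missing when X < 0 (80%) and rarely otherwise (10%).
observed_mar = [rng.random() >= (0.8 if x < 0 else 0.1) for x in xs]


def complete_case_mean(values, observed):
    return statistics.fmean([v for v, o in zip(values, observed) if o])


def regression_imputed_mean(xs, ys, observed, rng):
    """Fill missing Y with a prediction from X plus a random residual, then average."""
    cx = [x for x, o in zip(xs, observed) if o]
    cy = [y for y, o in zip(ys, observed) if o]
    mx, my = statistics.fmean(cx), statistics.fmean(cy)
    slope = sum((x - mx) * (y - my) for x, y in zip(cx, cy)) / sum((x - mx) ** 2 for x in cx)
    intercept = my - slope * mx
    resid_sd = statistics.stdev([y - (intercept + slope * x) for x, y in zip(cx, cy)])
    filled = [y if o else intercept + slope * x + rng.gauss(0.0, resid_sd)
              for x, y, o in zip(xs, ys, observed)]
    return statistics.fmean(filled)


cc_mcar = complete_case_mean(ys, observed_mcar)              # close to 0
cc_mar = complete_case_mean(ys, observed_mar)                # biased upward, roughly 0.4
mi_mar = regression_imputed_mean(xs, ys, observed_mar, rng)  # close to 0 again
```

The imputation works under MAR because the regression of Y on X estimated from complete cases is unbiased when missingness depends only on X.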

Recognizing that epidemiologists may be reluctant to rely on missing data methods for

the primary exposure of interest, it is important to note that traditional epidemiological

designs can be conceived as missing data designs in which all designed missing data is

clustered among the unsampled population [52]. Typically, missing data among the

unsampled population is considered MCAR, thus justifying analyses ignoring the missing

data. While the designed missing data in traditional studies is primarily concerned with

cost (e.g. random sampling of source population in a cohort study) and statistical

efficiency (e.g. outcome dependent sampling in a case‐control study), designed missing

data in novel missing data designs has the additional benefit of reducing individual‐level

subject burden while also controlling cost and maximizing statistical efficiency. Through

reducing subject burden, missing data designs may have an added benefit of reducing

selection bias due to self‐selection and loss to follow‐up.

Assessment of associations between time‐varying exposures and adverse pregnancy

outcomes may be well suited for missing data designs because pregnant women are

more reluctant to participate in burdensome studies [20], adequate assessment of many perinatal exposures requires measurements at multiple time points [49, 50], and non‐invasive measures may not adequately characterize the conceptual exposures

experienced by the fetus [53]. Though theoretically appropriate, missing data designs

are rarely used in perinatal investigations, in part due to a general mistrust in missing

data methodologies [32]. Additionally, prior methodological publications have not

assessed performance of missing data designs implemented in the context of time‐varying exposures, nor have missing data designs been evaluated in scenarios with

non‐designed missing data due to non‐compliance and loss to follow‐up. The goals of

this paper are to introduce concepts of missing data designs and to assess the

performance of missing data designs in non‐idealized scenarios encountered in

observational epidemiology by exploring the impact of non‐designed missingness on the

statistical efficiency and validity.


Simulation Methods

To evaluate missing data designs in the presence of non‐designed missingness, we

analyzed simulated data and data from the Health and Nutrition in Pregnancy Study

(NIP). Simulated data of generalized exposure scenarios were created within a source

population representing 100,000 pregnant women. Variable inputs for our simulations

included the total number of assessment time points, distributions of exposures

measured by gold standard or proxy, correlation of exposure between time

points [COR(Gi, Gi+1), where Gi is the gold standard measured at time i], correlation

between exposure methods within time points [COR(Gi, Pi), where P is the proxy

exposure assessment], and the magnitude of association between a binary outcome and

exposure at each time‐point. For analyses presented in this paper, we assumed a study

with 5 designed assessment time points with between‐time and within‐time exposure

correlations ranging from 0.2 to 0.9. Outcomes were simulated such that there was a 2‐

fold increased risk per unit change in exposure at a specific time point.
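One way to generate exposures with the specified correlations is sketched below in Python. The text gives the target correlations but not the correlation structure, so this sketch assumes an exchangeable structure (every pair of time points has the between‐time correlation) built from a shared standard normal factor, a proxy formed as a noisy linear function of the gold standard, and a baseline risk of 0.05 that doubles per unit of the gold standard at one critical time point. These choices and the function names are assumptions.

```python
import math
import random


def simulate_subject(rho_between, rho_within, n_times, rng):
    """Return (gold, proxy) exposure lists for one woman.

    gold_i  = sqrt(rho_between) * Z0 + sqrt(1 - rho_between) * Z_i -> N(0, 1), COR(G_i, G_j) = rho_between
    proxy_i = rho_within * gold_i + sqrt(1 - rho_within**2) * e_i   -> N(0, 1), COR(G_i, P_i) = rho_within
    """
    shared = rng.gauss(0.0, 1.0)
    gold = [math.sqrt(rho_between) * shared + math.sqrt(1 - rho_between) * rng.gauss(0.0, 1.0)
            for _ in range(n_times)]
    proxy = [rho_within * g + math.sqrt(1 - rho_within ** 2) * rng.gauss(0.0, 1.0) for g in gold]
    return gold, proxy


def simulate_outcome(gold, critical_time, rng, baseline_risk=0.05, rr_per_unit=2.0):
    """Binary outcome whose risk doubles per unit of gold-standard exposure at one time point."""
    risk = min(1.0, baseline_risk * rr_per_unit ** gold[critical_time])
    return 1 if rng.random() < risk else 0


rng = random.Random(100)
population = [simulate_subject(0.5, 0.8, 5, rng) for _ in range(100_000)]
outcomes = [simulate_outcome(gold, 2, rng) for gold, _ in population]
```

Scaling both components so each exposure has unit variance keeps the stated correlations equal to the input parameters directly.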

For a study with exposure assessed at 5 time points and where there is an accepted gold

standard and a reasonable proxy, there are essentially an unlimited number of potential

missing data designs. We have selected four candidate designs for comparison (Figure

2.1): a complete ascertainment study in which the gold standard is measured at all time

points for all participants, a two‐stage design in which the proxy is measured at all time

points but the gold standard is only measured at time 1 (t1) and one additional time

point (t2‐t5), a multi‐cohort design in which the gold standard is measured at three time


points, and a complex design utilizing elements of two‐stage and multi‐cohort designs.

Each of the candidate designs includes a common component in which the gold

standard is measured for all subjects at t1. The variable components at t2‐t5 collect

exposure data with less intensity.

We compared the validity and precision of these study designs at a fixed cost. Though

cost may conceptually include factors such as time, subject burden, and effort, we chose

to compare designs at a fixed financial cost. As described by Helms (1992), the cost of a study can be thought of as consisting of a core administrative cost (Admin) that would

be relatively constant between designs, a subject recruitment cost (SRC), and the

subject assessment cost (SAC)[31]. Cost is also impacted by the sample size (N) and the

number of assessments per subject (K). Thus the total cost of a study may be

summarized in a simplified model.

$$\text{Cost} = \text{Admin} + N \times (\text{SRC} + K \times \text{SAC})$$

The cost model assumes that subject recruitment costs and subject assessment costs do

not vary as the sample size or number of assessments per subject changes. If we

further assume that the administrative costs, including initiation of the study,

maintenance of the data, analysis of the data, and production of a manuscript, are

relatively constant between designs, then this cost contributor may be dropped from

the model for comparative purposes.


Using recruitment and assessment costs (gold standard and proxy), we obtained sample

sizes at a fixed total cost for each of the candidate designs by solving for N in our cost

model. For subject recruitment, we assumed a scenario requiring 1 hour of effort at

$20/hr per subject. The cost of a proxy assessment was intended to be representative

of a phone interview with 1 hour of effort at $20/hr. The gold standard was intended to

be representative of routine biospecimen collection and evaluation costing $100 plus an

additional 2 hours of effort at $20/hr. The cost and sample size parameters are

summarized in Table 2.1.
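Solving the simplified cost model for N is a one-line calculation. The sketch below uses the per-unit costs stated in the text ($20 recruitment, $20 per proxy interview, $100 plus 2 hours at $20/hr per gold standard measure), an illustrative $500,000 budget, and hypothetical assessment schedules; it is not intended to reproduce the sample sizes reported in Table 2.1. The function name is hypothetical, and rounding down is a design choice that keeps each design within budget.

```python
import math


def sample_size_at_fixed_cost(budget, recruit_cost, assessments, admin=0.0):
    """Solve Cost = Admin + N * (SRC + sum of K * SAC) for N, rounding down.

    assessments: iterable of (K, cost_per_assessment) pairs, e.g. one pair for
    gold standard measures and one for proxy measures. K may be a fractional
    average when protocols within a design schedule different numbers of measures.
    """
    per_subject = recruit_cost + sum(k * cost for k, cost in assessments)
    return math.floor((budget - admin) / per_subject)


RECRUIT = 1 * 20       # 1 hour of recruitment effort at $20/hr
PROXY = 1 * 20         # 1-hour phone interview at $20/hr
GOLD = 100 + 2 * 20    # biospecimen collection ($100) plus 2 hours at $20/hr

# Hypothetical schedules (Admin dropped, as in the text)
n_all_measures = sample_size_at_fixed_cost(500_000, RECRUIT, [(5, GOLD), (5, PROXY)])
n_two_gold = sample_size_at_fixed_cost(500_000, RECRUIT, [(2, GOLD), (5, PROXY)])
print(n_all_measures, n_two_gold)  # 609 1250
```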

To simulate data collected from each design, we first sampled the specified number of

subjects from the simulated source population. Next we introduced missing data

consistent with the design by deleting exposure values. No values were deleted from

the complete ascertainment design during this stage of the simulation. For the two‐

stage design, sampled participants were randomly assigned 1 time point at which the

gold standard would be measured. For the multi‐cohort design, all proxy measures

were deleted and 2 gold standard assessments were randomly deleted within each

subject. For the complex design, 10% of the population retained complete data on both

the gold standard and proxy exposure; 10% retained all proxy exposures, the baseline

gold standard, and one additional randomly selected gold standard; and 80% retained

the baseline gold standard, the baseline and final proxy, and 1 additional randomly


selected gold standard. We then introduced non‐designed missing data by randomly

deleting 5%, 10%, 20%, 30%, and 40% of designed measures at each time point.
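The designed and non-designed deletion steps above can be sketched with boolean masks (True = measure retained). This is a minimal illustration under several assumptions: five time points; randomly added or deleted gold standard measures are drawn only from t2-t5 so that the t1 common component is always kept; complex-design groups are assigned with probabilities 0.1/0.1/0.8 rather than exact quotas; and non-designed deletion removes a fixed fraction of the scheduled measures at each time point. Function and design names are hypothetical.

```python
import numpy as np

T = 5  # assessment times t1..t5


def _extra_gold(n, k, rng, T=T):
    """Mark k distinct, randomly chosen follow-up times (t2..tT) per subject."""
    mask = np.zeros((n, T), dtype=bool)
    picks = np.argsort(rng.random((n, T - 1)), axis=1)[:, :k] + 1
    np.put_along_axis(mask, picks, True, axis=1)
    return mask


def designed_masks(design, n, rng, T=T):
    """Return (gold, proxy) boolean arrays: True where a measure is scheduled."""
    gold = np.zeros((n, T), dtype=bool)
    proxy = np.zeros((n, T), dtype=bool)
    gold[:, 0] = True                                # common component: gold standard at t1
    if design == "complete":
        gold[:] = True
        proxy[:] = True
    elif design == "multi_measurement":
        gold |= _extra_gold(n, 1, rng, T)            # one randomly assigned follow-up gold
        proxy[:] = True
    elif design == "multi_cohort":
        gold |= _extra_gold(n, (T - 1) - 2, rng, T)  # 2 follow-up gold measures deleted
    elif design == "complex":
        group = rng.choice(3, size=n, p=[0.1, 0.1, 0.8])
        extra = _extra_gold(n, 1, rng, T)
        full, partial, sparse = group == 0, group == 1, group == 2
        gold[full] = True
        proxy[full] = True
        gold[partial] |= extra[partial]
        proxy[partial] = True
        gold[sparse] |= extra[sparse]
        proxy[sparse, 0] = True                      # baseline proxy
        proxy[sparse, T - 1] = True                  # final proxy
    else:
        raise ValueError(f"unknown design: {design}")
    return gold, proxy


def add_nondesigned(mask, p_miss, rng):
    """Delete a fraction p_miss of the scheduled measures at each time point."""
    mask = mask.copy()
    for t in range(mask.shape[1]):
        scheduled = np.flatnonzero(mask[:, t])
        k = int(round(p_miss * scheduled.size))
        if k:
            mask[rng.choice(scheduled, size=k, replace=False), t] = False
    return mask


rng = np.random.default_rng(2011)
gold, proxy = designed_masks("complex", 1000, rng)
gold_obs = add_nondesigned(gold, 0.20, rng)
proxy_obs = add_nondesigned(proxy, 0.20, rng)
# Exposure values would then be set to np.nan wherever *_obs is False.
```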

We used multiple imputation methods to impute designed and non‐designed missing

data simultaneously [46, 54‐57]. Multiple imputation has been presented as a preferred

method for missing data designs because it is easy to use in readily available software

and allows sensitivity analyses where non‐designed missing data are expected to be

MNAR [28]. A Markov Chain Monte Carlo (MCMC) multiple imputation (MI) procedure

was chosen to simultaneously impute missing values for the gold standard at all

time points [54, 57]. The first step of MCMC MI requires initial (prior) estimates of the

multivariate normal means and covariances of the variables included in the

imputation model. To accomplish this, missing values were initially filled in using the

vector of means and the covariance matrix estimated from the observed data with the EM

algorithm. Using these initial parameters, missing data were updated (imputed) by

sampling values from the predictive distribution. Posterior parameters for the

multivariate normal means and covariance matrix were then estimated using the

observed and imputed values for all variables in the imputation model. These two steps

were iterated until the vector of means and the covariance matrix were stable between

consecutive iterations, and the entire process was repeated to produce

10 complete datasets. All analyses and imputations were performed using SAS version

9.2 (SAS Institute, Cary, NC).
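The MCMC procedure described above corresponds to the data augmentation algorithm for a multivariate normal model, alternating an imputation step (draw missing values given the current parameters) and a posterior step (draw parameters given the completed data). The sketch below is not the SAS implementation: it assumes a noninformative (Jeffreys) prior, starts from mean-filled data instead of EM estimates, and uses a fixed burn-in and thinning interval in place of a convergence check. Function names and the simulated data are hypothetical.

```python
import numpy as np


def _i_step(Y, miss, mu, sigma, rng):
    """Imputation step: redraw each row's missing entries from N(mu, sigma)
    conditional on that row's observed entries."""
    Y = Y.copy()
    for i in np.flatnonzero(miss.any(axis=1)):
        m, o = miss[i], ~miss[i]
        if o.any():
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            beta = np.linalg.solve(s_oo, s_mo.T).T           # S_mo S_oo^{-1}
            cond_mu = mu[m] + beta @ (Y[i, o] - mu[o])
            cond_cov = sigma[np.ix_(m, m)] - beta @ s_mo.T
        else:                                                # row entirely missing
            cond_mu, cond_cov = mu, sigma
        cond_cov = (cond_cov + cond_cov.T) / 2               # guard against round-off asymmetry
        Y[i, m] = rng.multivariate_normal(cond_mu, cond_cov)
    return Y


def _p_step(Y, rng):
    """Posterior step under a Jeffreys prior: Sigma ~ InvWishart(n - 1, S),
    then mu | Sigma ~ N(ybar, Sigma / n)."""
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    centered = Y - ybar
    S = centered.T @ centered
    # Sigma^{-1} ~ Wishart(n - 1, S^{-1}): sum of n - 1 outer products of N(0, S^{-1}) draws
    L = np.linalg.cholesky(np.linalg.inv(S))
    Z = rng.standard_normal((n - 1, p)) @ L.T
    sigma = np.linalg.inv(Z.T @ Z)
    mu = rng.multivariate_normal(ybar, sigma / n)
    return mu, sigma


def mcmc_impute(Y, m=10, burn_in=200, thin=100, seed=None):
    """Return m completed copies of Y (np.nan marks missing values)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(Y)
    Yc = np.where(miss, np.nanmean(Y, axis=0), Y)        # mean-filled starting data
    mu, sigma = Yc.mean(axis=0), np.cov(Yc, rowvar=False)
    completed = []
    for it in range(1, burn_in + m * thin + 1):
        Yc = _i_step(Yc, miss, mu, sigma, rng)
        mu, sigma = _p_step(Yc, rng)
        if it > burn_in and (it - burn_in) % thin == 0:
            completed.append(Yc.copy())
    return completed


# Hypothetical example: three correlated exposure measures, about 20% missing
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]])
Y_full = rng.multivariate_normal(np.zeros(3), cov, size=200)
Y_obs = np.where(rng.random(Y_full.shape) < 0.2, np.nan, Y_full)
datasets = mcmc_impute(Y_obs, m=3, burn_in=20, thin=5, seed=2)
```

Each completed dataset would then be analyzed separately and the estimates combined with Rubin's rules; the small burn-in and thinning values here only keep the example fast.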


Timing‐specific measures of association were quantified using logistic regression. We

report bias and relative efficiency separately for the common component and the

variable components. Bias was quantified as the percent difference between the

specified and observed effect estimates. We evaluated the efficiency of the study

designs by calculating the ratio of the standard error from the complete ascertainment

design, subjected to the same degree of non‐designed missingness, to the standard error

from the missing data design. Relative efficiencies greater than 1 represent

scenarios in which the missing data design was more efficient than the complete

ascertainment design.
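The two performance summaries can be written as small functions. This sketch assumes bias is computed on the odds ratio scale against a reference (specified) value, and that relative efficiency is the ratio of mean model-based standard errors across simulation replicates, oriented so that values above 1 favor the missing data design. The function names and replicate values are hypothetical.

```python
import numpy as np


def percent_bias(estimates, reference):
    """Percent difference between the mean estimate across replicates and the
    reference (specified) value, e.g. odds ratios."""
    return 100.0 * (np.mean(estimates) - reference) / reference


def relative_efficiency(se_complete, se_missing_design):
    """Mean SE from the complete ascertainment design divided by mean SE from the
    missing data design; values above 1 favor the missing data design."""
    return np.mean(se_complete) / np.mean(se_missing_design)


# Hypothetical replicate results for one scenario
or_hat = [1.48, 1.55, 1.47, 1.50]
bias = percent_bias(or_hat, 1.5)
re = relative_efficiency([0.35, 0.36], [0.20, 0.21])
```

An alternative is to use the empirical standard deviation of the replicate estimates in place of the mean model-based standard error.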

Observational Data Methods

The Health and Nutrition in Pregnancy Study enrolled 2478 women for the purpose of

assessing the association between dietary factors and adverse pregnancy outcomes

[58]. Women were recruited from 1996 through 2000 from 56 obstetrical practices and

15 clinics associated with six hospitals in Connecticut and Massachusetts. Eligible

women had a gestational age less than 24 weeks at enrollment and had no prior history

of diabetes. Women were assigned to one of three screening protocols sharing a

common component including a baseline survey and urine sample and a postpartum

survey: 1) telephone group consisting of one telephone follow‐up interview at 20, 28, or

36 weeks; 2) biomonitoring group consisting of three telephone follow‐up interviews at

20, 28, and 36 weeks in conjunction with an additional urinary sample at one of the

follow‐up times; or 3) intensive monitoring group consisting of three in‐person


interviews and three urinary samples at 20, 28, and 36 weeks. For our purposes, each

protocol (intensive, telephone, and biomonitor) and the overall NIP design were

considered representative of a unique missing data design with the intensive group

serving as the traditional complete ascertainment cohort design. We evaluated the

performance of these designs by quantifying the magnitude and precision of the

association between maternal smoking and birth weight using data from the Health and

Nutrition in Pregnancy Study. We had insufficient numbers to evaluate other clinically

relevant outcomes such as small for gestational age.

As with our simulated data, we compared the efficiency of the missing data designs

(telephone, biomonitor, NIP) relative to the complete ascertainment design (intensive

group) at a fixed cost. Based on observable covariates, assignment to study protocol

was not completely at random. This non‐random assignment would produce a selection

bias such that intra‐study comparisons would not be valid. To remove the selection bias,

we employed a cluster sampling approach within each design. Sampling weights were

obtained using the predicted probability (P_predict) of being assigned to the intensive protocol,

as quantified using a logistic regression with baseline covariates. Sampling weights were

assigned as 1 ‐ P_predict for those assigned to the intensive group and P_predict for those in the

telephone and biomonitoring groups. We then sampled from each protocol according to

the sampling weights such that the total cost and the probability of being assigned to

the intensive group within each missing data design were the same. The distributions for

the predicted probabilities for being assigned to the intensive protocol are presented in


Figure 2.2. We evaluated the distribution of baseline characteristics to confirm

comparability between designs after weighting (Table 2.2). Prior to propensity

weighting, those assigned to the intensive protocol were more likely to be married,

white, more highly educated, and recruited in the first trimester. After weighting, no notable

differences were observed.
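The weighting and sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the logistic regression is fitted with a small Newton-Raphson routine written in numpy rather than with the software used in the study, the covariates and protocol assignments are simulated, and all function names are hypothetical. Weights follow the text (1 - P_predict for the intensive group, P_predict otherwise); in practice the number of subjects drawn from each protocol would come from the fixed-cost constraint.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; returns [intercept, slopes...]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                            # score vector
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])      # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_proba(beta, X):
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def weighted_subsample(idx, weights, size, rng):
    """Draw `size` subjects without replacement, with selection probability
    proportional to their sampling weights."""
    return rng.choice(idx, size=size, replace=False, p=weights / weights.sum())


rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 2))                       # hypothetical baseline covariates
true_beta = np.array([-1.5, 0.8, -0.5])           # hypothetical assignment model
intensive = rng.random(n) < predict_proba(true_beta, X)

beta_hat = fit_logistic(X, intensive.astype(float))
p_predict = predict_proba(beta_hat, X)
weights = np.where(intensive, 1 - p_predict, p_predict)

idx_intensive = np.flatnonzero(intensive)
chosen = weighted_subsample(idx_intensive, weights[idx_intensive], size=50, rng=rng)
```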

To ensure that individuals had an opportunity to complete assessments at all designed

time points, this analysis was restricted to women recruited prior to week 20 with

deliveries occurring after 36 weeks gestation. Small for gestational age was defined as a

birth weight below the 10th percentile for gestational age, gender, and ethnicity using an

external standard developed from 1999 US Natality Data. The primary exposure of

interest was urinary cotinine. Participants were provided with urine containers and

requested to collect urine between dinner and bedtime on the night prior to the

interview. Samples were collected the following day and were analyzed for urinary

cotinine by Labstat, Inc. (Kitchener, Ontario, Canada). As a proxy measure, women were

asked to report the number of cigarettes smoked per day during the previous week at weeks

20, 28, and 36. Respondents were asked to recall their first trimester exposures during

their baseline interview.

Within each study design, we assessed compliance with both proxy and gold standard

measures by calculating the total percentage of designed measures that were missing.

The association between urinary cotinine and birth weight was quantified using


multivariable linear regression adjusting for maternal age, race, and gestational age. Cotinine

measures from all time points were entered simultaneously in the regression models, and variance inflation

factors (VIFs) were generated to assess the presence of multicollinearity (VIF > 2). Standard errors

for observed associations were compared across designs.
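The VIF screen can be computed directly: each cotinine measure is regressed on the others and the VIF is 1 / (1 - R^2). The sketch below uses numpy least squares and simulated, hypothetical exposure columns; only the VIF > 2 threshold comes from the text.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column: 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(3)
base = rng.normal(size=2000)
cotinine = np.column_stack([base,                                # hypothetical time 1
                            base + 0.3 * rng.normal(size=2000),  # strongly correlated time 2
                            rng.normal(size=2000)])              # unrelated time 3
flags = vif(cotinine) > 2   # True marks measures failing the VIF > 2 screen
```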


Simulation Results

We did not observe any bias due to designed missing data in any of the assessed

scenarios. In both the common component and the variable component, the relative

efficiency of the missing data designs declined as the prevalence of non‐designed

missing data increased (Table 2.3). For the common component, missing data designs

were more efficient than complete ascertainment designs subjected to the same

probability of non‐designed missing data. For the variable components, each of the

missing data designs was more efficient where correlation was high and there was no

non‐designed missing data. The multi‐measurement design remained more efficient

even when the within‐time correlation was low and the probability of non‐designed

missing data was high. For the multi‐cohort design, there was no efficiency advantage

where the prevalence of non‐designed missing data was 10% or above or where the

between‐time correlation was below 0.9. The complex missing data design realized

increased efficiency only where the between‐time and within‐time correlations were high

(Cor=0.9) and non‐designed missingness was low (Pmiss≤0.1).


Observational Data Results

Using the observational data obtained from the Health and Nutrition in Pregnancy

Study, we observed that compliance with the study assessments was inversely related

to the demands of the protocol (Table 2.4). Non‐designed missingness was high

for both proxy (12%‐23%) and gold standard (16%‐38%) measures; the intensive group

had the highest percentage of non‐designed missing data for the gold standard urinary

measures. Similarly, the biomonitoring and intensive groups had the highest

percentage of non‐designed missing data for the self‐reported smoking measures.

When quantifying the association between urinary cotinine and birth weight, missing

data designs tended to be more efficient. However, the associations observed in the

missing data designs were somewhat attenuated relative to the complete ascertainment

design. Due to the small sample size within each study protocol used in this analysis,

associations between cotinine and birth weight did not achieve statistical significance

with the exception of first trimester exposures assessed using the complex NIP design.

Similar findings were observed with respect to self‐reported smoking.


Discussion

This paper is unique in that we evaluated missing data designs in the presence of non‐

designed missing data using both simulated and observational data. We have

demonstrated that missing data designs have the potential to increase study efficiency

and reduce subject burden while still obtaining valid results; however, investigators

must carefully consider the expected prevalence of non‐designed missing data resulting

from non‐compliance and loss to follow‐up. The missing data designs evaluated in this

paper were more susceptible to loss of precision in the presence of non‐designed

missing data. The relative efficiency was dependent on the pattern of designed

missingness, joint exposure distributions (gold standard and proxy), and prevalence of

non‐designed missingness. In addition to increased efficiency, missing data designs may

increase validity by increasing sample size and reducing selection bias at enrollment and

due to subject fatigue (non‐compliance and loss to follow‐up) by minimizing the burden

on study participants. In our example using data from the Nutrition in Pregnancy Study,

the probability of non‐compliance with the study protocol for the gold standard

assessment was much greater in the complete ascertainment cohort (Pmiss = 0.38) as

compared to the missing data designs (Pmiss = 0.16 to 0.24).

Missing data designs are often viewed as methods of introducing missingness into a

study design so as to decrease cost; however, this incorrectly ignores designed

missingness present in traditional study designs (i.e., missingness clustered in the

unsampled population). A more accurate description would be that missing data


designs attempt to optimize the distribution of missing data in a design so as to

maximize statistical and/or cost efficiency. Morara et al. address missing data

optimization in longitudinal designs by defining a function for the probability

of being sampled at a particular stage and time given cost, between‐time

correlations, and exposure ascertainment correlations. Using this equation, it can be

shown that complete ascertainment cohort and case‐control studies are specific types

of missing data designs. Thus the primary difference between complete ascertainment

and missing data designs is the distribution, rather than presence, of missing data.

In this paper, we have focused on missing data designs with direct applications to

repeated measures in perinatal epidemiology; however, the partial questionnaire design

warrants some discussion as it was the predecessor to the multi‐cohort methods.

Several partial questionnaire designs have been developed to address survey fatigue

[28, 29, 59‐61]. The three‐form design, a specific type of partial questionnaire design, divides

the parent survey into 4 components, a common component (X) and three variant

components (A, B, C) [28, 29]. Three different surveys are then developed by combining

the common component with two of the variant components (XAB, XAC, XBC). Study

respondents are then randomly assigned one of the survey instruments, but no single

respondent answers all potential survey questions (XABC). Thus, missingness of survey

responses is expected to be MCAR through randomization. Using this method,

correlations can be obtained between all variables included in the parent survey even

though each individual answers only a subset of the questions. If the missing data are


correlated with observed data, imputation or maximum likelihood methods can be used

to obtain valid estimates of associations while increasing precision of the estimates as

compared to complete case or available case analyses.
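To illustrate why the three-form design still supports estimation of all pairwise correlations, the sketch below randomly assigns hypothetical respondents to forms XAB, XAC, and XBC and reports, for every pair of blocks, the proportion of respondents who answer both. Block labels follow the text; the function names are hypothetical.

```python
import itertools

import numpy as np

BLOCKS = ("X", "A", "B", "C")
FORMS = {"XAB": {"X", "A", "B"}, "XAC": {"X", "A", "C"}, "XBC": {"X", "B", "C"}}


def assign_forms(n, rng):
    """Randomly assign each respondent one of the three forms (MCAR by design)."""
    names = list(FORMS)
    return [names[i] for i in rng.integers(0, len(names), size=n)]


def pairwise_coverage(forms):
    """Proportion of respondents answering both blocks, for every pair of blocks."""
    coverage = {}
    for b1, b2 in itertools.combinations_with_replacement(BLOCKS, 2):
        coverage[(b1, b2)] = float(np.mean([b1 in FORMS[f] and b2 in FORMS[f] for f in forms]))
    return coverage


rng = np.random.default_rng(4)
forms = assign_forms(3000, rng)
coverage = pairwise_coverage(forms)
```

Each pair of variant blocks is jointly observed for roughly one third of respondents, and the common block together with any variant block for roughly two thirds, while no respondent completes XABC.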

The multi‐cohort design is similar to the partial questionnaire design, the difference

being that the multi‐cohort design randomizes subjects in a longitudinal study to have missing

values at some of the follow‐up times. The multi‐cohort design was initially described

by Helms (1992) in the context of evaluating the efficacy of a weight loss treatment [31].

In the example provided by Helms, subjects could be assigned to one of three follow‐up

protocols. Assuming high correlations between consecutive assessment intervals

(Cor > 0.7), little information was lost by decreasing the proportion of subjects in protocol 1

and increasing the proportion in protocol 2 or protocol 3. Similar methods can be applied in

perinatal investigations of time‐varying exposures provided some degree of temporal

correlation is expected. Exposures such as diet, physical activity, air pollution, and

drug/alcohol use are potential exposures for which this design may work well.

Multi‐measurement methods of construct assessment quantify exposure using one or

more measures at each time point. The best‐known multi‐measurement method

is the two‐stage design [62, 63], though this design is typically not implemented in a

longitudinal setting. The two‐stage method of construct measurement is appropriate

where the gold standard of exposure ascertainment is relatively expensive and there is

an affordable and reasonably valid proxy measure [28]. In a longitudinal two‐stage


design, proxy exposure measures could be ascertained at all time points for all

participants. Exposure ascertainment by gold standard could then be randomized to

individuals within the study sample and time points within an individual. The intent of

the two‐stage design is to obtain more power than is attainable with the gold standard

alone and more validity than is attainable with the proxy alone [28]. While the multi‐

measurement missing data design is comparable to a two‐stage validation study,

the two differ largely in how the data are analyzed. Typically, designs

incorporating validation studies either proceed with analysis using an uncorrected proxy

(if the proxy is deemed to be sufficiently valid) or analyses are corrected using a

regression calibration. Analyzing as a missing data design offers some advantages over

regression calibration. Most notably, the missing data design imputes values only for

individuals and time points with missing data, while regression calibration adjusts all

observations with respect to their measured proxy even when the gold standard is

observed. Examples of perinatal exposures suitable for this design include dietary

factors, medication use, drug use, and environmental or occupational exposures.
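The contrast between regression calibration and an imputation-based analysis can be sketched for a single time point. This is a simplified illustration, assuming a linear calibration model fitted where the gold standard is observed and a single stochastic imputation with fixed parameters; a proper multiple imputation would also draw the parameters and repeat the step m times. The function name and simulated data are hypothetical.

```python
import numpy as np


def calibrate_and_impute(gold, proxy, rng):
    """gold: gold standard values with np.nan where not measured; proxy: fully observed.
    Returns (regression-calibrated values, singly imputed values)."""
    obs = ~np.isnan(gold)
    slope, intercept = np.polyfit(proxy[obs], gold[obs], 1)
    pred = intercept + slope * proxy
    resid_sd = np.std(gold[obs] - pred[obs], ddof=2)
    rc = pred                                    # every subject replaced by E[gold | proxy]
    imp = gold.copy()                            # observed gold values kept as measured
    imp[~obs] = pred[~obs] + rng.normal(0.0, resid_sd, size=(~obs).sum())
    return rc, imp


rng = np.random.default_rng(5)
truth = rng.normal(size=400)
proxy = truth + rng.normal(scale=0.5, size=400)           # hypothetical proxy measure
gold = np.where(rng.random(400) < 0.7, np.nan, truth)     # about 30% validated
rc, imp = calibrate_and_impute(gold, proxy, rng)
```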

Methods for power and sample size calculations for missing data designs have been

developed; however, current methods do not allow for time‐varying exposures or non‐

designed missing data [30, 64]. It was not within the scope of this study to develop

methods for power and sample size calculations; however, through data simulation,

investigators can identify relative efficiency and susceptibility to bias of candidate study

designs. Though potentially burdensome, this would be a valuable exercise given the


potential for increased efficiency, reduced bias, reduced non‐compliance and loss to

follow‐up, and reduced subject burden. Future work should be targeted towards

developing software to identify the optimum distribution of designed missing data in

studies with time‐varying exposures and non‐designed missing data.

Complete ascertainment was obtained on a subset of the data for each of the missing

data designs assessed in this study. Each study design attempted to assess baseline

exposure on all subjects. For this reason, baseline associations between urinary cotinine

and birth weight were consistently more precisely estimated when using missing data designs.

By incorporating a common component between study protocols within a design,

missing data designs allow for the estimation of some measures of association in the

absence of designed missing data. While designed missingness will not introduce bias

on average and may increase efficiency, it is preferable to minimize missing data in

variables required to test the primary hypothesis by distributing the missing data among

ancillary variables and time‐points. Where prior evidence indicates a specific relevant

etiologic period of interest, the common component of the missing data designs should

include detailed and complete assessment within the period of interest. For example,

when assessing the association between smoking and growth restriction, we may design

a study with a common component assessing detailed exposure in the third trimester

and variable components assessing exposure in the first and early second trimesters

based on prior evidence. The intensive protocol of the NIP study was included to have

complete data on a 10% sample of the study population. Missing data designs


incorporating a complete ascertainment protocol are better equipped to estimate

partial correlations or interactions involving three or more covariates/time points. In

addition to directly assessing these interactions, this feature enables more precise

imputations and increases the statistical efficiency of the study design.

With the advent of readily available software for multiple imputation and maximum

likelihood analyses, missing data designs offer a viable option for increasing precision

and decreasing subject burden without sacrificing validity. The relative efficiency of

missing data designs is highly sensitive to the distribution of exposure variables, costs

associated with recruitment and assessments, and the presence of non‐designed

missing data. Methods for power and sample size calculations and missing data

optimization need to be developed; however, data simulations can adequately compare

candidate study designs under variable conditions of non‐designed missingness.

Missing data designs may be particularly useful to minimize subject burden in studies of

vulnerable populations such as pregnant women and newborns.


CHAPTER 2 TABLES

Design | Sample Size | Recruitment ($20/Subject) | Gold Standard ($140/Measure) | Proxy Measure ($20/Measure) | Total Cost
Complete Ascertainment | 543 | 10,870 | 391,304 | | 500,000
Multi-Measurement | 1042 | 20,833 | 375,000 | 104,167 | 500,000
Multi-Cohort | 893 | 17,857 | 482,143 | | 500,000
Complex | 1397 | 27,933 | 377,095 | 94,972 | 500,000

Table 2.1: Sample size and cost parameters used for data simulations


Characteristic | Pre-Weighting: Intensive (n=100) | Telephone (n=315) | Biomonitor (n=178) | Weighted*: Intensive (n=100) | Telephone (n=315) | Biomonitor (n=178)
Maternal Age | | | | | |
<25 | 18% | 21% | 25% | 23% | 22% | 26%
25-29 | 27% | 25% | 25% | 29% | 27% | 22%
30-34 | 35% | 35% | 32% | 28% | 32% | 33%
>35 | 20% | 19% | 19% | 20% | 19% | 19%
Marital Status | | | | | |
Married | 75% | 71% | 63% | 73% | 69% | 66%
Never Married | 21% | 24% | 32% | 21% | 25% | 30%
Divorced/Separated/Widowed | 4% | 5% | 4% | 6% | 6% | 4%
Ethnicity | | | | | |
White | 72% | 70% | 60% | 68% | 68% | 64%
Black | 5% | 9% | 8% | 10% | 9% | 7%
Hispanic | 22% | 19% | 28% | 19% | 21% | 26%
Other | 2% | 2% | 4% | 3% | 3% | 3%
Education | | | | | |
<HS | 14% | 13% | 19% | 17% | 17% | 17%
HS | 18% | 18% | 16% | 17% | 17% | 19%
Some College | 19% | 22% | 27% | 18% | 18% | 18%
College | 21% | 25% | 24% | 32% | 31% | 28%
College+ | 28% | 21% | 13% | 16% | 16% | 18%
GA at 1st Interview | | | | | |
1st trimester | 86% | 26% | 23% | 58% | 57% | 60%
2nd trimester | 14% | 74% | 77% | 42% | 43% | 40%

Table 2.2: Characteristics of study participants within protocols of the Nutrition in Pregnancy Study prior to and after weighting

*Inverse probability of being assigned to the intensive protocol used as sampling weight to remove selection bias between protocols


Design | Correlation | Common Component†: Pmiss = 0 | Pmiss = 0.1 | Pmiss = 0.3 | Variable Components*: Pmiss = 0 | Pmiss = 0.1 | Pmiss = 0.3
Multi-Measurement | Cor(Ei, Pi)=0.9 | 0 (1.75) | 0 (1.75) | 0 (1.75) | 0 (1.69) | 0 (1.64) | 0 (1.56)
 | Cor(Ei, Pi)=0.6 | 0 (1.75) | 0 (1.75) | 0 (1.75) | 0 (1.37) | 0 (1.35) | 0 (1.28)
 | Cor(Ei, Pi)=0.2 | 0 (1.75) | 0 (1.74) | 0 (1.74) | 0 (1.25) | 0 (1.21) | 0 (1.10)
Multi-Cohort | Cor(Ei, Ei+1)=0.9 | 0 (1.28) | 0 (1.25) | 0 (1.20) | 0 (1.01) | 0 (0.93) | 0 (0.87)
 | Cor(Ei, Ei+1)=0.6 | 0 (1.28) | 0 (1.25) | 0 (1.20) | 0 (0.99) | 0 (0.90) | 0 (0.87)
 | Cor(Ei, Ei+1)=0.2 | 0 (1.28) | 0 (1.28) | 0 (1.24) | 0 (0.98) | 0 (0.89) | 0 (0.92)
Complex | Cor(Ei, Ei+1)=0.9 & Cor(Ei, Pi)=0.9 | 0 (1.60) | 0 (1.58) | 0 (1.56) | 0 (1.15) | 0 (1.13) | 0 (1.06)
 | Cor(Ei, Ei+1)=0.6 & Cor(Ei, Pi)=0.6 | 0 (1.60) | 0 (1.56) | 0 (1.52) | 0 (1.06) | 0 (0.84) | 0 (0.81)
 | Cor(Ei, Ei+1)=0.2 & Cor(Ei, Pi)=0.2 | 0 (1.60) | 0 (1.59) | 0 (1.53) | 0 (0.83) | 0 (0.76) | 0 (0.80)

Table 2.3: Bias and relative efficiency of missing data designs (MDD) in the presence of non-designed missing data relative to complete ascertainment designs (CAD), 1000 data simulations

Cells show Bias (Relative Efficiency); Pmiss: probability of non-designed missingness. Bias: percent difference between CAD and MDD odds ratios. Relative Efficiency: standard error from CAD divided by standard error from MDD. †Bias and relative efficiency for t1. *Average bias and relative efficiency for t2-t5.


Measure | Complete Ascertainment: Intensive | Missing Data Designs: Biomonitoring | Telephone | NIP
Sample Size | 100 | 178 | 315 | 244
Proxy (Interview Data) | | | |
Prenatal Interviews/Subject | 4 | 4 | 2 | 2.4
# Designed Prenatal Interviews | 400 | 712 | 630 | 585.6
# Observed Prenatal Interviews | 318 | 549 | 557 | 498
% Non-Designed Missing | 21% | 23% | 12% | 15%
Gold Standard (Urinary Cotinine) | | | |
Prenatal Samples/Subject | 4 | 2 | 1 | 1.4
# Designed Prenatal Samples | 400 | 356 | 315 | 342
# Observed Prenatal Samples | 250 | 271 | 266 | 266
% Non-Designed Missing | 38% | 24% | 16% | 22%

Table 2.4: Cost-fixed sample size and compliance within study protocols among participants in the Nutrition in Pregnancy Study


Birth Weight | Complete Ascertainment: Intensive Δ (95% CI) | Missing Data Designs: Telephone Δ (95% CI) | Rel Eff | Biomonitor Δ (95% CI) | Rel Eff | NIP Δ (95% CI) | Rel Eff
Urinary Cotinine | | | | | | |
1st Trimester | -12.52 (-43.27, 18.23) | -16.71 (-50.63, 17.21) | 1.13 | -4.15 (-23.45, 15.16) | 1.59 | -17.20 (-31.93, -2.47) | 2.09
Week 20 | -21.31 (-55.16, 12.55) | -10.01 (-45.37, 25.34) | 0.96 | -7.50 (-74.59, 59.59) | 0.50 | -3.35 (-10.90, 4.20) | 4.49
Week 28 | -29.49 (-118.38, 59.39) | -7.42 (-38.79, 23.96) | 2.33 | -5.19 (-38.36, 27.99) | 2.68 | -3.14 (-23.30, 17.02) | 4.41
Week 36 | -57.11 (-152.63, 38.40) | -11.32 (-51.28, 28.65) | 2.54 | 0.70 (-24.02, 25.41) | 3.86 | -11.06 (-29.78, 7.65) | 5.10
Reported Smoking | | | | | | |
1st Trimester | -18.98 (-57.36, 19.39) | -14.57 (-29.79, 0.65) | 2.52 | -18.98 (-57.36, 19.39) | 1.00 | -25.61 (-39.71, -11.52) | 2.72
Week 20 | -9.22 (-43.06, 24.62) | -7.52 (-27.92, 12.88) | 1.66 | -18.14 (-57.10, 20.82) | 0.87 | -6.76 (-36.59, 23.06) | 1.13
Week 28 | -28.69 (-101.94, 44.56) | -11.90 (-32.89, 9.09) | 3.49 | -30.93 (-110.06, 48.20) | 0.93 | -6.74 (-28.59, 15.10) | 3.35
Week 36 | -25.29 (-126.98, 76.39) | -7.43 (-22.76, 7.90) | 6.63 | -1.39 (-99.00, 96.23) | 1.04 | -14.18 (-37.50, 9.15) | 4.36

Table 2.5: Association between measures of smoking and birth weight by week of pregnancy and study design among participants in the Nutrition in Pregnancy Study.

Δ: Difference in birth weight in grams per unit increase in exposure (per 50 ng of urinary cotinine or per cigarette of reported smoking), adjusted for maternal age, race, and gestation at delivery. Rel Eff: relative efficiency, defined as the standard error from CAD divided by the standard error from MDD.


CHAPTER 2 FIGURES

Figure 2.1: Candidate missing data designs

[Figure: four panels (Complete Ascertainment, Multi-Measurement, Multi-Cohort, Complex) showing the measures scheduled at time points t1 through t5; the legend distinguishes gold standard measures, proxy measures, and the unsampled population.]


Figure 2.2: Distribution of the predicted probability of being assigned to the intensive protocol prior to weighting (a) and after weighting (b).


Chapter 3: TIME DEPENDENT ASSOCIATIONS BETWEEN MATERNAL CAFFEINE

CONSUMPTION AND INTRAUTERINE GROWTH RESTRICTION


Abstract

Understanding the association between maternal caffeine consumption and intrauterine

growth restriction (IUGR) is of high public health importance given the prevalence of

caffeinated beverage consumption during pregnancy and the serious risks associated

with IUGR. We quantify timing‐dependent associations between maternal

caffeine consumption and measures of fetal growth and evaluate for effect measure

modification by acetaminophen within a cohort of 2277 full term singleton pregnancies

recruited from 1996‐2000 in Connecticut and Massachusetts. Caffeine measures were

assessed in the first trimester, second trimester (week 20), early third trimester (week

28), and mid third trimester (week 36). Assessed fetal growth measures included IUGR

and birth weight. Associations between caffeine and measures of fetal growth were

quantified using inverse probability of treatment weighted logistic regression. Effect

measure modification on the additive scale was quantified using the relative excess risk

due to interaction. We observed significantly increased odds of IUGR for caffeine

exposures occurring in the first trimester (OR per 100 mg = 1.15, 95% CI: 1.04‐1.28) and

mid‐third trimester (OR per 100 mg = 1.16, 95% CI: 1.00‐1.35). We did not observe a

statistically significant departure from additivity due to acetaminophen use. In

conclusion, we observed increased risks associated with even moderate levels of

caffeine consumption (60mg‐170mg) for exposures occurring in the first trimester and

mid third trimester. Subsequent analyses should assess whether decreasing caffeine

intake is associated with improved reproductive outcomes.


Introduction:

Intrauterine growth restriction (IUGR) is an etiologically diverse adverse outcome

attributed to fetal, placental, and maternal characteristics [65]. Neonates identified as

growth restricted have an increased risk of perinatal mortality and morbidity. More

recently, IUGR has been identified as a risk factor for later‐life conditions such as

hypertension, diabetes, and impaired cognitive functioning. Caffeine, a fairly ubiquitous exposure

during pregnancy, may contribute to IUGR through inhibition of trophoblast mRNA

during placental development [66] or through decreased uteroplacental and

fetoplacental blood flow during rapid fetal growth [67]. Understanding the association

between maternal caffeine consumption and IUGR is of high public health importance

given the prevalence of caffeinated beverage consumption during pregnancy and the

serious risks associated with IUGR.

Despite several prior investigations, the relationship between caffeine and IUGR

remains undetermined [40, 58, 68]. Assessing the association between caffeine and

pregnancy outcomes is complicated by changes in caffeine metabolism associated with

pregnancy progression, co‐dependency between pregnancy symptoms and caffeinated

beverage intake, and potential interactions with other chemical exposures. Prior

investigations have assessed for potential effect measure modification by smoking

status [58] and gender [69]. Acetaminophen, a common exposure in pregnancy [70],

has not been assessed as a potential effect modifier in the association between caffeine

and IUGR. Acetaminophen and caffeine are metabolized by CYP1A2 and the


combination of caffeine and acetaminophen has been demonstrated to retard fetal

growth in experimental rodent studies [71, 72].

In the present paper, we quantify time‐dependent associations between caffeine

exposure and measures of fetal growth. Based on the proposed biological mechanisms,

we hypothesize associations between caffeine and IUGR will be most pronounced for

exposures during the first trimester (placentation) and mid‐third trimester (rapid fetal

growth). In addition, we evaluate time‐dependent interactions between caffeine and

acetaminophen, smoking and gender.


Methods

The Health and Nutrition in Pregnancy Study enrolled 2478 women for the purpose of

assessing the association between dietary factors and adverse pregnancy outcomes

[58]. Women were recruited from 1996 through 2000 from 56 obstetrical practices and

15 clinics associated with six hospitals in Connecticut and Massachusetts. Eligibility

criteria included English speaking, gestational age less than 24 weeks at enrollment,

no prior history of diabetes, and no intent to terminate the pregnancy. Among

identified eligible women, all women consuming more than 150 mg of caffeine per day

(n=718) and a random sample of women consuming less than 150 mg of caffeine per

day (n=2915) were invited to participate. Of those invited to participate, 17.6% refused,

0.6% were no longer eligible, 2% miscarried prior to the first interview, and 11.7% could

not be contacted. The final cohort included 2478 pregnancies. This analysis was limited

to 2277 pregnancies after we further excluded those for whom fetal growth measures

could not be ascertained (70 miscarried, 6 stillbirths, 5 withdrew from study, 44 lost to

follow‐up), non‐singleton pregnancies (n=53), and births prior to the 36th week of

gestation (n=14). Non‐singleton pregnancies were excluded due to known associations

between multiples and IUGR.

Women were assigned to one of three follow‐up protocols based on gestational age at

first interview, level of caffeine consumption, and randomization. Each of the follow‐up

protocols included a baseline survey and urine sample prior to 24 weeks gestation


(average of 14.4 weeks) and a postpartum survey. The majority of participants (80%)

were assigned to a follow‐up protocol involving one telephone follow‐up interview at

20, 28, or 36 weeks in addition to their baseline and postpartum assessments. Of the

remainder, 10% were assigned to a follow‐up protocol involving three telephone

interviews at 20, 28, and 36 weeks and an additional urinary sample at either 20, 28, or

36 weeks of gestation, and 10% were assigned to a follow‐up protocol involving three

in‐person interviews and three urinary samples at 20, 28, and 36 weeks.

Caffeine exposure was estimated from self-reported measures and urinary metabolites. At their baseline interview, women were asked to recall their caffeine intake before pregnancy and during the first three months of pregnancy. Women were asked to report the frequency and quantity of coffee, tea, and soda consumed, and model cup sizes were presented to aid their recall of serving size. Detailed questions elicited coffee type (regular or instant), coffee preparation method, coffee brand, tea type (hot or iced), tea preparation (steeping duration), and tea brand. Among a 10% subsample of women, beverage samples were obtained at two randomly selected visits and analyzed for caffeine content. Average daily caffeine intake (mg/day) was computed from self-reported caffeinated beverage consumption and the beverage caffeine content obtained from this analysis. Additional interviews were conducted in the second trimester (week 20), early third trimester (week 28), mid third trimester (week 36), and postpartum to assess the type, frequency, and serving size of caffeinated beverages consumed over the previous week.
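The intake calculation combines reported frequency, serving size, and a beverage-specific caffeine concentration. A minimal Python sketch of that arithmetic follows. The concentrations (mg per fluid ounce) and the example report are placeholder values chosen for illustration, not results of the beverage analysis, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Placeholder caffeine concentrations in mg per fluid ounce (illustrative only).
CAFFEINE_MG_PER_OZ = {"brewed_coffee": 12.0, "hot_tea": 5.0, "cola": 3.0}


@dataclass
class BeverageReport:
    beverage: str               # key into CAFFEINE_MG_PER_OZ
    servings_per_week: float
    ounces_per_serving: float   # e.g. read off the model cup sizes


def daily_caffeine_mg(reports):
    """Average daily caffeine (mg/day) summed over all reported beverages."""
    weekly_mg = 0.0
    for r in reports:
        weekly_mg += r.servings_per_week * r.ounces_per_serving * CAFFEINE_MG_PER_OZ[r.beverage]
    return weekly_mg / 7.0


reports = [
    BeverageReport("brewed_coffee", servings_per_week=7, ounces_per_serving=8),
    BeverageReport("cola", servings_per_week=14, ounces_per_serving=12),
]
print(round(daily_caffeine_mg(reports), 1))
```

With these placeholder values, one 8 oz coffee per day plus two 12 oz colas per day gives 168 mg/day, which would fall in the 60-170 mg category used in the tables.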


Urinary concentrations of caffeine metabolites were assessed at baseline for all participants and at weeks 20, 28, and 36 for subsamples of participants. Participants were provided with urine containers and asked to collect urine between dinner and bedtime on the night before the interview. Samples were collected the following day and analyzed for urinary caffeine and paraxanthine by Labstat, Inc. (Kitchener, Ontario, Canada). Caffeine metabolites have a relatively short half-life (less than 6 hours), and less than 2% of caffeine is excreted in urine; consequently, urinary caffeine metabolites do not accurately reflect average weekly caffeine exposure [73]. Caffeine metabolites were not evaluated directly as a risk factor in this study; however, they were used as informative covariates for multiple imputation of missing self-reported caffeine.
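The argument that a single overnight urine sample reflects only recent intake can be made concrete with first-order elimination, under which the fraction of a dose remaining after t hours is 0.5^(t / t_half). The short Python sketch below assumes simple first-order kinetics and uses the 6-hour upper bound on the half-life cited above; it ignores between-person and pregnancy-related variation in clearance, and the function name is hypothetical.

```python
def fraction_remaining(hours_since_dose, half_life_hours=6.0):
    """Fraction of a caffeine dose still present under first-order elimination."""
    return 0.5 ** (hours_since_dose / half_life_hours)


# Compare a dose consumed shortly before the evening collection with older doses.
for hours in (3, 12, 24, 48):
    print(hours, round(fraction_remaining(hours), 3))
```

Under these assumptions only 6.25% of a dose consumed 24 hours earlier (four half-lives) would remain, so urine collected the evening before an interview mainly reflects that day's consumption rather than the weekly pattern.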

Our operational measure of intrauterine growth restriction was a birth weight below the 10th percentile for gestational age, gender, and ethnicity. Birth weights were recorded within 24 hours of birth. Gestational age was calculated from the reported number of days since the first day of the last menstrual period. Where the timing of the last menstrual period could not be recalled, gestational age was calculated from the physician's estimated date of delivery. Sonography estimates confirmed the gestational ages of 61.2% of the women. We compared observed birth weights with gestational age-, gender-, and ethnicity-specific birth weight distributions derived from publicly available 1999 US Natality data representing all singleton US births [74]. In addition to IUGR, we assessed birth weight in grams as a continuous outcome.
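The classification rule can be expressed as a lookup against reference percentiles. The Python sketch below computes completed weeks of gestation from the last menstrual period and flags IUGR when birth weight falls below the stratum-specific 10th percentile. The reference table holds made-up placeholder values for only two strata; in the analysis these percentiles came from the 1999 US Natality data. All names are hypothetical.

```python
from datetime import date

# Placeholder 10th-percentile birth weights (grams) keyed by
# (completed gestational weeks, gender, ethnicity). NOT real natality values.
P10_REFERENCE_G = {
    (39, "female", "white"): 2950,
    (39, "male", "white"): 3050,
}


def gestational_weeks(lmp, delivery):
    """Completed weeks of gestation from the first day of the last menstrual period."""
    return (delivery - lmp).days // 7


def is_iugr(birth_weight_g, lmp, delivery, gender, ethnicity):
    """True when birth weight is below the stratum-specific 10th percentile."""
    key = (gestational_weeks(lmp, delivery), gender, ethnicity)
    threshold = P10_REFERENCE_G[key]  # raises KeyError if the stratum is not tabulated
    return birth_weight_g < threshold


print(is_iugr(2800, date(2000, 1, 3), date(2000, 10, 5), "female", "white"))
```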

Demographic, health history, prenatal health, and health behavior covariates were assessed at the prenatal and postpartum interviews. Time-fixed covariates included maternal age at first interview, ethnicity, parity, education, height, pre-pregnancy weight, pre-pregnancy BMI, and prior chronic disease. Time-varying covariates included smoking status, alcohol use, pregnancy complications (gestational hypertension), and medication use (acetaminophen and NSAIDs). Smoking, medication use, and alcohol use were assessed at each interview. Subjects were asked to report the number of cigarettes smoked per day over the previous week. For medication use, individuals provided the brand names of medications taken over the previous week, from which we created indicators for acetaminophen and for medications within the class of nonsteroidal anti-inflammatory drugs. Alcohol use was assessed by asking respondents to report the number of servings of wine, beer, or mixed drinks consumed over the previous week. Candidate confounders were identified through review of the existing literature and directed acyclic graphs. The directed acyclic graph representing the association between caffeine and IUGR (Figure 1) highlights confounders that may be appropriately adjusted for in a multivariable analysis and those for which adjustment would lead to bias.
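One way adjustment can introduce bias is when a time-varying covariate is affected by earlier caffeine and also affects later caffeine and the outcome, so that it is simultaneously a mediator and a confounder. The Python sketch below encodes a simplified, hypothetical graph (the node names and edges are assumptions loosely following the CaffT1, CaffT2, IUGR structure of Figure 1, not the full figure) and flags covariates with that dual role.

```python
# Hypothetical simplified DAG: each key points to its direct children.
DAG = {
    "age": ["CaffT1", "CaffT2", "IUGR"],             # time-fixed confounder
    "CaffT1": ["hypertension_T2", "CaffT2", "IUGR"],
    "hypertension_T2": ["CaffT2", "IUGR"],           # time-dependent covariate
    "CaffT2": ["IUGR"],
    "IUGR": [],
}


def descendants(graph, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), list(graph[node])
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(graph[child])
    return seen


def confounder_mediators(graph, early, late, outcome):
    """Covariates caused by the `early` exposure that also cause the `late` exposure and `outcome`."""
    flagged = []
    for node in graph:
        if node in (early, late, outcome):
            continue
        down = descendants(graph, node)
        if node in descendants(graph, early) and late in down and outcome in down:
            flagged.append(node)
    return flagged


print(confounder_mediators(DAG, "CaffT1", "CaffT2", "IUGR"))
```

Conditioning on such a variable in an ordinary regression blocks part of the CaffT1 effect while also being needed to control confounding of CaffT2, which is the situation that motivates the inverse probability weighting used in this analysis.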


Analyses were performed using SAS version 9.2 (SAS Institute, Cary, NC). As described above, this study design deliberately introduces missing data through the assigned follow-up protocols. Previous work has demonstrated that missing data designs, both in general and as specifically implemented in the Health and Nutrition in Pregnancy Study, do not introduce bias into the observed associations (Chapter 2 Ref). We used multiple imputation to impute designed and non-designed missing data simultaneously [46, 54-57]. Multiple imputation has been presented as a preferred method for missing data designs because it is easy to use in readily available software and allows sensitivity analyses where non-designed missing data are expected to be missing not at random (MNAR) [28]. Caffeine variables were transformed into two components: a logit transformation of the binary indicator of caffeine consumption and a log transformation of the continuous measure of caffeine (mg/day). This transformation was identified through sensitivity analyses of our imputation method. The final imputed value of caffeine was obtained by multiplying the imputed binary indicator by the exponentiated imputed value of log caffeine. Markov chain Monte Carlo (MCMC) multiple imputation was chosen to impute missing values simultaneously [54, 57]. The first step of MCMC multiple imputation requires prior parameters for the multivariate normal means and covariances of the variables included in the imputation model. To obtain these, missing values were initially filled in using the vector of means and the covariance matrix estimated with the EM algorithm from the observed data. Using the parameters from the prior model, missing data were then updated (imputed) by sampling values from the predictive distribution. Posterior parameters for the multivariate normal predictive means and covariance matrix were then estimated using the observed and imputed values of all variables in the imputation model. These steps were iterated until the vector of means and the covariance matrix were unchanged between consecutive iterations, and the entire procedure was repeated to produce 10 complete datasets.
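The two-part caffeine transformation and its back-transformation can be sketched as follows. The sketch assumes that zero intake has no log value (it is carried only by the indicator) and that the imputed logit-scale indicator is converted back to a binary value by thresholding its inverse logit at 0.5; the text does not state how the imputed indicator was dichotomized, so that rule, like the function names, is an assumption. The MCMC imputation step itself is not shown.

```python
import math


def to_components(caffeine_mg):
    """Split observed intake into (any-caffeine indicator, log mg/day or None when zero)."""
    if caffeine_mg is None:          # missing: both components are left for imputation
        return None, None
    if caffeine_mg <= 0:
        return 0, None               # log is undefined at zero; intake carried by the indicator
    return 1, math.log(caffeine_mg)


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def back_transform(imputed_logit_indicator, imputed_log_mg):
    """Recombine imputed components into mg/day.

    Assumption (not stated in the text): the imputed logit-scale indicator is
    dichotomized by thresholding its inverse logit at 0.5.
    """
    indicator = 1 if inv_logit(imputed_logit_indicator) >= 0.5 else 0
    return indicator * math.exp(imputed_log_mg)


print(to_components(150.0))
print(back_transform(1.2, math.log(150.0)))
print(back_transform(-0.7, math.log(150.0)))
```

One plausible rationale for this structure is that it lets a multivariate normal imputation model respect the point mass at zero intake while treating positive intakes on a scale closer to normal.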

Caffeine was analyzed as both a continuous and a categorical variable. Categorical indicators of caffeine exposure were created by identifying relevant cutpoints near the 25th, 50th, 75th, and 90th percentiles of reported caffeine consumption (mg/day) at the baseline interview. The distribution of demographic, behavioral, and reproductive characteristics was examined across categories of caffeine consumption (Table 3.1).
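The cutpoints used in the tables (0, 8, 60, and 170 mg/day) can be applied with a simple binning function. How values exactly at a boundary were handled (for example, whether 60 mg/day falls in the 8.1-60 or the 60-170 category, since the labels overlap at 60) is an assumption of this Python sketch.

```python
import bisect

# Upper bounds (mg/day) of the categories reported in Tables 3.1-3.3.
CUTPOINTS = [8.0, 60.0, 170.0]
LABELS = ["0 mg", "0.1-8 mg", "8.1-60 mg", "60-170 mg", ">170 mg"]


def caffeine_category(mg_per_day):
    """Map daily caffeine intake to the reporting categories.

    Assumption: boundaries are right-closed, so 8 -> '0.1-8 mg' and 60 -> '8.1-60 mg'.
    """
    if mg_per_day <= 0:
        return LABELS[0]
    return LABELS[1 + bisect.bisect_left(CUTPOINTS, mg_per_day)]


print([caffeine_category(x) for x in (0, 5, 8, 60, 100, 250)])
```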

Associations between caffeine and IUGR were quantified using inverse probability weighted logistic regression [35, 75, 76]. For the analysis of categorical caffeine exposure, we obtained the predicted probability of each individual's observed category of caffeine exposure using multinomial logistic regression. Stabilized weights were calculated by dividing the predicted probabilities from a model without covariates by the predicted probabilities from a model with time-fixed and time-dependent covariates. Weight models were evaluated using the Hosmer-Lemeshow goodness-of-fit test [77]. For continuous exposures, weights were based on the inverse of the normal density function estimated using linear regression with time-fixed and time-dependent covariates. Confidence intervals were calculated using bootstrapped standard errors. We present the fraction of missing information (λ) as a measure of the precision of our missing data design relative to a complete ascertainment design of the same sample size. As the fraction of missing information approaches 0, the precision obtained from our design with planned missing data approaches that of a complete ascertainment design of the same sample size.
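The weighting scheme can be illustrated with a small Python sketch. To stay self-contained, it replaces the multinomial logistic regression used in the analysis with a saturated model, estimating P(A = a | L = l) by the observed proportion within each covariate stratum, and it uses a single discrete covariate. For the continuous case it forms a stabilized weight as the ratio of a marginal normal density to a conditional normal density from a one-covariate least-squares fit; that stabilizing numerator and all names are assumptions of this sketch.

```python
import math
from collections import Counter
from statistics import fmean, pvariance


def stabilized_weights_categorical(exposure, covariate):
    """sw_i = P(A = a_i) / P(A = a_i | L = l_i), both estimated by observed proportions
    (a saturated stand-in for multinomial logistic regression)."""
    n = len(exposure)
    marginal = Counter(exposure)
    joint = Counter(zip(covariate, exposure))
    stratum = Counter(covariate)
    weights = []
    for a, l in zip(exposure, covariate):
        numerator = marginal[a] / n
        denominator = joint[(l, a)] / stratum[l]
        weights.append(numerator / denominator)
    return weights


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def stabilized_weights_continuous(exposure, covariate):
    """sw_i = f(a_i) / f(a_i | l_i) with normal densities; the conditional mean comes
    from an ordinary least-squares fit of exposure on one covariate."""
    mx, my = fmean(covariate), fmean(exposure)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, exposure)) / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * x for x in covariate]
    resid_var = fmean((y - f) ** 2 for y, f in zip(exposure, fitted))
    marg_var = pvariance(exposure)
    return [normal_pdf(y, my, marg_var) / normal_pdf(y, f, resid_var)
            for y, f in zip(exposure, fitted)]


# Toy data: smokers (L=1) are more likely to be in the high-caffeine category.
A = ["low", "low", "low", "high", "high", "high", "high", "low"]
L = [0, 0, 0, 0, 1, 1, 1, 1]
print([round(w, 2) for w in stabilized_weights_categorical(A, L)])
```

In this toy example the weights average 1, as stabilized weights should when every exposure level occurs in every covariate stratum, and they up-weight women whose caffeine category is unusual given their covariates.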

We evaluated time-dependent effect measure modification of the association between caffeine and IUGR by acetaminophen use, smoking, and gender using a joint effects approach, which enables assessment of additive effect measure modification from odds ratios. Weights for the joint effects analysis were calculated using multinomial logistic regression as described for the caffeine main effect models. The magnitude and direction of departures from additivity were quantified using the relative excess risk due to interaction (RERI) [78, 79]. Confidence intervals for the RERI were calculated using a Taylor series expansion to estimate the standard error [79]. For effect modification of the association between caffeine and birth weight, we performed a linear regression analysis with an interaction term. We report the average change in birth weight per 100 mg of caffeine by stratum of timing-specific acetaminophen use, smoking status, and gender.
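With joint-exposure indicators coded against a common reference group (neither caffeine nor the modifier), RERI = OR11 - OR10 - OR01 + 1, where OR10 and OR01 are the odds ratios for each factor alone and OR11 is the odds ratio for both. A minimal Python sketch of the point estimate and of a delta-method (first-order Taylor series) confidence interval follows. It takes the three joint-effect log odds ratios and their covariance matrix as inputs; the covariance values below are hypothetical, so only the point estimate can be checked against Table 3.3.

```python
import math


def reri(or10, or01, or11):
    """Relative excess risk due to interaction, computed from odds ratios."""
    return or11 - or10 - or01 + 1.0


def reri_ci(b10, b01, b11, cov, z=1.96):
    """Delta-method CI for RERI from joint-effect log odds ratios.

    cov is the 3x3 covariance matrix of (b10, b01, b11), in that order.
    """
    est = reri(math.exp(b10), math.exp(b01), math.exp(b11))
    # Partial derivatives of exp(b11) - exp(b10) - exp(b01) + 1.
    grad = [-math.exp(b10), -math.exp(b01), math.exp(b11)]
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3))
    se = math.sqrt(var)
    return est, est - z * se, est + z * se


# Third trimester, >60 mg, acetaminophen (Table 3.3): OR10 = 1.56, OR01 = 1.19, OR11 = 3.07.
print(round(reri(1.56, 1.19, 3.07), 2))

hypothetical_cov = [[0.20, 0.05, 0.05],
                    [0.05, 0.25, 0.05],
                    [0.05, 0.05, 0.28]]
print(reri_ci(math.log(1.56), math.log(1.19), math.log(3.07), hypothetical_cov))
```

The point estimate reproduces the tabulated RERI of 1.32; because the example covariance matrix is invented, the printed interval will not match the published one, which would require the covariance matrix from the fitted weighted model.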


RESULTS

Within our study population, caffeine consumption was associated with demographic characteristics, health behaviors, and health history (Table 3.1). Compared with women not drinking caffeinated beverages, high caffeine consumers tended to be younger, unmarried, Hispanic, and less educated. Caffeine consumers were more likely to exhibit risk behaviors such as first trimester smoking and drinking.

To assess the time-dependent effects of caffeine, we included the timing-specific caffeine measures simultaneously in our analytic models. We assessed and ruled out multicollinearity using a variance inflation factor threshold of 2. In our age-adjusted models, we observed significantly increased odds of IUGR for first and third trimester exposures among women consuming between 60 and 170 mg of caffeine per day and those consuming more than 170 mg of caffeine per day (Table 3.2). Measures of association from our fully adjusted models were attenuated. In our analysis of caffeine categories, the association between first trimester caffeine and IUGR remained significant after adjustment for age, parity, education, BMI, prior chronic disease, smoking (trimester 1 and weeks 20, 28, and 36), hypertension (weeks 20, 28, and 36), and alcohol (trimester 1 and weeks 20, 28, and 36). Notably, the fraction of missing information was substantially smaller for first trimester exposures (λ=0.00-0.02) than for second and third trimester exposures (λ=0.05-0.56). Consumption of more than 170 mg of caffeine per day in the first trimester was associated with a 2-fold increased odds of IUGR (OR=2.12, 95% CI: 1.13-4.01). Modeling first trimester exposure as a continuous measure indicated that for each 100 mg increase in caffeine consumption, the odds of IUGR increased by 15% (OR=1.15, 95% CI: 1.04-1.28). We did not observe any significant associations or trends for caffeine exposure in the second trimester. The magnitude and pattern of associations for exposures at weeks 28 and 36 suggested a potentially clinically relevant, though not statistically significant, association. The category analysis for week 36 exposures suggested increased odds even at low levels of caffeine consumption relative to no caffeine consumption. For each 100 mg increase in third trimester caffeine consumption, the odds of IUGR increased by 16% (OR=1.16, 95% CI: 1.00-1.35).
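The collinearity check regresses each timing-specific caffeine measure on the others and computes VIF = 1 / (1 - R²), flagging values above the threshold of 2 stated above. A Python sketch using NumPy follows; the simulated caffeine measures are invented for illustration.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (n x k), regressing it on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs


rng = np.random.default_rng(0)
t1 = rng.gamma(2.0, 40.0, size=500)                 # hypothetical first trimester mg/day
wk36 = 0.6 * t1 + rng.gamma(2.0, 30.0, size=500)    # hypothetical correlated later measure
vifs = variance_inflation_factors(np.column_stack([t1, wk36]))
print([round(v, 2) for v in vifs], "above threshold:", any(v > 2 for v in vifs))
```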

Effect measure modification on the additive scale was assessed using joint effects marginal structural models (Table 3.3). We did not identify any statistically significant departures from additivity; however, measures of association between caffeine and IUGR tended to be larger among acetaminophen users, smokers, and males. For additive effect modification by acetaminophen of third trimester caffeine exposures, the observed RERIs suggested a potential synergistic relationship (>60 mg/day: RERI=1.32, 95% CI: -1.02 to 3.65).


Associations between caffeine and birth weight were quantified using multivariable-adjusted linear regression (Table 3.4). In both the first and third trimesters, we observed a non-significant reduction in birth weight of approximately 25 g per 100 mg increase in caffeine. The association between caffeine and birth weight was not significantly modified by acetaminophen, smoking, or gender. Though not statistically significant, the difference between the third trimester association among acetaminophen users (Δ=-102.71 g, 95% CI: -205.54 to 0.12) and non-users (Δ=-21.29 g, 95% CI: -80.73 to 38.16) is notable.
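In a linear model of the form birth weight = b0 + b1·caffeine + b2·M + b3·caffeine·M + covariates, where M is a 0/1 modifier such as acetaminophen use and caffeine is in mg/day, the change per 100 mg is 100·b1 when M = 0 and 100·(b1 + b3) when M = 1. The Python sketch below applies that arithmetic. The coefficients are back-calculated from the third trimester acetaminophen estimates in Table 3.4 for illustration; they are not the fitted model output.

```python
def change_per_100mg(b_caffeine, b_interaction, modifier):
    """Birth weight change (g) per 100 mg/day of caffeine in the stratum modifier (0 or 1)."""
    return 100.0 * (b_caffeine + b_interaction * modifier)


# Illustrative coefficients (g per mg/day) chosen to match Table 3.4, third trimester.
b1 = -0.2129   # caffeine slope among non-users of acetaminophen
b3 = -0.8142   # additional slope among acetaminophen users

print(round(change_per_100mg(b1, b3, 0), 2))   # non-users
print(round(change_per_100mg(b1, b3, 1), 2))   # users
```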


Discussion

Consistent with the hypothesized biological mechanism, we observed the strongest associations between caffeine exposure and fetal growth for exposures occurring in the first and third trimesters. Our analyses identified an association between first trimester caffeine consumption and IUGR that was independent of caffeine consumption later in pregnancy, suggesting a potential placental etiology. We also observed an association between third trimester caffeine consumption and fetal growth that was independent of earlier caffeine exposures, suggesting that caffeine may also act through reduced blood flow during rapid fetal growth (week 36). We did not observe significant associations for caffeine exposures occurring after placentation but before peak fetal growth (weeks 20 and 28). Our analysis of effect measure modification did not identify significant modification by acetaminophen use and did not confirm previous reports of effect measure modification by smoking and gender.

A recent committee opinion from the American College of Obstetricians and Gynecologists concluded that there was insufficient evidence that caffeine increases the risk of IUGR [80]. Studies assessing caffeine intake greater than 300 mg per day have generally observed increased risks of IUGR and reduced birth weight [81-84]. A large prospective cohort study in the Netherlands observed significantly increased risks of small for gestational age among women consuming more than 2 servings of caffeinated beverages per day. Deviations in fetal weight and crown-rump length as measured by ultrasound were observed as early as the first trimester among women consuming more than 6 servings of coffee per day [84]. Studies of more moderate levels of caffeine consumption vary in their conclusions [58, 85, 86]. Our finding of a significantly increased risk of IUGR is consistent with recent findings from a comparably sized prospective study [85]. The CARE study group assessed caffeine exposures in each trimester in a cohort of 2635 pregnant women. While they observed significantly increased risks of IUGR for caffeine exposures in each trimester, the strongest associations were observed for third trimester exposures. In the only randomized controlled trial of caffeine intake during pregnancy, no association was observed between caffeine and birth weight [87]. Women with pre-pregnancy caffeine consumption of more than 3 cups per day were randomized to consume caffeinated or decaffeinated coffee during their second and third trimesters. Infants in the decaffeinated group were on average 16 g (95% CI: -40 to 70) heavier than those in the caffeinated group. This trial did not assess the impact of caffeine during the first trimester, nor did it assess compliance with the study protocol after randomization. Observational studies assessing caffeinated and decaffeinated coffee consumption have concluded that only caffeinated coffee is associated with measures of fetal growth.

In a previous publication from the Nutrition in Pregnancy study, caffeine was significantly associated with reduced birth weight; however, neither first trimester nor third trimester caffeine exposure was significantly associated with IUGR [58]. The disparity in the magnitude and significance of the odds ratios between our analysis and the prior publication may be explained by differences in the source of exposure data and in the analytic approach. The earlier publication relied on postpartum recall of third trimester caffeine exposure, whereas we relied on the prospective measures of caffeine exposure. By relying on the prospective measures, our analysis may be less susceptible to recall bias, but it requires stronger missing data assumptions because interview data at weeks 20, 28, and 36 were missing by design.

Analytically, we estimate the marginal effect rather than the conditional effect. Recent methodological work has demonstrated biases associated with estimating conditional associations with multivariable-adjusted logistic regression because of the non-collapsibility of the odds ratio [68, 88]. Furthermore, multivariable adjustment for time-varying variables (such as gestational hypertension and smoking) can lead to biased estimation of the measure of effect when the variable acts as both a confounder and a mediator [35, 75]. For example, caffeine consumption may be associated with IUGR through its association with hypertension [89] or gestational diabetes [90], so these conditions could serve as mediators; however, a diagnosis of hypertension or gestational diabetes may alter subsequent caffeine consumption, so they could also serve as confounders. While still prone to bias from unmeasured confounding, inverse probability weighting is not susceptible to bias from non-collapsibility of the odds ratio.
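Non-collapsibility can be seen in a small worked example with no confounding at all. Suppose two equally sized strata of a covariate that affects only the outcome, exposure independent of stratum, and the same conditional odds ratio of 4 in each stratum but different baseline risks. The Python sketch below (all risks are hypothetical) shows that the marginal odds ratio obtained by averaging risks over the strata is smaller than 4 even though nothing is confounded.

```python
def odds(p):
    return p / (1.0 - p)


def risk_from_odds(o):
    return o / (1.0 + o)


def marginal_vs_conditional(baseline_risks, conditional_or):
    """Marginal OR when equally sized strata share the same conditional OR.

    Exposure is assumed independent of stratum, so there is no confounding.
    """
    exposed = [risk_from_odds(conditional_or * odds(p0)) for p0 in baseline_risks]
    p1 = sum(exposed) / len(exposed)
    p0 = sum(baseline_risks) / len(baseline_risks)
    return odds(p1) / odds(p0)


print(round(marginal_vs_conditional([0.1, 0.5], 4.0), 2))
```

Here the marginal odds ratio is about 2.9 rather than 4, so the marginal odds ratio targeted by inverse probability weighting and the conditional odds ratio from multivariable-adjusted logistic regression need not agree even in the absence of confounding.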

We assessed effect measure modification using the RERI from logistic regression models rather than by including a cross-product term, because prior work has concluded that additive effect measure modification is a better indicator of biological interaction. The cross-product terms in our linear regression models directly estimate additive effect measure modification. The RERI can be interpreted as the excess risk due to interaction relative to the risk without either exposure. We did not observe any significant interactions between acetaminophen, smoking, or gender and caffeine exposures; however, our findings suggested a potential synergistic relationship between acetaminophen and caffeine. Given the magnitude of the difference between the birth weight regression parameters by acetaminophen status, this potential relationship should be explored further in future research. Reliance on odds ratios to calculate the RERI may produce biased estimates of departures from additivity [91]; however, this is unlikely to be a problem in this investigation given the prevalence of the outcome (8%) and the magnitude of the observed RERIs.

The NIP study was specifically designed to assess reproductive outcomes associated with maternal caffeine consumption. For this reason, our caffeine assessments were designed to reduce recall bias and to measure caffeine prospectively at four potentially relevant time points. With a half-life of under 6 hours and only 0.5%-2% excreted in urine, urinary caffeine metabolites may be a good indicator of recent caffeine consumption, but they would not provide an accurate assessment of typical daily or weekly consumption. We therefore elected to rely primarily on the self-reported measures, even though urinary concentrations of caffeine metabolites were available. Though not assessed as a primary exposure, paraxanthine, the primary metabolite of caffeine, was used to aid in the imputation of missing self-reported caffeine values.

The design of this study intentionally omitted collection of caffeine exposure at some time points for some individuals. The missing data in this study were a tool to reduce subject burden and improve statistical efficiency. Prior publications have demonstrated the validity and efficiency of studies with designed missing data. To be valid, the multiple imputation methods employed in this study assume that the missing data are missing at random and that the imputation model is correctly specified. Though we can ensure that the designed missing data are missing at random, we cannot rule out the possibility that non-designed missingness due to noncompliance or loss to follow-up produced data that are not missing at random. To validate our imputation model and to assess its susceptibility to missingness not at random, we artificially introduced missing data into the subset of our population with complete exposure data. We then applied our imputation model to this subset and compared parameters (mean caffeine and the OR between caffeine and IUGR) between the complete data subset without artificial missing data and the complete data subset with imputed values for the artificial missing data. Artificial missing data were introduced under two scenarios: 1) missing completely at random and 2) missingness dependent on caffeine consumption and IUGR. These analyses supported the specification of our imputation model and indicated that it was robust to the assessed scenario of missingness not at random. The missing data design performed well in our assessment of main effects; however, it may have impeded our ability to identify significant departures from additivity in our joint effects models.
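The validation strategy (mask values in a complete subset, impute them, and compare against the known truth) can be sketched in Python. This is a deliberately simplified stand-in: it uses synthetic data, a single stochastic regression imputation from one auxiliary variable (standing in for paraxanthine) instead of MCMC multiple imputation, missingness that depends on caffeine only rather than on caffeine and IUGR, and it compares only the mean. The masking probabilities, the data-generating model, and all names are assumptions.

```python
import random
from statistics import fmean


def simulate(n=5000, seed=7):
    rng = random.Random(seed)
    aux = [rng.gauss(0, 1) for _ in range(n)]               # stand-in for paraxanthine
    caff = [80 + 40 * z + rng.gauss(0, 20) for z in aux]    # "complete" caffeine values
    return aux, caff


def mask(caff, scenario, rng):
    """Return a copy of caff with None for artificially missing values."""
    out = []
    for y in caff:
        if scenario == "mcar":
            p = 0.3
        else:  # missingness depends on the value itself (not at random)
            p = 0.5 if y > 100 else 0.15
        out.append(None if rng.random() < p else y)
    return out


def impute(aux, masked, rng):
    """Stochastic regression imputation of caffeine from the auxiliary variable."""
    obs = [(x, y) for x, y in zip(aux, masked) if y is not None]
    mx, my = fmean(x for x, _ in obs), fmean(y for _, y in obs)
    slope = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    intercept = my - slope * mx
    sd = fmean((y - intercept - slope * x) ** 2 for x, y in obs) ** 0.5
    return [y if y is not None else intercept + slope * x + rng.gauss(0, sd)
            for x, y in zip(aux, masked)]


def bias_of_mean(scenario, seed=11):
    aux, caff = simulate()
    rng = random.Random(seed)
    completed = impute(aux, mask(caff, scenario, rng), rng)
    return fmean(completed) - fmean(caff)


for scenario in ("mcar", "mnar"):
    print(scenario, round(bias_of_mean(scenario), 2))
```

Under MCAR the imputed mean stays close to the truth, whereas value-dependent missingness that the auxiliary variable cannot fully explain tends to pull the imputed mean downward; how large that shift is in real data depends on how strongly the auxiliary variables predict the masked values.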

Our findings, in conjunction with biological plausibility and consistency with prior epidemiological investigations, provide additional evidence that even moderate caffeine exposure during pregnancy can result in a clinically significant reduction in fetal growth. Though it may seem prudent to recommend reduced caffeine consumption during pregnancy, there may be adverse consequences associated with discontinuing caffeine. To further resolve this question, future observational studies should assess temporal patterns of caffeine consumption in addition to timing-specific quantities of caffeine consumption.


First trimester caffeine (values are column percentages)
Characteristic            0 mg      0.1-8 mg  8.1-60 mg 60-170 mg >170 mg
n                         n=545     n=592     n=573     n=340     n=227
Age
  <24                     14.7      22.1      22.9      23.7      39.0
  25-29                   26.5      29.7      25.1      20.5      21.9
  30-34                   39.4      32.0      33.8      33.7      22.1
  >35                     19.4      16.2      18.3      22.1      17.0
Marital status
  Married                 81.9      73.4      70.5      62.6      40.2
Ethnicity
  White                   75.1      74.2      70.9      66.5      47.8
  Black                   9.2       7.4       8.4       6.8       8.1
  Hispanic                12.5      15.3      18.7      24.2      43.2
  Other                   3.1       3.1       2.0       2.6       0.9
Education
  <11                     6.7       10.7      13.5      16.8      35.8
  12                      14.5      15.9      17.1      24.4      25.0
  13-15                   19.7      22.0      26.7      23.6      24.0
  16                      31.6      29.1      23.7      20.5      9.8
  >17                     27.5      22.3      19.0      14.8      5.4
BMI
  <20                     17.0      16.3      14.5      12.6      14.6
  20-25                   53.0      52.8      48.9      49.4      46.7
  25-30                   20.5      21.2      23.6      20.9      21.7
  >30                     9.5       9.8       13.0      17.1      17.0

Table 3.1: Distribution of baseline characteristics by levels of first trimester caffeine consumption, Health and Nutrition in Pregnancy Study, 1996-2001


First trimester caffeine (values are column percentages)
Characteristic            0 mg      0.1-8 mg  8.1-60 mg 60-170 mg >170 mg
n                         n=545     n=592     n=573     n=340     n=227
1st trimester smoking (cigarettes/day)
  0                       95.0      89.3      86.0      76.4      50.6
  1-5                     3.8       7.4       8.1       10.8      18.8
  6-10                    0.7       2.3       4.1       7.3       12.0
  >10                     0.6       1.1       1.8       5.6       18.6
1st trimester alcohol
  Yes                     23.9      33.8      37.0      39.2      35.0
  No                      76.2      66.2      63.0      60.8      65.1
Pre-pregnancy health
  Chronic disease         8.3       8.4       9.5       10.8      17.2
  Emotional problems      5.6       6.8       4.7       9.6       13.6
Parity
  0                       49.3      47.9      44.7      35.0      27.0
  1                       35.0      38.1      34.9      36.0      35.9
  >1                      15.7      14.0      20.5      29.0      37.1
Prior pregnancy morbidity
  Pregnancy hypertension  3.5       3.5       4.6       7.0       4.6
  Preterm labor           1.6       4.6       3.8       4.3       6.6
  Gestational diabetes    2.5       0.4       2.8       2.3       2.6

Table 3.1 Continued: Distribution of baseline characteristics by levels of first trimester caffeine consumption, Health and Nutrition in Pregnancy Study, 1996-2001


Reported caffeine       Births  IUGR  λ     IPTW-Age(a) OR (95% CI)  IPTW-Full(b) OR (95% CI)
Trimester 1
  0 mg                  545     30    --    1.00 (reference)         1.00 (reference)
  0-8 mg                592     48    0.01  0.61 (0.36-1.04)         0.54 (0.30-1.00)
  8-60 mg               573     57    0.01  1.37 (0.90-2.09)         1.68 (0.87-3.21)
  60-170 mg             340     29    0.00  2.78 (1.40-5.55)         1.44 (0.50-4.17)
  >170 mg               227     27    0.02  1.54 (1.04-2.30)         2.12 (1.13-4.01)
Week 20
  0 mg                  592     44    --    1.00 (reference)         1.00 (reference)
  0-8 mg                669     72    0.27  0.81 (0.46-1.42)         0.66 (0.35-1.23)
  8-60 mg               573     33    0.56  1.82 (0.78-4.23)         0.96 (0.24-3.78)
  60-170 mg             274     13    0.31  0.81 (0.47-1.39)         0.66 (0.21-2.07)
  >170 mg               169     29    0.27  1.13 (0.58-2.17)         0.81 (0.21-3.15)
Week 28
  0 mg                  537     42    --    1.00 (reference)         1.00 (reference)
  0-8 mg                632     55    0.06  1.36 (0.42-4.36)         0.34 (0.05-2.58)
  8-60 mg               645     43    0.11  1.55 (0.93-2.60)         1.28 (0.70-2.34)
  60-170 mg             299     31    0.12  1.80 (1.08-2.99)         1.34 (0.79-2.26)
  >170 mg               164     20    0.17  3.54 (1.31-9.57)         1.86 (0.66-5.29)
Week 36
  0 mg                  548     22    --    1.00 (reference)         1.00 (reference)
  0-8 mg                636     53    0.18  1.54 (0.78-3.03)         1.45 (0.46-4.59)
  8-60 mg               680     62    0.19  1.82 (0.83-4.00)         1.65 (0.52-5.21)
  60-170 mg             272     27    0.14  1.96 (0.90-4.25)         1.67 (0.52-5.29)
  >170 mg               141     27    0.05  3.47 (1.41-8.59)         2.24 (0.60-8.41)
Continuous (per 100 mg) 2277    191
  Trimester 1                         0.19  1.17 (1.06-1.30)         1.15 (1.04-1.28)
  Week 20                             0.51  1.09 (0.95-1.24)         1.10 (0.95-1.27)
  Week 28                             0.79  1.08 (0.91-1.28)         1.11 (0.91-1.34)
  Week 36                             0.06  1.17 (1.01-1.34)         1.16 (1.00-1.35)

Table 3.2: Association between caffeine consumption and intrauterine growth retardation among full term live births, Health and Nutrition in Pregnancy Study, 1996-2001

λ: Fraction of missing information, the ratio of between-imputation variance to total variance.
a: Adjusted for age and caffeine measures.
b: Adjusted for age, caffeine measures, parity, education, BMI, prior chronic disease, first trimester smoking, smoking (weeks 20, 28, and 36), hypertension (weeks 20, 28, and 36), and alcohol (trimester 1 and weeks 20, 28, and 36).
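The λ values in Table 3.2 come from combining the 10 imputed-data analyses. The Python sketch below applies Rubin's rules to per-imputation log odds ratios and their variances. The table footnote describes λ as the ratio of between-imputation variance to total variance; the sketch uses the common large-sample form (1 + 1/m)B / T, which may differ slightly from the exact definition used in the analysis. The normal-based confidence interval (instead of Rubin's t reference distribution) and the example inputs are simplifying assumptions.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed-data estimates (e.g. log ORs) with Rubin's rules."""
    m = len(estimates)
    q_bar = fmean(estimates)                 # pooled estimate
    w = fmean(variances)                     # within-imputation variance
    b = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance
    lam = (1 + 1 / m) * b / t                # fraction of missing information (large-sample form)
    return q_bar, t, lam


def pooled_or(estimates, variances, z=1.96):
    q_bar, t, lam = pool_rubin(estimates, variances)
    se = math.sqrt(t)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se), lam


# Hypothetical log ORs and variances from m = 5 imputed datasets.
log_ors = [0.14, 0.12, 0.16, 0.13, 0.15]
vars_ = [0.0026, 0.0025, 0.0027, 0.0026, 0.0025]
print(pooled_or(log_ors, vars_))
```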


MSM joint effects

Acetaminophen
Caffeine (mg/day)       No acetaminophen OR (95% CI)  Acetaminophen OR (95% CI)  RERI (95% CI)
First trimester(a)
  0                     1.00 (reference)              0.76 (0.26-2.17)
  0-8                   1.44 (0.43-4.86)              0.96 (0.31-2.97)           -0.25 (-1.64 to 1.15)
  8-60                  1.73 (0.75-4.03)              1.42 (0.55-3.72)           -0.07 (-1.05 to 0.92)
  >60                   1.10 (0.46-2.62)              1.19 (0.44-3.19)           0.33 (-0.40 to 1.06)
Third trimester(a)
  0                     1.00 (reference)              1.19 (0.43-3.33)
  0-8                   1.34 (0.45-3.96)              1.93 (0.73-5.06)           0.39 (-1.06 to 1.85)
  8-60                  1.62 (0.59-4.44)              2.43 (1.04-5.69)           0.62 (-0.79 to 2.03)
  >60                   1.56 (0.62-3.98)              3.07 (1.09-8.68)           1.32 (-1.02 to 3.65)

Smoking
Caffeine (mg/day)       Nonsmoker OR (95% CI)         Smoker OR (95% CI)         RERI (95% CI)
First trimester(b)
  0                     1.00 (reference)              1.79 (0.57-5.57)
  0-8                   1.29 (0.67-2.47)              --                         --
  8-60                  1.53 (1.01-2.32)              3.66 (1.80-7.47)           1.35 (-1.81 to 4.50)
  >60                   1.03 (0.62-1.69)              3.22 (1.99-5.21)           1.41 (-0.98 to 3.80)
Third trimester(b)
  0                     1.00 (reference)              1.87 (0.82-4.25)
  0-8                   0.89 (0.31-2.56)              2.27 (0.11-47.07)          0.51 (-5.75 to 6.77)
  8-60                  1.54 (0.76-3.13)              1.97 (0.44-8.80)           -0.44 (-3.50 to 2.61)
  >60                   1.58 (0.85-2.97)              3.48 (1.99-6.08)           1.03 (-1.33 to 3.39)

Gender
Caffeine (mg/day)       Male OR (95% CI)              Female OR (95% CI)         RERI (95% CI)
First trimester(c)
  0                     1.00 (reference)              1.01 (0.59-1.71)
  0-8                   1.52 (0.66-3.47)              0.83 (0.29-2.40)           -0.69 (-2.26 to 0.88)
  8-60                  1.51 (0.86-2.67)              1.84 (1.06-3.20)           0.32 (-0.75 to 1.39)
  >60                   2.07 (1.22-3.51)              1.11 (0.61-2.04)           -0.97 (-2.21 to 0.28)
Third trimester(c)
  0                     1.00 (reference)              0.85 (0.41-1.76)
  0-8                   0.92 (0.14-5.83)              0.85 (0.23-3.20)           0.08 (-1.84 to 2.01)
  8-60                  1.43 (0.62-3.30)              1.29 (0.60-2.75)           0.01 (-1.40 to 1.42)
  >60                   2.02 (1.08-3.78)              1.50 (0.78-2.87)           -0.37 (-1.88 to 1.13)

Table 3.3: Associations between joint effects of self reported caffeine intake and potential effect measure modifiers and intrauterine growth retardation among full term live births, Health and Nutrition in Pregnancy Study, 1996-2001

Weights from multinomial logistic regression with predictors age, parity, education, and
a: smoking (trimester 1 and weeks 20, 28, and 36);
b: alcohol (trimester 1 and weeks 20, 28, and 36);
c: smoking (trimester 1 and weeks 20, 28, and 36).


Self-reported caffeine  Trimester 1 Δ (95% CI)      P-interaction  Trimester 3 Δ (95% CI)      P-interaction
All                     -24.41 (-51.75 to 2.92)                    -25.27 (-79.29 to 28.76)
Acetaminophen
  No                    -28.12 (-76.09 to 19.85)    0.87           -21.29 (-80.73 to 38.16)    0.15
  Yes                   -23.88 (-54.82 to 7.06)                    -102.71 (-205.54 to 0.12)
Smoking
  No                    -20.15 (-61.88 to 21.57)    0.75           -27.45 (-107.3 to 52.39)    0.94
  Yes                   -27.47 (-57.92 to 2.98)                    -24.97 (-73.73 to 23.79)
Gender
  Male                  -23.31 (-52.64 to 6.03)     0.90           -28.52 (-107.00 to 49.96)   0.74
  Female                -25.77 (-63.34 to 11.8)                    -18.90 (-67.47 to 29.67)

Δ: Change in birth weight (grams) per 100 mg increase in caffeine, modeled using multivariable linear regression adjusted for age, parity, education, BMI, prior chronic disease, first trimester smoking, smoking (weeks 20, 28, and 36), and hypertension (weeks 20, 28, and 36).

Table 3.4: Association between caffeine consumption and birth weight among full term live births, Health and Nutrition in Pregnancy Study, 1996-2001


Figure 1: Directed Acyclic Graph for confounders of the association between caffeine and fetal growth

[Graph nodes: CaffT1, CaffT2, IUGR, time-fixed confounders, time-dependent confounders]

Time-fixed confounders: maternal age, ethnicity, parity, education, height, weight, pre-pregnancy BMI, prior chronic disease.

Time-dependent confounders: smoking status, alcohol use, gestational hypertension, and medication use (acetaminophen and NSAIDs).


General Discussion

Through this series of papers, we have demonstrated the need to obtain and analyze repeated measures of time-varying exposures, introduced a bias correction method for settings where exposure timing cannot be ascertained, introduced design solutions that enable repeated exposure measures, and quantified the association between timing-specific caffeine consumption and fetal growth.

We extend the concepts of time-dependent bias substantially beyond what has previously been addressed in the methodological literature through our assessment of transient exposures, average exposures, and jointly modeled binary exposures. We have demonstrated that time-dependent bias can substantially affect the validity of analyses in perinatal epidemiology when exposure timing is ignored. We demonstrate the utility of imputed exposure event times for obtaining unbiased effect estimates where true exposure timing cannot feasibly be ascertained. This novel solution needs further development to assess its performance in preventing bias for continuous exposures.

The use of time-varying methods in perinatal epidemiology has been limited because event times are often unknown and outcomes only become recognized at birth (e.g., malformations, growth restriction). Though time-varying methods are more appropriate than time-fixed methods, the validity of these models depends on our knowledge of when the event actually occurred. Additional methodological work should address the biases associated with various analytic approaches to assessing outcomes with unknown event times in perinatal epidemiology.

In our application of the bias correction method, we reveal how previous analyses of adverse reproductive outcomes associated with delayed prenatal care would be susceptible to time-dependent bias. In fact, previous studies have reported that delayed prenatal care is associated with a lower risk of adverse outcomes such as low birth weight and preterm birth [18], or that receiving any prenatal care is unrealistically protective against a range of outcomes [47]. Given this example and the potential consequences of conclusions drawn from these findings, we believe that careful review of existing literature relying on time-fixed methods for time-varying exposures in pregnancy may be warranted.
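The mechanism behind such spurious protective associations can be reproduced in a short simulation. In the Python sketch below, prenatal care has no effect on gestational age by construction, yet classifying women by when (or whether) care began, a time-fixed summary of a time-varying exposure, makes late care appear protective relative to early care and makes any care appear protective relative to none, because women who deliver early have less time to initiate care. The distributions (10% preterm, delivery uniform on 28-36 or 37-41 weeks, care initiation uniform on weeks 4-36) are arbitrary assumptions for illustration and are not taken from the studies cited above.

```python
import random


def simulate_prenatal_care(n=20_000, seed=3):
    """Preterm risk by time-fixed care category when care has NO causal effect.

    Assumptions: 10% of births are preterm (delivery uniform on 28-36 weeks), the rest
    deliver uniformly on 37-41 weeks, and the week care would begin is uniform on 4-36
    weeks, independent of delivery timing. Care begins only if it precedes delivery.
    """
    rng = random.Random(seed)
    counts = {"early (<=20 wk)": [0, 0], "late (>20 wk)": [0, 0], "none": [0, 0]}
    for _ in range(n):
        preterm = rng.random() < 0.10
        delivery_week = rng.uniform(28, 36) if preterm else rng.uniform(37, 41)
        care_week = rng.uniform(4, 36)
        if care_week >= delivery_week:
            group = "none"          # delivered before care would have started
        elif care_week <= 20:
            group = "early (<=20 wk)"
        else:
            group = "late (>20 wk)"
        counts[group][0] += preterm
        counts[group][1] += 1
    return {g: cases / total for g, (cases, total) in counts.items() if total}


for group, risk in simulate_prenatal_care().items():
    print(f"{group:16s} preterm risk = {risk:.3f}")
```

With these settings the early-care group has a preterm risk close to the assumed 10%, the late-care group a noticeably lower risk, and the no-care group consists entirely of preterm births, mirroring the patterns reported in [18] and [47].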

Our design solution, which relies on intentional missing data for prospective exposure assessment, offers a novel alternative for improving efficiency and reducing subject burden. Recognizing that traditional complete‐ascertainment designs already rely on designed missing data through the sampling process (unsampled members of the source population are, in effect, missing by design), we have framed alternative missing data designs as a redistribution of missing data to improve efficiency and reduce subject burden. In assessing the performance of these designs, we observed that their efficiency depends on the distribution of designed and non‐designed missing data and on the joint distribution of the exposure measures. Cost parameters also have a large impact on the performance of missing data designs relative to complete ascertainment designs. In addition to potential efficiency advantages, missing data designs may allow larger samples and may improve validity by reducing selection bias at enrollment and bias due to loss to follow‐up.
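The roles of cost parameters and exposure correlation can be made concrete with a deliberately simplified two‐phase design. This is an illustrative assumption, not one of the designs evaluated in this dissertation: every participant is assessed at a later time point, a random fraction p is also assessed at an earlier time point, and the earlier‐period mean μ₁ is estimated with the classical double‐sampling regression estimator. With enrollment cost c_e, cost c_a per exposure assessment, total budget B, exposure variance σ² at the earlier time point, and correlation ρ between the two measures, standard large‐sample approximations give:

```latex
\begin{aligned}
n_{\text{full}} &= \frac{B}{c_e + 2c_a}, &
\operatorname{Var}\bigl(\hat\mu_1^{\text{full}}\bigr) &\approx \frac{\sigma^2}{n_{\text{full}}},\\
n_{\text{planned}} &= \frac{B}{c_e + (1+p)\,c_a}, &
\operatorname{Var}\bigl(\hat\mu_1^{\text{planned}}\bigr) &\approx \sigma^2\left[\frac{1-\rho^2}{p\,n_{\text{planned}}} + \frac{\rho^2}{n_{\text{planned}}}\right],
\end{aligned}
\qquad
\mathrm{RE} = \frac{\operatorname{Var}\bigl(\hat\mu_1^{\text{full}}\bigr)}{\operatorname{Var}\bigl(\hat\mu_1^{\text{planned}}\bigr)}
```

Values of RE above 1 favor the planned missing design. RE rises as ρ increases and as assessment costs grow relative to enrollment costs, and falls below 1 when enrollment dominates the budget. For example, with c_e = 0, p = 0.25 and ρ = 0.95, RE ≈ 1.6 / 1.29 ≈ 1.24.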

Areas of future research on designed missing data for prospective exposure assessment include methods for power and sample size calculations and methods for optimizing the distribution of designed missing data. Previous work addressing power and the optimization of missingness in missing data designs does not allow for time‐varying exposures or non‐designed missing data [30, 64]. Developing methods for power and sample size calculations was beyond the scope of this study; however, through data simulation, investigators can identify the relative efficiency of candidate study designs by specifying cost parameters and exposure/outcome distributions. Though potentially burdensome, this would be a valuable exercise given the potential for increased efficiency, reduced bias, and reduced non‐compliance and loss to follow‐up.
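As a rough illustration of the simulation exercise recommended above, the sketch below compares a complete design with a planned missing two‐phase design under a fixed budget. It is a minimal sketch under simplifying assumptions, not the designs or the multiple imputation analysis evaluated in this dissertation: two bivariate normal exposure measures with correlation `rho`, designed missingness that is completely at random, no non‐designed missing data, and the classical double‐sampling regression estimator (a swapped‐in technique) for the mean of the earlier measure. The function name, the cost parameters `c_enroll` and `c_assess`, and the budget are hypothetical.

```python
import numpy as np


def relative_efficiency(budget, c_enroll, c_assess, rho, p, reps=2000, seed=0):
    """Monte Carlo relative efficiency of a planned missing two-phase design.

    Hypothetical set-up: exposure measures (x1, x2) are bivariate normal with
    unit variances, correlation rho and mean 0; the target is the mean of x1.
      * Complete design: every participant is assessed for x1 and x2.
      * Planned missing design: every participant is assessed for x2 and a
        random fraction p (designed, MCAR) is also assessed for x1.
    Both designs spend the same (expected) budget.
    Returns Var(complete) / Var(planned); values > 1 favour the planned design.
    """
    rng = np.random.default_rng(seed)
    mean, cov = [0.0, 0.0], [[1.0, rho], [rho, 1.0]]
    n_full = int(budget // (c_enroll + 2 * c_assess))
    n_plan = int(budget // (c_enroll + (1 + p) * c_assess))
    k = int(round(p * n_plan))  # participants assessed for x1 in the planned design

    est_full, est_plan = [], []
    for _ in range(reps):
        x_full = rng.multivariate_normal(mean, cov, size=n_full)
        est_full.append(x_full[:, 0].mean())

        x = rng.multivariate_normal(mean, cov, size=n_plan)
        has_x1 = rng.permutation(n_plan) < k  # exactly k randomly chosen rows
        x1_s, x2_s = x[has_x1, 0], x[has_x1, 1]
        # Double-sampling regression estimator: correct the subsample mean of
        # x1 using x2, which is observed for everyone.
        b = np.cov(x1_s, x2_s)[0, 1] / np.var(x2_s, ddof=1)
        est_plan.append(x1_s.mean() + b * (x[:, 1].mean() - x2_s.mean()))

    return float(np.var(est_full, ddof=1) / np.var(est_plan, ddof=1))
```

For instance, `relative_efficiency(3000, 0.0, 1.0, rho=0.95, p=0.25)`, where assessments dominate costs and the measures are highly correlated, should return a value above 1 (large‐sample approximation: about 1.24), whereas `relative_efficiency(3000, 10.0, 1.0, rho=0.5, p=0.5)`, where enrollment dominates costs, should fall well below 1 (around 0.6 by the same approximation). A realistic exercise would replace the bivariate normal model with the anticipated exposure/outcome distributions, non‐designed missingness, and the planned analysis.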

In our third chapter, which quantified time‐dependent associations between caffeine and fetal growth, we used multiple imputation for designed and non‐designed missing data. The multiple imputation methods employed in this study are valid only if the missing data are missing at random and the imputation model is correctly specified. Given the extent of designed missingness in the NIP study design, we took care to validate our imputation model. To validate the model and to assess its susceptibility to data missing not at random, we artificially introduced missing data into the subset of our population with complete exposure data. We then applied our imputation model to this subset and compared parameters (mean caffeine intake and the odds ratio (OR) between caffeine and IUGR) between the complete data subset without artificial missing data and the same subset with imputed values for the artificial missing data. Artificial missing data were introduced under two scenarios: 1) missing completely at random and 2) missingness dependent on caffeine consumption and IUGR. Validating the imputation model is important in any context, but it is critical when imputing the primary exposure of interest.
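The validation procedure described above can be sketched on simulated data. This is a minimal sketch under stated assumptions, not the imputation model used in the NIP analysis: a hypothetical auxiliary variable `z` (for example, caffeine reported at another visit), a continuous caffeine measure `x`, and a binary outcome `y` standing in for IUGR; a single normal linear imputation model for `x` given `z` and `y` with parameter draws (proper multiple imputation in Rubin's sense); and only the mean of `x` as the comparison parameter (the OR could be compared in the same way with a logistic model). Artificial missingness is introduced either completely at random or as a function of `x` and `y` themselves, which makes it missing not at random. All names and coefficients are illustrative.

```python
import numpy as np


def _pooled_mi_mean(z, x, y, miss, m, rng):
    """Rubin's-rules point estimate of mean(x) after normal linear MI of x on (z, y)."""
    obs = ~miss
    design = np.column_stack([np.ones_like(z), z, y])
    X_obs, X_mis = design[obs], design[miss]
    beta_hat, *_ = np.linalg.lstsq(X_obs, x[obs], rcond=None)
    resid = x[obs] - X_obs @ beta_hat
    df = X_obs.shape[0] - X_obs.shape[1]
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    estimates = []
    for _ in range(m):
        # Draw the residual variance and coefficients, then impute with noise.
        sigma2 = resid @ resid / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        x_imp = x.copy()
        x_imp[miss] = X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        estimates.append(x_imp.mean())
    return float(np.mean(estimates))  # pooled estimate = average over imputations


def validate_imputation(n=20_000, m=20, seed=2011):
    """Return (bias under MCAR, bias under MNAR) of the imputed mean of x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 2.0 + 0.7 * z + rng.normal(scale=np.sqrt(0.51), size=n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.4 * (x - 2.0)))))
    true_mean = x.mean()  # complete-data value before artificial missingness

    # Scenario 1: 40% missing completely at random.
    miss_mcar = rng.random(n) < 0.4
    # Scenario 2: missingness depends on caffeine itself and on the outcome.
    logit = -0.5 + 1.0 * (x - 2.0) + 0.5 * y
    miss_mnar = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

    bias_mcar = _pooled_mi_mean(z, x, y, miss_mcar, m, rng) - true_mean
    bias_mnar = _pooled_mi_mean(z, x, y, miss_mnar, m, rng) - true_mean
    return bias_mcar, bias_mnar


print(validate_imputation())
```

In this setup the imputed mean should closely reproduce the complete‐data mean under the first scenario and underestimate it under the second, because participants with higher caffeine intake are more often missing and the imputation model has no information about why.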

In perinatal epidemiology, both accurate characterization of exposed person‐time and the ability to detect time‐dependent effects depend on the validity of the estimated length of gestation. Outside the context of in‐vitro fertilization, the precise date of conception is unknown; therefore, the estimated date of the ovulation that resulted in fertilization is used as a proxy for the start of pregnancy. Previous studies have documented factors associated with misclassification of gestational age (GA); however, the potential for differential misclassification of GA‐specific exposures has not been addressed.
Misclassification of GA based on the last menstrual period (LMP) is well documented [92]. LMP‐based GA estimates rely on accurate recall of the first day of the last menstrual period. Studies have found that approximately 20% of pregnant women report that their LMP date is uncertain or unknown [93, 94]. Even among women who report the date of their last menstrual period with certainty, there is evidence of error due to digit preference [95] and of mistaken reporting due to skipped menstrual bleeding, mid‐cycle bleeding, or third trimester bleeding. Evidence of LMP‐based GA misclassification, including implausible GA‐specific birth weights and non‐normally distributed birth weights for preterm and postterm births, has been documented in several studies [96‐98].
In contrast to LMP, ultrasound‐based estimates of GA rely on fetal growth trajectories rather than a direct measure of time in pregnancy. Factors affecting the accuracy of ultrasound dating include the timing of the ultrasound, facility characteristics, and maternal characteristics. Hadlock et al. reported the precision of GA prediction models at various time points in pregnancy [99], demonstrating that early ultrasound (12‐18 weeks based on known LMP) may be accurate to within 1 to 2 weeks, while later ultrasound (24‐36 weeks based on known LMP) may be accurate to within 2 to 3 weeks. Maternal characteristics such as central adiposity, preference in mode of ultrasound, and factors associated with delayed prenatal care (e.g., access to care, pregnancy intention) may affect the validity of ultrasound dating. Even when fetal measurements are accurate, estimated GA may still be misclassified because of variability in fetal growth trajectories [100]. Future research should explore missing data methods as a potential tool for correcting biases associated with misclassification of gestational age.
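To illustrate how dating error can misallocate gestational‐age‐specific exposures, the following sketch estimates the share of exposure episodes assigned to the wrong trimester for a given dating error. All inputs are hypothetical: true GA at exposure is uniform between 4 and 40 weeks, dating error is normal with an assumed standard deviation (not values taken from Hadlock et al.), and cut points of 14 and 28 weeks are one common trimester convention. If the error SD differs between exposure groups, for example because women with delayed prenatal care are dated by later ultrasound, the resulting misclassification is differential.

```python
import numpy as np

# Assumed trimester cut points in weeks of gestation (conventions vary):
# first trimester < 14, second 14 to < 28, third >= 28.
TRIMESTER_CUTS = [14.0, 28.0]


def trimester_misclassification(error_sd, n=200_000, seed=7):
    """Share of exposure episodes assigned to the wrong trimester when the
    gestational age (GA) at exposure is estimated with N(0, error_sd**2) error.

    Hypothetical inputs: true GA at exposure is uniform on 4-40 weeks and
    dating error is normal, unbiased and independent of true GA.
    """
    rng = np.random.default_rng(seed)
    true_ga = rng.uniform(4.0, 40.0, size=n)
    estimated_ga = true_ga + rng.normal(0.0, error_sd, size=n)
    true_tri = np.digitize(true_ga, TRIMESTER_CUTS)
    est_tri = np.digitize(estimated_ga, TRIMESTER_CUTS)
    return float(np.mean(true_tri != est_tri))


# Hypothetical error SDs for two groups dated at different points in pregnancy.
for label, sd in [("earlier dating", 0.75), ("later dating", 1.5)]:
    print(label, round(trimester_misclassification(sd), 3))
```

Because only episodes near a cut point can cross it, the expected misclassified share under these assumptions is approximately 2 × SD × √(2/π) / 36, so it roughly doubles when the error SD doubles (about 0.033 versus 0.067 for the two SDs above).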

Traditionally, missing data in observational epidemiology have been viewed as a nuisance. In this dissertation, we have attempted to use missing data as a tool. In our first chapter, on time‐dependent bias, we attempted to remove the analytic bias by multiply imputing exposure time. In our second chapter, we assessed the validity and efficiency of study designs in which data are deliberately left missing for part of the study sample. In our third chapter, missing data methods were used both in the design of the study and in our adjustment for confounding. While the missing data methods implemented in this dissertation are not novel, they are applied in a novel context to address common problems in observational epidemiology.

1. Cunningham, F.G., et al., Chapter 5. Maternal Physiology: Williams

Obstetrics, 23e: http://www.accessmedicine.com/content.aspx?aID=6043606.

2. Kochenour, N.K., Adverse pregnancy outcome: sensitive periods, types of adverse outcomes, and relationships with critical exposure periods. Prog Clin Biol Res, 1984. 160: p. 229-35.

3. Czeizel, A.E., Specified critical period of different congenital abnormalities: a new approach for human teratological studies. Congenit Anom (Kyoto), 2008. 48(3): p. 103-9.

4. Karumanchi, S.A. and R.J. Levine, How does smoking reduce the risk of preeclampsia? Hypertension, 2010. 55(5): p. 1100-1.

5. Rooney, B.L., M.A. Mathiason, and C.W. Schauberger, Predictors of Obesity in Childhood, Adolescence, and Adulthood in a Birth Cohort. Matern Child Health J.

6. Oken, E., et al., Maternal gestational weight gain and offspring weight in adolescence. Obstet Gynecol, 2008. 112(5): p. 999-1006.

7. Beyerlein, A., et al., Associations of gestational weight loss with birth-related outcome: a retrospective cohort study. BJOG, 2010.

8. Bodnar, L.M., et al., Severe obesity, gestational weight gain, and adverse birth outcomes. Am J Clin Nutr, 2010. 91(6): p. 1642-8.

9. Yates, L., et al., Influenza A/H1N1v in pregnancy: an investigation of the characteristics and management of affected women and the relationship to pregnancy outcomes for mother and infant. Health Technol Assess. 14(34): p. 109-82.

10. Chen, Y.K., et al., No increased risk of adverse pregnancy outcomes in women with urinary tract infections: a nationwide population-based study. Acta Obstet Gynecol Scand. 89(7): p. 882-8.

11. van Gelder, M.M., et al., Characteristics of pregnant illicit drug users and associations between cannabis use and perinatal outcome in a population-based study. Drug Alcohol Depend. 109(1-3): p. 243-7.

12. Schempf, A.H. and D.M. Strobino, Illicit drug use and adverse birth outcomes: Is it drugs or context? Journal of Urban Health, 2008. 85(6): p. 858-873.

13. Lund, N., L.H. Pedersen, and T.B. Henriksen, Selective serotonin reuptake inhibitor exposure in utero and pregnancy outcomes. Arch Pediatr Adolesc Med, 2009. 163(10): p. 949-54.

14. Calderon-Margalit, R., et al., Risk of preterm delivery and other adverse perinatal outcomes in relation to maternal use of psychotropic medications during pregnancy. Am J Obstet Gynecol, 2009. 201(6): p. 579 e1-8.

15. Ververs, T.F., et al., Association between antidepressant drug use during pregnancy and child healthcare utilisation. BJOG, 2009. 116(12): p. 1568-77.

16. Ritz, B., et al., Ambient air pollution and preterm birth in the environment and pregnancy outcomes study at the University of California, Los Angeles. Am J Epidemiol, 2007. 166(9): p. 1045-52.

17. Vardavas, C.I., et al., Smoking and smoking cessation during early pregnancy and its effect on adverse pregnancy outcomes and fetal growth. Eur J Pediatr. 169(6): p. 741-8.

18. Hueston, W.J., et al., Delayed prenatal care and the risk of low birth weight delivery. J Community Health, 2003. 28(3): p. 199-208.

19. Daniels, J.L., et al., Attitudes toward participation in a pregnancy and child cohort study. Paediatr Perinat Epidemiol, 2006. 20(3): p. 260-6.

20. Nechuta, S., et al., Attitudes of pregnant women towards participation in perinatal epidemiological research. Paediatr Perinat Epidemiol, 2009. 23(5): p. 424-30.

21. Suissa, S., Immortal time bias in pharmacoepidemiology. Am J Epidemiol, 2008. 167(4): p. 492-499.

22. van Walraven, C., et al., Time-dependent bias was common in survival analyses published in leading clinical journals. J Clin Epidemiol, 2004. 57(7): p. 672-82.

23. Beyersmann, J., et al., An easy mathematical proof showed that time-dependent bias inevitably leads to biased effect estimation. J Clin Epidemiol, 2008. 61(12): p. 1216-21.

24. Beyersmann, J., M. Wolkewitz, and M. Schumacher, The impact of time-dependent bias in proportional hazards modelling. Stat Med, 2008. 27(30): p. 6439-54.

25. Tleyjeh, I.M., et al., Propensity score analysis with a time-dependent intervention is an acceptable although not an optimal analytical approach when treatment selection bias and survivor bias coexist. J Clin Epidemiol. 63(2): p. 139-40.

26. Andres Houseman, E. and D.K. Milton, Partial questionnaire designs, questionnaire non-response, and attributable fraction: applications to adult onset asthma. Stat Med, 2006. 25(9): p. 1499-519.

27. Wacholder, S., et al., The partial questionnaire design for case-control studies. Stat Med, 1994. 13(5-7): p. 623-34.

28. Graham, J.W., et al., Planned missing data designs in psychological research. Psychol Methods, 2006. 11(4): p. 323-43.

29. Graham, J.W., S.M. Hofer, and D.P. MacKinnon, Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 1996. 31(2): p. 197-218.

30. Morara, M., et al., Optimal design for epidemiological studies subject to designed missingness. Lifetime Data Anal, 2007. 13(4): p. 583-605.

31. Helms, R.W., Intentionally incomplete longitudinal designs: I. Methodology and comparison of some full span designs. Stat Med, 1992. 11(14-15): p. 1889-913.

32. Graham, J.W., Missing data analysis: making it work in the real world. Annu Rev Psychol, 2009. 60: p. 549-76.

33. Howards, P.P., E.F. Schisterman, and P.J. Heagerty, Potential confounding by exposure history and prior outcomes: an example from perinatal epidemiology. Epidemiology, 2007. 18(5): p. 544-51.

34. Howards, P.P., et al., Misclassification of gestational age in the study of spontaneous abortion. Am J Epidemiol, 2006. 164(11): p. 1126-36.

35. Bodnar, L.M., et al., Marginal structural models for analyzing causal effects of time-dependent treatments: an application in perinatal epidemiology. Am J Epidemiol, 2004. 159(10): p. 926-34.

36. O'Neill, M.S., et al., Have studies of urinary tract infection and preterm delivery used the most appropriate methods? Paediatr Perinat Epidemiol, 2003. 17(3): p. 226-33.

37. Symons, M.J. and D.T. Moore, Hazard rate ratio and prospective epidemiological studies. J Clin Epidemiol, 2002. 55(9): p. 893-9.

38. Hernan, M.A., The hazards of hazard ratios. Epidemiology. 21(1): p. 13-5.

39. Lee, E.T., Statistical methods for survival data analysis, 2nd edition. Probability and Mathematical Statistics, ed. V. Barnett, et al. 1992, New York, New York: Wiley Interscience.

40. Clausson, B., et al., Effect of caffeine exposure during pregnancy on birth weight and gestational age. Am J Epidemiol, 2002. 155(5): p. 429-36.

41. Therneau, T.M. and P.M. Grambsch, Modeling survival data: extending the Cox model, ed. K. Dietz, et al. 2000, New York: Springer.

42. Roth, J. NCHS's Vital Statistics Natality Birth Data -- 1968-2006. 2009 [cited 2010]; Available from: http://www.nber.org/data/vital-statistics-natality-data.html.

43. Alexander, G.R. and C.C. Korenbrot, The role of prenatal care in preventing low birth weight. Future Child, 1995. 5(1): p. 103-20.

44. Guillory, V.J., et al., Prenatal care and infant birth outcomes among Medicaid recipients. J Health Care Poor Underserved, 2003. 14(2): p. 272-89.

45. Shwartz, S., Prenatal care, prematurity, and neonatal mortality. A critical analysis of prenatal care statistics and associations. Am J Obstet Gynecol, 1962. 83: p. 591-8.

46. Molenberghs, G. and M. Kenward, Missing data in clinical studies, ed. S. Senn and V. Barnett. 2007: John Wiley & Sons.

47. Vintzileos, A.M., et al., The impact of prenatal care in the United States on preterm births in the presence and absence of antenatal high-risk conditions. Am J Obstet Gynecol, 2002. 187(5): p. 1254-7.

48. Hertz-Picciotto, I., L.M. Pastore, and J.J. Beaumont, Timing and patterns of exposures during pregnancy and their implications for study methods. Am J Epidemiol, 1996. 143(6): p. 597-607.

49. Nunes, A.P., et al., Time dependent bias of non-binary exposures: examples in perinatal epidemiology. 2010.

50. Savitz, D.A., et al., Epidemiologic measures of the course and outcome of pregnancy. Epidemiol Rev, 2002. 24(2): p. 91-101.

51. Golding, J. and C. Steer, How many subjects are needed in a longitudinal birth cohort study? Paediatr Perinat Epidemiol, 2009. 23 Suppl 1: p. 31-8.

52. Wacholder, S., The case-control study as data missing by design: estimating risk differences. Epidemiology, 1996. 7(2): p. 144-50.

53. Hogue, C.J. and M.A. Brewster, The potential of exposure biomarkers in epidemiologic studies of reproductive health. Environ Health Perspect, 1991. 90: p. 261-9.

54. Yuan, Y. Multiple imputation for missing values: Concepts and new development. in SUGI. 2000. Rockville, MD.

55. Rubin, D.B., Multiple Imputation for Nonresponse in Surveys. 1987, New York: John Wiley & Sons.

56. Schafer, J.L., Multiple imputation in multivariate problems when the imputation and analysis models differ. Statistica Neerlandica, 2003. 57(1): p. 19-35.

57. Schafer, J.L. and M.K. Olsen, Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 1998. 33(4): p. 545-571.

58. Bracken, M.B., et al., Association of maternal caffeine consumption with decrements in fetal growth. Am J Epidemiol, 2003. 157(5): p. 456-66.

59. Adiguzel, F. and M. Wedel, Split Questionnaire Design for Massive Surveys. Journal of Marketing Research, 2008. 45(5): p. 608-617.

60. Chipperfield, J.O. and D.G. Steel, Design and Estimation for Split Questionnaire Surveys. Journal of Official Statistics, 2009. 25(2): p. 227-244.

61. Raghunathan, T.E. and J.E. Grizzle, A Split Questionnaire Survey Design. Journal of the American Statistical Association, 1995. 90(429): p. 54-63.

62. Newman, S.C., P.E. Shrout, and R.C. Bland, The efficiency of two-phase designs in prevalence surveys of mental disorders. Psychol Med, 1990. 20(1): p. 183-93.

63. Shrout, P.E. and S.C. Newman, Design of two-phase prevalence surveys of rare disorders. Biometrics, 1989. 45(2): p. 549-55.

64. Brown, C.H., A. Indurkhya, and G.K. Sheppard, Power calculations for data missing by design: Applications to a follow-up study of lead exposure and attention. Journal of the American Statistical Association, 2000. 95(450): p. 383-395.

65. Resnik, R., Intrauterine growth restriction. Obstet Gynecol, 2002. 99(3): p. 490-6.

66. Nomura, K., et al., Caffeine suppresses the expression of the Bcl-2 mRNA in BeWo cell culture and rat placenta. J Nutr Biochem, 2004. 15(6): p. 342-9.

67. Kirkinen, P., et al., The effect of caffeine on placental and fetal blood flow in human pregnancy. Am J Obstet Gynecol, 1983. 147(8): p. 939-42.

68. Austin, P.C., The performance of different propensity score methods for estimating marginal odds ratios. Stat Med, 2007. 26(16): p. 3078-94.

69. Vik, T., et al., High caffeine consumption in the third trimester of pregnancy: gender-specific effects on fetal growth. Paediatr Perinat Epidemiol, 2003. 17(4): p. 324-31.

70. Scialli, A.R., et al., A review of the literature on the effects of acetaminophen on pregnancy outcome. Reprod Toxicol, 2010. 30(4): p. 495-507.

71. Burdan, F., Effects of prenatal exposure to combination of acetaminophen, isopropylantipyrine and caffeine on intrauterine development in rats. Hum Exp Toxicol, 2002. 21(1): p. 25-31.

72. Burdan, F., Intrauterine growth retardation and lack of teratogenic effects of prenatal exposure to the combination of paracetamol and caffeine in Wistar rats. Reprod Toxicol, 2003. 17(1): p. 51-8.

73. Grosso, L.M., et al., Prenatal caffeine assessment: fetal and maternal biomarkers or self-reported intake? Ann Epidemiol, 2008. 18(3): p. 172-8.

74. Centers for Disease Control and Prevention, National Center for Health Statistics. 1999 natality detail file, issued June 2001 (NCHS CD-ROM series 21, no. 12H, ASCII version).

75. Robins, J.M., M.A. Hernan, and B. Brumback, Marginal structural models and causal inference in epidemiology. Epidemiology, 2000. 11(5): p. 550-60.

76. Cole, S.R. and M.A. Hernan, Constructing inverse probability weights for marginal structural models. Am J Epidemiol, 2008. 168(6): p. 656-64.

77. Hosmer, D.W., et al., A comparison of goodness-of-fit tests for the logistic regression model. Stat Med, 1997. 16(9): p. 965-80.

78. Rothman, K.J., S. Greenland, and T.L. Lash, Modern Epidemiology. 3rd ed, ed. S. Seigafuse and L. Bierig. 2008, Philadelphia, PA: Lippincott Williams & Wilkins.

79. Hosmer, D.W. and S. Lemeshow, Confidence interval estimation of interaction. Epidemiology, 1992. 3(5): p. 452-6.

80. ACOG Committee Opinion No. 462: Moderate caffeine consumption during pregnancy. Obstet Gynecol, 2010. 116(2 Pt 1): p. 467-8.

81. Martin, T.R. and M.B. Bracken, The association between low birth weight and caffeine consumption during pregnancy. Am J Epidemiol, 1987. 126(5): p. 813-21.

82. Fenster, L., et al., Caffeine consumption during pregnancy and fetal growth. Am J Public Health, 1991. 81(4): p. 458-61.

83. Peacock, J.L., J.M. Bland, and H.R. Anderson, Effects on birthweight of alcohol and caffeine consumption in smoking women. J Epidemiol Community Health, 1991. 45(2): p. 159-63.

84. Bakker, R., et al., Maternal caffeine intake from coffee and tea, fetal growth, and the risks of adverse birth outcomes: the Generation R Study. Am J Clin Nutr, 2010. 91(6): p. 1691-8.

85. Maternal caffeine intake during pregnancy and risk of fetal growth restriction: a large prospective observational study. BMJ, 2008. 337: p. a2332.

86. Bracken, M.B., et al., Heterogeneity in assessing self-reports of caffeine exposure: implications for studies of health effects. Epidemiology, 2002. 13(2): p. 165-71.

87. Bech, B.H., et al., Effect of reducing caffeine intake on birth weight and length of gestation: randomised controlled trial. BMJ, 2007. 334(7590): p. 409.

88. Williamson, E., et al., Propensity scores: From naïve enthusiasm to intuitive understanding. Stat Methods Med Res, 2011.

89. Bakker, R., et al., Maternal Caffeine Intake, Blood Pressure, and the Risk of Hypertensive Complications During Pregnancy. The Generation R Study. Am J Hypertens, 2010.

90. Adeney, K.L., et al., Coffee consumption and the risk of gestational diabetes mellitus. Acta Obstet Gynecol Scand, 2007. 86(2): p. 161-6.

91. Kalilani, L. and J. Atashili, Measuring additive interaction using odds ratios. Epidemiol Perspect Innov, 2006. 3: p. 5.

92. Lynch, C.D. and J. Zhang, The research implications of the selection of a gestational age estimation method. Paediatr Perinat Epidemiol, 2007. 21 Suppl 2: p. 86-96.

93. Buekens, P., et al., Epidemiology of pregnancies with unknown last menstrual period. J Epidemiol Community Health, 1984. 38(1): p. 79-80.

94. Hall, M.H., et al., The extent and antecedents of uncertain gestation. Br J Obstet Gynaecol, 1985. 92(5): p. 445-51.

95. Waller, D.K., et al., Assessing number-specific error in the recall of onset of last menstrual period. Paediatr Perinat Epidemiol, 2000. 14(3): p. 263-7.

96. Dietz, P.M., et al., A comparison of LMP-based and ultrasound-based estimates of gestational age using linked California livebirth and prenatal screening records. Paediatr Perinat Epidemiol, 2007. 21 Suppl 2: p. 62-71.

97. Haglund, B., Birthweight distributions by gestational age: comparison of LMP-based and ultrasound-based estimates of gestational age using data from the Swedish Birth Registry. Paediatr Perinat Epidemiol, 2007. 21 Suppl 2: p. 72-8.

98. Ananth, C.V., Menstrual versus clinical estimate of gestational age dating in the United States: temporal trends and variability in indices of perinatal outcomes. Paediatr Perinat Epidemiol, 2007. 21 Suppl 2: p. 22-30.

99. Hadlock, F.P., et al., Estimating fetal age: computer-assisted analysis of multiple fetal growth parameters. Radiology, 1984. 152(2): p. 497-501.

100. Henriksen, T.B., et al., Bias in studies of preterm and postterm delivery due to ultrasound assessment of gestational age. Epidemiology, 1995. 6(5): p. 533-7.