Effect of the Number of Dating Partners on Marriage and Cohabitation Formation: A
Bayesian Analysis with Nonparametric Endogeneity
Abstract: In this paper we investigate the impact of the number of dating partners on first
marriage and cohabitation formation. Since the number of dating partners might be correlated
with unobserved preferences, the analysis is done in a framework that permits its potential
endogeneity. We also allow the number of dating partners to affect marriage and cohabitation
formation nonparametrically. We use data from the National Longitudinal Survey of Youth 1997
cohort and Bayesian estimation techniques. We find that the total number of dating partners is indeed
endogenous in this context. Our results show that after controlling for endogeneity, an increase in
the number of dating partners increases the probability of both marriage and cohabitation for
both men and women, although the effect is stronger in men for marriage and women for
cohabitation.
Keywords: Bayesian estimation; semiparametric probit; endogeneity; marriage; cohabitation;
dating
1. Introduction
Dating is normally associated with pairing-off of romantic partners. Dating partners, in general,
may or may not be actual long term partners in more formal unions, i.e., cohabitation or
marriage. Dating has established itself as a social institution but its effects on family formation
are not well understood. Although the last few decades have seen a substantial amount of
research on marriage and cohabitation, dating and its impacts on family formation have remained
relatively unexplored. In this paper we address the question as to how the cumulative number of
dating partners affects future unions (cohabitation or marriage). In particular, we focus on the first
marriage or first cohabiting relationship. The recent advent of online matching websites makes
this issue even more interesting because these websites reduce the cost of finding a dating
partner. As a result, we may expect an increase in the number of partners an individual dates
over his or her lifetime. What this will mean for family formation is an important question.
Presumably dating provides individuals with information that affects the value of future
unions. Information obtained from dating is presumably different from information obtained
from cohabitation: in dating one learns about the general characteristics of the opposite sex,
while cohabitation gives more detailed information about a particular individual. Cohabitation
seemingly allows two potential partners to learn about each other before marriage, and thereby
provides valuable information regarding the future value and stability of marriage, although the
empirical evidence on this is mixed. A number of studies have associated premarital cohabitation
with higher divorce rates (Lillard et al., 1995), while Svarer (2004) concludes that premarital
cohabitation reduces the risk of divorce.
We are not aware of any study that explores either the association or the causal
relationship between the number of dating partners and union formation, although a few recent
studies have investigated mate selection at the dating stage. Cawley et al. (2006) report that
heavier boys and girls are less likely to date, but for sexual activity the evidence is less
consistent. They hypothesize that dating an obese person may adversely affect the reputation and
hence the long-term opportunities of the dating individual. Using experimental data from a
speed-dating experiment, Fisman et al. (2006) find that compared to women, men care more
about the physical attractiveness of their prospective partners.
What kind of impact should we expect the number of dating partners to have on union
formation? Two competing hypotheses can be posited on this issue. On the one hand, the more
one learns about the other gender, the more valuable (both in terms of quality and stability) a
future union becomes. This will be the case when individuals of a particular gender share
common characteristics and have similar tastes and preferences. In this case dating would
provide an individual with the knowledge about those characteristics and preferences and thereby
raise the utility generated by marriage. Therefore, in this scenario, an increase in dating partners
should positively affect union formation. On the other hand, if individuals of a particular gender
are very different, then the knowledge about one individual does not help in any other
relationship, and we would expect that dating should not change the value of future marriage.
Testing these competing hypotheses, however, might not be straightforward due to the presence
of unobserved individual-level characteristics. For instance, suppose those individuals who put
more effort into seeking dating partners also spend more resources on finding marriage partners. If this
characteristic is not controlled for, we would observe a positive association between the number
of dating partners and union formation, even though there might not be any causal link between
the two variables. On the other hand, we may observe a negative relationship between the
number of dating partners and long term union formation, if the unobserved characteristic is
aversion toward long-term monogamy. In this case, individuals with a large number of dating
partners are less likely to form long term relationships. Therefore, in order to investigate the
causal impact of the number of dating partners on union formation, it is important to treat the
number of dating partners variable as endogenous. In fact, there is statistical evidence suggesting
the presence of unobserved endogeneity. Hence, in our analysis we explicitly control for the
unobserved confounding using a version of the standard treatment-response model.
Another innovation in our modeling is that we allow the number of dating partners to
affect marriage and cohabitation formation nonparametrically. The default specification in the
applied treatment-response literature is based on the linear regression model that assumes a
linear relationship between the treatment and the predictor of the outcome. This linear
relationship, however, might not be credible in the current application. For instance, the marginal
impact of the first date is presumably different from the marginal impact of, say, the 50th date.
Therefore, we follow the nonparametric approach of Kline and Tobias (2008), which allows us to
explore the presumably non-linear impact of the number of dating partners.
The cost of this flexible nonparametric approach is that we need to restrict our
specification to a binary choice model due to computational concerns. Specifically, when we
analyze cohabitation formation, marriage is not treated as a competing risk, and vice versa. This
is because even though in theory we could fit, say, a Multinomial Probit model (MNP) when the
endogenous variable consists of more than two categories, it poses several computational
problems in practice, especially when we include the nonparametric component that requires
intense computation. Therefore, although restricting to a binary choice model could be a
potential limitation, we feel that it is a sacrifice one has to make for exploring the potential non-
linear impact of the endogenous variable.
We use lagged obesity status and the total hours worked as instruments for the total
number of dates for females. Obese women are less likely to attract many potential partners
(Cawley et al., 2006). Also, obesity and work are unlikely to affect marriage directly other than
through affecting the number of dating partners. For men, however, hours worked might not be a
suitable instrument as it is related to income. Several authors have identified income as one of
the factors that is associated with marriage for men (Clarkberg, 1999; Macdonald and Rindfuss,
1981; Mare and Winship, 1991; Oppenheimer et al., 1997; Sweeny, 2002); Oppenheimer et al.
(1997) find that employment is positively related with marriage for men. Xie et al. (2003) use
several measures of earning potential, and find that earning potential positively influences the
likelihood of marriage for men but not for women. They also found that earning potential does
not affect entry into cohabiting unions. Given the evidence in the literature, we feel that hours
worked can be a valid instrument for women but not for men. Height, on the other hand, is
unlikely to directly affect the probability of marriage (Cawley et al., 2006) other than through the
number of dating partners. Therefore, we use height as the second instrument for men.
For instrumental variable results to be credible the instruments are required to be strong
and valid (i.e., instruments are conditionally uncorrelated with union formation). If obesity
affects marriage directly, other than through affecting the number of dating partners (for
example, if obesity is a measure of health), then this instrument will not be valid. Even though
there might be some concerns about the validity of the instruments (as in most of the studies
involving instruments), we also have good reasons to believe that they are valid in this context.
In fact, all we need for our instruments to be valid is that they are conditionally uncorrelated with
the errors. Given that we use an extensively rich set of variables, including standard measures of
health and income, this condition is likely to be satisfied.
One potential problem with our instruments for men is that both the instruments share a
common rationale (physical attractiveness). Murray (2006) criticized Levitt (1996, 1997) on the
ground that his instruments share a common rationale. But in the same paper Murray suggests
(page 116) that "Intuitively, if Levitt knew that he had enough surely valid instruments to exactly
identify his crime equation, he could use those instruments alone to carry out a consistent two-
stage least squares estimation in which the remaining potential instruments were included
among the explanators (that is, in X), rather than being used as instruments (that is, in Z).
Failing to reject the null hypothesis that these remaining potential instruments all have zero
coefficients in the second stage of two-stage least squares when included in X as explanators
would support the validity of those extra variables as instruments. The key to this strategy's
success is knowing for sure that an exactly identified subset is indeed valid so that two-stage
least squares estimation is both possible and valid." This is very close to what we do to check
the validity of our instruments. Since we have two instruments, we conduct a version of the
standard over-identification test: we first assume one instrument is valid, then include the other
in the response equation and calculate the Bayes factor in favor of the hypothesis that its
associated coefficient is zero. We then repeat the exercise by assuming the other instrument is
valid. We present evidence regarding the relevance and validity of our instruments in the results
section. In addition to formal tests, we also check whether our instruments are correlated with
unobserved variables relevant to the cohabitation/marriage decision. We follow a strategy along the
lines of Angrist and Krueger (1991) as suggested by Murray (2005). These checks are also reported in
the results section.
We use data from the National Longitudinal Survey of Youth 1997 cohort (hereafter NLSY97) for
the analysis. In our data the oldest respondents are 24 years old at the time of the last interview. We
observe very few divorces and almost no second marriages. We therefore focus on first entry into
either a cohabiting relationship or marriage. Using NLSY97 for this paper is particularly useful,
if not absolutely necessary. This is because NLSY97 is the only nationally representative
longitudinal dataset we are aware of that has detailed information on the respondents' history of
dating behavior, along with a variety of other information on education, labor market behavior
and health that are necessary for our purpose. The main disadvantage of using this dataset is that
the respondents are relatively young. If young adults utilize the information obtained from dating
in a different way than older adults, then our results might not generalize to older adults.
Nonetheless, this is a first step in exploring the relationship between the number of dating
partners and union formation.
Using the NLSY97 data, we find that an increase in the number of dating partners
increases marriage and cohabitation formation, after controlling for other relevant variables. We
also find that the impacts of the number of dating partners are different for men and women; the
impact on cohabitation formation is stronger in women while the impact on marriage is stronger
in men. The rest of this article is organized as follows. Section 2 first introduces a
treatment-response model in which the endogenous variable, the number of dating partners,
enters the response equation nonparametrically. The construction of a Gibbs sampler is then
discussed. The data used in the analysis are described in Section 3, and the empirical results are
presented in Section 4.
2. The Empirical Model and Bayesian Estimation
The central focus of our analysis is to investigate the impact of the number of dating
partners, as a proxy for the knowledge about the opposite sex, on first entry into marriage and
cohabitation. To determine the exact nature of the relationship, we seek to allow for extra
flexibility by modeling such an impact non-parametrically. In addition, since there are reasons to
suspect that some unobserved characteristics might affect both the number of dating partners and
marriage (cohabitation) decision, the analysis is done in a framework that permits the potential
endogeneity of the number of dating partners variable. Specifically, we consider the following
two-equation system:
$$y_{it} = \mathbf{1}\left(x_{it}\beta + f(s_{it}) + u_{it} > 0\right), \tag{1}$$
$$s_{it} = w_{it}\gamma + v_{it}, \tag{2}$$
where $y_{it}$ is the marital status of the $i$th individual at time $t$, which takes the value 1 if the
individual is married and 0 otherwise, $s_{it}$ is the cumulative number of dating partners from the
time the $i$th individual starts dating until he or she enters into a union, $x_{it}$ and $w_{it}$ are vectors
containing a variety of individual characteristics that influence, respectively, the decision of
getting married and the number of dating partners, $f(\cdot)$ is a smooth but otherwise unrestricted
function, and $\mathbf{1}(\cdot)$ is the indicator function, which takes the value 1 if the event in the
parentheses is true and 0 otherwise. We assume the error terms $(u_{it}, v_{it})$ follow a bivariate
normal distribution:
$$\begin{pmatrix} u_{it} \\ v_{it} \end{pmatrix} \Big|\; x_{it}, w_{it} \sim N\left(0, \Sigma\right), \qquad \Sigma = \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix},$$
where the variance of $u_{it}$ is assumed to be unity for identification purposes (see, for example,
Train, 2003, p. 27). We implicitly treat the number of dating partners as if it were a continuous
variable for computational convenience. Given that a substantial portion of the subjects have
more than a dozen dating partners (males have on average 28 partners and females 17), we feel
that this approximation is appropriate.
The potential problem of endogeneity is handled by explicitly allowing for a possible
correlation ($\rho_{uv} = \sigma_{uv}/\sigma_v$) between the unobservables in the equations for $y_{it}$ and $s_{it}$. Put differently, all unobserved
characteristics simultaneously affecting both the marriage decision and the number of dating
partners are captured by the correlation parameter $\rho_{uv}$. In addition, it gives a quantitative
measure of the extent of the endogeneity problem.
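To make the structure of equations (1) and (2) concrete, the following is a minimal simulation sketch, not the authors' code. The covariates (an intercept plus one standard normal regressor in each equation), the function `f`, and all parameter values are hypothetical choices made purely for illustration.

```python
import numpy as np

def simulate_model(n, beta, gamma, f, sigma_uv, sigma_v2, seed=0):
    """Simulate (y, s) from the two-equation system with correlated errors.

    All inputs are illustrative assumptions; the error variance of u is fixed
    at one, as in the identification restriction described in the text.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical covariates: intercept plus one standard normal regressor.
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    w = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Bivariate normal errors with Var(u) = 1, Cov(u, v) = sigma_uv.
    Sigma = np.array([[1.0, sigma_uv], [sigma_uv, sigma_v2]])
    errs = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    u, v = errs[:, 0], errs[:, 1]
    s = w @ gamma + v                      # treatment equation (2)
    y_star = x @ beta + f(s) + u           # latent index behind equation (1)
    y = (y_star > 0).astype(int)           # observed binary outcome
    rho_uv = sigma_uv / np.sqrt(sigma_v2)  # implied error correlation
    return y, s, rho_uv

y, s, rho_uv = simulate_model(
    n=5000,
    beta=np.array([-0.5, 0.3]),
    gamma=np.array([2.0, 1.0]),
    f=lambda s: 0.2 * np.sqrt(np.abs(s)),  # hypothetical smooth function
    sigma_uv=-0.5,
    sigma_v2=4.0,
)
```

In this sketch a negative `sigma_uv` mimics the case in which unobservables that raise the number of dating partners lower the propensity to form a union.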
The above model is estimated via the Bayesian approach. In particular, Markov chain Monte
Carlo (MCMC) techniques are used to generate draws from the posterior distribution. References
on the Bayesian methodology in general and econometrics applications in particular can be
found in Gelman et al. (2003), Koop (2003), Koop et al. (2007) and Lancaster (2005), among
many others. In what follows, we discuss the construction of a Gibbs sampler. Following the data
augmentation approach of Albert and Chib (1993), we rewrite (1) in terms of the latent variables
$\{y_{it}^*\}$ as
$$y_{it}^* = x_{it}\beta + f(s_{it}) + u_{it}, \tag{3}$$
where $y_{it} = \mathbf{1}(y_{it}^* > 0)$. Several approaches have been proposed to handle the nonparametric
specification of the endogenous variable $s_{it}$, and we adopt the one described in Koop and
Poirier (2004). In particular, we follow the implementation of Kline and Tobias (2008), who
constructed a continuous treatment-response model with skewed errors for the treatment
equation.
First, we sort the data by values of $s$ such that $s_1$ denotes the smallest value in the sample,
$s_{k-1} < s_k$ for $k = 2, \ldots, k_\theta - 1$, and $s_{k_\theta}$ is the largest value. Define $\theta = (f(s_1), \ldots, f(s_{k_\theta}))'$;
then equation (3) can be rewritten as
$$y_{it}^* = x_{it}\beta + d_{it}\theta + u_{it}, \tag{4}$$
where $d_{it}$ is constructed to select the appropriate element of $\theta$. Stacking (4) and (2) over
$t$ and then $i$, we have the system
$$y^* = X\beta + D\theta + u, \tag{5}$$
$$s = W\gamma + v, \tag{6}$$
where $(u', v')' \mid X, W \sim N(0, \Sigma \otimes I_T)$, $T = \sum_{i=1}^{N} T_i$, and $T_i$ denotes the number of
observations for the $i$th individual.
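The selection matrix can be illustrated with a short sketch. It assumes that the grid consists of the distinct observed values of the treatment, sorted in increasing order; the function name `build_selector` and the example numbers are hypothetical.

```python
import numpy as np

def build_selector(s):
    """Return the sorted distinct values of s and a 0/1 matrix D such that
    row r of D picks the element of theta matching observation r."""
    s = np.asarray(s, dtype=float)
    grid, idx = np.unique(s, return_inverse=True)  # grid is sorted ascending
    D = np.zeros((s.size, grid.size))
    D[np.arange(s.size), idx] = 1.0                # one indicator per row
    return grid, D

# Four person-year observations with three distinct treatment values.
s_obs = np.array([3.0, 1.0, 3.0, 7.0])
grid, D = build_selector(s_obs)
theta = np.array([0.1, 0.4, 0.9])  # hypothetical values f(1), f(3), f(7)
f_at_obs = D @ theta               # f(s) evaluated at each observation
```

Each row of `D` contains a single one, so `D @ theta` simply looks up the function value at the observed number of dating partners.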
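It is worth noting that in the above model there might be as many parameters as observations
(and possibly more). To circumvent the problem of insufficient observations, we place an
informative prior on the first differences of the pointwise slopes of the function $f(\cdot)$. Intuitively,
since the function $f(\cdot)$ is assumed to be smooth, the first differences of the pointwise slopes are
expected to be small. In particular, if all the first differences are zero, the model reduces to the linear
specification, i.e., $f(x) = a + bx$. We introduce the quantity $\psi = H\theta$, where the matrix $H$ is
defined as
$$H = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\Delta_2^{-1} & -(\Delta_2^{-1} + \Delta_3^{-1}) & \Delta_3^{-1} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \Delta_{k_\theta - 1}^{-1} & -(\Delta_{k_\theta - 1}^{-1} + \Delta_{k_\theta}^{-1}) & \Delta_{k_\theta}^{-1}
\end{pmatrix}$$
and $\Delta_j = s_j - s_{j-1}$, $j = 2, \ldots, k_\theta$. As discussed in Kline and Tobias (2008), the first two
elements of $\psi$ are a pair of "initial conditions": $\psi_1 = \theta_1$ and $\psi_2 = \theta_2$, while the other
elements are first differences of pointwise slopes:
$$\psi_j = \frac{\theta_j - \theta_{j-1}}{s_j - s_{j-1}} - \frac{\theta_{j-1} - \theta_{j-2}}{s_{j-1} - s_{j-2}}, \qquad j = 3, \ldots, k_\theta,$$
where $\theta_j = f(s_j)$. The prior on $\psi$ is assumed to be
$$\psi \mid \eta \sim N(0, V(\eta)),$$
where $V(\eta)$ is a $k_\theta \times k_\theta$ block diagonal matrix with $10 I_2$ on the upper block and
$\eta I_{k_\theta - 2}$ on the lower block. In this way, we specify a reasonably flat prior on the initial conditions $\psi_1$
and $\psi_2$, while the prior on the differences of pointwise slopes is centered around 0 with the
"smoothness" controlled by the smoothing parameter $\eta$. Finally, this prior on $\psi$ induces a
prior on the function values
$$\theta \mid \eta \sim N(0, V_\theta(\eta)),$$
where $V_\theta(\eta) = H^{-1} V(\eta) (H^{-1})'$. The following priors complete the specification of the model:
$$\beta \sim N(0, V_\beta), \qquad \gamma \sim N(0, V_\gamma), \qquad \Sigma \sim IW(\nu, S)\,\mathbf{1}(\sigma_{11} = 1), \qquad \eta \sim IG(\nu_\eta/2, S_\eta/2),$$
where $IW(\cdot,\cdot)$ and $IG(\cdot,\cdot)$ denote respectively the inverse-Wishart and inverse-gamma
distributions, and $\sigma_{11}$ is the first diagonal element of $\Sigma$.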
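The smoothness prior can be sketched numerically as follows. This is an illustrative construction under the assumption that the grid holds the sorted distinct treatment values; the function names and the example grid are hypothetical, and the value 10 for the flat block mirrors the $10 I_2$ block described in the text.

```python
import numpy as np

def build_H(grid):
    """Differencing matrix: the first two rows keep the initial conditions,
    the remaining rows take first differences of pointwise slopes."""
    grid = np.asarray(grid, dtype=float)
    k = grid.size
    delta = np.diff(grid)  # delta[m] = grid[m + 1] - grid[m]
    H = np.zeros((k, k))
    H[0, 0] = 1.0
    H[1, 1] = 1.0
    for j in range(2, k):
        H[j, j] = 1.0 / delta[j - 1]
        H[j, j - 1] = -(1.0 / delta[j - 1] + 1.0 / delta[j - 2])
        H[j, j - 2] = 1.0 / delta[j - 2]
    return H

def prior_cov_theta(grid, eta, flat_var=10.0):
    """Covariance of the function values induced by the prior on H @ theta."""
    H = build_H(grid)
    k = H.shape[0]
    V = np.diag(np.r_[np.full(2, flat_var), np.full(k - 2, eta)])
    H_inv = np.linalg.inv(H)  # H is lower triangular with nonzero diagonal
    return H_inv @ V @ H_inv.T

grid = np.array([0.0, 1.0, 3.0, 6.0])
H = build_H(grid)
theta_linear = 2.0 + 0.5 * grid  # a linear function on the grid
psi = H @ theta_linear           # slope differences vanish for a linear f
V_theta = prior_cov_theta(grid, eta=1e-3)
```

For a linear function the last elements of `psi` are zero, which is the sense in which a small smoothing parameter shrinks the estimated curve toward a straight line.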
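Define
$$z_{it} = \begin{pmatrix} y_{it}^* \\ s_{it} \end{pmatrix}, \qquad
X_{it} = \begin{pmatrix} x_{it} & 0 \\ 0 & w_{it} \end{pmatrix}, \qquad
D_{it} = \begin{pmatrix} d_{it} \\ 0 \end{pmatrix},$$
and let $\delta = (\beta', \gamma')'$ with prior covariance $V_\delta$, the block diagonal matrix formed from $V_\beta$ and $V_\gamma$.
Stacking the observations first over $t$ and then $i$, the posterior simulator consists of the
following five steps:
1) Sample $\delta \mid y^*, s, \theta, \Sigma, \eta$ by drawing
$$\delta \mid y^*, s, \theta, \Sigma, \eta \sim N(D_\delta d_\delta, D_\delta),$$
where
$$D_\delta = \left(X'(I_T \otimes \Sigma^{-1})X + V_\delta^{-1}\right)^{-1}, \qquad d_\delta = X'(I_T \otimes \Sigma^{-1})(z - D\theta).$$
2) Sample $\theta \mid y^*, s, \delta, \Sigma, \eta$ by drawing
$$\theta \mid y^*, s, \delta, \Sigma, \eta \sim N(D_\theta d_\theta, D_\theta),$$
where
$$D_\theta = \left(D'(I_T \otimes \Sigma^{-1})D + V_\theta(\eta)^{-1}\right)^{-1}, \qquad d_\theta = D'(I_T \otimes \Sigma^{-1})(z - X\delta).$$
3) Sample $\Sigma \mid y^*, s, \delta, \theta, \eta$ by drawing
$$\Sigma \mid y^*, s, \delta, \theta, \eta \sim IW\left(\nu + T,\; S + \sum_{i=1}^{N}\sum_{t=1}^{T_i} \varepsilon_{it}\varepsilon_{it}'\right)\mathbf{1}(\sigma_{11} = 1),$$
where $\varepsilon_{it} = z_{it} - X_{it}\delta - D_{it}\theta$ and $\sigma_{11}$ is the first diagonal element of $\Sigma$. This can be done by the algorithm described in
Nobile (2000).
4) Sample $y_{it}^* \mid y_{-it}^*, y, s, \delta, \theta, \Sigma, \eta$ by drawing
$$y_{it}^* \mid y_{-it}^*, y, s, \delta, \theta, \Sigma, \eta \sim
\begin{cases}
TN_{(0, \infty)}(\mu_{it}, \sigma^2_{y|v}), & \text{if } y_{it} = 1, \\
TN_{(-\infty, 0)}(\mu_{it}, \sigma^2_{y|v}), & \text{if } y_{it} = 0,
\end{cases}$$
where $TN_{(a,b)}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$ truncated
to the interval $(a, b)$, $y_{-it}^*$ is the vector $y^*$ except $y_{it}^*$, and $\mu_{it}$ and $\sigma^2_{y|v}$ denote the mean and
variance of $y_{it}^*$ conditional on $s_{it}$. Generating random variates from
the truncated normal distribution can be done by the inverse-transform method or various
efficient rejection methods (Geweke, 1991; Robert, 1995).
5) Sample $\eta \mid y^*, s, \delta, \theta, \Sigma$ by drawing
$$\eta \mid y^*, s, \delta, \theta, \Sigma \sim IG\left(\frac{\nu_\eta + k_\theta - 2}{2},\; \frac{S_\eta + \sum_{j=3}^{k_\theta} \psi_j^2}{2}\right).$$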
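The building blocks of the sampler above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the conditional moments in `latent_moments` are the standard bivariate-normal formulas implied by the error covariance with unit variance for $u_{it}$, and the inverse-gamma distribution is assumed to be parameterized by shape and scale.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def draw_gaussian_block(precision, d, rng):
    """Draw from N(D d, D) with D = precision^{-1} (the form of steps 1 and 2)."""
    L = np.linalg.cholesky(precision)    # precision = L L'
    mean = np.linalg.solve(precision, d)
    # If e ~ N(0, I), then solve(L', e) has covariance (L L')^{-1} = D.
    return mean + np.linalg.solve(L.T, rng.standard_normal(d.size))

def latent_moments(x_beta, d_theta, s, w_gamma, sigma_uv, sigma_v2):
    """Mean and variance of the latent index conditional on the treatment."""
    mu = x_beta + d_theta + (sigma_uv / sigma_v2) * (s - w_gamma)
    var = 1.0 - sigma_uv ** 2 / sigma_v2
    return mu, var

def draw_truncated_normal(mu, sd, positive, rng):
    """Inverse-transform draw from N(mu, sd^2) truncated to (0, inf) when
    positive is True and to (-inf, 0) otherwise (the form of step 4).
    In extreme tails this method loses accuracy and a rejection sampler
    is preferable."""
    p0 = _STD_NORMAL.cdf((0.0 - mu) / sd)  # untruncated mass below zero
    u = rng.uniform()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)    # keep inv_cdf inside (0, 1)
    return mu + sd * _STD_NORMAL.inv_cdf(p)

def draw_eta(psi, nu_eta, S_eta, rng):
    """Inverse-gamma draw for the smoothing parameter (the form of step 5)."""
    shape = (nu_eta + psi.size - 2) / 2.0
    scale = (S_eta + np.sum(psi[2:] ** 2)) / 2.0
    # If g ~ Gamma(shape, rate=scale), then 1/g ~ IG(shape, scale).
    return 1.0 / rng.gamma(shape, 1.0 / scale)

rng = np.random.default_rng(1)
mu, var = latent_moments(0.5, 0.2, s=10.0, w_gamma=8.0, sigma_uv=-0.5, sigma_v2=4.0)
pos_draws = [draw_truncated_normal(mu, var ** 0.5, True, rng) for _ in range(500)]
neg_draws = [draw_truncated_normal(mu, var ** 0.5, False, rng) for _ in range(500)]
eta_draw = draw_eta(np.array([1.0, 2.0, 0.01, -0.02]), nu_eta=4.0, S_eta=1e-5, rng=rng)
tight = draw_gaussian_block(np.eye(2) * 1e12, np.array([1e12, 2e12]), rng)
```

A full sampler would cycle through these draws, together with the restricted inverse-Wishart step, which is omitted here because it relies on the algorithm of Nobile (2000).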
3. Data
The data used in this article are from the National Longitudinal Survey of Youth 1997 cohort
(NLSY97), which is a nationally representative sample of 9022 youths, aged 12 to 16 as of
December 31, 1996. The first wave of the survey was conducted in 1997, and since then data has
been collected annually. We use the first eight waves of the data that are currently available.
In each interview respondents were asked about their height, weight, and relationship
status, among other things. During the sample period, 440 males (out of 3360 males) and 675
females (out of 3291 females) entered into their first cohabiting relationships. During the same
period, 255 males and 421 females entered into their first marriages. Thus even with our
relatively young sample, we do observe a considerable number of union formations. Given the
nature of our sample, marriage is almost an absorbing state (we have 69 divorces in the sample
and almost no second marriages).
The estimated model has a discrete-time duration structure, and the relevant summary statistics (based
on person-year observations) are presented in Table (1). For example, for cohabitation
relationships, each interview where the respondent was single is coded as zero. The first time a
respondent entered into a cohabiting relationship, it is coded as one, and then we ignore
information on that individual for the rest of the sample period. If an individual never entered
into any type of union during our sample period then he/she is coded as zero for all interviews.
The marriage variable is constructed similarly. We focus our analysis on individuals who are 18
years old and over. The average age of respondents in the sample is 19.8 years old with 24 being
the highest age. Although the relatively young age of the respondents is a limitation, this is the best
available dataset for studying this topic.
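The person-year coding described above can be sketched in a few lines. The function name and the input format (one boolean per interview, in chronological order) are hypothetical and serve only to illustrate the rule.

```python
def code_first_union(in_union_by_interview):
    """Return the person-year outcomes for one respondent: 0 for each
    interview while single, 1 at the first interview in a union, and no
    further records after that first entry."""
    outcomes = []
    for in_union in in_union_by_interview:
        if in_union:
            outcomes.append(1)
            break  # later interviews are ignored after first entry
        outcomes.append(0)
    return outcomes

entered = code_first_union([False, False, True, True])  # enters at interview 3
never = code_first_union([False, False, False])         # never enters a union
```

Under this coding a respondent contributes one record per interview until the first union is observed, which is what gives the binary model its discrete-time duration interpretation.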
In our data the number of dating partners shows a considerable variation. Males in our
sample have more dating partners (about 28 on average) than females (about 17 on average).
45% (25%) of males (females) have more than 20 dating partners, and 16% (6%) have more than
50 dating partners. Two other points about the number of dating partners are worth mentioning.
First, even though we focus our analysis on 18-24 year-olds, the total number of dating partners
includes all dating partners, i.e., including those partners whom the subjects dated before they
were 18 years old. Second, there might be some double (or more) counting of dating partners.
For example, if someone dated $p$ individuals between time $t$ and $t+1$ and $q$ individuals
between $t+1$ and $t+2$, we assume that at time $t+2$ the subject has dated $p+q$ individuals.
However, there might be some overlap between the two periods. In the extreme case where the
subject dated the same individual for seven consecutive years, we would treat that as if he or she
had dated a total of seven persons. Since the subject's dating partners are not identified in the
NLSY97 data, it is not possible to avoid double counting.
We calculate the Body Mass Index (BMI) for each individual by using the following formula:
$$\text{BMI} = \frac{\text{weight in pounds}}{(\text{height in inches})^2} \times 703.$$
We follow NIH guidelines to classify individuals with a BMI over 30 as obese. We drop
observations that are obviously erroneous, e.g. if BMI is greater than 60 (17 observations) or if
BMI is less than 10 (4 observations).
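As a small worked example of the BMI calculation and the screening rules just described, consider the following sketch. The function names are hypothetical; the thresholds (obese above 30, dropped above 60 or below 10) are those stated in the text.

```python
def bmi_from_imperial(weight_lb, height_in):
    """Body Mass Index from weight in pounds and height in inches."""
    return 703.0 * weight_lb / height_in ** 2

def obesity_status(weight_lb, height_in):
    """Return True/False for obese/not obese, or None when the implied BMI
    falls outside the plausible range and the observation is dropped."""
    bmi = bmi_from_imperial(weight_lb, height_in)
    if bmi > 60 or bmi < 10:
        return None
    return bmi > 30

example_bmi = bmi_from_imperial(150, 65)  # 703 * 150 / 65**2
```

For a respondent weighing 150 pounds at 65 inches, the index is just under 25, so the respondent is not classified as obese.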
In addition, we control for race, age, education, and indicator variables for smokers, urban
residency, self-reported health status, pregnancy status and indulgence in risky behavior such as
alcohol or marijuana use. For alcohol and marijuana, we use the number of days respondents consumed
those products in the 30 days preceding the interview as our control variable. Respondents who report any
smoking during the past year are coded as smokers. For self-reported health status, 1 stands
for excellent health and 5 for poor health. Summary statistics for all these variables are presented
in Table (1).
[Table (1) here]
4. Empirical Results
In this section we use the model discussed above to analyze the NLSY97 data. Male and
female subsamples are analyzed separately as the impact of the number of dating partners on
union formation is presumably different among the genders. The set of priors used in the analysis
is as follows:
,10,4,,10,10,105
2s
kkISIVIV
where $k_\beta$ and $k_\gamma$ are respectively the dimensions of $\beta$ and $\gamma$. Also observe that the prior
for the smoothing parameter $\eta$ is chosen such that its expected value is $5 \times 10^{-6}$ with infinite
variance. Thus the set of priors utilized in the analysis is proper but rather non-informative.
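The statement about the prior moments of the smoothing parameter can be checked with the standard inverse-gamma formulas. In the sketch below the shape-scale parameterization and the particular hyperparameter values (shape 4/2 and scale $10^{-5}/2$) are assumptions chosen to be consistent with the stated prior mean; they are not taken from a legible part of the text.

```python
import math

def inverse_gamma_moments(shape, scale):
    """Mean and variance of an inverse-gamma distribution in the shape-scale
    parameterization; a moment is reported as infinite when it does not
    exist as a finite number."""
    mean = scale / (shape - 1.0) if shape > 1.0 else math.inf
    var = (scale ** 2 / ((shape - 1.0) ** 2 * (shape - 2.0))
           if shape > 2.0 else math.inf)
    return mean, var

# Assumed hyperparameters: shape = 4 / 2, scale = 1e-5 / 2.
prior_mean, prior_var = inverse_gamma_moments(4 / 2, 1e-5 / 2)
```

With a shape of exactly two the mean is finite while the variance is not, which is one way a proper prior can remain only weakly informative.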
4.1 Results for the Male Subsample
Marriage and cohabitation decisions are analyzed separately using the empirical model
presented above. To investigate the impact of the number of dating partners on marital and
cohabiting decisions, we ran a Gibbs sampler with 35,000 iterations for each model and
discarded the first 5,000 iterations as burn-in. We report the coefficient posterior means,
posterior standard deviations and the probabilities of being positive in Tables (2) and (3), while
the point estimates of the function $f(s)$ are plotted in Figures (1) and (2).
[Figures (1) and (2) here]
We first note that there is evidence of the presence of endogeneity in the marriage-dating
equations system. Specifically, the posterior mean of the correlation parameter $\rho_{uv}$ (which
relates the error term of the marriage equation to that of the dating equation) is -0.27 and its
probability of being negative is 0.929. Put differently, with a rather high probability, unobserved
characteristics affecting the number of dating partners and marriage decision are negatively
correlated, and this illustrates the importance of controlling for endogeneity for the male
subsample. This finding suggests that individuals who have a large number of dating partners might
have an aversion toward the long-term monogamy involved in marriage. Given the above
discussion, it is not surprising that ordinary probit results (i.e., without controlling for
endogeneity) show that the number of dating partners is negatively related to the probability of
marriage. We do not present the details of these estimation results here, but they are available
from the authors.
The top panel of Table (2) presents the point estimates for all exogenous variables in the
marriage equation. All the variables have the expected signs. Age and income increase the
probability of marriage, while smoking, drinking and substance use reduce the probability of
marriage. The omitted race category is Whites, so the estimates imply that Blacks and Hispanics are less
likely to marry than Whites.
Figure (1) shows the estimated nonparametric curve $f(s)$ relating the cumulative
number of dating partners to marriage formation. It is upward sloping, suggesting that an
increase in the number of dating partners increases the probability of marriage. It is also of interest
to note that the nonparametric curve is rather linear, indicating that the impact on the marriage
decision is virtually uniform in the number of dating partners.
The results from the cohabitation-dating equations system also show that the problem of
endogeneity is substantial. In particular, the correlation parameter is estimated to be -0.31, with
probability of being negative equal to 0.91. Again this demonstrates the importance of
controlling for unobserved confounding in the male sample. Unlike the nonparametric curve in
the marriage-dating system, $f(s)$ is concave in $s$ (Figure 2). This pattern is consistent with a
learning explanation, where once individuals have dated a number of partners, they may not need
the extra (specific) information from cohabitation as much, leading to a leveling off of the impact on
cohabitation formation while the impact on marriage is linear.
As in a simple probit model, the impacts of the number of dating partners on marriage
and cohabitation are confounded by the nonlinear link function, in which the impacts depend not
only on the model parameters, but also on all other covariates. To provide a more intuitive
interpretation of the nonparametric curves that relate the number of dating partners to marriage
and cohabitation, we perform an exercise similar to the computation of the average covariate
effect. More specifically, suppose we wish to compute the probability that a new individual $i$ is
married ($z_i > 0$), given that his number of dating partners is $s_i = s$, while marginalizing over
the covariates $x_i$ and parameters $\Theta = (\delta, \theta, \Sigma)$, where $\delta = (\beta', \gamma')'$. As discussed in Chib and Jeliazkov (2006) in
the context of estimating the average covariate effect, a practical procedure is to marginalize out
the covariates using their empirical distribution, and the model parameters by the posterior
distribution. The goal is to estimate the following quantity by Monte Carlo simulation:
$$P(z_i > 0 \mid y, s_i = s) = \int P(z_i > 0 \mid y, x_i, s_i = s, \Theta)\, \pi(x_i)\, \pi(\Theta \mid y)\, dx_i\, d\Theta.$$
In practice, the above quantity can be estimated as follows. First, obtain a draw of $\Theta$ from the
posterior distribution and randomly select a vector of covariates $x_i$. Given $\sigma_v^2$, obtain an error
term for the treatment equation: $v_i \sim N(0, \sigma_v^2)$. Then a Monte Carlo average of the equation
above can be easily obtained by realizing
$$P(z_i > 0 \mid y, x_i, s_i = s, v_i, \Theta) = \Phi\left(\frac{x_i\beta + d_i\theta + (\sigma_{uv}/\sigma_v^2)\, v_i}{\sqrt{1 - \sigma_{uv}^2/\sigma_v^2}}\right),$$
where $\Phi(\cdot)$ is the distribution function of a standard normal random variable and $d_i$ is
constructed to select the appropriate element of $\theta$. It is worth noting that in the above
computation, the conditional mean of $z_i$ given the covariates and other model parameters is set
to be $x_i\beta + d_i\theta + \sigma_{uv} v_i/\sigma_v^2$ instead of $x_i\beta + d_i\theta + (\sigma_{uv}/\sigma_v^2)(s_i - w_i\gamma)$. This is because in
computing the impact of the number of dating partners on marriage, inevitably $s_i$ has to be
fixed at a constant. Consequently, the latter formula would artificially blow up the conditional
mean, as the term $s_i - w_i\gamma$ can be made arbitrarily large. Instead, we simulate the error term of
the treatment equation, $v_i$, and compute the conditional mean of $z_i$ given $v_i$ (in addition to
the covariates and other parameters). In this way, we account for the unobserved endogeneity
while avoiding the artifact of fixing $s_i$ to be a constant.
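The Monte Carlo procedure just described can be sketched as follows. This is an illustration under stated assumptions: the posterior draws are supplied as a list of dictionaries with hypothetical keys `beta`, `theta`, `sigma_uv` and `sigma_v2`, the covariate matrix `X` stands in for the empirical distribution of the covariates, and the example inputs are fabricated solely to exercise the code.

```python
import numpy as np
from statistics import NormalDist

def union_probability(s_index, draws, X, rng):
    """Average over posterior draws of the probability of union formation
    when the number of dating partners is fixed at grid position s_index."""
    Phi = NormalDist().cdf
    total = 0.0
    for draw in draws:
        x_i = X[rng.integers(X.shape[0])]  # covariates from the sample
        v_i = rng.normal(0.0, np.sqrt(draw["sigma_v2"]))  # treatment error
        ratio = draw["sigma_uv"] / draw["sigma_v2"]
        mean = x_i @ draw["beta"] + draw["theta"][s_index] + ratio * v_i
        sd = np.sqrt(1.0 - draw["sigma_uv"] ** 2 / draw["sigma_v2"])
        total += Phi(float(mean / sd))
    return total / len(draws)

# Toy inputs: no covariate effects, no endogeneity, two grid points.
rng = np.random.default_rng(0)
toy_draw = {"beta": np.zeros(2), "theta": np.array([0.0, 10.0]),
            "sigma_uv": 0.0, "sigma_v2": 4.0}
toy_draws = [toy_draw] * 50
X_toy = np.ones((5, 2))
p_low = union_probability(0, toy_draws, X_toy, rng)
p_high = union_probability(1, toy_draws, X_toy, rng)
```

Evaluating the function at every grid position traces out a probability curve of the kind reported in the figures, with the simulated treatment error carrying the endogeneity adjustment.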
[Figure (3) about here]
Figure (3) reports the probabilities that a male subject would get married (left panel) or
start cohabiting (right panel) given his cumulative number of dating partners, while
marginalizing out all other covariates. The curve for cohabitation is rather linear while the curve
for marriage is convex. Also, the impact of the number of dating partners seems somewhat
stronger for marriage. For example, an increase in the number of dating partners from zero to 20
increases the probability of cohabitation from 3% to 5%, but it increases the probability of
marriage from 2% to 5%.
Next we check the validity of our instruments. We perform a version of the standard
over-identification test similar to Kline and Tobias (2008). First recall that the instruments used
for the male subsample are height (Height) and lagged obesity status (ObeseLag). In the over-
identification test, we first assume ObeseLag is a valid instrument and then test if the other
potential instrument, Height, is also valid. Specifically, we include Height in the marriage
equation and compute the posterior odds ratio in favor of restricting the coefficient associated
with Height in the marriage equation to be zero. Under equal prior odds (i.e., a prior odds ratio
equal to one), this amounts to computing the Bayes factor in favor of the restriction (see, for
example, Koop, 2003, pp. 3-5). We calculate the relevant Bayes factor via the Savage-Dickey
density ratio (Verdinelli and Wasserman, 1995) and obtain a value of 199, indicating strong
support that Height can be excluded from the marriage equation. Now assuming Height is a valid
instrument, we repeat the procedure and obtain a Bayes factor of 22 in favor of the hypothesis
that the coefficient associated with ObeseLag is equal to zero. Taken together, these results
provide evidence for the validity of the instruments. We also perform the same exercise for the
cohabitation-dates equation system and the corresponding Bayes factors are 512 and 18,
suggesting Height and ObeseLag are also valid instruments for this system.
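The Savage-Dickey density ratio compares the posterior and prior densities of a coefficient at the restricted value. The sketch below is only an approximation of that idea: it assumes a zero-mean normal prior on the coefficient and replaces the posterior marginal density with a normal density fitted to the MCMC draws, which is a simplification introduced here and not necessarily the computation used for the reported values. The draws are simulated for illustration.

```python
import math
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def savage_dickey_bf(posterior_draws, prior_var):
    """Approximate Bayes factor in favor of a zero coefficient: posterior
    density at zero divided by prior density at zero."""
    m = float(np.mean(posterior_draws))
    v = float(np.var(posterior_draws))
    return normal_pdf(0.0, m, v) / normal_pdf(0.0, 0.0, prior_var)

rng = np.random.default_rng(0)
draws_near_zero = rng.normal(0.0, 0.05, size=4000)  # concentrated at zero
draws_away = rng.normal(1.0, 0.05, size=4000)       # concentrated away from zero
bf_supports_exclusion = savage_dickey_bf(draws_near_zero, prior_var=10.0)
bf_rejects_exclusion = savage_dickey_bf(draws_away, prior_var=10.0)
```

A ratio well above one indicates that the data have pulled posterior mass toward zero relative to the prior, which is the pattern interpreted above as support for excluding the instrument from the outcome equation.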
We further checked whether BMI is correlated with unobserved variables relevant to
the cohabitation/marriage decision, following a strategy similar to Angrist and Krueger (1991). We
run reduced-form regressions and find that the number of dating partners does not vary with BMI for
individuals in the normal range (i.e., a BMI between 18 and 25; since these individuals are in
the normal range, we should not expect BMI to affect the number of dating partners for this
group). This suggests that BMI is not correlated with unobserved variables that affect the number
of dating partners (fidelity, for example). As a further check, we add BMI to the
marriage equation as an explanatory variable. If BMI were correlated with anything relevant to marriage
besides the number of dating partners, then the coefficient on BMI should be significant in such
regressions. It is not (the p-value is always above 0.50), which lends strong support to the validity
of BMI as an instrument. The same holds for the female sample.
This kind of strategy cannot be applied directly to our other instruments. However, as
Murray (2006) suggests, results from overidentification tests are more believable when we know
we already have at least one valid instrument. We also estimated our model with lagged obesity
as the only instrument. The results are very similar to the results presented here. This gives us
additional confidence in our results.
4.2 Female Subsample
We perform the same analysis for the female subsample. Tables (4) and (5) present the
coefficient posterior means and standard deviations, as well as their probabilities of being
positive; the point estimates of the function f(s) are plotted in Figures (4) and (5). The
problem of endogeneity seems to be less severe in the female subsample. In particular, the
probability that the correlation parameter ρ_{uv}, which relates the unobservables in the marriage
(cohabitation) equation to those in the dates equation, is negative is estimated to be only 0.648
(0.848). Although it still suggests the presence of endogeneity, the evidence is less conclusive
than in the male subsample. The simple probit results show a negative association between
number of dating partners and probability of marriage for the female subsample too.
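The three summaries reported for each parameter (posterior mean, posterior standard deviation, and probability of being positive) can be computed directly from posterior draws, and the probability of being negative is one minus the probability of being positive (for instance, 0.648 = 1 - 0.352 for the marriage-dates system). The Python sketch below illustrates this; the draws are simulated stand-ins, not the sampler output from this paper.

```python
import math
import random


def posterior_summary(draws):
    """Return (mean, standard deviation, share of draws above zero)."""
    n = len(draws)
    mean = sum(draws) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))
    prob_positive = sum(d > 0 for d in draws) / n
    return mean, sd, prob_positive


random.seed(1)
# Hypothetical draws standing in for MCMC output on the correlation rho_uv,
# clipped to the open interval (-1, 1).
rho_draws = [max(-0.999, min(0.999, random.gauss(-0.08, 0.20)))
             for _ in range(10000)]

mean, sd, p_pos = posterior_summary(rho_draws)
p_neg = 1.0 - p_pos
print(f"mean={mean:.3f}, sd={sd:.3f}, P(>0|y)={p_pos:.3f}, P(<0|y)={p_neg:.3f}")
```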
The top panel of Table 4 presents the point estimates for the marriage equation. Like the
male subsample, all covariates have the expected sign. For the female subsample, we also
control for pregnancy status. Not surprisingly, it is associated with a higher probability of
marriage. The nonparametric curve f(s) is flatter than its counterpart in the male subsample
for the marriage-dates equation system, whereas it is steeper than its male counterpart in the
cohabitation-dates equation system. Also, for cohabitation, the impact seems rather linear in
females while it is concave for males.
Taken together these results suggest that information gained from dating may be used differently
by men and women. For men the information from dating seems to improve the value of
marriage, but for women the impact is relatively small. This could be due to the fact that women
start off with a high level of knowledge about men and hence information is not as valuable to
them, or they are willing to marry with relatively less information. The latter may happen if
women value marriage more than men do. For example, if women value marriage-specific capital
like children or the commitment of a long-term relationship more than men do, then they may be willing
to marry men with relatively less information.
[Figures (4) and (5) here]
Figure (6) reports the probabilities that a female would get married (left panel) or start
cohabiting (right panel) given her number of dating partners, while marginalizing out all other
covariates. Like the male subsample, the curve for cohabitation is rather linear while the curve
for marriage is convex. However there are important differences. The impact of an increase in
the number of dating partners on marriage is much stronger for males. The probability of getting
married is lower for males than for females at small numbers of
dating partners. But as the number of dating partners increases (say, above 25), the probability of
getting married for males rises above that for females. It should be noted that for the
full sample the unconditional probability of marriage is higher for females than for males, and
the above result is consistent with that. Note that the unconditional probability of marriage is
given by
\[ P(\text{marriage}) = \sum_{j} P(\text{marriage} \mid s = j) \, P(s = j), \]
where s is the number of dates. The probability of observing females with a high number of
dating partners is comparatively low. In fact, 65% of the females in our sample had fewer than 10
dating partners and 82% fewer than 20. Males, however, have more dating partners:
45% of males have fewer than 10 and 65% have fewer than 20. So the unconditional
probability of marriage is higher for females than for males. Also, it should be noted that the patterns
we observe in the average covariate effects are similar to those we observe in our estimates of
the nonparametric function.
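The argument can be illustrated with a small worked example in Python. The shares of each sex with fewer than 10, 10 to 19, and 20 or more dating partners are taken from the percentages above; the conditional marriage probabilities within each bin are hypothetical values chosen only to mimic the qualitative pattern (males below females at low counts, above at high counts) and are not estimates from this paper.

```python
def unconditional_probability(cond_prob, dist):
    """Sum over bins j of P(marriage | s in j) * P(s in j)."""
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "bin shares must sum to one"
    return sum(cond_prob[j] * share for j, share in dist.items())


# Bin shares implied by the text: 65% / 82% of females and 45% / 65% of males
# have fewer than 10 / fewer than 20 dating partners.
dist_female = {"under_10": 0.65, "10_to_19": 0.17, "20_plus": 0.18}
dist_male = {"under_10": 0.45, "10_to_19": 0.20, "20_plus": 0.35}

# Hypothetical conditional probabilities of marriage within each bin.
cond_female = {"under_10": 0.040, "10_to_19": 0.045, "20_plus": 0.050}
cond_male = {"under_10": 0.020, "10_to_19": 0.035, "20_plus": 0.060}

p_female = unconditional_probability(cond_female, dist_female)
p_male = unconditional_probability(cond_male, dist_male)
print(f"female: {p_female:.4f}, male: {p_male:.4f}")
```

Under these assumed values the unconditional probability is higher for females even though males have the higher conditional probability in the top bin, because most of the probability mass for both sexes sits in the lower bins.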
[Figure (6) here]
We also perform the same overidentification test as in the previous section to check the
validity of the instruments. Recall that for the female subsample, we use hours worked
in the last year (HoursWorked) and lagged obesity status (ObeseLag) as instruments. For the
marriage-dates system, assuming HoursWorked is a valid instrument, the Bayes factor in favor of
imposing the restriction that the coefficient associated with ObeseLag equals zero is computed to
be 526. Repeating the procedure but now assuming ObeseLag is valid, we obtain a Bayes factor
of 22 in favor of the hypothesis that HoursWorked is also valid. Therefore, there is strong
evidence that both HoursWorked and ObeseLag are valid instruments. For the cohabitation-dates
equation system, we perform the same exercise and the corresponding Bayes factors are 16 and
14, suggesting HoursWorked and ObeseLag are also valid instruments for this system.
5. Concluding Remarks
This paper investigates the causal relationship between the number of dating partners and
marriage or cohabitation formation. Since the number of dating partners could be correlated with
unobserved preferences, we explicitly treat it as an endogenous variable. We also allow the number
of dating partners to affect marriage and cohabitation formation nonparametrically. Using the
NLSY97 dataset, we find that after controlling for endogeneity, an increase in the number of
dating partners increases the probability of both marriage and cohabitation for both men and
women, although the effect is stronger in men for marriage and women for cohabitation. We
check for and provide evidence of the validity of our instruments. We also estimated a slightly
different specification with lagged obesity as the only instrument. The results are very similar
to the results presented above. We, however, prefer the above specification, as we cannot test the
validity of the instruments in an exactly identified system. If we do not control for endogeneity,
then the number of dating partners shows a negative impact on marriage. This is not surprising
given the evidence on endogeneity above. Our results suggest that information gained from
dating is presumably used differently by men and women. Returns from dating (in terms of
increased probability (value) of marriage) seem to be higher for men than for women. This could be due to the
fact that women start off with a higher level of knowledge about men, and hence the marginal
benefit is lower. On the other hand, it might be the case that women are willing to marry with
relatively less information. We also find that the probability of marriage is lower for men
than for women at relatively low numbers of dating partners, and higher for men at very
high numbers of dating partners.
References
Albert, J., and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response
Data. Journal of the American Statistical Association, 88:669-679.
Angrist, J., and Krueger, A. (1991). Does Compulsory School Attendance Affect Schooling
and Earnings? Quarterly Journal of Economics, November.
Cawley, J., Joyner K., and Sobal, J. (2006). Size Matters: The Influence of Adolescents'
Weight and Height On Dating and Sex. Rationality and Society, 18(1):67-94.
Chib, S. and Jeliazkov, I. (2006). Inference in Semiparametric Dynamic Models for Binary
Longitudinal Data. Journal of the American Statistical Association, 101:685-700.
Clarkberg, M. (1999). The Price of Partnering: The Role of Economic Well-being in Young
Adults' First Union Experiences. Social Forces, 77:945-68.
Fisman, R., Iyengar, S. S., Kamenica, E. and Simonson, I. (2006). Gender Differences in
Mate Selection: Evidence from a Speed Dating Experiment. Quarterly Journal of Economics,
121(2):673-97.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis,
Second Edition. Chapman & Hall/CRC.
Geweke, J. (1991). Efficient Simulation from the Multivariate Normal and Student-t
Distributions Subject to Linear Constraints and the Evaluation of Constraint Probabilities. In
Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E.
Keramidas and S. Kaufman ed.), 571-578.
Kline, B., and Tobias, J. L. (2008). The Wages of BMI: Bayesian Analysis of a Skewed
Treatment-Response Model with Nonparametric Endogeneity. Journal of Applied
Econometrics, 23(6):767-793.
Koop, G. (2003). Bayesian Econometrics. Wiley-Interscience.
Koop, G., and Poirier, D. J. (2004). Bayesian Variants of Some Classical Semiparametric
Regression Techniques. Journal of Econometrics, 123(2):259-282.
Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian Econometric Methods.
Cambridge University Press.
Lancaster, T. (2004). Introduction to Modern Bayesian Econometrics. Wiley-Blackwell.
Levitt, S. D. (1996). The Effect of Prison Population Size on Crime Rates: Evidence from
Prison Overcrowding Litigation. Quarterly Journal of Economics, 111(2):319-351.
Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police
on Crime. American Economic Review, 87(4):270-290.
Lillard, L. A., Brien, M. J., and Waite, L. J. (1995). Premarital Cohabitation and Subsequent
Marital Dissolution: A Matter of Self-Selection? Demography, 32(3):437-57.
MacDonald, M. M. and Rindfuss, R. R. (1981). Earnings, Relative Income, and Family
Formation. Demography, 18:123-36.
Mare, R.D. and Winship, C. (1991). Socioeconomic Change and the Decline of Marriage for
Blacks and Whites. In The Urban Underclass (C. Jencks and P.E. Peterson ed.), 175-202.
Murray, M. P. (2005). The Bad, the Weak, and the Ugly: Avoiding the Pitfalls of
Instrumental Variables Estimation. Social Science Research Network Working Paper No.
843185.
Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments.
Journal of Economic Perspectives, 20(4):111-132.
Nobile, A. (2000). Comment: Bayesian Multinomial Probit Models with a Normalization
Constraint. Journal of Econometrics, 99:334-345.
Oppenheimer, V. K. (1997). Women's Employment and the Gains to Marriage: The
Specialization and Trading Model of Marriage. Annual Review of Sociology, 23:431-53.
Robert, C. P. (1995). Simulation of Truncated Normal Variables. Statistics and Computing,
5:121-125.
Svarer, M. (2004). Is Your Love in Vain? Another Look at the Premarital Cohabitation and
Divorce. Journal of Human Resources, 39(2):523-535.
Sweeney, M. M. (2002). Two Decades of Family Change: The Shifting Economic
Foundations of Marriage. American Sociological Review, 67:132-47.
Train, K. E. (2003). Discrete Choice Methods with Simulation. Cambridge University Press.
Verdinelli, I. and Wasserman, L. (1995). Computing Bayes Factors Using a Generalization of
the Savage-Dickey Density Ratio. Journal of the American Statistical Association, 90:614-618.
Xie, Y., Raymo, M. J., Goyette, K., and Thornton, A. (2003). Economic Potential and Entry
into Marriage and Cohabitation. Demography, 40(2):351-67.
Table 1: Data Summary Statistics
                           Women                       Men
                           Marriage      Cohabitation  Marriage      Cohabitation
Variable                   mean   s.d.   mean   s.d.   mean   s.d.   mean   s.d.
Union                      0.04   0.19   0.07   0.25   0.024  0.15   0.044  0.20
Age                        19.8   1.5    19.7   1.5    19.8   1.5    19.7   1.5
Education                  12.6   1.8    12.6   1.7    12.2   1.7    12.3   1.7
Black                      0.29   0.45   0.29   0.45   0.25   0.43   0.25   0.43
Hispanic                   0.20   0.40   0.20   0.40   0.21   0.41   0.21   0.41
Urban                      0.80   0.40   0.80   0.40   0.78   0.41   0.78   0.41
Pregnancy status           0.10   0.30   0.10   0.30   -      -      -      -
Smoker                     0.40   0.49   0.38   0.48   0.45   0.49   0.44   0.50
Regular alcohol user       1.1    2.79   1.1    2.72   2.4    4.7    2.3    4.5
Regular marijuana user     1.6    5.7    1.4    5.33   3.0    7.9    2.8    7.7
Height                     64.6   2.9    64.6   2.9    70.3   3.4    70.3   3.4
Health                     2.17   0.94   2.15   0.93   1.96   0.92   1.94   0.91
Number of dating partners  16.80  18.86  16.03  18.17  28.6   32.7   28.1   32.1
Obese                      0.15   0.35   0.14   0.35   0.14   0.35   0.14   0.35
Income category 1          0.25   0.43   0.26   0.43   0.25   0.43   0.25   0.43
Income category 2          0.37   0.48   0.38   0.48   0.30   0.46   0.30   0.46
Income category 3          0.17   0.38   0.16   0.37   0.15   0.36   0.15   0.35
Income category 4          0.18   0.38   0.17   0.38   0.25   0.43   0.24   0.42
Income category 5          0.03   0.17   0.02   0.15   0.06   0.24   0.06   0.24
Number of observations     10678         9625          10565         9985
Table 2: Model parameters posterior means, standard deviations, and probabilities of being
positive. Marital choice of the male subsample.
Marriage equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Age 0.066 0.017 1.000
Education -0.014 0.019 0.238
UrbanDummy -0.040 0.074 0.285
Alcohol -0.040 0.010 0.000
Marijuana -0.040 0.009 0.000
SmokerDummy -0.114 0.061 0.034
IncomeCat 0.152 0.030 1.000
Health 0.037 0.031 0.884
Black -0.402 0.082 0.000
Hispanic -0.094 0.072 0.094
Other 0.131 0.236 0.721
Dates equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Constant -3.088 2.918 0.144
Age 2.086 0.201 1.000
Education -1.087 0.193 0.000
UrbanDummy 5.071 0.743 1.000
Alcohol 0.779 0.070 1.000
Marijuana 0.247 0.041 1.000
SmokerDummy 4.100 0.664 1.000
Height -0.089 0.059 0.067
ObeseLag -3.621 0.859 0.000
IncomeCat 1.548 0.262 1.000
Health -1.286 0.338 0.000
Black 4.561 0.768 1.000
Hispanic 5.018 0.793 1.000
Other -1.766 2.187 0.209
Other parameters
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
σ_{v}² 1003 13.89 1.000
ρ_{uv} -0.269 0.213 0.071
Η 2.34×10 1.72×10 1.000
Table 3: Model parameters posterior means, standard deviations, and probabilities of being
positive. Cohabitation choice of the male subsample.
Cohabitation equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Age -0.001 0.018 0.411
Education -0.053 0.018 0.000
UrbanDummy -0.010 0.065 0.427
Alcohol -0.030 0.007 0.000
Marijuana 0.001 0.003 0.669
SmokerDummy 0.138 0.061 0.992
IncomeCat 0.098 0.026 1.000
Health 0.012 0.026 0.686
Black 0.014 0.061 0.583
Hispanic -0.010 0.067 0.425
Other -0.136 0.242 0.292
Dates equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Constant -3.440 2.931 0.122
Age 2.014 0.203 1.000
Education -1.125 0.198 0.000
UrbanDummy 5.302 0.755 1.000
Alcohol 0.783 0.074 1.000
Marijuana 0.252 0.043 1.000
SmokerDummy 4.131 0.673 1.000
Height -0.062 0.058 0.143
ObeseLag -3.423 0.858 0.000
IncomeCat 1.687 0.264 1.000
Health -1.333 0.350 0.000
Black 3.909 0.767 1.000
Hispanic 4.985 0.797 1.000
Other -4.019 2.154 0.029
Other parameters
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
σ_{v}² 967 13.65 1.000
ρ_{uv} -0.310 0.209 0.090
Η 3.97×10 3.55×10 1.000
Table 4: Model parameters posterior means, standard deviations, and probabilities of being
positive. Marital choice of the female subsample.
Marriage equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Age 0.066 0.019 0.999
Education -0.058 0.013 0.000
UrbanDummy -0.165 0.062 0.004
Alcohol -0.064 0.015 0.000
Marijuana -0.010 0.005 0.039
SmokerDummy -0.123 0.062 0.025
IncomeCat 0.097 0.027 0.999
Health -0.014 0.025 0.281
PregnantDummy 0.487 0.067 1.000
Black -0.524 0.079 0.000
Hispanic -0.015 0.062 0.407
Other -0.080 0.223 0.372
Dates equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Constant -12.824 1.946 0.000
Age 1.032 0.111 1.000
Education 0.188 0.109 0.955
HoursWorked 0.076 0.026 0.997
UrbanDummy 2.691 0.446 1.000
Alcohol 0.763 0.066 1.000
Marijuana 0.253 0.032 1.000
SmokerDummy 3.595 0.389 1.000
ObeseLag -0.976 0.505 0.025
IncomeCat 0.874 0.195 1.000
Health 0.597 0.192 0.999
PregnantDummy 1.265 0.581 0.985
Black -2.804 0.441 0.000
Hispanic -1.932 0.483 0.000
Other -2.930 1.597 0.034
Other parameters
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
σ_{v}² 325 4.41 1.000
ρ_{uv} -0.082 0.200 0.352
Η 3.32×10 2.73×10 1.000
Table 5: Model parameters posterior means, standard deviations, and probabilities of being
positive. Cohabitation choice of the female subsample.
Cohabitation equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Age -0.063 0.012 0.000
Education -0.031 0.011 0.005
UrbanDummy 0.023 0.056 0.647
Alcohol -0.050 0.008 0.000
Marijuana 0.000 0.004 0.535
SmokerDummy 0.112 0.059 0.973
IncomeCat 0.078 0.024 0.999
Health 0.045 0.020 0.987
PregnantDummy 0.320 0.066 1.000
Black -0.062 0.056 0.127
Hispanic 0.012 0.057 0.596
Other -0.137 0.212 0.267
Dates equation
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
Constant -10.434 1.964 0.000
Age 0.789 0.114 1.000
Education 0.360 0.114 0.999
HoursWorked 0.087 0.026 0.999
UrbanDummy 2.805 0.449 1.000
Alcohol 0.738 0.069 1.000
Marijuana 0.271 0.034 1.000
SmokerDummy 3.641 0.397 1.000
ObeseLag -1.499 0.499 0.001
IncomeCat 0.815 0.198 1.000
Health 0.468 0.194 0.992
PregnantDummy 1.434 0.604 0.991
RaceCat1 -2.505 0.445 0.000
RaceCat2 -2.583 0.489 0.000
RaceCat3 -2.880 1.655 0.041
Other parameters
Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)
σ_{v}² 302 4.366 1.000
ρ_{uv} -0.433 0.257 0.152
Η 3.53×10 3.39×10 1.000
Figure 1: Posterior means of the function f(s) relating the number of dating partners to
marriage choice for the male sample.
Figure 2: Posterior means of the function f(s) relating the number of dating partners to
cohabiting choice for the male sample.
Figure 3: The probabilities that a male subject would get married (left panel) and start
cohabiting (right panel) given his number of dating partners.
Figure 4: Posterior means of the function f(s) relating the number of dating partners to
marriage choice for the female sample.
Figure 5: Posterior means of the function f(s) relating the number of dating partners to
cohabiting choice for the female sample.
Figure 6: The probabilities that a female subject would get married (left panel) and start
cohabiting (right panel) given her number of dating partners.