Effect of the Number of Dating Partners on Marriage and Cohabitation Formation: A

Bayesian Analysis with Nonparametric Endogeneity

Abstract: In this paper we investigate the impact of the number of dating partners on first

marriage and cohabitation formation. Since the number of dating partners might be correlated

with unobserved preferences, the analysis is done in a framework that permits its potential

endogeneity. We also allow the number of dating partners to affect marriage and cohabitation

formation nonparametrically. We use data from the National Longitudinal Survey of Youth 1997

cohort and Bayesian estimation techniques. We find that the total number of dating partners is indeed

endogenous in this context. Our results show that after controlling for endogeneity, an increase in

the number of dating partners increases the probability of both marriage and cohabitation for

both men and women, although the effect is stronger in men for marriage and women for

cohabitation.

Keywords: Bayesian estimation; semiparametric probit; endogeneity; marriage; cohabitation;

dating

1. Introduction

Dating is normally associated with pairing-off of romantic partners. Dating partners, in general,

may or may not be actual long term partners in more formal unions, i.e., cohabitation or

marriage. Dating has established itself as a social institution but its effects on family formation

are not well understood. Although the last few decades have seen a substantial amount of

research on marriage and cohabitation, dating and its impacts on family formation have remained

relatively unexplored. In this paper we address the question as to how the cumulative number of

dating partners affects future unions (cohabitation or marriage). In particular, we focus on first

marriage or first cohabiting relationship. The recent advent of online matching websites makes

this issue even more interesting because these websites reduce the cost of finding a dating

partner. As a result, we may expect an increase in the number of partners an individual dates

over his or her lifetime. What this will mean for family formation is an important question.

Presumably dating provides individuals with information that affects the value of future

unions. Information obtained from dating is presumably different from information obtained

from cohabitation: in dating one learns about the general characteristics of the opposite sex,

while cohabitation gives more detailed information about a particular individual. Cohabitation

seemingly allows two potential partners to learn about each other before marriage, and thereby

provides valuable information regarding the future value and stability of marriage, although the

empirical evidence on this is mixed. A number of studies have associated premarital cohabitation

with higher divorce rates (Lillard et al., 1995), while Svarer (2004) concludes that premarital

cohabitation reduces the risk of divorce.

We are not aware of any study that explores either the association or the causal

relationship between the number of dating partners and union formation, although a few recent

studies have investigated mate selection at the dating stage. Cawley et al. (2006) report that

heavier boys and girls are less likely to date, but for sexual activity the evidence is less

consistent. They hypothesize that dating an obese person may adversely affect the reputation and

hence the long-term opportunities of the dating individual. Using experimental data from a

speed-dating experiment, Fisman et al. (2006) find that compared to women, men care more

about the physical attractiveness of their prospective partners.

What kind of impact should we expect the number of dating partners to have on union

formation? Two competing hypotheses can be posited on this issue. On the one hand, the more

one learns about the other gender, the more valuable (both in terms of quality and stability) a

future union becomes. This will be the case when individuals of a particular gender share

common characteristics and have similar tastes and preferences. In this case dating would

provide an individual with the knowledge about those characteristics and preferences and thereby

raise the utility generated by marriage. Therefore, in this scenario, an increase in dating partners

should positively affect union formation. On the other hand, if individuals of a particular gender

are very different, then the knowledge about one individual does not help in any other

relationship, and we would expect that dating should not change the value of future marriage.

Testing these competing hypotheses, however, might not be straightforward due to the presence

of unobserved individual-level characteristics. For instance, suppose those individuals who put

more effort into seeking dating partners also spend more resources on finding marriage partners. If this

characteristic is not controlled for, we would observe a positive association between the number

of dating partners and union formation, even though there might not be any causal link between

the two variables. On the other hand, we may observe a negative relationship between the

number of dating partners and long term union formation, if the unobserved characteristic is

aversion toward long-term monogamy. In this case, individuals with a large number of dating

partners are less likely to form long term relationships. Therefore, in order to investigate the

causal impact of the number of dating partners on union formation, it is important to treat the

number of dating partners variable as endogenous. In fact, there is statistical evidence suggesting

the presence of unobserved endogeneity. Hence, in our analysis we explicitly control for the

unobserved confounding using a version of the standard treatment-response model.

Another innovation in our modeling is that we allow the number of dating partners to

affect marriage and cohabitation formation nonparametrically. The default specification in the

applied treatment-response literature is based on the linear regression model that assumes a

linear relationship between the treatment and the predictor of the outcome. This linear

relationship, however, might not be credible in the current application. For instance, the marginal

impact of the first date is presumably different from the marginal impact of, say, the 50th date.

Therefore, we follow the nonparametric approach of Kline and Tobias (2008), which allows us to

explore the presumably non-linear impact of the number of dating partners.

The cost of this flexible nonparametric approach is that we need to restrict our

specification to a binary choice model due to computational concerns. Specifically, when we

analyze cohabitation formation, marriage is not treated as a competing risk, and vice versa. This

is because even though in theory we could fit, say, a Multinomial Probit model (MNP) when the

endogenous variable consists of more than two categories, it poses several computational

problems in practice, especially when we include the nonparametric component that requires

intense computation. Therefore, although restricting to a binary choice model could be a

potential limitation, we feel that it is a sacrifice one has to make for exploring the potential non-

linear impact of the endogenous variable.

We use lagged obesity status and the total hours worked as instruments for the total

number of dates for females. Obese women are less likely to attract many potential partners

(Cawley et al., 2006). Also, obesity and work are unlikely to affect marriage directly other than

through affecting the number of dating partners. For men, however, hours worked might not be a

suitable instrument as it is related to income. Several authors have identified income as one of

the factors that is associated with marriage for men (Clarkberg, 1999; Macdonald and Rindfuss,

1981; Mare and Winship, 1991; Oppenheimer et al., 1997; Sweeny, 2002); Oppenheimer et al.

(1997) find that employment is positively related with marriage for men. Xie et al. (2003) use

several measures of earning potential, and find that earning potential positively influences the

likelihood of marriage for men but not for women. They also found that earning potential does

not affect entry into cohabiting unions. Given the evidence in the literature, we feel that hours

worked can be a valid instrument for women but not for men. Height, on the other hand, is

unlikely to directly affect the probability of marriage (Cawley et al., 2006) other than through the

number of dating partners. Therefore, we use height as the second instrument for men.

For instrumental variable results to be credible the instruments are required to be strong

and valid (i.e., instruments are conditionally uncorrelated with union formation). If obesity

affects marriage directly, other than through affecting the number of dating partners (for

example, if obesity is a measure of health), then this instrument will not be valid. Even though

there might be some concerns about the validity of the instruments (as in most of the studies

involving instruments), we also have good reasons to believe that they are valid in this context.

In fact, all we need for our instruments to be valid is that they are conditionally uncorrelated with

the errors. Given that we use an extensively rich set of variables, including standard measures of

health and income, this condition is likely to be satisfied.

One potential problem with our instruments for men is that both the instruments share a

common rationale (physical attractiveness). Murray (2006) criticized Levitt (1996, 1997) on the

ground that his instruments share a common rationale. But in the same paper Murray suggests

(page 116) that "Intuitively, if Levitt knew that he had enough surely valid instruments to exactly

identify his crime equation, he could use those instruments alone to carry out a consistent two-

stage least squares estimation in which the remaining potential instruments were included

among the explanators (that is, in X), rather than being used as instruments (that is, in Z).

Failing to reject the null hypothesis that these remaining potential instruments all have zero

coefficients in the second stage of two-stage least squares when included in X as explanators

would support the validity of those extra variables as instruments. The key to this strategy's

success is knowing for sure that an exactly identified subset is indeed valid so that two-stage

least squares estimation is both possible and valid." This is very close to what we do to check

the validity of our instruments. Since we have two instruments, we conduct a version of the

standard over-identification test: we first assume one instrument is valid, then include the other

in the response equation and calculate the Bayes factor in favor of the hypothesis that its

associated coefficient is zero. We then repeat the exercise by assuming the other instrument is

valid. We present evidence regarding the relevance and validity of our instruments in the results

section. In addition to formal tests, we also check whether our instruments are correlated with

unobserved variables relevant to the cohabitation/marriage decision. We follow a strategy along the

lines of Angrist and Krueger (1991) as suggested by Murray (2005). We report the results in the

results section.

We use data from the National Longitudinal Survey of Youth 1997 cohort (hereafter NLSY97) for

the analysis. In our data the oldest respondents are 24 years old at the time of the last interview. We

observe very few divorces and almost no second marriages. We therefore focus on first entry into

either a cohabiting relationship or marriage. Using NLSY97 for this paper is particularly useful,

if not absolutely necessary. This is because NLSY97 is the only nationally representative

longitudinal dataset we are aware of that has detailed information on the respondents' history of

dating behavior, along with a variety of other information on education, labor market behavior

and health that are necessary for our purpose. The main disadvantage of using this dataset is that

the respondents are relatively young. If young adults utilize the information obtained from dating

in a different way than older adults, then our results might not generalize to older adults.

Nonetheless, this is a first step in exploring the relationship between the number of dating

partners and union formation.

Using the NLSY97 data, we find that an increase in the number of dating partners

increases marriage and cohabitation formation, after controlling for other relevant variables. We

also find that the impacts of the number of dating partners are different for men and women; the

impact on cohabitation formation is stronger in women while the impact on marriage is stronger

in men. The rest of this article is organized as follows. In the estimation section we first introduce a

treatment-response model in which the endogenous variable, the number of dating partners,

enters the response equation non-parametrically. The construction of a Gibbs sampler is then

discussed. The data used in the analysis are described in the data section and the empirical results are

presented in the results section.

2. The Empirical Model and Bayesian Estimation

The central focus of our analysis is to investigate the impact of the number of dating

partners, as a proxy for the knowledge about the opposite sex, on first entry into marriage and

cohabitation. To determine the exact nature of the relationship, we seek to allow for extra

flexibility by modeling such an impact non-parametrically. In addition, since there are reasons to

suspect that some unobserved characteristics might affect both the number of dating partners and

marriage (cohabitation) decision, the analysis is done in a framework that permits the potential

endogeneity of the number of dating partners variable. Specifically, we consider the following

two-equation system:

$$y_{it} = \mathbf{1}\left(x_{it}\beta + f(s_{it}) + u_{it} > 0\right), \tag{1}$$

$$s_{it} = w_{it}\gamma + v_{it}, \tag{2}$$

where $y_{it}$ is the marital status of the $i$th individual at time $t$, which takes the value 1 if the

individual is married and 0 otherwise, $s_{it}$ is the cumulative number of dating partners from the

time the $i$th individual starts dating until he or she enters into a union, $x_{it}$ and $w_{it}$ are vectors

containing a variety of individual characteristics that influence respectively the decision of

getting married and the number of dating partners, $f(\cdot)$ is a smooth but otherwise unrestricted

function, and $\mathbf{1}(\cdot)$ is the indicator function, which takes the value 1 if the event in the

parenthesis is true and 0 otherwise. We assume the error terms $(u_{it}, v_{it})$ follow a bivariate

normal distribution:

$$\begin{pmatrix} u_{it} \\ v_{it} \end{pmatrix} \Big|\, x_{it}, w_{it} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Sigma = \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix} \right),$$

where the variance of $u_{it}$ is assumed to be unity for identification purposes (see, for example,

Train, 2003, p. 27). We implicitly treat the number of dating partners as if it were a continuous

variable for computational convenience. Given that a substantial portion of the subjects have

more than a dozen dating partners (males have on average 28 partners and females 17), we feel

that this approximation is appropriate.

The potential problem of endogeneity is handled by explicitly allowing for a possible

correlation ($\rho_{uv} = \sigma_{uv}/\sigma_v$) between $y_{it}$ and $s_{it}$. Put differently, all unobserved

characteristics simultaneously affecting both the marriage decision and the number of dating

partners are captured by the correlation parameter $\rho_{uv}$. In addition, it also gives a quantitative

measure of the extent of the endogeneity problem.
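
To make the role of the error correlation concrete, the following is a minimal simulation sketch of the

two-equation system, not the estimation code used in the paper. It assumes a single covariate in each

equation, a user-supplied function `f`, and hypothetical parameter values; the function name

`simulate_panel` and its arguments are illustrative only.

```python
import math
import random


def simulate_panel(n, beta, gamma, f, sigma_uv, sigma_v, seed=0):
    """Draw (y, s, x, w) tuples from the two-equation system with Var(u) = 1.

    beta and gamma are scalar coefficients here (an illustrative simplification).
    """
    rng = random.Random(seed)
    rho = sigma_uv / sigma_v  # error correlation, because Var(u) is fixed at one
    if abs(rho) >= 1:
        raise ValueError("need |sigma_uv / sigma_v| < 1")
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # covariate in the outcome equation
        w = rng.gauss(0.0, 1.0)      # covariate in the treatment equation
        v = rng.gauss(0.0, sigma_v)  # treatment-equation error
        # u | v is normal with mean (sigma_uv / sigma_v^2) * v and variance 1 - rho^2
        u = sigma_uv / sigma_v ** 2 * v + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        s = gamma * w + v                          # treatment equation
        y = 1 if beta * x + f(s) + u > 0 else 0    # binary outcome equation
        rows.append((y, s, x, w))
    return rows
```

With a negative `sigma_uv`, individuals with unusually high `s` tend to have low `u`, which is the kind of

confounding that a probit ignoring the treatment equation would attribute to `f`.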

The above model is estimated via the Bayesian approach. In particular, Markov chain Monte

Carlo (MCMC) techniques are used to generate draws from the posterior distribution. References

on the Bayesian methodology in general and econometrics applications in particular can be

found in Gelman et al. (2003), Koop (2003), Koop et al. (2007) and Lancaster (2005), among

many others. In what follows, we discuss the construction of a Gibbs sampler. Following the data

augmentation approach of Albert and Chib (1993), we rewrite (1) in terms of the latent variables

$\{y^*_{it}\}$ as

$$y^*_{it} = x_{it}\beta + f(s_{it}) + u_{it}, \tag{3}$$

where $y_{it} = \mathbf{1}(y^*_{it} > 0)$. Several approaches have been proposed to handle the nonparametric

specification of the endogenous variable $s_{it}$, and we adopt the one described in Koop and

Poirier (2004). In particular, we follow the implementation of Kline and Tobias (2008), who

constructed a continuous treatment-response model with skewed errors for the treatment

equation.

First, we sort the data by values of $s$ such that $s_1$ denotes the smallest value in the sample,

$s_j > s_{j-1}$ for $j = 2, \ldots, k-1$, and $s_k$ is the largest value. Define $\theta = (f(s_1), \ldots, f(s_k))'$;

then equation (3) can be rewritten as

$$y^*_{it} = x_{it}\beta + d_{it}\theta + u_{it}, \tag{4}$$

where $d_{it}$ is constructed to select off the appropriate element of $\theta$. Stacking (4) and (2) over

$t$ and then $i$, we have the system

$$y^* = X\beta + D\theta + u, \tag{5}$$

$$s = W\gamma + v, \tag{6}$$

where $(u', v')' \mid X, W \sim N(0_{2T}, \Sigma \otimes I_T)$, $T = \sum_{i=1}^{N} T_i$, and $T_i$ denotes the number of

observations for the $i$th individual.
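
The selection step can be illustrated with a short sketch. It assumes the grid is the set of distinct

observed values of the treatment and that each selection vector is a one-hot row; the helper name

`selection_rows` is hypothetical.

```python
def selection_rows(s_values):
    """Return the sorted grid of distinct s values and one selection row per observation."""
    grid = sorted(set(s_values))                  # s_1 < s_2 < ... < s_k
    position = {s: j for j, s in enumerate(grid)}
    rows = []
    for s in s_values:
        d = [0] * len(grid)
        d[position[s]] = 1                        # picks out f(s) from the vector of function values
        rows.append(d)
    return grid, rows


def select(d, theta):
    """Inner product of a selection row with the vector of function values."""
    return sum(dj * tj for dj, tj in zip(d, theta))
```

For an observation whose treatment equals the second grid point, `select` returns the second element of

the function-value vector, so the nonparametric term becomes linear in the unknown function values.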

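It is worth noting that in the above model there might be as many parameters as observations

(and possibly more). To circumvent the problem of insufficient observations, we place an

informative prior on the first-differences of pointwise slopes of the function $f(\cdot)$. Intuitively,

since the function $f(\cdot)$ is assumed to be smooth, the first-differences of pointwise slopes are

expected to be small. In particular, if all the first-differences are zero, it reduces to the linear

specification, i.e., $f(x) = a + bx$. We introduce the quantity $\psi = H\theta$, where the matrix $H$ is

defined as

$$H = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ \Delta_2^{-1} & -(\Delta_2^{-1} + \Delta_3^{-1}) & \Delta_3^{-1} & 0 & \cdots & 0 \\ 0 & \Delta_3^{-1} & -(\Delta_3^{-1} + \Delta_4^{-1}) & \Delta_4^{-1} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \Delta_{k-1}^{-1} & -(\Delta_{k-1}^{-1} + \Delta_k^{-1}) & \Delta_k^{-1} \end{pmatrix}$$

and $\Delta_j = s_j - s_{j-1}$, $j = 2, \ldots, k$. As discussed in Kline and Tobias (2008), the first two

elements of $\psi = H\theta$ are a pair of "initial conditions": $\psi_1 = \theta_1$ and $\psi_2 = \theta_2$, while the other

elements are first-differences of pointwise slopes

$$\psi_j = \frac{\theta_j - \theta_{j-1}}{s_j - s_{j-1}} - \frac{\theta_{j-1} - \theta_{j-2}}{s_{j-1} - s_{j-2}} \approx f'(s_{j-1}) - f'(s_{j-2}), \quad j = 3, \ldots, k.$$

The prior on $\psi$ is assumed to be

$$\psi \mid \eta \sim N(0, V(\eta)),$$

where $V(\eta)$ is a $k \times k$ block diagonal matrix with $10 I_2$ on the upper block and $\eta I_{k-2}$

on the lower block. In this way, we specify a reasonably flat prior on the initial conditions $\psi_1$

and $\psi_2$, while the prior on the differences of pointwise slopes is centered around 0 with the

"smoothness" controlled by the smoothing parameter $\eta$. Finally, this prior on $\psi$ induces a

prior on the function values

$$\theta \mid \eta \sim N(0, V_\theta(\eta)),$$

where $V_\theta(\eta) = H^{-1} V(\eta) (H^{-1})'$. The following priors complete the specification of the model: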

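$$\beta \sim N(0, V_\beta),$$

$$\gamma \sim N(0, V_\gamma),$$

$$\Sigma \sim IW(\nu, S)\,\mathbf{1}(\sigma_{11} = 1),$$

$$\theta \mid \eta \sim N(0, V_\theta(\eta)),$$

$$\eta \sim IG(\nu_\eta/2,\ S_\eta/2),$$

where $IW(\cdot, \cdot)$ and $IG(\cdot, \cdot)$ denote respectively the inverse-Wishart and inverse-gamma

distributions, and $\sigma_{11}$ is the first diagonal element of $\Sigma$.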
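
The smoothness prior can be checked numerically with a small sketch. It builds the differencing matrix

from a grid of treatment values, with two identity rows for the initial conditions followed by

differences of adjacent slopes, and confirms that a linear function has zero slope differences. The

helper names `build_H`, `matvec` and `prior_variances` are hypothetical, and the value 10 for the

initial-condition variances is taken from the text.

```python
def build_H(grid):
    """Differencing matrix for a strictly increasing grid s_1 < ... < s_k (k >= 3)."""
    k = len(grid)
    if k < 3:
        raise ValueError("need at least three grid points")
    H = [[0.0] * k for _ in range(k)]
    H[0][0] = 1.0  # first initial condition
    H[1][1] = 1.0  # second initial condition
    for j in range(2, k):
        a = 1.0 / (grid[j - 1] - grid[j - 2])  # inverse of the previous gap
        b = 1.0 / (grid[j] - grid[j - 1])      # inverse of the current gap
        H[j][j - 2] = a
        H[j][j - 1] = -(a + b)
        H[j][j] = b                            # row j gives current slope minus previous slope
    return H


def matvec(H, theta):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(h * t for h, t in zip(row, theta)) for row in H]


def prior_variances(k, eta):
    """Diagonal of the prior covariance: 10 for the two initial conditions, eta elsewhere."""
    return [10.0, 10.0] + [eta] * (k - 2)
```

A small smoothing parameter shrinks the slope differences toward zero and hence the fitted curve toward

a straight line, while a large value lets the curve bend freely.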

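Define

$$X_{it} = \begin{pmatrix} x_{it} & 0 \\ 0 & w_{it} \end{pmatrix}, \quad D_{it} = \begin{pmatrix} d_{it} \\ 0 \end{pmatrix}, \quad z_{it} = \begin{pmatrix} y^*_{it} \\ s_{it} \end{pmatrix},$$

and let $\delta = (\beta', \gamma')'$ with prior covariance matrix $V_\delta$ (block diagonal with blocks $V_\beta$ and $V_\gamma$).

Stacking the observations first over $t$ and then $i$ (so that $z$, $X$ and $D$ below denote the stacked

$z_{it}$, $X_{it}$ and $D_{it}$), the posterior simulator consists of the following five steps:

1) Sample $\delta \mid y^*, s, \theta, \Sigma, \eta$ by drawing

$$\delta \mid y^*, s, \theta, \Sigma, \eta \sim N(D_\delta d_\delta, D_\delta),$$

where

$$D_\delta = \left(X'(I_T \otimes \Sigma^{-1})X + V_\delta^{-1}\right)^{-1}, \quad d_\delta = X'(I_T \otimes \Sigma^{-1})(z - D\theta).$$

2) Sample $\theta \mid y^*, s, \delta, \Sigma, \eta$ by drawing

$$\theta \mid y^*, s, \delta, \Sigma, \eta \sim N(D_\theta d_\theta, D_\theta),$$

where

$$D_\theta = \left(D'(I_T \otimes \Sigma^{-1})D + V_\theta(\eta)^{-1}\right)^{-1}, \quad d_\theta = D'(I_T \otimes \Sigma^{-1})(z - X\delta).$$

3) Sample $\Sigma \mid y^*, s, \delta, \theta, \eta$ by drawing

$$\Sigma \mid y^*, s, \delta, \theta, \eta \sim IW\left(\nu + T,\ S + \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varepsilon_{it}\varepsilon_{it}'\right)\mathbf{1}(\sigma_{11} = 1), \quad \varepsilon_{it} = z_{it} - X_{it}\delta - D_{it}\theta,$$

where $\sigma_{11}$ is the first diagonal element of $\Sigma$. This can be done by the algorithm described in

Nobile (2000).

4) Sample $y^*_{it} \mid y^*_{-it}, y, s, \delta, \theta, \Sigma, \eta$ by drawing

$$y^*_{it} \mid y^*_{-it}, y, s, \delta, \theta, \Sigma, \eta \sim \begin{cases} TN_{(0, \infty)}(\mu_{it}, \sigma^2_{y|v}), & \text{if } y_{it} = 1, \\ TN_{(-\infty, 0)}(\mu_{it}, \sigma^2_{y|v}), & \text{if } y_{it} = 0, \end{cases}$$

where $TN_{(a,b)}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$ truncated

to the interval $(a, b)$, and $y^*_{-it}$ is the vector $y^*$ except $y^*_{it}$. Here

$\mu_{it} = x_{it}\beta + d_{it}\theta + (\sigma_{uv}/\sigma_v^2)(s_{it} - w_{it}\gamma)$ and $\sigma^2_{y|v} = 1 - \sigma_{uv}^2/\sigma_v^2$ are the conditional mean and

variance of $y^*_{it}$ given $s_{it}$ implied by the bivariate normal errors. Generating random variates from

the truncated normal distribution can be done by the inverse-transform method or various

efficient rejection methods (Geweke, 1991; Robert, 1995).

5) Sample $\eta \mid y^*, s, \delta, \theta, \Sigma$ by drawing

$$\eta \mid y^*, s, \delta, \theta, \Sigma \sim IG\left(\frac{\nu_\eta + k - 2}{2},\ \frac{S_\eta + \sum_{j=3}^{k} \psi_j^2}{2}\right).$$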
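
As an illustration of the latent-variable step, the sketch below draws from a normal distribution

truncated at zero by the inverse-transform method. It is a minimal sketch using only the Python standard

library, not the sampler used in the paper; the function name and arguments are hypothetical. The

inverse-transform approach is numerically fragile far in the tails, which is one reason the rejection

methods cited above exist.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def draw_truncated_normal(mean, sd, positive, rng):
    """Draw from N(mean, sd^2) truncated to (0, inf) if positive, else to (-inf, 0)."""
    p0 = _STD_NORMAL.cdf((0.0 - mean) / sd)   # untruncated probability of a value below zero
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = lo + (hi - lo) * rng.random()         # uniform on the admissible range of the CDF
    u = min(max(u, 1e-12), 1.0 - 1e-12)       # keep u strictly inside (0, 1) for inv_cdf
    return mean + sd * _STD_NORMAL.inv_cdf(u)
```

In a Gibbs sweep, `positive` would be set from the observed binary outcome, and `mean` and `sd` from the

conditional mean and standard deviation of the latent variable given the treatment.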

3. Data

The data used in this article are from the National Longitudinal Survey of Youth 1997 cohort

(NLSY97), which is a nationally representative sample of 9022 youths, aged 12 to 16 as of

December 31, 1996. The first wave of the survey was conducted in 1997, and since then data has

been collected annually. We use the first eight waves of the data that are currently available.

In each interview respondents were asked about their height, weight, and relationship

status, among other things. During the sample period, 440 males (out of 3360 males) and 675

females (out of 3291 females) entered into their first cohabiting relationships. During the same

period, 255 males and 421 females entered into their first marriages. Thus even with our

relatively young sample, we do observe a considerable number of union formations. Given the

nature of our sample, marriage is almost an absorbing state (we have 69 divorces in the sample

and almost no second marriages).

The estimated model has a discrete-time duration structure and the relevant summary statistics (based

on person-year observations) are presented in Table (1). For example, for cohabitation

relationships, each interview where the respondent was single is coded as zero. The first time a

respondent entered into a cohabiting relationship, it is coded as one, and then we ignore

information on that individual for the rest of the sample period. If an individual never entered

into any type of union during our sample period then he/she is coded as zero for all interviews.

The marriage variable is constructed similarly. We focus our analysis on individuals who are 18

years old and over. The average age of respondents in the sample is 19.8 years old with 24 being

the highest age. Although the relatively young age of the respondents is a limitation, this is the best

available dataset for studying this topic.
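
The person-year coding described above can be sketched as follows. The input is assumed to be a

time-ordered list of flags, one per interview, indicating whether the respondent is in the union type

being modeled; the function name `first_entry_codes` is hypothetical, and how interviews spent in the

other union type are handled is not specified in the text and is left out of this sketch.

```python
def first_entry_codes(in_union_flags):
    """Code each interview 0 until first entry, 1 at first entry, and drop later interviews."""
    codes = []
    for in_union in in_union_flags:
        if in_union:
            codes.append(1)  # first interview observed in the union
            break            # later interviews for this person are ignored
        codes.append(0)      # still single at this interview
    return codes
```

A respondent who never enters the union contributes a zero for every interview, which matches the

treatment of never-partnered individuals described above.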

In our data the number of dating partners shows a considerable variation. Males in our

sample have more dating partners (about 28 on average) than females (about 17 on average).

45% (25%) of males (females) have more than 20 dating partners, and 16% (6%) have more than

50 dating partners. Two other points about the number of dating partners are worth mentioning.

First, even though we focus our analysis on 18-24 year-olds, the total number of dating partners

includes all dating partners, i.e., including those partners whom the subjects dated before they

were 18 years old. Second, there might be some double (or more) counting of dating partners.

For example, if someone dated $p$ individuals between time $t$ and $t+1$ and $q$ individuals

between $t+1$ and $t+2$, we assume that at time $t+2$ the subject has dated $p+q$ individuals.

However, there might be some overlap between the two periods. In the extreme case where the

subject dated the same individual for seven consecutive years, we would treat that as if he or she

had dated a total of seven persons. Since the subject's dating partners are not identified in the

NLSY97 data, it is not possible to avoid double counting.

We calculate the Body Mass Index (BMI) for each individual by using the following formula:

$$\text{BMI} = \frac{\text{weight in pounds}}{(\text{height in inches})^2} \times 703.$$

We follow NIH guidelines to classify individuals with a BMI over 30 as obese. We drop

observations that are obviously erroneous, e.g. if BMI is greater than 60 (17 observations) or if

BMI is less than 10 (4 observations).
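
The BMI calculation and the screening rules just described can be written compactly. This is a minimal

sketch with hypothetical function names; the cutoffs (obese above 30, dropped above 60 or below 10) are

the ones stated in the text.

```python
def bmi(weight_lb, height_in):
    """Body Mass Index from weight in pounds and height in inches."""
    return 703.0 * weight_lb / height_in ** 2


def obesity_status(weight_lb, height_in):
    """Return True/False for obese, or None when the BMI is treated as erroneous."""
    value = bmi(weight_lb, height_in)
    if value > 60 or value < 10:
        return None       # dropped as obviously erroneous
    return value > 30     # obese when BMI exceeds 30
```

For example, a respondent weighing 150 pounds at 65 inches has a BMI of roughly 25 and is not classified

as obese.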

In addition, we control for race, age, education, and indicator variables for smokers, urban

residency, self-reported health status, pregnancy status and indulgence in risky behavior like

alcohol or marijuana use. For alcohol and marijuana, we use the number of days they consumed

those products in the 30 days preceding the interview as our control variable. If they report any

smoking during the past year they are coded as a smoker. For self-reported health status, 1 stands

for excellent health and 5 for poor health. Summary statistics for all these variables are presented

in Table (1).

[Table (1) here]

4. Empirical Results

In this section we use the model discussed above to analyze the NLSY97 data. Male and

female subsamples are analyzed separately as the impact of the number of dating partners on

union formation is presumably different among the genders. The set of priors used in the analysis

is as follows:

,10,4,,10,10,105

2s

kkISIVIV

where $k_\beta$ and $k_\gamma$ are respectively the dimensions of $\beta$ and $\gamma$. Also observe that the prior

for the smoothing parameter $\eta$ is chosen such that its expected value is $5 \times 10^{-6}$ with infinite

variance. Thus the set of priors utilized in the analysis is proper but rather non-informative.

4.1 Results for the Male Subsample

Marriage and cohabitation decisions are analyzed separately using the empirical model

presented above. To investigate the impact of the number of dating partners on marital and

cohabiting decisions, we ran a Gibbs sampler with 35,000 iterations for each model and

discarded the first 5,000 iterations as burn-in. We report the coefficient posterior means,

posterior standard deviations and the probabilities of being positive in Tables (2) and (3), while

the point estimates of the function $f(s)$ are plotted in Figures (1) and (2).

[Figures (1) and (2) here]

We first note that there is evidence of the presence of endogeneity in the marriage-dating

equations system. Specifically, the posterior mean of the correlation parameter $\rho_{uv}$ (which

relates the error term of the marriage equation to that of the dating equation) is -0.27 and its

probability of being negative is 0.929. Put differently, with a rather high probability, unobserved

characteristics affecting the number of dating partners and marriage decision are negatively

correlated, and this illustrates the importance of controlling for endogeneity for the male

subsample. This finding suggests individuals who have a large number of dating partners might

have an aversion towards long term monogamy involved in marriage. Given the above

discussion it is not surprising that ordinary probit results (i.e., without controlling for

endogeneity) show that the number of dating partners is negatively related to the probability of

marriage. We do not present the details of these estimation results here, but they are available

from the authors.

The top panel of Table (2) presents the point estimates for all exogenous variables in the

marriage equation. All the variables have the expected signs. Age and income increase the

probability of marriage while smoking, drinking and substance use reduce the probability of

marriage. The omitted race category is Whites, so the estimates imply that Blacks and Hispanics are less

likely to marry than Whites.

Figure (1) shows the estimated nonparametric curve $f(s)$ relating the cumulative

number of dating partners to marriage formation. It is upward sloping, suggesting that an

increase in the number of dating partners increases the probability of marriage. It is also of interest

to note that the nonparametric curve is rather linear, indicating that the impact on the marriage

decision is virtually uniform in the number of dating partners.

The results from the cohabitation-dating equations system also show that the problem of

endogeneity is substantial. In particular, the correlation parameter is estimated to be -0.31, with

probability of being negative equal to 0.91. Again this demonstrates the importance of

controlling for unobserved confounding in the male sample. Unlike the nonparametric curve in

the marriage-dating system, $f(s)$ is concave in $s$ (Figure (2)). This pattern is consistent with a

learning explanation, where once individuals have dated a number of partners, they may not need

the extra (specific) information from cohabitation as much, leading to a leveling-off of the impact on

cohabitation formation while the impact on marriage is linear.

As in a simple probit model, the impacts of the number of dating partners on marriage

and cohabitation are confounded by the nonlinear link function, in which the impacts depend not

only on the model parameters, but also on all other covariates. To provide a more intuitive

interpretation of the nonparametric curves that relate the number of dating partners to marriage

and cohabitation, we perform an exercise similar to the computation of the average covariate

effect. More specifically, suppose we wish to compute the probability that a new individual $i$ is

married ($z_i > 0$, where $z_i$ denotes his latent marriage index), given that his number of dating

partners is $s_i = s$, while marginalizing over the covariates $x_i$ and parameters $(\beta, \theta, \Sigma)$.

As discussed in Chib and Jeliazkov (2006) in

the context of estimating the average covariate effect, a practical procedure is to marginalize out

the covariates using their empirical distribution, and the model parameters by the posterior

distribution. The goal is to estimate the following quantity by Monte Carlo simulation

$$P(z_i > 0 \mid s_i = s, y) = \int P(z_i > 0 \mid s_i = s, y, x_i, \beta, \theta, \Sigma)\, \pi(x_i)\, \pi(\beta, \theta, \Sigma \mid y)\, dx_i\, d(\beta, \theta, \Sigma).$$

In practice, the above quantity can be estimated as follows. First, obtain a draw from the

posterior distribution and randomly select a vector of covariates. Given $\sigma_v^2$, obtain an error

term for the treatment equation: $v_i \sim N(0, \sigma_v^2)$. Then a Monte Carlo average of the equation

above can be easily obtained by realizing

$$P(z_i > 0 \mid s_i = s, v_i, y, x_i, \beta, \theta, \Sigma) = \Phi\left( \frac{x_i\beta + d_i\theta + \sigma_{uv} v_i/\sigma_v^2}{\sqrt{1 - \sigma_{uv}^2/\sigma_v^2}} \right),$$

where $\Phi(\cdot)$ is the distribution function of a standard normal random variable and $d_i$ is

constructed to select off the appropriate element of $\theta$. It is worth noting that in the above

computation, the conditional mean of $z_i$ given the covariates and other model parameters is set

to be $x_i\beta + d_i\theta + \sigma_{uv} v_i/\sigma_v^2$ instead of $x_i\beta + d_i\theta + (\sigma_{uv}/\sigma_v^2)(s_i - w_i\gamma)$. This is because in

computing the impact of the number of dating partners on marriage, inevitably $s_i$ has to be

fixed at a constant. Consequently, the latter formula would artificially blow up the conditional

mean, as the term $s_i - w_i\gamma$ can be made arbitrarily large. Instead, we simulate the error term of

the treatment equation $v_i$, and compute the conditional mean of $z_i$ given $v_i$ (in addition to

the covariates and other parameters). In this way, we account for the unobserved endogeneity

while avoiding the artifact of fixing $s_i$ to be a constant.
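
The Monte Carlo procedure just described can be sketched in a few lines. This is an illustrative sketch

under stated assumptions rather than the authors' code: each posterior draw is assumed to be supplied as

a tuple containing the outcome coefficients, the value of the nonparametric function at the chosen

number of partners, the error covariance and the treatment-error variance, and the function name

`average_probability` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def average_probability(posterior_draws, covariate_rows, seed=0):
    """Average P(latent index > 0) over posterior draws and the empirical covariates.

    posterior_draws: iterable of (beta, f_s, sigma_uv, sigma_v2) tuples, with beta a list.
    covariate_rows: list of covariate vectors, each the same length as beta.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    total = 0.0
    count = 0
    for beta, f_s, sigma_uv, sigma_v2 in posterior_draws:
        x = rng.choice(covariate_rows)               # draw from the empirical distribution
        v = rng.gauss(0.0, math.sqrt(sigma_v2))      # simulated treatment-equation error
        mean = sum(b * xj for b, xj in zip(beta, x)) + f_s + sigma_uv * v / sigma_v2
        sd = math.sqrt(1.0 - sigma_uv ** 2 / sigma_v2)
        total += phi(mean / sd)
        count += 1
    return total / count
```

Repeating the call over a grid of partner counts, each time passing the corresponding function values,

traces out a probability curve of the kind plotted against the number of dating partners.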

[Figure (3) about here]

Figure (3) reports the probabilities that a male subject would get married (left panel) or

start cohabiting (right panel) given his cumulative number of dating partners, while

marginalizing out all other covariates. The curve for cohabitation is rather linear while the curve

for marriage is convex. Also, the impact of the number of dating partners seems somewhat

stronger for marriage. For example, an increase in the number of dating partners from zero to 20

increases the probability of cohabitation from 3% to 5%, but it increases the probability of

marriage from 2% to 5%.

Next we check the validity of our instruments. We perform a version of the standard

over-identification test similar to Kline and Tobias (2008). First recall that the instruments used

for the male subsample are height (Height) and lagged obesity status (ObeseLag). In the over-

identification test, we first assume ObeseLag is a valid instrument and then test if the other

potential instrument, Height, is also valid. Specifically, we include Height in the marriage

equation and compute the posterior odds ratio in favor of restricting the coefficient associated

with Height in the marriage equation to be zero. Under equal prior odds (i.e., the prior odds ratio is

equal to one), it amounts to computing the Bayes factor in favor of the restriction (see, for

example, Koop, 2003, pp. 3-5). We calculate the relevant Bayes factor via the Savage-Dickey

density ratio (Verdinelli and Wasserman, 1995) and obtain a value of 199, indicating strong

support that Height can be excluded from the marriage equation. Now assuming Height is a valid

instrument, we repeat the procedure and obtain a Bayes factor of 22 in favor of the hypothesis

that the coefficient associated with ObeseLag is equal to zero. Taken together, these results

provide evidence for the validity of the instruments. We also perform the same exercise for the

cohabitation-dates equation system and the corresponding Bayes factors are 512 and 18,

suggesting Height and ObeseLag are also valid instruments for this system.
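The basic idea of the Savage-Dickey density ratio can be sketched as follows: the Bayes factor in favor of the restriction that a coefficient equals zero is the posterior density at zero divided by the prior density at zero. The sketch below is a simplified illustration, not the computation used in this paper. It assumes a normal prior on the coefficient, approximates the marginal posterior by a normal density fitted to the draws, and uses simulated draws in place of real sampler output. The function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def savage_dickey_bf(draws, prior_mean, prior_sd):
    """Bayes factor in favor of theta = 0: posterior density at 0 over prior density at 0.

    Assumption: the marginal posterior is approximated by a normal density
    fitted to the posterior draws.
    """
    post_at_zero = normal_pdf(0.0, np.mean(draws), np.std(draws, ddof=1))
    prior_at_zero = normal_pdf(0.0, prior_mean, prior_sd)
    return float(post_at_zero / prior_at_zero)


# Simulated stand-in for posterior draws of the coefficient on an excluded instrument.
rng = np.random.default_rng(1)
draws = rng.normal(loc=0.002, scale=0.01, size=20000)

bf = savage_dickey_bf(draws, prior_mean=0.0, prior_sd=1.0)
print(bf)
```

A value well above one indicates that the data concentrate the coefficient near zero relative to the prior, which is the sense in which a large Bayes factor supports excluding the variable from the outcome equation.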

We further checked whether BMI is correlated with unobserved variables relevant to

the cohabitation/marriage decision, following a strategy similar to Angrist and Krueger (1991). We

run reduced-form regressions and find that the number of dating partners does not vary with BMI for

individuals in the normal range (i.e., a BMI between 18 and 25; since these individuals are in the

normal range, we should not expect BMI to affect the number of dating partners for this

group). This suggests that BMI is not correlated with unobserved variables that affect the number

of dating partners (fidelity, for example). As a further check, we add BMI to the

marriage equation as an explanatory variable. If BMI were correlated with anything relevant to marriage

besides the number of dating partners, then the coefficient on BMI should be significant in such

regressions. It is not (the p-value is always above 0.50), which lends support to the validity

of BMI as an instrument. The same holds for the female sample.

This kind of strategy cannot be applied directly to our other instruments. However, as

Murray (2006) suggests, results from overidentification tests are more believable when we know

we already have at least one valid instrument. We also estimated our model with lagged obesity

as the only instrument. The results are very similar to the results presented here. This gives us

additional confidence in our results.

4.2 Female Subsample

We perform the same analysis for the female subsample. Tables (4) and (5) present the

coefficient posterior means and standard deviations, as well as their probabilities of being

positive; the point estimates of the function f(s) are plotted in Figures (4) and (5). The

problem of endogeneity seems to be less severe in the female subsample. In particular, the

probability that the correlation parameter ρ_{uv}, which relates the unobservables in the marriage

(cohabitation) equation to those in the dates equation, is negative is estimated to be only 0.648

(0.848). Although this still suggests the presence of endogeneity, the evidence is less conclusive

than in the male subsample. The simple probit results show a negative association between

the number of dating partners and the probability of marriage for the female subsample too.

The top panel of Table 4 presents the point estimates for the marriage equation. Like the

male subsample, all covariates have the expected sign. For the female subsample, we also

control for pregnancy status. Not surprisingly, it is associated with a higher probability of

marriage. The nonparametric curve f(s) is flatter than its counterpart in the male subsample

for the marriage-dates equation system, whereas it is steeper than its male counterpart in the

cohabitation-dates equation system. Also, for cohabitation, the impact seems rather linear in

females while it is concave for males.

Taken together these results suggest that information gained from dating may be used differently

by men and women. For men the information from dating seems to improve the value of

marriage, but for women the impact is relatively small. This could be due to the fact that women

start off with a high level of knowledge about men and hence information is not as valuable to

them, or they are willing to marry with relatively less information. The latter may happen if

women value marriage more than men. For example, if women value marriage-specific capital

like children or the commitment of a long-term relationship more than men do, then they may be willing

to marry with relatively less information.

[Figures (4) and (5) here]

Figure (6) reports the probabilities that a female would get married (left panel) or start

cohabiting (right panel) given her number of dating partners, while marginalizing out all other

covariates. Like the male subsample, the curve for cohabitation is rather linear while the curve

for marriage is convex. However, there are important differences. The impact of an increase in

the number of dating partners on marriage is much stronger for males. The probability of getting

married is lower for males than for females at small numbers of

dating partners. But as the number of dating partners increases (say, above 25), the probability of

getting married for males rises above that for females. It should be noted that for the

full sample the unconditional probability of marriage is higher for females than for males, and

the above result is consistent with that. Note that the unconditional probability of marriage is

given by

$$P(\text{marriage}) = \sum_{j} P(\text{marriage} \mid s = j)\, P(s = j),$$

where s is the number of dates. The probability of observing females with a high number of

dating partners is comparatively low. In fact, 65% of the females in our sample had fewer than 10

dating partners and 82% had fewer than 20. Males, however, have more dating partners:

45% of males have fewer than 10 and 65% have fewer than 20 dating partners. So the unconditional

probability of marriage is higher for females than for males. Also, it should be noted that the patterns

we observe in the average covariate effects are similar to what we observe in our estimate of the

nonparametric function.
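The argument above can be sketched numerically with the law of total probability. In the sketch below, the shares of individuals in each range of dating partners are taken from the percentages quoted in the text, grouped into three bins (fewer than 10, 10 to 19, 20 or more). The conditional marriage probabilities within each bin are hypothetical values chosen only to mimic the qualitative pattern described (males lower at few partners, higher at many); they are not estimates from the model.

```python
import numpy as np


def unconditional_probability(p_given_s, p_s):
    """Sum over bins of P(marriage | s in bin) * P(s in bin)."""
    p_given_s = np.asarray(p_given_s, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    if not np.isclose(p_s.sum(), 1.0):
        raise ValueError("bin shares must sum to one")
    return float(np.sum(p_given_s * p_s))


# Bin shares implied by the text: fewer than 10, 10 to 19, 20 or more partners.
share_f = [0.65, 0.17, 0.18]  # females: 65% below 10, 82% below 20
share_m = [0.45, 0.20, 0.35]  # males: 45% below 10, 65% below 20

# Hypothetical conditional probabilities of marriage in each bin (illustration only).
p_f = [0.040, 0.042, 0.045]
p_m = [0.015, 0.025, 0.050]

uncond_f = unconditional_probability(p_f, share_f)
uncond_m = unconditional_probability(p_m, share_m)
print(uncond_f, uncond_m)
```

Even though the hypothetical male probability exceeds the female one in the highest bin, the female unconditional probability remains larger because most of the female sample sits in the lowest bin.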

[Figure (6) here]

We also perform the same overidentification test as in the previous section to check the

validity of the instruments. Recall that for the female subsample, we use hours worked

in the last year (HoursWorked) and lagged obesity status (ObeseLag) as instruments. For the

marriage-dates system, assuming HoursWorked is a valid instrument, the Bayes factor in favor of

imposing the restriction that the coefficient associated with ObeseLag equals zero is computed to

be 526. Repeating the procedure but now assuming ObeseLag is valid, we obtain a Bayes factor

of 22 in favor of the hypothesis that HoursWorked is also valid. Therefore, there is strong

evidence that both HoursWorked and ObeseLag are valid instruments. For the cohabitation-dates

equation system, we perform the same exercise and the corresponding Bayes factors are 16 and

14, suggesting HoursWorked and ObeseLag are also valid instruments for this system.
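To give a sense of scale for these Bayes factors: under equal prior odds, a Bayes factor B in favor of a restriction corresponds to a posterior probability of B/(1+B) for the restricted model. The worked arithmetic below applies this standard identity to the four values reported above. The symbols B and M_0 (the restricted model) are notation introduced here for illustration, and the figures are an interpretive aid rather than additional estimation results.

```latex
P(M_0 \mid y) = \frac{B}{1 + B}
\qquad\Longrightarrow\qquad
\begin{aligned}
B = 526 &: \; 526/527 \approx 0.998 \\
B = 22  &: \; 22/23 \approx 0.957 \\
B = 16  &: \; 16/17 \approx 0.941 \\
B = 14  &: \; 14/15 \approx 0.933
\end{aligned}
```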

5. Concluding Remarks

This paper investigates the causal relationship between the number of dating partners and

marriage or cohabitation formation. Since the number of dating partners could be correlated with

unobserved preferences, we explicitly treat it as an endogenous variable. We also allow the number

of dating partners to affect marriage and cohabitation formation nonparametrically. Using the

NLSY97 dataset, we find that after controlling for endogeneity, an increase in the number of

dating partners increases the probability of both marriage and cohabitation for both men and

women, although the effect is stronger in men for marriage and women for cohabitation. We

check for and provide evidence of the validity of our instruments. We also estimated a slightly

different specification with lagged obesity as the only instrument. The results are very similar

to the results presented above. We prefer the above specification, however, as we cannot test the

validity of the instruments in an exactly identified system. If we do not control for endogeneity,

then the number of dating partners shows a negative impact on marriage. This is not surprising

given the evidence on endogeneity above. Our results suggest that information gained from

dating may be used differently by men and women. Returns from dating (in terms of the

increased probability, or value, of marriage) seem to be higher for men than for women. This could be due to the

fact that women start off with a higher level of knowledge about men, and hence the marginal

benefit is lower. On the other hand, it might be the case that women are willing to marry with

relatively less information. We also find that the probability of marriage is lower for men

than for women at relatively low numbers of dating partners, and higher for men at very

high numbers of dating partners.

References

Albert, J., and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response

Data. Journal of the American Statistical Association, 88:669-679.

Angrist, J., and Krueger, A. (1991). Does Compulsory School Attendance Affect Schooling

and Earnings? Quarterly Journal of Economics, 106(4):979-1014.

Cawley, J., Joyner K., and Sobal, J. (2006). Size Matters: The Influence of Adolescents'

Weight and Height On Dating and Sex. Rationality and Society, 18(1):67-94.

Chib, S. and Jeliazkov, I. (2006). Inference in Semiparametric Dynamic Models for Binary

Longitudinal Data. Journal of the American Statistical Association, 101:685-700.

Clarkberg, M. (1999). The Price of Partnering: The Role of Economic Well-being in Young

Adults' First Union Experiences. Social Forces, 77:945-68.

Fisman, R., Iyengar, S. S., Kamenica, E. and Simonson, I. (2006). Gender Differences in

Mate Selection: Evidence from a Speed Dating Experiment. Quarterly Journal of Economics,

121(2):673-97.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis,

Second Edition. Chapman & Hall/CRC.

Geweke, J. (1991). Efficient Simulation from the Multivariate Normal and Student-t

Distributions Subject to Linear Constraints and the Evaluation of Constraint Probabilities. In

Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E.

Keramidas and S. Kaufman ed.), 571-578.

Kline, B., and Tobias, J. L. (2008). The Wages of BMI: Bayesian Analysis of a Skewed

Treatment-Response Model with Nonparametric Endogeneity. Journal of Applied

Econometrics, 23(6):767-793.

Koop, G. (2003). Bayesian Econometrics. Wiley-Interscience.

Koop, G., and Poirier, D. J. (2004). Bayesian Variants of Some Classical Semiparametric

Regression Techniques. Journal of Econometrics, 123(2):259-282.

Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian Econometric Methods.

Cambridge University Press.

Lancaster, T. (2004). Introduction to Modern Bayesian Econometrics. Wiley-Blackwell.

Levitt, S. D. (1996). The Effect of Prison Population Size on Crime Rates: Evidence from

Prison Overcrowding Litigation. Quarterly Journal of Economics, 111(2):319-351.

Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police

on Crime. American Economic Review, 87(4):270-290.

Lillard, L. A., Brien, M. J., and Waite, L. J. (1995). Premarital Cohabitation and Subsequent

Marital Dissolution: A Matter of Self-Selection? Demography, 32(3):437-57.

MacDonald, M. M. and Rindfuss, R. R. (1981). Earnings, Relative Income, and Family

Formation. Demography, 18:123-36.

Mare, R.D. and Winship, C. (1991). Socioeconomic Change and the Decline of Marriage for

Blacks and Whites. In The Urban Underclass (C. Jencks and P.E. Peterson ed.), 175-202.

Murray, M. P. (2005). The Bad, the Weak, and the Ugly: Avoiding the Pitfalls of

Instrumental Variables Estimation. Social Science Research Network Working Paper No.

843185.

Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments.

Journal of Economic Perspectives, 20(4):111-132.

Nobile, A. (2000). Comment: Bayesian Multinomial Probit Models with a Normalization

Constraint. Journal of Econometrics, 99:334-345.

Oppenheimer, V. K. (1997). Women's Employment and the Gains to Marriage: The

Specialization and Trading Model of Marriage. Annual Review of Sociology, 23:431-53.

Robert, C. P. (1995). Simulation of Truncated Normal Variables. Statistics and Computing,

5:121-125.

Svarer, M. (2004). Is Your Love in Vain? Another Look at the Premarital Cohabitation and

Divorce. Journal of Human Resources, 39(2):523-535.

Sweeney, M. M. (2002). Two Decades of Family Change: The Shifting Economic

Foundations of Marriage. American Sociological Review, 67:132-47.

Train, K. E. (2003). Discrete Choice Methods with Simulation. Cambridge University Press.

Verdinelli, I. and Wasserman, L. (1995). Computing Bayes Factors Using a Generalization of

the Savage-Dickey Density Ratio. Journal of the American Statistical Association, 90:614-618.

Xie, Y., Raymo, J. M., Goyette, K., and Thornton, A. (2003). Economic Potential and Entry

into Marriage and Cohabitation. Demography, 40(2):351-67.

Table 1: Data Summary Statistics

                                        Women                            Men
                                   Marriage    Cohabitation        Marriage    Cohabitation
Variable                       mean    s.d.    mean    s.d.    mean    s.d.    mean    s.d.
Union                          0.04    0.19    0.07    0.25   0.024    0.15   0.044    0.20
Age                            19.8     1.5    19.7     1.5    19.8     1.5    19.7     1.5
Education                      12.6     1.8    12.6     1.7    12.2     1.7    12.3     1.7
Black                          0.29    0.45    0.29    0.45    0.25    0.43    0.25    0.43
Hispanic                       0.20    0.40    0.20    0.40    0.21    0.41    0.21    0.41
Urban                          0.80    0.40    0.80    0.40    0.78    0.41    0.78    0.41
Pregnancy status               0.10    0.30    0.10    0.30       -       -       -       -
Smoker                         0.40    0.49    0.38    0.48    0.45    0.49    0.44    0.50
Regular alcohol user            1.1    2.79     1.1    2.72     2.4     4.7     2.3     4.5
Regular marijuana user          1.6     5.7     1.4    5.33     3.0     7.9     2.8     7.7
Height                         64.6     2.9    64.6     2.9    70.3     3.4    70.3     3.4
Health                         2.17    0.94    2.15    0.93    1.96    0.92    1.94    0.91
Number of dating partners     16.80   18.86   16.03   18.17    28.6    32.7    28.1    32.1
Obese                          0.15    0.35    0.14    0.35    0.14    0.35    0.14    0.35
Income category 1              0.25    0.43    0.26    0.43    0.25    0.43    0.25    0.43
Income category 2              0.37    0.48    0.38    0.48    0.30    0.46    0.30    0.46
Income category 3              0.17    0.38    0.16    0.37    0.15    0.36    0.15    0.35
Income category 4              0.18    0.38    0.17    0.38    0.25    0.43    0.24    0.42
Income category 5              0.03    0.17    0.02    0.15    0.06    0.24    0.06    0.24
Number of observations                10678            9625           10565            9985

Table 2: Model parameters posterior means, standard deviations, and probabilities of being

positive. Marital choice of the male subsample.

Marriage equation

Variable E(⋅|y) √(Var(⋅|y)) P(⋅>0|y)

Age 0.066 0.017 1.000

Education -0.014 0.019 0.238

UrbanDummy -0.040 0.074 0.285

Alcohol -0.040 0.010 0.000

Marijuana -0.040 0.009 0.000

SmokerDummy -0.114 0.061 0.034

IncomeCat 0.152 0.030 1.000

Health 0.037 0.031 0.884

Black -0.402 0.082 0.000

Hispanic -0.094 0.072 0.094

Other 0.131 0.236 0.721

Dates equation


Constant -3.088 2.918 0.144

Age 2.086 0.201 1.000

Education -1.087 0.193 0.000

UrbanDummy 5.071 0.743 1.000

Alcohol 0.779 0.070 1.000

Marijuana 0.247 0.041 1.000

SmokerDummy 4.100 0.664 1.000

Height -0.089 0.059 0.067

ObeseLag -3.621 0.859 0.000

IncomeCat 1.548 0.262 1.000

Health -1.286 0.338 0.000

Black 4.561 0.768 1.000

Hispanic 5.018 0.793 1.000

Other -1.766 2.187 0.209

Other parameters


σ_{v}² 1003 13.89 1.000

ρ_{uv} -0.269 0.213 0.071

Η 2.34×10 1.72×10 1.000


Table 3: Model parameters posterior means, standard deviations, and probabilities of being

positive. Cohabitation choice of the male subsample.

Cohabitation equation


Age -0.001 0.018 0.411

Education -0.053 0.018 0.000

UrbanDummy -0.010 0.065 0.427

Alcohol -0.030 0.007 0.000

Marijuana 0.001 0.003 0.669

SmokerDummy 0.138 0.061 0.992

IncomeCat 0.098 0.026 1.000

Health 0.012 0.026 0.686

Black 0.014 0.061 0.583

Hispanic -0.010 0.067 0.425

Other -0.136 0.242 0.292

Dates equation


Constant -3.440 2.931 0.122

Age 2.014 0.203 1.000

Education -1.125 0.198 0.000

UrbanDummy 5.302 0.755 1.000

Alcohol 0.783 0.074 1.000

Marijuana 0.252 0.043 1.000

SmokerDummy 4.131 0.673 1.000

Height -0.062 0.058 0.143

ObeseLag -3.423 0.858 0.000

IncomeCat 1.687 0.264 1.000

Health -1.333 0.350 0.000

Black 3.909 0.767 1.000

Hispanic 4.985 0.797 1.000

Other -4.019 2.154 0.029

Other parameters


σ_{v}² 967 13.65 1.000

ρ_{uv} -0.310 0.209 0.090

Η 3.97×10 3.55×10 1.000


Table 4: Model parameters posterior means, standard deviations, and probabilities of being

positive. Marital choice of the female subsample.

Marriage equation


Age 0.066 0.019 0.999

Education -0.058 0.013 0.000

UrbanDummy -0.165 0.062 0.004

Alcohol -0.064 0.015 0.000

Marijuana -0.010 0.005 0.039

SmokerDummy -0.123 0.062 0.025

IncomeCat 0.097 0.027 0.999

Health -0.014 0.025 0.281

PregnantDummy 0.487 0.067 1.000

Black -0.524 0.079 0.000

Hispanic -0.015 0.062 0.407

Other -0.080 0.223 0.372

Dates equation


Constant -12.824 1.946 0.000

Age 1.032 0.111 1.000

Education 0.188 0.109 0.955

HoursWorked 0.076 0.026 0.997

UrbanDummy 2.691 0.446 1.000

Alcohol 0.763 0.066 1.000

Marijuana 0.253 0.032 1.000

SmokerDummy 3.595 0.389 1.000

ObeseLag -0.976 0.505 0.025

IncomeCat 0.874 0.195 1.000

Health 0.597 0.192 0.999


Black -2.804 0.441 0.000

Hispanic -1.932 0.483 0.000

Other -2.930 1.597 0.034

Other parameters


σ_{v}² 325 4.41 1.000

ρ_{uv} -0.082 0.200 0.352

Η 3.32×10 2.73×10 1.000


Table 5: Model parameters posterior means, standard deviations, and probabilities of being

positive. Cohabitation choice of the female subsample.

Cohabitation equation


Age -0.063 0.012 0.000

Education -0.031 0.011 0.005

UrbanDummy 0.023 0.056 0.647

Alcohol -0.050 0.008 0.000

Marijuana 0.000 0.004 0.535

SmokerDummy 0.112 0.059 0.973

IncomeCat 0.078 0.024 0.999

Health 0.045 0.020 0.987


Black -0.062 0.056 0.127

Hispanic 0.012 0.057 0.596

Other -0.137 0.212 0.267

Dates equation


Constant -10.434 1.964 0.000

Age 0.789 0.114 1.000

Education 0.360 0.114 0.999

HoursWorked 0.087 0.026 0.999

UrbanDummy 2.805 0.449 1.000

Alcohol 0.738 0.069 1.000

Marijuana 0.271 0.034 1.000

SmokerDummy 3.641 0.397 1.000

ObeseLag -1.499 0.499 0.001

IncomeCat 0.815 0.198 1.000

Health 0.468 0.194 0.992


RaceCat1 -2.505 0.445 0.000

RaceCat2 -2.583 0.489 0.000

RaceCat3 -2.880 1.655 0.041

Other parameters


σ_{v}² 302 4.366 1.000

ρ_{uv} -0.433 0.257 0.152

Η 3.53×10 3.39×10 1.000

Figure 1: Posterior means of the function f (s) relating the number of dating partners to

marriage choice for the male sample.

Figure 2: Posterior means of the function f (s) relating the number of dating partners

to cohabiting choice for the male sample.

Figure 3: The probabilities that a male subject would get married (left panel) and start

cohabiting (right panel) given his number of dating partners.

Figure 4: Posterior means of the function f (s) relating the number of dating partners to

marriage choice for the female sample.

Figure 5: Posterior means of the function f (s) relating the number of dating

partners to cohabiting choice for the female sample.

Figure 6: The probabilities that a female subject would get married (left

panel) and start cohabiting (right panel) given her number of dating partners.
