The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is...

61
The Causal Effect of Parents’ Education on Children’s Earnings * Sang Yoon (Tim) Lee Department of Economics University of Mannheim Nicolas Roys Department of Economics University of Wisconsin-Madison Ananth Seshadri Department of Economics University of Wisconsin-Madison March 2015 Preliminary; comments welcome. Abstract We present a model of endogenous schooling and earnings to isolate the causal effect of par- ents’ education on children’s education and earnings outcomes. The model delivers a positive relationship between parents’ education and children’s earnings and an ambiguous relation- ship with children’s schooling. Identification is achieved by comparing the earnings levels of children with the same schooling level but of parents with different schooling levels. A gen- eralized version of the model with heterogeneous tastes for schooling is estimated using the HRS data with parents’ schooling. The empirically observed, positive OLS coefficient from regressing children’s schooling on parents’ schooling is mainly accounted for by the correla- tion between parents’ schooling and children’s unobserved tastes for schooling. But this is countered by the negative structural relationship between parents’ and children’s schooling choices, resulting in a negative or close to zero IV coefficient when exogenously increasing par- ents’ schooling. Nonetheless, an exogenous one-year increase in parents’ schooling increases a child’s lifetime earnings by 1.2 percent on average. * We thank Steven Durlauf and Chris Taber for very helpful conversations which motivated this paper. Junjie Guo and Wei Song provided outstanding research assistance. We also thank Mariacristina De Nardi, Eric French and seminar participants at the IFS. 1

Transcript of The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is...

Page 1: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

The Causal Effect of Parents’ Educationon Children’s Earnings ∗

Sang Yoon (Tim) LeeDepartment of EconomicsUniversity of Mannheim

Nicolas RoysDepartment of Economics

University of Wisconsin-Madison

Ananth SeshadriDepartment of Economics

University of Wisconsin-Madison

March 2015

Preliminary; comments welcome.

Abstract

We present a model of endogenous schooling and earnings to isolate the causal effect of par-ents’ education on children’s education and earnings outcomes. The model delivers a positiverelationship between parents’ education and children’s earnings and an ambiguous relation-ship with children’s schooling. Identification is achieved by comparing the earnings levels ofchildren with the same schooling level but of parents with different schooling levels. A gen-eralized version of the model with heterogeneous tastes for schooling is estimated using theHRS data with parents’ schooling. The empirically observed, positive OLS coefficient fromregressing children’s schooling on parents’ schooling is mainly accounted for by the correla-tion between parents’ schooling and children’s unobserved tastes for schooling. But this iscountered by the negative structural relationship between parents’ and children’s schoolingchoices, resulting in a negative or close to zero IV coefficient when exogenously increasing par-ents’ schooling. Nonetheless, an exogenous one-year increase in parents’ schooling increases achild’s lifetime earnings by 1.2 percent on average.

∗We thank Steven Durlauf and Chris Taber for very helpful conversations which motivated this paper. Junjie Guoand Wei Song provided outstanding research assistance. We also thank Mariacristina De Nardi, Eric French and seminarparticipants at the IFS.

1

Page 2: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

1 Introduction

Parents have a large influence on their children’s outcomes. The correlation between parents’and children’s schooling is as high as 0.5, and even after controlling for children’s schooling, par-ents’ schooling or earning levels have a positively significant effect on children’s earnings. Doesthis merely represent correlation in unobserved heterogeneity across generations? Or does it alsopartly reflect human capital spillovers from parent to child? These are important questions fromthe perspective of education policy. After all, if intergenerational correlations merely reflect se-lection, government subsidies aimed at improving education would only impact one generation.On the other hand, in the presence of intergenerational externalities, the returns from such publicinvestments are reaped by all succeeding members of a dynasty, resulting in long-lasting effects.Redistributive education policies may also go beyond reducing inequality within a single genera-tion and have positive impacts on intergenerational mobility.

Furthermore, intergenerational spillovers form an integral part of models of human capital ac-cumulation such as Becker and Tomes (1986). Being able to identify and estimate the magnitude ofthese spillovers is consequently important not just for better understanding the effects of socioe-conomic policy, but also to gain a better understanding of the mechanics through which inequalityis formed and persists over generations. How important is nature (individual abilities that cannotbe affected by exogenous intervention) as opposed to nurture (investments into human capital) indetermining the next generation’s economic status? How do forced increases in the schooling ofparents affect the schooling and earnings of children? These are the questions we seek to addressin this paper.

We begin with a very simple life-cycle model of human capital accumulation owing to Ben-Porath (1967), which encompasses both schooling and learning on-the-job, and in which humancapital represents an individual’s economic earnings ability. In this model, individual earning pro-files are determined by their initial level of human capital, and the speed, or innate learning ability,at which the individual accumulates human capital.1 We augment this model in two aspects. First,we posit that an individual’s initial level of human capital when he begins schooling at age 6 isalso a function of his parent’s human capital, which is what represents the parental spillover inour model.2 Second, we assume that the initial level of human capital is also affected by learningability, i.e., the speed at which human capital is accumulated after age 6. Thus, the initial leveland speed of human capital accumulation are affected by his own ability, which is unobservedbut correlated across generations, while the parent’s human capital, which can be proxied by theschooling or earnings levels of the parent and thus observed, only affects the child early on in life.

The simple model delivers analytical expressions for the schooling and earnings of a child as afunction of the parent’s human capital, which closely resemble those that have been estimated inthe empirical literature. In particular, we demonstrate that the identification problem of separating

1The Mincer regression is a special case of this set-up when post-schooling time allocation declines linearly untilretirement.

2For the purposes of this paper, we refer to the causal effect of parents’ education on children’s earnings as the“parental spillover."

2

Page 3: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

parental spillovers from selection on abilities can be rectified by using information on children’searnings and schooling jointly. The key for identification is that the parents’ human capital affectschildren’s education and earnings outcomes differently, which is true in our model because thespillover occurs early on in life while other factors persist through life. In our model, all elseequal, children with high human capital parents spend less time in school because there is lessneed to accumulate additional human capital before entering the labor market.3 It is still the casethat these children ultimately attain higher levels of human capital and hence higher earnings, butit takes them less time to reach that level, resulting in shorter schooling periods. But in the data,these children in fact spend more time in school. So it must be that high human capital parents,who also tend to have high innate learning abilities, also pass on these abilities which overcomesthe negative level effect, by inducing high learning children to stay longer in school.

What allows identification in our model is this differentiation of the quantity and quality ofyears of schooling: children who attain the same years of schooling can still have different levelsof human capital, which is manifested in the data as differential earnings. Suppose that we observetwo individuals with the same schooling level but different levels of earnings, and parents withdifferent levels of schooling. Through the lens of our model, this reveals the relative magnitudes oftheir learning abilities as a function of the relative magnitudes of their parents’ schooling. Then,the earnings differences between these two individuals can be completely accounted for by theschooling differences of their parents, from which we can recover the size of parental spillovers.Once this is done, the contribution of nature is identified by observing how children’s schoolinglevels vary across parents with different schooling levels.

Our identification scheme relies on parental spillovers having a constant effect over the chil-dren’s life-cycle earnings conditional on the children’s own schooling levels. Is such an assump-tion empirically reasonable? Reduced form evidence suggests that children who attain the sameyears of schooling, but whose mothers have different years of schooling, have parallel earningprofiles with a constant gap. The parallel gap points toward the existence of a parental spilloverthat only affects how much human capital the child accumulates before entering the labor market(schooling), and then remaining constant once controlling for the child’s schooling level. More-over, this gap is indeed similar across different child schooling levels. We show analytically thatthe spillover is precisely picking up this gap. If our model were true, reduced form estimates fromthe Health and Retirement Survey data indicate that educating a mom for an extra year is equiv-alent to having a mom with an extra year of education—i.e., the treatment and selection effectsare similar. Furthermore, five extra years of mom’s education has the same reduced form effect asone additional year of own schooling, suggesting that as much as 20% of parents’ education spillsover to child’s earnings.

Of course, because the child’s schooling choice is also endogenous, the reduced form evidencedoes not tell us what the exact model implied magnitude of parental spillovers on children’s earn-ings is. To bring our model closer to the data, we extend the basic model and estimate it to HRS

3This is in fact what many empirical studies find, as we soon discuss. The idea that less schooling may indicatehigher earnings prospects goes as far back as Willis and Rosen (1979).

3

Page 4: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

data on individual schooling and earnings, and parents’ schooling. In addition to parents’ school-ing levels, which is observed, and learning abilities, which is unobserved, we account for a thirdsource of heterogeneity—an unobserved taste for schooling.

We let both the child’s ability and taste for schooling be correlated with his parent’s schoolinglevel, in line with evidence that other factors not related to economic or financial returns (psy-chological factors, non-cognitive skills, misinformation) may induce a child to attain more or lesseducation (Betts, 1996; Oreopoulos et al., 2008; Rege et al., 2011). Moreover, several studies that es-timate the returns to education have shown that pecuniary motives alone fall short of explainingeducation choices (Heckman et al., 2006, 2008). In our context, less educated parents may dis-courage children’s academic achievement due to lack of information, or more educated parentsmay motivate the children to emulate their parents regardless of future economic outcomes. Sincewe do not explicitly model expectations formulations or non-cognitive skills, our heterogeneoustastes for schooling effectively lumps all such factors together.

By letting the child’s taste for schooling be correlated with his parent’s schooling, the rela-tionship between schooling and parents freely varies from the relationship between earnings andparents. Our estimates suggest that most of schooling differences can be explained by this het-erogeneity in tastes across parents with different levels of education, evidence that ceteris paribus,pecuniary motivation has little causal effect on children’s schooling outcomes. Nonetheless, aforceful increase in mom’s schooling is found to have a 1.2% causal boost on lifetime earnings.However, the effect is heterogeneous both across individuals and over the life-cycle. On average,the earnings increase mainly comes from reducing children’s need for schooling so that they enterthe labor market early (thereby decreasing foregone earnings), but there is almost no differencein their earnings after age 25. For children of less educated parents, however, the earnings boostsustains over the child’s entire lifetime.

Related literature By no means are we the first to estimate the causal effect of parents on chil-dren’s outcomes. We contribute to this literature by incorporating insights from a human capitalmodel of earnings and education, and the recent literature that emphasizes the potentially largeimpact of early childhood interventions. In particular, Cunha and Heckman (2007) note that earlychildhood investments are difficult to separate from genetic transmissions. Ours is a first attemptto fill this gap.

Since Solon (1999), a broad literature has studied the causal effect of parents on children’searnings and/or education.4 The common challenge for all these studies is to separately iden-tify the unobserved correlation between parents’ and the children’s abilities from the unobservedcausal impact of parental spillovers. The typical approach has been to posit a linear relationshipbetween parents’ and children’s education, and employ special data on twins, adoptees or com-pulsory schooling reforms as natural experiments for identification (Behrman and Rosenzweig,2002; Plug, 2004; Black et al., 2005). All studies find that the causal intergenerational schooling re-

4Black and Devereux (2011); Sacerdote (2011) are recent surveys of this literature.

4

Page 5: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

lationship is close to zero or even negative, especially for mothers.5 Because conventional wisdomtells us that the causal intergenerational effect of schooling should be positive, they contemplatewhether the inability of being able to control gender differences in parenting, unobserved effectsfrom mating, etc., in the data, leads to such surprising results.

Conversely, there is also evidence that points to strong inherited genetic effects on children’sschooling outcomes. Early work by Behrman and Taubman (1989), using data on twin parentsand extended family relationships to decompose the variance of observed years of schooling intothe variance of genetic and environmental variables, conclude that as high as 80% is accountedfor by genetics alone. More recently, Plug and Vijverberg (2003), using data on biological andadopted children to separately identify how much of inherited IQ can explain children’s schoolingoutcomes, find more evidence for environment effects, but conclude that the inherited geneticeffect is still at least larger than 50%.

Studies that look at children’s earnings and/or outcomes other than schooling tend to find alarger role for the environment. Bowles and Gintis (2002) estimates how much of the intergener-ational persistence in earnings can be explained by the correlation between the child’s educationand IQ, and his father’s earnings, but has little to say about causal effects as the intermediatevariables themselves are both subject to ability selection and spillovers. In a rare study whereboth information on adopted and biological parents were used, Björklund et al. (2006) find thatthe biological mother’s years of schooling has a larger effect on the child’s years of schooling thanthe adopted mother’s, but adoptive father’s earnings or income has a larger effect on the child’searnings or income than the biological father’s. Sacerdote (2007), using a large sample of Koreanadoptees, finds that while adopted families tend to matter less for both education and earnings,the seem to have a large effect on other social behavior, such as college selectivity and drinking.

Taken altogether, the fact that genetic effects are larger in some cases while environmentaleffects are larger in others is viewed as an inconsistency, especially in terms of the child’s educationand earnings outcomes (Black and Devereux, 2011). This stems at least partially from an implicitunderstanding that parents should have a qualitatively similar effect on the child’s educationand earnings outcomes, which is most likely motivated by the very strong relationship betweenan individual’s education and earnings (Card, 1999). However, just as it may not be natural toexpect the parental environment to have the same effect on the child’s cognitive traits as non-cognitive traits (Cunha et al., 2010; Heckman et al., 2013), we argue that a parent’s educationcan have qualitatively different effects on the child’s education and earnings as well. Anotherconfounding factor when interpreting the parental effect on children’s schooling is to what extentthe education choice for or by the child was economically motivated. While this is well recognizedin the literature on returns to schooling (Heckman et al., 2006, 2008), as of yet no attempts havebeen made to link non-pecuniary motives with a potential discrepancy between the parental effecton children’s education as opposed to children’s earnings.

Our first departure from the literature is to posit a non-linear model in which the individual

5With the exception of Black et al. (2005), who find weak evidence of a positive mother-son relationship.

5

Page 6: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

takes his parent’s variable(s) as a state to solve a lifetime optimization problem, as opposed toa linear explanatory variable. The negative causal effect of parents on children’s schooling thatthe solution admits may sound counterintuitive at first. But such a level effect (that high initialconditions substitute later investments) have been found to be important in previous life-cyclemodels of human capital accumulation (Heckman et al., 1998; Huggett et al., 2011). Moreover, thesolution explains the child’s schooling and earnings outcomes jointly, rather than single outcomesone by one. In a mechanical sense, this is how we are able to identify the two unobservables,namely, the cross-sectional correlation between children’s abilities and their parents’ schooling,and the size of the intergenerational spillovers. Since the resulting causal effect on earnings isstill positive, our model is at once consistent with the empirical findings that parents may havea negative causal effect on children’s schooling, and the limited evidence that fathers’ or familyincomes have a positive causal effect on children’s earnings.

The second aspect that differentiates our approach is that in our estimation, we explicitly sep-arate pecuniary and non-pecuniary motives for schooling. Non-pecuniary motives are estimatedto be quite large in life-cycle models with schooling and/or occupation choice.6 There is also sub-stantial evidence that children from less advantaged families are more likely to be misinformedabout education returns (Betts, 1996; Avery and Turner, 2012; Hoxby and Avery, 2013), whichwould also be captured as part of the taste for schooling in our setting. The existence of non-pecuniary motives presents difficulties when estimating causal effects from the data, but also pro-vides some discipline on how to interpret the results from special data sets. For example, theschooling difference between twin parents or compulsory increases in years of schooling are lesslikely to to be related to other unobserved family characteristics that affect children’s schoolingdecisions, while children adopted to different families very likely do develop the non-cognitiveskills or acquire information that conform to such unobservables. If these unobservables are posi-tively correlated with parents’ education, to some extent it should be expected that the effect of aparents’ education on children’s education should be smaller in twins or IV studies (although theformer is also subject to sampling bias) than adoptee studies.

Recent research differentiates how cognitive and non-cognitive skills formed early in life canexplain various measures of well-being in adulthood (Cunha et al., 2010; Heckman et al., 2013),and the childhood environment has long been suspected as what may explain the large estimatesfor non-pecuniary motives found in structural models of earnings (Bowles et al., 2001; Heckmanet al., 2006). The spillover in our model can be understood as the parental effect on cognitive skillsthat increases the child’s earnings ability, while the correlation of a parent’s education with herchild’s taste for school can be understood as the parental effect on non-cognitive abilities that donot directly relate to earnings and only schooling.

The rest of the paper is organized as follows. Section 2 posits a simple model of human capitalaccumulation and adds to it a parental spillover. The solution to this model is derived analyticallyfrom which we can make empirical predictions. In section 3 we describe the HRS data that we use

6Keane and Wolpin (1997); Heckman et al. (1998) and almost the entire literature that followed.

6

Page 7: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

and interpret the reduced form evidence through the lenses of our model. Section 4 presents themore comprehensive model which is estimated to the HRS. We also show, quantitatively, that theestimated model inherits properties of the simpler model. Section 5 examines counterfactual pre-dictions of the model including a hypothetical compulsory schooling reform. Section 7 concludes.

2 A Simple Model with Parental Spillovers

In this section, we present a simple variant of a Ben-Porath (1967) life-cycle model of human capitalaccumulation.

2.1 Preliminaries

The Ben-Porath Model Consider first the basic Ben-Porath model without parental spilloversor any tastes for schooling. An individual begins life at age 6, retires at age R > 6 and dies atage T ≥ R, all exogenous to the individual. At age 6, an individual is described by the statevector (h0, z), which denotes his initial stock of human capital and (learning) ability, respectively.An individual chooses time and good investments into human capital accumulation, [n(a), m(a)],a ∈ [6, R), to maximize the present discounted value of net income. His problem is

maxn(a),m(a)

{∫ R

6e−r(a−6) [wh(a)(1− n(a))−m(a)] da

}subject to

h(a) = z[n(a)h(a)]α1 m(a)α2 , a ∈ [6, R),

n(a) ∈ [0, 1], m(a) ≥ 0,

h(6) = h0

where α1 and α2 are the returns to time and good investments, respectively. The market wagew and discount rate r are constant and taken as given by the individual. Since the market wagemultiplies human capital to generate earnings, human capital is understood as an individual’s“earning ability" as opposed to z, “ability to learn" (Heckman et al., 1998). Starting from h0, humancapital h(a) is accumulated and evolves over the life-cycle, while z is constant through life.

The above decision problem is a finite horizon problem. When the individual retires, his stockof human capital depreciates completely. Thus, the time path of n(a) weakly decreases with age.The individual state vector captures slope and level effects. Assuming decreasing returns to scale,i.e. α1 + α2 < 1, if the initial stock of human capital h0 and ability z are low and high enough,respectively, the time allocation decision is constrained at the upper bound of 1 for some time andthen strictly declines. Since all time is spend either working or accumulating human capital, thelength of time that n(a) = 1 can be understood as the schooling period. All else equal, individualswith higher ability levels (high z) make more human capital investments and stay longer in school,

7

Page 8: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

while individuals with a higher initial stock of human capital (h0) stay less time in school.Although the model is posed as if a child, at age 6, chooses a lifetime plan for human capital

investment decisions, it is clear that parents would have a dominant role in making those decisionsat least during earlier stages of life. This could be factored in easily by assuming a parent who hasaltruistic preferences toward the lifetime utility or human capital of her children. But with nouncertainty and short-run credit constraints, an optimizing altruistic parent would make the samedecisions for her child as the child would have chosen on his own.

Assumptions for Identification Now suppose a mass of individuals face the same problem overmultiple generations. We think of intergenerational transmission as parents influencing the initialstate vector (h0, z). To set ideas, denote the initial level of human capital and learning abilityof individual i as (h0i, zi), and his parent’s human capital and learning ability as (hPi, zPi). Ourapproach is to assume a statistical relationship for the parents to focus on the children’s initialconditions. Learning abilities, which remain constant through life, are transmitted exogenouslythrough generations. We assume that log zi is log-linear in log zPi, specifically

log zi = µz + ρzσz

σzP

(log zPi − µzP) + εzi, (1)

where ρz represents the intergenerational correlation of abilities, (µz, µzP) are the population meansof the child and parent generations’ abilities, (σz, σzP) the corresponding standard deviations, andεzi a mean zero, i.i.d. shock with standard deviation σε.

The parents’ human capital hP is likely correlated with their abilities zP, which transmits toz. However, we are not interested in the intergenerational persistence of learning abilities. Byassuming a statistical relationship between hP and zP, z becomes multiplicatively independent ofzP conditional on hP, i.e.

zPi = f (hPi)εPi (2)

⇒ zi = g(hPi)εi, (3)

where (εPi, εi) are i.i.d. across the population. We can now remain completely silent about zP andits correlation with the child’s (h0, z), as long as we know the correlation between log z and log hP

induced by (3), which we denote by ρzhP . This correlation is positive if children of high humancapital parents, who likely have high abilities themselves, also have higher ability children.

If each subsequent cohort began life with the same initial stock of human capital h0i ≡ h0,parents’ education would not have any causal influence on the earnings of children. For there tobe any causal (treatment) effect there must exist a channel through which h0i can be affected by hPi.To the extent that hi(6) ≡ h0i represents the amount of learning that happens before school entry,clearly its formation is also affected by both one’s own learning ability zi and parental investmentsprior to age 6. Such parental investments will depend on the parent’s economic status, which is

8

Page 9: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

summarized by hPi. So we can write h0i as a function of (zi, hPi); specifically we assume

hi(6) ≡ h0i = f (zi, hPi) = zλi hν

Pi. (4)

We want to identify the elasticity of h0 with respect to hP, evaluated at individual i’s state:

ν ≡ ∂ log f (z, hP)

∂ log hP

∣∣∣∣∣(z,hP)=(zi ,hPi)

which is assumed to be constant for all i. The parameter ν captures the degree to which a higherhuman capital parent transmits more human capital to her child during the first 6 years of life. Thespillover effect is defined as the increase in earnings induced by this transmission. Such a channelis rather standard in the literature on intergenerational transmissions (Becker and Tomes, 1986),and our particular formulation can be considered a reduced form representation of the importanceof early childhood. The parameter λ can be understood analogously: it captures the populationcorrelation between z and h0 conditional on hP, which comes from two channels: i) how much thechild’s ability matters for early human capital formation, and ii) how much of childrens’ abilitiesare not explained by hP.7

In the data, hP can be proxied by a parent’s education and/or earnings, and is thus observed.But since abilities are unobserved, we may face the problem of separately identifying (ν, λ) fromρzhP , i.e., whether a child’s schooling or earnings are correlated with the parent because of hP or z.The goal of this section is to show how the solution to the model allows this. To do so, we abstractfrom tastes for schooling for now, which are only included later in section 4. We do this primarilyto gain insight into the precise mechanisms at work, and understand how the model parameterscan be recovered from a panel individuals assuming (1)-(4). Taste for schooling are included toaccount for any remaining unobserved heterogeneity.

Consequently, we are simply solving a life-cycle decision problem in which a child takes hisability and his parent’s human capital as an exogenous state variable. This means the parentalspillover is essentially an intergenerational externality that is not internalized by previous gener-ations.8

2.2 The Individual’s Problem

For what follows we drop the individual subscript i unless necessary. Let V(a, h) denote the valuefunction for an individual of age a and human capital level h. The problem faced by an individual

7We are assuming away that parents’ human capital can directly affect their children’s learning abilities, but if thiswere the case, λ would pick up a third channel, namely, how much of z is directly affected by hP before age 6. In doingso, we would still be losing any direct effect that hP can have on z after age 6. However, not only do parents have thelargest effect on children in their early years (Del Boca et al., 2013), such an effect is not separately identified from ρzhP

in our simple model. We discuss this more in Section 5.8In other words, individuals do not invest more in their own human capital in anticipation of that investment

spilling over to subsequent generations. If they do, it would only have a larger effect on adult earnings.

9

Page 10: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

at age 6 given h(6) = h0 can be written

V(6, h0) = max{n(a),m(a)}

{∫ R

6e−r(a−6)g (h(a); n(a), m(a)) da

}h(a) = f (h(a); n(a), m(a)) , n(a) ∈ [0, 1], m(a) > 0.

where the objective function and law of motion are given as

g(h; n, m) = −whn−m

f (h; n, m) = z(nh)α1 mα2 .

This is a continuous time deterministic control problem with state h and controls (n, m). Theterminal time is fixed at R but the terminal state h(R) must be chosen. Since the objective functionis linear, the constraint set strictly convex, and the law of motion strictly positive and concave (aslong as α1 + α2 < 1), the optimization problem is well-defined and the solution is unique (Léonardand Van Long, 1992). The Hamilton-Jacobi-Bellman equation is

rV(a, h)− ∂V(a, h)∂a

= maxn,m

{g(h; n, m) +

∂V(a, h)∂h

f (h; n, m)

}.

As usual, the HJB equation can be interpreted as a no-arbitrage condition. The left-hand side isthe instantaneous cost of holding a human capital level of h at age a, while the the right-hand sideis the instantaneous return. The first order conditions for the controls are

whn ≤ α1z(nh)α1 mα2 ·Vh, with equality if n < 1 (5)

m = α2z(nh)α1 mα2 ·Vh, (6)

where Vh is the partial of V(a, h) with respect to h. These conditions simply state that the marginalcost of investment, on the left-hand side, is equal to the marginal return. The envelope conditiongives (at the optimum)

r ·Vh −Vah = w(1− n) +α1z(nh)α1 mα2

h·Vh + z(nh)α1 mα2 ·Vhh, (7)

where Vxh is the partial of Vh with respect to x ∈ {a, h}. This “Euler equation” states that at theoptimum, the marginal cost of increasing human capital must be equal to the marginal return.Equations (5), (6) and (7) along with the law of motion

h = z(nh)α1 mα2 (8)

characterize the complete solution, given the initial state h(6) = h0 and terminal condition Vh = 0,the appropriate transversality condition for a fixed terminal time problem. We solve this problemin Appendix A and here only present the important results.

10

Page 11: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

To save on notation, it is useful to define α ≡ α1 + α2 and

q(a) ≡[1− e−r(R−a)

], κ ≡

αα11 αα2

2 w1−α1

r.

PROPOSITION 1: OPTIMAL SCHOOLING CHOICE Define α ≡ α1 + α2 and the function

F(s)−1 ≡ κ(α1

w

)1−α·[

1− (1− α1)(1− α2)

α1α2· 1− e−

α2rs1−α2

q(6 + s)

] 1−α1−α1

· q(6 + s).

The optimal choice of schooling S is uniquely determined by

F′(S) > 0, F(S) ≥ z1−λ(1−α)h−ν(1−α)P (9)

with equality if S > 0.

Proof. See Appendix A.

The higher the learning ability z of an individual, the higher the optimal choice of his schooling(as long as λ(1− α) < 1). Intuitively, conditional on an initial level of human capital, higher zindividual benefit more from schooling. Importantly, the causal, spillover effect from the parent’shuman capital hP on schooling is negative (as long as ν(1− α) > 0). While this may sound coun-terintuitive, it is in line with the empirical evidence in Behrman and Rosenzweig (2002) and Blacket al. (2005) who find that increasing mothers’ years of schooling by 1 year has a negative effect ontheir children’s schooling outcomes. In our model, this relationship holds true because childrenwith high human capital parents have less of a need to stay in school because they start off with ahigher level of human capital (a quantity-quality substitution effect).

In the estimation, we proxy for hP by parents’ schooling.9 Then according to our model, theobserved dependence of children’s schooling on parents’ schooling depends on both ρzhP , thecorrelation between (z, hP), the spillover effect ν, and early childhood learning λ. To see this,suppose (3) takes the form

log zi = µz + ρzhP

σz

σhP

(log hPi − µhP) + εi

= µz + ρzhP log hPi + εi, (10)

where (µhP , σhP) denote the mean and standard deviation of the parents’ human capital.10 NowρzhP denotes the elasticity of z with respect to hP, rather than the correlation. At equality, (9) can

9The model is estimated to the original HRS cohort. In the HRS data, we observe parents’ years of schooling, butnot earnings. To proxy the human capital distribution of the parents by earnings instead of schooling, we assume thatthe parents’ schooling-earnings relationship is identical to a separate Mincer regression run on the HRS AHEAD cohort,who are approximately one generation older than the original HRS cohort.

10This is in fact what we assume for the estimation later in Section 4, equation (19).

11

Page 12: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

be written as

log F(Si) = [1− λ(1− α)] log zi − ν(1− α) log hPi

= [ρzhP − (ν + λρzhP)(1− α)] log hPi + [1− λ(1− α)] (µz + εi). (11)

Hence depending on other parameter values, we may observe a positive OLS effect between chil-dren and parent’s schooling even if ν is large. Moreover, even if we were to observe that increasinga parent’s human capital exogenously had a small effect on a child’s schooling, it may not implythat there are no spillovers but a large α. But (ν, λ, ρzhP) are not separately identified in such asimple schooling regression, even if (α1, α2) are known.

PROPOSITION 2: POST-SCHOOLING HUMAN CAPITAL For any S, human capital at the end ofschooling, hS, satisfies

whS = C1(S) · z1

1−α , where C1(s) = α1 · [κq(6 + s)]1

1−α

Proof. See Appendix A.

The above proposition tells us that, once the length of schooling is known, the human capital levelof a child is affected only by his own learning ability z. His initial stock of human capital, h0, has noeffect on the amount of human capital accumulated (quality) except through the length of schooling(quantity), S. So both the parental effects of ν and early childhood learning λ are subsumed in thelength of schooling. Although ν has no effect on a given individual’s earnings once his schooling isdetermined, we will see in Corollary 3 that earnings differences across individuals with the samelevel of schooling identifies ν from the data.

Using Proposition 2, we can also describe the dependence of children’s earnings profiles onthe human capital of their parents. Assume that a fraction πn and πm of time and goods invest-ments (n, m), respectively, are subtracted from the value of the human capital to obtain measuredearnings. This simply amounts to assuming that the individual pays for the job training costs inthe form of lower contemporaneous wages (i.e., that employers deduct this fraction before payingemployee wages) .

COROLLARY 1: EVOLUTION OF EARNINGS PROFILES For an individual who attains S years ofschooling, for all a ∈ [6 + S, R),

e(a) = wh(a) [1− πnn(a)]− πmm(a) = [C1(S) + C2(a; S)] · z 11−α

for any (πn, πm) ∈ [0, 1]2, where

C2(a; s) = κ1

1−α ·{

r ·∫ a

6+sq(x)

α1−α dx− (α1πn + α2πm)q(a)

}for a ≥ s− 6.

12

Page 13: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Proof. See Appendix A.

By virtue of Corollary 1, what fraction of job training costs are paid for by the firm only dependson age, as long as it is assumed to be constant. Moreover, the exponents for the human capitalproduction function, (α1, α2), are identified by the slopes of the log age-earnings profiles of indi-viduals with the same level of schooling, since age only affects earnings through the function C2.This is standard in models that use this approach.

This expression for earnings in Corollary 1 can also be interpreted as a Mincer-type equationthat relates earnings to schooling. The functions C1 and C2 are common to all individuals anddictates the returns to schooling, while C2 would further tell us the shape of the average age-earnings profile. Of course, we would need to know the exact parameters for these functions andalso be able to control for the unobserved z for testing.

2.3 Identification of Key Parameters

A robust finding in empirical studies is that even after controlling for observables, mothers’ ed-ucation has a statistically significant relationship with children’s schooling and earnings. Theparameters (ν, λ, ρzhP) are a structural representation of this. In this section, we demonstrate thatProposition 1 and Corollary 1 imply that all three parameters are separately identified once wehave data on children’s schooling and earnings outcomes, and the human capital levels of theirparents.

COROLLARY 2: IDENTIFYING λ AND ρzhP Suppose we observe a large, representative sample of individ-uals for whom we know the human capital level of their parents hPi, schooling outcomes Si, and age-earningsprofiles ei(a) (but not zi).

1. If we select only those individuals whose parents have the same hPi = hP, then for any a ∈ [6+ S, R),earnings depend only on S through λ and nothing else:

ei(a|hPi = hP) ∝ [C1(Si) + C2(a; Si)] · F(Si)1

(1−α)[[1−λ(1−α)] . (12)

So if we regress

log ei(a|hPi = hP) = a0 + a1 log [C1(Si) + C2(a; Si)] + a2 log F(Si) + εi,

we recover

a2 = 1/ {(1− α) [1− λ(1− α)]} .

2. Suppose that (z, hP) follows (10). If we regress

log ei(a) = b0 + b1 log [C1(Si) + C2(a; Si)] + b2 log hPi + ηi, (13)

13

Page 14: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

we recover

b2 = ˆρzhP /(1− α).

So if (α1, α2) is known, as well as the functions (C1, C2, F), λ is recovered from a Mincer regression thatincludes a complete set of dummies for all human capital levels of parents. Likewise, (ρzhP , σz) are jointlyidentified from a Mincer regression linearly controlling for log hP.

Proof. Suppose we observe two age a individuals with different levels of schooling and age aearnings but whose parents have the same level of human capital, denoted by (S1, S2), (e1, e2), and(hP1, hP2), respectively. Let (z1, z2) denote their unobserved learning abilities. Then by Corollary 1,

e1

e2=

[C1(S1) + C2(a; S1)]

[C1(S2) + C2(a; S2)]·(

z1

z2

) 11−α

,

but since hP1 = hP2, by Proposition 1

F(S1)

F(S2)=

(z1

z2

)1−λ(1−α)

⇒ e1

e2=

[C1(S1) + C2(a; S1)]

[C1(S2) + C2(a; S2)]·[

F(S1)

F(S2)

] 1(1−α)[1−λ(1−α)]

which is (12). Part 2 follows trivially from Corollary 1 after plugging in the assumed relationshipbetween z and hP from (10).

Put simply, the magnitude of λ is identified by the Mincer coefficient of children’s schooling onearnings, conditional on parental human capital. Clearly for children with identical levels of hP, thespillover has no role in explaining earnings differences. Also, earnings are not affected by a child’sinitial level of human capital directly, once controlling for schooling (Proposition 2 and Corollary1). Since the human capital technology parameters (α1, α2) are common to all individuals, theonly way that schooling can have heterogeneous effects on earnings is through z’s influence onthe determination of S, which reveals λ.

Even when λ is known, however, the schooling regression in (11) still does not identify ν sep-arately from ρzhP . On the other hand, if we know exactly the underlying functions (C1, C2, F), aMincer regression that controls for an individual’s age and schooling would reveal that the coef-ficient on the human capital of the individual’s parent captures only ρzhP and completely missesν.11 Since earnings are entirely explained by abilities after controlling for schooling, in which allparental effects are subsumed, a linear regression only reveals the relationship between parentsand abilities.

So although they are not directly related to parents, knowledge of the functions (C1, C2, F) areessential for identifying λ and ρzhP . If we could control for age and its interaction with schoolingproperly (the functions (C1, C2)), Corollary 2 would imply that we could simply regress log earn-

11Technically, we can also recover an estimate for the variance of abilities σ2z from (13), so that we can infer not just

the elasticity ρzhP but also the correlation ρzhP .

14

Page 15: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

ings on parents’ human capital hP, and a function of age and schooling, and the coefficient on thefirst variable would completely reveal ability selection. This is somewhat surprising, since it doesnot pose any identification problems in terms of separating selection from the causal effect ν. Wewill see in Corollary 3, however, that if we instead control for all levels of children’s schooling witha full set of dummies, instead of relying on linearity or specific functional forms, the coefficient onparents’ human capital would identify ν instead.

COROLLARY 3: IDENTIFICATION OF SPILLOVERS Suppose we have the same sample as in Corollary2. If we select only those children with the same level of schooling, Si = S 6= 0, then for any a ∈ [6+ S, R),earnings depend only on hP through (ν, λ) and nothing else:

ei(a|Si = S) ∝ hν

1−λ(1−α)

Pi . (14)

So if we regress

log ei(a|Si = S) = b0 + b1C2(a; Si) + b2 log hPi + εi,

we recover

b2 = ν/ [1− λ(1− α)] .

So if α is known, and since λ is identified from Corollary 2, ν is recovered from a Mincer regression thatincludes a complete set of dummies for all levels of schooling.

Proof. Suppose we observe two age a individuals with the same level of schooling but differentlevels of age a earnings and parents with different levels of human capital, denoted by (S1, S2),(e1, e2), and (hP1, hP2), respectively. Let (z1, z2) denote their unobserved learning abilities. Thenby Corollary 1, since S1 = S2,

e1

e2=

(z1

z2

) 11−α

,

and by Proposition 1,

(z1

z2

)1−λ(1−α)

=

(hP1

hP2

)ν(1−α)

⇒ e1

e2=

(hP1

hP2

) ν1−λ(1−α)

which is (14).

Corollary 3 shows that our model has clear implications for the level and steepness of the age-earnings profiles of two individuals with the same years of schooling but parents with differentlevels of human capital: the steepness of the profiles should be identical while differences in par-ents’ human capital manifest as level differences. In other words, the profiles should be parallel

15

Page 16: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

with constant gaps, and given λ, this gap identifies ν.The next natural question is whether the posited structure of the model is empirically reason-

able, and why we would not just stop here and run the proposed regressions in Corollaries 2-3.We address these questions in the next section and also present some raw evidence on the relativemagnitudes of ρzhP and ν in the next section. In section 4, we present a generalized model withtastes for schooling so that we do not overestimate the structural effects from the stylized modelthat ignores all sources of unobserved heterogeneity other than abilities.

2.4 Discussion

This section only describes a simple, stylized model. Although we estimate a more complicatedmodel to the HRS data in section 4, we use Corollaries 2 and 3 to discipline our choice of GMMmoments, and show through simulations that the basic intuition carries over. However, the maintakeaway from these corollaries is more than whether our model is a true representation of thedata: it shows that in order to identify the causal effect of parents on children’s earnings, we firstneed to correctly specify the relationship between the child’s own schooling and earnings, andalso between early childhood and later human capital accumulation. Otherwise, even if we hadan ideal instrument for hP, we would not be able to identify the causal effect on earnings. Sincean exogenous increase in hP would affect earnings not only directly but also indirectly through h0

and S, we would not be able to identify the causal effect without knowledge of the latter indirectchannels.

In the simple model, learning ability and parents’ human capital alone accounts for the ob-served correlation of children’s schooling and earnings with parents’ schooling. Later we addtastes for schooling to account for additional sources of unobserved heterogeneity. Notably miss-ing from our model is a consideration for life-cycle uncertainty and short-run borrowing con-straints, which we abstract from for a variety of reasons.

Since we focus on how intergenerational linkages affect childhood, schooling outcomes andlifetime earnings, life-cycle uncertainty during the working stage is not of foremost interest. Also,individual age-earnings profiles are noisy in the very beginning and end of the life-cycle, but quitestable in between ages 23 and 42. Most individuals (in the HRS cohort) display stable earningsprofiles from age 23 onward, peaking around age 42 after which it flattens out (until retirementbehavior begins causing irregularities). Given this smoothness of individual profiles during theseages, which we use as our GMM moments, it is unlikely our estimates are affected by working-ageborrowing constraints.

There is also substantial evidence that short-run constraints were not a major factor for highereducation outcomes in the U.S., at least prior to the 1990s (Carneiro and Heckman, 2002; Belleyand Lochner, 2007). While we do not have direct evidence for the HRS cohort that we estimate ourmodel to, these individuals would have gone to college in the 1950s, a period when foregone earn-ings would have been more relevant for their education decision than direct costs. Moreover, themajor dividing factor in this period was whether an individual finished high school, not college-

16

Page 17: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

entry (Taber, 2001), and it is unlikely that secondary schooling costs were binding for schoolingchoices in this period. There is also evidence that schooling outcomes not rationalized by ex-pected earnings are better explained by non-pecuniary costs for schooling rather than borrowingconstraints (Cameron and Taber, 2004; Heckman et al., 2006), which we do incorporate into ourestimated model in section 4. In line with such studies, we also find that such taste heterogeneityis indeed important to account for schooling outcomes.

Borrowing constraints for the parent generation may affect, however, how the parent’s humancapital affects early human capital formation of the child, so some caution is required when inter-preting the parameter ν. In addition to a purely exogenous spillover effect, ν would also capturehow investment opportunities differ by parents with different levels of education. Hence ν shouldbe interpreted as a reduced form representation capturing both an exogenous spillover and pos-sibly borrowing constraints during the parent’s young adulthood, not a structural representationof an early childhood human capital production function. Nonetheless, since we add taste hetero-geneity into the estimated model and allow it to be correlated with parents’ education, all otherunobserved heterogeneity pertaining to parental background would be absorbed there, and theeffect captured by ν would be restricted only to pecuniary ones.

3 Data Analysis

While the Health and Retirement Study (HRS) is usually used to study elderly Americans closeto or in retirement, there are several features that make it suitable for studying intergenerationaleffects in the context of our model. First, these older individuals and their parents were lessaffected by compulsory schooling regulations and other government interventions, and more thanhalf never advanced to college. This makes the sample suitable for a model such as ours in whichschooling is a continuous choice.12 Second, education premia was quite stable prior to the 1980s, soit is unlikely for these cohorts to have been surprised by an unexpected rise in education returns. Itis also less likely that the effect of education on earnings outcomes or the reverse effect of expectedearnings on education outcomes changes much from birth year to birth year. Third, the HRScontains information on the schooling level of both parents, and augmented with restricted SocialSecurity earnings data, we can observe individuals’ entire life-cycle earnings histories.13

12Indeed, the HRS displays much more schooling variation than found in other datasets. Many papers studyingintergenerational schooling relationships, such as those cited in the introduction, also focus on earlier periods, but lacklife-cycle earnings information (except for Black et al. (2005), which uses Norwegian administrative data, although theydo not use that information in their analysis).

13One limitation of the HRS is that information on parents is limited to their education. But in most recent datasetswith richer information on parents and family background, we can only observe the beginning of children’s age-earnings profiles, which is also very noisy because the average age for labor market entry is increasing. More impor-tantly though, even datasets that have both richer parental background information and life-cycle earnings histories,such as NLSY79, are usually based on individuals facing compulsory schooling regulations.

17

Page 18: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 1: Summary Statistics by Education

HSD HSG SMC CLG Total

Schooling7.97 12.00 13.88 16.54 12.33

(2.67) (0.68) (0.50) (3.41)

Mom’s Schooling7.00 9.26 10.11 11.07 9.24

(3.70) (2.98) (3.11) (3.27) (3.60)

Dad’s Schooling6.41 8.72 9.88 11.02 8.91

(3.70) (3.36) (3.59) (3.77) (3.96)

% White 73.31 84.70 85.00 89.30 82.81% Black 21.94 13.24 12.23 6.62 13.82% Hispanic 20.36 4.98 5.86 2.72 8.67

Earnings 23-2716.69 23.39 22.20 18.34 20.24(12.23) (13.23) (12.77) (12.37) (13.00)

Earnings 28-3224.96 34.78 34.13 33.42 31.76(16.36) (17.92) (18.60) (19.56) (18.50)

Earnings 33-3730.90 41.46 41.68 44.15 39.33(18.78) (21.04) (21.72) (23.64) (21.85)

Earnings 37-4234.38 44.27 45.92 52.17 43.79(21.21) (24.16) (28.16) (31.88) (26.96)

# Obs 1349 1647 940 1178 5114*HSD<12, HSG=12, 12<SMC<16, CLG=16+ years of schooling**Years of schooling top-coded at 17.** Standard deviations in parentheses.***Earnings inflated to 2008, measured in $1000.

3.1 Descriptive Statistics

The HRS is sponsored by the National Institute of Aging and conducted by the University ofMichigan with supplemental support from the Social Security Administration. It is a nationalpanel study with an initial sample (in 1992) of 12,652 persons in 7,702 households that over-samples blacks, Hispanics, and residents of Florida. The sample is nationally representative ofthe American population 50 years old and above. The baseline 1992 sample that we use for ourstudy consisted of in-home, face-to-face interviews of the 1931-41 birth cohort and their spouses,if they were married. Follow up interviews have continued every two years after 1992. As theHRS has matured, new cohorts have been added.

For the purposes of our study, we keep 5,760 male respondents born between 1924 and 1941from the 1992 sample.14 We further drop 646 individuals with missing information on their owneducation or mother’s years of schooling. This leaves us with 5,114 individuals. Table 1 describesthis sample by level of education. Children and mom’s schooling are about 12.3 and 9.2 years,respectively, both with a standard deviation of approximately 3.5 years.

A large fraction of HRS respondents gave permission for researchers to gain access, undertightly restricted conditions, to their Social Security earnings records. Combined with self reported

14Most women from this sample have only very short earnings histories. Although the initial HRS sample selectedindividuals from the 1931-1941 cohort, of course many of their spouses were born in different years.

18

Page 19: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

earnings in the HRS, these earnings records, although top-coded in some cases, provide almostthe entire history of earnings for most of the HRS respondents. We imputed top-coded earningsrecords assuming the following individual log-earnings process15

log e∗i,0 = X′i,tβ0 + ε i,0

log e∗i,t = ρ log e∗i,t−1 + X′i,tβx + ε i,t, t ∈ {1, 2, ..., T}ε i,t = αi + ui,t

where e∗i,t is the latent earnings of individual i at time t in 2008 dollars, Xi,t is the vector of char-acteristics at time t, and the error term ε i,t includes an individual specific component αi, whichis constant over time, and an unanticipated white noise component ui,t. We employed random-effect assumptions with homoskedastic errors to estimate above model separately for men withand without a college degree. Scholz et al. (2006) gives details of the above earnings model, theprocedure used to impute top-coded earnings, and the resulting coefficient estimates.

3.2 Evidence of Spillovers

According to our model, the spillover is subsumed in the choice of schooling (Proposition 1) andmanifests as a gap in log earnings that remains constant through life (Corollary 3). So we splitindividuals into three subsamples depending on their own education levels, according to whetherthey have less than, exactly, or more than 12 years of schooling. Each group is further dividedaccording to whether the mother has more than 8 years of schooling, corresponding to the end ofjunior high in most states in 1900 U.S.. Figure 1 depicts the average age-(log)earnings profiles foreach subsample. The left and right panels compare children with 12 years against children withless and more than 12 years of schooling, respectively.

The profiles support the importance of parents’ human capital: individuals with more edu-cated mothers have higher earnings throughout their life-cycles. For all 3 education levels, theaverage log earnings profiles of children with the same education level but different mother’sschooling levels are nearly parallel with a constant gap. This points to a permanent level effectthat persists throughout an individual’s career with no evidence of increasing steepness in thelog earnings profile. It is precisely this gap that is captured by the spillover parameter ν in ourmodel16. Moreover, the gaps are similar across all three categories of the children’s educationalattainment.17

In contrast, note that the profiles of children with the same mother’s schooling levels becomesteeper for higher education levels. This is consistent with equation (12) in Corollary 2, where

15 Social security earnings records exceeding the maximum level subject to social security taxes were top-coded inthe years 1951 through 1977.

16To be precise, the gap should capture βν/ [1− λ(1− α)], where β is defined in (18).17For robustness, we have tried dividing children and their mothers according to different levels of education. The

parallel gaps remain, although when we split children’s education categories into very fine levels with few observa-tions, the gaps between different levels of mother’s schooling vary slightly. We also confirmed that this evidence ispresent in recent cohorts, through similar exercises using the NLSY79 and PSID.

19

Page 20: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 1: Identifying Spillovers8.5

99.5

10

10.5

11

Avera

ge L

og E

arn

ings

0 5 10 15 20 25 30Potential Experience

S=12,Sp<=8 S=12,Sp>8 S=[8,12),Sp<=8 S=[8,12),Sp>8

(a) Children with 12 vs. [8,12) years of schooling.

8.5

99.5

10

10.5

11

Avera

ge L

og E

arn

ings

0 5 10 15 20 25 30Potential Experience

S=12,Sp<=8 S=12,Sp>8 S=(12,16],Sp<=8 S=(12,16],Sp>8

(b) Children with 12 vs. (12,16] years of schooling.

Earnings profiles of children of different schooling levels by mother’s schooling level: 1924-1941 birth cohort. The y-axis is average log annual earnings in 2008 USD. Mothers’ schooling levels are divided by 8 years or below, and morethan 8 years.

earning differences across different education levels would increase in age through the functionC2. If we had perfect knowledge of the relationship between age, schooling, and earnings, and cancontrol for this non-linearity, the differences in the controlled gaps between the profiles of childrenwith different schooling but same mother’s schooling level would identify λ (Corollary 2). Alsoremember that λ is also needed to quantify the size of ν. But these relationships are non-linearand difficult to control for, as we will see in the next subsection.

3.3 Mincer Regressions

Consider a standard Mincer regression augmented with parental schooling:

log ei,a = β0 + β1Si + β2SP,i + f (EXPi,a) + εi,a (15)

where ei,a is the earnings of individual i at age a, and f (·) is a flexible function of EXPi,a, potentialexperience (age-6-S), which we specify in various different ways below. The variables Si and SP,i

denote years of schooling of individual i and individual i’s parents, respectively, and εi,a is anerror term. We estimate different specifications of (15) for earnings data from ages 23 to 42, andtabulate the results in Table 2.18,19

18Although the estimates barely change even if we use all the available earnings records, we restrict ourselves to thisage interval because this is what we estimate the model to in the next section. We begin at age 23 because many earningsrecords are missing prior to this age, and also in our model for all individuals choosing higher levels of schooling. Weend at age 42 because it is close to the peak of the earnings profiles for most individuals, and our model has nothing tosay about the irregular labor supply or retirement behavior at older ages.

19Including race or cohort dummies did not affect our estimates much, so we did not include them in the baselineregression. Moreover, such effects would be absorbed in the unobserved heterogeneity in tastes for schooling, which isincluded in the estimated model, so controlling for them would make the reduced form estimates incomparable withour GMM estimates.

20

Page 21: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 2: Mincer Regressions

(1) (2) (3) (4) (5) (6) (7) (8) (9)

S 0.090 0.082 0.083 0.080 0.080 0.076 0.097 0.095(99.73) (84.21) (82.29) (78.14) (78.14) (81.02) (35.24) (33.10)

Mom SP 0.017 0.015 0.017 0.018 0.017(22.71) (15.60) (22.44) (22.92) (21.63)

Dad SP 0.011 0.003(15.39) (3.45)

Mom + Dad 0.009(20.64)

EXP×S -0.001 -0.001(-6.09) (-4.85)

EXP×SP -0.000(-2.99)

R2 0.190 0.194 0.193 0.195 0.195 0.190 0.195 0.195 0.200OLS regressions of (log) earnings on own and parents’ years of schooling. HRS initial cohort, males born 1924-1941,ages 23-42. All columns include a full set of dummies for each level of potential experience (age-6-S), except for (6),which includes only a linear and quadratic term instead. t-stats shown in parentheses.

We consider four measures for SP,i: the mother’s and father’s years of schooling, respectively,their sum, and also including both as separate controls. Our theory is silent on which of thesemeasures is more appropriate. However, many studies have found that parental inputs have itsstrongest impact on children’s human capital early on (the ν effect), e.g. Del Boca et al. (2012),who also find that mothers spend more time with their children at an early age. This leads us tosuspect mothers should play the dominant role, which is confirmed in our results in Table 2.

The first specification (1) is a standard Mincer regression with dummies for each potentialexperience level observed in the data. The return to schooling is estimated to be 9%. This is in thelower range of the estimated returns to schooling for more recent cohorts, which is in line with theincreased return to education over the last century (Goldin and Katz, 2007). The returns slightlydecrease to 8.2% when we include mother’s years of schooling in regression (2). The coefficienton mother’s education is 1.7% and statistically significant. This suggests that an additional yearof schooling for the child has about the same effect on earnings as would being born to a motherwith five additional years of schooling.

The results are quite similar when we measure parents’ human capital with the father’s year ofschooling in (3) but with an attenuated coefficient. The rate of return of paternal education is 1.1%and is statistically significant. The coefficient drops further when we measure parents’ humancapital as the sum of the schooling of both parents in (4). If we include both separately as in (5),mother’s education is very slightly reduced from 1.7% to 1.5% while the coefficient on fathersdrop significantly from 1.1% to 0.3%. This implies the lack of perfectly assortative mating, andthat mom’s schooling has dominant explanatory power. The relative magnitudes of the parentalcoefficients are consistent with previous studies that have run similar regressions. So for whatfollows, we take mother’s schooling as the proxy for parent’s human capital.

21

Page 22: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

In column (6) we replaced the experience dummies with a linear and quadratic in experience,and in columns (7) and (8) we added two interactions terms to experience dummies: one betweenown education and experience and another between mother’s years of schooling and experience.What is noticeable here is not the coefficients on the interaction terms themselves, but that theMincer coefficient on S is as much as 2 percentage points higher than other specifications. Lastlyin column (9), we controlled for individual schooling by including a full set of dummies for allobserved years of schooling in the data (0 to 17) instead of linearly. Again, the coefficient onmother’s years of schooling is 1.7% and highly significant.

We have also rerun these sets of regressions controlling for race and cohort dummies, andalso for white males separately. The results are quite stable across all specifications. The strikingfeature is that no matter how we control for experience, the coefficients on S and mom’s SP aresimilar in magnitude and highly significant. To summarize, mothers’ schooling have a strongerrelationship with sons’ earnings than fathers’, and the estimated effect of mother’s schooling, orβ2, is about 1.7%. This is in the range of previous empirical work on this topic (Card, 1999). Thequestion is, does this reflect intergenerational persistence of abilities or causal spillovers?

Insofar as S is controlled for linearly in (2), our model would indicate that β2 captures onlyability selection (Corollary 2). On the other hand, (9) would capture spillover effects (Corollary3). But without knowledge of the true schooling-earnings relationship, depending on what thedifferent specifications for experience are capturing in (6)-(8), it is unclear whether the coefficientsin these columns can be interpreted as selection or spillovers. The small but statistically signif-icant interaction terms and the larger coefficients on S in (7) and (8) point toward a non-linearrelationship between education and earnings. We also saw in Figure 1 that the main difference inthe profiles between children with different levels of schooling but same mother’s schooling wasthe steepness of their profiles. This may take the interpretation of (7) and (8) closer to (9), wherewe control for all levels of schooling, than (2). Hence, the fact that β2 is invariant across differentspecifications should rather be interpreted as the overall magnitudes of selection and spilloversbeing similar, with each having approximately one-fifth an effect on earnings as own schooling.

The takeaway is that the interpretation of a parental effect is sensitive to the econometriciancontrols for age and schooling. But since schooling (S) itself is a function of abilities (z) and par-ents’ schooling (SP), no matter how one controls for age and schooling the estimates do not revealthe isolated causal effect of abilities nor spillovers. In terms of our model, this means that withoutknowing the magnitudes of the rest of the parameters in the model, in particular α, which governsthe returns to human capital accumulation, and λ, ability’s role in early human capital formation,we cannot say much about how much of β2 is the selection and/or spillover effect, even in cases(2) and (9). As we emphasized in the introduction, we need to be able to exploit information onschooling and earnings outcomes jointly. The Ben-Porath model does this for us, by admitting anage-schooling-earnings relationship that has been used ubiquitously in the literature. To estimatethe exact values of the spillover and its causal effect, we generalize the simple model of section 2to bring it closer to the data.

22

Page 23: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

4 Estimation

The theory and evidence up to now suggest a novel way of estimating the causal effect of parent’seducation on children’s earnings. By conditioning on the schooling of the child, we are able toget around some of the selection issues commonly encountered in empirical work. The goal ofthe generalized model we estimate in this section is to retain the simplicity of the frameworkpresented in section 2 and empirical intuition from section 3, and yet be able to obtain realisticpredictions on the transmission of human capital and schooling across generations.

The underlying environment is one in which learning ability z is transmitted across genera-tions. However, our framework does not require us to make specific assumptions on this trans-mission process per se. The magnitude of the spillover is determined only the state of the child,which consists of his or her learning ability z and the human capital level of the parent hP. Hence,for our purposes all that is required is the estimation of ρzhP , the correlation between z and hP.Since an overriding objective is to fit the distribution of schooling, we will also include a tastefor schooling in the objective function. As we already argued, many previous studies find thatnon-pecuniary benefits, in addition to learning ability (selection), play an important role in ratio-nalizing schooling decisions.

In line with the previous sections, the counterfactual experiments in this section show thatthe coefficient on mother’s schooling in a Mincer regression captures selection, in the sense thatwhen ρzhP is set to zero, the counterfactual coefficient is also close to zero. We also show thatthe coefficient changes little even if we set ν = 0. Nonetheless, we show that a counterfactualincrease in mother’s schooling leads to a 1.2% increase in children’s earnings controlling for se-lection on average, while further allowing for selection leads to an additional 1.3% increase. Fur-thermore, this happens without increasing children’s schooling. This is explained by parents withhigher human capital having a negative effect on children’s schooling, as we saw in Proposition1, while children with higher human capital parents having higher tastes for schooling is whatdrives the observed positive intergenerational relationship in schooling. Given this, we conduct acounterfactual schooling reform which shows that a large OLS coefficient when regressing child’sschooling outcomes on parent’s schooling is consistent with a negative or zero IV coefficient, butincreases children’s lifetime earnings by 2.8%.

4.1 A Generalized Model of Parental Spillovers

The biggest change we make in comparison to the model of section 2 is to constrain the optimalchoice of schooling to be discrete (but with many nodes) to be consistent with the data. Anotherbenefit of doing so is that we can use a nested logit model to capture unobserved, non-pecuniarybenefits from schooling. The generalized model also assumes different laws of motion for humancapital during schooling and on-the-job (OJT). The problem faced by an individual at age 6 can be

23

Page 24: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

written as

V(6, h0) = maxS∈{8,10,12,14,16,18}

{V (S; 6, h0) + ξS

}where ξ ≡ [ξS] is a vector that represents a non-pecuniary benefit for each level of schooling(but measured in pecuniary units) that varies across individuals and schooling levels, but staysconstant throughout the life-cycle. The age 6 pecuniary value from attaining S years of schoolingis

V(S; 6, h0) = max{n(a),m(a)}

{−∫ 6+S

6e−r(a−6)m(a)da +

∫ R

6+Se−r(a−6)h(a) [1− n(a)] da

}(16)

subject to

h(a) =

zh(a)α1 m(a)α2 , for a ∈ [6, S),

z[n(a)h(a)]αW , for a ∈ [S, R),

h(6) = h0 = bzλhνP. (17)

Since ξ only affects an individual’s desire to remain (or not) in school while having no directeffect on earnings, the inclusion of tastes for schooling allows the model to flexibly account forunobserved heterogeneity in schooling-earnings relationships that do not solely rely on economicfactors (human capital and learning ability). This not only helps account for the data but also tiesour hands to not label everything as ability selection or parental spillovers.

The only changes we have made in addition to the tastes for schooling is to explicitly split theschooling and working phase, during which the human capital accumulation technology differs.Schooling only involves goods inputs while OJT only involves time inputs. The technology inthe schooling phase is identical to the simple model with n(a) = 1, while the working phase isidentical to the the simple model with α2 = 0. We also allow αW , the returns to human capitalinvestments during the working phase, differ from the schooling phase. The parameter b thatmultiplies initial human capital captures the overall level of human capital in the model, whilewe have dropped the wage rate w since it is not separately identified from b in our partial equilib-rium setup (i.e., it is not separately identified from units of human capital without modeling thedemand for labor).

At age 6, an individual is completely characterized by{

hP, z, ξ}

. If the schooling choice werecontinuous, we cannot derive closed form solutions for schooling and earnings as in the simplemodel,20 but when S is given exogenously the resulting profile of earnings can be characterizedusing similar methods. In Appendix B, we characterize to solve (16) and the equations governingearnings given {hP, z, S}, and describe how a solution is found numerically in Appendix C.

20This is primarily because in general, the supply of labor, 1− n(a), jumps from 0 to a strictly positive amount oncean individual begins to work, unlike in the simple model where it increases continuously over time.

24

Page 25: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

4.2 Population Distribution Assumptions

Our dataset contains information on parental schooling, children’s schooling as well as completeearnings profiles of children. Since we only have information on the schooling of he parent, we ap-proximate the parent’s human capital, or earnings, of the parent by a standard Mincerian equationrelating parental schooling to earnings,

hP = a exp(βSP). (18)

The only usage of β is to normalize parents’ schooling and transform it into human capital unitsbefore applying them to the children’s initial condition (17). Since the effect we are interested inis the causal effect of increasing mother’s schooling by one year on children’s earnings, this nor-malization is innocuous.21 The coefficient a is normalized to 1 since it is not separately identifiedfrom b in the child’s initial human capital (17).

The population distribution of{

hP, z, ξ}

are parametrized as follows. First, we assume that(log hP, log z) are joint normal, specifically[

log hP

log z

]∼ N

([µhP

µz

],

[σ2

hPρzhP σhσz

ρzhP σhσz σ2z

]).

Given the relationship hP = exp(βSP), we know hP once we know SP and β. The distribution ofSP is taken from the empirical p.m.f. of mother’s schooling in the HRS, so it is observed in discreteyears ranging from 0 to 16. Then for each mother’s schooling level and corresponding hP, ourdistributional assumptions imply

log z| log hP ∼ N(

µz + ρzhP

σz

σhP

(log hP − µhP) , σ2z(1− ρ2

zhP

)). (19)

For each combination of {hP, z, S ∈ {8, 10, 12, 14, 16, 18}}, we solve the model numerically as de-scribed in Appendix C. This induces the optimal life-cycle earnings for any given initial condition(hP, z) and choice of S.

Schooling choices are determined as follows. We incorporate a very rich structure for taste het-erogeneity to account for a large range of unobserved heterogeneity. First, the tastes for schoolingξ vary not only with the schooling levels but also parent’s human capital hP and ability z:

ξS ≡ δS (γhP hP + γzz) + ξS. (20)

where ξ ≡ [ξS] is 6-dimensional logit. The constants γhp and γz captures the correlation betweenpreferences for schooling, and parents’ human capital hP and ability z, respectively. We normalizeδ8 = 0 and δ10 = 1, so that only δS, S ∈ {12, 14, 16, 18} need to be estimated.22

A nested logit model is used to model the unobserved taste heterogeneity ξ. Schooling de-21If β were not included and parent’s human capital is exp(SP), the estimated spillover would be βν.22Other normalizations would be colinear.

25

Page 26: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

cisions are nested depending on college-entry, i.e. the distributions of tastes for S ∈ {8, 10, 12}and S ∈ {14, 16, 18} are nested. The vector ξ is drawn from a 6-dimensional, generalized extremevalue distribution with c.d.f. G and scale parameter σξ :

G(ξ) = exp{−[exp

(−ξ8/σξζh

)+ exp

(−ξ10/σξζh

)+ exp

(−ξ12/σξζh

)]ζh

−[exp

(−ξ14/σξζc

)+ exp

(−ξ16/σξζc

)+ exp

(−ξ18/σξζc

)]ζc}

where (1− ζh, 1− ζc) ∈ [0, 1] measures the correlation within each nest. Now let

uS ≡ V(S; 6, h0) + δs (γhP hP + γzz)

for S ∈ {8, 10, 12, 14, 16, 18}. Nesting yields the following conditional choice probabilities (CCP),given (hP, z):

Pr(S = 8) = Pr(S = 8|S ∈ {8, 10, 12}) · Pr(S ∈ {8, 10, 12}) (21a)

Pr(S = 8|s ∈ {8, 10, 12}) =exp

(u8/σξζh

)exp

(u8/σξζh

)+ exp

(u10/σξζh

)+ exp

(u12/σξζh

) (21b)

Pr(S = 14|S ∈ {14, 16, 18}) =exp

(u14/σξζc

)exp

(u14/σξζc

)+ exp

(u16/σξζc

)+ exp

(u18/σξζc

) (21c)

and

Pr (s ∈ {8, 10, 12}) = (22)[exp

{u8

σξ ζh

}+ exp

{u10

σξ ζh

}+ exp

{u12

σξ ζh

}]ζh

[exp

{u8

σξ ζh

}+ exp

{u10

σξ ζh

}+ exp

{u12

σξ ζh

}]ζh+[exp

{u14σξ ζc

}+ exp

{u16σξ ζc

}+ exp

{u18σξ ζc

}]ζc.

4.3 Generalized Method of Moments

This sub-section proposes a simple estimation of our model using the method of moments. Wetake advantage of uor closed-form solution, which make calculating the model moments compu-tationally easy. The schooling and earnings moments of the extended model can be computedexactly subject only to numerical approximation error (See the Appendix C for details). We mini-mize the distance between sample moments and theoretical moments with respect to the model’sparameters. While we could derive the likelihood of the model, we choose to use the method ofmoments because it allows us to derive identification from key features of the data we believe ourought to match precisely. A likelihood estimator would attempt to fit other dimensions of the datawhich our stylised model is not designed to match.

There are 27 parameters in total, which we denote by a vector θ. We partition θ into three

26

Page 27: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 3: Parameters Set a PrioriParameter Value Description

r 5% interest rate, after-tax rate of return on capital in Poterba (1998)w 1 Wage rate per human capital unit, normalizationR 65 Retirement age, exogenousa 1 Normalization (see text)β 6% Mincer return to schooling, HRS AHEAD cohorts

µhP 0.55 Mean human capital of mothers, from data (see text)σhP 0.22 Standard deviation of mothers’ human capital, from data (see text)δ8 0 Taste for 8 years of schooling, normalizationδ10 1 Taste for 10 years of schooling, normalization

* See text for more details.

vectors, i.e. θ ≡ [θ0, θ10, θ11], where

θ0 = [r, w, R, β, µhP , σhP , a, δ8, δ10]

θ10 = [α1, α2, αW , ν, λ, b, ρzhP , µz, σz]

θ11 =[σξ , γhP , γz, ζh, ζc, δ12, δ14, δ16, δ18

].

The first partition, θ0, are parameters that are set a priori. The rest of the parameters, θ1 ≡ [θ10 θ11],are from the simple model and the taste structure in the generalized model, respectively, and areestimated by GMM.

Parameters Set a Priori The interest rate and wage per human capital unit are fixed at constantlevels, but Heckman et al. (1998) find that this misspecification barely affects a similar humancapital production technology that they estimate for the NLSY79. We follow their procedure andfix the interest rate at 0.05, which is in the range of the after-tax rate of return on capital reported inPoterba (1998).23 The wage w is normalized to 1, because it cannot be separately identified from b,the constant multiplying the initial human capital of children. Since human capital in our modelis essentially efficiency wage units, without a demand side for human capital we cannot separatethe average level of human capital from the wage. The retirement age, R, is fixed at the statutoryretirement age of 65.

The coefficient β is recovered from running a standard Mincer regression similar to (15) for theHRS AHEAD cohorts, only without including mother’s schooling. This regression is kept simplesince the purpose is to induce a statistical distribution of earnings from the schooling distribu-tion, including all endogenous effects. Technically, the parameter a would be the constant whenrunning this regression, but as is the case with w, is not separately identified from b and hencenormalized to 1. The β coefficient is quite stable across cohorts, ranging from approximately 0.04

23On the other hand, we did not explicitly model taxes on earnings. This is innocuous if earnings were taxed at aflat rate. Since we abstract from uncertainty, a more complicated tax structure would also not affect our results muchas long as relative lifetime earnings remain similar.

27

Page 28: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

to 0.06 for men and 0.05 to 0.09 for women; we fix β = 0.06. This value is not very differentfrom the coefficients we recover from the the original HRS cohorts in Table 2 which includes morecontrols; our estimates are not sensitive to different values of a β within this range.

Given hP = exp (βSP), we have log hP = βSP. Thus µhP = βµSP and σhP = βσSP . We take themean and variance of mother’s schooling, µSP and σSP , directly from their sample analogs in thedata. Hence the only parametric assumption we are imposing by assuming that SP is Gaussian isits correlation structure with z. Since µSP = 9.24 and σSP = 3.60, we then obtain µhP = 0.55 andσhP = 0.21.

The tastes for 8 and 10 years of schooling, δ8 and δ10, are normalized to 0 and 1, respectively.These two parameters are not separately identified from the other taste parameters (namely, γhP ,γz and σξ). The entire list of exogenously fixed parameters are summarized in Table 3.

Estimated Parameters We are left with 18 parameters to be estimated, the model and taste pa-rameters θ1 = [θ10 θ11]. These parameters are chosen by GMM so that the model reproduces a setof empirical moments of interest. The moments used in the estimation are tabulated in the last5 columns of Table 4. The moments of interest for us are schooling and earnings outcomes bylevel of mother’s schooling. Since we constrain schooling choices to lie on 6 grid points, ratherthan targeting average years of schooling we target the probability of attaining high or low levelsof schooling by 6 levels of mother’s schooling. For each of these 12 groups, we construct averageearnings for ages 25,30,35 and 40, which are in turn computed by simply averaging an individual’searnings from ages 23-27, 28-32, and so forth.

For each level of mom’s schooling, 1 of the 2 educational attainment shares of the childrenare dropped (since they add up to 1). All average earnings are normalized by the lowest level ofaverage earnings, i.e. the age 25 average earnings of children with less than 12 years of schoolingand 5 or less years of mom’s schooling, which is also dropped. We also include four additionalmoments: the correlation between S and SP, the OLS coefficient from regressing S on SP, and theMincer regression coefficients on S and SP from specification (2) of Table 2. These are includedto capture the earnings and schooling gradients in the data we may miss by targeting aggregatedmoments. In sum, we have 57 moments to match with 18 model parameters.

Denote these target moments by Ψ. We first compute the variance-covariance matrix of Ψusing 2000 bootstrap repetitions. For an arbitrary value of θ1, we numerically compute the im-plied model moments, Ψ(θ1), as described in Appendix C. The parameter estimate θ1 is found bysearching over the parameter space Θ1 to find the parameter vector which minimizes the criterionfunction:

θ1 = arg minθ1∈Θ1

(Ψ−Ψ(θ1)

)′W (Ψ−Ψ(θ1)

)where W is a weighting matrix. This procedure generates a consistent estimate of θ1. Following (?),we use a diagonal weighting matrix W = diag(V−1), where V is the variance-covariance matrix ofthe sample auxiliary parameters. This weighting scheme allows for heteroskedasticity and it has

28

Page 29: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 4: Targeted Empirical Moments

Mom’s Fraction Child’s Average Fraction Average Earnings at ageSP (%) S S (%) 23-27 28-32 33-37 38-42

≤5 12.75≤11 6.35 60.04 1.00 1.41 1.71 1.85≥12 13.12 39.96 1.26 1.91 2.32 2.51

6-7 12.30≤11 8.02 42.48 1.10 1.66 1.98 2.11≥12 13.53 57.52 1.27 1.91 2.37 2.65

8 21.76≤12 10.72 66.23 1.35 1.99 2.42 2.62≥13 15.21 33.77 1.31 2.12 2.71 2.99

9-11 13.77≤12 11.00 63.73 1.37 1.98 2.38 2.57≥13 15.24 36.27 1.30 2.15 2.65 2.96

12 30.00≤12 11.21 45.41 1.52 2.24 2.62 2.77≥13 15.38 54.59 1.36 2.29 2.87 3.34

≥13 9.43≤12 11.41 19.56 1.39 1.96 2.33 2.49≥13 15.83 80.44 1.30 2.23 2.82 3.27

(S, SP) correlation and OLS: 0.48 0.46Mincer coefficients (β1, β2): 0.08 0.02

Note that for mom’s with low SP (the first four rows), we divide whether the child’s educational attainment was low orhigh by whether or not he graduated from high school, while for the rest by whether he advanced beyond high school.In the third column, S denotes the average years of schooling attained in each category. All average earnings arenormalized by the average earnings from 23-27 of the group with less than 12 years of schooling whose moms attained5 years or less of schooling. (S, SP) OLS denotes the coefficient from regressing S on SP, and the Mincer coefficients arefrom specification (2) in Table 2.

better finite sample properties than the optimal weighting matrix (see (Altonji and Segal, 1996)).The minimization is performed using a Nelder-Mead simplex algorithm. While this method doesnot guarantee global optima, we used several thousand different starting values to numericallysearch over a wide range of parameter values (most of which have naturally defined boundaries).We use asymptotic standard errors for the parameter estimates, obtained from

√N(θ1 − θ∗1 )→

(G′WG

)−1 G′WVWG(G′WG

)−1

as N approaches ∞, where θ1 is the estimate, θ∗1 is the unknown, true parameter, and N = 5114is the size of our sample. Let P = 18, M = 57 denote the number of parameters and moments,respectively. Then G is the M× P Jacobian matrix of Ψ(θ1) with respect to θ1, which we computenumerically, and V is the variance-covariance matrix of Ψ.

Identification Identification is achieved by a combination of functional forms and parametricassumption. It is hard to formally prove as is usual with this class of models. However, our choiceof moments is guided by intuition of how certain moments should have more influence on certainparameters, as summarized in Table 10 in the appendix.

Although we showed identification of (ν, λ, ρzhP) in Corollaries 2-3 with the simple model, we

29

Page 30: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

cannot get as tight predictions in the extended model. This is because even though incorporat-ing taste heterogeneity and different human capital accumulation technologies in the school andworking phases may seem like minor extensions, closed form solutions do not exist for a generalchoice of parameters.

Nonetheless, the choice of target moments in Table 4 is guided by intuition gained from thesimple model. According to the corollaries, the most important parameters of interest, (ν, λ, ρzhP),are identified by earnings levels by mother’s schooling levels, earnings levels by children’s school-ing levels, and the Mincer coefficient on mother’s schooling in regression (2) of Table 2. We showthat this intuition carries over to our extended model through counterfactuals in Table 7.

The identification argument for the exponents on the human capital production functions,(α1, α2, αW), are similar to other studies using Ben-Porath type functions. Since αW governs thespeed of human capital growth in the working phase, it is identified by the slopes of the averageage-earnings profile. Conversely, (α1, α2) determines at what level of earnings an individual be-gins his working phase. When early age earnings are lower, the slope of the age-earnings profilemust be higher given a level of lifetime earnings. For high values of α ≡= α1 + α2, individualswith high S will have flatter earnings profiles, since they will have higher early age earnings. Forhigh values of α1, individuals with high SP will have flatter earnings profiles, since they will bene-fit more from higher age 6 human capital levels in the schooling phase and thus have higher earlyage earnings.

The parameter b controls the average level of age 6 human capital, so with higher valuesschooling becomes less important for all individuals uniformly. Given an average level of school-ing, the 6 taste parameters δS for S ∈ {12, 14, 16, 18} (δ8 and δ10 are normalized) and (ζh, ζc) shouldperfectly account for the shares of individuals choosing the 6 schooling levels, S ∈ {8, 10, 12, 14,16, 18}. Given an overall variation in schooling across all groups controlled by σξ , the parame-ters (γhP , γz) is identified by how educational attainment varies across mother’s schooling andindividual earnings levels.

5 Results

Table 5 reports the 18 parameter estimates and their asymptotic standard errors. The model gen-erated educational attainment shares (empirical counterparts in fourth column of Table 4) arematched nearly exactly, as well as the 4 additional gradient moments (in the lower panel of Table4 and first panel in Table 7). The earnings moments are compared with the data visually in Figure6 in the Appendix D.

5.1 Interpreting the Parameters

Human Capital Production The parameters that govern human capital production, (α1, α2, αW)

are in the lower range of estimates found in the literature that use comparable Ben-Porath humancapital models, e.g. Heckman et al. (1998) estimate αW = 0.9 in the NLSY79 for post-schooling

30

Page 31: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 5: Parameter EstimatesHC prod Spillovers “Ability”

α10.258

ν0.778

ρzhP

0.229(0.001) (0.010) (0.004)

α20.348

λ0.060

µz-1.198

(0.005) (0.026) (0.012)

αW0.426 b 0.881

σz0.117

(0.004) (0.026) (0.001)

Tastes Taste levels

σξ0.612

ζh0.266

δ141.894

(0.017) (0.007) (0.008)

γhP

1.551ζc

0.846δ16

2.699(0.036) (0.010) (0.021)

γz0.549

δ121.646

δ184.198

(0.027) (0.004) (0.021)

*Standard errors in parentheses.

human capital production. This may have to do with the fact that the HRS cohort lived in aperiod in which observed education returns were much lower, for example, the college premiumwas about 40% prior to the 1980s rising to above 100% in 2000. The returns to human capitalinvestment are slightly larger in school (α1 + α2 = 0.606) than on-the-job (αW = 0.426). Thismeans that for purposes of human capital accumulation, an individual would prefer to stay inschool rather than work.

Parental Spillover and Early Childhood The magnitude of ν seemingly implies large spillovers—a mom with 10 percent higher human capital has a child with 8 percent higher initial human cap-ital, controlling for ability and taste selection. More simply put, increasing mom’s schooling by 1year increases her child’s initial human capital by 12.3%. On the other hand, the estimated λ isboth small and almost insignificant. This indicates that parents are much more important than thechild’s learning abilities for early human capital formation.24

Although the estimated ν is large, note that by construction of the model, it must encom-pass any causal input that influences the child’s human capital that can be explained by mother’sschooling.25 For example, if the child’s human capital is sensitive to early childhood investmentsas in Cunha and Heckman (2007), it would imply a large ν that summarizes the compounded ef-fects of dynamic complementarity in their model. But also note that the impact from increasingmom’s hP is attenuated later in the life, because children with higher h0 (age 6 human capital)accumulate less human capital in school and onward.26 This is for two reasons: i) the human cap-

24The only boundaries we imposed on in the estimation were that α, αW ∈ (0, 1), to guarantee an interior solutionfor both schooling and working times, and ν, λ > 0.

25The effects coming from its correlations with abilities and tastes are selection, not causal.26This is somewhat reminiscent of the Perry early intervention program initially boosting children’s cognitive skills,

31

Page 32: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

ital accumulation technology displays decreasing returns, so they accumulate less human capitalwithin any given time period, and moreover ii) schooling time decreases, i.e., they use the highreturns technology (α > αW) for a shorter amount of time (Proposition 1). As we show soon, thecausal effect of an additional year of mom’s schooling is 1.2%, much smaller than 12.3%, and mostof this comes from early labor market entry rather than an increase in life-cycle earnings.27

One may also question that we recover a large estimate because we have assumed that parentsonly have a level effect, i.e., that hP only has a causal effect on age 6 human capital but not learningabilities z. As we discussed in footnote 7, the problem with including such a slope effect is that it isnot separately identified from ρzhP in the simple model of Section 2. We have run several numericalsimulations with the extended model to verify that assuming a slope spillover has small influenceon all other parameter estimates except ρzhP .28 This implies that a slope spillover would onlycrowd out selection effects, so our results can be viewed as a conservative lower bound estimatefor parental spillovers.

Selection on Learning Abilities The estimate for ρzhP implies that on average, mothers with 1standard deviation of schooling above the population mean have children with learning abilities0.23 standard deviations above the population mean. Given the empirical estimate of σSP and themodel estimated σz, this means that mothers with 1 more year of schooling have children with0.7% higher abilities.

Unlike the spillover, this is a permanent difference that sustains through life. The impact onearnings at all ages, according to Lemma 4 in Appendix B, is similar to what we found in thesimple model in Corollary 1, and can be approximated by ∆ log z/(1− αW), approximately 1.3%.However, because high z children also go to school longer, the impact on lifetime earnings is a bitless at 1% on average, as we soon show in our counterfactuals. Following similar calculations, astandard deviation of 0.117 for z translates into σz/(1− αW) = 0.204 or a 20.4% standard deviationin earnings, once controlling for schooling.

The estimates for (ρzhP , σz) can also be used to put a lower bound on the intergenerationaltransmission of abilities. To this end, suppose that (2) takes the form

log zPi = µzP + ρP ·σzP

σhP

(log hPi − µhP) + εPi

where ρP is the correlation between the human capital and ability of the parent. If we furtherassume that the cross-sectional distribution of abilities and correlation structure between humancapital and abilities remain stable over generations, we can impose ρP = ρ, where ρ is the cor-

but the effect quickly dissipating during school years.27A large ν may also indicate a large degree of intergenerational altruism. Although we have abstracted from in-

dividuals internalizing the spillover, if the underlying environment is one in which it is internalized, parents of allgenerations would invest more in their children. Then, ν would be capturing a composite of spillovers (the causaleffect of parents on children’s earnings) and altruism (how much parents care about children). For our purposes, how-ever, ν would still capture the causal effect as long as altruism does not vary much across the mother’s education.

28Analytical results from the simple model and simulated results from the extended model are available upon re-quest.

32

Page 33: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 6: Schooling Benefits, Average

S = 8 S = 10 S = 12 S = 14 S = 16 S = 18

Present value earnings by ability quartile

Q( 0) 224,000 281,236 352,036 378,771 379,583 337,408

Q( 1) 211,681 247,796 267,454 246,399 242,531 242,651Q( 2) 255,637 292,894 314,487 297,382 295,694 300,278Q( 3) 286,640 326,009 356,178 342,649 342,001 347,917Q( 4) 322,800 362,776 407,678 453,431 459,233 447,386

Nonpecuniary benefits correlated with hP by ability quartile

Q( 0) - 5,097 9,560 10,343 15,653 28,297

Q( 1) - 5,297 10,127 10,220 15,468 28,159Q( 2) - 5,079 9,950 10,304 15,605 28,368Q( 3) - 4,782 9,659 10,296 15,614 28,407Q( 4) - 4,369 8,945 10,406 15,734 28,285

Nonpecuniary benefits correlated with z by ability quartile

Q( 0) - 324 598 717 1,029 1,524

Q( 1) - 301 513 576 824 1,296Q( 2) - 332 562 638 915 1,450Q( 3) - 354 603 687 986 1,562Q( 4) - 377 650 790 1,139 1,766

PDV pecuniary value at age 6, in 2000 USD. Non-pecuniary benefits are all relative to S = 8.

relation between (h, z) of the children in our estimation, and σzP = σz. Using lifetime income asa proxy for h, our model implies a correlation of ρ = 0.794 between the children’s h and z. Theimplied correlation of abilities across generations in our model is then

ρz =ρzhP

ρP· σz

σzP

≈ ρzhP

ρ= 0.232.

Since it is unlikely that the independence assumptions (1)-(3) hold exactly in reality, i.e., log z| log hP

is likely still correlated with log zP, this number should be viewed as a lower bound.

Tastes for Schooling The relative magnitudes of the constants δS measure the overall non-pecuniarybenefits to each schooling level S ∈ {12, 14, 16, 18}. For example, the relative non-pecuniary ben-efit from 12 as opposed to 10 years of schooling is more than 4 times as large as getting somecollege education (14 years) rather than starting work after high school. But a college degree (16years) comes with a substantial benefit, while continuing further (18 years) comes with a benefitas large as graduating from high school. These values imply that the large number of high school

33

Page 34: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

graduates and college graduates in the data are hard to justify in our model with continuous het-erogeneity in learning abilities based only on pecuniary gains, so to some extent the tastes forschooling are capturing individuals conforming to the exogenous schooling conventions.

Recall that (1− ζh, 1− ζc) are a measure of correlation of tastes for schooling levels for highschool and below, and some college and above. Hence, tastes for staying in high school are muchmore correlated than in college. To some degree, this must explain the small jump to δ14 from δ12,since the two nests are uncorrelated and the possibility of choosing across nests partially compen-sates for smaller non-pecuniary benefit. But it also implies that some college is associated witha large enough pecuniary return so that the non-pecuniary benefits need not be large, and con-versely, the large jumps at δ12 and δ16 must also reflect that the pecuniary returns are not largeenough to justify stopping at 12 or 16 years of schooling. This is in contrast to recent trends inwhich some non-pecuniary costs and even larger costs are needed to rationalize some college ed-ucation and 4-year college graduation (Taber, 2001), but consistent with the low college graduatepremium prior to the 1980s.

The discrete schooling choices in our model are bounded between 8 and 18 years. Althoughyears of schooling in the HRS are top-coded at 17 years, if this were a problem top-coded indi-viduals would have had much higher earnings than lower years, and δ18 would not need to jumpfar beyond δ16. But the large value of δ18 implies that in fact the pecuniary gains from graduatedegrees or staying in college for more than 4 years are so low that a very large non-pecuniary gainis needed to justify such a choice. In contrast, the large value of δ10 = 1 may simply imply that thepecuniary returns to schooling are large at years below 8 in the data, but that we are missing thatfeature by imposing a minimum of 8 years of schooling.

The most interesting parameters for our purposes are (γhP , γz) that govern correlation of thetastes for schooling with the population heterogeneity in (hP, z). Note that γhP is much larger thanγz, implying that non-pecuniary or non-cognitive motives for staying in school are much moreinfluenced by parents than by a child’s learning ability. Fast-learning children tend to like schoolmore, but the major determinant is the mother, or more broadly the family background. This con-forms to the notion that highly educated mothers are more likely to provide a family environmentconducive for longer schooling, and also inculcate in their children a higher motivation to advancefurther in education.

Notice that such a desire for schooling necessarily decreases lifetime earnings, since the pe-cuniary gains would be higher in the absence of the non-pecuniary gain by definition of lifetimeincome maximization. But schooling cannot be a bad thing for future earnings; since more humancapital is accumulated with a better technology than when working, later wages will always behigher. Hence, the lost earnings differential that is made up for by the non-pecuniary gain fromstaying in school longer must come from later labor market entry. So the tastes for schooling canalso be thought of as putting a larger weight on future rather than the present discounted valueof earnings, and the large value of γhP may indicate that mother’s with more schooling push theirchildren to stay in school to be better off later in life, rather than earn a small amounts as a teenager.

34

Page 35: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 7: Counterfactual Effect of Spillover and Correlation Parameters.

CorrS OLSS β1 β2

Data 0.469 0.444 0.076 0.017Model 0.494 0.419 0.076 0.018

Structuralν = 0 0.656 0.487 0.072 0.012ρzhP = 0 0.461 0.387 0.078 0.004ρzhP = ν = 0 0.643 0.464 0.074 -0.003

TastesγhP µhP -0.454 -0.381 0.076 0.025γzµz 0.492 0.415 0.075 0.018γzµz, γhP µhP -0.461 -0.385 0.074 0.025

A barred variable x implies that we set the value of the x, which is heterogeneous across the population, to is meanvalue. CorrS and OLSS denote, respectively, the correlation between S and SP, and the OLS coefficient from regressionS on SP. (β1, β2) are the coefficients on (S, SP) from the Mincer regression in column (2) of Table 2.

6 Counterfactuals

We conduct three counterfactuals:

1. Shutting down key parameters to verify the sources of intergenerational transmissions,

2. The main counterfactual of decomposing a one-year increase in mother’s education, and

3. A counterfactual compulsory schooling reform.

Counterfactual effects on moments To interpret the effect of spillovers and ability selection inlight of our empirical analysis from Section 3, it is useful to see how the gradient moments areaffected when shutting down the key parameters (ν, ρzhP).

29 These results are shown in the secondpanel of Table 7. The third panel tabulates the counterfactual changes in the same moments whencontrolling for selection on tastes for schooling. The first and second rows labeled γhP hP and γz zdenote the cases where we keep all else equal and set

ξ(S) = δS(γhP µhP + γzz) + ξ(S), ξ(S) = δS(γhP hP + γzµz) + ξ(S)

instead of (20), and the third row is when there is no selection on either (hP, z).Both ν and ρzhP do little to affect β1, the Mincer schooling coefficient, and shutting down ν

has only a small effect on the Mincer parental coefficient β2 as well. As expected, β2 is mostlyaffected by ρzhP than anything else. This is in line with our intuition from Section 2 that β2 capturesselection rather than spillovers, which pervades also into our general model. But while abilityselection explains much of the linear parent’s schooling-earnings relationship, it does little to affect

29We have tried similar exercises with λ, which had small effects because the estimate it small and less imprecise.

35

Page 36: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

the intergenerational schooling relationship. In fact, when both ρzhP and ν are set to zero, schoolingpersistence becomes even higher, while it should be zero according to (11) in the simple model.

This indicates that observed intergenerational schooling persistence is mainly a result of un-observed heterogeneity in tastes for schooling. Specifically, it is this and the countervailing forcefrom ν that explains both the schooling correlation and OLS coefficient. When ν = 0, the parentallevel affect disappears, inducing high ability individuals (whose parents tend to have higher levelsof schooling) to increase their length of schooling, which in turn increases the correlation of school-ing across generations. When taste selection on mother’s schooling is shut down (γhP µhP ), chil-dren of high human capital parents (who tend to have higher levels of schooling) no longer havea desire to remain in school longer, and both CorrS and OLSS essentially become negative. Thisimplies that selection on tastes dominates and selection on abilities for schooling: even thoughchildren with high z would spend more time in school, they do not once tastes are shut down.Taste selection on learning abilities has little effect on all moments as can be seen in the row γzz;this is somewhat expected since γhP is much larger than γz; indeed when both are shutdown, thenumbers are more or less identical to when only γhP is shut down.

One interesting result that is not predicted by our theoretical or reduced form analysis is thatin all cases where we shut down taste selection, β1 is unaffected while β2 increases by a third.Remember that the role of tastes for schooling is essentially neutral for earnings outcomes (theeffect of own education on earnings is unchanged), but since without taste selection schooling isexplained by (hP, z) only, the parental effect on earnings (β2) becomes larger. This is because nowindividuals diverge less from the schooling levels that would maximize their lifetime incomes.

6.1 Decomposing Spillover from Selection

Reduced form prediction Having confirmed that our intuition from Sections 2-3 carries over,we can apply the GMM estimates to Corollaries 2-3 and compare the model-predicted values ofb2 to the regression coefficients on SP in columns (2) and (9), Table 2. This gives us the model anddata predicted ability selection and spillover effects from having a 1-year more educated mother,according to the simple model. Since log hP = βSP, the b2’s in both corollaries must be multipliedby β = 0.06 to be comparable with the regression coefficients, and since we have different α’s inthe extended model, we can get a range by computing

selection : βb2 =βρzhP /(1− α) =

0.035 if α = α1 + α2

0.024 if α = αW

(23)

spillover : βb2 =βν/ [1− λ(1− α)] ≈ 0.048 for both cases. (24)

Both the implied ability selection and spillover coefficients, in particular the latter, are larger thanits reduced form estimate of 0.017. All else equal, tastes for schooling induce individuals to stay inschool at the the detriment of lifetime earnings. To make up for this and explain a 1.7% observed,reduced form return, the value of ρzhP must be larger than what would be implied by the simple

36

Page 37: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

model. Conversely, if we ignore taste heterogeneity as in (23), the implied effect of ability selectionon earnings will be counterfactually high. On the other hand, as we saw above, selection on tastesis what generates the large intergenerational schooling correlation, which needed to be moderatedby a negative effect on schooling coming from a large ν. So again, if we were to ignore the tasteheterogeneity as in (25), the implied spillover effect on earnings will be counterfactually high. Theestimated ν has to be larger to moderate the direct effect of tastes on schooling, while the effect oftastes on earnings is only indirect, requiring only a modestly larger estimate for ρzhP .

The relative effects of (ρzhP , ν) can also be understood intuitively from (11). We cannot make adirect comparison without explicitly computing the function F(·), but given that tastes for school-ing generate a very high persistence in schooling as evidenced in Table 7, the estimate for ρzhP

must remain small; otherwise the correlation and OLS coefficient would become even larger. Onthe other hand, a large value of ν is needed to moderate the large, positive impact that tastes haveon schooling.

Counterfactual prediction The simple model predictions, while helpful for identification, doesnot tell us the causal effect of increasing mother’s education on children’s earnings (and school-ing). But we can simulate the effect through counterfactuals using the estimated model. Given theindividual state (hP, z, ξ) at age 6, we compute the change in the child’s schooling and earningsoutcomes in response to a 1-year increase in SP by 1 year, which translates into an increase of β

units of hP in logarithms. We do this in several ways to control for spillovers and selection effects.First, we hold constant the individual’s (z, ξ), which isolates the pure effect coming only from

the spillover. Next, we let either z or ξ vary with hP as dictated by the distributional assumptionsin Section 4.2; this separately captures the selection effects from each. Finally, we let both z andξ vary together, which we label a “reduced-form" effect—i.e., this is just comparing the averageoutcomes of children with SP years of mother’s schooling, to the average outcome of those withSP + 1 years of mother’s schooling.

Formally, for any initial condition x = (SP, log z, ξ), the model implied schooling and age-aearnings outcomes can be written as functions of x, S = S(x), E(a) = E(x; a).30 Then schoolingand age a earnings following a j-year increase in SP, holding (z, ξ) constant, are

Sjν(x) ≡ S(SP + j, z, ξ), Ej

ν(x; a) ≡ E(SP + j, z, ξ; a). (25)

Selection on abilities and tastes associated with a j-year increase in SP can be written as

∆jz ≡ exp [(ρzhP σz/σhP) · βj]

∆jξ(SP) ≡

{∆j

ξ(S; SP)}

S≡ {δSγhP exp(βSP) [exp(βj)− 1]}S ,

respectively, where ∆jξ(SP) is a 6-dimensional vector for each level of schooling S ∈ {8, . . . , 18}.

The first expression follows since (SP, log z) are joint-normal, and the second from the definition

30Since although the parent variable in the initial condition is hP, it is defined as log hP = βSP in (18).

37

Page 38: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 8: Aggregate Effect of 1 Year Increase in Mom’s Schooling.

fixed S spillover ability tastes RF

Schooling diff (years) - -0.425 -0.384 0.451 0.479Avg earnings diff (%) 0.007 0.012 0.025 0.002 0.016

The second row denotes the change in the cross section average of the present discounted value of lifetime earnings,in logarithims. The first column holds abilities and tastes constant, while the next two columns let ability or taste alsovary according to their estimated correlations with hP. The column RF is when we allow for both selection on abilitiesand tastes. Schooling OLS in data and estimated model is 0.458 and 0.478, respectively.

of tastes in (20). Then schooling and age a earnings following a j-year increase in SP, includingpartial selection effects on z or ξ, are

Sjz(x) ≡ S(SP + j, z · ∆j

z, ξ), Ejz(x; a) ≡ E(SP + j, z · ∆j

z, ξ; a), (26)

Sjξ(x) ≡ S(SP + j, z, ξ + ∆j

ξ(SP)), Ejξ(x; a) ≡ E(SP + j, z, ξ + ∆j

ξ(SP); a). (27)

Outcomes incorporating all spillover and selection effects following a j-year increase are

Sjr f (x) ≡S(SP + j, z · ∆j

z, ξ + ∆jξ(SP) + ∆j

zξ(z)) (28a)

Ejr f (x; a) ≡E(SP + j, z · ∆j

z, ξ + ∆jξ(SP) + ∆j

zξ(z); a), (28b)

where ∆jzξ(z) ≡

{∆j

zξ(S; z)}

S={

δSγzz[∆j

z − 1]}

Sis a compounded selection effect on tastes that

comes from (z, ξ) being correlated, even conditional on hP. We coin this the “reduced form" effectsince by construction,∫

Sjr f (x)dΦ(SP = SP, z, ξ) =

∫S(x)dΦ(SP = SP + j, z, ξ),

where Φ is the joint distribution over x, and x are dummies for integration.The first row of Table 8 lists the average spillover and selection effects following a 1 year

increase in SP, obtained by integrating the change from S(x) to (25)-(28) over the population dis-tribution Φ. The second row lists the average spillover and selection effects, in log-points (toapproximate percentage changes) on the present-discounted value of lifetime earnings:

log

[R

∑a=14

(1

1 + r

)a−14 ∫E1

k(x; a)dΦ(x)

]− log

[R

∑a=14

(1

1 + r

)a−14 ∫E(x; a)dΦ(x)

]

for k ∈ {ν, z, ξ, r f }.As expected, the spillover has a negative effect on schooling. What may be slightly surprising

is that allowing for selection on abilities only moderates this by 0.039 years (comparing the 1stand 2nd columns) or 0.028 years (3rd vs. 4th columns). As we saw in Table 7 earlier, selection onabilities did not have much of an effect on intergenerational schooling relationships once selectionon tastes were taken into account. Since tastes already induce individuals to stay in school longer

38

Page 39: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 2: Lifecycle Effect of 1 Year Increase in Mom’s Schooling.

15 20 25 30 35 40 45 50 55 60 65

-0.12

-0.10

-0.08

-0.06

-0.04

-0.02

0.00

0.02

0.04

0.06

0.08

0.10

0.12

Fixed S No Selection AbilitiesTastes Reduced Form

than what maximizes lifetime earnings, the fact that abilities become higher does not further in-crease schooling much. In addition, since h0 = zλhν

P, some of the desire to increase schooling(since children can learn more in a fixed amount of time) is countervailed by a higher h0 (sincethere is less need to learn when human capital is already high), although this effect is likely smallgiven the small value of λ = 0.06.

As was implied from the previous counterfactual of shutting down the taste correlations withother variables, only when we allow for selection on tastes do we see a positive relationship be-tween mother’s schooling on child’s schooling. This relationship is large (0.876 years), but is mod-erated by the negative spillover effect. Even ignoring selection on abilities, the parental spilloverand selection on tastes generate a level of intergenerational schooling that is close to its empiricallyobserved OLS relationship.

In contrast, the spillover has an approximately 1.2 percent positive effect on lifetime earnings,while selection on abilities have a 1.3 percent effect. We conclude that independently of selectionon tastes, the causal effect of mother’s education on earnings is more or less similar to ability se-lection, i.e., highly ability mothers having high ability children. Selection on tastes have a negativeaverage impact: the increase in lifetime earnings drops by 1 percentage point once we allow forselection (1st vs. 3rd columns). The fact that the effect is negative is expected, since tastes forschooling make individuals deviate from lifetime earnings maximization. What is surprising isthat this effect is so large that it almost dominates the spillover.

However, it is important to remember that this is only an average effect over the entire life-cycle; the earnings effects differ substantially by age, and also across children of mother’s with

39

Page 40: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

different levels of schooling.31 Such life-cycle effects are depicted in Figure 2. Each line plots

log[∫

E1k(x; a)dΦ(x)

]− log

[∫E(x; a)dΦ(x)

]for k ∈ {ν, z, ξ, r f }. Longer schooling induced by high tastes increase lifetime earnings later inlife (through more human capital accumulated in school), but this is dominated by the foregoneearnings earlier in life. Conversely, the spillover effect comes almost entirely from early labormarket entry, while there is no change in earnings after age 24 (the latest age we allow labormarket entry in the model). This means that on average, children of mothers with less schoolingcatch up with those of mothers with more schooling by staying in school longer. In Figure 7 in theappendix we show the median life-cycle effects, which basically tell the same story.

In Table 4, we saw that although schooling and average earnings increase with mothers’ school-ing level over all, children of mothers with very high schooling (≥ 13 years) earn less than chil-dren of mothers with 12 years of schooling. In fact, their earnings levels are similar to those whosemothers’ schooling is between 9 and 11 years. The implications of this is born out in Figure 3,which depicts the cross-sectional impact of spillovers, abilities and tastes, in response to a oneyear increase in mom’s schooling. The top panel is for schooling, where each bar plots

log[∫

S1k(x)dΦ(SP ∈ M, z, ξ)

]− log

[∫S(x)dΦ(SP ∈ M, z, ξ)

]and the bottom panel for lifetime earnings, which plots

log

[R

∑a=14

(1

1 + r

)a−14 ∫E1

k(x; a)dΦ(SP ∈ Ms, z, ξ)

]

− log

[R

∑a=14

(1

1 + r

)a−14 ∫E(x; a)dΦ(SP ∈ Ms, z, ξ)

],

for k ∈ {ν, z, ξ, r f } and the 6 mothers’ schooling group Ms, in the first column of Table 4. Thestructural impact of spillovers is always negative on schooling and positive on earnings; abilityselection has a positive impact on both; and taste selection always has a positive impact on school-ing at the expense of a negative impact on earnings.

Note that ability selection has a similar impact on schooling and earnings for all levels of SP

(the gaps between the first 2 bars, and the gaps between the next 2 bars). But both spillovers andtaste selection has an increasing impact up to 12 years of SP. In particular, the parental spillover onearnings is increasing by inducing less schooling, despite the human capital technology displayingdecreasing returns. This means that preferences for longer schooling are increasing faster than thespillover in the population. Since high SP children deviate farther from their lifetime earnings

31This may also have to do with the fact that the HRS cohort faced low education returns compared to recent cohorts.In the data, the most highly educated individuals do not earn more than their slightly less-educated counterparts. Asdiscussed in Section 5.1, large, positive tastes for schooling are needed to justify why these individuals obtained highereducation despite the low returns.

40

Page 41: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 3: Cross-section Effects of 1 Year Increase in Mom’s Schooling

-5 6-7 8 9-11 12 13--0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

No Selection AbilitiesTastes Reduced Form

Mom's Schooling

Ow

n S

cho

olin

g

(a) Children’s Schooling

-5 6-7 8 9-11 12 13--0.005

0.000

0.005

0.010

0.015

0.020

0.025

0.030

Fixed Schooling No Selection AbilitiesTastes Reduced Form

Mom's Schooling

Life

time

Ea

rnin

gs

Diff

(lo

g p

oin

ts)

(b) (log) Lifetime Earnings

41

Page 42: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

maximizing choice, an increase in SP controlling for tastes has a more negative causal impact onschooling and larger, positive impact on lifetime earnings.

But in the last cell, this trend is reversed. Children in this category are those with the highesttastes for schooling, and are closer to the upper bound of 18 years of schooling. Their observedchoices deviate less from what maximizes lifetime earnings, and also decreasing returns to humancapital accumulation sets in more strongly. As a result, both the selection and spillover effectsbecome smaller. Conversely, for the same reasons, although children of less educated mothersexperience the smallest impact from a 1-year increase in their mothers’ schooling, the reducedform impact on their earnings is the largest since they are the least affected by preferences forlonger schooling.

Counterfactual Compulsory Schooling Reform We saw above that the fact that even when thecausal effect of mother’s education on children’s earnings is positive, the effect on children’sschooling can be negative. The observed, positive intergenerational schooling correlation is in-stead explained by unobserved heterogeneity in tastes for schooling. Having understood themarginal effects of increasing a parental schooling by 1 year, we now conduct a counterfactualexperiment by imposing a minimum schooling requirement, which is intended to mimic compul-sory schooling reforms that took place in many countries throughout the 20th century. A mini-mum schooling requirement has heterogeneous affects across the population since it only affectsthose parents who would otherwise not attain the required level of schooling, and even withinthis group, the additional number of years that is attained will vary. The goal of the exercise is toshow that a large schooling OLS coefficient is consistent with a small or negative IV coefficient,and the evidence for spillovers would be found in children’s earnings, not schooling.

We simply impose a minimum 8 years of schooling for all parents. The initial level of humancapital in the economy is set to

h0 = bzλhνP = bzλ exp [νβ max {SP, 8}] .

We choose 8 years as the hypothetical schooling requirement because 8 years already was thecompulsory schooling requirements in many U.S. states at the time, or soon after.32 Such a reformwould affect 25% of the individuals in our data, increasing the schooling of their mothers by anaverage of 3.5 years.

For the schooling OLS and IV regressions, we combine samples of two regimes: one withoutthe minimum requirement (our benchmark model) and one imposing the requirement. Theserepresent mother-child pairs pre- and post-reform, respectively. Then we run both an OLS and IVregression on the merged data, using a dummy variable for the different regimes as an instrument.

32In contrast, Black et al. (2005) study the case of Norway, whose compulsory schooling requirement went up from7 to 9 years in the 1960s. Of course, while the location and timing differs (many of the parents in our data would havebeen in school at turn of the 20th century), because the U.S. was a forerunner of public schooling (Goldin and Katz,2007), mothers’ average years of schooling is only 1 year less (10.5 years vs 9.3 years). However, the percentage of thepopulation that would be affected is almost two-fold (12.4% vs 25%).

42

Page 43: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 9: Counterfactual Schooling Reform

fixed S spillover ability tastes RF

SchoolingOLS - 0.492 0.487 0.458 0.455IV - -0.204 -0.148 0.227 0.261

Avg Earnings diff (%) 0.025 0.028 0.075 0.017 0.066The second row denotes the change in the cross section average of the present discounted value of lifetime earnings, inlogarithms, for only those individuals affected by the reform. The first column holds abilities and tastes constant, whilethe next two columns let ability or taste also vary according to their estimated correlations with hP. The column RF iswhen we allow for both selection on abilities and tastes. Schooling OLS in data and estimated model is 0.444 and 0.412,respectively.

As above, we repeat this exercise for four cases controlling for selection on z and ξ, and includingthe selection effects from z and/or ξ. The top panel in 9 shows the regression results for all cases.The bottom panel shows the change in the present discounted value of lifetime earnings, in log-points (to approximate percentages).

The OLS coefficients are somewhat difficult to interpret, since the constants in the regressionsare also changing. But with low-educated parents no longer in the sample post-reform, the OLSincreases in all 4 cases to a value slightly higher than in the benchmark (0.412). The IV regressions,which measures the average 1-year effect among children whose mother’s became more educateddue to the reform, basically reflects the results from Table 8 qualitatively. The controlled effect isnegative, but of smaller magnitude. The IV coefficient increases when including ability selection(column 1 vs. column 2) but much more when including taste selection (column 1 vs. column3). The reduced form effect is only half of the population reduced form effect following a 1-yearincrease, in Table 8. As discussed there, this is because both the spillover and selection on tasteshas a smaller effect for lower levels of SP, who are the only ones affected by the reform.

We conjecture that this partially explains the puzzling fact that many studies using special datasets on twins, adoptees, or compulsory schooling reforms find a zero or negative effect of parents’schooling on children’s schooling. First, the IV coefficients are small in absolute magnitude incolumns 1 and 3, because children of less educated mothers enjoy less spillovers from their parentsbut also lose less in terms of lifetime earnings due to tastes. Second, the structural effect may infact be negative, as we have argued throughout this paper. The fact that in some studies theeffect is found to be close to zero but not negative, but nonetheless small, can be due to the factthat rather than the schooling tastes for children not being affected at all, as in the hypotheticalschooling reform of column 3, forcing mothers to obtain more schooling does induce their childrento develop higher tastes for schooling to a certain degree. We can interpret this as mothers who areforced to attend school longer, but would not have attained higher levels of schooling otherwise,having some impact on children’s non-cognitive abilities or perception of schooling that help themstay in school, although perhaps not to the extent that mothers who choose to become highlyeducated transmit high tastes for schooling. In our experiment, that would mean that educationfor mothers who would otherwise attain very low levels of schooling could increase the schooling

43

Page 44: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 4: Lifecycle Impact of Reform

15 20 25 30 35 40 45 50 55 60 65

-0.06

-0.04

-0.02

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

Fixed S No Selection AbilitiesTastes Reduced Form

of their children anywhere from -0.2 to 0.2 years.33

The negative structural effect that comes from a quantity-quality trade-off in the effect ofmother’s schooling, and the positive effect that comes from higher tastes for schooling, is a qualita-tively different explanation to explain intergenerational schooling relationships. Previous studiesimplicitly assume that a causal effect should be positive, and have been sought for pecuniary orcognitive answers such as assortative mating. Some studies have posited that increased femalelabor market participation of more highly educated women may have reduced their time spentwith children, thereby reducing their early education levels, but recent data tells us that moreeducated women in fact spend more, not less, time with children in their first formative years oflife. Our findings indicate that in order to understand intergenerational schooling relationships,one should not assume that the causal effect should necessarily be positive, and that we need toinvestigate deeper into the non-pecuniary or information effects that a parent’s education has ontheir children. For example, if schooling has non-cognitive benefits that are not captured by earn-ings, the total causal spillover would be larger than just the pecuniary benefit gained by boostingearly human capital formation. But if our tastes for schooling parameters are capturing childrenbeing over-optimistic about the pecuniary gains from longer schooling, and such beliefs are cor-related with their parents’ education levels, more caution needs to be taken when thinking aboutintergenerational education spillovers.

That such an investigation would be important becomes more apparent when observing thelife-cycle earnings effects. The isolated spillover effect on lifetime earnings is 2.8%, on average,

33This explanation would reconcile ?’s finding of a zero IV with ?’s finding that compulsory schooling reforms inthe U.S. decreased the number of the reform cohort’s children repeating grades.

44

Page 45: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

although this is in response not to a 1-year increase in mom’s schooling but an average of 3.5years. It is also important to note that the spillover effect is positive through life, as can be seenin Figure 4, unlike in Figure 2 where it became virtually zero at ages 24 and above. The negativeeffect that comes from taste selection is still there, meaning that, if mothers with low education,following a reform, raise their children’s tastes for schooling, we may expect children’s schoolingand lifetime earnings to drop. But the negative effect is smaller in magnitude (although the lifetimeearnings effect is -1.1 percentage points for both Tables 8 and 9, mother’s schooling increases by3.5 years in the latter compared to 1 year in the former). Part of this is because the earnings theycould have made through early labor market entry is smaller for these children, which is evenmore obvious when we look at median earnings in Figure 8 in the appendix. There, at nowhereduring the life-cycle does the reform have a negative effect, not even at early ages when includingselection on tastes.

This has policy implications for public education. On the one hand, if we are only concernedwith the child’s lifetime earnings, or if tastes for schooling represent misinformation, public edu-cation enforcement should be careful not to “over-educate" individuals by over-emphasizing thecausal effect of schooling on earnings. On the other hand, if parents are less concerned aboutsending a teenager to work and more about a child having higher earnings past his mid-20s, pass-ing on non-cognitive skills or beliefs that schooling is important is worthwhile even for children’searnings, as can be seen in Figure 4. It may be even more worthwhile given that the earnings theyforgo as a teenager would be small.

7 Conclusion

In this paper we presented a Ben-Porath human capital model of endogenous schooling and earn-ings to isolate the causal effect of parent’s education on children’s education and earnings out-comes. Despite the positive relationship between the child’s own schooling and earnings, thecausal effect of parent’s schooling on children’s schooling can be negative, even when the causaleffect is positive for children’s earnings. Children of higher human capital parents begin life withhigher human capital themselves, and when schooling is endogenous, they can spend less timein school but still attain the same or higher level of earnings. A simple version of the model wassolved in closed form and its implications compared to empirical evidence in the HRS data.

We argue that without controlling for its quality, schooling outcomes may not be a good indi-cator of intergenerational effects since years of schooling is only a quantity measure. Conversely,although earnings is closer to a quality measure, years of experience must be taken into accountwhen studying parental effects. Indeed in our estimation, the causal spillover effect of parents onchildren’s earnings came mostly from expediting labor market entry. Another important findingfrom the estimated model is that unobserved correlation between mothers’ education and chil-dren’s tastes for schooling is the main determinant of children’s schooling, not the causal effectand not selection on abilities.

The estimated causal effect of mother’s education on children’s lifetime earnings is found to be

45

Page 46: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

1.2%, compared to 1.3% that can be explained by ability selection across generations. Although notdirectly comparable, our result that the causal effect on earnings is similar to the selection effectis in line with the “nature-nurture" literature, who find that nurture effects are at most similar orless than nature effects.

Since this stems from our consideration of a quantity-quality trade-off in education and un-observed heterogeneity in tastes for schooling, more attention may need to be taken to constructqualitative measures for education attainment and how parents affect the non-cognitive abilitiesof children and their beliefs about individual-specific education returns. There already is a blos-soming literature on estimating major-specific returns to education and how much students knowabout such returns. A separate literature studies the non-cognitive benefits to early childhoodinterventions, how mothers and fathers jointly influence children’s perceptions and changes ofeducational attainment, as well as the effect of borrowing constraints faced by parents with suchyounger children. But as of yet there has been few attempts to link these these findings to in-tergenerational transmissions. A deeper consideration for such factors will surely imply largerreturns for parental spillovers.

46

Page 47: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Appendices

A Proofs to Propositions 1, 2, and Corollary 1.

The proof requires a complete characterization of the income maximization problem. While wecan use standard methods to obtain the solution, we do this elsewhere and in what follows simplyguess and verify the value function. For notational convenience, we drop the age argument aunless necessary. We separately characterize the solutions before and after the constraint n ≤ 1 isbinding in Lemmas 1 and 2. Then schooling time S is characterized as the solution to an optimalstopping time problem in Lemma 3. To this end, we further assume that

V(a, h) = q2(a)h + CW(a), for a ∈ [6 + S, R),

V(a, h) = q1(a) · h1−α1

1− α1+ e−r(6+S−a)C(S, hS), for a ∈ [6, 6 + S), if S > 0,

where

C(S, hS) = q2(6 + S)hS + CW(6 + S)− q1(6 + S) ·h1−α1

S1− α1

,

for which the length of schooling S and level of human capital at age 6 + S, hS, are given, and CW

is some redundant function of age. Given the forms of g(·) and f (·), these are the appropriateguesses for the solution, and the transversality condition becomes q(R) = 0. Given the structureof the problem, we first characterize the working phase.

LEMMA 1: WORKING PHASE Assume that the solution to the income maximization problem is suchthat n(a) = 1 for a ≤ 6 + S for some S ∈ [0, R − 6). Then given h(6 + S) ≡ hS and q(R) = 0, thesolution satisfies, for a ∈ [6 + S, R),

q2(a) =wr· q(a) (29)

m(a) = α2 [κq(a)z]1

1−α (30)

h(a) = hS +rw·[∫ a

6+Sq(x)

α1−α dx

]· (κz)

11−α (31)

and

wh(a)n(a)α1

=m(a)

α2, (32)

where

q(a) ≡[1− e−r(R−a)

], κ ≡

αα11 αα2

2 w1−α1

r.

47

Page 48: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Proof. Given that equation (5) holds at equality, dividing by (6) leads to equation (32), so once weknow the optimal path of h(a) and m(a), n(a) can be expressed explicitly. Plugging (5) and theguess for the value function into equation (7), we obtain the linear, non-homogeneous first orderdifferential equation

q2(a) = rq2(a)− w,

to which (29) is the solution. Using this result in (5)-(6) yields the solution for m, (30). Substituting(29), (30) and (32) into equation (8) trivially leads to (31).

If S = 0 (which must be determined), the previous lemma gives the unique solution to the incomemaximization problem. If S > 0, what follows solves the rest of the problem, beginning with thenext lemma describing the solution during the schooling period.

LEMMA 2: SCHOOLING PHASE Assume that the solution to the income maximization problem is suchthat n(a) = 1 for a ∈ [6, 6 + S) for some S ∈ (0, R− 6). Then given h(6) = h0 and q1(6) = q0, thesolution satisfies, for a ∈ [6, 6 + S),

q1(a) = er(a−6)q0 (33)

m(a)1−α2 = α2er(a−6) · q0z (34)

h(a)1−α1 = h1−α10 +

(1− α1)(1− α2)

rα2·[

eα2r(a−6)

1−α2 − 1]· (α2q0)

α21−α2 z

11−α2 . (35)

Proof. Since n(a) = 1 during the schooling phase, using the guess for the value function in (7) wehave

q1(a) = rq1(a),

to which solution is (33). Then equation (34) follows directly from (6), and using this in (8) yieldsthe first order ordinary differential equation

h(a) = h(a)α1 [α2q1(a)]α2

1−α2 z1

1−α2 ,

to which (35) is the solution.

The only two remaining unknowns in the problem are the age-dependent component of the valuefunction at age 6, q0, and human capital level at age 6+ S, hS. This naturally pins down the lengthof the schooling phase, S. The solution is solved for as a standard stopping time problem.

LEMMA 3: VALUE MATCHING AND SMOOTH PASTING Assume S > 0 is optimal. Then (q0, hS),

48

Page 49: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

are given by

q0 =e−rS

αα22·([κq(6 + S)]1−α2 zα1

) 11−α

(36)

hS =α1

w· [κq(6 + S)z]

11−α . (37)

Proof. The value matching for this problem boils down to setting n(6 + S) = 1 in the workingphase, which yields (37).34 The smooth pasting condition for this problem is

lima↑6+S

∂V(a, h)∂h

= lima↓6+S

∂V(a, h)∂h

.

Using the guesses for the value functions, we have

q1(6 + S)h−α1S = q2(6 + S) ⇔ hα1

S =rw· erS

q(6 + S)· q0,

and by replacing hS with (37) we obtain (36).

This proves proves Proposition 2, and the solutions for n(a)h(a) and m(a) during the workingphase in Lemma 1 proves Corollary 1. We must still show Proposition 1.

Proof of Proposition 1. The length of the schooling period can be determined by plugging equations(36)-(37) into (35) evaluated at age 6 + S:

(α1

w· [κq(6 + S)z]

11−α

)1−α1

≤ h1−α10 +

(1− α1)(1− α2)

rα1−α22

·(

1− e−α2rS1−α2

)·([κq(6 + S)]α2 z1−α1

) 11−α

,

with equality if S > 0. All this equation implies is that human capital accumulation must bepositive in schooling, which is guaranteed by the law of motion for human capital. Rearrangingterms,

h0 ≥α1

1− (1− α1)(1− α2)

α1α2· 1− e−

α2rS1−α2

q(6 + S)

11−α1

· [κq(6 + S)z]1

1−α ,

34This means that there are no jumps in the controls. When the controls may jump at age 6 + S, we need the entirevalue matching condition.

49

Page 50: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

or now replacing h0 ≡ zλhνP,

z1−λ(1−α)h−ν(1−α)P ≤ F(S), (38)

F(S)−1 ≡ κ(α1

w

)1−α·

1− (1− α1)(1− α2)

α1α2· 1− e−

α2rS1−α2

q(6 + S)

1−α1−α1

· q(6 + S)

which is the equation in the proposition. Define S as the solution to

α1α2q(6 + S) = (1− α1)(1− α2)

(1− e−

α2rS1−α2

),

i.e. the zero of the term in the square brackets. Clearly, S < R− 6, F′(S) > 0 on S ∈ [0, S), andlimS→S F(S) = ∞. An interior solution (S > 0) requires that

F(0) < z1−λ(1−α)h−ν(1−α)P ⇔ z1−λ(1−α)h−ν(1−α)

P >r

α1−α21 (α2w)α2 · q(6)

,

and S is determined by (38) at equality. The full solution is given by Lemmas 1-3 and we obtainProposition 2 and Corollary 1. Otherwise S = 0 and the solution is given by Lemma 1.

B Analytical Characterization of the Generalized Model

It is instructive to first characterize the solution to the model when the schooling choice, S, isstill continuous. In this case, the solution to the schooling phase is identical to Lemma 2. In theworking phase, there can potentially be a region where n(a) = 1 for a ∈ 6+ [S, S+ J), and n(a) < 1for a ∈ [6 + S + J, R), so we can characterize the “full-time OJT” duration, J, following AppendixA. Although we normalize w = 1 in the estimation, we keep it here for analytical completeness.

LEMMA 4: WORKING PHASE, GENERALIZED Assume that the solution to the income maximizationproblem is such that n(a) = 1 for a ∈ [6 + S, 6 + S + J) for some J ∈ [0, R − 6 − S). Then givenhS ≡ h(6 + S), the value function for a ∈ [6 + S + J, R) can be written as

V(a, h) =wr· q(a)h + DW(a) (39)

and the solution is characterized by

n(a)h(a) =[αW

r· q(a)z

] 11−αW (40)

h(a) = hJ +(αW

r

) αW1−αW ·

[∫ a

6+S+Jq(x)

αW1−αW dx

]· z

11−αW , (41)

where hJ ≡ h(6+ S + J) is the level of human capital upon ending full-time OJT. If J = 0, there is nothingfurther to consider. If J > 0, the value function in the full-time OJT phase, i.e. a ∈ [6 + S, 6 + S + J) can

50

Page 51: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

be written as

V(a, h) = er(a−6−S)qS ·h1−αW

1− αW+ e−r(6+S+J−a)D(J, hJ) (42)

where

D(J, hJ) =wr· q(6 + S + J)hJ + DW(6 + S + J)− erJqS ·

h1−αWJ

1− αW

while human capital evolves as

h(a)1−αW = h1−αWS + (1− αW)(a− 6− S)z. (43)

If J > 0, the age-dependent component of value function at age 6 + S, qS, and age 6 + S + J level ofhuman capital, hJ , are determined by

qS = we−rJ ·[

ααWWr· q(6 + S + J)zαW

] 11−αW

(44)

hJ =[αW

r· q(6 + S + J)z

] 11−αW . (45)

The previous Lemma follows from applying the proof in Appendix A. The solution for J is alsoobtained in a similar way we obtained S. Since human capital accumulation must be positiveduring the full-time OJT phase,

αW

r· q(6 + S + J)z ≤ h1−αW

S + (1− αW)Jz,

with equality if J > 0. Rearranging terms,

z

h1−αWS

≤ G(J) ≡[αW

r· q(6 + S + J)− (1− αW)J

]−1. (46)

Define J as the zero to the term in the square brackets, then clearly J < R− S− 6, G′(J) > 0 onJ ∈ [0, J), and limJ→ J G(J) = ∞. Hence an interior solution J > 0 requires that

G(0) <z

h1−αWS

⇔ rαWq(6 + S)

<z

h1−αWS

, (47)

and J is determined by (46) at equality. Otherwise J = 0.Now if S were discrete, as in the model we estimate, we only need to solve for hS, the level of

human capital at age 6 + S. Then we can solve for V(h0, z; s) for all 6 possible values of s, usingLemmas 2 and 4 for the schooling and working phases, respectively. But it is also possible tocharacterize the unconstrained continuous choice of S, even though a closed form solution doesnot exist in general. We only need consider new value matching and smooth pasting conditions.

51

Page 52: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

LEMMA 5: SCHOOLING PHASE, GENERALIZED The length of schooling, S, and level of human capitalat age 6 + S, hS, are determined by

1. if J = 0,

ε + (1− α2)

[αα2

2 wr· q(6 + S)zhα1

S

] 11−α2

= w ·

hS + (1− αW)

[ααW

Wr· q(6 + S)z

] 11−αW

(48)

h1−α1S ≤ h1−α1

0 +(1− α1)(1− α2)

rα2·(

1− e−α2rS1−α2

)·[α2w

r· q(6 + S)hα1

S

] α21−α2 z

11−α2 (49)

with equality if S > 0. In an interior solution S ∈ (0, R− 6), the age-dependent component of thevalue function at age 6, q0 is determined by

q0 =we−rS

r· q(6 + S)hα1

S . (50)

2. if J > 0,

ε + (1− α2)

αα22 we−rJ

[ααW

Wr· q(6 + S + J)z

] 11−αW· hα1−αW

S

11−α2

= we−rJ[

ααWWr· q(6 + S + J)z

] 11−αW

(51)

h1−α1S ≤ h1−α1

0 +(1− α1)(1− α2)

rα2·(

1− e−α2rS1−α2

)(52)

·

α2we−rJ[

ααWWr· q(6 + S + J)zαW

] 11−αW

hα1−αWS

α2

1−α2

· z1

1−α2

with equality if S > 0. In an interior solution S ∈ (0, R− 6), the age-dependent component of thevalue function at age 6, q0 is determined by

q0 = we−r(S+J) ·[

ααWWr· q(6 + S + J)zαW

] 11−αW· hα1−αW

S . (53)

Proof. Suppose S ∈ (0, R − 6). The value matching and smooth pasting conditions when J = 0are, respectively,

ε−m(6 + S) + erSq0zm(6 + S)α2 = whS [1− n(6 + S)] +wr· q(6 + S)z [n(6 + S)hS]

αW

erSq0h−α1S =

wr· q(6 + S).

Hence (50) follows from the smooth pasting condition. Likewise, (48) follow from plugging n(6 +

52

Page 53: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

S), m(6 + S) from Lemmas 2 and 4 and q0 from (50) in the value matching condition. Lastly, (49)merely states that the optimal hS must be consistent with optimal accumulation in the schoolingphase, h(6 + S).

The LHS of the value matching and smooth pasting conditions when J > 0 are identical towhen J = 0, and only the RHS changes:

ε−m(6 + S) + erSq0zm(6 + S)α2 = qSz

erSq0h−α1S = qSh−αW

S .

Hence (53) follows from plugging qS and hS from (44)-(45) in the smooth pasting condition. Like-wise, (51) follow from plugging n(6 + S) = 1, m(6 + S) from Lemma 2, and q0 from (53) in thevalue matching condition. Again, (52) requires consistency between hS and h(6 + S).

For each case where we assume J = 0 or J > 0, it must also be the case that condition (47) doesnot or does hold.

C Numerical Algorithm

For the purposes of our estimated model in which S is fixed, the solution method in Appendix Bis straightforward. We need not worry about value-matching conditions and only need to solvethe smooth-pasting conditions given S, which are equations (49) and (52), to obtain hS. Note thatthere is always a solution to (49) or (52)—i.e., we can always define a function hS(S) as a functionof S. This is seen by you rearranging the equations as (bold-face for emphasis)

1 =

(h0

hS

)1−α1

+(1− α1)(1− α2)

rα2·(

1− e−α2rS1−α2

)·[α2w

r· q(6 + S)

] α21−α2 z

11−α2 · hS

− 1−α1−α2 (54)

1 =

(h0

hS

)1−α1

+(1− α1)(1− α2)

rα2·(

1− e−α2rS1−α2

)(55)

·

α2we−rJ[

ααWWr· q(6 + S + J)zαW

] 11−αW

α2

1−α2

· z1

1−α2 · hS− 1−α+α2αW

1−α2 ,

respectively. Hence, for any given value of S, both RHS’s begin at or above 1 at hS = h0, goes to 0as hS → ∞, and is strictly decreasing in hS. The solution hS(S) to both (54) and (55) are such that

1. hS = h0 when S = 0 or S + J = R− 6

2. hS(S) is hump-shaped in S (i.e., there ∃S s.t. hS reaches a maximum).

The rest of the model can be solved by Lemmas 2 and 4, and we can use Lemma 4 to determine J.Depending on whether condition (47) holds, we may have two solutions:

1. If only one solution satisfies (47), it is the solution.

53

Page 54: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

2. If both satisfy (47), compare the two value functions at age 6 given S and candidate solutionsJ1 = 0 and J2 > 0 from Lemma 4 using the fact that the function DW in (39) can be written

DW(6 + S + J)

=w(αW

r

) αW1−αW

{∫ R

6+S+Je−r(a−6−S−J)

[∫ a

6+S+Jq(x)

αW1−αW dx− αW

r· q(a)

11−αW

]da}· z

11−αW

and

V(S; 6, h0) =∫ 6+S

6e−r(a−6) [ε−m(a)] da + e−rSV(6 + S, hS)

=1− e−rS

r· ε− 1− α2

rα2· (α2zq0)

11−α2

(e

rα2S1−α2 − 1

)+ e−rSV(6 + S, hS).

The candidate solution that yields the larger value is the solution.

Computing Model Moments Given our distributional assumptions on mother’s schooling, learn-ing abilities and tastes for schooling, we can compute the exact model implied moments as follows.We set grids over hP ,z, and S, with NhP = 17, Nz = 100 and NS = 6 nodes each.

1. Construct a grid over all observed levels of SP in the data. This varies from 0 to 16 withmean 9.26 and standard deviation 3.52. Save the p.m.f. of SP to use as sampling weights.

2. Assuming β = 0.06, construct the hP-grid which is just a transformation of the SP-grid ac-cording to (18).

3. For each node on the hP-grid, construct z-grids according to (19), according to Kennan (2006).This results in a total of NhP × Nz nodes and probability weights, where for each hP node wehave a discretized normal distribution.

4. For each (hP, z) compute the pecuniary of choosing S ∈ {8, 10, 12, 14, 16, 18} (solve forV(S; 6, h0) according to the above) and compute the fraction of individuals choosing eachschooling level using the CCP’s in (21)-(22).

All moments are computed by aggregating over the NhP × Nz × NS grids using the product of theempirical p.m.f. of hP, the discretized normal p.d.f. of z, and CCP’s of S as sampling weights.

54

Page 55: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

D Tables and Figures not in text

Figure 5: Log earnings by Mother’s Schooling

8.5

99.5

10

10.5

11

Avera

ge L

og E

arn

ings

0 5 10 15 20 25 30Potential Experience

Sp=[4,8) Sp=8 Sp=(8,12]

Table 10: Identification

HC Productionα E slopes across S given Eα1 E slopes across SP given EαW E slope controller

Spilloversν E levels across SPλ E levels across Sb S level controller

AbilitiesρzhP Mincer coefficient β2µz E level controllerσz E level variation

TastesδS, ζh, ζc S levelsγh, γz S levels across SP and Eσξ S level variation

(SP, S, E) stand for mother’s schooling, and the individuals’ schooling and earnings levels, respectively. Taste hetero-geneity picks up the residual unobserved heterogeneity not captured by the simple model.

55

Page 56: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Table 11: Schooling Benefits, Median

S = 8 S = 10 S = 12 S = 14 S = 16 S = 18

Present value earnings by ability quartile

Q( 0) 223,874 280,365 353,766 366,731 366,986 325,786

Q( 1) 209,734 252,868 273,449 247,519 245,504 245,175Q( 2) 265,120 293,683 319,773 299,677 292,973 309,249Q( 3) 282,891 322,135 357,059 342,983 345,109 349,229Q( 4) 340,101 359,287 409,854 456,748 472,201 443,605

Nonpecuniary benefits correlated with hP by ability quartile

Q( 0) - 5,344 9,381 10,424 15,368 28,627

Q( 1) - 5,313 10,573 10,257 14,448 28,492Q( 2) - 5,206 9,922 10,087 14,480 28,882Q( 3) - 4,780 9,356 10,047 14,438 28,987Q( 4) - 4,205 8,793 10,229 15,371 28,772

Nonpecuniary benefits correlated with z by ability quartile

Q( 0) - 324 600 712 1,023 1,510

Q( 1) - 304 523 588 837 1,302Q( 2) - 334 575 641 920 1,486Q( 3) - 354 606 682 984 1,566Q( 4) - 376 648 794 1,143 1,771

PDV pecuniary value at age 6, in 2000 USD.

56

Page 57: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 6: Model Fit

11.5

22.5

33.5

25 30 35 40

(a) SP < 5

11.5

22.5

33.5

25 30 35 40

(b) SP ∈ {6, 7}

11.5

22.5

33.5

25 30 35 40

(c) SP = 8

11.5

22.5

33.5

25 30 35 40

(d) SP ∈ [9, 12)

11.5

22.5

33.5

25 30 35 40

(e) SP = 12

11.5

22.5

33.5

25 30 35 40

(f) SP > 12

y-axis: normalized average earnings, x-axis: ages 25,30,35,40. Solid and dashed lines are, respectively, the data andmodel moments implied by the GMM parameter estimate values. The red lines on top correspond to individual’s withS < 12 for the first row of plots, and S ≤ 12 for the rest. The blue lines on the bottom correspond to the converse.

57

Page 58: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Figure 7: Median Lifecycle Effect of 1 Year Increase in Mom’s Schooling.

15 20 25 30 35 40 45 50 55 60 65

-0.12

-0.10

-0.08

-0.06

-0.04

-0.02

0.00

0.02

0.04

0.06

0.08

0.10

0.12

Fixed S No Selection AbilitiesTastes Reduced Form

Figure 8: Median Lifecycle Effect of 9 Year Compulsory Schooling Reform.

15 20 25 30 35 40 45 50 55 60 65

-0.06

-0.04

-0.02

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

Fixed S No Selection AbilitiesTastes Reduced Form

58

Page 59: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

References

Altonji, J. G. and L. M. Segal (1996, July). Small-Sample Bias in GMM Estimation of CovarianceStructures. Journal of Business & Economic Statistics 14(3), 353–66.

Avery, C. and S. Turner (2012, Winter). Student loans: Do college students borrow too much–ornot enough? Journal of Economic Perspectives 26(1), 165–92.

Becker, G. S. and N. Tomes (1986, July). Human capital and the rise and fall of families. Journal ofLabor Economics 4(3), S1–39.

Behrman, J. R. and M. R. Rosenzweig (2002, March). Does increasing women’s schooling raise theschooling of the next generation? American Economic Review 92(1), 323–334.

Behrman, J. R. and P. Taubman (1989, December). Is Schooling “Mostly in the Genes"? Nature-Nurture Decomposition Using Data on Relatives. Journal of Political Economy 97(6), 1425–46.

Belley, P. and L. Lochner (2007). The Changing Role of Family Income and Ability in DeterminingEducational Achievement. Journal of Human Capital 1(1), 37–89.

Ben-Porath, Y. (1967). The production of human capital and the life cycle of earnings. Journal ofPolitical Economy 75, 352.

Betts, J. R. (1996). What do students know about wages? evidence from a survey of undergradu-ates. The Journal of Human Resources 31(1), pp. 27–56.

Björklund, A., M. Lindahl, and E. Plug (2006). The origins of intergenerational associations:Lessons from swedish adoption data. The Quarterly Journal of Economics 121(3), 999–1028.

Black, S. E. and P. J. Devereux (2011). Recent Developments in Intergenerational Mobility, Volume 4 ofHandbook of Labor Economics, Chapter 16, pp. 1487–1541. Elsevier.

Black, S. E., P. J. Devereux, and K. G. Salvanes (2005). Why the apple doesn’t fall far: Understand-ing intergenerational transmission of human capital. The American Economic Review 95(1), pp.437–449.

Bowles, S. and H. Gintis (2002, Summer). The Inheritance of Inequality. Journal of Economic Per-spectives 16(3), 3–30.

Bowles, S., H. Gintis, and M. Osborne (2001). The determinants of earnings: A behavioral ap-proach. Journal of Economic Literature 39(4), 1137–1176.

Cameron, S. V. and C. Taber (2004). Estimation of educational borrowing constraints using returnsto schooling. Journal of Political Economy 112(1), pp. 132–182.

Card, D. (1999). The causal effect of education on earnings. Volume 3A of Handbook of LaborEconomics. Amsterdam: Elsevier.

59

Page 60: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Carneiro, P. and J. J. Heckman (2002). The evidence on credit constraints in post-secondary school-ing*. The Economic Journal 112(482), 705–734.

Cunha, F. and J. J. Heckman (2007, May). The technology of skill formation. American EconomicReview 97(2), 31–47.

Cunha, F., J. J. Heckman, and S. M. Schennach (2010, 05). Estimating the technology of cognitiveand noncognitive skill formation. Econometrica 78(3), 883–931.

Del Boca, D., C. Flinn, and M. Wiswall (2013, April). Transfers to households with childrenand child development. Working Papers 2013-001, Human Capital and Economic OpportunityWorking Group.

Del Boca, D., C. Monfardini, and C. Nicoletti (2012, March). Children’s and parents’ time-usechoices and cognitive development during adolescence. Working Papers 2012-006, Human Cap-ital and Economic Opportunity Working Group.

Goldin, C. and L. F. Katz (2007, March). The race between education and technology: The evolu-tion of u.s. educational wage differentials, 1890 to 2005. NBER Working Papers 12984, NationalBureau of Economic Research, Inc.

Heckman, J., L. Lochner, and C. Taber (1998, January). Explaining rising wage inequality: Ex-planations with a dynamic general equilibrium model of labor earnings with heterogeneousagents. Review of Economic Dynamics 1(1), 1–58.

Heckman, J., R. Pinto, and P. Savelyev (2013). Understanding the mechanisms through which aninfluential early childhood program boosted adult outcomes. American Economic Review 103(6),2052–86.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2006, June). Earnings Functions, Rates of Return andTreatment Effects: The Mincer Equation and Beyond, Volume 1 of Handbook of the Economics of Edu-cation, Chapter 7, pp. 307–458. Elsevier.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2008). Earnings functions and rates of return. Journalof Human Capital 2(1), pp. 1–31.

Heckman, J. J., J. Stixrud, and S. Urzua (2006, July). The Effects of Cognitive and NoncognitiveAbilities on Labor Market Outcomes and Social Behavior. Journal of Labor Economics 24(3), 411–482.

Hoxby, C. and C. Avery (2013). The Missing &quot;One-Offs&quot;: The Hidden Supply of High-Achieving, Low-Income Students. Brookings Papers on Economic Activity 46(1 (Spring), 1–65.

Huggett, M., G. Ventura, and A. Yaron (2011, December). Sources of lifetime inequality. AmericanEconomic Review 101(7), 2923–54.

60

Page 61: The Causal Effect of Parents’ Education on …Of course, because the child’s schooling choice is also endogenous, the reduced form evidence does not tell us what the exact model

Keane, M. P. and K. I. Wolpin (1997, June). The career decisions of young men. Journal of PoliticalEconomy 105(3), 473–522.

Kennan, J. (2006). A note on discrete approximations of continuous distributions.

Léonard, D. and N. Van Long (1992). Optimal Control Theory and Static Optimization in Economics.Cambridge University Press.

Oreopoulos, P., M. Page, and A. H. Stevens (2008). The intergenerational effects of worker dis-placement. Journal of Labor Economics 26(3), pp. 455–000.

Plug, E. (2004). Estimating the effect of mother’s schooling on children’s schooling using a sampleof adoptees. The American Economic Review 94(1), pp. 358–368.

Plug, E. and W. Vijverberg (2003, June). Schooling, family background, and adoption: Is it natureor is it nurture? Journal of Political Economy 111(3), 611–641.

Poterba, J. M. (1998). The rate of return to corporate capital and factor shares: new estimates usingrevised national income accounts and capital stock data. Carnegie-Rochester Conference Series onPublic Policy 48(0), 211 – 246.

Rege, M., K. Telle, and M. Votruba (2011). Parental job loss and children’s school performance. TheReview of Economic Studies 78(4), pp. 1462–1489.

Sacerdote, B. (2007, February). How Large Are the Effects from Changes in Family Environment?A Study of Korean American Adoptees. The Quarterly Journal of Economics 122(1), 119–157.

Sacerdote, B. (2011). Nature and nurture effects on children’s outcomes: What have we learnedfrom studies of twins and adoptees? Volume 1 of Handbook of Social Economics, pp. 1 – 30.North-Holland.

Scholz, J. K., A. Seshadri, and S. Khitatrakun (2006). Are americans saving "optimally" for retire-ment? Journal of Political Economy 114, 607–643.

Solon, G. (1999, October). Intergenerational mobility in the labor market. In O. Ashenfelter andD. Card (Eds.), Handbook of Labor Economics, Volume 3 of Handbook of Labor Economics, Chapter 29,pp. 1761–1800. Elsevier.

Taber, C. R. (2001, July). The rising college premium in the eighties: Return to college or return tounobserved ability? Review of Economic Studies 68(3), 665–91.

Willis, R. J. and S. Rosen (1979, October). Education and self-selection. Journal of Political Econ-omy 87(5), S7–36.

61