IDENTIFYING EDUCATION’S POLITICAL EFFECTS WITH INCOMPLETE DATA: INSTRUMENTAL VARIABLE ESTIMATES COMBINING TWO DATASETS

JOHN MARSHALL*

MAY 2014

Political scientists are increasingly using instrumental variable (IV) methods, but are often faced with datasets that lack key variables or which only provide coarse variable codings. While completely missing a key variable typically causes projects to be abandoned, coarsening a treatment variable with multiple intensities (e.g. creating a binary treatment indicator) can substantially upwardly bias IV estimates. This bias arises where the coarsening causes the first stage to only capture part of the instrument’s effect. Two-sample IV methods offer a powerful solution to both problems: imputing values for the missing or coarsened variable using a separate dataset drawn from the same population with richer measurement of the treatment consistently estimates the weighted average per-unit treatment effect. Applying this approach in a fuzzy regression discontinuity setting in Great Britain, I show that an additional year of schooling substantially increases the probability of voting Conservative later in life. The estimate for completing high school, however, is upwardly biased by between two and six times.

* PhD candidate, Department of Government, Harvard University. [email protected]. I thank Anthony Fowler and Horacio Larreguy for illuminating discussions.


    1 Introduction

    Instrumental variable (IV) techniques are now a standard component of the political scientist’s

    methodological toolkit. Sovey and Green’s (2011) meta-analysis identifies more than one hundred

    articles published in three top journals over two decades using IV techniques. It is easy to under-

    stand why. Interpreted in the heterogeneous potential outcomes framework (Imbens and Angrist

    1994; Angrist, Imbens and Rubin 1996), IV approaches promise to identify the average causal

    effect of a treatment for the population of units that would not have received the treatment without

    the intervention of the instrumental variable.

    The prevalence of IV techniques has warranted increased scrutiny. Sovey and Green’s (2011)

    review highlights six key concerns in IV analyses, and identifies the types of evidence and argu-

    ment required to justify the assumptions underpinning the IV framework. Angrist and Pischke

    (2008) also provide clear advice on using IV methods in practice.

    However, this article highlights an important additional concern: using a binary (or coarsened)

    treatment variable when the true treatment has multiple intensities can substantially upwardly bias

    IV estimates. An important example is where a dummy variable for completing high school is

    used because years of schooling is not measured. This missing data issue has been ignored by

    both political scientists and economists, but frequently arises in empirical applications. I explain

    how two-sample IV methods can alleviate the problem when, as is often the case, the data needed

    to correct the bias is not available in the original sample. I then use the two-sample approach to

    identify the causal effect of schooling on political preferences, using fuzzy regression discontinuity

    methods to show that an additional year of high school substantially increases Conservative voting

    in Great Britain.

    This article first shows how coarsening a multi-valued (or interval) treatment variable introduces upward bias. The reduced form captures the impact of an instrument on an outcome for every individual regardless of their (coarsened) treatment intensity. However, the first stage, which re-weights the reduced form coefficient, underestimates the effect of the instrument on the coarsened treatment by failing to recognize that the treatment intensity increases for some individuals without passing the threshold required to be designated a new coarsened treatment intensity value. In the case of schooling, the first stage for compulsory schooling laws (CSLs), a popular instrument for completing high school (e.g. Dee 2004; Lochner and Moretti 2004; Milligan, Moretti and Oreopoulos 2004), only captures the individuals CSLs push to complete high school, neglecting any increase in schooling which does not result in completing high school. However, since the reduced form includes the effects for individuals who experienced greater schooling without completing high school, this can substantially upwardly bias IV estimates.

    The bias is most severe when there is a large first stage for neighboring intensities with large causal effects. The political effects of completing high school may thus be substantially biased if each additional year of schooling has a significant causal effect on political preferences and the instrument increases schooling for many students without inducing them to complete high school. When the true causal effect is not highly discontinuous at a known point, estimating the weighted average per-unit treatment effect for an interval treatment intensity is more appropriate. I will show that IV techniques provide a consistent estimate of this quantity of interest, even when some categories of the underlying treatment intensity are unobserved. Unlike the case where a treatment intensity is discretized, there is also a clear and conceptually appealing counterfactual interpretation for this estimate.

    While a treatment intensity variable can be incorrectly discretized or “miscoded” through the

    choice of a researcher, a common problem is that more granular data is not available. Using a

    dummy for completing high school, for example, is often necessitated because datasets such as

    the American National Election Survey and British Social Attitudes Survey only provide relatively

    coarse measures of education. In such cases, any IV estimate of schooling’s political effects may

    be significantly biased.

    The two-sample IV techniques pioneered by Angrist and Krueger (1992, 1995) can substantially alleviate or solve this problem. Conceptually, these methods estimate the reduced form in a sample containing data on only the outcome and the instrument, and the first stage in a sample containing data on only the treatment and the instrument, before combining the two to produce an IV estimate. The sample used for the first stage effectively serves as a means of imputing the missing treatment variable. In this sense, two-sample IV methods behave like multiple imputation techniques (e.g. Honaker and King 2010; King et al. 2001).[1] Beyond the standard IV assumptions, both samples must be random draws from the same population. I show that the two-sample 2SLS (TS2SLS) estimator first proposed by Angrist and Krueger (1995), which can accommodate both overidentification and additional covariates, is consistent, and I also extend Inoue and Solon (2010) to derive the associated cluster-robust variance matrix which corrects for finite-sample differences between samples 1 and 2. Despite their merits, two-sample IV methods have not yet been used in political science.

    Finally, I show how using Britain’s compulsory schooling reforms as instruments for discretized measures of high school education can significantly upwardly bias estimates of schooling’s effects on political preferences. Using Britain’s two major reforms, the upward bias can be cleanly decomposed: using a dummy for high school completion, instead of the true linear effect of an additional year of late high school, upwardly biases estimates of schooling’s effect on voting Conservative by between two and six times. The two-sample IV approach instead finds that an additional year of late high school increases the probability that a voter votes Conservative in later life by around 10-15 percentage points.

    This substantial difference, which is reiterated by the reduced form estimates, raises a dilemma for left-of-center parties (like Labour and more recently the Liberals) which have championed inclusive educational policies at the expense of electoral success. Although it is beyond this article’s scope to evaluate the mechanisms underpinning these large effects, it preliminarily suggests that education’s political effects are driven by income-based concerns, rather than socially liberal attitudes that would be expected to cause voters to support the Labour or Liberal parties.

    [1] While multiple imputation involves imputing missing data using other variables within a given sample, two-sample IV imputes all observations for a given variable using a second sample. Unlike multiple imputation programs like Amelia II, the methods used here have analytic solutions.

    This paper is organized as follows. Section 2 demonstrates analytically the extent of the bias

    and discusses the implications for applied empirical work. Section 3 explains how two-sample IV

    techniques can alleviate the missing data problem. Section 4 applies these methods to identify the

    effect of schooling on voting preferences in Great Britain. Section 5 concludes.

    2 IV’s upward bias with coarsened treatments

    2.1 Characterizing the bias

    To illustrate the upward bias of coarsening a treatment intensity, consider the simplest case where there is a single randomly assigned binary instrument.[2] Denote this instrument for each observation $i \in \mathcal{N} \equiv \{1, \ldots, n\}$ as $Z_i \in \{0,1\}$. The observed treatment intensity $T_i \in \{1, \ldots, J\}$ assumes one of $J$ ordered values, where $T_{iz} \equiv T_i(Z_i = z)$ denotes the potential treatment intensity of $i$ conditional on the assignment of the instrument $Z_i = z$. $Y_i$ is $i$’s observed outcome of interest, with potential outcomes $Y_{it} \equiv Y_i(T_i = t)$ corresponding to $i$’s treatment assignment $T_i = t$. To illustrate the problem, let us assume that the instrumental variables assumptions of monotonicity and the exclusion restriction hold (see Imbens and Angrist 1994; Angrist, Imbens and Rubin 1996); see below for formal definitions.

    The researcher, whether by choice or necessity, decides to coarsen the treatment intensity. In particular, in the hope of identifying the effect of experiencing $T_i = k > 1$, they partition $T$ by defining the indicator $D_{ik} \equiv 1(T_i \geq k)$.[3] Crucially, the researcher interested in identifying the effect of obtaining $T_i = k$ is only interested in estimating $\beta_k \equiv E[Y_{ik} - Y_{i,k-1} \mid T_{i1} \geq k > T_{i0}]$. This quantifies the local average treatment effect (LATE) of obtaining intensity $k$ beyond only obtaining the preceding level $k-1$ for instrument compliers. In the case of schooling, this could be the effect of completing high school (12th grade) beyond completing 11th grade. In many applications, this counterfactual is not clearly specified.[4]

    [2] The results presented here extend easily to the cases of multi-valued instruments and to the inclusion of control variables.

    [3] If multiple instruments are available, the coarsening need not be binary.

    This approach yields the following system of IV equations to be estimated:

    $$Y_i = \tilde{\beta}_k D_{ik} + u_i, \tag{1}$$

    $$D_{ik} = \gamma Z_i + \varepsilon_i. \tag{2}$$

    Equation (1) is the structural model defining the relationship between the binary treatment and the outcome, while equation (2) is the first stage regression of the binary treatment on the instrument. The true causal effect of obtaining a treatment intensity of $k$ for instrument compliers is $\beta_k$, while $\tilde{\beta}_k$ represents the population average effect that IV approaches typically cannot identify.[5]

    Angrist and Imbens (1995) show that the Wald estimator $\beta^W_k$ for this system of equations can be expressed as the weighted sum of the causal effect for compliers moving from intensity $t-1$ to $t$ for each such interval:

    $$\beta^W_k \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} = \frac{\sum_{t=1}^{J} p_{it} \beta_t}{p_{ik}}, \tag{3}$$

    where $p_{it} \equiv \Pr(T_{i1} \geq t > T_{i0})$ denotes the probability that $i$ only reaches category $T_i = t$ because they received the instrument $Z_i = 1$, and thus represents the proportion of compliers at treatment intensity $t$ in the population. $p_{ik}$ therefore represents the relevant first stage for ascertaining the treatment intensity $k$. $\beta_t \equiv E[Y_{it} - Y_{i,t-1} \mid T_{i1} \geq t > T_{i0}]$ is the LATE for compliers moving from treatment intensity $t-1$ to treatment intensity $t$.

    [4] When the treatment is truly binary, the interpretation is clear. However, if the latent treatment is multi-valued, the researcher implicitly argues for the difference between some kind of average of values contained within each discretized treatment condition.

    [5] As Oreopoulos (2006) shows, as the number of compliers increases the local average treatment effect converges toward the population average treatment effect.

    The following proposition extends Angrist and Imbens (1995) to demonstrate the inconsistency, and thus a bias even when the sample size is large, associated with the Wald estimator seeking to identify $\beta_k$.[6]

    Proposition 1. Suppose the following assumptions hold:

    A1. Exclusion restriction: $(T_{i0}, T_{i1}, \{Y_{it}\}_{t=1}^{J})$ are jointly independent of $Z_i$, for all $i \in \mathcal{N}$.

    A2. Monotonicity: $T_{i1} - T_{i0} \geq 0$ or $T_{i1} - T_{i0} \leq 0$, for all $i \in \mathcal{N}$.

    Then the dummy variable Wald estimator $\beta^W_k$ of equations (1) and (2) can be expressed as:

    $$\beta^W_k - \beta_k = \frac{\sum_{t \neq k} p_{it} \beta_t}{p_{ik}}. \tag{4}$$

    Provided $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t)$ for all $t \neq k$ where $p_{it} > 0$, the dummy variable Wald estimator accentuates the true causal effect: $|\beta_k| \leq |\beta^W_k|$.

    All proofs are provided in the Appendix.

    This result establishes that the Wald estimator generally over-estimates the true LATE of obtaining intensity $k$. The estimator is consistent only in two special cases. First, when the instrument only affects reaching intensity $k$; or $p_{it} = 0, \forall t \neq k$. Second, when the causal effect for all intervals other than $k$ is zero; or $\beta_t = 0, \forall t \neq k$. Otherwise, the inconsistency of the estimator is increasing in both $p_{it}/p_{ik}$ and $\beta_t$ for any $t \neq k$.
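
    As a minimal numerical sketch of equations (3) and (4), the following Python snippet computes the probability limit of the dummy variable Wald estimator from a set of complier shares and interval-specific effects. The function name `dummy_wald` and all numerical values are hypothetical illustrations, not quantities taken from this paper.

```python
import numpy as np

def dummy_wald(p, beta, k):
    """Probability limit of the dummy-variable Wald estimator, equation (3).

    p[t-1]    : share of compliers pushed across intensity t (p_it)
    beta[t-1] : LATE of moving from intensity t-1 to t (beta_t)
    k         : threshold defining the dummy D_ik = 1(T_i >= k)
    """
    p = np.asarray(p, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(p @ beta / p[k - 1])

# Hypothetical values for J = 3 intensities (illustrative only).
p = [0.30, 0.10, 0.05]
beta = [0.02, 0.04, 0.01]
k = 2

wald = dummy_wald(p, beta, k)
bias = wald - beta[k - 1]  # left-hand side of equation (4)

# Right-hand side of equation (4): contributions of all intervals t != k.
other = sum(p[t] * beta[t] for t in range(len(p)) if t != k - 1) / p[k - 1]

print(f"Wald limit: {wald:.3f}")
print(f"true beta_k: {beta[k - 1]:.3f}")
print(f"inconsistency: {bias:.3f}")
```

    With these illustrative inputs, the Wald limit is 0.105 against a true effect of 0.04 at intensity k, so the inconsistency of 0.065 comes entirely from compliers at the other two intensities.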

    Our education example clearly illustrates the bias. Consider a compulsory schooling law requiring that students remain in school until age 15 in a country like Britain where high school is completed at age 16.[7] For many students who would have dropped out before age 15 without the law, the law may not induce the completion of high school. Many likely only stay until 15, although some may go on to complete high school. This implies that there is a significant first stage, $p_{it} > 0$, for levels of schooling below high school. The IV bias, however, only arises if an additional year of schooling before the completion of high school matters for the outcome of interest. For outcomes like income, where either human capital or signaling may matter for labor market returns (e.g. Becker 1993; Mincer 1974; Spence 1973), it is easy to believe that $\beta_t > 0$. Similarly, if income maps to political preferences (e.g. Marshall 2014), or remaining in high school imparts politically-relevant norms, then political outcomes are also likely to suffer from bias.

    [6] In general, IV estimators are biased but consistent (see Bound, Jaeger and Baker 1995). The term bias is used somewhat loosely here to mean the deviation between the inconsistent and consistent estimators.

    [7] The U.S. is also a good example, where high school is completed at age 18 but the school leaving age is (or has been) 16 for many states.

    2.2 When is the bias severe?

    Proposition 1 demonstrated that the extent of bias depends upon the first stage and the LATE at

    different treatment intensities. This analytical insight permits precise description of the extent of

    bias in terms of a weighted causal response function (CRF). The CRF provides the causal effect of

    the treatment at each intensity.

    2.2.1 Sharp jumps in the CRF

    When the CRF exhibits sharp discontinuities, as exemplified in Figure 1, the dummy approach can be most appropriate. If the researcher’s understanding of the problem is strong, then correctly identifying intensity $k$, the only point at which there is a (positive) causal effect in the figure, as the key jump will yield a consistent estimate of $\beta_k$, provided a suitable instrument exists to ensure $p_{ik} > 0$. This works well because $\beta_t = 0$ for all $t \neq k$. Therefore, the Wald estimator is consistent regardless of whether $p_{it} > 0$ for some other $t \neq k$.

    Since the true CRF is unobserved, it is hard for researchers to know in practice whether k is the

    correct cutoff to use when defining their dummy variable. In general, tipping point equilibria that

    lack clear institutional definition may not be straight-forward to theorize about. Experiments, on

    the other hand, are not subject to these concerns if subjects cannot be partially treated.

    [Figure 1: Discontinuous causal response function. Axis labels: T and Y; marked values: k, k+1, Y0, Y1.]

    If the researcher incorrectly surmises that $k+1$ is the correct threshold, at best they fail to detect the existence of the causal effect of intensity $k$ but correctly identify no effect at $k+1$. In the example of Figure 1, the researcher concludes that $\beta_{k+1} = 0$ provided that their instrument does not induce subjects to reach intensity $k$ and $\beta_t = 0, \forall t \neq k$. In other words, $p_{ik} = 0$ ensures a correct causal estimate of a quantity that was probably not of primary interest. When $p_{ik} > 0$, the Wald estimator will produce an inconsistent estimate of the LATE at intensity $k+1$ given by:

    $$\beta^W_{k+1} - \beta_{k+1} = \frac{p_{ik} \beta_k}{p_{i,k+1}} > 0. \tag{5}$$

    Although this estimate is approximately right in the sense that there is a causal effect nearby, it both wrongly attributes the effect to intensity $k+1$ and does not even provide a consistent estimate of $\beta_k$ unless $p_{i,k+1} = p_{ik}$.

    2.2.2 Linear CRFs

    The bias associated with using a dummy variable can be particularly large when the true CRF is linear. Letting the causal effect associated with each interval be $\beta_t = \tau \neq 0$, the dummy variable Wald estimator yields:

    $$\beta^W_k - \beta_k = \frac{\sum_{t \neq k} p_{it}}{p_{ik}} \tau. \tag{6}$$

    More than one half of all compliers must therefore achieve intensity $k$ for the inconsistency to be smaller than the true coefficient $\tau$, that is, for the estimate to be less than double the size of the true coefficient.[8] This concern increases with how close the treatment intensity categories are to one another (i.e. increases in $J$), because it becomes increasingly implausible that any instrument could induce all $i$ to receive exactly $T_{i1} = k$.

    When the causal response is linear, an alternative Wald estimator, also proposed by Angrist and Imbens (1995), estimating the weighted average per-unit treatment effect (WAPTE) is more appropriate:

    $$\beta^W_{WAPTE} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0]} = \frac{\sum_{t=1}^{J} p_{it} \beta_t}{\sum_{t=1}^{J} p_{it}}. \tag{7}$$

    When the true causal effect is $\tau$ at each interval, it is exactly recovered by $\beta^W_{WAPTE}$. When the causal effect is not exactly linear, the estimator disproportionately weights the intervals with most compliers.
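
    The contrast between equations (6) and (7) can be illustrated with a small simulation. The sketch below assumes a hypothetical data-generating process (a linear per-unit effect `tau`, a randomly assigned binary instrument that raises the treatment intensity by one unit for half of all subjects, and a threshold `k = 4`); none of these values come from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tau = 0.05  # hypothetical linear per-unit causal effect
k = 4       # hypothetical threshold used to build the dummy D_ik

z = rng.integers(0, 2, size=n)     # randomly assigned binary instrument
t0 = rng.integers(1, 5, size=n)    # intensity without the instrument, in {1,...,4}
bump = rng.integers(0, 2, size=n)  # monotone response: 0 or +1 unit
t = t0 + z * bump                  # observed treatment intensity

# t0 also enters the outcome directly, so a naive regression of y on t
# would be confounded; the instrument z is independent of t0.
y = tau * t + 0.1 * t0 + rng.normal(0.0, 0.1, size=n)

def diff_in_means(v, z):
    """Difference in means of v between z = 1 and z = 0."""
    return v[z == 1].mean() - v[z == 0].mean()

reduced_form = diff_in_means(y, z)
first_stage_dummy = diff_in_means((t >= k).astype(float), z)
first_stage_linear = diff_in_means(t.astype(float), z)

wald_dummy = reduced_form / first_stage_dummy  # sample analog of equation (3)
wapte = reduced_form / first_stage_linear      # sample analog of equation (7)

print(f"dummy Wald: {wald_dummy:.3f}, WAPTE: {wapte:.3f}, true tau: {tau}")
```

    In this design only one quarter of the compliers cross the threshold, so equation (6) implies that the dummy variable estimate converges to roughly four times `tau`, while the WAPTE converges to `tau` itself.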

    It is easy to show that the dummy variable approach yields a coefficient at least as large as the WAPTE when the instrument satisfies monotonicity (Angrist and Imbens 1995).[9] Consequently, if the CRF is that in Figure 1, then the linear approach underestimates the true causal effect at intensity $k$ by fitting a complier-weighted linear form. In the special case where the instrument only affects the first stage of interest, or $p_{it} = 0, \forall t \neq k$, the WAPTE estimator yields an identical estimate to the discretized Wald estimator. To the extent that a more conservative estimate is desired when the CRF is uncertain, the linearization may therefore be preferred.

    [8] To see this, note that $\frac{\sum_{t \neq k} p_{it}}{p_{ik}} = \frac{p - p_{ik}}{p_{ik}} < 1$ only when $p_{ik} > p/2$, where $p \equiv \sum_j p_{ij}$.

    Furthermore, the linear approach may be robust even when not all categories are observed. If

    the J observed categories represent a coarsening of the true intervals (e.g. because T is continuous),

    the linear causal effect can still be recovered provided the intervals are equally spaced.[10]

    Proposition 2. Suppose assumptions A1 and A2 in Proposition 1 hold. Let only $J$ equally-spaced categories of $T_i$ be observed when there are in fact $\alpha J$ equally-spaced categories, where $\alpha > 1$ is finite and $\alpha J$ is an integer. Denote $\beta^{W,J}_{WAPTE}$ and $\beta^{W,\alpha J}_{WAPTE}$ respectively as the Wald estimators in the observed sample (denoted by superscript $J$) and unobserved sample (denoted by superscript $\alpha J$). If the effect of $T_i$ is linear and $\beta^J_j = \tau$ for all intervals $j$, then $\beta^{W,J}_{WAPTE} = \alpha \beta^{W,\alpha J}_{WAPTE}$.

    This result suggests that any linear relationship can be accurately estimated with the WAPTE es-

    timator, even when all intervals cannot be observed in practice. Obtaining the coefficient for the

    quantity of interest only requires an adjustment by factor α to provide the average linear causal

    effect at the desired unit interval.

    2.3 Implications for applied research

    The analysis here demonstrates that the shape of the CRF is critical for ascertaining the bias of the Wald estimator with a binary treatment. Unless the instrument is very specific in inducing subjects to only reach treatment intensity $k$ or the causal response is non-zero only at that particular point, the Wald estimator can be severely biased. If the CRF is instead approximately linear in form, it is more appropriate to estimate the WAPTE.

    [9] Comparison of the denominators shows that $\sum_{t=1}^{J} p_{it} \geq p_{ik}$ if $\mathrm{sign}(p_{it}) = \mathrm{sign}(p_{ik}), \forall t$.

    [10] More generally, even if the spacing is uneven the true causal effect could be identified if the spacing is proportional to the causal effects at each observed intensity.

    Although researchers may in some cases have strong prior beliefs over the shape of the CRF,

    and thus the most appropriate empirical strategy, definitive evidence is hard to produce. For ex-

    ample, it is far from clear whether it is the additional learning imparted every day that students

    remain in high school or simply obtaining the diploma that should matter for how an individual

    votes. Unfortunately, the researcher must rely on evidence and intuition—including the reduced

    form relationship, separate first stage regressions and the dummied-out OLS relationship—in order

    to determine the appropriate variable specification when only a single instrument is available.

    However, when multiple instruments are available, a sharper empirical assessment is possible. With $p > 1$ instruments, $p$ intervals of the CRF can be estimated by instrumenting for $p$ binary indicators of different treatment levels.[11] Under the assumption that different instruments do not affect different types of compliers differently, this permits the researcher to estimate $\beta_t$ for compliers at the intervals where the researcher believes the per-unit causal effect is likely to be largest. Large causal effects at $t \neq k$ provide strong evidence against the kind of CRF required to use $\beta^W_k$ as a consistent estimator for $\beta_k$. Applying this approach, section 4 shows that IV estimates for completing high school can substantially over-estimate education’s political effects.

    Carefully examining the effects of different levels of a treatment intensity in a single dataset is

    often not possible. As noted above, researchers often only resort to using dummy variables when

    better measures are not available. I now show how two-sample IV methods can be used to address

    this missing data problem.

    [11] The above analysis can be generalized to the case of multiple instruments.

    3 Using two samples to address missing data

    This section shows how two-sample IV techniques—a method yet to be employed in political sci-

    ence, as far as I am aware—can solve the problem that the researcher is forced to use a dummy

    because an insufficient number of categories are measured in their dataset. Of course, when all

    categories are available, the researcher is free to re-specify their treatment intensity variable how-

    ever they deem fit. The two-sample method can similarly address the problem that the treatment

    variable is completely unobserved.

    The key idea underpinning two-sample IV techniques is that the reduced form and first stage

    can be estimated in separate samples. Conceptually, we can then combine these estimates by

    respectively replacing the numerator and denominator of the WAPTE estimator in equation (7) or

    the Wald estimator in equation (3). Accordingly, two datasets are needed—one which includes

    Zi and Yi, and a second which includes Zi and Ti. If covariates Wi are desired, they must also be

    observed in both samples. If these datasets are both random draws from the same population, then

    the relationship between Zi and Ti in the first stage sample should be equivalent to that which would

    have been measured in the reduced form sample had good measures of Ti been available. Under

    these conditions, which are formalized below, it is reasonable to effectively impute values

    of Ti using our second dataset.
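
    The idea of replacing the numerator and denominator of equation (7) with estimates from separate samples can be sketched as follows. The helper names (`two_sample_wald`, `draw_sample`) and the simulated data-generating process are hypothetical; the sketch assumes a single binary instrument, no covariates, and two independent random draws from the same population.

```python
import numpy as np

def diff_in_means(v, z):
    """Difference in means of v between z = 1 and z = 0."""
    return v[z == 1].mean() - v[z == 0].mean()

def two_sample_wald(y1, z1, t2, z2):
    """Two-sample Wald estimate: reduced form from sample 1 divided by
    first stage from sample 2."""
    reduced_form = diff_in_means(y1, z1)  # sample 1 observes (Y, Z) only
    first_stage = diff_in_means(t2, z2)   # sample 2 observes (T, Z) only
    return reduced_form / first_stage

def draw_sample(n, rng, tau=0.05):
    """Hypothetical population: linear effect tau, binary instrument."""
    z = rng.integers(0, 2, size=n)
    t0 = rng.integers(1, 5, size=n)
    t = (t0 + z * rng.integers(0, 2, size=n)).astype(float)
    y = tau * t + 0.1 * t0 + rng.normal(0.0, 0.1, size=n)
    return y, t, z

rng = np.random.default_rng(1)
y1, _, z1 = draw_sample(100_000, rng)  # treatment discarded: unobserved in sample 1
_, t2, z2 = draw_sample(100_000, rng)  # outcome discarded: unobserved in sample 2

estimate = two_sample_wald(y1, z1, t2, z2)
print(f"two-sample Wald estimate: {estimate:.3f} (true per-unit effect 0.05)")
```

    Because no observation contributes both an outcome and a treatment, the estimate relies entirely on the two samples sharing the same first stage relationship, which is the same-population condition described above.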

    3.1 Estimation

    The goal is to estimate the following system of IV equations:

    $$Y_i = T_i \beta_T + W_i \beta_{-T} + u_i = X_i \beta + u_i, \tag{8}$$

    $$T_i = Z_i \Pi + \varepsilon_i, \tag{9}$$

    where $Z_i$ includes $W_i$ and $q$ excluded instruments. Identification requires that the number of instrumented treatment variables $T_i$, denoted $p$, satisfies $p \leq q$.

    Two methods have been proposed for IV estimation with two samples. Angrist and Krueger

    (1992) propose a Wald-style estimator where the reduced form estimates are divided by their first

    stage counterparts, which can be generalized to the overidentified case where the number of in-

    struments outnumber the number of endogenous variables. Inoue and Solon (2010) show that this

    estimator is less efficient than the 2SLS counterpart—proposed by Angrist and Krueger (1995)

    for splitting a sample—that will be used in the empirical application here. The advantage of this

    estimator is that it corrects for finite-sample differences between the two samples.[12] Furthermore,

    its extension to multiple instruments and multiple endogenous variables is straight-forward—both

    of which are important in many empirical applications, including the analysis in this paper.

    In matrix form (stacking over $i$), the TS2SLS estimator is:

    $$\hat{\beta}^{TS2SLS} = (\hat{X}_1' \hat{X}_1)^{-1} \hat{X}_1' Y_1, \tag{10}$$

    where $\hat{X}_1 = (\hat{T}_1, W_1)$ is the matrix of predicted values in sample 1. The OLS regression coefficients generating $\hat{T}_1$ are based on $p$ first stage regressions estimated in sample 2:

    $$\hat{X}_1 = Z_1 \hat{\Pi} = Z_1 (Z_2' Z_2)^{-1} Z_2' X_2. \tag{11}$$
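
    A minimal NumPy sketch of the point estimate in equations (10) and (11) is given below. The function `ts2sls`, its argument layout, and the simulated data are assumptions made for illustration. Because the covariates are included among the instruments, projecting them on the instruments returns them unchanged, so the sketch only imputes the treatment columns and carries the sample 1 covariates through directly.

```python
import numpy as np

def ts2sls(y1, z1, w1, t2, z2, w2):
    """TS2SLS point estimate (hypothetical helper).

    Sample 1: outcome y1, excluded instruments z1, covariates w1.
    Sample 2: treatment t2, excluded instruments z2, covariates w2.
    Covariates should include a constant column.
    """
    Z1 = np.column_stack([z1, w1])
    Z2 = np.column_stack([z2, w2])
    # First stage in sample 2: Pi_hat = (Z2'Z2)^{-1} Z2' T2.
    pi_hat, *_ = np.linalg.lstsq(Z2, t2, rcond=None)
    # Impute the treatment in sample 1 using the sample 2 coefficients.
    t1_hat = Z1 @ pi_hat
    X1_hat = np.column_stack([t1_hat, w1])
    # Second stage in sample 1: equation (10).
    beta_hat, *_ = np.linalg.lstsq(X1_hat, y1, rcond=None)
    return beta_hat

def draw(n, rng, tau=0.05):
    """Hypothetical population with one covariate and one binary instrument."""
    w = rng.normal(size=n)
    z = rng.integers(0, 2, size=n).astype(float)
    t0 = rng.integers(1, 5, size=n)
    t = t0 + z * rng.integers(0, 2, size=n)
    y = tau * t + 0.1 * t0 + 0.02 * w + rng.normal(0.0, 0.1, size=n)
    W = np.column_stack([np.ones(n), w])
    return y, t.astype(float), z, W

rng = np.random.default_rng(2)
y1, _, z1, w1 = draw(100_000, rng)  # sample 1: treatment unobserved
_, t2, z2, w2 = draw(100_000, rng)  # sample 2: outcome unobserved

beta_hat = ts2sls(y1, z1, w1, t2, z2, w2)
print("coefficients (treatment, constant, covariate):", np.round(beta_hat, 3))
```

    This returns point estimates only. The standard errors reported by an off-the-shelf second-stage regression would ignore the first stage uncertainty, which is why the corrected variance in Section 3.2 is needed.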

    3.2 Properties of TS2SLS

    The following assumptions are required to ensure the consistency of the TS2SLS estimator:

    1. Random sampling from the same population: $\{Y_{1i}, Z_{1i}\}_{i=1}^{n_1}$ and $\{T_{2i}, Z_{2i}\}_{i=1}^{n_2}$ are independently and identically distributed draws of size $n_1$ and $n_2$ from the same population with finite second moments.

    2. Exclusion restriction: $E[Z_{1i}' u_{1i}] = 0$.

    3. Instrument exogeneity: $E[Z_{1i}' \varepsilon_{1i}] = E[Z_{2i}' \varepsilon_{2i}] = 0$.

    4. Rank conditions: (a) $Z_{1i}' Z_{1i}$ and $Z_{2i}' Z_{2i}$ have full rank, (b) $X_{1i}' Z_{2i}$ and $X_{2i}' Z_{2i}$ have full rank.

    5. Interchangeable sample moments: (a) $E[Z_{1i}' X_{1i}] = E[Z_{2i}' X_{2i}]$, (b) $E[Z_{1i}' Z_{1i}] = E[Z_{2i}' Z_{2i}]$.

    [12] Inoue and Solon (2005) show that the TS2SLS estimator remains consistent even when differences in the sampling rates vary with some of the instrumental variables.

    Assumption 1 says that the samples must be drawn from the same population. Assumption 2 is implied by the exclusion restriction above, but is written in terms of expectations. Assumption 3 requires that the instrument be exogenous in the first stage. Assumption 4 is a standard rank condition required for matrix invertibility. Assumption 5 requires that crucial sample moments can be interchanged, thereby permitting substitution between samples. As $n_1$ and $n_2$ converge to the population size, Assumption 5 necessarily holds.

    Proposition 3 demonstrates the consistency of TS2SLS, while the proof illustrates the use of

    the assumptions above.[13]

    Proposition 3. Under Assumptions 1-5, $\hat{\beta}^{TS2SLS}$ is an $n_1$-consistent estimator of $\beta$.

    Correctly calculating the TS2SLS standard errors is not obvious. Calculating the standard errors from a regression of $Y_1$ on $\hat{X}_1$ neglects the uncertainty in the first stage, in addition to distributional differences between the first stage and reduced form samples. The Murphy and Topel (1985) two-stage framework for understanding “generated regressors” (accounting for the uncertainty introduced where a variable is estimated as a proxy to enter a separate regression) incorporates such estimation uncertainty.[14] Proposition 4 derives the homoskedastic and cluster-robust variance matrices, of which the robust variance is the particular case of $G_1 = n_1$ and $G_2 = n_2$ clusters. (The subscript $i$ is dropped to facilitate exposition.)

    [13] Angrist and Krueger’s (1995) proof rests on showing that the TS2SLS estimator converges to the consistent Angrist and Krueger (1992) estimator, because of Assumption 5. The proof provided here instead demonstrates the consistency of TS2SLS directly.

    [14] Inoue and Solon (2010) acknowledge this approach but derive homoskedastic and heteroskedastic variance matrices in an alternative way, and do not provide a cluster-robust variance estimate.

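    Proposition 4. The asymptotic variance of the TS2SLS estimator, $V[\hat{\beta}^{TS2SLS}]$, is

    $$\left[ \sigma_u^2 + \frac{n_1}{n_2} \hat{\beta}_S^{TS2SLS\prime} \, \Omega \, \hat{\beta}_S^{TS2SLS} \right] E[\hat{X}_1' \hat{X}_1]^{-1}, \qquad \Omega = E[\varepsilon' \varepsilon \mid \hat{X}_1] = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1,p} \\ \vdots & \ddots & \vdots \\ \sigma_{p,1} & \cdots & \sigma_p^2 \end{pmatrix} \tag{12}$$

    when the reduced form squared error $\sigma_u^2 = E[u^2 \mid \hat{X}_1]$ and the error covariances $\Omega$ of the $p$ first stage regressions are homoskedastic; when the reduced form and first stage errors are grouped into $G_1$ and $G_2$ clusters respectively, the cluster-robust variance is

    $$E[\hat{X}_1' \hat{X}_1]^{-1} \left[ V[\hat{\beta}^{TS2SLS}] + \frac{n_1}{n_2} E[\hat{X}_1' (\hat{\beta}_S^{TS2SLS\prime} \otimes Z_1)] \, V(\hat{\Pi}) \, E[(\hat{\beta}_S^{TS2SLS\prime} \otimes Z_1)' \hat{X}_1] \right] E[\hat{X}_1' \hat{X}_1]^{-1}, \tag{13}$$

    where $\hat{\beta}_S^{TS2SLS}$ is the vector of coefficients on the $p$ endogenous variables, the uncorrected TS2SLS variance is given by $V[\hat{\beta}^{TS2SLS}] = \frac{G_1}{G_1 - 1} \sum_{g=1}^{G_1} E[\hat{X}_{1g}' \hat{u}_{1g} \hat{u}_{1g}' \hat{X}_{1g}]$ and the variances from the $p$ first-stage regressions are $V(\hat{\Pi}) = \frac{G_2}{G_2 - 1} \Phi \otimes E[Z_2' Z_2]^{-1}$, where

    $$\Phi = \begin{pmatrix} E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g1} \hat{\varepsilon}_{2g1}' Z_{2g}] & \cdots & E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g1} \hat{\varepsilon}_{2gp}' Z_{2g}] \\ \vdots & \ddots & \vdots \\ E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2gp} \hat{\varepsilon}_{2g1}' Z_{2g}] & \cdots & E[Z_2' Z_2]^{-1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2gp} \hat{\varepsilon}_{2gp}' Z_{2g}] \end{pmatrix}. \tag{14}$$

    Standard errors are given by the square roots of the diagonal elements of $V[\hat{\beta}^{TS2SLS}]/n_1$. Using the analogy principle, expectations can be replaced by sample moments.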

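    In the case of a single endogenous regressor, $V(\hat{\Pi})$ is simply the standard cluster-robust variance matrix for the first stage:

    $$
    E[Z_2' Z_2]^{-1}
    \left[ \frac{G_2}{G_2 - 1} \sum_{g=1}^{G_2} E[Z_{2g}' \hat{\varepsilon}_{2g} \hat{\varepsilon}_{2g}' Z_{2g}] \right]
    E[Z_2' Z_2]^{-1}.
    \tag{15}
    $$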

    When there are multiple endogenous variables, the first stage estimates may be correlated across

    models. This requires the more complex formulation in Proposition 4.
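
    To make the mechanics concrete, the following is a minimal sketch of TS2SLS with the homoskedastic correction in equation (12), specialized to one instrument, one endogenous regressor and an intercept only. The function names (`ols_slope_intercept`, `ts2sls`, `simulate`) and the simulated data are illustrative assumptions, not part of the paper; covariates, multiple instruments and clustering are deliberately left out.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope_intercept(x, y):
    """Bivariate OLS of y on x with an intercept; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ts2sls(z1, y1, z2, x2):
    """Two-sample 2SLS: (z1, y1) is the reduced form sample,
    (z2, x2) the first stage sample. Returns (beta, standard error)."""
    n1, n2 = len(z1), len(z2)
    # First stage, estimated in sample 2 only.
    a, pi = ols_slope_intercept(z2, x2)
    eps = [x - (a + pi * z) for z, x in zip(z2, x2)]
    sigma2_eps = sum(e * e for e in eps) / (n2 - 2)
    # Impute the treatment in sample 1 and regress the outcome on it.
    xhat1 = [a + pi * z for z in z1]
    c, beta = ols_slope_intercept(xhat1, y1)
    u = [y - (c + beta * xh) for xh, y in zip(xhat1, y1)]
    sigma2_u = sum(e * e for e in u) / (n1 - 2)
    # Scalar analogue of equation (12), already divided by n1: the slope
    # element of the inverse moment matrix is 1 / sum of squared deviations.
    mxh = mean(xhat1)
    sxx = sum((xh - mxh) ** 2 for xh in xhat1)
    var_beta = (sigma2_u + (n1 / n2) * beta ** 2 * sigma2_eps) / sxx
    return beta, math.sqrt(var_beta)


def simulate(n, rng):
    """Toy data: binary instrument, confounded treatment, linear outcome."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0
        v = rng.gauss(0, 1)  # unobserved confounder
        xi = 10 + 0.5 * zi + v
        yi = 0.15 * xi + 0.3 * v + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


rng = random.Random(0)
z1, _, y1 = simulate(2000, rng)   # outcome observed, treatment discarded
z2, x2, _ = simulate(6000, rng)   # treatment observed, outcome discarded
beta, se = ts2sls(z1, y1, z2, x2)
print(round(beta, 3), round(se, 3))
```

    In this just-identified case the point estimate equals the reduced form slope from the first sample divided by the first stage slope from the second sample, which is why the two samples never need to contain the same individuals.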

    4 High school education and political preferences

    I use the two-sample IV methods expounded above to answer an important question about political

    behavior: how does high school affect who citizens vote for? Despite widespread interest in the

    causal effects of education on political participation (see Sondheimer and Green 2010), education’s

    partisan bias has received limited attention from scholars seeking to move beyond survey corre-

    lations. Furthermore, identifying the political effects of schooling is challenging because many

    surveys provide insufficiently granular measures of education.

    There are various ways in which education could affect political preferences. One of the most

    robust correlations in political surveys in developed democracies is the link between income and

    support for right-wing political parties (e.g. Gelman et al. 2010; Thomassen 2005). If educa-

    tion increases income, as human capital theory strongly suggests (e.g. Acemoglu and Angrist

    2000; Becker 1993), then additional high school may well increase support for right-wing parties

    proposing lower taxes (Meltzer and Richard 1981).15

    However, education is also associated with socially liberal attitudes. This link has also been

    widely documented in survey research (Dee 2004; Gerber et al. 2010; Schoon et al. 2010), although

    it is particularly strong at the university rather than high school level. Rather than supporting right-wing parties, this impetus generally pushes voters toward left-wing parties who are more likely to support post-materialist and socially liberal policies (e.g. Heath et al. 1985; Inglehart 1981). In the United Kingdom, the Labour and Liberal Democrat parties are regarded as more socially progressive.

    15 This relationship could similarly work through changing demand for social insurance (Iversen and Soskice 2001; Moene and Wallerstein 2001). In the U.S., Marshall (2014) finds that high school education predominantly works through tax policy preferences.

    Given the formative role of education, there are many other channels through which schooling

    could affect political behavior.16 This paper does not seek to disentangle the mechanisms underpin-

    ning the relationship, but rather to demonstrate that high school education has important political

    implications for a large proportion of voters. Identifying schooling’s causal effects is challenging

    because which individuals receive greater education is very unlikely to be random, even after var-

    ious observables are controlled for or matched upon (e.g. Kam and Palmer 2008). I use Britain’s

    compulsory schooling reforms as an instrument for schooling to identify high school’s political effects. Britain represents a particularly important case because, unlike the U.S., the reforms affected

    a substantial proportion of the population. With a large proportion of compliers, the estimates for

    compliers approach the population average treatment effect (see Oreopoulos 2006).

    4.1 Compulsory schooling laws in Britain

    Great Britain’s education laws define the maximum age by which students must start school and the

    minimum age at which students can leave school. To identify the effect of high school education,

    I exploit two landmark reforms of the minimum leaving age that came into force in 1947 and

    1972. First, Winston Churchill’s wartime coalition government passed the Education Act 1944,

    which increased the leaving age from 14 to 15 in England and Wales. The Education (Scotland)

    Act 1945 enacted the same reform in Scotland. The new leaving age, which had repeatedly failed

    to pass in the 1920s and 1930s due to financial constraints (Gillard 2011), came into force 1st April 1947 after several years of intensive preparation. Second, Parliament passed the Education Act 1962 raising the school leaving age to 16, although it was Conservative Edward Heath who finalized the extension under Statutory Instrument 444 (1972). Like the 1947 reform, Labour had consistently pushed for the increase,17 while education was widely seen as an economically and socially beneficial investment at the time (Woodin, McCulloch and Cowan 2013). This second reform came into force in England, Scotland and Wales on 1st September 1972. Northern Ireland, which experienced different education reforms (Oreopoulos 2006), is excluded from the analysis.

    16 For example, education could alter the political composition of social networks (Abrams, Iversen and Soskice 2010), induce politically-biased participation, or teaching could instill different values and norms (Bowles and Gintis 1976).

    The reforms substantially altered the education profile of Britain’s students. As Figure 2 shows,

    relative to the immediately prior academic cohorts, both reforms induced a large fraction of stu-

    dents to remain in school for an additional year. Unlike compulsory schooling reforms in Canada

    and the U.S., which affected a small and somewhat idiosyncratic set of students (Clark and Royer

    2013; Goldin and Katz 2008; Oreopoulos 2006), Britain’s reforms affected a large proportion of

    the population. Almost half of students remained in school one year longer following the 1947 reform, while a quarter remained in school because of the 1972 reform. While the 1947 reform

    also increased the proportion staying in school until 16, the 1972 reform did not affect schooling

    beyond the high school level.

    Although the number of students in school rose considerably, the education system itself did

    not greatly change. Prior to the 1947 reform, the government had engaged in a major expansion

    effort to increase the number of teachers, buildings and classroom materials. In both cases, the

    additional year of schooling was primarily intended to ensure students grasped all the material

    they had previously been taught (see Clark and Royer 2013).

    Britain’s education reforms have proved popular instruments among labor economists. The

    discontinuities in schooling laws have been used to identify positive effects of an additional year

    of schooling on income (Devereux and Hart 2010; Grenet 2013; Harmon and Walker 1995; Oreopoulos 2006), and also used to demonstrate that schooling does not affect mortality rates (Clark and Royer 2013).18 However, the potential political effects of these reforms have not received attention.

    17 Under Labour Prime Minister Gordon Brown, Parliament passed the Education and Skills Act 2008, raising the education leaving age to 18 by 2015.

    18 There also exists a large literature exploring the economic effects of U.S. compulsory schooling reforms (see Acemoglu and Angrist 2000; Angrist and Krueger 1991; Goldin and Katz 2008). These studies differ in that they exploit cross-state differences using difference-in-differences type strategies.

    Figure 2: Compulsory schooling reforms and staying in school by cohort

    [Two panels. "1947 reform": x-axis cohort (year aged 14), series "Leave before 15" and "Leave before 16". "1972 reform": x-axis cohort (year aged 15), series "Leave before 16" and "Leave before 17". y-axis: proportion leaving school.]

    Notes: Data based on the Labour Force Survey data used in the empirical analysis below. Black lines represent fractional polynomial regression line fits. Grey dots are birth-year cohort averages.

    4.2 Data

    In order to test the political implications of these reforms, I use the British Social Attitudes Survey

    (BSAS). The BSAS, which randomly samples a nationally-representative cross-section of adults

    (aged 18 or above) with postal addresses in Great Britain, has been conducted in the summer of

    every year since 1983 except in 1988 and 1992. In ten of the 28 available surveys,19 respondents

    were asked which party they voted for in the most recent general election. In the sample used in

    this analysis, 34% of respondents reported voting Conservative, while 45% and 16% respectively

    voted Labour and Liberal.20 Given the theoretical claims outlined above, the analysis focuses on a

    dummy for voting Conservative as the main dependent variable.

    I operationalize whether a student is affected by the reform by coding indicators—1(CSLc =

    15) = 1(birth year+14∈ [1947,1972]) and 1(CSLc = 16) = 1(birth year+15≥ 1972)—for the

    minimum schooling leaving age affecting individuals in cohort c. The residual category is below

    15. Although month of birth is not available in the BSAS, respondents can be mapped on the basis

    of their year of birth (determined by age in years at the date of the survey).21 Whether an individual

    was affected by the reform is thus assigned by academic cohort, defined by the year aged 14 and

    15, such that 1(CSLc = 15) = 1 for those aged 14 in any year between 1947 and 1972.22
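
    The cohort assignment can be sketched as a small function. This is an illustration only: the function name `csl_indicators` is hypothetical, and it assumes the two indicators are mutually exclusive (consistent with "below 15" being the residual category), so cohorts who fall under the leaving age of 16 are coded only as CSL = 16. How boundary cohorts are treated in the actual data is not specified here.

```python
def csl_indicators(birth_year):
    """Return (csl15, csl16) for a birth-year cohort.

    Assumption: the categories are mutually exclusive, so a cohort aged 15
    in 1972 or later is assigned to CSL = 16 rather than CSL = 15.
    """
    csl16 = birth_year + 15 >= 1972
    csl15 = (birth_year + 14 >= 1947) and not csl16
    return int(csl15), int(csl16)


# A few example cohorts: pre-reform, 1947 reform, 1972 reform.
for year in (1930, 1940, 1960):
    print(year, csl_indicators(year))
```

    Because only year of birth is used, everyone in a birth-year cohort receives the same assignment regardless of month of birth.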

    However, the BSAS measures of education are problematic. Educational attainment is measured using six categories, ranging from no qualification to university degree.23 Completing high school is captured by the second lowest category, which specifies that a respondent has a certificate of secondary education (CSE) or equivalent. At the end of high school (at age 15 or 16), or a student’s 11th year of formal schooling, students take CSE exams in a variety of subjects. Given only 2-3% of students fail any particular CSE exam, obtaining a CSE is an almost perfect proxy for completing high school. An indicator measuring this is used to examine the results when schooling is dichotomized at a theoretically appealing point. Although the BSAS also asks respondents what age they left school, nearly half of the surveys did not allow respondents to answer that they left school below age 15, and thus cannot differentiate the effect of the 1947 reform from the number of years of schooling.24

    19 These surveys were conducted in: 1987, 1994, 1995, 1996, 1999, 2001, 2003, 2005, 2008 and 2010.

    20 The Conservative vote share, the main dependent variable in this paper, closely reflects the survey-weighted average of 36% of votes received by the Conservatives across the period under study. The difference is even smaller in the raw data; as explained below, the TS2SLS approach necessitates removing some observations.

    21 My estimates of the effects of the reforms on schooling outcomes are very similar to Clark and Royer (2013), who can perfectly assign the instruments using month of birth data. This, in combination with the clear graphical discontinuities shown below, strongly suggests that lacking month of birth is not significantly affecting the results.

    22 Although Scottish students faced a weaker law between 1972 and 1976, they are coded identically to England/Wales because a similarly large drop in the proportion leaving occurred. Results are robust to excluding Scottish students aged 15, 1972-76.

    Using only the BSAS sample to identify the effect of years of schooling would require either

    coarsening the treatment or substantially reducing the sample size. However, collecting a second

    sample containing common basic demographic variables and the age at which an individual left

    school can solve this problem. Accordingly, I use Labour Force Survey (LFS) data—an annual

    and more recently quarterly household survey—from each year in which an election occurred to

    collect a pooled sample of 747,851 voting age respondents.25 Years of schooling is defined by the age at which a respondent left continuous full-time education minus five, and an upper bound of 13 years of state-supported education is applied.26 Before 2003, the LFS collected both month and year of birth, and therefore permitted perfect instrument assignment; since 2003, the instruments were assigned as in the BSAS.

    23 Respondents with foreign qualifications were excluded.

    24 This bottom coding is clearly still relevant in the twenty-first century because many of those aged 14 or above in 1947 are still alive. Nevertheless, similar estimates are obtained when restricting the analysis to the years for which age left school could be used as the endogenous variable. For many studies, however, the loss of precision necessitates using a separate sample.

    25 Only the July-September sample was used since the LFS became quarterly to avoid replication, given that respondents are then surveyed for five consecutive quarters, and to approximate the period when the BSAS survey was conducted. Observations from Northern Ireland and respondents below the age of 18 were excluded to match the BSAS sample.

    26 After age 18, continuing students bifurcate into university or vocational programs. Given the difficulties of classifying these programs, the upper bound on state-supported schooling is most appropriate. Since the CSL reforms did not affect higher education, this choice is inconsequential for the results.
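
    The years-of-schooling definition is a one-line transformation of the age at which a respondent left education. A minimal sketch follows; the function name and the `cap` parameter are illustrative, not taken from the LFS documentation.

```python
def years_of_schooling(age_left_education, cap=13):
    """Age left continuous full-time education minus five, capped at `cap`."""
    return min(age_left_education - 5, cap)


# Leaving at 15 gives 10 years, leaving at 16 gives 11, and anyone
# leaving after 18 is top-coded at 13 years.
for age in (15, 16, 21):
    print(age, years_of_schooling(age))
```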

    The two-sample approach is only valid if both samples randomly draw from the same pop-

    ulation. Given that the BSAS and LFS are random samples from the population of those with

    available addresses,27 both samples are drawn from essentially identical populations. Neverthe-

    less, imbalances could remain due to chance, different survey sizes and any differential response

    characteristics. To redress the concern that the TS2SLS assumptions are not satisfied, I then chose

    a random subsample of the LFS sample to match the BSAS sample distribution in terms of year

    of birth, gender, ethnicity, and survey year by randomly choosing observations from within these

    blocks.28 This reduced the final sample size to 47,552.29 The summary statistics in Table 1 show

    that the first and second moments on the common variables match very well. In combination with

    the random sampling from the same adult population, both samples effectively draw from the same

    population.
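
    The block-matching step can be sketched as follows. This is a hedged illustration rather than the procedure actually run: `matched_subsample`, the `ratio` parameter and the record fields are hypothetical, and the sketch simply takes whatever is available when a block has too few first stage observations.

```python
import random
from collections import Counter, defaultdict


def matched_subsample(target, pool, key, ratio, seed=0):
    """Draw `ratio` pool records for every target record, block by block.

    `key` maps a record to its block (e.g. birth year, gender, ethnicity,
    survey year). Blocks with too few pool records contribute all they have;
    pool blocks that never occur in the target are dropped.
    """
    rng = random.Random(seed)
    target_counts = Counter(key(r) for r in target)
    pool_by_block = defaultdict(list)
    for record in pool:
        pool_by_block[key(record)].append(record)
    sample = []
    for block, n_target in target_counts.items():
        candidates = pool_by_block.get(block, [])
        k = min(len(candidates), ratio * n_target)
        sample.extend(rng.sample(candidates, k))  # without replacement
    return sample


# Toy records; field names are illustrative only.
bsas = [{"yob": 1950, "male": 1}] * 2 + [{"yob": 1960, "male": 0}]
lfs = [{"yob": y, "male": m, "id": i}
       for (y, m) in [(1950, 1), (1960, 0), (1970, 1)] for i in range(10)]
block = lambda r: (r["yob"], r["male"])
sub = matched_subsample(bsas, lfs, block, ratio=3)
print(Counter(block(r) for r in sub))
```

    Sampling a fixed multiple within each block reproduces the reduced form sample's joint distribution over the blocking variables, at the cost of discarding first stage observations; reweighting would keep them all.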

    4.3 Empirical strategy

    To identify the effect of late high school education on political preferences, I use Britain’s com-

    pulsory schooling reforms as instruments for the level of schooling an individual receives. These

    reforms have been widely used as instruments, most convincingly in regression discontinuity (RD)

    designs (see Clark and Royer 2013), because of the sharp change in educational attainment across cohorts. Although further-apart cohorts could differ systematically, it is hard to see why cohorts born just before and just after the reform would systematically differ in their political preferences. Accordingly, this study also employs an RD design where the running variable determining the treatment is birth year cohort.

    27 More precisely, the BSAS uses a multi-stage design where Britain is broken up into sectors defined by postcode, from which households are randomly chosen from the address book. Respondents aged 18 or above within a household are then randomly chosen. The LFS has been an unclustered (“simple”) random sample from the address roll since 1992, having earlier employed a clustered approach from the Valuation Roll and Post Office Address File.

    28 Due to a lack of observations in the LFS, the final samples used for both datasets excluded respondents aged above 74 and those born before 1922 or after 1987.

    29 Where sample size concerns are more salient (the first stage is very strong here), another option would be to weight observations in the first stage sample to replicate the reduced form sample distribution. Such a procedure is likely to be more efficient.

    Table 1: Summary statistics: BSAS and LFS samples

                              BSAS                                          LFS
                              Obs.     Mean     Std. dev.  Min.   Max.     Obs.     Mean     Std. dev.  Min.   Max.
    Dependent variable
      Conservative vote       15,934   0.34     0.47       0      1
    Endogenous variables
      Schooling                                                            47,552   11.14    1.42       0      13
      High school             15,934   0.73     0.44       0      1
    Excluded instruments
      CSL=15                  15,934   0.50     0.50       0      1        47,552   0.50     0.50       0      1
      CSL=16                  15,934   0.39     0.49       0      1        47,552   0.39     0.49       0      1
    Pre-treatment covariates
      Birth year              15,934   1951.90  14.68      1922   1987     47,552   1961.66  14.64      1922   1987
      Age                     15,934   47.12    14.07      18     73       47,552   46.96    14.38      18     73
      Male                    15,934   0.44     0.50       0      1        47,552   0.45     0.50       0      1
      White                   15,934   0.95     0.21       0      1        47,552   0.95     0.22       0      1
      Asian                   15,934   0.02     0.15       0      1        47,552   0.02     0.15       0      1
      Black                   15,934   0.02     0.13       0      1        47,552   0.02     0.13       0      1
      Survey                  15,934   1999.02  5.91       1987   2010     47,552   1998.90  6.33       1987   2010

    The key RD identifying assumption is that partisan preferences are continuous in all covari-

    ates other than school leaving age at the reform discontinuity. Given the difficulty of identifying

    education’s causal effects using observational data, the RD strategy’s weak assumptions are par-

    ticularly appealing. The greatest issue for RD designs is the “sorting” concern that another key

    variable simultaneously changes at the discontinuity. Given that cultural shifts are very unlikely to

    have affected 15 year olds but not 14 year olds, the most plausible concerns relate to demographic,

    socio-economic and labor market characteristics. Figure 3 shows that trends in various proxies for

    these variables are essentially continuous through both discontinuities.

    I first estimate the effects, δ1 and δ2, of the schooling reforms themselves. This entails estimat-

    ing reduced form OLS regressions of the following form in the BSAS sample:

    $$
    Y_{ict} = \delta_1 \mathbf{1}(CSL_c = 15) + \delta_2 \mathbf{1}(CSL_c = 16) + f(\text{birth year}_c) + W_{it}\gamma + \eta_t + \varepsilon_{it}, \tag{16}
    $$

    where 1(CSLc < 15) is the residual category, Wit includes a gender dummy, standardized age

    polynomials,30 and dummies for white, black and (south and east) Asian ethnicities, and ηt is

    a survey fixed effect. The dependent variable Yict is voting Conservative. f is a flexible global

    polynomial function of the running variable designed to capture general trends away from the

    reform discontinuities.31 I estimate a variety of specifications for f , ranging from including no

    birth year trends to fifth-order polynomial trends to demonstrate the robustness of the relationships.

    All specifications report standard errors clustered by cohort.

    30 For simplicity, the age polynomials are assigned the same polynomial order as f.

    31 To fully assess the implications of dummying-out high school, it is necessary to include both reforms in the same specification. Consequently, it is imperative to show that the results are robust to highly flexible global polynomial trends.

    Figure 3: Trends in demographic, socio-economic and labor market demographic variables

    [Six panels. Panel A: Male; Panel B: White; Panel C: Father manual or unskilled worker; Panel D: Father voted Conservative (y-axis: proportion; x-axis: cohort, year aged 14). Panel E: Unemployment (y-axis: rate (%); x-axis: year). Panel F: Average annual earnings (y-axis: index, 2000=100; x-axis: year).]

    Notes: The data in Panels A and B is from the LFS. The data in Panels C and D is from the British Election Survey 1979-2010 (because such variables were not widely available in the BSAS), which is used as a robustness check below. The data in Panels E and F is from the Bank of England “UK Economic Data 1700-2009” dataset.

    The principal quantity of interest in this paper is the effect of schooling. To estimate the effects

    of different measures of schooling, Si, I use Britain’s reform cutoffs as instruments. Since the

    reforms do not perfectly determine an individual’s level of schooling, the assignment of Si is fuzzy.

    I thus employ a fuzzy RD approach; like standard IV approaches, this additionally requires

    monotonicity and the exclusion restriction to hold. Given the large increase in school attendance

    following each reform, and the fact that very few students failed to comply with the new leaving

    ages, monotonicity is strongly supported. Given the close proximity of the reforms to schooling

    choices, there is very limited scope for the reforms to violate the exclusion restriction by affecting

    an individual’s political preferences through other channels.

    The fuzzy RD entails estimating the following structural equation:

    $$
    Y_{ict} = \beta S_i + f(\text{birth year}_c) + W_i \phi + \eta_t + \varepsilon_{ict}, \tag{17}
    $$

    where Si will be either a dummy for completing high school, years of schooling, or two dummy

    variables for staying in school for 10 or above 10 years. The first stage regression generating

    variation in Si is given by:

    $$
    S_i = \alpha_1 \mathbf{1}(CSL_c = 15) + \alpha_2 \mathbf{1}(CSL_c = 16) + f(\text{birth year}_c) + W_i \psi + \eta_t + \varepsilon_{ict}. \tag{18}
    $$

    A strong first stage, which is required to minimize the bias of IV estimates in finite samples (Bound,

    Jaeger and Baker 1995; Staiger and Stock 1997), implies that α1 and α2 are significantly different

    from zero.

    In the case of the dummy for completing high school, equation (17) can simply be estimated

    with 2SLS using only BSAS data. Given that years of schooling comes from the LFS, the effects

    of years of schooling are instead estimated using TS2SLS where the LFS first stage and BSAS

    reduced form are efficiently combined as above with cohort-clustered standard errors computed as

    in Proposition 4.

    4.4 Results

    4.4.1 The effect of compulsory schooling reforms on schooling and political preferences

    Figures 4 and 5 plot the first stage and reduced form graphically. The left hand graph in Figure 4

    shows a large increase in the average number of years of schooling per cohort following the 1947

    reform. This reflects the 40% of students who stayed in school for another year shown in Figure

    2. The right-hand graph shows that the 1972 reform also substantially increased the average years

    of schooling, although the magnitude of the reform was much smaller. This reflects the fact that

    by 1972 students were generally remaining in school longer.

    Although the cohort averages are noisier, the reduced form plots in Figure 5 suggest that around

    the reforms voters differ systematically in their political preferences. Especially following the

    1947 reform, there is an upward shift in support for the Conservative party by cohort. The graphs

    indicate that cohorts affected by the reform are approximately three percentage points more Con-

    servative.32 Given that the 1972 reform affected fewer students, the difference at the discontinuity

    is smaller. Although the difference is less clear, the chart also suggests an increase in support for

    the Conservatives. The fact that both reforms reverse the trend against the Conservatives—which

    is a function of both declining support over time (in the surveys used here) and younger voters

    being more left-wing—further suggests that the posited relationship is not being driven by cohort

    trends.

    Table 2 presents the reduced form and first stage estimates using a simple linear cohort trend.

    Although Figures 4 and 5 indicate that trends in both years of schooling and Conservative support

    are approximately linear,33 the results—as will be demonstrated below—are not sensitive to this

    choice. The first column provides the reduced form estimates, the second column estimates the

    first stage in the BSAS sample, and the third column estimates the years of schooling first stage in the LFS sample.

    32 A linear trend, which fits similarly well, indicates an even larger five percentage point effect.

    33 Note that there is very little data for cohorts born in the 1930s.

    Figure 4: Average years of schooling by birth year cohort (LFS data)

    [Two panels. "1947 reform": x-axis cohort (year aged 14). "1972 reform": x-axis cohort (year aged 15). y-axis: years of schooling.]

    Notes: Black lines represent fractional polynomial regression line fits. Grey dots are cohort averages.

    The reduced form shows that the reforms induced a large and statistically significant increase

    in support for the Conservative party. Cohorts affected by the 1947 reform are six percentage points more

    likely to vote Conservative, while the 1972 reform—which affected fewer students—increased

    Conservative voting by a further 2.5 percentage points. Such large shifts for affected cohorts

    imply that the reforms substantially altered national politics, and could easily have altered the

    outcomes of the close 1970s and 2000s elections. Figure 6 demonstrates that the reduced form

    coefficients are consistent across specifications using higher-order polynomial terms to account for

    more complex trends in Conservative support. However, by averaging across all individuals, these estimates underestimate the impact on individuals who only remained in school because of the reforms. To calculate the effects for such compliers, I turn to the fuzzy RD estimates.

    Figure 5: Proportion conservative by birth year cohort (BSAS data)

    [Two panels. "1947 reform": x-axis cohort (year aged 14). "1972 reform": x-axis cohort (year aged 15). y-axis: vote share.]

    Notes: Black lines represent fractional polynomial regression line fits. Grey dots are cohort averages.

    4.4.2 The effect of schooling on political preferences

    The first stage estimates confirm that both reforms substantially increased schooling. Looking at

    the dummy for completing high school in the BSAS sample, column (2) shows that both the 1947

    and 1972 reforms significantly increased the probability of completing high school. Column (3)

    instead examines years of schooling in the LFS, and similarly shows that the 1947 reform was especially effective at keeping students in school. In both cases, the large F statistic (testing the relevance of the reform dummies) indicates a strong first stage. Table 3 presents the fuzzy RD results, instrumenting for schooling with the compulsory schooling reforms.

    Figure 6: Reduced form estimates using higher-order polynomial controls

    [Five panels: Linear, Quadratic, Cubic, Quartic, Quintic. Each plots the reduced form estimates for CSL=15 and CSL=16 on a y-axis running from -0.05 to 0.2.]

    Notes: Higher-order polynomial specifications include standardized global birth year trends of order p and standardized age trends of order p (excluding linear age because it is perfectly collinear with linear birth year).

    Table 2: The effect of CSLs on schooling and voting for the Conservative party

                            Vote Con    High school   Schooling
                            OLS         OLS           OLS
                            (1)         (2)           (3)
    1947 reform             0.061***    0.176***      0.384***
                            (0.020)     (0.024)       (0.036)
    1972 reform             0.085***    0.275***      0.604***
                            (0.033)     (0.043)       (0.074)
    Sample                  BSAS        BSAS          LFS
    Observations            15,934      15,934        47,552
    First stage F test                  27.6          56.9

    Notes: All specifications include a linear birth year term, male, white, black and south Asian dummies, and survey year fixed effects. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.
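
    As a rough cross-check, each reduced form coefficient in Table 2 can be divided by the corresponding first stage coefficient to give a single-instrument Wald ratio. The sketch below does only this arithmetic; the variable and function names are illustrative, and the ratios are back-of-envelope figures, since the estimates in Table 3 combine both instruments with covariates.

```python
# Coefficients copied from Table 2 (linear birth year trend).
reduced_form = {"1947": 0.061, "1972": 0.085}             # Vote Con, BSAS
first_stage_high_school = {"1947": 0.176, "1972": 0.275}  # High school, BSAS
first_stage_years = {"1947": 0.384, "1972": 0.604}        # Schooling, LFS


def wald(rf, fs):
    """Reduced form divided by first stage, reform by reform."""
    return {reform: rf[reform] / fs[reform] for reform in rf}


wald_high_school = wald(reduced_form, first_stage_high_school)
wald_years = wald(reduced_form, first_stage_years)

for reform in reduced_form:
    print(reform,
          round(wald_high_school[reform], 3),
          round(wald_years[reform], 3))
```

    The high school ratios come out at roughly 0.31 to 0.35 and the years-of-schooling ratios at roughly 0.14 to 0.16, which bracket the combined estimates of 0.332 and 0.152 reported in Table 3. The binary coding yields an estimate about twice as large because its first stage captures only part of the schooling induced by the reforms.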

    I first examine the 2SLS estimates where schooling is discretized. Column (1) shows the Wald

    estimate, and suggest that voters induced to complete high school by the reform are 33 percentage

    points more likely to vote Conservative in later life. The estimated effect is very large by almost any

    standard, but particularly when considering that a large segment of the population are compliers.

    This estimate, however, could suffer from the bias established above: given Table 2 showed a

    significant reduced form effect for the 1947 reform, but the reform did not compel all students to

    complete high school, there is clear scope for upward bias. This concern is even more evident in

    column (2), which uses only the 1947 reform as an instrument (removing those born after 1972). In

    this specification (where the bias is expected to be largest, given that the 1947 reform caused a
    significant proportion of students to also complete high school), the 2SLS estimates imply an
    implausibly large 89 percentage point increase in the probability of voting Conservative.

    Table 3: The effect of schooling on voting

                                      Con        Con        Con        Con        Labour     Liberal
                                      2SLS       2SLS       TS2SLS     TS2SLS     TS2SLS     TS2SLS
                                      (1)        (2)        (3)        (4)        (5)        (6)
    Completed high school             0.332**    0.885***
                                      (0.132)    (0.311)
    10 years of schooling                                   0.124***
                                                            (0.042)
    11 or more years of schooling                           0.237**
                                                            (0.111)
    Years of schooling                                                 0.152***   -0.047     -0.081**
                                                                       (0.054)    (0.046)    (0.031)
    First stage sample                BSAS       BSAS       LFS        LFS
    Reduced form observations         15,934     9,783      15,934     15,934     15,934     15,934
    First stage observations          15,934     9,783      47,552     47,552     47,552     47,552
    First stage F test                27.6       22.3       56.9       56.9       56.9       56.9

    Notes: In each specification, the variables listed on the left side of the table are instrumented for by the indicators for the 1947 and 1972 reforms. All specifications include a linear birth year term, male, white, black and Asian (south and east combined) dummies, and survey year fixed effects. Specification (2) excludes respondents affected by the 1972 reform. While specifications (1)-(4) have Conservative vote as dependent variable, the dependent variables in specifications (5) and (6) are Labour and Liberal vote respectively. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.
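    The mechanism behind this upward bias can be sketched with a stylized calculation. The complier shares and the per-year effect below are hypothetical values chosen for illustration, not estimates from the paper; the function name `wald_estimates` is likewise an assumption. The point is only that the binary first stage omits compliers who gain a year of schooling without finishing high school, while the reduced form still reflects them.

```python
def wald_estimates(per_year_effect, share_9_to_10, share_10_to_11):
    """Stylized Wald ratios under a linear per-year effect (hypothetical inputs).

    share_9_to_10:  compliers moved from 9 to 10 years (do not finish high school)
    share_10_to_11: compliers moved from 10 to 11 years (finish high school)
    """
    # Reduced form: every induced year shifts the outcome by per_year_effect.
    reduced_form = per_year_effect * (share_9_to_10 + share_10_to_11)

    # First stage for years of schooling counts both groups of compliers.
    first_stage_years = share_9_to_10 + share_10_to_11

    # First stage for the indicator 1{S >= 11} counts only the second group.
    first_stage_binary = share_10_to_11

    wald_years = reduced_form / first_stage_years
    wald_binary = reduced_form / first_stage_binary
    return wald_years, wald_binary, wald_binary / wald_years


wald_years, wald_binary, bias_factor = wald_estimates(0.10, 0.30, 0.20)
print(wald_years, wald_binary, bias_factor)
```

    With these made-up inputs the per-year ratio recovers the assumed effect of 0.10, while the binary ratio is 0.25, a factor of 2.5 larger. The factor equals the total complier share divided by the share crossing the binary threshold, so it grows as fewer compliers actually complete high school.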

    The presence of two instruments permits a more precise exploration of any bias. Column (3)

    uses the 1947 and 1972 reforms to instrument for indicators for completing ten years of schooling

    or 11 or more years of schooling. Given that the reforms did not affect attaining nine or fewer years

    of schooling, or more than 11 years of schooling, the coefficients in column (3) non-parametrically

    estimate the effect of an additional year of late high school. For both years, an additional year

    equates to a LATE of 12 percentage points. This shows that, at least at the end of high school, the

    effect of schooling is almost exactly linear. Unsurprisingly, the WAPTE estimate in column (4)

    shows a similar effect for an additional year of schooling.34 Figure 7 again demonstrates that the

    results are not being driven by linear cohort trends, and are in fact highly stable.
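    As a rough consistency check, the point estimates in Table 2 can be turned into instrument-by-instrument Wald ratios. This is only a sketch: it ignores standard errors and is not the estimator used in Table 3, which combines both instruments. With two instruments, the 2SLS and TS2SLS estimates are expected to lie between the two single-instrument ratios, and the helper name `wald_ratios` is an illustrative assumption.

```python
# Point estimates copied from Table 2 (standard errors are ignored here).
reduced_form = {"1947": 0.061, "1972": 0.085}             # Vote Con, BSAS
first_stage_high_school = {"1947": 0.176, "1972": 0.275}  # High school, BSAS
first_stage_years = {"1947": 0.384, "1972": 0.604}        # Years of schooling, LFS


def wald_ratios(rf, fs):
    """Reduced form divided by first stage, separately for each reform."""
    return {reform: rf[reform] / fs[reform] for reform in rf}


binary_ratios = wald_ratios(reduced_form, first_stage_high_school)
years_ratios = wald_ratios(reduced_form, first_stage_years)

for reform in reduced_form:
    print(reform, round(binary_ratios[reform], 3), round(years_ratios[reform], 3))
```

    The single-instrument ratios bracket the reported estimates: the completed high school estimate of 0.332 in column (1) of Table 3 falls between the two binary ratios, and the years of schooling estimate of 0.152 in column (4) falls between the two per-year ratios.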

    The non-parametric and linear results suggest that the dummy for completing high school sub-

    stantially overstates the political effect of the final year of high school. The bias more than doubles

    the true LATE for the final year of school when examining both instruments, but increases sixfold

    when focusing only on the 1947 reform. While these results are clearly biased in terms of magni-

    tude, our more careful analysis nevertheless shows that late high school causes voters to become

    substantially more conservative in later life. Reinforcing results from the U.S. (Marshall 2014), this

    evidence is consistent with schooling’s economic effects dominating any effects working through

    socially liberal attitudes.
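    The stated bias factors can be checked against the point estimates in Table 3. This assumes the comparison is between the completed high school estimates in columns (1) and (2) and the per-year estimate in column (4):

```latex
\[
  \frac{0.332}{0.152} \approx 2.2
  \qquad\text{and}\qquad
  \frac{0.885}{0.152} \approx 5.8 .
\]
```

    These ratios match the "more than doubles" and "sixfold" descriptions in the text.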

    Given Britain has had three main political parties throughout the survey period analyzed here, it is

    not obvious which party primarily loses votes to the Conservatives. Specifications (5) and (6)

    respectively use Labour and Liberal vote indicators as dependent variables, and show that school-

    ing decreases the probability of voting for both parties. The reduction is largest, and statistically

    significant, for the Liberal Democrats.

    34 The point estimate differs because the first stage for other levels of schooling is not exactly zero.

    [Figure 7 plot omitted. Rows: Linear, Quadratic, Cubic, Quartic, Quintic. Horizontal axis: marginal effect of years of schooling, from 0 to .4.]

    Figure 7: TS2SLS estimates using higher-order polynomial controls

    Notes: Higher-order polynomial specifications include standardized global birth year trends of order p and standardized age trends of order p (excluding linear age because it is perfectly collinear with linear birth year).

    4.4.3 Robustness checks

    Beyond the polynomial cohort trends, I now show that the reduced form and TS2SLS estimates

    are highly robust to a variety of potential threats to the identification assumptions. All robustness

    checks are reported in Table 4.

    Although Figure 3 above showed that trends in plausible confounders are continuous through

    the 1947 and 1972 reform discontinuities, I also control for the unemployment rate and average

    earnings in column (1) of Table 4 and find that the effect if anything increases. To more thoroughly

    demonstrate that age is not driving the results, column (2) shows that the results are robust to

    including age fixed effects.

    I also employ several out-of-sample checks. First, column (3) in Table 4 similarly shows that an
    additional year of late high school increases the likelihood of identifying as a Conservative
    partisan by 12 percentage points. This shows that survey respondents are responding consistently
    when asked about their political preferences. Second, very similar results are obtained when
    linking the British Election Study (BES) with an LFS first stage.35 In terms of both voting and
    partisan identification, columns (4) and (5) clearly show a substantively similar increase in
    Conservative political preference.36

    Table 4: Robustness checks

                                  Controls      Age dummies   Partisan      BES vote      BES partisan
                                  (1)           (2)           (3)           (4)           (5)
    Panel A: Reduced form estimates
    1947 reform                   0.084***      0.059**       0.045**       0.072***      0.067***
                                  (0.023)       (0.022)       (0.023)       (0.014)       (0.012)
    1972 reform                   0.110***      0.080***      0.067**       0.082***      0.086***
                                  (0.031)       (0.033)       (0.033)       (0.023)       (0.024)
    Panel B: TS2SLS estimates
    Years of schooling            0.223***      0.153**       0.115**       0.148***      0.133***
                                  (0.072)       (0.061)       (0.057)       (0.029)       (0.025)
    First stage F test            28.9          67.1          56.9          98.9          98.9
    Reduced form observations     15,934        15,934        15,934        14,105        13,765
    First stage observations      47,552        47,552        47,552        49,016        49,016

    Notes: All specifications include a linear birth year term, male, white, black and Asian (south and east combined) dummies, and survey year fixed effects. Specification (1) includes the national unemployment rate and average earnings index at age 14 as controls. Specification (2) includes a full set of age dummies. Specification (3) takes Conservative partisanship as an indicator dependent variable. Specifications (4) and (5) use the BES data with Conservative voting and partisanship as dependent variables; a different LFS sample is used to match the BES distribution. Standard errors clustered by cohort. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

    5 Conclusion

    This article addresses an important issue frequently faced by empirical researchers using instru-

    mental variable techniques: good (or any) measures of both the outcome and treatment of interest

    may not be available in the same dataset. While lacking the outcome or treatment variable en-

    tirely may force researchers to abandon their project, using a coarsened measure of a multi-valued

    treatment intensity can substantially bias estimates. As estimates of the effect of high school on

    political preferences demonstrated, this bias is especially large when the causal response function

    is not discontinuous and the instrument induces different respondents to achieve different treatment

    intensities.

    Two-sample IV methods can solve these missing data problems. Two samples can be combined—

    if they draw from the same population—even when the treatment is not measured in the same

    dataset as the outcome. This allows researchers to estimate quantities that a single dataset could

    not, but can also provide consistent estimates of quantities that might otherwise be substantially

    biased. In this article, I outline a general approach to implementing two-sample methods. In

    particular, I highlight the assumptions required to ensure the consistency of the TS2SLS estimator

    as well as a range of formulas for calculating standard errors that adjust for the uncertainty of the

    first stage estimation.

    35 The first stage sample differs from that used with the BSAS data to better match the BES sample. In particular, the LFS first stage sample draws only upon LFS samples from the relevant election years and matches the BES sample characteristics.

    36 There is a similarly large bias when using the dummy for completing high school in the BES data.

  • These two-sample methods are applied to the question of how education affects political pref-

    erences. More specifically, I show that an additional year of late high school significantly increases

    downstream support for Britain’s Conservative party. Exploiting two major educational reforms,

    the fuzzy regression discontinuity estimates indicate that an additional year of schooling causes a 15

    percentage point increase in the probability of voting Conservative later in life. These large effects

    are “local” in that they only apply to students that would not have remained in school without the

    reforms—albeit a large proportion of the population—and are specific to late high school. While

    Marshall (2014) provides clear evidence of an income mechanism in the U.S., this important re-

    lationship requires further research. It is also possible that university education instills liberal

    attitudes that counteract schooling’s effects.


  • Appendix

    Proof of Proposition 1. Angrist and Imbens (1995) prove that the exclusion restriction and
    monotonicity yield equation (3). Recognizing $\beta_k = E[Y_{ik} - Y_{ik-1} \mid S_{i1} \geq k > S_{i0}]$
    yields equation (3). Because $p_{it} \geq 0$,
    $\mathrm{sign}(\beta_k) = \mathrm{sign}(E[Y_{it} - Y_{it-1} \mid S_{i1} \geq t > S_{i0}])$, $\forall t \neq k$
    where $p_{it} > 0$ ensures $|\beta_k| \leq |\beta^{W}_{k}|$. ∎

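    Proof of Proposition 2. Note

    \[
      \beta^{W,J}_{WAPTE} = \frac{\sum_{t=1}^{J} p_{it}\,\beta^{J}_{t}}{\sum_{t=1}^{J} p_{it}} = \tau
      \quad\text{and}\quad
      \beta^{W,\alpha J}_{WAPTE} = \frac{\sum_{t=1}^{\alpha J} p_{it}\,\beta^{\alpha J}_{t}}{\sum_{t=1}^{\alpha J} p_{it}} = \tau/\alpha,
    \]

    where the linearity of the causal effect at each intensity interval implies
    $\alpha\beta^{\alpha J}_{t} = \beta^{J}_{t}$. The result follows. ∎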

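    Proof of Proposition 3. Substituting for $Y_1$ yields:

    \begin{equation}
      \hat{\beta}^{TS2SLS} = (\hat{X}_1'\hat{X}_1)^{-1}\hat{X}_1'X_1\beta + (\hat{X}_1'\hat{X}_1)^{-1}\hat{X}_1'u_1. \tag{19}
    \end{equation}

    Dividing top and bottom of each term by $n_1$, taking the probability limit and applying Slutsky’s
    theorem yields:

    \begin{align}
      \operatorname*{plim}_{n_1\to\infty} \hat{\beta}^{TS2SLS}
        &= \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'X_1\right)\beta \notag\\
        &\quad + \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'u_1\right). \tag{20}
    \end{align}

    To prove consistency we require i) the first term to equal $\beta$ and ii) the second term to be 0.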

    i). First note that Slutsky’s theorem implies:

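    \begin{align}
      \operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1
        &= \operatorname*{plim}_{n_1\to\infty}\left(\frac{1}{n_1} X_2'Z_2(Z_2'Z_2)^{-1}Z_1'Z_1(Z_2'Z_2)^{-1}Z_2'X_2\right) \tag{21}\\
        &= \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}X_2'Z_2\right)
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_2'Z_2\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_1'Z_1\right) \notag\\
        &\quad\times \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_2'Z_2\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_2'X_2\right). \tag{22}
    \end{align}

    Applying the weak law of large numbers and then Assumptions 5(a) and 5(b) yields:

    \begin{align}
      \operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1
        &= E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_1'Z_1]\,E[Z_2'Z_2]^{-1}E[Z_2'X_2] \tag{23}\\
        &= E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_2'X_2] \tag{24}\\
        &= E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_1'X_1]. \tag{25}
    \end{align}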

    Similarly,

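    \begin{align}
      \operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'X_1
        &= \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}X_2'Z_2\right)
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_2'Z_2\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_1'X_1\right) \tag{26}\\
        &= E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_1'X_1] \tag{27}\\
        &= \operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1. \tag{28}
    \end{align}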

    Given the rank condition in Assumption 4(a), this proves part i).

    ii). Substituting out and applying the weak law of large numbers gives:

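    \begin{align}
      \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}
      \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'u_1\right)
        &= \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}X_2'Z_2\right) \notag\\
        &\quad\times \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_2'Z_2\right)^{-1}
           \left(\operatorname*{plim}_{n_1\to\infty} \frac{1}{n_1}Z_1'u_1\right) \tag{29}\\
        &= \left(E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_1'X_1]\right)^{-1}
           E[X_2'Z_2]\,E[Z_2'Z_2]^{-1}E[Z_1'u_1] \tag{30}\\
        &= 0, \tag{31}
    \end{align}

    where the final line follows from Assumption 3, as well as the full rank and finite moment
    assumptions. ∎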
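    The consistency argument can be illustrated with a small simulation. This is a minimal sketch under assumed values: the data-generating process, sample sizes, coefficient values and the function name `simulate_ts2sls` are all hypothetical, and it uses a single binary instrument rather than the paper's specification. The first stage is estimated in sample 2, fitted treatment values are formed in sample 1 using sample 1's instrument, and the outcome is then regressed on those fitted values.

```python
import numpy as np


def simulate_ts2sls(n1=400_000, n2=400_000, beta=0.15, seed=0):
    """Return (TS2SLS slope, confounded OLS slope) from simulated data."""
    rng = np.random.default_rng(seed)

    def draw(n):
        z = rng.integers(0, 2, size=n).astype(float)  # binary instrument
        v = rng.normal(size=n)                         # unobserved confounder
        x = 10.0 + 0.5 * z + v + rng.normal(size=n)    # treatment, e.g. years of schooling
        y = 1.0 + beta * x + v + rng.normal(size=n)    # outcome, confounded through v
        return z, x, y

    # Both samples are drawn from the same population (same data-generating process).
    z1, x1, y1 = draw(n1)  # sample 1: only the outcome and instrument are used
    z2, x2, _ = draw(n2)   # sample 2: only the treatment and instrument are used

    Z1 = np.column_stack([np.ones(n1), z1])
    Z2 = np.column_stack([np.ones(n2), z2])

    # First stage in sample 2: Pi_hat = (Z2'Z2)^{-1} Z2'X2.
    pi_hat, *_ = np.linalg.lstsq(Z2, x2, rcond=None)

    # Fitted treatment in sample 1, then regress the outcome on it.
    X1_hat = np.column_stack([np.ones(n1), Z1 @ pi_hat])
    b_ts2sls, *_ = np.linalg.lstsq(X1_hat, y1, rcond=None)

    # For comparison only: OLS on the true treatment, which is biased by v.
    X1 = np.column_stack([np.ones(n1), x1])
    b_ols, *_ = np.linalg.lstsq(X1, y1, rcond=None)

    return b_ts2sls[1], b_ols[1]


ts2sls_slope, ols_slope = simulate_ts2sls()
print(ts2sls_slope, ols_slope)
```

    In this setup the TS2SLS slope should be close to the assumed value of 0.15, while OLS on the true treatment is pushed well above it by the confounder. The treatment in sample 1 is never used by the two-sample estimator, which mirrors the missing-data setting the proposition addresses.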


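    Proof of Proposition 4. Start by separating $\hat{X}$ into its endogenous and exogenous components,

    \begin{equation}
      Y_{i1} = X_{i1}\beta_{-S} + T_{i1}\beta_S + u_i
             = X_{i1}\beta_{-S} + \hat{T}_{i1}\beta_S + [T_{i1} - \hat{T}_{i1}]\beta_S + u_i, \tag{32}
    \end{equation}

    where $\hat{T}_{i1} = Z_{i1}\hat{\Pi} = Z_{i1}(Z_2'Z_2)^{-1}Z_2'T_2$ is the predicted value of the
    treatment using the first stage estimates, and $T_{i1}$ is the true and unobserved treatment in
    sample 1. An OLS regression would yield:

    \begin{equation}
      \sqrt{n_1}\begin{pmatrix} \hat{\beta}_{-T} - \beta_{-S} \\ \hat{\beta}_S - \beta_S \end{pmatrix}
        = \left(\frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}\frac{1}{\sqrt{n_1}}\hat{X}_1'u_1
        + \left(\frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}\frac{1}{\sqrt{n_1}}\hat{X}_1'[T_{i1} - \hat{T}_{i1}]\beta_S, \tag{33}
    \end{equation}

    where subscripts $i$ and superscripts $TS2SLS$ are omitted to save space. Using the expansion
    result in Murphy and Topel (1985: 374) yields:

    \begin{align}
      \sqrt{n_1}(\hat{\beta} - \beta)
        \equiv \sqrt{n_1}\begin{pmatrix} \hat{\beta}_{-T} - \beta_{-S} \\ \hat{\beta}_S - \beta_S \end{pmatrix}
        &\overset{a}{=} \left(\frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}\frac{1}{\sqrt{n_1}}\hat{X}_1'u_1 \notag\\
        &\quad + \left(\frac{1}{n_1}\hat{X}_1'\hat{X}_1\right)^{-1}\left(\frac{n_1}{n_2}\right)^{1/2}
           \frac{1}{n_1}\hat{X}_1'(\hat{\beta}_T' \otimes Z_1)\sqrt{n_2}(\hat{\Pi} - \Pi), \tag{34}
    \end{align}

    where $(\hat{\beta}_T' \otimes Z_1)$ is the matrix defined in equation (12) of Murphy and Topel (1985).

    Let $\hat{\Pi}$ be a consistent estimator of the first stage for the endogenous variables, such that
    $\sqrt{n_2}(\hat{\Pi} - \Pi) \overset{a}{\sim} N(0, V(\Pi))$. Using our consistent first stage
    estimate, the asymptotic variance is therefore given by:

    \begin{equation}
      V(\hat{\beta} - \beta) = E[\hat{X}_1'\hat{X}_1]^{-1}\left[ V[\beta]
        + \frac{n_1}{n_2} E[\hat{X}_1'(\hat{\beta}_T' \otimes Z_1)]^{-1}\, V[\Pi]\, E[(\hat{\beta}_T' \otimes Z_1)'\hat{X}_1]^{-1} \right]
        E[\hat{X}_1'\hat{X}_1]^{-1}, \tag{35}
    \end{equation}

    where $V[\beta]$ is the variance of the naive TS2SLS estimator. (Note that $E[\hat{X}_1'u_1] = 0$,
    in conjunction with a consistent first stage, implies the consistency of the estimator.)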

    This establishes the general asymptotic variance formula in Proposition 4. We now apply the
    homoskedastic and cluster-robust error structures:

    1) Homoskedastic errors. Under homoskedasticity, the naive variance from the TS2SLS regression is
    simply $\sigma_u^2(\hat{X}_1'\hat{X}_1)^{-1}$. To correct for the first stage estimation, we have:

    \begin{align}
      \hat{X}_1'(\hat{\beta}_T' \otimes Z_1)\hat{V}(\hat{\Pi})(\hat{\beta}_T' \otimes Z_1)'\hat{X}_1
        &= \hat{X}_1'(\hat{\beta}_T' \otimes Z_1)(\Omega \otimes (Z_1'Z_1)^{-1})(\hat{\beta}_T' \otimes Z_1)'\hat{X}_1 \tag{36}\\
        &= \hat{X}_1'(\hat{\beta}_T'\Omega\hat{\beta}_T \otimes Z_1(Z_1'Z_1)^{-1}Z_1')\hat{X}_1 \tag{37}\\
        &= \hat{\beta}_T'\Omega\hat{\beta}_T\,(\hat{X}_1'\hat{X}_1), \tag{38}
    \end{align}

    where the first line uses the definitions of homoskedasticity given in the proposition, the second
    line applies the mixed product property of Kronecker products, and the third line exploits
    $Z_1(Z_1'Z_1)^{-1}Z_1'\hat{X}_1 = \hat{X}_1$ (because all exogenous variables are contained in both
    $\hat{X}_1$ and $Z_1$) and the fact that $\hat{\beta}_T'\Omega\hat{\beta}_T$ is a scalar.
    Substituting into the general variance matrix yields the homoskedastic variance formula in
    Proposition 4.

    2) Clustered errors. In the clustered case, we simply let
    $V(\hat{\Pi}) = \frac{G_2}{G_2 - 1}\Phi \otimes E[Z_2'Z_2]^{-1}$. ∎
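    The Kronecker step from (36) to (37) can be checked numerically. The sketch below uses small random matrices with hypothetical dimensions (k endogenous coefficients, q instruments, n observations) and verifies that the triple product collapses to a scalar times the projection matrix onto the columns of Z. The variable names are illustrative assumptions, not notation taken from Proposition 4.

```python
import numpy as np

rng = np.random.default_rng(1)
k, q, n = 2, 3, 6                        # hypothetical dimensions

beta = rng.normal(size=(k, 1))           # stands in for beta_T (k x 1)
Z = rng.normal(size=(n, q))              # stands in for Z_1 (n x q)
M = rng.normal(size=(k, k))
Omega = M @ M.T                          # symmetric positive semi-definite (k x k)
ZtZ_inv = np.linalg.inv(Z.T @ Z)

# Left-hand side: (beta' kron Z)(Omega kron (Z'Z)^{-1})(beta' kron Z)'
K = np.kron(beta.T, Z)                   # n x (k*q)
lhs = K @ np.kron(Omega, ZtZ_inv) @ K.T

# Right-hand side: (beta' Omega beta) kron Z(Z'Z)^{-1}Z', where the first factor is a scalar
scalar = (beta.T @ Omega @ beta).item()
projection = Z @ ZtZ_inv @ Z.T
rhs = scalar * projection

print(np.allclose(lhs, rhs))
```

    Because the first Kronecker factor is 1 x 1, the product reduces to a scalar multiple of the projection matrix, which is what allows line (38) to pull the scalar outside the quadratic form.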


  • References

    Abrams, Samuel, Torben Iversen and David Soskice. 2010. “Informal Social Networks and Ratio-

    nal Voting.” British Journal of Political Science 41:229–257.

    Acemoglu, Daron and Joshua D. Angrist. 2000. “How Large Are Human Capital Externalities?

    Evidence from Compulsory Schooling Laws.” NBER Macroeconomics Annual 2000 pp. 9–59.

    Angrist, Joshua D. and Alan B. Krueger. 1991. “Does Compulsory School Attendance Affect

    Schooling and Earnings?” Quarterly Journal of Economics 106(4):979–1014.

    Angrist, Joshua D. and Alan B. Krueger. 1992. “The Effect of Age at School Entry on Educational

    Attainment: An Application of Instrumental Variables with Moments from Two Samples.” Jour-

    nal of the American Statistical Association 87(418):328–336.

    Angrist, Joshua D. and Alan B. Krueger. 1995. “Split-sample instrumental variables estimates of

    the return to schooling.” Journal of Business and Economic Statistics 13(2):225–235.

    Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average

    Causal Effects in Models With Variable Treatment Intensity.” Journal of the American Statistical

    Association 90(430):431–442.

    Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects

    Using Instrumental Variables.” Journal of the American Statistical Association 91(June):444–

    455.

    Angrist, Joshua D. and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist’s
    Companion. Princeton, NJ: Princeton University Press.

    Becker, Gary S. 1993. Human Capital: A Theoretical and Empirical Analysis, with Special Refer-

    ence to Education. University of Chicago Press.


  • Bound, John, David A. Jaeger and Regina M. Baker. 1995. “Problems with instrumental vari-

    ables estimation when the correlation between the instruments and the endogenous explanatory

    variable is weak.” Journal of the American Statistical Association 90(430):443–450.

    Bowles, Samuel and Herbert Gintis. 1976. Schooling in Capitalist America: Educational reform

    and the Contradictions of Economic Life. Chicago, IL: Haymarket Books.

    Clark, Damon and Heather Royer. 2013. “The Effect of Education on Adult Mortality and Health:

    Evidence from Britain.” American Economic Review 103(6):2087–2120.

    Dee, Thomas S. 2004. “Are there civic returns to education?” Journal of Public Economics

    88:1697–1720.

    Devereux, Paul J. and Robert A. Hart. 2010. “Forced to be Rich? Returns to Compulsory Schooling

    in Britain.” Economic Journal 120:1345–1364.

    Gelman, Andrew, David Park, Boris Shor, Joseph Bafumi and Jeronimo Cortina. 2010. Red State, Blue

    State, Rich State, Poor State: Why Americans Vote the Way They Do. Princeton, NJ: Princeton

    University Press.

    Gerber, Alan S., Gregory A. Huber, David Doherty, Conor M. Dowling and Shang E. Ha. 2010.

    “Personality and Political Attitudes: Relationships Across Issue Domains and Political Con-

    texts.” American Political Science Review 104(01):111–133.

    Gillard, Derek. 2011. “Education in England: A Brief History.” http://www.educationengland.org.uk/history/

    Goldin, Claudia D. and Lawrence F. Katz. 2008. The Race Between Education and Technology.
    Cambridge, MA: Harvard University Press.

    Grenet, Julien. 2013. “Is Extending Compulsory Schooling Alone Enough to Raise Earnings?
    Evidence from French and British Compulsory Schooling Laws.” Scandinavian Journal of Economics
    115(1):176–210.

  • Harmon, Colm and Ian Walker. 1995. “Estimates of the Economic Return to Schooling for the

    United Kingdom.” American Economic Review 85(5):1278–1286.

    Heath, Anthony, Roger Jowell, John Curtice, Julia Field and Clarissa Levine. 1985. How Britain

    Votes. Pergamon Press Oxford.

    Honaker, James and Gary King. 2010. “What to Do about Missing Values in Time-Series Cross-

    Section Data.” American Journal of Political Science 54(2):561–581.

    Imbens, Guido W. and Joshua D. Angrist. 1994. “Identification and Estimation of Local Average

    Treatment Effects.” Econometrica 62(2):467–475.

    Inglehart, Ronald. 1981. “Post-Materialism in an Environment of Insecurity.” American Political

    Science Review 75(4):880–900.

    Inoue, Atsushi and Gary Solon. 2005. “Two-Sample Instrumental Variables Estimators.”

    Inoue, Atsushi and Gary Solon. 2010. “Two-Sample Instrumental Variables Estimators.” Review

    of Economics and Statistics 92(3):557–561.

    Iversen, Torben and David Soskice. 2001. “An Asset Theory of Social Policy Preferences.” Amer-

    ican Political Science Review 95(4):875–894.

    Kam, Cindy D. and Carl L. Palmer. 2008. “Reconsidering the Effects of Education on Political

    Participation.” The Journal of Politics 70(3):612–631.

    King, Gary, James Honaker, Anne Joseph and Kenneth Scheve. 2001. “Analyzing Incomplete

    Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political

    Science Review 95(1):49–69.

    Lochner, Lance and Enrico Moretti. 2004. “The Effect of Education on Crime: Evidence from

    Prison Inmates, Arrests, and Self-Reports.” American Economic Review 94(1):155–189.


  • Marshall, John. 2014. “Learning to be conservative: How staying in high school changes political

    preferences in the United States and Great Britain.” Working paper.

    Meltzer, Allan H. and Scott F. Richard. 1981. “A rational theory of the size of government.”

    Journal of Political Economy 89:914–927.

    Milligan, Kevin, Enrico Moretti and Philip Oreopoulos. 2004. “Does education improve citizen-

    ship? Evidence from the United States and the United Kingdom.” Journal of Public Economics

    88:1667–1695.

    Mincer, Jacob. 1974. Schooling, Experience, and Earnings. New York: Columbia University

    Press.

    Moene, Karl O. and Michael Wallerstein. 2001. “Inequality, social insurance, and redistribution.”

    American Political Science Review pp. 859–874.

    Murphy, Kevin M. and Robert H. Topel. 1985. “Estimation and Inference in Two-Step Econometric
    Models.” Journal of Business and Economic Statistics 3(4):370–379.

    Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education

    when Compulsory Schooling Laws Really Matter.” American Economic Review 96(1):152–175.

    Schoon, Ingrid, Helen Cheng, Catharine R. Gale, G. David Batty and Ian J. Deary. 2010. “Social

    status, cognitive ability, and educational attainment as predictors of liberal social attitudes and

    political trust.” Intelligence 38(1):144–150.

    Sondheimer, Rachel M. and Donald P. Green. 2010. “Using Experiments to Estimate the Effects
    of Education on Voter Turnout.” American Journal of Political Science 54(1):174–189.

    Sovey, Allison J. and Donald P. Green. 2011. “Instrumental variables estimation in political sci-

    ence: A readers’ guide.” American Journal of Political Science 55(1):188–200.


  • Spence, Michael. 1973. “Job market signaling.” Quarterly Journal of Economics 87(3):355–374.

    Staiger, Douglas and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instru-

    ments.” Econometrica 65(3):557–586.

    Thomassen, Jacques J.A. 2005. The European Voter: A Comparative Study of Modern Democra-

    cies. Oxford: Oxford University Press.

    Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. “Raising the participation age in

    historical perspective: policy learning from the past?” British Educational Research Journal

    39(4):635–653.
