Grouped effects estimators in fixed effects models

Accepted Manuscript Grouped effects estimators in fixed effects models C. Alan Bester, Christian B. Hansen PII: S0304-4076(13)00203-0 DOI: http://dx.doi.org/10.1016/j.jeconom.2012.08.022 Reference: ECONOM 3832 To appear in: Journal of Econometrics Received date: 15 May 2009 Revised date: 23 August 2012 Accepted date: 27 August 2012 Please cite this article as: Bester, C.A., Hansen, C.B., Grouped effects estimators in fixed effects models. Journal of Econometrics (2013), http://dx.doi.org/10.1016/j.jeconom.2012.08.022 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Grouped Effects Estimators in Fixed Effects Models

C. Alan Bester and Christian B. Hansen∗

First Draft: February 2009 This Draft: October 2011

Abstract.

We consider estimation of nonlinear panel data models with common and individual specific parameters. Fixed effects estimators are known to suffer from the incidental parameters problem, which can lead to large biases in estimates of common parameters. Pooled estimators, which ignore heterogeneity across individuals, are also generally inconsistent. We assume that individuals in the data are grouped on multiple levels where groups are defined by some observable external classification. We consider “group effects” estimators, where individual specific parameters are assumed common across groups at some level. We provide conditions under which group effects estimates of common parameters are asymptotically unbiased and normal. The conditions suggest a tradeoff between two sources of bias, one due to incidental parameters and the other due to misspecification of unobserved heterogeneity.

Keywords: Fixed Effects, Panel Data, Hierarchical Models

JEL Codes: C10, C13, C23

1. Introduction

Panel data is widely used in empirical economics. Such data allows researchers to control for unobservable, time invariant individual-level heterogeneity that, according to economic theory, may be related to covariates of interest. Examples include unobserved, household-specific willingness to pay for a product, which may be correlated with income, and unobserved firm-specific policies that may be related to capital structure.

This paper considers settings where individuals may be grouped at different levels. For example, students may be grouped into classes, grades, schools, and districts; or firms may be grouped according to 1-digit, 2-digit, 3-digit, and 4-digit SIC codes and further broken into sub-groups within 4-digit SIC based on quintiles of their size and market-to-book ratios. We consider “grouped effects” estimators, which estimate model parameters treating individual specific effects as if they are constant within groups at a particular level. Our grouped effects estimators require assumptions about the distribution of unobserved heterogeneity and thus should broadly be classified as random effects. However, the assumptions employed are very different from those employed in typical random effects estimators. We neither assume independence between observables and unobservables nor impose a parametric form for the distribution of unobserved effects. This style of random effects model does not require integrating over unobserved effects distributions, and so is computationally simple and readily implemented in standard econometrics software. Since the groupings are formed on the basis of observables and no integration is required in their implementation, these grouped effects estimators may be thought of as a natural random effects structure that is intermediate to pooled and fixed effects estimators.

∗ The University of Chicago, Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA.

We suppose that the model to be estimated is known up to a finite dimensional common parameter and another finite dimensional parameter that may be specific to each individual but is assumed constant over time. Broadly speaking, there are two approaches to estimating these models. In a linear model, individual specific effects are often treated as parameters to be estimated, an approach referred to as fixed effects estimation. Using fixed effects allows researchers to make inference about common parameters while placing very little structure on the distribution of unobservable heterogeneity. However, this approach may be problematic in nonlinear or dynamic models. As noted by Neyman and Scott (1948), noise in the estimation of individual level effects when the time dimension of the panel is short will in general contaminate estimates of the common parameters, a phenomenon generally referred to as the incidental parameters problem.

We consider an asymptotic sequence where $N$ and $T$ go to infinity jointly. We show that grouped effects estimates suffer from two sources of bias. The first is due to incidental parameters and is of order $G_{NT}/(NT)$, where $G_{NT}$ is the number of groups used in a sample of size $NT$ and $G_{NT} \to \infty$ as $N$ and $T$ increase. This order of bias arises since each group-level parameter is estimated using approximately $NT/G_{NT}$ observations when approximately equal sized groups are used. This incidental parameters bias decreases as $G_{NT}$ decreases. The second bias arises from model misspecification, in the sense that individual specific heterogeneity is incorrectly assumed to be constant within groups of individuals, and decreases as the total number of groups, $G_{NT}$, increases. We provide conditions on the sampling scheme and the behavior of unobservables within groups such that the group effects estimator is asymptotically unbiased and normal. These conditions suggest a tradeoff between the two sources of bias. Unsurprisingly, the key conditions involve restricting the rate at which $G_{NT} \to \infty$ and assuming the error from treating individual-specific heterogeneity as constant within each of the $G_{NT}$ groups goes to zero quickly as $G_{NT}$ increases. Satisfying the second condition necessarily involves placing homogeneity restrictions on unobserved heterogeneity. These results will be most useful in cases where researchers do not have perfect information on how individuals are grouped but have strong ex ante beliefs about potential grouping structures.

Recently a number of papers have studied econometric properties of fixed effects (hereafter FE) estimators, with the explicit aim of characterizing biases arising from the incidental parameters problem. See, e.g., Lancaster (2002), Hahn and Kuersteiner (2002, 2004), Arellano (2003), Hahn and Newey (2004), Woutersen (2005), Fernandez-Val (2005), Carro (2006), and Bester and Hansen (2007). These papers work with asymptotic sequences where $N$, the number of individuals, and $T$, the number of time periods in the panel, both go to infinity, so that individual specific parameters are consistently estimable. However, they show that for nonlinear models, estimation of individual level effects introduces biases in the common parameters of order $1/T$, implying that fixed effects estimators will generally perform badly when $T$ is small. These papers propose to estimate the $1/T$ bias directly and remove the estimated bias from common parameters and other objects of interest (e.g., marginal effects). In simulations, these bias corrections provide dramatic MSE improvements over the uncorrected fixed effects estimator in moderate-$T$ panels. However, these bias corrected estimators may still perform badly when the time dimension is short.

There are two other approaches to dealing with unobserved heterogeneity without restricting the distribution of unobservables. One approach is to find an estimator of common parameters that does not depend on unobserved effects as in Anderson (1970), Manski (1987), Honore (1992), Honore and Kyriazidou (2000), several examples discussed in Wooldridge (2002), and Bonhomme (2010). These approaches are interesting, but their applicability must be established on a case-by-case basis. In addition, interesting parameters, such as marginal effects averaged over the unobserved effects distribution, may be left unidentified. The other approach is to consider set-identification of parameters with fixed $T$ and unrestricted distributions as in Honore and Tamer (2006), Chernozhukov, Hong, and Tamer (2004), and Chernozhukov, Fernandez-Val, Hahn, and Newey (2009). The set-identification procedures apply quite generally but tend to be computationally more burdensome than fixed effects procedures, with the complexity increasing as the number of parameters in the model increases. Consistent with the small amount of structure they impose, they may also produce very wide bounds.

Another common approach, termed broadly as random effects, places restrictions on the distribution of unobserved heterogeneity, either assuming independence between observables and unobservables, or assuming unobservables are drawn from a distribution defined up to a finite dimensional parameter, or both. An extreme example is a pooled estimator, which ignores heterogeneity entirely. The best known random effects assumption is independence between observables and unobservables; see Hausman and Taylor (1981), Wooldridge (2002), Honore and Lewbel (2002), Lewbel (2005), Lin and Carroll (2000), Ullah and Roy (1998), and Henderson and Ullah (2005). Another common restriction is that unobservables depend upon observables only through a linear index, as in Mundlak (1978), Chamberlain (1980), and Wooldridge (2005). Models where the distribution of unobservables is parametric are an important special case of hierarchical models, discussed by Lindley and Smith (1972) and Raudenbush and Bryk (2002), which have a long tradition in Bayesian statistics. Several more recent papers, including Chen and Khan (2007), Gayle and Viauroux (2007), and Bester and Hansen (2008), exploit index restrictions to obtain identification of common parameters and marginal effects of covariates with panel data in very general semi- and nonparametric settings.

When the assumptions they make about the distribution of unobserved heterogeneity are satisfied, random effects estimators can perform extremely well even in very short-$T$ panels. Unfortunately, economic theory often implies dependence between observed quantities and unobservables and rarely suggests a parametric form for this dependence. In random effects approaches, misspecifying the distribution of unobservables may result in inconsistency of estimates for common parameters.² Random effects estimators that involve integration over a specified distribution of unobservables are also often computationally burdensome except in very simple cases. Fixed effects approaches are therefore often preferred in empirical applications despite their potentially poor finite sample properties.

Our approach is related to the panel structure model and estimator proposed by Sun (2005), who also studies a panel data model based on grouping individuals. Sun (2005) assumes the model is linear with Gaussian errors and that individuals are perfectly classified within a finite number of groups but treats group membership as unobserved to the researcher. Our approach applies to general nonlinear models and does not rely on any grouping perfectly classifying individuals in finite samples, though we do require that the group structure is “rich enough” that grouping errors are small relative to sampling variation. The idea we exploit is also related to Heckman and Singer (1984), who assume a discrete distribution for unobserved heterogeneity with a finite number of support points, and Browning and Carro (2009), who work with dynamic binary choice models and consider models of heterogeneity that place minimal to no restrictions on the data.

The next section of the paper defines the models and estimators we are interested in and provides examples of restrictions on unobserved heterogeneity that could satisfy our assumptions. Section 3 states assumptions and provides asymptotic theory for group effects estimators. Section 4 concludes. Statements of several technical lemmas, further details on our asymptotic expansions, and all proofs are collected in the appendix.

2. Model and Examples

Denote observed data as $w_{it}$, where $i = 1, \ldots, N$ indexes individuals and $t = 1, \ldots, T_i$ indexes time. We consider a panel data model whose parameters are estimated by maximizing the sample objective function

$$Q_{NT}(\theta, \alpha_1, \ldots, \alpha_N) = \frac{1}{\sum_i T_i} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varphi(w_{it}, \theta, \alpha_i);$$

i.e., the model is known up to a finite dimensional common parameter, $\theta$, and a set of individual specific parameters $\alpha_i$. We develop all formal results for general M-estimators and unbalanced panels; but, for simplicity, we present the heuristics of the problem in this section for the case $\varphi = \log f$ where $f$ is a density function with respect to some measure and $f(w, \theta_0, \alpha_{i0})$ is the p.d.f. of $w_{it}$, $\alpha_i$ is a scalar, and the panel is balanced, $T_i \equiv T$ so $\sum_i T_i = NT$. The fixed effects (hereafter FE) maximum likelihood estimator is then defined as

$$\left(\hat\theta_{FE}, \{\hat\alpha_i\}_{i=1}^{N}\right) = \arg\max_{\theta,\, \{\alpha_i\}_{i=1}^{N}} \frac{1}{N} \sum_{i=1}^{N} Q_{iT}(\theta, \alpha_i), \quad \text{where} \quad Q_{iT}(\theta, \alpha) = \frac{1}{T} \sum_{t=1}^{T} \log f(w_{it}, \theta, \alpha).$$

² There are relatively few papers that have examined the behavior of misspecified random effects estimators. See Baltagi (1992), Matyas and Blanchard (1998), and Arellano and Bonhomme (2009).

We also suppose the researcher has available a sequence of grouping schemes, which we represent for a given $N$ and $T$ by a collection of index sets,

$$I^{NT}_g = \{\, i : \text{individual } i \text{ belongs to group } g \,\}, \quad g = 1, \ldots, G_{NT}. \tag{2.1}$$

As we suggest below in examples, these groups may be based on $w_{it}$, or on other observable information such as classification of firms based on industry groupings or households based on geographic locations. For a given $N, T$, we have $G_{NT}$ groups consisting of $N_g$ individuals each.³ The indexing by $NT$ is due to groups potentially changing as the panel grows along either or both dimensions; for readability, we will drop this indexing for the remainder of the paper. As an alternative to the FE estimator, we suppose the researcher considers the group effects (hereafter GE) estimator,

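$$\left(\hat\theta_{G}, \{\hat\gamma_g\}_{g=1}^{G_{NT}}\right) = \arg\max_{\theta,\, \{\gamma_g\}_{g=1}^{G_{NT}}} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \gamma_g), \quad \text{where} \quad Q_{gT}(\theta, \gamma) = \frac{1}{N_g} \sum_{i \in I_g} \frac{1}{T} \sum_{t=1}^{T} \log f(w_{it}, \theta, \gamma).$$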

Note that the GE estimator solves an optimization problem with the same objective function as the FE estimator, subject to the linear constraints that $\alpha_i = \gamma_g$ for all $i \in I_g$ and all $1 \le g \le G_{NT}$. It is obvious, but important to note, that the grouped effects estimator nests the fixed effects estimator when $G_{NT} = N$ and $N_g = 1$ for all $g$, and the pooled estimator when $N_g = N$ and $G_{NT} = 1$.

2.1. Two Sources of Bias

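To understand the large sample behavior of the FE and GE estimators, it is useful to concentrate individual- and group-level effects out of the problem. To this end, we define

$$\hat\alpha_{iT}(\theta) = \arg\max_{\alpha} Q_{iT}(\theta, \alpha) \quad \text{and} \quad \hat\theta_{FE} = \arg\max_{\theta} \frac{1}{N} \sum_{i=1}^{N} Q_{iT}(\theta, \hat\alpha_{iT}(\theta))$$

and let $\theta_0$ and $\alpha_{i0}$ denote the true parameter values. Note that, for a given finite $T$, due to sampling error we will in general have $\hat\alpha_{iT}(\theta_0) \neq \alpha_{i0}$. Therefore, with $T$ fixed and $N \to \infty$, we will have

$$\hat\theta_{FE} \xrightarrow{p} \theta_T \quad \text{where} \quad \theta_T \equiv \arg\max_{\theta} \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\alpha_{iT}(\theta))\right]$$

and in general $\theta_T \neq \theta_0$. This is the source of the incidental parameters problem noted by Neyman and Scott (1948).

³ For simplicity, we treat $N_g$ as constant across groups in this section.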

For fixed $T$, we may view $\hat\theta_{FE}$ as the solution to a misspecified problem, in the sense that if one replaces $\hat\alpha_{iT}(\theta)$ with

$$\bar\alpha_{iT}(\theta) = \arg\max_{\alpha} E\left[Q_{iT}(\theta, \alpha)\right],$$

one would have $\theta_T = \theta_0$. That is, noise in estimation of individual specific parameters causes estimates of common parameters to be inconsistent. Intuitively, one gets bias terms of order $1/T$ since each individual specific effect is estimated using $T$ observations. Because the problem is nonlinear, these bias terms enter the probability limit of $\hat\theta_{FE}$, leading to inconsistency when $T$ is fixed.

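The same heuristic argument may be applied to the grouped effects estimator. Define

$$\hat\gamma_{gT}(\theta) = \arg\max_{\gamma} Q_{gT}(\theta, \gamma) \quad \text{and} \quad \hat\theta_{G} = \arg\max_{\theta} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \hat\gamma_{gT}(\theta)),$$

where, as above, $N_g$ is the number of individuals in group $g$.⁴ Like the fixed effects estimator, one obtains $\hat\theta_G$ as the solution to a misspecified problem. With $T$ fixed and $N \to \infty$, we have

$$\hat\theta_{G} \xrightarrow{p} \theta_T \quad \text{where} \quad \theta_T = \arg\max_{\theta} \lim_{N \to \infty} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} \frac{1}{N_g T} \sum_{i \in I_g} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\gamma_{gT}(\theta))\right],$$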

and again in general $\theta_T \neq \theta_0$. We show below that, via an expansion of $\hat\gamma_{gT}(\theta)$ around $\bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E[Q_{gT}(\theta, \gamma)]$, the grouped effects estimator also suffers from ‘incidental parameters bias’ that is of order $1/(N_g T) = G_{NT}/(NT)$, as each group level effect is being estimated with $N_g T$ observations. Depending on the behavior of $N_g$ as $N$ and $T$ increase, it is clear that the incidental parameters bias in $\hat\theta_G$ is potentially of (much) smaller order than that in $\hat\theta_{FE}$. Here, however, there is an additional source of bias: replacing $\hat\gamma_{gT}(\theta)$ with $\bar\gamma_{gT}(\theta)$ does not give $\theta_0$. This happens because in general individual effects will differ within groups; i.e., we will not have $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$ for all $i \in I_g$. We consider the discrepancy between individual effects within groups under the sup norm,

$$\xi_{NT} = \sup_{g} \sup_{(i,j) \in I_g :\, i \neq j} \left|\alpha_{i0} - \alpha_{j0}\right|,$$

and show in Section 3 that the second source of bias is closely related to $\xi_{NT}$.

⁴ In this heuristic argument, we ignore the fact that $I_g$ and hence the definition of $\hat\gamma_{gT}$ changes with $N$, as well as the fact that $\hat\gamma_{gT}(\theta) \xrightarrow{p} \bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E[Q_{gT}(\theta, \gamma)]$ with $T$ fixed and $N \to \infty$ for many potential grouping structures. We do note that this provides the possibility of $N \to \infty$, $T$ fixed inference in the grouped effects setup when groups are such that $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$ for all $i$ and $g$.
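The two sources of bias can be seen in closed form in the classic normal-means setting of Neyman and Scott (1948), which is used here only as an illustration and is not an example worked in this paper. Suppose $w_{it} = \alpha_i + \sigma e_{it}$ with standard normal $e_{it}$ and common parameter $\sigma^2$. Concentrating out one mean per group gives an estimator whose expectation is $\sigma^2 (1 - G_{NT}/(NT))$ when effects are constant within groups, which reduces to the familiar $\sigma^2 (T-1)/T$ under fixed effects, plus a positive term when effects differ within groups. The sketch below simulates this; the function names, the design ($N = 2000$, $T = 2$, $G_{NT} = 200$), and the uniform within-group perturbation are all assumptions chosen for illustration.

```python
import random


def grouped_variance_mle(w, groups):
    """MLE of the common variance when means are constrained equal within groups.

    w[i] is the list of observations for individual i; groups[i] is its label.
    One label per individual gives fixed effects; a single label gives pooling.
    """
    pooled = {}
    for obs, g in zip(w, groups):
        pooled.setdefault(g, []).extend(obs)
    n_obs = sum(len(v) for v in pooled.values())
    sse = 0.0
    for v in pooled.values():
        mean = sum(v) / len(v)  # concentrated-out group effect
        sse += sum((x - mean) ** 2 for x in v)
    return sse / n_obs


def simulate(alpha, T, sigma, rng):
    """Draw w_it = alpha_i + sigma * e_it with e_it standard normal."""
    return [[a + rng.gauss(0.0, sigma) for _ in range(T)] for a in alpha]


rng = random.Random(0)
N, T, G, sigma = 2000, 2, 200, 1.0         # hypothetical design, N_g = 10
group_of = [i * G // N for i in range(N)]  # equal-sized groups
individual = list(range(N))                # one label per individual = FE

# Case A: effects are exactly constant within groups (xi = 0).
alpha_hom = [0.1 * g for g in group_of]
w_hom = simulate(alpha_hom, T, sigma, rng)
fe_hom = grouped_variance_mle(w_hom, individual)
ge_hom = grouped_variance_mle(w_hom, group_of)

# Case B: effects differ within groups, so the grouping is misspecified.
alpha_het = [0.1 * g + rng.uniform(-1.0, 1.0) for g in group_of]
w_het = simulate(alpha_het, T, sigma, rng)
fe_het = grouped_variance_mle(w_het, individual)
ge_het = grouped_variance_mle(w_het, group_of)

print(f"true variance              : {sigma ** 2:.3f}")
print(f"FE, constant within groups : {fe_hom:.3f}  (theory: about 0.5)")
print(f"GE, constant within groups : {ge_hom:.3f}  (theory: about 0.95)")
print(f"FE, varying within groups  : {fe_het:.3f}  (theory: about 0.5)")
print(f"GE, varying within groups  : {ge_het:.3f}  (above the case-A value)")
```

In this design the fixed effects estimate is pulled toward half the true variance regardless of how effects are distributed, while the grouped estimate is nearly unbiased when the grouping is correct and drifts upward once effects vary within groups, which is the tradeoff the text describes.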


2.2. Examples of Restrictions on Unobservables

As part of our assumptions, we will place restrictions on the behavior of individual effects $\alpha_{i0}$ within groups. Here we present a characterization of a data generating process that is compatible with our assumptions. Most importantly, note that the restrictions placed on the data generating process are quite different from most ‘random effects’ estimators. In particular, the restrictions allow for fairly general dependence between observables and unobservables and do not impose a parametric form for the distribution of $\alpha_i$. We let $g = 1, 2, \ldots, G_{NT}$ index groups in a given grouping scheme.

An equivalent formulation of the grouped effects problem is also useful and allows us to relate the quantity $\xi_{NT}$ to the $R^2$ from the infeasible regression of the true unobserved effects on a set of group dummies in the case where unobserved heterogeneity is a scalar. Suppose groups can be represented by a sequence of matrices, $D_{NT}$, where for a given $N$ and $T$, $D_{NT} \in \mathbb{R}^{N \times G_{NT}}$ with typical element $[D_{NT}]_{i,g} = 1(i \in I_g)$. These group membership matrices may be generated, for example, by grouping individuals according to quantiles, quintiles, deciles, etc. of a certain observable or set of observables, or by grouping according to other observable information such as SIC codes to a given number of digits. Consider the infeasible regression of individual level effects on group indicators,

$$(\alpha_{10}, \ldots, \alpha_{N0})' = D_{NT}\beta + \nu_{NT}, \tag{2.2}$$

and let $R^2_{NT}$ be the R-squared of this regression. It is obvious that the error sum of squares from this regression is $\sum_{i=1}^{N} \left(\alpha_{i0} - \frac{1}{N_g} \sum_{j=1}^{N} 1(j \text{ belongs to same group as } i)\, \alpha_{j0}\right)^2$. It is then immediate from the triangle inequality, definition of $R^2$, and definition of $\xi_{NT}$ that

$$1 - R^2_{NT} \le \frac{\xi^2_{NT}}{\frac{1}{N} \sum_{i=1}^{N} (\alpha_{i0} - \bar\alpha_N)^2}.$$

Thus, $\xi_{NT} \to 0$ as $G_{NT}$ increases implies $1 - R^2_{NT} \to 0$ or, equivalently, $R^2_{NT} \to 1$. A key ingredient in understanding the bias in $\hat\theta_G$ will be the rate at which $\xi_{NT} \to 0$.
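The relation between $\xi_{NT}$ and $R^2_{NT}$ can be checked numerically on a toy vector of effects. The sketch below is purely illustrative: the four effect values, the group labels, and the helper names are assumptions made for the example, and in practice the regression is infeasible because the true effects are unobserved. Regressing on a full set of group dummies makes each fitted value the group mean, so no matrix algebra is needed.

```python
def within_group_discrepancy(alpha, groups):
    """xi: largest gap between the effects of two individuals in the same group."""
    lo, hi = {}, {}
    for a, g in zip(alpha, groups):
        lo[g] = min(a, lo.get(g, a))
        hi[g] = max(a, hi.get(g, a))
    return max(hi[g] - lo[g] for g in lo)


def r_squared_on_group_dummies(alpha, groups):
    """R^2 from regressing alpha on group dummies (fitted value = group mean)."""
    sums, counts = {}, {}
    for a, g in zip(alpha, groups):
        sums[g] = sums.get(g, 0.0) + a
        counts[g] = counts.get(g, 0) + 1
    grand_mean = sum(alpha) / len(alpha)
    sse = sum((a - sums[g] / counts[g]) ** 2 for a, g in zip(alpha, groups))
    sst = sum((a - grand_mean) ** 2 for a in alpha)
    return 1.0 - sse / sst


# Hypothetical true effects and a two-group classification.
alpha = [0.0, 1.0, 2.0, 3.0]
groups = ["a", "a", "b", "b"]

xi = within_group_discrepancy(alpha, groups)
r2 = r_squared_on_group_dummies(alpha, groups)
grand_mean = sum(alpha) / len(alpha)
var_alpha = sum((a - grand_mean) ** 2 for a in alpha) / len(alpha)
bound = xi ** 2 / var_alpha  # right-hand side of the inequality in the text

print(f"xi = {xi}, 1 - R^2 = {1.0 - r2:.3f}, bound = {bound:.3f}")
```

Here the within-group gap is 1, the regression leaves 20 percent of the variation unexplained, and the bound evaluates to 0.8, so the inequality holds with room to spare; the bound is a worst case and will typically be loose.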

Examples. For comparison, it is useful to consider two simple benchmark cases. In the first, we suppose that $\alpha_i \sim$ i.i.d. $N(0, 1)$ and the sequence of (in this case non-nested) groupings $\{I^{NT}_g\}$ is formed for each $N, T$ by assigning each individual $i$ at random to one of $G_{NT}$ equally sized groups. In this case, there is no observable group information, so the groupings clearly contain no information about individual-specific effects. It is an easy exercise to show that $R^2_{NT} = O(G_{NT}/N)$, so that the R-squared of the regression above goes to one linearly with the number of groups. It is also clear that we must have $G_{NT}/N \to 1$ to have $\xi_{NT} \to 0$, essentially implying that one must run fixed effects to keep the bias due to misspecification small. The intuition from this simple case carries through to any situation in which the researcher does not believe there is any information available about the process generating unobserved effects. Without some beliefs about the underlying structure of the unobserved heterogeneity, the only way to keep specification error uniformly small across all possible structures is to run fixed effects, though this will maximize the bias due to incidental parameters.
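The order of $R^2_{NT}$ in the random-assignment case can be seen from a short calculation. The sketch below is heuristic rather than a proof: it writes $\bar\alpha_g$ for the mean of the effects in group $g$ and $\bar\alpha_N$ for the overall mean (notation introduced here), and it approximates the expectation of a ratio by the ratio of expectations.

```latex
\begin{align*}
E[\mathrm{SSE}] &= \sum_{g=1}^{G_{NT}} E\Bigl[\sum_{i \in I_g} (\alpha_i - \bar\alpha_g)^2\Bigr]
                 = \sum_{g=1}^{G_{NT}} (N_g - 1) = N - G_{NT}, \\
E[\mathrm{SST}] &= E\Bigl[\sum_{i=1}^{N} (\alpha_i - \bar\alpha_N)^2\Bigr] = N - 1, \\
R^2_{NT} &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
          \approx 1 - \frac{N - G_{NT}}{N - 1}
          = \frac{G_{NT} - 1}{N - 1} = O(G_{NT}/N).
\end{align*}
```

Both expectations use only that the effects within any randomly formed group are i.i.d. with unit variance, so the grouping explains variation only through the degrees of freedom it consumes.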


For the second simple illustrative case, consider $\alpha_i = f(x_{i0})$ where $x_{i0}$ is some scalar initial condition that has compact support $[a, b]$. Further suppose that $f$ is Lipschitz continuous over $[a, b]$ such that $|f(z_1) - f(z_0)| \le C|z_1 - z_0|$ for some $C < \infty$. In this case, suppose that the sequence of groups is defined by splitting the interval $[a, b]$ into $G_{NT}$ equally sized groups. Then we have that

$$\xi_{NT} = \sup_{g} \sup_{(i,j) \in I_g :\, i \neq j} |\alpha_{i0} - \alpha_{j0}| = \sup_{g} \sup_{(i,j) \in I_g :\, i \neq j} |f(x_{i0}) - f(x_{j0})| \le \sup_{g} \sup_{(i,j) \in I_g :\, i \neq j} C\,|x_{i0} - x_{j0}| \le \frac{C(b - a)}{G_{NT}}.$$

In this case, the observed initial condition is informative about the group structure, and $G_{NT} \to \infty$ is sufficient for $\xi_{NT} \to 0$.
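The Lipschitz bound is easy to verify by simulation. In the sketch below everything specific is an assumption made for illustration: the effect function is $f = \sin$ on $[0, \pi]$ (Lipschitz with $C = 1$), initial conditions are drawn uniformly, and groups are the equal-width subintervals of $[a, b]$; the helper names are hypothetical.

```python
import math
import random


def equal_width_groups(x, a, b, G):
    """Assign each x in [a, b] to one of G equal-width subintervals (0..G-1)."""
    width = (b - a) / G
    return [min(int((v - a) / width), G - 1) for v in x]


def within_group_discrepancy(alpha, groups):
    """xi: largest gap between the effects of two individuals in the same group."""
    lo, hi = {}, {}
    for val, g in zip(alpha, groups):
        lo[g] = min(val, lo.get(g, val))
        hi[g] = max(val, hi.get(g, val))
    return max(hi[g] - lo[g] for g in lo)


rng = random.Random(1)
a, b, C = 0.0, math.pi, 1.0                  # sin has Lipschitz constant 1
x0 = [rng.uniform(a, b) for _ in range(5000)]
alpha = [math.sin(v) for v in x0]            # alpha_i = f(x_i0)

results = {}
for G in (5, 20, 80):
    xi = within_group_discrepancy(alpha, equal_width_groups(x0, a, b, G))
    results[G] = (xi, C * (b - a) / G)       # (realized xi, theoretical bound)
    print(f"G = {G:3d}: xi = {xi:.4f}, bound C(b-a)/G = {results[G][1]:.4f}")
```

The realized discrepancy stays below $C(b-a)/G_{NT}$ for every choice of $G$ and shrinks at the same rate as the bound, which is what makes the observable informative in the sense used above.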

3. Asymptotic Theory

The main theorem in our paper establishes consistency and asymptotic normality of $\hat\theta_G$ under assumptions about the sampling environment and the behavior of unobservables within the observable group structure. We first state two assumptions that are sufficient, respectively, for consistency and asymptotic normality of the fixed effects estimator. Our third assumption is on the sequence of grouping schemes that define the group effects estimator. Under this assumption, we obtain asymptotic approximations that reflect a belief by the researcher that the error introduced by grouping observations according to the selected grouping scheme $I_g$ is small relative to sampling variation. Our assumptions are stated for unbalanced panels, unequally sized groups, and $\dim(\alpha_i) \ge 1$, with proofs given under these conditions.

To fix some notation, for a random variable $W_{it}$, let $E_{it}[W_{it}] = \int W_{it}\, dF_{it}$ be the expectation with respect to the marginal distribution of the data, $F_{it}$, for individual $i$ at time $t$, and let $E_i W = \frac{1}{T_i} \sum_{t=1}^{T_i} E_{it}[W_{it}]$ where $T_i$ is the number of observations for individual $i$. Let $T = \frac{1}{N} \sum_{i=1}^{N} T_i$. Throughout, $\alpha$ and $|\alpha|$ denote a real scalar (or vector) and its absolute value (sum of absolute values of its elements) while $\alpha(\cdot)$ and $\|\alpha(\cdot)\| = \sup_{i \in \mathbb{N}} |\alpha(i)|$ denote a sequence of real numbers (vectors) and its supremum norm. For a real vector $\theta$, define $\|(\theta, \alpha(\cdot))\| = |\theta| + \|\alpha(\cdot)\|$.

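Assumption 1. (i) Let $N, T \to \infty$ indicate that $N \to \infty$ and each $T_i \to \infty$ jointly in such a way that $T_i/T \to \rho_i$ and $\sup_i |T_i/T - \rho_i| \to 0$ where $\inf_i \rho_i \ge \delta > 0$ and $\sup_i \rho_i \le \Delta < \infty$. For notational readability, we index elements of a sequence indexed by $N$ and $(T_1, \ldots, T_N)$ only by $NT$.

(ii) $\{w_{it}, \alpha_0(i)\}$ are independent across $i$. For each $i$, $\{w_{it}\}$ is a strong mixing sequence with mixing coefficient $a_i(m) = \sup_t \sup_{B_1 \in \mathcal{B}^i_{-\infty,t},\, B_2 \in \mathcal{B}^i_{t+m,\infty}} |P(B_1 \cap B_2) - P(B_1)P(B_2)|$ where $\mathcal{B}^i_{-\infty,t} = \sigma(w_{it}, w_{it-1}, w_{it-2}, \ldots)$ and $\mathcal{B}^i_{t,\infty} = \sigma(w_{it}, w_{it+1}, w_{it+2}, \ldots)$, and there exists a $\tau \in 2\mathbb{N}$ and $r > \tau$ such that $\sup_i |a_i(m)| \le C m^{\frac{(1-\tau) r}{r - \tau} - \varepsilon}$ for some $C > 0$ and some $\varepsilon$ that satisfies $\varepsilon - \delta > 0$ for some $\delta > 0$.

(iii) Let $\varphi(w_{it}; \theta, \alpha)$ be a function indexed by the parameters $\theta \in \Theta$ and $\alpha \in A$ where $\Theta$ and $A$ are compact, convex subsets of $\mathbb{R}^k$ and $\mathbb{R}^p$ respectively, and let $\varphi_{it}(\theta, \alpha) \equiv \varphi(w_{it}; \theta, \alpha)$. Assume $\varphi_{it}(\theta, \alpha)$ is continuous in $\theta$ and $\alpha$. Let $\theta_0 \in \mathrm{int}(\Theta)$ and $\alpha_0(i) \in \mathrm{int}(A)$ for each $i = 1, \ldots, N$ be such that for each $i$ and $\eta > 0$,

$$\lim_{T_i \to \infty} E_i[\varphi(\theta_0, \alpha_0(i))] - \sup_{(\theta, \alpha) :\, |(\theta, \alpha) - (\theta_0, \alpha_0(i))| > \eta}\ \lim_{T_i \to \infty} E_i[\varphi(\theta, \alpha)] > 0.$$

Also assume that for each $\eta > 0$,

$$\lim_{N,T \to \infty} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T_i} E_{it}[\varphi_{it}(\theta_0, \alpha_0(i))] - \sup_{(\theta, \alpha(\cdot)) :\, \|(\theta, \alpha(\cdot)) - (\theta_0, \alpha_0(\cdot))\| > \eta}\ \lim_{N,T \to \infty} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T_i} E_{it}[\varphi_{it}(\theta, \alpha(i))] > 0$$

where $\|(\theta, \alpha(\cdot))\| = \sum_{j=1}^{k} |\theta_j| + \sup_i \sum_{j=1}^{p} |\alpha_j(i)|$.

(iv) Let $v = (v_1, \ldots, v_k)'$ and $u = (u_1, \ldots, u_p)'$ be vectors of nonnegative integers. Define

$$D^{(v,u)} \varphi_{it}(\theta, \alpha) = \frac{\partial^{|v| + |u|} \varphi_{it}(\theta, \alpha)}{\partial\theta_1^{v_1} \cdots \partial\theta_k^{v_k}\, \partial\alpha_1^{u_1} \cdots \partial\alpha_p^{u_p}}.$$

Assume there exists a function $M(w_{it})$ such that $|D^{(v,u)} \varphi_{it}(\theta_2, \alpha_2) - D^{(v,u)} \varphi_{it}(\theta_1, \alpha_1)| \le M(w_{it})\, |(\theta_2, \alpha_2) - (\theta_1, \alpha_1)|$ for all $(\theta_1, \alpha_1), (\theta_2, \alpha_2) \in \Theta \times A$ and $|v| + |u| \le 3$, and that $\sup_{(\theta, \alpha) \in \Theta \times A} \|D^{(v,u)} \varphi_{it}(\theta, \alpha)\| \le M(w_{it})$ for $|v| + |u| \le 3$. Assume that $\sup_{i,t} E_{it}[M(w_{it})^{\tau + \varepsilon}] \le \Delta < \infty$ for some $\varepsilon$ that satisfies $\varepsilon - \delta > 0$ for some $\delta > 0$.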

The conditions in Assumption 1 are fairly standard and are sufficient to establish consistency of the FE estimator. The only somewhat unusual condition is Assumption 1(i), which, for unbalanced panels, restricts the number of observations for each cross-sectional unit such that no units are asymptotically dominant or negligible.

We next state a set of conditions which are used in establishing asymptotic normality of the estimators considered.

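Assumption 2. (i) Let $\lambda_g$ be the minimum eigenvalue of $\frac{1}{N_g T_g} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha\, \partial\alpha'}\right]$. Assume that $\inf_g \lambda_g \ge \delta > 0$ where $\delta$ does not depend on $N$ or $(T_1, \ldots, T_N)$.

(ii) Suppose $H^{\theta\theta}(\theta, \alpha(\cdot)) = \lim_{N,T \to \infty} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta, \alpha(i))}{\partial\theta\, \partial\theta'}\right]$ exists for all $(\theta, \alpha(\cdot)) \in \Theta \times (\times_{i=1}^{\infty} A)$ and let $H^{\theta\theta}_0 = H^{\theta\theta}(\theta_0, \alpha_0(\cdot))$. Let

$$J_{NT} = \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\theta\, \partial\theta'}\right] - \kappa_g\, E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha\, \partial\theta'}\right] \right)$$

for

$$\kappa_g = \left( \frac{1}{N_g T_g} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\theta\, \partial\alpha'}\right] \right) \left( \frac{1}{N_g T_g} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2 \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha\, \partial\alpha'}\right] \right)^{-1},$$

and suppose $J = \lim_{N,T \to \infty} J_{NT}$ exists and has minimum eigenvalue $\lambda_J \ge \delta > 0$.

(iii) For $i \in I_g$, let $S^*_{it} = \sqrt{\rho_i}\left(u^{\theta}_{it} - \kappa_g u^{\alpha}_{it}\right)$ where $u^{\theta}_{it} = \frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\theta} - E_{it}\left[\frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\theta}\right]$ and $u^{\alpha}_{it} = \frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha} - E_{it}\left[\frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha}\right]$. Let $\Omega_{i,T_i} = \mathrm{Var}\left(\frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} S^*_{it}\right)$, and let $\lambda_{i,T_i}$ be the minimum eigenvalue of $\Omega_{i,T_i}$. Assume that $\inf_i \inf_{T_i} \lambda_{i,T_i} > 0$ and that $\Omega = \lim_{N,T \to \infty} \frac{1}{N} \sum_{i=1}^{N} \Omega_{i,T_i}$ exists.

(iv) Assume that

$$\frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\theta}\right] \to 0, \qquad \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \kappa_g\, E_{it}\left[\frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha}\right] \to 0,$$

and

$$\sup_g \left\| \frac{1}{\sqrt{N_g T_g}} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}\left[\frac{\partial \varphi_{it}(\theta_0, \alpha_0(i))}{\partial\alpha}\right] \right\| \to 0.$$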

Assumption 2 provides analogs to standard conditions used to verify asymptotic normality of M-estimators under mixing allowing for non-stationarity. With the exception of allowing $N_g \ge 1$, these conditions are essentially identical to those required to establish asymptotic normality of the FE estimator. Notice that this set of assumptions allows for very general dependence in the time series direction.

The final assumption is about the sequence of grouping schemes that define the GE estimator. Note that indexing of $g$ by $NT$ is suppressed; the objects $I_g$, $T_g$, $N_g$, etc. are defined with respect to the partition for a given sample size $N, T$. The assumption relies on the key object

$$\xi_{NT} = \sup_{g} \sup_{i,j \in I_g} \max_{1 \le s \le p} \left|\alpha_{0,s}(i) - \alpha_{0,s}(j)\right|$$

where $\alpha_{0,s}(\cdot)$ is the $s$th element of the vector $\alpha_0(\cdot)$. This object measures the degree of misspecification by the maximum discrepancy between “true” unobserved effects for individuals in the same group.

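Assumption 3. Suppose that there exists a sequence of partitions such that for each $N, T_1, \ldots, T_N$ the data are partitioned into $G_{NT}$ groups consisting of $\sum_{i \in I_g} T_i$ observations for $g = 1, \ldots, G_{NT}$ where $I_g = \{i : \text{individual } i \text{ belongs to group } g\}$. Let $N_g$ denote the number of elements in the set $I_g$, and let $T_g = \frac{1}{N_g} \sum_{i \in I_g} T_i$. Let $N_G = \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} N_g$ and assume $\sup_g \left| \frac{N_g}{N_G} - \zeta_g \right| \to 0$ where $\sup_g \zeta_g \le \Delta < \infty$ and $\inf_g \zeta_g \ge \delta > 0$. Assume

(i) $\left(\sup_g \frac{N_g - 1}{N_g}\right) \xi_{NT} \to 0$ such that

(ii) $\sqrt{NT} \left(\sup_g \frac{N_g - 1}{N_g}\right) \xi^2_{NT} \to 0$ as $N, T \to \infty$.

(iii) $|dF_{it}(w_{it}) - dF_{jt}(w_{it})| \le C(w_{it})\, |\alpha_0(i) - \alpha_0(j)|$ with $\sup_{i,t} C(w_{it})/dF_{it}(w_{it}) \le M(w_{it})$.

(iv) $G_{NT} \left(\inf_g N_g T_g\right)^{-\tau/2} \to 0$ and $G_{NT}/\sqrt{NT} \to 0$ as $N, T \to \infty$.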

Assumption 3(i)–(ii) is a high level condition which asserts that either the maximum discrepancy between individuals within groups goes to zero as the sample size increases, or that the data are eventually grouped at the individual level.⁵ Assumptions 1 and 3(i) are sufficient to establish consistency of $\hat\theta_G$. We show below how this condition relates to the example discussed in Section 2.2. Conditions (iii) and (iv) are restrictions on unobserved heterogeneity. Condition (iii) is a technical condition on smoothness of individual specific marginal distributions in the parameter $\alpha_i$. In the likelihood case, this condition simply requires Lipschitz continuity of the likelihood in $\alpha$. For convenience, we assume that in the case where $A$ is continuous, $F_{it}$ has a density so that $|dF_{it} - dF_{jt}|$ may be interpreted in the obvious way. Condition (iv) stipulates that the number of groups grows more slowly than $\sqrt{NT}$, which ensures that bias due to incidental parameters does not enter the asymptotic distribution of the GE estimator. Recalling that the FE estimator sets $N_g = 1$ and $G_{NT} = N$, we see that fixed effects estimators satisfy this condition when $N/\sqrt{NT} \to 0$, or in other words $T$ grows faster than $N$. This is a well-known necessary condition (cf. Hahn and Kuersteiner (2002) for the dynamic linear model and Hahn and Newey (2004) for the general nonlinear case) for the asymptotic distribution of $\hat\theta_{FE}$ to be correctly centered.

⁵ Recall that with $N_g = 1$ for all $g$, the researcher is running fixed effects.
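To see how conditions (ii) and (iv) interact, it may help to work through one concrete case. The sketch below assumes a balanced panel with equally sized groups, $N_g = N/G_{NT}$, and borrows the bound $\xi_{NT} \le C(b-a)/G_{NT}$ from the Lipschitz example of Section 2.2; it is an illustration under those assumptions, not a general statement about admissible rates.

```latex
\begin{align*}
\text{3(ii):}\quad
  & \sqrt{NT}\,\Bigl(\sup_g \tfrac{N_g - 1}{N_g}\Bigr)\,\xi_{NT}^2
    \;\le\; C^2 (b-a)^2\,\frac{\sqrt{NT}}{G_{NT}^2} \;\to\; 0
    \quad\text{if}\quad \frac{G_{NT}}{(NT)^{1/4}} \to \infty, \\[4pt]
\text{3(iv):}\quad
  & \frac{G_{NT}}{\sqrt{NT}} \to 0,
    \qquad
    G_{NT}\Bigl(\frac{NT}{G_{NT}}\Bigr)^{-\tau/2}
    = G_{NT}\Bigl(\frac{G_{NT}}{NT}\Bigr)^{\tau/2}
    \;\le\; \frac{G_{NT}^2}{NT} \;\to\; 0
    \quad (\tau \ge 2).
\end{align*}
```

Under these assumptions both conditions hold whenever $G_{NT}$ grows like $(NT)^{a}$ with $1/4 < a < 1/2$, for instance $a = 1/3$: enough groups that the misspecification term vanishes after scaling by $\sqrt{NT}$, but few enough that the incidental parameters term does too.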

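We now state the main result for $\hat\theta_G$.

Proposition 1. Let $\hat\gamma_G = (\hat\theta, \hat\alpha_G(\cdot)) = \arg\max_{(\theta, \alpha(\cdot)) \in \Gamma_G} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varphi_{it}(\theta, \alpha(i))$ for

$$\Gamma_G = \{\gamma \in \Gamma : |\alpha_G(i) - \alpha_G(j)| = 0 \ \ \forall\, i, j \in I_g\}$$

for $I_g$ defined in Assumption 3. If Assumptions 1 and 3.(i) are satisfied, then $\hat\gamma_G \xrightarrow{p} \gamma_0 = (\theta_0, \alpha_0(\cdot))$. Further, $\sqrt{NT}(\hat\theta - \theta_0) \xrightarrow{d} J^{-1} N(0, \Omega)$ under Assumptions 1-3.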

Note that under Assumptions 1 and 3.(i), we can consistently estimate both common parameters and individual specific parameters for each $i$ with the grouped effects estimator. This result follows from the fact that the fixed effects estimator is consistent as $T \to \infty$ regardless of $N$ for regular models and that Assumption 3 implies that one is eventually running fixed effects or unobserved effects are “close enough” within groups to be well-estimated by group effects. Note that in a fixed-$T$ environment, the fixed effects and grouped effects estimators would generally have different probability limits, and the grouped effects estimator could remain consistent with sufficient homogeneity in the individual effects within the grouping structure while fixed effects will generally be inconsistent.

Proposition 1 establishes that $\hat\theta_G$ is asymptotically normal and unbiased. It is worth comparing Proposition 1 to similar results for the fixed effects estimator as in, for example, Hahn and Kuersteiner (2004). Under an asymptotic sequence where $N/T \to c < \infty$, we have

$$\sqrt{NT}\,(\hat\theta_{FE}-\theta_0) \xrightarrow{d} N(cB,\ J^{-1}\Omega J^{-1})$$

where $cB$ is bias resulting from incidental parameters. We see that the fixed effects estimator and the grouped effects estimator are both asymptotically normal with the same variance but different centering under this sequence. Specifically, the fixed effects estimator is biased due to incidental parameters while the grouped effects estimator, which exploits homogeneity in the underlying unobserved effects, is not. Exploiting the assumed homogeneity allows the grouped effects estimator to remain correctly centered and asymptotically normal in situations where fixed effects or bias-corrected fixed effects are dominated by bias or may even be divergent when centered around the true parameter value and normalized by the sample size.

Assumption 3 is the key to establishing Proposition 1 and is closely related to the two sources of bias discussed in Section 2.1. Note that Assumption 3.(iv) is purely about how quickly the number of groups may increase as observations are added to the sample. Assumption 3.(i)-(ii) is implicitly a restriction on the data generating process, as it presumes it is possible to group individuals so that the maximum within group discrepancy in the unobservable effects goes to zero sufficiently quickly as groups are added. It is this assumption that keeps the proposed approach within the class of random effects estimators.

Examples (continued). Suppose that we have a balanced panel and the sampling scheme is such that $N = O(T^{\rho})$, and consider the first example scenario given in Section 2.2 where grouping schemes are uninformative. Satisfying Assumption 3.(i) in this case requires that $G_{NT}/N \to 1$, so we will simply take $G_{NT} = N$, in which case Assumption 3.(ii) is obviously also satisfied. To satisfy Assumption 3.(iv), we then need $G_{NT}/\sqrt{NT} = \sqrt{N/T} \to 0$, which implies $T$ grows more quickly than $N$. This is a well-known condition for asymptotic unbiasedness of $\hat\theta_{FE}$, which is unsurprising since $\hat\theta_G$ is identical to $\hat\theta_{FE}$ with $G_{NT} = N$. When $T$ and $N$ increase at the same rate ($\rho = 1$), we have $\sqrt{NT}\,(\hat\theta_{FE}-\theta_0) \xrightarrow{d} N(cB,\ J^{-1}\Omega J^{-1})$. That is, the asymptotic distribution of the fixed effects estimator is incorrectly centered, with bias $cB$ arising due to the incidental parameters problem where $N/T \to c < \infty$. Hahn and Kuersteiner (2004) propose an estimate of this bias and show that the resulting bias corrected fixed effects estimator is asymptotically unbiased when $\rho = 1$. Under the additional assumption that $w_{it}$ is i.i.d. for each $i$, Hahn and Newey (2004) show that a similar bias corrected fixed effects estimator is asymptotically unbiased when $\rho < 3$.

Now consider the second case where groups do contain information about unobservables, and suppose the researcher chooses to add groups at rate $G_{NT} = N^{\delta}$. In this case, Assumption 3.(iv), which controls bias due to incidental parameters, requires that $T^{\rho\delta}/T^{\frac{1}{2}(1+\rho)} \to 0$, or equivalently $\delta < \frac{1}{2}(1+\rho^{-1})$. That is, the faster $N$ increases relative to $T$, the slower groups must be added to avoid incurring asymptotic biases due to noisy estimates of group-level effects. As demonstrated in Section 2.2, we also have $\xi_{NT} = O(G_{NT}^{-1}) = O(N^{-\delta})$ given the rate for $G_{NT}$. Satisfying Assumption 3.(ii) will then require that $T^{\frac{1}{2}(1+\rho)}/T^{2\delta\rho} \to 0$, or $\delta > \frac{1}{4}(1+\rho^{-1})$. Taking these two conditions together implies that groups can be added at a rate $N^{\delta}$ with $\delta$ satisfying $\frac{1}{4}(1+\rho^{-1}) < \delta < \frac{1}{2}(1+\rho^{-1})$. If we consider the case where $\rho = 3$, so that even bias-corrected fixed effects estimators are asymptotically biased, we have that the group effects estimator will remain asymptotically normal and correctly centered as long as $\frac{1}{3} < \delta < \frac{2}{3}$. In other words, it is possible for $\hat\theta_G$ to be asymptotically unbiased in a setting where, to our knowledge, any estimator that attempts to estimate all individual-level parameters will be biased asymptotically.$^6$
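The two rate conditions in the second example reduce to simple arithmetic in $\rho$, which the following minimal sketch reproduces. The function name `delta_bounds` and the parameter `kappa` are illustrative assumptions, not notation from the paper; `kappa = 2` corresponds to $\xi_{NT} = O(G_{NT}^{-1})$ as in the text, and other values follow the generalization $\xi_{NT} = O(G_{NT}^{-\kappa/2})$ mentioned in footnote 6.

```python
from fractions import Fraction


def delta_bounds(rho, kappa=2):
    """Return (lower, upper) such that lower < delta < upper.

    Assumes N = T**rho, G_NT = N**delta and xi_NT = O(G_NT**(-kappa/2)).
    upper comes from G_NT / sqrt(NT) -> 0 (incidental parameters bias);
    lower comes from sqrt(NT) * xi_NT**2 -> 0 (within-group discrepancy).
    Hypothetical helper for illustration only.
    """
    rho = Fraction(rho)
    kappa = Fraction(kappa)
    if rho <= 0:
        raise ValueError("rho must be positive")
    if kappa <= 1:
        raise ValueError("kappa must exceed 1 for a non-empty interval")
    base = 1 + 1 / rho              # the factor (1 + rho^{-1})
    upper = base / 2                # delta < (1/2)(1 + 1/rho)
    lower = base / (2 * kappa)      # delta > (1/(2*kappa))(1 + 1/rho)
    return lower, upper


if __name__ == "__main__":
    low, high = delta_bounds(3)
    print(f"rho = 3: {low} < delta < {high}")
```

Since $G_{NT}$ cannot exceed $N$, any upper bound above one returned for small $\rho$ should be read as "no binding restriction from incidental parameters"; the sketch does not impose that cap.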

4. Conclusion

This paper has analyzed group effects estimators for nonlinear panel data models with a finite dimensional common parameter and time invariant individual specific effects that are unobserved to the econometrician. Group effects estimators hold individual-level heterogeneity constant according to an observed grouping structure, and may be thought of as intermediate to pooled and fixed effects estimators. We provided conditions under which group effects estimators of the common parameter are asymptotically unbiased. These conditions demonstrate a tradeoff between two sources of asymptotic bias. One source of bias is the well-known incidental parameters problem suffered by fixed effects estimators, and the other arises from discrepancies in individual level effects within groups, that is, misspecification in the structure of unobservable heterogeneity. The results in this paper may be extended in several interesting ways. First, following Sun (2005), one can consider a setup where the group structure is unobservable or only partially observable to the econometrician; another useful extension would be to allow for a simple random effects structure that remains after group effects have been removed. Second, one may wish to consider bias correction of group effects estimators. Correction of biases due to incidental parameters should follow from similar approaches for fixed effects estimators, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2004). Finally, the issue of group selection needs to be addressed. Our results suggest these may be interesting avenues for future research.

$^6$The same qualitative conclusions are obtained by the same argument for $\xi_{NT} = O(G_{NT}^{-\kappa/2})$ with $\kappa > 1$.

5. Appendix

In this appendix, we outline the proof of Proposition 1. A more detailed derivation can be found in Bester and Hansen (2010).

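In the following, we let $\hat\alpha_g(\theta) = \arg\max_{\alpha\in A} \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha)$ and use $\hat\alpha_g = \hat\alpha_g(\theta_0) \in A$. We similarly use $\hat\theta = \arg\max_{\theta\in\Theta} \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\hat\alpha_g(\theta))$ and note that the estimators $(\hat\theta,\hat\alpha(\hat\theta))$ obtained in this way are numerically identical to the solution to

$$(\hat\theta,\hat\alpha_1,\ldots,\hat\alpha_G) = \arg\max_{\theta\in\Theta,\ \alpha_1\in A,\ldots,\alpha_G\in A}\ \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha_g).$$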
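The numerical equivalence between the profiled and the joint maximization can be checked directly in a simple special case. The sketch below assumes a linear model with a scalar regressor and objective $\varphi_{it}(\theta,\alpha) = -(y_{it} - x_{it}\theta - \alpha)^2$, which is only an illustration and not the general nonlinear setting of the paper; the function names and the toy data are hypothetical. It computes the profiled solution in closed form and then verifies that the gradient of the joint objective vanishes there, which identifies the joint maximizer because this objective is concave.

```python
def profiled_estimator(y, x, groups):
    """Profile out one effect per group, then maximize over theta.

    With squared-error loss, alpha_g(theta) is the group mean of
    y - x*theta, so theta is the within-group least squares slope.
    """
    sums = {}
    for yi, xi, g in zip(y, x, groups):
        s = sums.setdefault(g, [0.0, 0.0, 0])
        s[0] += yi
        s[1] += xi
        s[2] += 1
    ybar = {g: s[0] / s[2] for g, s in sums.items()}
    xbar = {g: s[1] / s[2] for g, s in sums.items()}
    num = sum((xi - xbar[g]) * (yi - ybar[g]) for yi, xi, g in zip(y, x, groups))
    den = sum((xi - xbar[g]) ** 2 for xi, g in zip(x, groups))
    theta = num / den
    alpha = {g: ybar[g] - theta * xbar[g] for g in sums}
    return theta, alpha


def joint_gradient(y, x, groups, theta, alpha):
    """Gradient of sum_it -(y - x*theta - alpha_g)^2 in (theta, alpha_1..alpha_G)."""
    d_theta = 0.0
    d_alpha = {g: 0.0 for g in alpha}
    for yi, xi, g in zip(y, x, groups):
        resid = yi - xi * theta - alpha[g]
        d_theta += 2.0 * xi * resid
        d_alpha[g] += 2.0 * resid
    return d_theta, d_alpha


if __name__ == "__main__":
    # Toy data: six observations in two groups (made-up numbers).
    y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
    x = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]
    groups = [0, 0, 0, 1, 1, 1]
    theta, alpha = profiled_estimator(y, x, groups)
    print(theta, alpha, joint_gradient(y, x, groups, theta, alpha))
```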

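Throughout the following we use superscripts to denote partial differentiation; e.g. $\varphi^{\alpha}_{it} = \frac{\partial\varphi_{it}}{\partial\alpha}$ and $\varphi^{\theta\alpha}_{it} = \frac{\partial^2\varphi_{it}}{\partial\theta\,\partial\alpha'}$. Let weight

$$\omega_i = \left(\frac{1}{N_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\right)^{-1}\left(\sum_{t=1}^{T_i}E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\right)$$

for some $j\in I_g$, and let $\bar\alpha_g = \frac{1}{N_g}\sum_{i\in I_g}\omega_i\,\alpha_0(i)$.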

5.1. Expansions

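We make use of the following expansions for $\hat\alpha_g$ and $\hat\theta$.

$\alpha$: $\hat\alpha_g - \bar\alpha_g = -\big[\tilde H^{\alpha\alpha}_g\big]^{-1}\big(\sum_{j=1}^{6}\psi_{gj}\big)$ where $\tilde H^{\alpha\alpha}_g = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g)$, $\psi_{g1}$-$\psi_{g6}$ are given by

$$\psi_{g1} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i)) - E_{it}\big[\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i))\big]\Big) \tag{5.1}$$

$$\psi_{g2} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\big[\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i))\big] \tag{5.2}$$

$$\psi_{g3} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(i)) - E_{it}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(i))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big) \tag{5.3}$$

$$\psi_{g4} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{it}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(i))\big] - E_{it}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big) \tag{5.4}$$

$$\psi_{g5} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{it}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big] - E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big) \tag{5.5}$$

$$\psi_{g6} = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big] - E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big), \tag{5.6}$$

$j\in I_g$, $\tilde\alpha_g$ are intermediate values satisfying $|\tilde\alpha_g-\bar\alpha_g| \le |\hat\alpha_g-\bar\alpha_g|$, and $\tilde\alpha_g(i)$ are intermediate values satisfying $|\tilde\alpha_g(i)-\alpha_0(i)| \le |\bar\alpha_g-\alpha_0(i)|$.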

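$\theta$: $\hat\theta-\theta_0 = -\hat J^{-1}\Big(\sum_{j=1}^{6}B_j - \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g)\big(\tilde H^{\alpha\alpha}_g\big)^{-1}\big(\sum_{j=1}^{6}\psi_{gj}\big)\Big)$ where

$$\hat J = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big[\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - \varphi^{\theta\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))\,H^{\alpha\alpha}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))^{-1}H^{\alpha\theta}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))\Big], \tag{5.7}$$

$B_1$-$B_6$ are given by

$$B_1 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(\varphi^{\theta}_{it}(\theta_0,\alpha_0(i)) - E_{it}\big[\varphi^{\theta}_{it}(\theta_0,\alpha_0(i))\big]\Big) \tag{5.8}$$

$$B_2 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\big[\varphi^{\theta}_{it}(\theta_0,\alpha_0(i))\big], \tag{5.9}$$

$$B_3 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(i)) - E_{it}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(i))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big), \tag{5.10}$$

$$B_4 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{it}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(i))\big] - E_{it}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big), \tag{5.11}$$

$$B_5 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{it}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big] - E_{jt}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big), \tag{5.12}$$

$$B_6 = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Big(E_{jt}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g(j))\big] - E_{jt}\big[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(j))\big]\Big)\big(\bar\alpha_g-\alpha_0(i)\big), \tag{5.13}$$

$H^{\alpha\alpha}_g(\theta,\alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta,\alpha_g)$, $H^{\alpha\theta}_g(\theta,\alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\theta}_{it}(\theta,\alpha_g)$, and $\tilde\alpha_g(\cdot)$ is a sequence of intermediate values satisfying $\|\tilde\alpha_g(\cdot)-\alpha_0(\cdot)\| \le \|\bar\alpha_g-\alpha_0(\cdot)\|$.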

5.2. Lemmas

Lemma 1. Under Assumption 1, $\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\big(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\big) \xrightarrow{p} 0$.

Proof. The result follows from a standard argument under Assumption 1.

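Lemma 2. Let $\gamma = (\theta,\alpha(\cdot))$ and $\|\gamma\|$ be as in Assumption 1. Let $\Gamma = \Theta\times(\times_{i=1}^{\infty}A)$. Under Assumption 1,

$$\left|\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha(i)) - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\tilde\theta,\tilde\alpha(i))\right| \le B_{NT}\,\|\gamma-\tilde\gamma\|$$

for $\gamma,\tilde\gamma\in\Gamma$ and $B_{NT} = O_p(1)$.

Proof.

$$\left|\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha(i)) - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\tilde\theta,\tilde\alpha(i))\right| \le \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})\Big(\sum_{j=1}^{k}|\theta_j-\tilde\theta_j| + \sup_i\sum_{j=1}^{p}|\alpha_j(i)-\tilde\alpha_j(i)|\Big) = \Big(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})\Big)\|\gamma-\tilde\gamma\|$$

where the inequality follows from the triangle inequality and the Lipschitz condition in Assumption 1. Defining $B_{NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})$, Assumption 1 is sufficient for $B_{NT} = O_p(1)$.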

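Lemma 3. Let $\Gamma_G = \{\gamma\in\Gamma : |\alpha(i)-\alpha(j)| = 0,\ \forall g,\ \forall i,j\in I_g\}$ for $I_g$ defined in Assumption 3. For $\gamma = (\theta^*,\alpha^*(\cdot))\in\Gamma$ and $\gamma_G = (\theta^*,\alpha_G(\cdot))\in\Gamma_G$ where $\alpha_G(i) = \frac{1}{N_g}\sum_{j\in I_g}\alpha^*(j)$, we have $\|\gamma-\gamma_G\| \to 0$ under Assumption 3.

Proof. The conclusion follows immediately under Assumption 3.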

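Lemma 4. Under Assumptions 1-3, $\sup_g\sup_{i\in I_g}\|\bar\alpha_g-\alpha_0(i)\| \le C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}$ for some $C_{NT} = O(1)$.

Proof. Because the weights satisfy $\frac{1}{N_g}\sum_{j\in I_g}\omega_j = I$ by construction, we have

$$\bar\alpha_g-\alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(j) - \alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(j) - \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g:\,j\ne i}\omega_j\big(\alpha_0(j)-\alpha_0(i)\big).$$

Thus, $\sup_g\sup_{i\in I_g}\|\bar\alpha_g-\alpha_0(i)\| \le C(p)\,\xi_{NT}\,\sup_g\frac{N_g-1}{N_g}\,\sup_g\sup_{i\in I_g}\|\omega_i\|$ where $C(p)$ depends only on the dimension of $\alpha$ and the norm.

It remains to be shown that $\sup_g\sup_{i\in I_g}\|\omega_i\| \le C_{NT}$ where $C_{NT} = O(1)$.

$$\|\omega_i\| \le \left\|\Big(\frac{1}{N_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\Big)^{-1}\right\|\,\left\|\sum_{t=1}^{T_i}E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\right\| \le \frac{T_i}{T_g}\,\Delta\left\|\Big(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}\big[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))\big]\Big)^{-1}\right\| \le \frac{T_i}{T_g}\,C\Delta = C_{NT}$$

for some $C < \infty$ where the second inequality follows from Assumption 1 and the last from Assumption 2. It follows from Assumption 1 that $\sup_g\sup_{i\in I_g}\frac{T_i}{T_g} = \sup_g\sup_{i\in I_g}\frac{T_i/T}{T_g/T} = O(1)$. The conclusion follows.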
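The mechanics of Lemma 4 can be illustrated numerically in the scalar case. The sketch below is a minimal illustration under stated assumptions: each individual's curvature sum is replaced by a made-up number `curvature[i]`, the within-group discrepancy `xi` is taken to be the largest gap between individual effects in the group, and $C(p) = 1$ because $\alpha$ is scalar. The function names are hypothetical. It checks that the weights average to one and that every deviation of the weighted group centre from an individual effect respects the bound $\frac{N_g-1}{N_g}\max_i|\omega_i|\,\xi$.

```python
def weighted_group_center(alpha0, curvature):
    """Return (weights, center) for one group in the scalar case.

    weights[i] = curvature[i] / mean(curvature), so they average to one;
    center = (1/N_g) * sum_i weights[i] * alpha0[i].
    """
    n = len(alpha0)
    mean_curv = sum(curvature) / n
    weights = [c / mean_curv for c in curvature]
    center = sum(w * a for w, a in zip(weights, alpha0)) / n
    return weights, center


def lemma4_style_bound(alpha0, weights):
    """((N_g - 1) / N_g) * max|weight| * (max within-group discrepancy)."""
    n = len(alpha0)
    xi = max(alpha0) - min(alpha0)
    return (n - 1) / n * max(abs(w) for w in weights) * xi


if __name__ == "__main__":
    alpha0 = [0.0, 0.1, 0.25]          # made-up individual effects
    curvature = [-2.0, -1.0, -3.0]     # made-up (negative) second derivatives
    weights, center = weighted_group_center(alpha0, curvature)
    bound = lemma4_style_bound(alpha0, weights)
    for a in alpha0:
        print(abs(center - a) <= bound)
```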

Lemma 5. Under Assumptions 1-3, 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 ϕθθit (θ, αg(θ))

p−→ Hθθ0 .

Proof. Let Γ = Θ × (×∞i=1A) and (θ, α(·)) = γ ∈ Γ, 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 ϕθθit (θ, α(i))

p−→ Hθθ(θ, α(·))under Assumptions 1-2 as in Lemma 1. Similarly to Lemma 2, we have∣∣∣∣∣∣

1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

ϕθθit (θ, α(i))− 1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

ϕθθit (θ∗, α∗(i))

∣∣∣∣∣∣≤

1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

M(wit)

‖γ − γ∗‖

with 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1M(wit) = Op(1) under the conditions of the Lemma. It follows that

supγ∈Γ

∣∣∣ 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 ϕθθit (θ, α(i))−Hθθ(θ, α(·))

∣∣∣ p−→ 0 by Newey and Powell (2003) Lemma A2.

Thus, defining αθg(·) : N→ A such that αθg(i) = arg minα∈A∑i∈Ig

∑Ti

t=1 ϕit(θ, α), we have∣∣∣∣∣∣

1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

ϕθθit (θ, αg(θ))−Hθθ0

∣∣∣∣∣

=

∣∣∣∣∣∣1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

ϕθθit (θ, αg(θ))−Hθθ(θ, αθg(·)) +Hθθ(θ, αθg(·))−Hθθ0

∣∣∣∣∣∣

≤ supγ∈Γ

∣∣∣∣∣∣1NT

GNT∑

g=1

i∈Ig

Ti∑

t=1

ϕθθit (θ, αg(θ))−Hθθ(θ, αθg(·))

∣∣∣∣∣∣+∣∣∣Hθθ(θ, αθg(·))−Hθθ

0

∣∣∣ p−→ 0

where the convergence in probability follows from the argument above and consistency of the estimator.

Lemma 6. max1≤g≤GNT‖ 1NgTg

∑i∈Ig

∑Ti

t=1(ϕααit (θ, αg(θ))−Eit[ϕααit (θ0, α0(i))])‖ p−→ 0 under Assumptions1-3.

Proof. Using consistency of the estimator and the definition of θ as an intermediate value,

max1≤g≤GNT

∥∥∥∥∥∥1

NgTg

i∈Ig

Ti∑

t=1

Eit[ϕααit (θ, αg(θ))− ϕααit (θ0, α0(i))]

∥∥∥∥∥∥

1NgTg

i∈Ig

Ti∑

t=1

supi,t

Eit[M(wit)]

(‖γ − γ0‖) ≤ ∆‖γ − γ0‖ ≤ ∆‖γ − γ0‖ p−→ 0.

Let ait = ϕααit (θ, α)− Eit[ϕααit (θ, α)].

Then we have Pr[max1≤g≤GNT

sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ > η]

≤∑GNT

g=1 Pr[sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ > η]. Let ε > 0 be such that 2ε supi,t Eit[M(wit)] < η/3.

Divide Γg = Θ×A into subsets Γ1, ...,Γm(ε) such that ‖(θ, α)−(θ∗, α∗)‖ < ε whenever (θ, α) and (θ∗, α∗) are in

the same subset. Let (θj , αj) denote some point in Γj for each j. Then sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥= maxj sup(θ,α)∈Γj

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ which implies Pr[sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ > η]≤

∑m(ε)j=1 Pr

[sup(θ,α)∈Γj

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ > η].

Also, define ajit = ϕααit (θj , αj)− Eit[ϕααit (θj , αj)]. For (θ, α) ∈ Γj ,∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥≤∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ajit

∥∥∥+∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 (ϕααit (θ, α)− ϕααit (θj , αj))∥∥∥

+∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 (Eit[ϕααit (θj , αj)]− Eit[ϕααit (θ, α)])∥∥∥ ≤

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ajit

∥∥∥+ ε

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 (M(wit)− Eit[M(wit)])∥∥∥ + η

3 . Thus, Pr[sup(θ,α)∈Γj

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 ait

∥∥∥ > η]≤

Pr[∥∥∥ 1

NgTg

∑i∈Ig

∑Ti

t=1 ajit

∥∥∥τ

>(η3

)τ]+ Pr[∥∥∥ 1

NgTg

∑i∈Ig

∑Ti

t=1 (M(wit)− Eit[M(wit)])∥∥∥τ

>(

2η3ε

)τ]

$= O\big((N_gT)^{-\tau/2}\big)$ by the Markov inequality and standard results for mixing sequences as in Doukhan (1994) or Kim (1994).

It follows that Pr[max1≤g≤GNT

sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1 (ϕααit (θ, α)− Eit[ϕααit (θ, α)])∥∥∥ > η

]≤

∆∑GNT

g=1 (NgT )−τ/2 ≤ ∆GNT (infg NgT )−τ/2 → 0 under Assumption 3. Thus,

max1≤g≤GNT

sup(θ,α)∈Θ×A

∥∥∥∥∥∥1

NgTg

i∈Ig

Ti∑

t=1

(ϕααit (θ, α)− Eit[ϕααit (θ, α)])

∥∥∥∥∥∥p−→ 0.


Finally, max1≤g≤GNT

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1(ϕααit (θ, αg(θ))− Eit[ϕααit (θ0, α0(i))])∥∥∥

≤ max1≤g≤GNT

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1(ϕααit (θ, αg(θ))− Eit[ϕααit (θ, αg(θ))])∥∥∥

+ max1≤g≤GNT

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1(Eit[ϕααit (θ, αg(θ))]− Eit[ϕααit (θ0, α0(i))])∥∥∥

≤ max1≤g≤GNTsup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1(ϕααit (θ, α)− Eit[ϕααit (θ, α)])∥∥∥

+ max1≤g≤GNT

∥∥∥ 1NgTg

∑i∈Ig

∑Ti

t=1(Eit[ϕααit (θ, αg(θ))]− Eit[ϕααit (θ0, α0(i))])∥∥∥ p−→ 0 by the previous argu-

ments.

Lemma 7. Under Assumptions 1-3,

max1≤g≤GNT

1NgTg

i∈Ig

Ti∑

t=1

ϕααit (θ, αg(θ))

−1

1NgTg

i∈Ig

Ti∑

t=1

Eit[ϕααit (θ0, α0(i))]

−1

‖ p−→ 0.

Proof. The result is immediate given convergence in Lemma 6 and Assumption 2.

Lemma 8. max1≤g≤GNT‖ 1NgTg

∑i∈Ig

∑Ti

t=1(ϕθαit (θ, αg(θ))−Eit[ϕθαit (θ0, α0(i))])‖ p−→ 0 under Assumptions1-3.

Proof. The proof proceeds similarly to the proof of Lemma 6 and is omitted.

Lemma 9. Let ϕθαit = ϕθαit (θ, α(θ)), ϕααit = ϕααit (θ, α(θ)), ϕθαit = ϕθαit (θ0, α0(i)), and ϕααit = ϕααit (θ0, α0(i)).Also define Hθα

NT = 1NgTg

∑i∈Ig

∑Ti

t=1 ϕθαit , Hαα

NT = 1NgTg

∑i∈Ig

∑Ti

t=1 ϕααit , Hθα

NT = 1NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕθαit ],

and HααNT = 1

NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕααit ]. Under Assumptions 1-3,

∥∥∥∥1NT

GNT∑

g=1

[(NgTg)Hθα

NT

(HααNT

)−1

HαθNT − (NgTg)Hθα (Hαα

NT )−1HαθNT ]

∥∥∥∥p−→ 0.

Proof. We have that

∥∥∥∥1NT

GNT∑

g=1

[(NgTg)Hθα

NT

(HααNT

)−1

HαθNT − (NgTg)Hθα (Hαα

NT )−1HαθNT ]

∥∥∥∥

≤ max1≤g≤GNTNgTg

NGT

max

1≤g≤GNT

∥∥∥∥(HααNT

)−1

− (HααNT )−1

∥∥∥∥1

GNT

GNT∑

g=1

1NgT

i∈Ig

Ti∑

t=1

M(wit)

2

+1δ

max1≤g≤GNT

∥∥∥HαθNT −Hαθ

NT

∥∥∥ 1GNT

GNT∑

g=1

1NgT

i∈Ig

Ti∑

t=1

M(wit) +∆δ

max1≤g≤GNT

∥∥∥HαθNT −Hαθ

NT

∥∥∥]

=max1≤g≤GNT

NgTgNGT

op(1)

1GNT

GNT∑

g=1

1NgT

i∈Ig

Ti∑

t=1

M(wit)

2

+op(1)1

GNT

GNT∑

g=1

1NgT

i∈Ig

Ti∑

t=1

M(wit) + op(1)


using Lemmas 7 and 8. Under Assumptions 1 and 3, we have max1≤g≤GNTNgTg

NGT= O(1). We also have

E∥∥∥ 1GNT

∑GNT

g=11

NgT

∑i∈Ig

∑Ti

t=1M(wit)∥∥∥ ≤ ∆ and

E

∥∥∥∥∥∥∥1

GNT

GNT∑

g=1

1NgT

i∈Ig

Ti∑

t=1

M(wit)

2∥∥∥∥∥∥∥≤ 1GNT

g

1(NgT )2

i∈Ig

Ti∑

t=1

j∈Ig

Ti∑

s=1

(E[M(wit)2]E[M(wjs)2])1/2 ≤ ∆

which gives 1GNT

∑GNT

g=1

(1

NgT

∑i∈Ig

∑Ti

t=1M(wit))2

= Op(1) and 1GNT

∑GNT

g=11

NgT

∑i∈Ig

∑Ti

t=1M(wit) =Op(1). The conclusion then follows.

Lemma 10. If Assumptions 1-3 are satisfied, Jp−→ J for J in equation (5.7) and J defined in Assumption

2 and J−1 p−→ J−1.

Proof. The first result is immediate from Lemmas 5 and 9, and the second follows immediately from the continuous mapping theorem under the eigenvalue condition in Assumption 2.

Lemma 11. Under Assumptions 1-2,√NT (B1− 1

NT

∑GNT

g=1 (NgTg)κgψg1) d−→ N(0,Ω) for κg and Ω definedin Assumption 2.

Proof. Let Uit = uθit − κguαit. Note that supi E∣∣∣ 1√

Ti

∑Ti

t=1 Uit

∥∥∥2

≤ C supi,t,k E[U2it,k] for some C <∞ where

Uit,k is the kth element of vector Uit follows from standard results for mixing sequences; see, e.g. Doukhan(1994) or Kim (1994). supi,t,k E[U2

it,k] ≤ supi,t E[M(wit)2] + 2 supi,t |κg|E[M(wit)2] + supi,t κ2gE[M(wit)2] ≤

∆(1 + 2 supg |κg| + supg κ2g) follows under Assumptions 1. supg |κg| ≤ ∆/δ and supg κ2

g ≤ ∆2/δ2 are also

obvious under Assumption 1 and 2. It thus follows that supi E∣∣∣ 1√

Ti

∑Ti

t=1 Uit

∥∥∥2

≤ C for some constantC <∞.

Note E∥∥∥ 1√

N

∑Ni=1

(1√T

∑Ti

t=1 Uit −√ρi

1√Ti

∑Ti

t=1 Uit

)∥∥∥2

≤ C supi

(√Ti

T −√ρi

)2

follows from indepen-

dence across i and the previous argument. Assumption 1 also gives that supi

(√Ti

T −√ρi

)2

→ 0, so

1√N

∑Ni=1

(1√T

∑Ti

t=1 Uit −√ρi

1√Ti

∑Ti

t=1 Uit

)= op(1).

Abusing notation and defining YiT =√ρi

1√Ti

∑Ti

t=1 Uit = 1Ti

∑Ti

t=1 S∗it, we have 1√

N

∑Ni=1 YiT

d−→N(0,Ω) as in Hansen (2007) Lemma 2. The conclusion then follows.

Lemma 12. Under Assumptions 1-3, B3 = Op

(ξNT supg

Ng−1Ng√

NT

), B4 = O

(((supg

Ng−1Ng

)+(

supgNg−1Ng

)2)ξ2NT

),

B5 = O((

supgNg−1Ng

)ξ2NT

), and B6 = O

((supg

Ng−1Ng

)2

ξ2NT

).

Proof. Let zit = ϕθαit (θ0, αg(i))− Eit[ϕθαit (θ0, αg(i))]. Note that E[zit] = 0 and that E[∥∥∥∑Ti

t=1 zit

∥∥∥2]≤ Ti∆

under Assumption 1, so we have E[‖B3‖2] ≤ ∆(

supgNg−1

Ng

)2C2

NT ξ2NT

NT using Lemma 4, independence, and some algebra.

‖B4‖ ≤ ∆CNT(

supgNg−1Ng

)ξNT

(2CNT

(supg

Ng−1Ng

)ξNT + C(p)ξNT

)follows under Assumption 1,

Lemma 4, the triangle inequality, and the definition of αg(i).

Note that under Assumption 3 ‖Eit[ϕθαit (θ0, αg(j))] − Ejt[ϕθαit (θ0, αg(j))]‖ = ‖∫Wϕθαit (θ0, αg(j))(dFit −

dFjt)‖ ≤(∫WM(w)2dFit

)‖α0(i)−α0(j)‖ ≤ ∆‖α0(i)−α0(j)‖. It then follows that ‖B5‖ ≤ ∆

(supg

Ng−1Ng

)ξ2NT .

‖B6‖ ≤ 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 ∆‖αg(j)−α0(j)‖‖αg−α0(i)‖ ≤ ∆ 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 ‖αg−α0(i)‖2 =

O

((supg

Ng−1Ng

)2

ξ2NT

) using the Lipschitz condition and moment bounds in Assumption 1, Lemma 4,

and the fact that ‖αg(j)− α0(j)‖ ≤ ‖αg − α0(i)‖.

For the final lemmas, let Hθαg = 1

NgTg

∑i∈Ig

∑Ti

t=1 ϕθαit (θ0, αg), Hαα

g = 1NgTg

∑i∈Ig

∑Ti

t=1 ϕααit (θ0, αg),

Hθαg = 1

NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕθαit (θ0, αg)], Hααg = 1

NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕααit (θ0, αg)],

Hθαg = 1

NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕθαit (θ0, α0(i))], and Hααg = 1

NgTg

∑i∈Ig

∑Ti

t=1 Eit[ϕααit (θ0, α0(i))].

Note that∑GNT

g=1 (NgTg)Hθαg (Hαα

g )−1(∑6j=1 ψgj) =

∑GNT

g=1 (NgTg)Hθαg (Hαα

g )−1(∑6j=1 ψgj)

+∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj) +

∑GNT

g=1 (NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

](∑6j=1 ψgj).

Lemma 13. Under Assumptions 1-3, 1√NT

∑GNT

g=1 (NgTg)Hθαg (Hαα

g )−1(∑6j=2 ψgj) = op(1).

Proof. 1NT

∑GNT

g=1 (NgTg)Hθαg (Hαα

g )−1(ψg2) = 1NT

∑GNT

g=1

∑i∈Ig

∑Ti

t=1 κgEit[ϕαit(θ0, α0(i))] = 1√

NTo(1) by

Assumption 2. For the remaining terms, note that ‖κg‖ ≤ ∆/δ. That each of these terms is op(1) then follows by arguments similar to those used in Lemma 12.

Lemma 14. Under Assumptions 1-3, 1√NT

∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj) = op(1).

Proof. 1NT

∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj) = 1

NT

∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj)+

1NT

∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj). Showing that the first term on the right is op(1) follows

from repeatedly applying the Cauchy-Schwarz and triangle inequalities to each element of the sum using Assumption 2, standard mixing and moment inequalities as in Doukhan (1994) or Kim (1994), and arguments similar to those used in Lemma 12.

For the remaining piece, we have∥∥∥ 1NT

∑GNT

g=1 (NgTg)(Hθαg −Hθα

g )(Hααg )−1(

∑6j=1 ψgj)

∥∥∥≤ C

NT max1≤g≤GNT

∥∥∥(Hααg )−1 − (Hαα

g )−1∥∥∥∑GNT

g=1 (NgTg)∥∥∥∑6j=1 ψgj

∥∥∥∥∥∥∑6j=1 ψgj

∥∥∥+ C

NT

∑GNT

g=1 (NgTg)∥∥∥∑6j=1 ψgj

∥∥∥∥∥∥∑6j=1 ψgj

∥∥∥ . From arguments identical to those used to verify Lemma 7,

we can show max1≤g≤GNT

∥∥∥(Hααg )−1 − (Hαα

g )−1∥∥∥ p−→ 0; so it suffices to show

1√NT

∑GNT

g=1 (NgTg)∥∥∥∑6j=1 ψgj

∥∥∥∥∥∥∑6j=1 ψgj

∥∥∥ p−→ 0. This demonstration once again follows from repeated use of the Cauchy-Schwarz and triangle inequalities and results established above and in Lemmas 10-13.

Lemma 15. 1√NT

∑GNT

g=1 (NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

](∑6j=1 ψgj) = op(1) under Assumptions 1-3.

Proof. From a mean value expansion of (Hααg )−1 about Hαα

g = Hααg , we have (Hαα

g )−1 − (Hααg )−1 =

−(Hαα∗g )−1(Hαα

g −Hααg )(Hαα∗

g )−1 where ‖Hαα∗g −Hαα

g ‖ ≤ ‖Hααg −Hαα

g ‖. Expanding further yields

(Hααg )−1 − (Hαα

g )−1 = −(Hααg )−1(Hαα

g −Hααg )(Hαα

g )−1

− (Hααg )−1(Hαα

g −Hααg )[(Hαα∗

g )−1 − (Hααg )−1]

− [(Hαα∗g )−1 − (Hαα

g )−1](Hααg −Hαα

g )(Hααg )−1


− [(Hαα∗g )−1 − (Hαα

g )−1](Hααg −Hαα

g )[(Hαα∗g )−1 − (Hαα

g )−1].

Plugging this expression into 1√NT

∑GNT

g=1 (NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

](∑6j=1 ψgj) and making use of

max1≤g≤GNT‖(Hαα∗

g )−1 − (Hααg )−1‖ p−→ 0 which can be demonstrated as in Lemma 7 using that ‖Hαα∗

g −Hααg ‖ ≤ ‖Hαα

g −Hααg ‖, the result follows exactly as Lemma 14 with Hθα

g , Hθαg , and Hθα

g replaced by Hααg ,

Hααg , and Hαα

g .
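Differences of inverse Hessians, as handled in Lemma 15, are often controlled through the exact algebraic identity $A^{-1}-B^{-1} = -A^{-1}(A-B)B^{-1}$ for invertible $A$ and $B$. This identity is not the mean value expansion used in the proof above (it involves no intermediate matrix), but it shows in the same way why a small difference $A-B$ together with bounded inverses gives a small difference of inverses. The sketch below checks it numerically for $2\times 2$ matrices; the helper names and the example matrices are illustrative assumptions.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub2(p, q):
    """Entrywise difference p - q of two 2x2 matrices."""
    return [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]


def inverse_difference_gap(a, b):
    """Largest entry of (A^-1 - B^-1) + A^-1 (A - B) B^-1; zero in exact arithmetic."""
    lhs = sub2(inv2(a), inv2(b))
    rhs = matmul2(matmul2(inv2(a), sub2(a, b)), inv2(b))
    return max(abs(lhs[i][j] + rhs[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]   # made-up example matrices
    B = [[1.5, 0.2], [0.2, 1.2]]
    print(inverse_difference_gap(A, B))
```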

5.3. Proof of Proposition 1

We prove consistency by verifying the conditions of Newey and Powell (2003) Lemma A1. We note that $\Gamma = \Theta\times(\times_{i=1}^{\infty}A)$ is compact for the norm $\|\cdot\|$ defined in Assumption 1. The conditions of Newey and Powell (2003) Lemma A2 are therefore satisfied using Lemmas 1 and 2. Lemma A2 of Newey and Powell (2003) implies conditions (i) and (ii) of Newey and Powell (2003) Lemma A1, and Lemma 3 above implies condition (iii). Thus, the conditions of Newey and Powell (2003) Lemma A1 are satisfied, from which $\hat\gamma_G \xrightarrow{p} \gamma_0 = (\theta_0,\alpha_0(\cdot))$ follows.

That $\sqrt{NT}(\hat\theta-\theta_0) \xrightarrow{d} J^{-1}N(0,\Omega)$ is immediate from the expansions in Section 5.1 and Lemmas 10-15.

References

Anderson, E. (1970): “Asymptotic Properties of Conditional Maximum Likelihood Estimators,” Journal of the Royal Statistical Society, Series B, 32(2), 283–301.

Arellano, M. (2003): “Discrete Choice with Panel Data,” Investigaciones Economicas, 27(3), 423–458.

Arellano, M., and S. Bonhomme (2009): “Robust Priors in Nonlinear Panel Data Models,” forthcoming Econometrica.

Baltagi, B. (1992): “Specification Issues,” in The Econometrics of Panel Data, ed. by Matyas, and Sevestre. Kluwer Academic Publishers.

Bester, C. A., and C. Hansen (2007): “A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects,” forthcoming Journal of Business and Economic Statistics.

Bester, C. A., and C. Hansen (2010): “Grouped Effects Estimators in Fixed Effects Models,” SSRN Working Paper, www.ssrn.com.

Bester, C. A., and C. B. Hansen (2008): “Identification of Marginal Effects in a Correlated Random Effects Model,” forthcoming Journal of Business and Economic Statistics.

Bonhomme, S. (2010): “Functional Differencing,” Working Paper.

Browning, M., and J. M. Carro (2009): “Dynamic Binary Outcome Models with Maximal Heterogeneity,” Working Paper.

Carro, J. M. (2006): “Estimating Dynamic Panel Data Discrete Choice Models,” forthcoming Journal of Econometrics.

Chamberlain, G. (1980): “Analysis of Covariance with Qualitative Data,” Review of Economic Studies, 47, 225–238.

Chen, S., and S. Khan (2007): “Semiparametric Estimation of Nonstationary Censored Panel Data Models with Time-Varying Factor Loads,” forthcoming, Econometric Theory.

Chernozhukov, V., H. Hong, and E. Tamer (2004): “Inference on Identified Parameter Sets in Econometric Models,” MIT Working Paper.

Chernozhukov, V., I. Fernandez-Val, J. Hahn, and W. Newey (2009): “Identification and Estimation of Marginal Effects in Nonlinear Panel Models,” Working Paper, Department of Economics, MIT.

Doukhan, P. (1994): Mixing: Properties and Examples, vol. 85 of Lecture Notes in Statistics. New York: Springer-Verlag, Editors S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, and N. Wermuth.

Fernandez-Val, I. (2005): “Estimation of Structural Parameters and Marginal Effects in Binary Choice Panel Data Models with Fixed Effects,” Mimeo.

Gayle, G.-L., and C. Viauroux (2007): “Root-N Consistent Semiparametric Estimators of a Dynamic Panel Sample Selection Model,” Journal of Econometrics, 141(1), 179–212.


Hahn, J., and G. Kuersteiner (2004): “Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects,” Mimeo.

Hahn, J., and G. M. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both N and T Are Large,” Econometrica, 70(4), 1639–1657.

Hahn, J., and W. K. Newey (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72(4), 1295–1319.

Hansen, C. B. (2007): “Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large,” Journal of Econometrics, 141, 597–620.

Hausman, J. A., and W. E. Taylor (1981): “Panel Data and Unobservable Individual Effects,” Econometrica, 49(6), 1377–1398.

Heckman, J., and B. Singer (1984): “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52(2), 271–320.

Henderson, D. J., and A. Ullah (2005): “A Nonparametric Random Effects Estimator,” Economics Letters, 88(3), 403–407.

Honore, B. E. (1992): “Trimmed LAD and Least Squares Estimation of Truncated and Censored Models with Fixed Effects,” Econometrica, 60(3), 533–565.

Honore, B. E., and E. Kyriazidou (2000): “Panel Data Discrete Choice Models with Lagged Dependent Variables,” Econometrica, 68(4), 839–874.

Honore, B. E., and A. Lewbel (2002): “Semiparametric Binary Choice Panel Data Models Without Strictly Exogenous Regressors,” Econometrica, 70, 2053–2063.

Honore, B. E., and E. Tamer (2006): “Bounds on the Parameters in Panel Dynamic Discrete Choice Models,” Econometrica, 74(3), 611–632.

Kim, T. Y. (1994): “Moment Bounds for Non-Stationary Dependent Sequences,” Journal of Applied Probability, 31, 731–742.

Lancaster, T. (2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666.

Lewbel, A. (2005): “Simple Endogenous Binary Choice and Selection Panel Model Estimators,” mimeo.

Lin, X., and R. J. Carroll (2000): “Nonparametric function estimation for clustered data when the predictor is measured without/with error,” Journal of the American Statistical Association, 95, 520–534.

Lindley, D. V., and A. F. M. Smith (1972): “Bayes Estimates for the Linear Model,” Journal of the Royal Statistical Society, Series B, 34, 1–41.

Manski, C. (1987): “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data,” Econometrica, 55(2), 357–362.

Matyas, L., and P. Blanchard (1998): “Misspecified heterogeneity in panel data models,” Statistical Papers, 39, 1–27.

Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46, 69–85.

Newey, W. K., and J. L. Powell (2003): “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565–1578.

Neyman, J., and E. L. Scott (1948): “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16(1), 1–32.

Raudenbush, S. W., and A. S. Bryk (2002): Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks: Sage Publications, second edn.

Sun, Y. X. (2005): “Estimation and Inference in Panel Structure Models,” working paper, UCSD.

Ullah, A., and N. Roy (1998): “Nonparametric and Semiparametric Econometrics of Panel Data,” in Handbook of Applied Economic Statistics, ed. by A. Ullah, and D. E. A. Giles, vol. 1. Marcel Dekker: New York, NY.

Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: The MIT Press.

Wooldridge, J. M. (2005): “Unobserved Heterogeneity and Estimation of Average Partial Effects,” in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews, and J. H. Stock. Cambridge University Press.

Woutersen, T. (2005): “Robustness against Incidental Parameters and Mixing Distributions,” Mimeo.
