Download - netivreg: Estimation of Peer Effects in Endogenous Social ...fm · Pablo Estrada 1Juan Estrada Kim P. Huynh2,1 David T. Jacho-Ch avez Leonardo S anchez-Arag on3 The views expressed

netivreg: Estimation of Peer Effects in Endogenous Social

Networks

Pablo Estrada1 Juan Estrada1 Kim P. Huynh2,1 David T. Jacho-Chavez1 Leonardo Sanchez-

Aragon3

The views expressed in this paper are solely those of the authors and may differ from official Bank of Canada

views. No responsibility for them should be attributed to the Bank of Canada.

1Emory University, 2Bank of Canada, 3ESPOL University

Motivation

• Estimation of network effects is becoming increasingly common.

• Interest on structural coefficients: endogenous peer effects and contextual effects

• Estimate treatment effects and spillovers under interference.

• Exogenous network formation is a commonly used assumption in empirical work.

• Recent methods allowing for the presence of network endogeneity require explicit structural

restrictions on the network formation process.

• Research Question: can the multiplex network data structure help with the treatment of

identification issues?

1

Preview of the Results

• Propose novel instruments based on the topology of multiplex networks to construct the

estimator.

• Provide new identification results for peer/contextual effects that generalize existing meth-

ods by accounting for potential endogenous network formation.

• Computationally easy to implement estimator that is consistent and asymptotically normal.

• Stata implementation: netivreg.

2

Framework

1

2

3 4

5 6

• Contextual Effects (interference): i ’s outcome de-

pends on the characteristics of other units.

• Endogenous Peer Effects (multiplier).

yi = α + β∑i 6=j

Wi,jyj + δ∑i 6=j

Wi,jxj + γxi + εi .

Objective: identify and consistently estimate the parameters (α, β, γ, δ).

3

Main Challenges: Observational Data

• Simultaneity of the peer effects regressors (reflection problem)

• The decision of forming a peer connection can be correlated with unobserved

characteristics or there could exists common shocks (correlated effects)

• The network structure could induce correlation between X and ε (unobserved homophily)

4

Data Structure and Main Idea

y = α0ι+ β0Wy + δ0WXδ0 + Xγ0 + ε, with E [ε |W,X] 6= 0 and E [ε |W0,X] = 0.

1

2

3

4

Exogenous Net. (W0)

Net. of Interest (W) 1

2

3

4

1

2

3

4

G

G0

5

Ideal Experiment

• Individuals are (quasi-) randomized into groups (for example classrooms) defining W0.

• Only the fact that two individuals share a classrooms does not necessarily generate social

effects.

• It is possible to observe a relevant network (for example friendship) defining W.

• This method can be used to causally estimate network friendship effects.

6

Main Identifying Assumptions

1. Monolayer Linear model and Bi-layer multiplex network data M = 2 (W and W0).

2. Conditional distribution F(ε | X,M) is such that E[ε|W,X] 6= 0 and E[ε|W0,X] = 0.

3. The networks generating the adjacency matrices W and W0 are correlated in the sense that

it is possible to find connections in common (E0 ∩ E1 6= 0) and distance two paths that

change edge type ((i , j) ∈ E0 and (j , k) ∈ E1).

1

2

1

2

3

7

Identification

Let Π be the projection coefficients from a regression of WS on W0S, where S = [y X].

Theorem:

Let Assumptions 1, 2 , 3 , and γ0(π11β0 + π12δ

0) + π21β0 + π22δ

0 6= 0 hold. If the matrices

I, W0, W02 are linearly independent, then the parameters α0, β0, γ0 and δ0 are identified.

Remark

Note that this is a generalization of the identification result in Proposition 1 of Bramoulle et al.

(2009, JoE), i.e., if W0 = W, one has Π = I, and the condition reduces to γ0β0 + δ0 6= 0 and

the matrices I, W and W2 being linearly independent.

Rank Condition

8

Estimation

y = α0ι+ WSθ0 + Xγ0 + ε for S = [y X] and θ0 = [β0 δ0]

y = α0ι+ W0Sθ∗ + Xγ0 + e, for θ∗ = Πθ0.

Estimation Procedure

1. Estimate Π by OLS (WS on W0S).

2. 2SLS of [ι,X,W0y,W0X] with instrument Z =[ι,X,W0

2X,W0X]. Calculate θ = Π−1θ∗.

3. IV of[ι,X, Wy, WX

]with instruments Z∗ =[

ι,X, [E (W0y|X,W0) ,W0X] Π].

Estimator and Properties

ψG3SLS =(

Z∗>D)−1

Z∗>y,

√n(ψG3SLS −ψ)

d−→ N(0,Vψ)

9

Stata Implementation

Empirical Application: Specification

W: Coauthors - W0: Alumni

yi,r ,t = α + β∑j 6=i

w`;i,j,tyj,r ,t +∑j 6=i

w`;i,j,t x>j,r ,tδ + x>`;i,r ,tγ + λr + λt + λ0 + εi,r ,t

Peer Effects (β)

log(# Citations)

Contextual Effects (δ)

Editor

Different Gender

Direct Effects (γ)

Editor

Different Gender

# Authors

# Pages

# References

Fixed Effects (λs)

Journal

Year

Institutions Component

netivreg lcitations editor diff gender n pages n authors n references isolated

(edges = edges0), wx(diff gender editor) cluster(c coauthor) first second

10

Empirical Application: Specification

W: Coauthors - W0: Alumni

yi,r ,t = α + β∑j 6=i

w`;i,j,tyj,r ,t +∑j 6=i

w`;i,j,t x>j,r ,tδ + x>`;i,r ,tγ + λr + λt + λ0 + εi,r ,t

Peer Effects (β)

log(# Citations)


Editor

Different Gender

Direct Effects (γ)

Editor

Different Gender

# Authors

# Pages

# References

Fixed Effects (λs)

Journal

Year

Institutions Component

netivreg lcitations editor diff gender n pages n authors n references isolated

(edges = edges0), wx(diff gender editor) cluster(c coauthor) first second10

Data Structure

Stata 16 Capabilities: (1) Python Integration for Sparse Matrices and (2) Multiframes

W (Coauthors) W0 (Alumni) (y,X)

11

Step1: Projection Coefficients

WS = W0SΠ + U,

12

Step2: 2SLS estimation

2SLS of [ι,X,W0y,W0X] with instrument Z =[ι,X,W0

2X,W0X]

13

Step3: Optimal Instrument

IV of[ι,X, Wy, WX

]with instruments Z∗ =

[ι,X, [E (W0y|X,W0) ,W0X] Π

]

14

Conclusion

• Identification of a linear-in-means model with endogenous network.

• Computationally simple estimation that uses two-layered multiplex network structure with

Stata implementation.

• Robust to different types of network endogeneity. It does not require to model unobserved

heterogeneity and network formation.

15

Appendix

Distortions Induced by Social Effects: Consumption Examples

• If individuals care about status (conspicuous consumption models), the proportion of con-

spicuous consumption may increase with respect to other goods.

• If conspicuous consumption is considered wasteful, peer effects might have noticeable welfare

consequences.

• Savings may differ from the optimal in an attempt to keeping up with the peers.

Empirical Work

Aggregate Effects: Consumption Example

• Unanticipated tax changes to the rich might have aggregate consequences.

• If individuals who are not affected by the shock change their consumption after observing

changes in consumption of the rich, the shock can spread through the network.

• Social multipliers depend on the size of the endogenous peer effects and the connectedness

of the affected groups.

Empirical Work

Angrist’s (2014) Critique: Group Regressions

• Reflection Problem: a regression of individual outcomes on group mean outcomes is

tautological.

• Correlated Effects: even the leave-one-out estimator does not provide information of

human behavior. “Like students in the same school, households from the same village are

similar in many ways”.

• Mechanical Relationship: the coefficient on group averages in a multivariate model of

endogenous peer effects does not reveal the action of social forces. He interprets the vale

1/(1−β) as approximately the ratio of the 2SLS to OLS estimands for the effect of individual

covariates on outcomes (using dummy groups as instruments).

Motivation

Angrist’s (2014) Critique: Network Regressions

• Start by a saturated model E [yi | xi ] = γ0 + γ1xi satisfying E [ui | xi ] = 0, for ui ≡yi − γ0 − γxi .

• Individuals are ordered from left to right. Each person i is connected only with the individual

to her left i − 1. Friends are only similar on unobservables: ui = βui−1 + εi .

• The outcome can be written in a linear-in-means (lmm) model form:

yi = γ0(1− β) + βyi−1 + γx − βγxi−1 + εi

• Flaw in Angrist’s example: let δ = −βγ to write this model exactly as a lmm. Note that

δ + γβ = 0 so that the outcome equation can be written as (for α = γ0(1− β))

yi =α

1− β+ γxi + vi

Motivation

Different Network Effects

Empirical

Network Effects

Consumption

Coworkers

DFP(2020, Restud)

Financial Decis.

Friends

BEFY(2014, ECA)

Education

Friends

CPZ(2009, Restud)

Substance Abuse

Classmates

GR(2001, Restats)

Labor Market

Coworkers

MM(2009, AER)

Diseases Transm.

Classmates

MK(2004, ECA)

Tech. Adoption

Neighb.

CU(2010, AER)

Knowledge Spillov.

Coauthors

Za.(2020, Restud)

Motivation

Critique to Randomly Assigned Groups

• In principle, randomization of peers would guarantee identification in a monolayer linear in

means model where endogenous network formation is ruled out.

• It can completely eliminate the problem of unobserved common variables.

• However, if individuals endogenously form groups (homophily), there can be a subsequent

resorting. If resorting happens faster than the effects of social interactions, identification is

not possible.

• Even with random peers, researchers face a classical problem of omitted variables when

trying to estimate contextual effects (E[xiεj | wi,j = 1] 6= 0).

Literature

Multilayers Networks in Economics

Labor Supply

• Sisters, Cousins and Neighbors networks

(NST (2018, AEJ))

Consumption

• Coworker and Spouses networks (DFP

(2020, Restud))

Education

• Friendship network in t and t − 1 (GI

(2013, JBES))

• Roommates, classmates, Study-mate,

Friendship networks (CL (2015))

• Siblings and Classmates networks (NR

(2017, JAE))

Publication Outcomes

• Coauthors, Alumni and Same Advisor

networks (EHJS (2020))

Multilayer

Microfoundations

• The monolayer linear model of interest corresponds with the best response of a Bayesian Game of

Social Interactions as proposed by Blume, Brock, Durlauf and Jayaraman (2015, JPE).

• Quadratic utility with social pressure or strategic complementarities

Ui (ωi , ω−i ) =

(γxi + zi + δ

∑j

cijxj

)ωi −

1

2ω2i −

φ

2

(ωi −

∑j

aijωj

)2

• In their model endogeneity arises because an individual i , observing that he is connected to j , make

an inference about the value of zj that is dependent on xj . Then, xj will be correlated with εi in

my equation of interest.

• Their critique of instrumental variable is that if individual i observe the instruments vj , he can use

it to predict zj which will induce correlation between εi and the instrument.

• Our instrument is based on xr of individuals r connected to i in a network that is independent of

the individuals’ utilities. Therefore xr is not useful to predict zj . Model

Positioning the Research Agenda in the Literature

Identification of Network

Effects Linear ModelsCorrelated Effects

Reflection Problem

Solved by Network Data

BDF (2009, JoE)

Natural and Artificial

Experiments

Randomized Peers

Observational Studies

Random Shocks

DFP (2020, Restud)

Structural Endogeneity

JM (2020, Restats)

Panel Data

KP (2020, ECA)Multilayer Networks

This Project

Assumptions

Assumption 1

There exists a n × n adjacency matrix W0 such that: E[v|x,W0] = 0

Assumption 2

Let Π be the be the full-rank matrix of coefficients from the system regression

WS = W0SΠ + U,

E [U|W0y,W0,X] = O.

where E[S>w0;iw0;i>S] > 0. Furthermore, the first row of Π is such that π11β+π12δ < 1/λmax,

where λmax is the largest eigenvalue of W0.

Identification

Rank Condition

• Given that rank(Π) ≤ min{rank(E [S>w0;iw>0;iS]−1), rank(E [S>w0;iw

>i S])}, a necessary con-

dition for rank(Π) = k + 1 is that rank(E [S>w0;iw>i S]) = k + 1 which would be equivalent

to the relevance condition in the classical Instrumental Variable literature.

• For large enough sample, this condition imposes some restriction on the matrix W0W. This

matrix contains the connections in common across the two networks in the main diagonal,

and length two paths that change color in the off- diagonal.

• It cannot be zero so there have to be enough connections in common and indirect triads

that change colors. This is a way to think about the correlation between the two matrices.

Identification

Mote Carlo Experiments

−2

0

2

Un

ob

serv

ed

Het

ero

gen

eity

Peer Effects (β)

−20

−10

0

10

20


−4

−2

0

2

4

6

Direct Effects (γ)

n = 50 n = 100 n = 200

0

0.5

1

Mis

smea

sure

d

Lin

ks

n = 50 n = 100 n = 200

0

1

2

3

n = 50 n = 100 n = 200

1

1.5

2

Empirial Application: Data

• 1,628 articles published in the American Economic Review, Econometrica, the Journal of

Political Economy, and the Quaterly Journal of Economics between 2000 and 2006. Source:

RePEc, Scopus, and Journal Websites.

• Employment, education, and research interest information for 1,985 unique authors and

42 unique editors (37 of which also published papers in thee journals in this time period).

Source: Web scrapping/text mining and Colussi (2018, ReStat).

• Co-authorship (` = 1) and Alumni (` = 0) networks are constructed for all 2,027 scholars.

Articles i and j are connected in network W` if at least one of the authors of article i shares a

professional connection of type ` with at least one of article j ’s authors.

Empirical Application