netivreg: Estimation of Peer Effects in Endogenous Social
Networks
Pablo Estrada1 Juan Estrada1 Kim P. Huynh2,1 David T. Jacho-Chavez1 Leonardo Sanchez-
Aragon3
The views expressed in this paper are solely those of the authors and may differ from official Bank of Canada
views. No responsibility for them should be attributed to the Bank of Canada.
1Emory University, 2Bank of Canada, 3ESPOL University
Motivation
• Estimation of network effects is becoming increasingly common.
• Interest on structural coefficients: endogenous peer effects and contextual effects
• Estimate treatment effects and spillovers under interference.
• Exogenous network formation is a commonly used assumption in empirical work.
• Recent methods allowing for the presence of network endogeneity require explicit structural
restrictions on the network formation process.
• Research Question: can the multiplex network data structure help with the treatment of
identification issues?
1
Motivation
• Estimation of network effects is becoming increasingly common.
• Interest on structural coefficients: endogenous peer effects and contextual effects
• Estimate treatment effects and spillovers under interference.
• Exogenous network formation is a commonly used assumption in empirical work.
• Recent methods allowing for the presence of network endogeneity require explicit structural
restrictions on the network formation process.
• Research Question: can the multiplex network data structure help with the treatment of
identification issues?
1
Preview of the Results
• Propose novel instruments based on the topology of multiplex networks to construct the
estimator.
• Provide new identification results for peer/contextual effects that generalize existing meth-
ods by accounting for potential endogenous network formation.
• Computationally easy to implement estimator that is consistent and asymptotically normal.
• Stata implementation: netivreg.
2
Framework
1
2
3 4
5 6
• Contextual Effects (interference): i ’s outcome de-
pends on the characteristics of other units.
• Endogenous Peer Effects (multiplier).
yi = α + β∑i 6=j
Wi,jyj + δ∑i 6=j
Wi,jxj + γxi + εi .
Objective: identify and consistently estimate the parameters (α, β, γ, δ).
3
Framework
1
2
3 4
5 6
• Contextual Effects (interference): i ’s outcome de-
pends on the characteristics of other units.
• Endogenous Peer Effects (multiplier).
yi = α + β∑i 6=j
Wi,jyj + δ∑i 6=j
Wi,jxj + γxi + εi .
Objective: identify and consistently estimate the parameters (α, β, γ, δ).
3
Main Challenges: Observational Data
• Simultaneity of the peer effects regressors (reflection problem)
• The decision of forming a peer connection can be correlated with unobserved
characteristics or there could exists common shocks (correlated effects)
• The network structure could induce correlation between X and ε (unobserved homophily)
4
Data Structure and Main Idea
y = α0ι+ β0Wy + δ0WXδ0 + Xγ0 + ε, with E [ε |W,X] 6= 0 and E [ε |W0,X] = 0.
1
2
3
4
Exogenous Net. (W0)
Net. of Interest (W) 1
2
3
4
1
2
3
4
G
G0
5
Ideal Experiment
• Individuals are (quasi-) randomized into groups (for example classrooms) defining W0.
• Only the fact that two individuals share a classrooms does not necessarily generate social
effects.
• It is possible to observe a relevant network (for example friendship) defining W.
• This method can be used to causally estimate network friendship effects.
6
Ideal Experiment
• Individuals are (quasi-) randomized into groups (for example classrooms) defining W0.
• Only the fact that two individuals share a classrooms does not necessarily generate social
effects.
• It is possible to observe a relevant network (for example friendship) defining W.
• This method can be used to causally estimate network friendship effects.
6
Ideal Experiment
• Individuals are (quasi-) randomized into groups (for example classrooms) defining W0.
• Only the fact that two individuals share a classrooms does not necessarily generate social
effects.
• It is possible to observe a relevant network (for example friendship) defining W.
• This method can be used to causally estimate network friendship effects.
6
Ideal Experiment
• Individuals are (quasi-) randomized into groups (for example classrooms) defining W0.
• Only the fact that two individuals share a classrooms does not necessarily generate social
effects.
• It is possible to observe a relevant network (for example friendship) defining W.
• This method can be used to causally estimate network friendship effects.
6
Main Identifying Assumptions
1. Monolayer Linear model and Bi-layer multiplex network data M = 2 (W and W0).
2. Conditional distribution F(ε | X,M) is such that E[ε|W,X] 6= 0 and E[ε|W0,X] = 0.
3. The networks generating the adjacency matrices W and W0 are correlated in the sense that
it is possible to find connections in common (E0 ∩ E1 6= 0) and distance two paths that
change edge type ((i , j) ∈ E0 and (j , k) ∈ E1).
1
2
1
2
3
7
Identification
Let Π be the projection coefficients from a regression of WS on W0S, where S = [y X].
Theorem:
Let Assumptions 1, 2 , 3 , and γ0(π11β0 + π12δ
0) + π21β0 + π22δ
0 6= 0 hold. If the matrices
I, W0, W02 are linearly independent, then the parameters α0, β0, γ0 and δ0 are identified.
Remark
Note that this is a generalization of the identification result in Proposition 1 of Bramoulle et al.
(2009, JoE), i.e., if W0 = W, one has Π = I, and the condition reduces to γ0β0 + δ0 6= 0 and
the matrices I, W and W2 being linearly independent.
Rank Condition
8
Estimation
y = α0ι+ WSθ0 + Xγ0 + ε for S = [y X] and θ0 = [β0 δ0]
y = α0ι+ W0Sθ∗ + Xγ0 + e, for θ∗ = Πθ0.
Estimation Procedure
1. Estimate Π by OLS (WS on W0S).
2. 2SLS of [ι,X,W0y,W0X] with instrument Z =[ι,X,W0
2X,W0X]. Calculate θ = Π−1θ∗.
3. IV of[ι,X, Wy, WX
]with instruments Z∗ =[
ι,X, [E (W0y|X,W0) ,W0X] Π].
Estimator and Properties
ψG3SLS =(
Z∗>D)−1
Z∗>y,
√n(ψG3SLS −ψ)
d−→ N(0,Vψ)
9
Stata Implementation
Empirical Application: Specification
W: Coauthors - W0: Alumni
yi,r ,t = α + β∑j 6=i
w`;i,j,tyj,r ,t +∑j 6=i
w`;i,j,t x>j,r ,tδ + x>`;i,r ,tγ + λr + λt + λ0 + εi,r ,t
Peer Effects (β)
log(# Citations)
Contextual Effects (δ)
Editor
Different Gender
Direct Effects (γ)
Editor
Different Gender
# Authors
# Pages
# References
Fixed Effects (λs)
Journal
Year
Institutions Component
netivreg lcitations editor diff gender n pages n authors n references isolated
(edges = edges0), wx(diff gender editor) cluster(c coauthor) first second
10
Empirical Application: Specification
W: Coauthors - W0: Alumni
yi,r ,t = α + β∑j 6=i
w`;i,j,tyj,r ,t +∑j 6=i
w`;i,j,t x>j,r ,tδ + x>`;i,r ,tγ + λr + λt + λ0 + εi,r ,t
Peer Effects (β)
log(# Citations)
Contextual Effects (δ)
Editor
Different Gender
Direct Effects (γ)
Editor
Different Gender
# Authors
# Pages
# References
Fixed Effects (λs)
Journal
Year
Institutions Component
netivreg lcitations editor diff gender n pages n authors n references isolated
(edges = edges0), wx(diff gender editor) cluster(c coauthor) first second
10
Empirical Application: Specification
W: Coauthors - W0: Alumni
yi,r ,t = α + β∑j 6=i
w`;i,j,tyj,r ,t +∑j 6=i
w`;i,j,t x>j,r ,tδ + x>`;i,r ,tγ + λr + λt + λ0 + εi,r ,t
Peer Effects (β)
log(# Citations)
Contextual Effects (δ)
Editor
Different Gender
Direct Effects (γ)
Editor
Different Gender
# Authors
# Pages
# References
Fixed Effects (λs)
Journal
Year
Institutions Component
netivreg lcitations editor diff gender n pages n authors n references isolated
(edges = edges0), wx(diff gender editor) cluster(c coauthor) first second10
Data Structure
Stata 16 Capabilities: (1) Python Integration for Sparse Matrices and (2) Multiframes
W (Coauthors) W0 (Alumni) (y,X)
11
Step1: Projection Coefficients
WS = W0SΠ + U,
12
Step2: 2SLS estimation
2SLS of [ι,X,W0y,W0X] with instrument Z =[ι,X,W0
2X,W0X]
13
Step3: Optimal Instrument
IV of[ι,X, Wy, WX
]with instruments Z∗ =
[ι,X, [E (W0y|X,W0) ,W0X] Π
]
14
Conclusion
• Identification of a linear-in-means model with endogenous network.
• Computationally simple estimation that uses two-layered multiplex network structure with
Stata implementation.
• Robust to different types of network endogeneity. It does not require to model unobserved
heterogeneity and network formation.
15
Appendix
Distortions Induced by Social Effects: Consumption Examples
• If individuals care about status (conspicuous consumption models), the proportion of con-
spicuous consumption may increase with respect to other goods.
• If conspicuous consumption is considered wasteful, peer effects might have noticeable welfare
consequences.
• Savings may differ from the optimal in an attempt to keeping up with the peers.
Empirical Work
Aggregate Effects: Consumption Example
• Unanticipated tax changes to the rich might have aggregate consequences.
• If individuals who are not affected by the shock change their consumption after observing
changes in consumption of the rich, the shock can spread through the network.
• Social multipliers depend on the size of the endogenous peer effects and the connectedness
of the affected groups.
Empirical Work
Angrist’s (2014) Critique: Group Regressions
• Reflection Problem: a regression of individual outcomes on group mean outcomes is
tautological.
• Correlated Effects: even the leave-one-out estimator does not provide information of
human behavior. “Like students in the same school, households from the same village are
similar in many ways”.
• Mechanical Relationship: the coefficient on group averages in a multivariate model of
endogenous peer effects does not reveal the action of social forces. He interprets the vale
1/(1−β) as approximately the ratio of the 2SLS to OLS estimands for the effect of individual
covariates on outcomes (using dummy groups as instruments).
Motivation
Angrist’s (2014) Critique: Network Regressions
• Start by a saturated model E [yi | xi ] = γ0 + γ1xi satisfying E [ui | xi ] = 0, for ui ≡yi − γ0 − γxi .
• Individuals are ordered from left to right. Each person i is connected only with the individual
to her left i − 1. Friends are only similar on unobservables: ui = βui−1 + εi .
• The outcome can be written in a linear-in-means (lmm) model form:
yi = γ0(1− β) + βyi−1 + γx − βγxi−1 + εi
• Flaw in Angrist’s example: let δ = −βγ to write this model exactly as a lmm. Note that
δ + γβ = 0 so that the outcome equation can be written as (for α = γ0(1− β))
yi =α
1− β+ γxi + vi
Motivation
Different Network Effects
Empirical
Network Effects
Consumption
Coworkers
DFP(2020, Restud)
Financial Decis.
Friends
BEFY(2014, ECA)
Education
Friends
CPZ(2009, Restud)
Substance Abuse
Classmates
GR(2001, Restats)
Labor Market
Coworkers
MM(2009, AER)
Diseases Transm.
Classmates
MK(2004, ECA)
Tech. Adoption
Neighb.
CU(2010, AER)
Knowledge Spillov.
Coauthors
Za.(2020, Restud)
Motivation
Critique to Randomly Assigned Groups
• In principle, randomization of peers would guarantee identification in a monolayer linear in
means model where endogenous network formation is ruled out.
• It can completely eliminate the problem of unobserved common variables.
• However, if individuals endogenously form groups (homophily), there can be a subsequent
resorting. If resorting happens faster than the effects of social interactions, identification is
not possible.
• Even with random peers, researchers face a classical problem of omitted variables when
trying to estimate contextual effects (E[xiεj | wi,j = 1] 6= 0).
Literature
Multilayers Networks in Economics
Labor Supply
• Sisters, Cousins and Neighbors networks
(NST (2018, AEJ))
Consumption
• Coworker and Spouses networks (DFP
(2020, Restud))
Education
• Friendship network in t and t − 1 (GI
(2013, JBES))
• Roommates, classmates, Study-mate,
Friendship networks (CL (2015))
• Siblings and Classmates networks (NR
(2017, JAE))
Publication Outcomes
• Coauthors, Alumni and Same Advisor
networks (EHJS (2020))
Multilayer
Microfoundations
• The monolayer linear model of interest corresponds with the best response of a Bayesian Game of
Social Interactions as proposed by Blume, Brock, Durlauf and Jayaraman (2015, JPE).
• Quadratic utility with social pressure or strategic complementarities
Ui (ωi , ω−i ) =
(γxi + zi + δ
∑j
cijxj
)ωi −
1
2ω2i −
φ
2
(ωi −
∑j
aijωj
)2
• In their model endogeneity arises because an individual i , observing that he is connected to j , make
an inference about the value of zj that is dependent on xj . Then, xj will be correlated with εi in
my equation of interest.
• Their critique of instrumental variable is that if individual i observe the instruments vj , he can use
it to predict zj which will induce correlation between εi and the instrument.
• Our instrument is based on xr of individuals r connected to i in a network that is independent of
the individuals’ utilities. Therefore xr is not useful to predict zj . Model
Positioning the Research Agenda in the Literature
Identification of Network
Effects Linear ModelsCorrelated Effects
Reflection Problem
Solved by Network Data
BDF (2009, JoE)
Natural and Artificial
Experiments
Randomized Peers
Observational Studies
Random Shocks
DFP (2020, Restud)
Structural Endogeneity
JM (2020, Restats)
Panel Data
KP (2020, ECA)Multilayer Networks
This Project
Assumptions
Assumption 1
There exists a n × n adjacency matrix W0 such that: E[v|x,W0] = 0
Assumption 2
Let Π be the be the full-rank matrix of coefficients from the system regression
WS = W0SΠ + U,
E [U|W0y,W0,X] = O.
where E[S>w0;iw0;i>S] > 0. Furthermore, the first row of Π is such that π11β+π12δ < 1/λmax,
where λmax is the largest eigenvalue of W0.
Identification
Rank Condition
• Given that rank(Π) ≤ min{rank(E [S>w0;iw>0;iS]−1), rank(E [S>w0;iw
>i S])}, a necessary con-
dition for rank(Π) = k + 1 is that rank(E [S>w0;iw>i S]) = k + 1 which would be equivalent
to the relevance condition in the classical Instrumental Variable literature.
• For large enough sample, this condition imposes some restriction on the matrix W0W. This
matrix contains the connections in common across the two networks in the main diagonal,
and length two paths that change color in the off- diagonal.
• It cannot be zero so there have to be enough connections in common and indirect triads
that change colors. This is a way to think about the correlation between the two matrices.
Identification
Mote Carlo Experiments
−2
0
2
Un
ob
serv
ed
Het
ero
gen
eity
Peer Effects (β)
−20
−10
0
10
20
Contextual Effects (δ)
−4
−2
0
2
4
6
Direct Effects (γ)
n = 50 n = 100 n = 200
0
0.5
1
Mis
smea
sure
d
Lin
ks
n = 50 n = 100 n = 200
0
1
2
3
n = 50 n = 100 n = 200
1
1.5
2
Empirial Application: Data
• 1,628 articles published in the American Economic Review, Econometrica, the Journal of
Political Economy, and the Quaterly Journal of Economics between 2000 and 2006. Source:
RePEc, Scopus, and Journal Websites.
• Employment, education, and research interest information for 1,985 unique authors and
42 unique editors (37 of which also published papers in thee journals in this time period).
Source: Web scrapping/text mining and Colussi (2018, ReStat).
• Co-authorship (` = 1) and Alumni (` = 0) networks are constructed for all 2,027 scholars.
Articles i and j are connected in network W` if at least one of the authors of article i shares a
professional connection of type ` with at least one of article j ’s authors.
Empirical Application