MANAGEMENT SCIENCE, Vol. 59, No. 9, September 2013, pp. 2056–2078
ISSN 0025-1909 (print), ISSN 1526-5501 (online), http://dx.doi.org/10.1287/mnsc.1120.1700
© 2013 INFORMS
Geographic Constraints on Knowledge Spillovers: Political Borders vs. Spatial Proximity
Jasjit Singh
INSEAD, Singapore 138676, [email protected]
Matt Marx
MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142
Geographic localization of knowledge spillovers is a central tenet in multiple streams of research. However, prior work has typically examined this phenomenon considering only one geographic unit (country, state, or metropolitan area) at a time and has rarely accounted for spatial distance. We disentangle these multiple effects by using a regression framework employing choice-based sampling to estimate the likelihood of citation between random patents. We find both country and state borders to have independent effects on knowledge diffusion beyond what just geographic proximity in the form of metropolitan collocation or shorter within-region distances can explain. An identification methodology comparing inventor-added and examiner-added citation patterns points to an even stronger role of political borders. The puzzling state border effect remains robust on average across analyses, though it is found to have waned with time. The country effect has, in contrast, not only remained robust but even strengthened over time.
Key words: knowledge spillovers; borders; distance; economic geography; patent citation; innovation; institutions
History: Received March 25, 2011; accepted November 24, 2012, by Kamalini Ramdas, entrepreneurship and innovation. Published online in Articles in Advance April 4, 2013.
1. Introduction
Empirically establishing the microfoundations of industrial agglomeration is a key focus in multiple streams of research. Ever since the seminal work of Marshall (1920), scholars have studied not just exogenous locational factors but also three endogenous mechanisms for why agglomeration takes place: benefits from labor pooling, efficiency gains from collocation of industries with input–output relationships, and localized knowledge spillovers.¹ Of these, knowledge spillovers have generated the most scholarly attention in recent years, perhaps because they are seen as critical for innovation and new value creation in an increasingly knowledge-intensive economy. In this study, we take a closer look at various geographic elements in shaping such spillovers, distinguishing between the roles played by political borders (at the national or state level) versus simply spatial proximity.
Although several studies have documented localization of knowledge spillovers, the geographic levels most relevant for this phenomenon remain unclear, given the approaches employed. For instance, a significant body of empirical work has studied only country-level spillovers (Branstetter 2001; Keller 2002; Jaffe and Trajtenberg 2002, Chap. 7; Singh 2007), with the findings from such studies then being justification for assumptions used in theoretical models of economic growth of nations (Romer 1990, Grossman and Helpman 1991). Others have taken political borders at a less aggregate level (states) as the geographic unit of interest (Jaffe 1989, Audretsch and Feldman 1996, Almeida and Kogut 1999, Rosenthal and Strange 2001), but mostly without consideration of national borders or geographic distance. The typical focus on just one of the borders arises for two reasons, one practical and the other more theoretical. The practical reason has been that it is hard to obtain precise measures for geographic proximity, so national or state borders can be seen as a convenient way of measuring it indirectly. The more substantive reason has been a belief that political borders are important over and above any spatial proximity effects, for example, caused by institutional differences. In either case, however, the extent to which political borders and geographic proximity measure the same thing or not has not been typically tested.
¹ See Ellison et al. (2010) for a recent study that rigorously demonstrates Marshallian mechanisms.
INFORMS holds copyright to this article and distributed this copy as a courtesy to the author(s). Additional information, including rights and permission policies, is available at http://journals.informs.org/.
A glaring gap in the literature therefore remains: few studies have attempted to rigorously disentangle the effects operating at different geographic levels, providing limited guidance regarding the exact geographic scope of knowledge spillovers. For example, the fact that within-country knowledge spillovers are found to be more intense than those across countries might simply reflect an aggregation of state-level or metropolitan-level mechanisms. Similarly, interpretation of state-level localization findings is also unclear, as these too might be driven by effects operating more locally and are open to criticisms to the effect that “state boundaries are a very poor proxy for the geographical units within which knowledge ought to circulate” (Breschi and Lissoni 2001, p. 982). Perhaps motivated by such ambiguities, or by criticism like Krugman’s (1991, p. 43) remark that “states aren’t really the right geographic units” in economic analysis, recent research appears to have renewed focus on exploring agglomeration at less coarsely defined geographic levels.
The economic geography literature has a long tradition of emphasizing a link between localized knowledge spillovers and urban growth (Jacobs 1969, Glaeser et al. 1992, Saxenian 1994). More recently, studies such as Rosenthal and Strange (2003), Singh (2005), and Arzaghi and Henderson (2008) have carefully established agglomeration effects related to such spillovers as indeed being particularly strong over short distances within a city (a few miles or even less). This research is complemented by the broader literature emphasizing the role knowledge spillovers through spin-offs (also often geographically localized) can play in industry and regional growth (Gompers et al. 2005, Klepper and Sleeper 2005, Agarwal et al. 2007). Advances are now being made toward formal models that can capture microfoundations of the geographic scope of geographic clusters like Silicon Valley and calibrating these models against real data (Kerr and Kominers 2010).
Despite these rich yet distinct bodies of work examining different geographic levels, few studies related to knowledge spillovers have considered different levels of border and proximity effects simultaneously in an attempt to unpack the true contribution of each. In addition to not separating the country, state, and metropolitan effects from one another, most studies do not separately identify these from the role of spatial distance either. It is therefore unclear whether to interpret prior findings simply as reflecting that “distance matters” or as borders also having an important and independent role. Even studies that do consider multiple geographic levels, such as the path-breaking paper by Jaffe et al. (1993), analyze these different geopolitical units separately and again do not account for precise spatial distance.
When at least some distance-based measures have been employed, most have still been too aggregate to disentangle the precise geographic effects of interest. For example, although Keller (2002) employs data on distance between capital cities of countries, he does not consider different within-country distances. Likewise, Peri (2005) extends this to consider distances between different pairs of states, but still does not distinguish city-to-city distances within a state. The paper by Belenzon and Schankerman (2013) is an exception that does simultaneously consider geographic distance and collocation even within a state. However, their analysis is based only on knowledge originating from universities, which composes a tiny fraction of the overall knowledge created in the economy and plays a relatively minor role in overall knowledge diffusion patterns.
We analyze the role of borders versus proximity in diffusion of a large knowledge base represented by a set of 631,586 patents, employing recent empirical advancements in studying knowledge diffusion and documenting long-term trends in both within-country and within-state localization effects after multiple geographic effects are all accounted for together. Given the role that specific assumptions regarding the geographic scope of knowledge spillovers can play in research areas as diverse as regional and international economics, business strategy, and technological innovation and entrepreneurship, we see our study as fulfilling a need to dig deeper into the geography of knowledge spillovers in a manner analogous to advances in the literature on cross-regional trade, a field that has made much more progress in examining the borders versus proximity question at both the country level (e.g., McCallum 1995, Anderson and Wincoop 2003) and the state level (e.g., Wolf 2000; Hillberry and Hummels 2003, 2008).
Our empirical approach builds on the increasingly sophisticated use of patent citations to measure diffusion of knowledge. As a motivation for examining the above questions using patent citation data, Figure 1 provides simple graphical evidence regarding the geographic pattern of interfirm citations to patents filed during 1975–2004 by inventors based in the United States.² Three observations are worth making. First, the likelihood of citation between random pairs of patents decreases with geographic distance. Second, the citation likelihood is greater within country borders than across, within state borders than across, and within metropolitan area boundaries (measured using “CBSA” definitions explained later) than across. Third, the national and state border effects seem to be only partly explained by geographic proximity: there seems to be an independent “border effect” within each of the buckets for spatial distance (a finding that continues to hold even with further refining the distance buckets used in the figure).
² These charts were constructed using our data set and employed in our regression analysis. Although our data set was derived using stratified sampling from the population of patent pairs, we calculated the summary statistics by appropriately weighting each observation so that Figure 1 represents true population characteristics.
Figure 1. Graphical Depiction of the Role of Geography in Patent Citation Likelihood. Panels: (i) Country borders vs. distance between inventor cities (same country vs. different country); (ii) State borders vs. distance between inventor cities (same state vs. different state); (iii) CBSA boundaries vs. distance between inventor cities (same CBSA vs. different CBSA). Vertical axis: citation frequency between random patent pairs (%). Horizontal axis: geographic distance (miles).
Figure 1 is, however, just a depiction of summary statistics and does not account for a number of empirical issues addressed in more formal analyses (reported later as a part of this study). In our preferred approach, we employ regression models to run a “horse race” among different geographic variables to isolate the level at which localization of knowledge spillovers operates most prominently. Specifically, we construct a data set of patent pairs (representing actual or potential patent citations) using choice-based sampling and estimate a “citation function” modeling the likelihood of citations between random patents. Our framework departs from previous studies by making no ex ante assumptions about the most appropriate geographic unit of analysis. Instead, it allows us to simultaneously account for collocation of the source and destination of knowledge within the same country, state, or metropolitan area as well as for fine-grained spatial distance.
Consistent with prior work, separate analyses we conduct at the national, state, and metropolitan levels exhibit spillover localization at all levels. Importantly, simultaneously accounting for all of these shows how individual analyses overstate their respective importance. We also extend the analysis to first parametrically control for distance and then employ a set of nonparametric indicator variables to more flexibly capture the more nuanced effects of distance. A key finding is that robust border effects are seen even after accounting for geographic proximity. In other words, we continue to see independent country border and state border effects even after carefully controlling for the effect of metropolitan area collocation and the decaying of spillover intensity with spatial distance.
The same-country localization of knowledge spillovers turns out to be several times stronger than the same-state effect. One could view this large and robust national border effect as perhaps in line with expectation, given the well-documented linguistic, cultural, institutional, and economic differences among countries (Coe et al. 2009). However, time-trend analysis also reveals a more surprising finding: a strengthening of the same-country effect over time despite the accepted trend toward globalization and technological advances that supposedly smooth cross-border communication.
We find the state border effect even more puzzling. The finding turns out not to be driven by just one or two specific states (like California) or sectors (like computers or communication technologies). The result is seen even in subsamples comprised of cited patents close to state borders, so it is also not driven just by those in the interior of states. In fact, significant state border effects are found even in a conservative test where metropolitan effects are completely isolated by considering only patents and (potential) citations arising within a subset of metropolitan areas that span state borders. We also analyze trends over time and find that, in contrast to country borders, the state border effect weakens considerably over the 30-year time period from which our cited patent sample is drawn.
To further boost confidence in our findings, we confront two key challenges inherent in using patent citations to measure knowledge spillover localization. First, citation patterns are determined in part by technological relationships, which cannot be perfectly captured by any formal classification system (Thompson and Fox-Kean 2005). Second, some citations are added by patent examiners, not inventors, and the extent to which the two represent spillovers likely differs (Alcacer and Gittelman 2006). To address these concerns, we first examine the robustness of our prior findings to using only inventor-added citations. We then employ an identification strategy (motivated by Thompson 2006) that calculates true geographic effects as the difference between localization estimates for subsamples of citations added by inventors versus patent examiners. Border effect findings remain robust, even though this approach weakens the effect of proximity, further highlighting the importance of borders beyond pure geographic proximity in shaping knowledge diffusion patterns.
2. Empirical Approach
2.1. Constructing a Patent-Based Data Set
Although citation-based measures are noisy in capturing true knowledge flows, surveys of inventors have established that citations, especially in large samples, do capture knowledge flows meaningfully (Jaffe and Trajtenberg 2002, Duguet and MacGarvie 2005). Even assuming that citations do correctly capture knowledge flows, it is not possible to decipher when a given citation represents a “spillover,” that is, a true externality for which the receiver does not fully pay. Nevertheless, we follow the prevalent view that citations do at least partly represent spillovers and for the rest often represent benefits the receiver gets in the form of “gains from trade,” even in cases where they represent purely market transactions.
Our data set is based on U.S. Patent and Trademark Office (USPTO) patents with application years 1975–2009. To have at least a five-year citation window, we restrict the set to cited patents until 2004, with citing patents going all the way until 2009. Because recent literature (Alcacer and Gittelman 2006, Thompson 2006) has emphasized the distinction between citations added by the inventors themselves and those added by patent examiners working for USPTO, we also keep track of this information when available (2001 onward) and use it to complement our full-sample analyses. Patent data also include inventors’ city, state, and country of residence. Consistent state identification is available only for patents originating in the United States, so we restrict the cited patent sample (but not the citing patent sample) to those filed by U.S. inventors. Our calculation of geographic distances relies on data from Lai et al. (2009) that map cities where inventors live to latitudes and longitudes.³
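As a rough illustration of how a city-to-city distance might be computed once each inventor city has a latitude and longitude, the sketch below uses the haversine great-circle formula. This is an assumption made for illustration: the text does not say which distance formula was used, and the function name, the Earth-radius constant, and the approximate city coordinates are hypothetical choices.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
boston = (42.36, -71.06)
san_jose = (37.34, -121.89)
print(round(haversine_miles(*boston, *san_jose)))
```

With a single coordinate per city, any two inventors in the same city get a distance of zero under this calculation, whatever formula is used.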
We also map these cities to core based statistical areas (CBSAs), a U.S. Office of Management and Budget definition effective as of 2003 and using data from the 2000 Census. A CBSA refers to a county (or local equivalent) with a population center of at least 10,000 plus surrounding counties within reasonable commuting distance. CBSAs with a population center of more than 50,000 are designated as metropolitan statistical areas (MSAs), or micropolitan otherwise.⁴ The CBSA definitions enable us to classify considerably more patents than with earlier definitions based on the 1990 Census (96.3% versus 84.7%).⁵ Our results are, however, robust to either classification system.
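The population thresholds in this definition can be written as a small rule. The sketch below only restates the thresholds as worded above; the function name and the "outside CBSA" label are hypothetical, and the official delineation also depends on commuting ties that are not modeled here.

```python
def classify_area(population_center):
    """Classify an area by the size of its population center, using the thresholds in the text."""
    if population_center < 10_000:
        return "outside CBSA"   # below the 10,000 minimum for a CBSA
    if population_center > 50_000:
        return "metropolitan"   # metropolitan statistical area (MSA)
    return "micropolitan"       # CBSA that is not an MSA


for size in (5_000, 25_000, 120_000):
    print(size, classify_area(size))
```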
Before proceeding to construct a sample of patent pairs representing actual or potential citations, we restrict the cited patent sample to only patents whose geographic origin is unambiguously defined in order to avoid making arbitrary assumptions in trying to resolve locational ambiguity of a knowledge source. In other words, we exclude patents from geographically dispersed inventor teams, even though these might be an interesting (but different) topic to study. We also omit patents not assigned to any organization, as well as those assigned to nonfirm sources (such as universities and government bodies), as the focus of this study is to examine interfirm diffusion of knowledge. In the end, all the steps mentioned above yield a set of 631,586 potentially cited patents as sources of knowledge.
³ Our distance data are restrictive in two ways. First, because we have only a single latitude and longitude coordinate per city, we cannot calculate distances between inventors within a city or even be completely precise about distances between those in different cities. Second, USPTO data contain the city of residence of the inventor, which might not coincide with the city where the work was done. But this is still the best available proxy for an invention’s geographical origin, as the assignee address typically refers only to the overall firm’s registered office or headquarters.
⁴ A cruder approach (that has been employed in the past) could be to designate one single “phantom MSA” for each state to handle cases where an inventor does not fall within an actual MSA. However, doing this would confound metropolitan effects with state effects and is therefore inappropriate for the research question of interest here.
⁵ Additional details on CBSA definitions are at https://www.census.gov/geo/www/2010census/gtc/gtc_cbsa.html (accessed August 2012).
2.2. Constructing a Matched Sample of Actual and Potential Citations
For each cited patent mentioned above, we collect data on all citations received during a 10-year window since its application and drop all within-firm citations. As a highly influential study by Jaffe et al. (1993) (hereafter, JTH) points out, just calculating collocation frequency within pairs of patents involved in realized citations would not suffice for establishing geographic localization of knowledge. Instead, what is needed is an appropriate control sample of potential (but unrealized) citations to establish a benchmark level of expected collocation, given the existing geographic distribution of technological activity. To facilitate a comparison of our subsequent analysis with the JTH method, we therefore also start with their approach of matching each citing patent to a random control patent with the same three-digit technology class and application year as the original citing patent (but not from the same organization as the focal cited patent and also not actually citing it). Like Jaffe et al. (1993), we drop the small fraction of citations for which no match is found. This leads to a balanced sample of 4,007,217 realized citations (based on 631,586 cited patents) and exactly as many unrealized matched control citations.
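The matching rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually run on the USPTO data: the record layout (`tech_class`, `app_year`, `assignee`), the function name, and the toy patents are all hypothetical.

```python
import random
from collections import defaultdict


def match_controls(citations, patents, seed=0):
    """For each realized citation, draw one random control citing patent.

    citations: list of (cited_id, citing_id) pairs (realized citations).
    patents:   dict id -> {"tech_class": str, "app_year": int, "assignee": str}.
    Returns (cited_id, citing_id, control_id) triples; citations with no
    eligible control are dropped, as in the text.
    """
    rng = random.Random(seed)
    # Index candidate patents by (three-digit class, application year).
    by_cell = defaultdict(list)
    for pid, p in patents.items():
        by_cell[(p["tech_class"], p["app_year"])].append(pid)
    realized = set(citations)  # used to exclude patents that actually cite the focal patent

    matched = []
    for cited, citing in citations:
        cell = (patents[citing]["tech_class"], patents[citing]["app_year"])
        eligible = [
            pid for pid in by_cell[cell]
            if pid != citing
            and pid != cited
            and patents[pid]["assignee"] != patents[cited]["assignee"]
            and (cited, pid) not in realized
        ]
        if eligible:
            matched.append((cited, citing, rng.choice(eligible)))
    return matched


# Toy data (hypothetical): A is cited by B; only C is an eligible control.
patents = {
    "A": {"tech_class": "435", "app_year": 1990, "assignee": "FirmX"},
    "B": {"tech_class": "514", "app_year": 1995, "assignee": "FirmY"},
    "C": {"tech_class": "514", "app_year": 1995, "assignee": "FirmZ"},
    "D": {"tech_class": "514", "app_year": 1995, "assignee": "FirmX"},  # same firm as A
    "E": {"tech_class": "514", "app_year": 1996, "assignee": "FirmZ"},  # different year
}
print(match_controls([("A", "B")], patents))
```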
The above JTH-style sample allows us to compare the extent of geographic collocation of the source and destination for the original sample of citations and the sample of corresponding control pairs, in turn using the country, state, and metropolitan area as the geographic units of analysis in three different sets of calculations. Although it is a useful starting point, this approach is not well suited to directly addressing our key question: How much do national or state borders per se constrain knowledge flow, as opposed to the corresponding effects being manifestations of mechanisms operating at more local levels (such as city or CBSA) or driven purely by spatial distance? Our preferred approach for answering these questions is a regression framework that can simultaneously examine the effect of different geographic levels.
2.3. A Regression Framework for Estimating Citation Likelihood
With collocation within a certain predefined geographic unit as the dependent variable in the JTH model, one cannot easily examine multiple geographic levels at the same time. One could try to ascertain the relative importance of different geographic levels by somehow comparing the findings across models; however, this would likely remain a statistically complex and unsatisfactory exercise. We instead rely on a regression framework that estimates the likelihood of citation between two random patents, making the existence of a citation between a pair of patents the dependent variable and employing the entire set of geography-related variables simultaneously as explanatory variables in a single model.⁶
Our citation-level regression framework has the added advantage of flexibility in modeling technological relatedness between patents, allowing multiple levels of technological granularity to be considered at once. This addresses a criticism previous studies have faced in choosing a specific technological granularity when constructing a JTH-style control sample. As Thompson and Fox-Kean (2005) and Henderson et al. (2005) discuss, one faces a dilemma in using matching: the three-digit technology match commonly employed might be too crude to capture all relevant technological relationships, but using a finer classification could result in selection bias because a match would not be found for most of the sample. Both papers suggest a regression approach that simultaneously accounts for technological relatedness at multiple levels of granularity.⁷
A seemingly straightforward extension of the JTH methodology might be to employ a regression approach using a JTH-style matched sample in a (logit or probit) regression model, wherein the existence of a citation between a pair of patents is taken as the dichotomous dependent variable. However, this would imply that the matching procedure was in effect used to carry out sampling based on the dependent variable in the first place, because the JTH method draws a “zero” (unrealized citation) corresponding to each “one” (actual citation). This needs to be taken into account to get estimates truly representative of the population. Further, the potentially citing patents used in constructing the control pairs are drawn only from technology classes and years from which citations to the cited patent actually exist, ignoring the population of potentially citing patents from the remaining technology classes and years. As the appendix explains, this can further bias the results. Here, building on Singh (2005), we describe a microlevel citation regression framework that ameliorates this issue.
⁶ Our methodology builds on studies such as Sorenson and Fleming (2004) and Singh (2005) that also model the citation likelihood between patents in a regression framework, though to study different research questions.
⁷ This does not fully address the issue that no technological classification system, however finely defined, can perfectly capture true technological relationships between patents. We address this concern later in the paper by extending our JTH-style as well as a regression analysis using an approach motivated by Thompson (2006).
Before discussing how to extend our matched sample for citation-level regression analysis, for exposition we imagine a sample of random patent pairs constructed by pairing each of our potentially cited patents with a random draw of potentially citing patents. We could model the likelihood of a patent citation in this sample as a Bernoulli outcome y that equals 1 with a probability
\[
\Pr(y = 1 \mid x = x_i) = \Lambda(x_i \beta) = \frac{1}{1 + e^{-x_i \beta}}.
\]
Here, $i$ is an index for the sample of potential citations (i.e., patent pairs), $x_i$ represents the vector of covariates and controls (described later), and $\beta$ is the vector of parameters to be estimated. Because the likelihood of a focal patent being cited by a random patent is extremely small, it is not practical to carry out the estimation based solely on a data set constructed via random sampling of possible pairs of patents. Instead, we employ a “choice-based” sample, wherein the sampled fraction $\alpha$ of potentially citing patents that actually cite a focal patent is much larger than the fraction $\gamma$ of the patents not involved in a real citation to it. The usual (unweighted) logistic estimation based on such a sample would be biased, because the sampling rate is different for different values of the dependent variable. One way to avoid the bias is the weighted exogenous sampling maximum likelihood (WESML) approach, which involves a modified logistic estimation based on weighting each observation by the reciprocal of the ex ante probability of its inclusion in the sample (Manski and Lerman 1977).⁸
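To make the weighting idea concrete, the sketch below fits a logit by maximizing a weighted log-likelihood in which each observation's weight is the reciprocal of its sampling rate. It is an illustrative toy under stated assumptions, not the estimation code behind the reported results: the helper names, the Newton solver, the sampling rates (1.0 for citations, 0.01 for non-citations), and the ten-row data set with a single same-state indicator are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wesml_weights(y, alpha, gamma):
    """Reciprocal of the inclusion probability: 1/alpha if y == 1, 1/gamma if y == 0."""
    return [1.0 / alpha if yi == 1 else 1.0 / gamma for yi in y]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_weighted_logit(X, y, w, iters=50):
    """Maximize sum_i w_i * [y_i log p_i + (1 - y_i) log(1 - p_i)] by Newton's method."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # negative Hessian of the weighted log-likelihood
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += wi * p * (1 - p) * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy choice-based sample (hypothetical): columns are [intercept, same_state].
X = [[1.0, 1.0]] * 5 + [[1.0, 0.0]] * 5
y = [1, 1, 1, 0, 0] + [1, 0, 0, 0, 0]
w = wesml_weights(y, alpha=1.0, gamma=0.01)
beta = fit_weighted_logit(X, y, w)
print(beta)
```

Because the toy model is saturated (an intercept plus one binary indicator), the weighted fit reproduces the weighted citation shares in each cell exactly, which gives a quick check on the implementation.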
The basic WESML approach as previously described is based on employing a sample where the “zeroes” are drawn from the population of unrealized citations with the same ex ante likelihood. Recognizing that technological relatedness is a particularly strong driver of citation likelihood between patents, we can refine the choice-based sampling approach further to also get benefits from stratification on this explanatory variable. This implies allowing the parameter $\gamma$ to vary across different y = 0 subpopulations (Manski and McFadden 1981; Amemiya 1985, Chap. 9). Indeed, by carefully considering the respective subpopulations (defined by different technology classes and years of origin) from which we have effectively drawn our JTH-style control patents in the previous section, we can interpret our matched sample as above and appropriately calculate the weights to use with each control pair. However, as the appendix explains in more detail, this is not sufficient in itself. Using the WESML approach with the matched sample also requires extending the sample to ensure representation of potentially citing patents belonging to years and/or technology classes not represented in the original patent citations (and hence in the resulting matched sample). Doing this ensures that the strata considered are not only mutually exclusive but also exhaustive in representing the full population of potential citations. The above steps lead to our final sample of 13,728,582 patent pairs, which includes 4,007,217 actual citations (taking $\alpha = 1$), 4,007,217 JTH-style matched pairs, and 5,714,148 additional pairs from citing classes and years not represented in the matched sample. An example included in the appendix further illustrates the above sampling procedure as well as calculation of appropriate weights for all the control observations.
⁸ See the appendix for further details. See also Greene (2003, Chap. 21) for a discussion of choice-based sampling.
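The stratified weighting can be illustrated with a short calculation in which each stratum of non-citing pairs gets a weight equal to its population size divided by the number of pairs sampled from it. The strata, population counts, and function name below are hypothetical and stand in for the procedure detailed in the appendix; only the three sample counts at the end are taken from the text.

```python
def stratum_weights(population_counts, sample_counts):
    """Weight per stratum = population size / sampled size (reciprocal of the sampling rate)."""
    weights = {}
    for stratum, n_sampled in sample_counts.items():
        if n_sampled == 0:
            raise ValueError(f"stratum {stratum!r} has no sampled pairs")
        weights[stratum] = population_counts[stratum] / n_sampled
    return weights


# Hypothetical strata of non-citing patents for one cited patent,
# keyed by (technology class, citing year).
population = {("514", 1995): 12_000, ("514", 1996): 15_000, ("435", 1995): 8_000}
sampled = {("514", 1995): 3, ("514", 1996): 2, ("435", 1995): 1}
weights = stratum_weights(population, sampled)
print(weights)

# Sample composition reported in the text.
actual, matched, additional = 4_007_217, 4_007_217, 5_714_148
total = actual + matched + additional
print(total)  # 13728582
```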
Rather than making specific functional form assumptions about temporal patterns, we account for citation lag (i.e., years elapsed between the cited and citing patents in a pair) nonparametrically by employing a full set of indicator variables. This is in addition to accounting for the cited patent’s time period of origin using a separate set of indicator variables (Rysman and Simcoe 2008). Relying on longitudinal variation in our sample, we are therefore able to separately identify cohort effects and citation lag effects in a way that previous studies with more restrictive samples (such as Thompson 2006) were not able to. We also include indicators for the cited patent’s two-digit National Bureau of Economic Research (NBER) technological subcategory.⁹ Finally, because the citation probability might also be driven by other characteristics of the cited patent, we control for observables and cluster standard errors to account for unobserved ones.
3. Extending the Traditional Matching Approach
Before turning to our regression approach in the next section, we present some analysis that extends the more traditional JTH-style approach. This should allow a reader familiar with prior literature to better relate our study to existing research in terms of both what kind of findings remain similar across the two approaches and which new insights emerge specifically from using the regression approach.
3.1. Baseline Analysis Comparable withPrior Work
Following the empirical approach of Jaffe et al. (1993), we compare the incidence of geographic collocation of the potential knowledge sources as represented in actual citations as well as matched control pairs, in turn using the country, state, and metropolitan area as the geographic units of analysis. As the side-by-side comparison in Table 1 shows, our findings at each of the three units of analysis are quite comparable to those reported by Jaffe et al. (1993) as well as to a replication by Thompson and Fox-Kean (2005). The incidence of collocation for all three geographic units is statistically and economically greater for actual citations than for the corresponding matched control pairs: 74.7% versus 57.6% at the country level; 13.4% versus 6.2% at the state level; and 7.6% versus 2.6% at the metropolitan level.10
9 The NBER classification we refer to is drawn from Appendix 1 in Jaffe and Trajtenberg (2002, Chap. 13).
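As a quick arithmetic check on the comparison above, the ratio of collocation incidence for actual citations to that for matched controls can be recomputed from the percentages quoted in the text. This is only an illustrative sketch; the function and variable names are ours, not the authors'.

```python
# Collocation incidence (%) for actual citations vs. matched controls,
# as quoted in the text for our matched sample.
incidence = {
    "country": (74.7, 57.6),
    "state": (13.4, 6.2),
    "metropolitan": (7.6, 2.6),
}


def localization_ratio(citations_pct, controls_pct):
    """Ratio of collocation incidence for citations to that for matched controls."""
    return citations_pct / controls_pct


# Round to two decimals, the precision used in the ratio columns of the tables.
ratios = {
    unit: round(localization_ratio(cit, ctl), 2)
    for unit, (cit, ctl) in incidence.items()
}
print(ratios)
```

A ratio above 1 indicates that actual citations are more often collocated than the matched benchmark would predict; the ratio grows as the geographic unit gets finer.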
3.2. Further Investigation of the Border Effects Using the Traditional Matching Framework
It is difficult within the JTH framework to separate the extent to which localization spillovers are driven primarily by political borders, spatial proximity, or both. However, we can carry out at least some informative analysis. In doing so, we focus most on the robustness and nature of the state border effect because, although localization at the country level might be less surprising, the presence of a localization effect truly associated with state borders within a country like the United States is puzzling.
A first step in separating border and proximity effects even within the JTH framework is checking whether the state finding is driven by observations geographically distant from the state border. Columns (1)–(4) in Table 2 report findings from a JTH-style analysis using a subsample where the distance of a potentially cited patent's originating town or city to the closest state border is not more than 20 miles. If state borders played no role in knowledge diffusion and prior findings were driven entirely by observations distant from the borders, the state result ought to now disappear. Comparing column (2) in Table 1 with column (2) in Table 2, we find that this does not happen. Even though state-level collocation in column (2) is substantially lower for citations in the near-border sample than in the whole sample (7.1% in Table 2 versus 13.4% in Table 1), the matched-pair collocation incidence is also lower (2.7% in Table 2 versus 6.2% in Table 1), so the ratio calculated in column (4) is in fact higher in Table 2. In other words, taking account of the geographic distribution of technological activity, we find no evidence that not accounting for distance from a state border is somehow driving the state effect reported earlier.
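To give a sense of how decisively the near-border state-level incidences differ, the sketch below runs a two-proportion z-test on the figures quoted above (7.1% versus 2.7%), taking the near-border sample size of 996,627 from Table 2. It is a simplification under stated assumptions: it treats citations and controls as two independent samples of equal size, whereas the actual design is matched and the paper reports using formal t-tests. The function name is ours.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: p1 == p2, using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


n = 996_627  # near-border citations sample; assumed equal for the controls
z = two_proportion_z(0.071, n, 0.027, n)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

With samples this large, even small differences in incidence are statistically significant, which is why the economic magnitude (the ratio in column (4)) is the more informative quantity.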
Although the above analysis based on a subset of cited patents (representing the source of knowledge)
10 When Thompson and Fox-Kean (2005) subsequently employ nine-digit technology matching, they find that over two-thirds of their patents cannot be matched. Our approach is instead to stick to a three-digit initial match but control for a finer technological level through additional variables introduced directly into our regression model.
INFORMS holds copyright to this article and distributed this copy as a courtesy to the author(s). Additional information, including rights and permission policies, is available at http://journals.informs.org/.
Table 1 Replicating Findings from Previous Studies

Columns (1)–(4): our matched sample. Columns (5)–(8): Jaffe et al. (1993) sample. Columns (9)–(12): Thompson and Fox-Kean (2005) three-digit sample.

| Unit of analysis | (1) Citations sample | (2) Intraregion citations (%) | (3) Intraregion controls (%) | (4) Ratio (2)/(3) | (5) Citations sample | (6) Intraregion citations (%) | (7) Intraregion controls (%) | (8) Ratio (6)/(7) | (9) Citations sample | (10) Intraregion citations (%) | (11) Intraregion controls (%) | (12) Ratio (10)/(11) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country-level analysis | 4,007,217 | 74.7 | 57.6 | 1.30 | 7,759 | 68.0 | 61.4 | 1.11 | 7,627 | 68.6 | 55.6 | 1.23 |
| State-level analysis | 4,007,217 | 13.4 | 6.2 | 2.16 | 7,759 | 9.7 | 5.1 | 1.90 | 7,627 | 7.8 | 5.0 | 1.55 |
| Metropolitan-level analysis | 4,007,217 | 7.6 | 2.6 | 2.92 | 7,759 | 6.6 | 1.7 | 3.88 | 7,627 | 5.2 | 3.5 | 1.50 |

Notes. The Jaffe et al. (1993) numbers reported here were calculated based on pooling of results for their different subsamples primarily using information available in their Table 3 in a manner similar to that reported by Thompson and Fox-Kean (2005). The Thompson and Fox-Kean (2005) sample statistics are for the first sample they construct by employing three-digit technology matching to be comparable to that in Jaffe et al. (1993). Whereas Thompson and Fox-Kean (2005) subsequently construct other samples using more fine-grained technology matching, we instead rely on regression models to similarly account for technology more finely. Formal t-tests confirmed that the difference of means between incidences of geographic collocation for actual citations versus corresponding controls was statistically significant in all cases, so the t-statistics have not been reported to conserve space.
Table 2 Further Investigation of the State Border Effect

Columns (1)–(4): cited patent from near a state border. Columns (5)–(8): cited patent from near a state border and citing patent from focal state dyad. Columns (9)–(12): cited patent from near a state border and citing patent from focal state dyad as well as same CBSA as cited patent.

| Unit of analysis | (1) Citations sample | (2) Intraregion citations (%) | (3) Intraregion controls (%) | (4) Ratio (2)/(3) | (5) Citations sample | (6) Intraregion citations (%) | (7) Intraregion controls (%) | (8) Ratio (6)/(7) | (9) Citations sample | (10) Intraregion citations (%) | (11) Intraregion controls (%) | (12) Ratio (10)/(11) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country-level analysis | 996,627 | 74.9 | 58.4 | 1.28 | | | | | | | | |
| State-level analysis | 996,627 | 7.1 | 2.7 | 2.63 | 93,703 | 68.4 | 55.7 | 1.23 | 40,784 | 87.2 | 82.8 | 1.05 |
| Metropolitan-level analysis | 996,627 | 6.1 | 2.2 | 2.77 | 93,703 | 55.8 | 38.0 | 1.47 | | | | |

Notes. To ensure that within-state localization reported above is not just a distance effect driven by cited patents in a state's interior, columns (1)–(4) carry out the JTH-style analysis using a subsample of our matched sample where the distance of the cited patent's originating town or city to the closest state border is not more than 20 miles. In columns (5)–(8), the set of actual citations is restricted to those arising either within the cited patent's state or in the closest neighboring state, with the set of control citations to use as a benchmark also being regenerated based on a matching with all potentially citing patents within these two states using their application year and three-digit technology class. In columns (9)–(12), as an additional robustness check to distinguish the effect of metropolitan collocation from state borders, analysis has been further restricted to cited patents originating in CBSAs that cross state borders and have both actual as well as corresponding control citations within the CBSA (with one or both of them potentially still crossing the state border). The difference of means between incidences of geographic collocation for actual citations versus corresponding controls was statistically significant in all cases, so the t-statistics have not been reported to conserve space.
originating near a state border increases confidence in the possibility that state borders do indeed have an independent effect, columns (5)–(8) refine this by restricting the set of potentially citing patents to those that originate within one of two states separated by the state border under consideration. For example, for a cited patent from Haverhill, Massachusetts (near the New Hampshire border), we would consider only (potential) citations from either Massachusetts or New Hampshire. Given that the citing patents in the matched pairs in our original sample could be from anywhere, this analysis relies on a new matched sample appropriate to the task. A control patent is now generated by matching the citing patent to a patent not just from the same three-digit technology class and the same year but also originating from within the state dyad being considered.
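The dyad-restricted matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record fields (`id`, `tech_class`, `year`, `state`), the function names, and the rule of excluding actual citers from the control pool are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_index(patents):
    """Group candidate patents by (technology class, application year, state)."""
    index = defaultdict(list)
    for p in patents:
        index[(p["tech_class"], p["year"], p["state"])].append(p)
    return index


def draw_dyad_control(citing, dyad, index, actual_citers, rng):
    """Draw a control with the citing patent's class and year from within the
    state dyad, skipping patents that actually cite the focal patent.
    Returns None when no eligible control exists."""
    pool = [
        p
        for state in sorted(dyad)
        for p in index.get((citing["tech_class"], citing["year"], state), [])
        if p["id"] not in actual_citers
    ]
    return rng.choice(pool) if pool else None


# Hypothetical records for a cited patent near the Massachusetts/New Hampshire border.
patents = [
    {"id": "A", "tech_class": "435", "year": 1995, "state": "MA"},  # actual citer
    {"id": "B", "tech_class": "435", "year": 1995, "state": "NH"},  # eligible control
    {"id": "C", "tech_class": "435", "year": 1995, "state": "CA"},  # outside the dyad
    {"id": "D", "tech_class": "435", "year": 1996, "state": "MA"},  # different year
    {"id": "E", "tech_class": "514", "year": 1995, "state": "NH"},  # different class
]
index = build_index(patents)
control = draw_dyad_control(patents[0], ("MA", "NH"), index, {"A"}, random.Random(0))
print(control)
```

Passing a seeded `random.Random` keeps the draw reproducible; citing patents with no eligible control within the dyad would simply drop out of the matched sample.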
The interpretation of the results reported in columns (5)–(8) is that, in a sample comprising only dyads of neighboring states, knowledge generated within 20 miles of a state border is still much more likely to be used within its state of origin than in the neighboring state (after, as before, adjusting for the geographic distribution of different technology classes). In other words, the finding of a state border effect remains qualitatively robust to using this alternate methodology.11 Because we use a new sample that restricts actual and potential citations to be between neighboring states within the United States, note that country border effects have been filtered out (so country-level analysis is no longer carried out) and that the reported numbers are also not comparable with the findings from columns (1)–(4).
One interesting feature of U.S. geography is that 62 of the 943 CBSAs include more than one state. For example, the CBSA containing Cincinnati, Ohio, also extends into sections of Kentucky and Indiana. This allows us to test the border effect by examining whether in-state localization exists even for knowledge flows within such CBSAs. Specifically, columns (9)–(12) report the findings based on a subsample of the data in columns (5)–(8) where the observations only include cited patents originating in a multistate CBSA. The observations are further restricted to citations coming from within the CBSA that are also matched to control citations within the same CBSA. By construction, metropolitan effects have therefore been filtered out (so CBSA-level analysis is no longer carried out). The difference of means between incidences of state-level collocation for actual citations and the corresponding controls remains statistically significant. Although their ratio is now much smaller, it should be noted that this is a very conservative test using a smaller, highly restrictive within-CBSA sample. Thus, just the fact that we find any state border effect in this case is perhaps in itself quite remarkable. To a skeptic, this could instead be an indication that the state border effect is not as strong as it is made out to be in the earlier analysis. At this point, we are agnostic about an exact interpretation, preferring instead to address the issue using our regression framework.
11 In choosing the sample of cited patents near state borders, we have reported findings based on a cut-off of 20 miles as a compromise between being close to the border and having a reasonable sample size. We tried progressively smaller windows starting from 50 miles and going all the way down to those within 5 miles of a state border. The findings remained robust in support of a state border effect.
3.3. Analyzing Long-Term Localization Trends Using the Traditional Matching Framework
Our sample size is orders of magnitude larger than those employed in previous studies, so we can carry out more detailed analyses as reported in Table 3. Columns (1)–(4) segment our cited patents drawn from 1975–2004 into six five-year periods.12 Localization of knowledge spillovers remains robust across all periods for all three geographic units. Further, we can examine the time trends by taking the ratio of collocation frequency for inventor pairs comprising actual citations versus matched controls reported in column (4) as an indicator of the strength of the geographic effects. What is rather striking is that, despite much talk about globalization and the decreasing relevance of geographic separation, the role of geography appears to have increased rather than decreased over time. Given that the JTH framework only analyzes each geographic unit in isolation, this analysis is, however, not able to disentangle whether the time trends are reflective primarily of underlying border effects, proximity effects, or a combination of the two. We will therefore return to this issue later in the context of our preferred regression framework that accounts for all geographic effects simultaneously.
3.4. Analyzing Inventor vs. Examiner Citations Using the Traditional Matching Framework
Recent work has noted that many patent citations are included not by the inventors themselves but later by patent examiners (Alcacer and Gittelman 2006). Therefore, it is useful to carry out an analysis complementary to the above by examining just inventor-added citations, because these might arguably be more likely to reflect prior art that an inventor was aware of in coming up with the focal
12 The sample size drops during the last five-year period (2000–2004) because, although the earlier periods employ a full 10-year citation window, for this period we only observe citing patents through 2009.
Table 3 Distinguishing Different Time Periods and Citations Added by Inventors vs. Examiners

Columns (1)–(4): full matched sample. Columns (5)–(8): inventor-added citation subsample. Columns (9)–(12): examiner-added citation subsample.

| Period | (1) Citations sample | (2) Intraregion citations (%) | (3) Intraregion controls (%) | (4) Ratio (2)/(3) | (5) Citations sample | (6) Intraregion citations (%) | (7) Intraregion controls (%) | (8) Ratio (6)/(7) | (9) Citations sample | (10) Intraregion citations (%) | (11) Intraregion controls (%) | (12) Ratio (10)/(11) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country-level analysis | | | | | | | | | | | | |
| 1975–1979 | 262,657 | 66.7 | 59.0 | 1.13 | | | | | | | | |
| 1980–1984 | 307,090 | 67.5 | 56.5 | 1.19 | | | | | | | | |
| 1985–1989 | 504,546 | 73.4 | 58.0 | 1.27 | | | | | | | | |
| 1990–1994 | 941,141 | 76.1 | 57.3 | 1.33 | 360,541 | 85.1 | 57.5 | 1.48 | 154,186 | 59.7 | 55.1 | 1.08 |
| 1995–1999 | 1,496,672 | 77.0 | 58.2 | 1.32 | 917,811 | 85.4 | 59.0 | 1.45 | 495,037 | 62.1 | 56.6 | 1.10 |
| 2000–2004 | 495,111 | 75.0 | 55.9 | 1.34 | 288,992 | 85.4 | 57.1 | 1.50 | 203,926 | 60.3 | 54.2 | 1.11 |
| State-level analysis | | | | | | | | | | | | |
| 1975–1979 | 262,657 | 8.9 | 4.6 | 1.93 | | | | | | | | |
| 1980–1984 | 307,090 | 9.4 | 4.5 | 2.09 | | | | | | | | |
| 1985–1989 | 504,546 | 11.1 | 4.9 | 2.27 | | | | | | | | |
| 1990–1994 | 941,141 | 13.4 | 5.8 | 2.31 | 360,541 | 15.7 | 6.1 | 2.57 | 154,186 | 9.5 | 5.6 | 1.70 |
| 1995–1999 | 1,496,672 | 14.7 | 7.1 | 2.07 | 917,811 | 16.8 | 7.2 | 2.33 | 495,037 | 10.8 | 6.9 | 1.57 |
| 2000–2004 | 495,111 | 16.3 | 7.3 | 2.23 | 288,992 | 19.3 | 7.5 | 2.57 | 203,926 | 12.1 | 7.1 | 1.70 |
| Metropolitan-level analysis | | | | | | | | | | | | |
| 1975–1979 | 262,657 | 5.3 | 2.1 | 2.52 | | | | | | | | |
| 1980–1984 | 307,090 | 5.6 | 2.1 | 2.67 | | | | | | | | |
| 1985–1989 | 504,546 | 6.7 | 2.1 | 3.19 | | | | | | | | |
| 1990–1994 | 941,141 | 8.0 | 2.5 | 3.20 | 360,541 | 9.4 | 2.6 | 3.62 | 154,186 | 5.3 | 2.3 | 2.30 |
| 1995–1999 | 1,496,672 | 7.9 | 2.8 | 2.82 | 917,811 | 9.1 | 2.9 | 3.14 | 495,037 | 5.6 | 2.7 | 2.07 |
| 2000–2004 | 495,111 | 9.4 | 2.9 | 3.24 | 288,992 | 11.4 | 3.0 | 3.80 | 203,926 | 6.7 | 2.7 | 2.48 |

Notes. Columns (1)–(4) employ exactly the same matched sample as the corresponding columns in the previous table except that the analysis has now been broken up into six five-year time periods based on the application year of the cited patent. The sample size drops during 2000–2004 because, although the first five periods employ the full 10-year citation window, the observed window is shorter for patents in this period, given that we only observe citing patents until 2009. Columns (5)–(8) are based only on the subsample of citations added by inventors and their corresponding controls, and columns (9)–(12) are based only on the subsample of citations added by examiners and their corresponding controls. Because this distinction is only available for citing patents post-2001, this analysis is done only for the cited patent originating periods for which the citation window overlaps with availability of the inventor versus examiner distinction information for citations.
invention.13 Columns (5)–(8) of Table 3 report the JTH-style analysis based only on the subsample of citations added by inventors (and the corresponding controls). Because the inventor/examiner distinction is only available for citations post-2001, these calculations are reported only for cited patents originating during one of the three five-year periods for which the citation window overlaps with availability of this information for a significant fraction of the citations. Comparing the extent of the localization effect calculated in column (8) with column (4) reveals that a focus on just inventor-added citations significantly strengthens the geographic localization for all three geographic units of analysis. Unlike the results in column (4), the results in column (8) do not show any time trends, though that is largely reflective of the fact that the analysis cannot even be carried out for the first three periods because of unavailability of the inventor versus examiner distinction for citing patents pre-2001.
Thompson (2006) exploits the inventor/examiner distinction to address a challenge when using a JTH-style matching approach: because even the finest available technological classification might not capture some unobserved technological characteristics driving both patent citation patterns and geographical collocation, it is hard to be definitive about geographic collocation leading to increased knowledge diffusion. He suggests an identification strategy wherein one only takes greater localization for inventor-contributed citations relative to that for citations added by examiners as reliable evidence of localized knowledge spillovers. The rationale is twofold. First, because patent examiners are generally recruited directly after college, they do not have any specialized work experience that could bias them toward adding citations to patents from specific locations. Second, these examiners work at a single campus in Alexandria, Virginia, further making them "geography blind." In other words, the examiners should have no reason to disproportionately add localized citations over and above the natural distribution of prior patents relevant for a given patent application, making examiner-added citations useful as an appropriate benchmark for interpreting geographic localization of inventor-added citations. To use this approach for analyzing inventor citation findings from columns (5)–(8), we first report analysis using just examiner-added citations (and corresponding matched controls) in columns (9)–(12). Comparing columns (8) and (12), the calculated ratio between collocation incidences for realized citations versus matched patent pairs is found to be higher in all cases for inventor-added citations than for examiner-added citations, further establishing the robustness of the finding on geographic localization of knowledge spillovers. Note that we still have not disentangled border versus proximity effects, for which we turn to their simultaneous examination in our regression framework.
13 In addition to the inventor versus examiner distinction being available only post-2001, a case can also be made in favor of considering all citations (rather than just inventor citations) because inventors may omit (even deliberately, for strategic reasons) reference to some patents representing knowledge they build on (Lampe 2012).
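The column (8) versus column (12) comparison can be made concrete with a few lines of code. The ratios below are transcribed from Table 3; the "excess" measure (inventor ratio divided by examiner ratio) is our own illustrative summary rather than a statistic reported in the paper.

```python
# (inventor-added ratio, examiner-added ratio) from columns (8) and (12) of Table 3.
ratios = {
    ("country", "1990-1994"): (1.48, 1.08),
    ("country", "1995-1999"): (1.45, 1.10),
    ("country", "2000-2004"): (1.50, 1.11),
    ("state", "1990-1994"): (2.57, 1.70),
    ("state", "1995-1999"): (2.33, 1.57),
    ("state", "2000-2004"): (2.57, 1.70),
    ("metropolitan", "1990-1994"): (3.62, 2.30),
    ("metropolitan", "1995-1999"): (3.14, 2.07),
    ("metropolitan", "2000-2004"): (3.80, 2.48),
}

# Under the benchmark logic described above, localization counts as evidence of
# spillovers only where the inventor ratio exceeds the examiner ratio.
all_higher = all(inv > exam for inv, exam in ratios.values())

# Illustrative summary: how much stronger inventor localization is than the benchmark.
excess = {key: round(inv / exam, 2) for key, (inv, exam) in ratios.items()}

print(all_higher)
for key, value in excess.items():
    print(key, value)
```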
4. Analysis Using the WESML Regression Framework
4.1. Simultaneous Examination of Multiple Geographic Levels
We now turn to the regression framework to simultaneously examine national and state borders after accounting for proximity effects related to metropolitan (i.e., CBSA) collocation and geographic distance. Table 4 summarizes the variables used in our analyses. It is worth restating the data limitation that distance is measured based on the latitude and longitude coordinates we have for inventor cities, not the exact inventor addresses. Before trying to disentangle borders and proximity, however, it is instructive to get an overall sense of diffusion and geography. The analysis reported in column (1) of Table 5 is the simplest way of seeing this. The WESML regression estimates have an intuitive interpretation in terms of how an explanatory variable drives the likelihood of citation between random patents in the population, with the fact that citations are rare events making it possible to directly interpret the logistic model coefficients as percentage effects on citation likelihood.14 Column (1) implies that the likelihood of citation falls by 36% with a doubling of distance.
The analysis reported in column (2) also includes relevant control variables. This includes controls for technological similarity and relatedness between patents using a series of associated variables rather than only relying on the three-digit technology class match. The findings in column (2) imply that the likelihood of citation now falls by 27% with a doubling
14 In a logistic model, the marginal effect for a variable $j$ is $\beta_j \Lambda'(x\beta)$, which turns out to equal $\beta_j \Lambda(x\beta)[1 - \Lambda(x\beta)]$. In general, this would need to be calculated either based on the mean predicted probability or using the sample mean for $\Lambda(x\beta)$. But the fact that citations are rare events allows further simplification: since $\Lambda(x\beta)$ is much smaller than 1, $\beta_j \Lambda(x\beta)[1 - \Lambda(x\beta)]$ is practically equivalent to $\beta_j \Lambda(x\beta)$. This means the coefficient estimate for $\beta_j$ can be directly interpreted as the percentage change in citation probability with a unit change in variable $j$.
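The simplification in footnote 14 can be written out step by step. The symbol $P$ for the citation probability is our own shorthand, introduced only for this sketch; $\Lambda$ is the logistic function.

```latex
\[
P = \Lambda(x\beta) = \frac{e^{x\beta}}{1 + e^{x\beta}},
\qquad
\frac{\partial P}{\partial x_j} = \beta_j \,\Lambda(x\beta)\,\bigl[1 - \Lambda(x\beta)\bigr].
\]
Dividing by $P = \Lambda(x\beta)$ gives the proportional change in citation probability:
\[
\frac{1}{P}\,\frac{\partial P}{\partial x_j}
= \beta_j \bigl[1 - \Lambda(x\beta)\bigr]
\approx \beta_j
\quad \text{when } \Lambda(x\beta) \ll 1 .
\]
```

The approximation is accurate to the extent that the predicted citation probability is close to zero, which is the rare-event condition the footnote invokes.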
Table 4 Definitions of Variables Used During Regression Analysis

| Variable | Definition |
|---|---|
| Political border variables | |
| same country | Indicator variable that is equal to 1 if the citing and cited patents originate in the same country, that is, the United States (given that our cited patent sample is drawn from the United States only) |
| same state | Indicator variable that is equal to 1 if the two patents originate in the same state (within the United States) |
| Spatial proximity variables | |
| same cbsa | Indicator variable that is equal to 1 if the citing and cited patents originate from inventors located in the same core based statistical area (CBSA) as per the 2003 definition of CBSAs by the U.S. Office of Management and Budget (CBSA definitions are meant to cover reasonable commuting distances and replace the prior MSA/PMSA/CMSA definitions for defining U.S. metropolitan areas in a more standardized fashion) |
| distance | Distance, in miles, between the cities where the first inventors of the source and destination patents live (calculated as spherical distance between the latitude and longitude values for these cities) |
| Technological relatedness variables | |
| same tech category | Indicator variable that is equal to 1 if the two patents belong to the same one-digit NBER technology category |
| same tech subcategory | Indicator variable that is equal to 1 if the two patents belong to the same two-digit NBER technical subcategory |
| same tech class | Indicator variable that is equal to 1 if the two patents belong to the same three-digit USPTO primary technology class |
| relatedness of tech classes | Likelihood of citation (scaled by 100) between random patents with the same respective three-digit primary technology classes that the focal cited and citing patents belong to |
| overlap of tech subclasses | Natural logarithm of one plus the number of overlapping nine-digit technology subclasses under which the patents are categorized |
| Patent-level variables | |
| references to other patents | Number of references the cited patent makes to other patents |
| references to nonpatent materials | Number of references the cited patent makes to published materials other than patents |
| number of claims | Number of claims the cited patent makes |
| period | A sequential number representing which of our six five-year time periods the focal cited patent belongs to: 1975–1979 being period 0, 1980–1984 being period 1, 1985–1989 being period 2, 1990–1994 being period 3, 1995–1999 being period 4, and 2000–2004 being period 5 |
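The distance variable defined in Table 4 is a spherical (great-circle) distance between city coordinates. A minimal sketch using the haversine formula is shown below; the choice of haversine, the Earth radius constant, and the function names are our assumptions, since the paper does not state the exact formula used. The second helper mirrors the ln(distance + 1) transformation that appears in the regression tables.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant, not from the paper


def spherical_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points
    given in decimal degrees, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def log_distance(miles):
    """ln(distance + 1), so that same-city pairs (distance 0) map to 0."""
    return math.log(miles + 1)


# A quarter of the way around the equator, as a sanity check.
quarter = spherical_distance_miles(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1), round(log_distance(quarter), 3))
```

Because coordinates are available only at the city level, two inventors in the same city get a distance of exactly zero, which is why the regression tables treat "distance 0" as its own category.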
of distance, with the difference between the citation likelihoods corresponding to column (1) and (2) estimates being driven mainly by the new technology controls. In line with the JTH argument, we find knowledge flows within the same or related technologies to be stronger than those across different technologies, as indicated by the positive and significant estimates for same tech category, same tech subcategory, same tech class, and relatedness of tech classes. Taking into account the Thompson and Fox-Kean (2005) critique regarding the inadequacy of three-digit technological controls, we have also included a control variable to capture overlap between the citing and cited patent along their secondary nine-digit technology subclasses (overlap of tech subclasses); we find that to have a strong effect as well.
Setting the geographic distance variable aside for now, columns (3)–(5) successively introduce variables for collocation at three geographic levels: country (same country), state (same state), and metropolitan area (same cbsa). In terms of magnitude, column (5) estimates imply a 77% greater likelihood of within-country knowledge flow than across national borders, 41% greater likelihood for within-state flow than that across state borders, and 77% greater likelihood for within-CBSA flow than across CBSA boundaries. The effect size corresponding to each of these three estimates is smaller than what it would be if estimating these effects individually in separate models, and nonoverlap of the corresponding confidence intervals is indicative of this difference being statistically significant. This highlights the benefit of using a regression approach in disentangling effects at the various geographic levels by simultaneously considering the effect of all three.15
Simultaneously considering multiple geographic units indicates that there is more to the national and state border effects than a mere aggregation of localization mechanisms operating at the metropolitan level. The estimates in column (5), however, do not rule out the possibility that such effects are epiphenomenal with spatial distance, because including the CBSA collocation variable does not account for distance-related effects that might be more gradual than CBSA collocation. To this point, the model in column (6) now also includes the distance variable from before. As expected, with geographic proximity now better controlled for through the combination of metropolitan collocation and distance, both border effects become smaller. The extent of this drop—calculated in terms of a percentage difference in the
15 If we carry out this analysis excluding the nine-digit technology control, the magnitude of geographic localization on all three dimensions turns out to be larger, with the difference being the greatest for metropolitan collocation. This is in line with the intuition that geographic concentration of technological activity (which is what our technology-related control variables account for) is greater when viewed at a finer level of granularity for technology.
Table 5 Simultaneous Consideration of Political Borders and Spatial Proximity

| Variable | (1) Full sample | (2) Full sample | (3) Full sample | (4) Full sample | (5) Full sample | (6) Full sample | (7) Full sample | (8) Near-border sample | (9) Excluding California | (10) Excl. comp. and comm. |
|---|---|---|---|---|---|---|---|---|---|---|
| same country | | | 0.863*** (0.006) | 0.769*** (0.006) | 0.766*** (0.006) | 0.535*** (0.011) | 0.451*** (0.016) | 0.513*** (0.032) | 0.447*** (0.021) | 0.441*** (0.024) |
| same state | | | | 0.750*** (0.017) | 0.405*** (0.024) | 0.109*** (0.027) | 0.228*** (0.027) | 0.253*** (0.047) | 0.346*** (0.049) | 0.230*** (0.044) |
| same cbsa | | | | | 0.769*** (0.030) | 0.456*** (0.029) | 0.337*** (0.032) | 0.295*** (0.071) | 0.433*** (0.054) | 0.407*** (0.045) |
| ln(distance + 1) | −0.364*** (0.001) | −0.271*** (0.003) | | | | −0.137*** (0.005) | | | | |
| distance 0 (i.e., same city) | | | | | | | 1.665*** (0.067) | 1.896*** (0.207) | 1.661*** (0.103) | 1.910*** (0.095) |
| distance 0–10 miles | | | | | | | 1.129*** (0.057) | 1.342*** (0.112) | 1.203*** (0.083) | 1.236*** (0.086) |
| distance 10–20 miles | | | | | | | 0.990*** (0.049) | 1.020*** (0.088) | 0.923*** (0.073) | 1.064*** (0.070) |
| distance 20–30 miles | | | | | | | 0.836*** (0.055) | 0.517*** (0.108) | 0.720*** (0.081) | 0.919*** (0.079) |
| distance 30–40 miles | | | | | | | 0.552*** (0.071) | 0.239* (0.125) | 0.386*** (0.116) | 0.660*** (0.100) |
| distance 40–50 miles | | | | | | | 0.613*** (0.099) | 0.433*** (0.096) | 0.657*** (0.071) | 0.802*** (0.067) |
| distance 50–75 miles | | | | | | | 0.595*** (0.040) | 0.533*** (0.066) | 0.583*** (0.050) | 0.707*** (0.052) |
| distance 75–100 miles | | | | | | | 0.546*** (0.038) | 0.529*** (0.066) | 0.579*** (0.049) | 0.665*** (0.053) |
| distance 100–150 miles | | | | | | | 0.599*** (0.033) | 0.584*** (0.053) | 0.614*** (0.040) | 0.656*** (0.048) |
| distance 150–200 miles | | | | | | | 0.585*** (0.029) | 0.553*** (0.050) | 0.584*** (0.033) | 0.670*** (0.039) |
| distance 200–300 miles | | | | | | | 0.479*** (0.029) | 0.557*** (0.043) | 0.491*** (0.035) | 0.544*** (0.042) |
| distance 300–400 miles | | | | | | | 0.503*** (0.024) | 0.520*** (0.044) | 0.566*** (0.027) | 0.567*** (0.034) |
| distance 400–500 miles | | | | | | | 0.479*** (0.024) | 0.569*** (0.048) | 0.508*** (0.029) | 0.566*** (0.036) |
| distance 500–750 miles | | | | | | | 0.480*** (0.022) | 0.494*** (0.038) | 0.473*** (0.027) | 0.519*** (0.033) |
| distance 750–1,000 miles | | | | | | | 0.439*** (0.020) | 0.483*** (0.038) | 0.450*** (0.025) | 0.484*** (0.029) |
| distance 1,000–1,500 miles | | | | | | | 0.419*** (0.020) | 0.433*** (0.038) | 0.441*** (0.025) | 0.467*** (0.030) |
| distance 1,500–2,000 miles | | | | | | | 0.377*** (0.019) | 0.400*** (0.040) | 0.391*** (0.024) | 0.405*** (0.027) |
| distance 2,000–2,500 miles | | | | | | | 0.368*** (0.019) | 0.470*** (0.038) | 0.417*** (0.025) | 0.382*** (0.028) |
| distance 2,500–4,000 miles | | | | | | | 0.461*** (0.015) | 0.487*** (0.023) | 0.460*** (0.016) | 0.507*** (0.021) |
| distance 4,000–6,000 miles | | | | | | | 0.112*** (0.010) | 0.257*** (0.027) | 0.154*** (0.011) | 0.133*** (0.012) |
| same tech category | | 1.103*** (0.006) | 1.115*** (0.006) | 1.111*** (0.006) | 1.108*** (0.006) | 1.106*** (0.006) | 1.107*** (0.006) | 1.088*** (0.011) | 1.102*** (0.006) | 0.893*** (0.007) |
| same tech subcategory | | 1.298*** (0.008) | 1.310*** (0.008) | 1.300*** (0.008) | 1.299*** (0.008) | 1.297*** (0.008) | 1.296*** (0.008) | 1.298*** (0.015) | 1.300*** (0.009) | 1.460*** (0.010) |
| same tech class | | 2.141*** (0.016) | 2.154*** (0.014) | 2.156*** (0.016) | 2.145*** (0.015) | 2.144*** (0.015) | 2.144*** (0.014) | 2.267*** (0.025) | 2.215*** (0.017) | 2.283*** (0.016) |
Table 5 (Continued)
                                           (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)         (9)         (10)
                                           Full        Full        Full        Full        Full        Full        Full        Near-border Excluding   Excl. comp.
                                           sample      sample      sample      sample      sample      sample      sample      sample      California  and comm.
relatedness of tech classes                            1.512***    1.604***    1.481***    1.518***    1.501***    1.502***    1.537***    1.650***    1.567***
                                                       (0.129)     (0.104)     (0.127)     (0.112)     (0.116)     (0.104)     (0.193)     (0.138)     (0.124)
overlap of tech subclasses                             1.687***    1.691***    1.686***    1.686***    1.684***    1.681***    1.716***    1.704***    1.845***
                                                       (0.011)     (0.010)     (0.011)     (0.011)     (0.011)     (0.011)     (0.019)     (0.013)     (0.016)
ln(references to other patents + 1)                    0.134***    0.135***    0.135***    0.136***    0.135***    0.135***    0.159***    0.149***    0.134***
                                                       (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.012)     (0.006)     (0.007)
ln(references to nonpatent materials + 1)              0.034***    0.034***    0.034***    0.033***    0.033***    0.033***    0.007       0.032***    0.039***
                                                       (0.005)     (0.004)     (0.004)     (0.005)     (0.005)     (0.005)     (0.008)     (0.006)     (0.008)
ln(number of claims)                                   0.092***    0.092***    0.092***    0.093***    0.092***    0.092***    0.110***    0.095***    0.082***
                                                       (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.010)     (0.006)     (0.006)
Period indicators                          Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Citation lag indicators                    Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Two-digit tech indicators                  Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
State indicators                           Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Number of observations                     13,728,582  13,728,582  13,728,582  13,728,582  13,728,582  13,728,582  13,728,582  3,600,000   10,994,852  10,474,569
Pseudo-R2                                  0.0122      0.181       0.179       0.181       0.182       0.182       0.183       0.189       0.188       0.196
Wald chi2                                  124,220     756,991     785,451     767,431     759,980     755,320     763,511     221,768     616,233     519,519
Degrees of freedom                         1           69          69          70          71          72          91          91          90          87
Notes. The unit of observation is a pair of patents representing an actual or potential citation. The dependent variable is an indicator for whether or not the potentially citing patent actually cited the focal patent. A choice-based stratified sample is used, and a weighted logistic regression (WESML) approach is implemented using observation weights that reflect sampling frequency associated with different strata. The regression model also uses a constant term and indicator variables as indicated above, but these are not reported to conserve space and are available from the authors on request. Robust standard errors are shown in parentheses and are clustered on the cited patent.
*p < 0.1; ***p < 0.01.
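As an illustration of the estimation approach described in the notes to Table 5, the following minimal Python sketch fits a weighted logit on a choice-based sample. It is not the authors' implementation: the toy population, the two strata, the 1-in-10 sampling rate for non-citing pairs, and the function name fit_weighted_logit are all assumptions made here for illustration. Each sampled pair is weighted by the inverse of its stratum's sampling rate, so that the weighted sample mimics the population.

```python
import math


def fit_weighted_logit(rows, n_iter=50):
    """Fit y ~ intercept + slope * x by weighted maximum likelihood.

    rows: list of (y, x, weight) tuples with y in {0, 1}.
    Uses Newton's method on the weighted log-likelihood; returns (intercept, slope).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Weighted score vector (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, x, w in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            v = w * p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step for the 2x2 system: beta <- beta + H^{-1} g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical population: 1,000 patent pairs with x = 0 (10 citations) and
# 1,000 pairs with x = 1 (30 citations), where x stands for an indicator such
# as same state. Choice-based sample: keep every citing pair (weight 1) and
# one in ten non-citing pairs (weight 10).
sample = (
    [(1, 0, 1.0)] * 10 + [(0, 0, 10.0)] * 99
    + [(1, 1, 1.0)] * 30 + [(0, 1, 10.0)] * 97
)

intercept, slope = fit_weighted_logit(sample)
print(f"weighted intercept = {intercept:.4f}, weighted slope = {slope:.4f}")
```

With a single binary regressor, the weighted estimates reproduce the population log-odds (citation rates of 1% and 3% in this toy example), whereas an unweighted fit on the same sample would overstate the citation rate because citing pairs are oversampled.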
average predicted marginal effect for a variable across the two models—turns out to be much larger for the same state effect than the same country effect.
To allow more flexibility in how distance constrains knowledge flows, column (7) repeats the analysis with a series of indicator variables for distance ranges, covering increasing distances starting in the sequence distance 0 miles (i.e., same city), distance 0–10 miles, and so on. The omitted category is distance greater than 6,000 miles. This nonparametric approach ensures that the same country and same state estimates more accurately measure border effects independent of geographic proximity. (Even more fine-grained indicators did not materially alter findings.) Not surprisingly, estimates for the distance indicators themselves reveal that knowledge flows are greatest when the source and recipients are collocated within the same city (i.e., distance = 0) and that the distance effect gradually falls (more or less monotonically) with distance. Once more, however, we find statistically and economically significant estimates for same country and same state even after we have accounted for geographic proximity using same cbsa and distance indicators.16 (Note that it is hard to directly compare same country and same state coefficients across columns (6) and (7), as the latter has a large number of new variables in the form of distance indicators.) This finding challenges an interpretation that localized knowledge diffusion reported by previous studies is merely a manifestation of intraregional distances being on average smaller than cross-regional distances.17
16 Rather than calculating standard errors based on clustering at the patent level, we also tried the associate editor’s suggestion of geographic clustering at different levels (the city, the CBSA, and even the state) to be conservative in the kinds of unobserved heterogeneity accounted for. Although the standard errors did become larger as expected, the coefficients for same country, same state, and same cbsa still remained statistically significant at the 1% level.
17 In additional analysis, we tried models with indicators for contiguous countries and contiguous states to distinguish cases where the source and destination share a border. While we did find knowledge flow to be more intense between contiguous regions, we found that independent country and state border effects persist.
4.2. Further Investigation of the Border Effects Using the WESML Regression Framework
We now examine subsamples to figure out whether our findings are driven by particular kinds of patents. One concern might be whether the state-level finding is driven by observations that are quite distant from the state border. Analogous to the near-border analysis presented for the JTH approach, we analyze diffusion of knowledge originating near state borders to see if there is on average a similar state-border effect even for these. Specifically, we look at the subset of potentially cited patents that lie within 20 miles
of a state border. As column (8) in Table 5 indicates, the findings for the near-border cited patent subsample turn out to be qualitatively similar to those from the full sample (column (7)), including the continued presence of a significant same-state effect.
Next, we subset our sample by removing California, as Silicon Valley has often been described as an outlier for diffusion (Almeida and Kogut 1999). Because California is the top state in terms of patenting activity and one of the largest in terms of area, one might worry that our results depend on it in ways that state fixed effects do not capture. In column (9) of Table 5, both country and state localization are found to be robust to excluding California. To further investigate whether our findings are state specific, in an analysis not reported to conserve space, we also carried out analogous analyses for cited patent subsamples from the 10 largest patenting states. The findings revealed that, in 6 of these 10 cases, observed state-level localization of knowledge originating within the state borders could not be completely explained simply by geographic proximity effects in the form of metropolitan collocation and/or shorter geographic distances. In other words, the finding is not driven by just one or two specific states. In fact, California turned out to be one of the minority cases where state borders do not seem to have an effect independent of distance (but CBSA boundaries like those of Silicon Valley still do), suggesting that, once one crosses out of certain areas like Silicon Valley, knowledge is no longer further constrained by state borders over and above effects related simply to distance.
Next, we turn to checking whether the results could similarly be driven by specific sectors. To start with, we exclude the one-digit NBER technology category computers and communications, a sector many scholars consider unique. As column (10) in Table 5 shows, the results are qualitatively unchanged. We also carried out (but omit for space) separate analyses for cited patent subsamples from all six one-digit NBER categories. We found that the findings are not driven by a specific sector. In fact, in five of the six cases, observed state-level localization of knowledge could not be completely explained simply by geographic proximity effects, the only exception being the NBER category others. Similarly, repeating the analysis with two-digit NBER subcategories revealed an independent state border effect for 30 of the 36 subsamples. Thus, the state border finding is clearly not driven by just one or two specific sectors either.
4.3. Analyzing Long-Term Localization Trends Using the WESML Regression Framework
We next investigate whether these effects are driven by particular time periods as opposed to being persistent. Before disentangling long-term trends in border and proximity effects, it is useful to start with an overall sense of how the role of geography in knowledge diffusion has evolved over time. With this view, column (1) in Table 6 extends the analysis from column (2) in Table 5 by adding an interaction term period * ln(distance + 1) between the distance variable and the time-period variable capturing the five-year period when the cited patent originated. (See Table 4 for detailed definition.)18 Surprisingly, and contrary to the widespread notion that the importance of distance has been eroding over time because of globalization and technological advancement, the decay in citation rate with distance seems to have increased over time, although the economic magnitude of this increase is not too large.
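As a reading aid, the column (1) specification can be written out as follows. The notation is introduced here only for illustration and is not the authors': $c_{ij}$ indicates that patent $j$ cites patent $i$, $d_{ij}$ is the distance between them in miles, $p_i$ is the period variable for the cited patent (defined in Table 4), and $z_{ij}$ collects the remaining controls and indicator variables.

```latex
\begin{align*}
\operatorname{logit} \Pr(c_{ij} = 1) &= \alpha + \beta_1 \ln(d_{ij} + 1) + \beta_2\, p_i \ln(d_{ij} + 1) + \gamma^{\top} z_{ij}, \\
\frac{\partial \operatorname{logit} \Pr(c_{ij} = 1)}{\partial \ln(d_{ij} + 1)} &= \beta_1 + \beta_2\, p_i .
\end{align*}
```

With the column (1) estimates in Table 6 of −0.224 for ln(distance + 1) and −0.011 for the interaction, the distance slope is −0.224 − 0.011 p_i, which becomes more negative for later periods provided period is coded as increasing over time (its exact coding is given in Table 4 and is not assumed here).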
In column (2), we turn to disentangling time trends in the border versus proximity effects, with the goal of figuring out whether the role of political borders has strengthened or weakened over time once proximity is accounted for. In addition to the distance variable, we reintroduce our other three geographic variables (same country, same state, and same cbsa) but now also bring in their interaction effects with the time variable period. The trends turn out to differ across variables: the effect of national borders seems to have increased over time, whereas that for state borders and CBSA boundaries has decreased. Additional analyses in columns (3) and (4) add distance indicators and the full set of distance-period indicators, respectively, to more completely account for any distance-related effects and trends not captured above. The finding on the opposite time trends for country versus state borders remains qualitatively robust, with the country effect still strengthening over time and the state effect weakening. However, the CBSA finding is more fickle, becoming statistically insignificant in column (3) and ultimately flipping to become positive (and statistically significant) in column (4). This might be caused by the high correlation between the distance indicators and same cbsa.
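The distance indicators and distance-period indicators mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: range edges from 200 miles upward follow the ranges listed in Table 5, the edges below 200 miles are hypothetical placeholders, treating each range as closed at its upper edge is an arbitrary choice, and the distance-period indicators are assumed to be products of a distance-range indicator and a period indicator.

```python
from bisect import bisect_left

# Upper edges (in miles) of the distance ranges. Edges from 200 upward follow
# the ranges listed in Table 5; the edges below 200 are hypothetical.
UPPER_EDGES = [10, 25, 50, 100, 200, 300, 400, 500, 750,
               1000, 1500, 2000, 2500, 4000, 6000]


def distance_bin(miles):
    """Return the label of the distance range containing `miles`.

    A distance of 0 (same city) is its own category. Distances above
    6,000 miles belong to the omitted reference category and return None.
    """
    if miles == 0:
        return "distance 0 miles"
    i = bisect_left(UPPER_EDGES, miles)  # first edge that is >= miles
    if i == len(UPPER_EDGES):
        return None
    lower = UPPER_EDGES[i - 1] if i > 0 else 0
    return f"distance {lower:,}-{UPPER_EDGES[i]:,} miles"


def distance_period_indicator(miles, period):
    """Label for a distance range interacted with a cited-patent period."""
    label = distance_bin(miles)
    return None if label is None else f"{label} x period {period}"


print(distance_bin(250))
print(distance_period_indicator(1200, "1985-1989"))
```

Each pair of patents then gets a 1 for the single indicator whose label matches and a 0 for all others, so no functional form is imposed on how citation likelihood varies with distance.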
18 Recall that we use separate sets of indicators for the cohort the cited patent comes from and the lag between the cited and citing patents. Using longitudinal variation, our sample allows us to separately identify cohort effects and lag effects in a way that previous studies with more restrictive samples (such as Thompson 2006) are not able to.
As the model with the fewest functional form restrictions on distance, column (4) represents our specification of choice. Following Greene (2010), we interpret the results for the interaction terms in this nonlinear model graphically by calculating the average predicted effect of a 0 to 1 transition for same country, same state, and same cbsa. Specifically, by carrying out this exercise for the subsamples from different time periods, we plot the predicted effects for different periods in Figure 2. Examining the ratio of the predicted effect for the case where a specific
Table 6 Time Trends in Geographic Knowledge Diffusion Patterns
                                  (1)          (2)          (3)          (4)          (5)
                                  Full sample  Full sample  Full sample  Full sample  Full sample
same country                                   0.259***     0.140***     0.362***     0.249***
                                               (0.022)      (0.020)      (0.048)      (0.034)
same state                                     0.381***     0.367***     0.358***     0.332***
                                               (0.070)      (0.066)      (0.074)      (0.048)
same cbsa                                      0.616***     0.424***     0.125        −0.034
                                               (0.069)      (0.072)      (0.087)      (0.084)
ln(distance + 1)                  −0.224***    −0.085***
                                  (0.008)      (0.011)
period * same country                          0.089***     0.106***     0.021*
                                               (0.007)      (0.004)      (0.011)
period * same state                            −0.089***    −0.048***    −0.035**
                                               (0.017)      (0.016)      (0.015)
period * same cbsa                             −0.053***    −0.026       0.048**
                                               (0.019)      (0.020)      (0.020)
period * ln(distance + 1)         −0.011***    −0.017***
                                  (0.002)      (0.003)
same country * period 1980–1984                                                       0.061
                                                                                      (0.080)
same country * period 1985–1989                                                       0.321***
                                                                                      (0.051)
same country * period 1990–1994                                                       0.255***
                                                                                      (0.047)
same country * period 1995–1999                                                       0.196***
                                                                                      (0.043)
same country * period 2000–2004                                                       0.147***
                                                                                      (0.053)
same state * period 1980–1984                                                         0.069
                                                                                      (0.062)
same state * period 1985–1989                                                         −0.279**
                                                                                      (0.133)
same state * period 1990–1994                                                         −0.011
                                                                                      (0.060)
same state * period 1995–1999                                                         −0.171***
                                                                                      (0.057)
same state * period 2000–2004                                                         −0.187***
                                                                                      (0.072)
same cbsa * period 1980–1984                                                          0.113
                                                                                      (0.114)
same cbsa * period 1985–1989                                                          0.513***
                                                                                      (0.133)
same cbsa * period 1990–1994                                                          0.371***
                                                                                      (0.100)
same cbsa * period 1995–1999                                                          0.412***
                                                                                      (0.095)
same cbsa * period 2000–2004                                                          0.333***
                                                                                      (0.119)
Distance-period indicators        No           No           No           Yes          Yes
Distance indicators               No           No           Yes          Yes          Yes
Period indicators                 Yes          Yes          Yes          Yes          Yes
Citation lag indicators           Yes          Yes          Yes          Yes          Yes
Two-digit tech indicators         Yes          Yes          Yes          Yes          Yes
State indicators                  Yes          Yes          Yes          Yes          Yes
Other control variables           Yes          Yes          Yes          Yes          Yes
Number of observations            13,728,582   13,728,582   13,728,582   13,728,582   13,728,582
Pseudo-R2                         0.181        0.183        0.183        0.183        0.183
Wald chi2                         758,828      754,616      762,307      778,608      780,425
Degrees of freedom                70           76           94           189          201
Notes. All notes from Table 5 apply here as well, except that regression coefficients for the control variables as well as for the distance-period and distance indicators (when applicable) are also omitted to further conserve space. As indicated, distance indicators are excluded in the first two models because a continuous distance variable has been directly included in those models.
*p < 0.1; **p < 0.05; ***p < 0.01.
Figure 2 Predicted Probabilities Across Different Time Periods
[Figure not reproduced. Three panels plot predicted citation likelihood (%) against cited patent year (1975–79 through 2000–04): (i) country border effect after accounting for other geographic levels (citing patent within country vs. outside country); (ii) state border effect after accounting for other geographic levels (citing patent within state vs. outside state); (iii) CBSA boundary effect after accounting for other geographic levels (citing patent within CBSA vs. outside CBSA).]
variable (such as same country) is set to 1 versus 0 helps interpret the economic magnitude of the trend. (Whether one of these individual predicted effect lines slopes upward or downward from a specific period to the next is not relevant to this analysis beyond how the ratio itself evolves over time.) For example, the ratio of the cases with same country being 1 versus 0 increases from 1.42 in 1975–1979 (predicted probabilities of 5.0 in a million for same country = 1 versus 3.5 in a million for same country = 0) to 1.66 in 2000–2004 (4.4 in a million versus 2.7 in a million). In contrast, the ratio of the cases with same state being 1 versus 0 decreases from 1.38 in 1975–1979 (probabilities of 5.9 in a million for same state = 1 versus 4.3 in a million for same state = 0) all the way down to 1.15 in 2000–2004 (4.3 in a million versus 3.8 in a million). We offer a similar chart of CBSA for completeness; however, as our distance variable is based on a single latitude and longitude value for a city, we consider it too noisy to reliably disentangle microlevel distance effects from CBSA effects. We therefore suggest caution in interpreting the CBSA variables, treating these as controls for our study rather than taking the results as conclusive.
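The ratio calculation described above is simple arithmetic, and the short Python sketch below reproduces it from the probabilities quoted in the text. The dictionary layout and the function name border_ratio are illustrative choices, not part of the original analysis.

```python
# Predicted citation probabilities per million patent pairs, as quoted in the
# text: (variable, period) -> (indicator = 1, indicator = 0).
PREDICTED_PER_MILLION = {
    ("same country", "earliest period"): (5.0, 3.5),
    ("same country", "2000-2004"): (4.4, 2.7),
    ("same state", "earliest period"): (5.9, 4.3),
    ("same state", "2000-2004"): (4.3, 3.8),
}


def border_ratio(inside, outside):
    """Ratio of predicted citation likelihood with the indicator at 1 versus 0."""
    return inside / outside


ratios = {
    key: round(border_ratio(*pair), 2)
    for key, pair in PREDICTED_PER_MILLION.items()
}

for (variable, period), value in ratios.items():
    print(f"{variable:12s} {period:16s} ratio = {value:.2f}")
```

Because the quoted probabilities are rounded to one decimal place, the ratios recomputed this way (1.43, 1.63, 1.37, and 1.13) differ slightly from the 1.42, 1.66, 1.38, and 1.15 reported in the text, which were presumably computed from unrounded predictions. The direction of each trend is the same.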
Statistical testing using the period variable above has helped us formally test for long-term trends in knowledge diffusion patterns. However, for period-by-period findings that do not impose linear restrictions on the effect of period, column (5) depicts the results of interacting the geography variables with indicators corresponding to different five-year time periods. The omitted (reference) period is 1975–1979. Relative to 1975–1979, the country border effects are stronger in four of the five subsequent periods (and statistically indistinguishable in the remaining one). In contrast, relative to the same baseline period, the state border effects are weaker in three of the five subsequent periods (and statistically indistinguishable in the two remaining ones). Overall, these results are generally consistent with the time trends documented above: country-level localization seems to have strengthened over time, whereas state-level localization has diminished.19
19 We wondered about the extent to which the temporal patterns are an artifact of changes in sectoral composition over time. In analysis not detailed here, we found that the increase in country-level localization as well as the drop in state-level localization is a general phenomenon rather than one driven by the increasing dominance of specific sectors.
4.4. Analyzing Inventor vs. Examiner Citations Using the WESML Regression Framework
We now revisit the issue that many citations are generated not by inventors but by patent examiners. For easy interpretation, logistic regression estimates for the inventor versus examiner subsamples
are first separately reported in columns (1)–(3) and columns (4)–(6), respectively, of Table 7. This is followed by the last six columns, which examine the two subsamples together in a single multinomial logistic framework in order to allow more rigorous inference. As noted in §3.4, unavailability of information on whether a pre-2001 citation was added by an inventor or examiner restricts our analysis to patents receiving a meaningful number of relatively recent citations. This reduces the number of observations considerably compared with Table 5 and also makes it impractical to exploit the inventor/examiner citation distinction to shed further light on the long-term time trends.
We start with side-by-side analyses of the inventor-added citations subsample (which includes not only actual citations but also controls matched to those) in columns (1)–(3) and the examiner-added citations subsample (which also includes actual citations and corresponding controls) in columns (4)–(6) in Table 7. We compare columns (1) and (4) to assess the overall geographic effect. When not simultaneously accounting for political borders, the effect size implied by the coefficient on the ln(distance + 1) variable is almost twice as large for citations added by inventors instead of by examiners. However, simultaneously considering all our geographic units representing political borders and spatial proximity in the remaining columns calls into question whether a large part of the overall effect truly reflects an impact of proximity per se. The difference between the effect sizes for coefficients on ln(distance + 1) for columns (2) and (5) is not that large, relative to the big gap between the two for columns (1) and (4). Similarly, the effect size for same cbsa is not too different between columns (2) and (5), and in fact becomes virtually indistinguishable between columns (3) and (6), as distance is accounted for nonparametrically in the form of our full set of indicator variables. This reinforces the concerns expressed by Thompson and Fox-Kean (2005) and Thompson (2006) that knowledge spillovers reported in earlier studies might, to a significant extent, have been a manifestation of the USPTO classification system (or, for that matter, any formal classification system) only imperfectly capturing true technological relationships across patents.
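Because a citation between two random patents is a rare event, a logit coefficient on ln(distance + 1) can be read approximately as an elasticity of the citation probability. The Python sketch below illustrates this for the column (1) and column (4) estimates in Table 7 (−0.352 and −0.173). The baseline probability of 5 in a million is a hypothetical value chosen to match the order of magnitude of the predicted probabilities quoted for Figure 2, and the function names are illustrative, not from the original analysis.

```python
import math


def logistic(u):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-u))


def probability_ratio(beta, delta_x, baseline_prob):
    """Exact ratio of citation probabilities after shifting a regressor by delta_x."""
    baseline_logit = math.log(baseline_prob / (1.0 - baseline_prob))
    return logistic(baseline_logit + beta * delta_x) / baseline_prob


BASELINE = 5e-6          # hypothetical baseline citation probability
DOUBLING = math.log(2)   # change in ln(distance + 1) when (distance + 1) doubles

for label, beta in [("inventor-added", -0.352), ("examiner-added", -0.173)]:
    exact = probability_ratio(beta, DOUBLING, BASELINE)
    # Odds ratio; for rare events it is very close to the exact probability ratio.
    approx = math.exp(beta * DOUBLING)
    print(f"{label}: exact ratio {exact:.4f}, odds-ratio approximation {approx:.4f}")
```

Under these assumptions the two ratios agree to several decimal places, which is why coefficients from different subsamples can be compared in approximately proportional terms.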
The previous finding on the influence of political borders, however, is not diluted as much by the inventor/examiner distinction. Comparing the estimates for same country in columns (2) and (3) with those in columns (5) and (6), respectively, examiner-added citations in fact show no country-level localization, whereas the effect for inventor citations is economically and statistically highly significant. The state-level result also remains robust. However, although the same state coefficient is statistically insignificant for the examiner-added citation analysis in column (5), it turns significant for the preferred specification in column (6) once distance is accounted for in a nonparametric fashion. Still, the effect size corresponding to the coefficient on same state in column (3) remains considerably larger than that in column (6). The relative weakness of the same state effect in this analysis might in part be a result of the limited timeframe of the inventor versus examiner distinction, as we are able to observe patents only in the latter portion of our 30-year window. Recall from the earlier time trends analysis that the same-state effect was weaker during this time period when considering all citations together. If we did have the inventor/examiner distinction data available for the earlier part of our sample, it is conceivable that the state border effect might have been stronger.
Directly comparing estimates from nonlinear regressions employing different subsamples (inventor versus examiner citations) relies on our earlier observation that these estimates have a natural interpretation in percentage terms because citations are rare events. While intuitive, this approach leaves two open questions. First, given the different control groups for inventor and examiner subsamples, this direct comparison could be problematic. Second, it is not straightforward to test hypotheses regarding statistical distinguishability of estimates across different models. To address these concerns, we pool the two subsamples and run the analysis as a single (weighted) multinomial regression for the three mutually exclusive and exhaustive outcomes possible for any pair of random patents: inventor citation, examiner citation, and no citation.
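For reference, the pooled model can be written in standard multinomial logit form. The notation below is introduced here for illustration and does not appear in the original: $y_{ij}$ is the outcome for the pair of patents $(i, j)$, $x_{ij}$ stacks the geographic variables and controls, and the outcomes are labeled $N$ (no citation), $I$ (inventor citation), and $E$ (examiner citation). Sampling weights enter the likelihood and are omitted from the display.

```latex
\begin{align*}
\Pr(y_{ij} = k \mid x_{ij}) &= \frac{\exp(x_{ij}^{\top}\beta_k)}{\sum_{m \in \{N, I, E\}} \exp(x_{ij}^{\top}\beta_m)}, \qquad k \in \{N, I, E\}, \\
\beta_N &= 0 \quad \text{(no citation as the reference category)}, \\
\ln \frac{\Pr(y_{ij} = k \mid x_{ij})}{\Pr(y_{ij} = N \mid x_{ij})} &= x_{ij}^{\top}\beta_k, \qquad k \in \{I, E\}.
\end{align*}
```

Each reported coefficient is therefore a shift in the log-odds of that outcome relative to the reference outcome.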
The first set of multinomial logit analyses, reported in columns (7)–(9), takes no citation as the omitted (reference) category. The findings seem qualitatively very similar to those from separate logistic analyses for the two subsamples described above. In particular, most of the distinction between the coefficients on ln(distance + 1) for the inventor citation and examiner citation outcomes again disappears in going from column (7) to column (8). In contrast, the coefficients for same country and same state remain much stronger for the inventor citation outcome than for the examiner citation outcome even in columns (8) and (9). The only distinction from before is that even the same cbsa effect now seems significantly stronger for the inventor citation case than for the examiner citation case, although the magnitude of the same cbsa difference is still somewhat smaller than that for the same state effect and much smaller than that for the same country effect in the preferred model (9).
To formally test hypotheses comparing examiner citation estimates with the inventor citation estimates, columns (10)–(12) replicate the same multinomial
Table 7 Inventor-Added vs. Examiner-Added Citations
Columns (1)–(3): inventor sample (logit). Columns (4)–(6): examiner sample (logit). Columns (7)–(12): full sample (multinomial logit).
                              (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)         (9)         (10)        (11)        (12)
Inventor citation
same country                              1.223***    0.753***                                                    1.222***    0.754***                1.246***    0.809***
                                          (0.015)     (0.021)                                                     (0.014)     (0.020)                 (0.013)     (0.020)
same state                                0.086***    0.246***                                                    0.083***    0.243***                0.080***    0.125***
                                          (0.028)     (0.028)                                                     (0.026)     (0.026)                 (0.025)     (0.028)
same cbsa                                 0.417***    0.334***                                                    0.411***    0.328***                0.095***    0.089**
                                          (0.035)     (0.041)                                                     (0.032)     (0.037)                 (0.031)     (0.035)
ln(distance + 1)              −0.352***   −0.166***                                                   −0.353***   −0.168***               −0.189***   −0.028***
                              (0.004)     (0.007)                                                     (0.004)     (0.007)                 (0.003)     (0.006)
Examiner citation
same country                                                                  −0.034      −0.059                  −0.024      −0.055**
                                                                              (0.027)     (0.038)                 (0.015)     (0.023)
same state                                                                    −0.015      0.124**                 0.002       0.118***
                                                                              (0.056)     (0.053)                 (0.031)     (0.033)
same cbsa                                                                     0.408***    0.328***                0.316***    0.239***
                                                                              (0.077)     (0.077)                 (0.044)     (0.049)
ln(distance + 1)                                                  −0.173***   −0.147***               −0.163***   −0.140***
                                                                  (0.006)     (0.013)                 (0.004)     (0.008)
Reference group               No citation No citation No citation No citation No citation No citation No citation No citation No citation Examiner    Examiner    Examiner
                                                                                                                                          citation    citation    citation
Distance indicators           No          No          Yes         No          No          Yes         No          No          Yes         No          No          Yes
Control variables             Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Lag fixed effects             Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Year fixed effects            Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Two-digit tech fixed effects  Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
State fixed effects           Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Number of observations        4,651,156   4,651,156   4,651,156   3,377,722   3,377,722   3,377,722   5,828,778   5,828,778   5,828,778   5,828,778   5,828,778   5,828,778
Wald chi2                     254,439     262,810     267,886     230,040     232,457     233,936     466,765     507,039     511,038     466,765     507,039     511,038
Degrees of freedom            66          69          88          66          69          88          132         138         176         132         138         176
Notes. All notes from Table 5 apply here, except that regression coefficients for the distance indicators and control variables are not shown. The first six columns employ weighted logistic regressions as before, but with only inventor-added citations and corresponding controls included in columns (1)–(3) and only examiner-added citations and corresponding controls included in columns (4)–(6). The last six columns employ weighted multinomial logistic regressions based on the combined sample, with the regression specifications used for columns (7)–(9) differing from those in columns (10)–(12) only in the reference category used (as indicated). This table employs only data from citing year 2001 onward because inventor versus examiner distinction is not available for earlier years. Given the resulting citation window of at most 10 years, all cited patents originating pre-1991 get dropped.
**p < 0.05; ***p < 0.01.
logit specification as in columns (7)–(9), now taking examiner citation as the omitted category. This obviates the need to compare coefficient values across inventor and examiner citations in specification (9), instead allowing direct inspection of the equivalent column (12) coefficient significance as a formal test of the geography-related effects for political borders after accounting for spatial proximity. All three effects (same country, same state, and same cbsa) are found to be statistically significant in column (12). Additional statistical tests reveal that the same country effect is indeed significantly larger than the same state and same cbsa effects, although the latter two are statistically indistinguishable from each other. Overall, our main qualitative finding, that there are independent border effects even after geographic proximity is accounted for, therefore continues to hold even with a careful inventor versus examiner citation distinction. We can be confident in concluding that the border effect finding is robust and not just an artifact of either the geographic proximity of inventors not being accounted for or there being measurement issues related to geographic distribution of technological activity.
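Changing the reference category in a multinomial logit does not change the fitted model: a coefficient relative to examiner citation equals the inventor citation coefficient minus the examiner citation coefficient from the specification with no citation as the reference. The short Python check below applies this identity to the Table 7 estimates as read from columns (8), (9), (11), and (12). The dictionary layout and the function name rebase are illustrative only.

```python
# Table 7 coefficients with "no citation" as the reference category,
# given as (column 8, column 9).
INVENTOR = {
    "same country": (1.222, 0.754),
    "same state": (0.083, 0.243),
    "same cbsa": (0.411, 0.328),
}
EXAMINER = {
    "same country": (-0.024, -0.055),
    "same state": (0.002, 0.118),
    "same cbsa": (0.316, 0.239),
}
# Inventor-citation coefficients with "examiner citation" as the reference
# category, given as (column 11, column 12).
REPORTED = {
    "same country": (1.246, 0.809),
    "same state": (0.080, 0.125),
    "same cbsa": (0.095, 0.089),
}


def rebase(inventor, examiner):
    """Inventor coefficients re-expressed relative to the examiner outcome."""
    return tuple(round(i - e, 3) for i, e in zip(inventor, examiner))


implied = {name: rebase(INVENTOR[name], EXAMINER[name]) for name in INVENTOR}

for name in INVENTOR:
    print(f"{name:12s} implied {implied[name]}  reported {REPORTED[name]}")
```

The implied differences match the reported columns (11) and (12) to within rounding of the published coefficients.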
5. Discussion and Conclusion
The contribution of this study is that it employs a novel regression framework based on choice-based sampling to simultaneously consider the impact of different geopolitical levels to help disentangle border effects from geographic proximity effects. In addition to accounting for technological relatedness between the citing and cited patents at multiple levels of granularity, we employ an identification approach inspired by Thompson (2006) to address concerns about unobserved aspects of technological relatedness. A robust finding of our study is that, on average, country and state borders serve as constraints on knowledge diffusion even after accounting for geographic proximity in the form of metropolitan collocation and geographic distance. We document that the findings are robust to examining only near-border samples and also not driven by specific states or sectors. In fact, application of the alternate identification strategy using the inventor/examiner distinction in citations only strengthens this finding regarding an independent effect of borders.
The finding that national borders have a strong effect might not be too surprising. The literature already suggests several border-related variables that future research could consider for digging deeper, such as linguistic, cultural, political, and economic differences between countries. Indeed, in an analysis not reported here, we found knowledge flows from the United States to other English-speaking countries to be stronger even after accounting for geographic distance. A more general treatment of this issue in the form of gravity-type models employed in international economics would, however, require a sample where not just the citing but also the cited patents are drawn from multiple countries.
One might still worry whether country-level results are driven by patents originating in different countries systematically differing in their propensity to cite USPTO patents for reasons unrelated to true knowledge flows. However, previous work such as Jaffe and Trajtenberg (2002, Chap. 7) and Singh (2007), although not examining multiple geographic levels, has used USPTO data to report country-level localization effects of magnitude comparable to the United States in a number of other countries as well, suggesting that biases arising from systematic differences in propensity to cite are unlikely to be too large. Our additional finding that the country-level localization effect is driven almost entirely by citations that inventors themselves add also points to systematic differences in propensity to cite not being evident, at least in citations that USPTO examiners add later.
What is most surprising about our country-level finding is that it has only grown stronger over a period that has seen the rise of information technology in general and the Internet in particular. We take this as evidence that U.S. inventors seem to be disproportionately relying on knowledge generated within the United States even as the fraction of patents originating overseas has grown. However, this finding could also at least partly be an artifact of patent data. For example, in the absence of inventor/examiner citation distinction for our full sample period, we cannot rule out the possibility that the United States is becoming more specialized in a way not captured by the formal technological classification system. If this were true, we might still observe U.S. patents as increasingly citing other domestic patents in a way not fully reflecting true knowledge diffusion trends associated with national borders. Although we also cannot rule out a possibility that part of the uptick might be caused by some idiosyncrasy in how USPTO functioning has evolved, just the fact that this effect has not declined over time seems remarkable.
Turning to the counterintuitive finding of an independent state border effect, it is worth noting that a few studies (e.g., Holmes 1998) have found state-level effects in other contexts. Belenzon and Schankerman (2013) even document state-level localization specifically for knowledge diffusion, though they examine only patents assigned to universities and hence argue that policies promoting within-state knowledge diffusion from state-funded public universities might be a driver of their finding. Our study reveals that the state border effect is more general, applying even to knowledge arising in private companies. Further insight into the puzzling state border effect
will require a comprehensive exploration of considerations such as government support for research, spending on higher education, and policies affecting interfirm mobility of personnel (Marx et al. 2009). Another fruitful research direction might be exploring diffusion of knowledge through localized networks of individuals (Singh 2005, Singh and Agrawal 2011), with a focus on differences in the nature of formal and informal networks operating at different geographic levels and how various networks might have evolved over time in a way that can explain why the state-level localization effect (but not the country-level effect) has declined over time. Linking the existence and evolution of such networks to underlying institutional differences across regions would then be a natural next step.
Further exploration of mechanisms like institutional factors and policies seems promising for future research, but we cannot rule out that at least some of the effects we find might not be robust to using alternate research designs. At a minimum, however, our study is an initial inquiry into border-related diffusion effects for flow of ideas, paralleling analogous studies that disentangle border effects and spatial proximity effects for flow of goods in cross-regional trade (McCallum 1995; Wolf 2000; Anderson and Wincoop 2003; Hillberry and Hummels 2003, 2008). Further progress toward unpacking the geography of knowledge spillovers would also help refine existing theoretical models of innovation, entrepreneurship, and growth, ultimately leading to more effective innovation-related policies.
Acknowledgments
The authors thank INSEAD and the MIT Sloan School of Management for funding this research. They are grateful to Ajay Agrawal, Paul Almeida, James Costantini, Iain Cockburn, Pushan Dutt, Lee Fleming, Jeff Furman, Josh Lerner, Ilian Mihov, Peter Thompson, Brian Silverman, and Olav Sorenson for feedback. The authors also thank seminar audiences at Boston University, INSEAD, the London Business School, the Singapore Management University, and the University of California (Berkeley), as well as participants at the Academy of Management Meetings (2010 and 2012), the National University of Singapore Conference on Research in Innovation and Entrepreneurship (2010), the Asia-Pacific Innovation Conference (2011), the National Bureau of Economic Research Productivity Lunch (2012), and the Georgia Tech Roundtable for Engineering Entrepreneurship Research (2012) for comments. Any errors remain the authors’ own.
Appendix. Details of Sample Construction and Weights Calculation
A.1. Basic Choice-Based Sampling
Choice-based sampling involves drawing a fraction ($\alpha$) of the “ones” and a smaller fraction ($\gamma$) of “zeroes” from the population. The probability of a citation conditional on a dyad being in the sample follows from Bayes’ rule:

$$
\Lambda'_i = \frac{\alpha \Lambda_i}{\alpha \Lambda_i + \gamma (1 - \Lambda_i)} = \frac{\alpha}{\alpha + \gamma e^{-\beta X_i}} = \frac{1}{1 + e^{-(\ln(\alpha/\gamma) + \beta X_i)}}.
$$
So the usual logistic estimation would lead to biased results (Greene 2003). Because the functional form is still logistic, one way to correct the logit estimates is to subtract $\ln(\alpha/\gamma)$ from the constant term. However, noting that such a correction is overly sensitive to the assumption of the logistic functional form being completely accurate, Manski and Lerman (1977) suggest instead the WESML estimator obtained by maximizing the following weighted “pseudo-likelihood” function:
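$$
\ln L_w = \frac{1}{\alpha} \sum_{\{y_i = 1\}} \ln(\Lambda_i) + \frac{1}{\gamma} \sum_{\{y_i = 0\}} \ln(1 - \Lambda_i) = -\sum_{i=1}^{n} w_i \ln\left(1 + e^{(1 - 2y_i) x_i \beta}\right),
$$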
where $w_i = (1/\alpha) y_i + (1/\gamma)(1 - y_i)$. As Amemiya (1985, §9.5.2) demonstrates, consistency of WESML comes from the expected value of the weighted log likelihood turning out to be the same (except for a scaling factor) as the expected log likelihood for the same sample resulting through random (exogenous) sampling. WESML can be implemented using a logistic approach by “simulating” an exogenous sample by weighting each observation by the number of elements it represents from the population (i.e., by the reciprocal of the ex ante probability of inclusion of an observation in the sample). An appropriate estimator of the asymptotic covariance matrix is White’s robust “sandwich” estimator. Strictly speaking, WESML is not statistically “efficient” (Imbens and Lancaster 1996). Nevertheless, the efficiency issue can be mitigated by employing sufficiently large samples.
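The weighting logic above can be checked with a small numerical sketch. It is illustrative only and is not the authors' code: the parameter values, the grid of x values, and the helper `fit_weighted_logit` are assumptions, the sampling rates are named `ALPHA` (ones) and `GAMMA` (zeroes), and expected (fractional) counts stand in for random draws so that the result is deterministic. The sketch fits a two-parameter logit by Newton-Raphson, once ignoring the sampling scheme and once with WESML weights; it does not compute the sandwich covariance matrix.

```python
import math

def logistic(v):
    """Logistic function 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def fit_weighted_logit(rows, iterations=50):
    """Hypothetical helper: maximize sum_i w_i * loglik_i for a logit with
    intercept b0 and slope b1. rows is a list of (x, y, w) triples."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in rows:
            p = logistic(b0 + b1 * x)
            g0 += w * (y - p)            # weighted score, intercept
            g1 += w * (y - p) * x        # weighted score, slope
            v = w * p * (1.0 - p)        # weighted information terms
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1

TRUE_B0, TRUE_B1 = -2.0, 1.0                 # assumed population parameters
ALPHA, GAMMA = 1.0, 0.1                      # assumed sampling rates
grid = [-2.0 + 0.5 * k for k in range(9)]    # x = -2.0, -1.5, ..., 2.0

naive_rows, wesml_rows = [], []
for x in grid:
    p = logistic(TRUE_B0 + TRUE_B1 * x)
    ones_mass = ALPHA * p                    # expected share of ones kept
    zeroes_mass = GAMMA * (1.0 - p)          # expected share of zeroes kept
    naive_rows += [(x, 1, ones_mass), (x, 0, zeroes_mass)]
    # WESML weight: (1/alpha) * y_i + (1/gamma) * (1 - y_i)
    wesml_rows += [(x, 1, ones_mass / ALPHA), (x, 0, zeroes_mass / GAMMA)]

naive_b0, naive_b1 = fit_weighted_logit(naive_rows)
wesml_b0, wesml_b1 = fit_weighted_logit(wesml_rows)
print("naive:", naive_b0, naive_b1)
print("WESML:", wesml_b0, wesml_b1)
```

Under these assumptions, the unweighted fit should return an intercept shifted by ln(ALPHA / GAMMA) relative to the population value, whereas the weighted fit should recover the population parameters. With real data, a statistical package's weighted logit with robust standard errors would be the natural choice.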
A.2. Combining Choice-Based Sampling with Stratification on Explanatory Variables
In basic choice-based sampling, the “zeroes” are all drawn from the $y = 0$ population with a uniform sampling rate ($\gamma$). This approach can be generalized to obtain additional benefits from stratification on key explanatory variables, that is, allowing $\gamma$ to vary across different $y = 0$ subpopulations (Manski and McFadden 1981; Amemiya 1985, Chap. 9). Let us define $z$ as a label for different strata that takes values $1, 2, \ldots, T$, and note that
$$
\begin{aligned}
\Pr(z = z_i \text{ and } y = y_j \mid x = x_i)
&= \Pr(z = z_i \mid x = x_i)\,\Pr(y = y_j \mid z = z_i \text{ and } x = x_i) \\
&= \Pr(z = z_i \mid x = x_i)\,\Pr(y = y_j \mid x = x_i).
\end{aligned}
$$
The second equality comes by assuming that the vector $x$ includes all information about $z$ that affects outcome $y$; that is, $x$ is a sufficient statistic for $z$. (In our settings, this means our controls sufficiently capture technology- and year-related effects on citation likelihood.) Defining the logistic outcome as $v = (z = z_i \text{ and } y = y_i)$ rather than just $y$,
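the log-likelihood function with an exogenous (random) sample would be

$$
\begin{aligned}
\ln L &= \sum_{i=1}^{n} \ln\left[\Pr(z = z_i \text{ and } y = y_i \mid x_i)\right] \\
&= \sum_{i=1}^{n} \Big\{ y_i \ln\left[\Pr(z = z_i \mid x_i)\,\Lambda(x_i \beta)\right] + (1 - y_i) \ln\left[\Pr(z = z_i \mid x_i)\,(1 - \Lambda(x_i \beta))\right] \Big\}.
\end{aligned}
$$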
This forms the basis for deriving the pseudolikelihood function for choice-based sampling with stratification. As per the WESML method, each log-likelihood function term needs to be weighted by the inverse of the ex ante probability of that observation being included in the sample. These weights can still be computed as long as the sample as well as the population counts for each stratum are known. Once we have the weights $w_{tj}$ corresponding to $z = t$ ($t = 1, 2, \ldots, T$) and $y = j$ ($j = 0, 1$), the required pseudolikelihood function is given by
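$$
\begin{aligned}
\ln L_w &= \sum_{i=1}^{n} \Big\{ y_i w_{z_i 1} \ln\left[\Pr(z = z_i \mid x_i)\,\Lambda(x_i \beta)\right] + (1 - y_i) w_{z_i 0} \ln\left[\Pr(z = z_i \mid x_i)\,(1 - \Lambda(x_i \beta))\right] \Big\} \\
&= C - \sum_{i=1}^{n} w_i \ln\left(1 + e^{(1 - 2y_i) x_i \beta}\right),
\end{aligned}
$$

where $w_i = y_i w_{z_i 1} + (1 - y_i) w_{z_i 0}$ and $C = \sum_{i=1}^{n} w_i \cdot \ln\left[\Pr(z = z_i \mid x_i)\right]$. Since $C$ is independent of $\beta$, it can be ignored. Thus, a weighted logistic estimation can again be used, with the weights given by $w_i$. (Note that the weights now depend not just on $y$ but also on the stratum $z_i$.)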
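As a minimal sketch of how such stratum-specific weights might be computed, the snippet below takes population and sample counts keyed by (stratum, outcome) and returns the inverse sampling rate for each cell. The function name `stratum_weights` and all counts are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def stratum_weights(population_counts, sample_counts):
    """Return w[(t, j)] = population count / sample count for stratum t, outcome j.

    This is the reciprocal of the ex ante inclusion probability, assuming
    uniform random draws within each (stratum, outcome) cell.
    """
    weights = {}
    for key, n_sample in sample_counts.items():
        if n_sample <= 0:
            raise ValueError(f"cell {key} has no sampled observations")
        weights[key] = population_counts[key] / n_sample
    return weights

# Hypothetical counts keyed by (stratum t, outcome j); not from the paper.
population = {(1, 1): 40, (1, 0): 9000, (2, 1): 10, (2, 0): 50000}
sample = {(1, 1): 40, (1, 0): 90, (2, 1): 10, (2, 0): 100}

w = stratum_weights(population, sample)

# Per-observation weight: look up the cell of each sampled dyad (z_i, y_i).
observations = [(1, 1), (1, 0), (2, 0)]
w_i = [w[(z, y)] for z, y in observations]
print(w_i)
```

The resulting per-observation weights can then be passed to any weighted logistic routine, which is the step the derivation above reduces to.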
A.3. Applying WESML to (Extended) Matched Samples
This approach can be extended to matched samples such as the one we have constructed following Jaffe et al. (1993). For a given cited patent, because the matched patent is drawn randomly from the year and technology class of an actual citing patent, we can interpret each {citing year, citing class} combination as a different stratum and calculate the implied sampling rates based on the sample and population counts for each stratum to determine appropriate weights.
However, the matched sample is not representative of the population because the {citing year, citing class} combinations for which no actual citations (“ones”) exist are ignored from the point of view of the potential citations (“zeroes”). To ensure the strata are mutually exclusive and exhaustive while still keeping their number manageable, we create (for each cited patent) a new observation by randomly selecting one potentially citing patent for each year (in the 10-year window) belonging to one of the technology classes from which no citation occurs (in that year). The weight for each of these is computed using the implied sampling rates for random draws from these subpopulations.
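The extra draws described above might be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function `draw_residual_controls`, the data layout, and the toy patent identifiers are all hypothetical, and the weight is taken to be the size of the eligible pool because exactly one patent is drawn from it.

```python
import random

def draw_residual_controls(patents_by_year, cited_classes_by_year, rng):
    """For each year, draw one potentially citing patent from the technology
    classes that have no actual citation to the cited patent in that year.

    patents_by_year: {year: [(patent_id, tech_class), ...]}
    cited_classes_by_year: {year: set of classes with an actual citation}
    Returns a list of (year, patent_id, weight), where weight is the number
    of eligible patents (the inverse of a one-in-pool sampling rate).
    """
    draws = []
    for year in sorted(patents_by_year):
        excluded = cited_classes_by_year.get(year, set())
        eligible = [pid for pid, cls in patents_by_year[year] if cls not in excluded]
        if eligible:
            draws.append((year, rng.choice(eligible), float(len(eligible))))
    return draws

# Hypothetical toy population; identifiers and classes are made up.
patents_by_year = {
    1981: [("P1", 299), ("P2", 100), ("P3", 200)],
    1982: [("P4", 299), ("P5", 100), ("P6", 299), ("P7", 300)],
}
cited_classes_by_year = {1982: {299}}   # class 299 cites the focal patent in 1982

controls = draw_residual_controls(patents_by_year, cited_classes_by_year, random.Random(0))
for year, patent_id, weight in controls:
    print(year, patent_id, weight)
```

In this toy setup the 1981 draw comes from all three patents, while the 1982 draw excludes class 299 because that class already forms its own finer stratum in that year.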
An example should clarify the sample construction. One of our cited patents is 4,205,881, applied for in 1980 and in tech class 299. It receives two citations during 10 years: from 4,441,761 {year 1982, class 299} and 4,953,915 {1989, 299}. Therefore, patent pairs (4,205,881, 4,441,761) and (4,205,881, 4,953,915) represent actual citations (“ones”) included with a weight of 1 (as we include all citations, i.e., set $\alpha = 1$). In JTH-based matching, citing patent 4,441,761 was matched to control patent 4,402,550 {year 1982, class 299}. In year 1982 and class 299, there were 92 potentially matching patents from which patent 4,402,550 was chosen through a random draw. So the observation (4,205,881, 4,402,550) was included as a control pair (“zero”) with a weight of 92. Similarly, citing patent 4,953,915 mentioned above was matched to control patent 4,974,907 {1989, 299}. In year 1989 and class 299, there were 59 potential matches from which 4,974,907 was chosen. So the observation (4,205,881, 4,974,907) was included as a control pair (“zero”) with a weight of 59. Finally, for each year 1981 through 1990, we selected a random potentially citing patent, constrained not to be from technology class 299 for the years 1982 and 1989 (as class 299 is already included in finer strata above just for these two years). The range of weights for these 10 observations ended up being between 61,578 and 99,371, depending on the number of eligible patents in the given citing year.
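The matched-pair weights in this example can be reproduced with a few lines of code. The sketch below uses only the patent numbers and pool sizes quoted above; the data layout and the helper `inverse_inclusion_weight` are assumptions made for illustration. The ten residual-stratum observations are omitted because their per-year pool sizes are not listed individually.

```python
CITED = "4,205,881"

# Actual citations ("ones"): all are kept, so each carries a weight of 1.
ones = [(CITED, "4,441,761"), (CITED, "4,953,915")]

# Matched controls ("zeroes"): one patent drawn at random from the pool of
# potential matches in the same {citing year, citing class} stratum.
matched_controls = [
    {"pair": (CITED, "4,402,550"), "year": 1982, "tech_class": 299, "pool_size": 92},
    {"pair": (CITED, "4,974,907"), "year": 1989, "tech_class": 299, "pool_size": 59},
]

def inverse_inclusion_weight(pool_size, n_drawn=1):
    """Hypothetical helper: reciprocal of the ex ante inclusion probability."""
    return pool_size / n_drawn

weights = {pair: 1.0 for pair in ones}
for control in matched_controls:
    weights[control["pair"]] = inverse_inclusion_weight(control["pool_size"])

for pair, weight in weights.items():
    print(pair, weight)
```

Each control pair thus stands in for every patent in its stratum that could have been drawn, which is why its weight equals the pool size.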
References
Agarwal R, Audretsch D, Sarkar MB (2007) The process of creative construction: Knowledge spillovers, entrepreneurship, and economic growth. Strategic Entrepreneurship J. 1(3–4):263–286.
Alcacer J, Gittelman M (2006) Patent citations as a measure of knowledge flows: The influence of examiner citations. Rev. Econom. Statist. 88(4):774–779.
Almeida P, Kogut B (1999) Localization of knowledge and the mobility of engineers in regional networks. Management Sci. 45(7):905–917.
Amemiya T (1985) Advanced Econometrics (Harvard University Press, Cambridge, MA).
Anderson JE, Wincoop EV (2003) Gravity with gravitas: A solution to the border puzzle. Amer. Econom. Rev. 93(1):170–192.
Arzaghi M, Henderson JV (2008) Networking off Madison Avenue. Rev. Econom. Stud. 75(4):1011–1038.
Audretsch D, Feldman M (1996) R&D spillovers and the geography of innovation and production. Amer. Econom. Rev. 86(3):630–640.
Belenzon S, Schankerman M (2013) Spreading the word: Geography, policy, and knowledge spillovers. Rev. Econom. Statist. 95(3):884–903.
Branstetter LG (2001) Are knowledge spillovers international or intranational in scope? J. Internat. Econom. 53(1):53–79.
Breschi S, Lissoni F (2001) Knowledge spillovers and local innovation systems: A critical survey. Indust. Corporate Change 10(4):975–1005.
Coe DT, Helpman E, Hoffmaister AW (2009) International R&D spillovers and institutions. Eur. Econom. Rev. 53(7):723–741.
Duguet E, MacGarvie M (2005) How well do patent citations measure knowledge spillovers? Evidence from French innovation surveys. Econom. Innovation New Tech. 14(5):375–393.
Ellison G, Glaeser E, Kerr W (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. Amer. Econom. Rev. 100(3):1195–1213.
Glaeser E, Kallal H, Scheinkman J, Shleifer A (1992) Growth in cities. J. Political Econom. 100(6):1126–1152.
Gompers P, Lerner J, Scharfstein D (2005) Entrepreneurial spawning: Public corporations and the genesis of new ventures, 1986 to 1999. J. Finance 60(2):517–614.
Greene WH (2003) Econometric Analysis, 5th ed. (Prentice Hall, Upper Saddle River, NJ).
Greene WH (2009) Testing hypotheses about interaction terms in nonlinear models. Econom. Lett. 107(2):291–296.
Grossman G, Helpman E (1991) Innovation and Growth in the Global Economy (MIT Press, Cambridge, MA).
Henderson R, Jaffe A, Trajtenberg M (2005) Patent citations and the geography of knowledge spillovers: A reassessment: Comment. Amer. Econom. Rev. 95(1):461–464.
Hillberry R, Hummels D (2003) Intranational home bias: Some explanations. Rev. Econom. Statist. 85(4):1089–1092.
Hillberry R, Hummels D (2008) Trade responses to geographic frictions: A decomposition using micro-data. Eur. Econom. Rev. 52(3):527–550.
Holmes TJ (1998) The effect of state policies on the location of manufacturing: Evidence from state borders. J. Political Econom. 106(4):667–705.
Imbens GW, Lancaster T (1996) Efficient estimation and stratified sampling. J. Econometrics 74(2):289–318.
Jacobs J (1969) The Economy of Cities (Random House, New York).
Jaffe AB (1989) Real effects of academic research. Amer. Econom. Rev. 79(5):957–970.
Jaffe AB, Trajtenberg M (2002) Patents, Citations and Innovations: A Window on the Knowledge Economy (MIT Press, Cambridge, MA).
Jaffe AB, Trajtenberg M, Henderson R (1993) Geographic localization of knowledge spillovers as evidenced by patent citations. Quart. J. Econom. 108(3):577–598.
Keller W (2002) Geographic localization of international technology diffusion. Amer. Econom. Rev. 92(1):120–142.
Kerr W, Kominers SD (2010) Agglomerative forces and cluster shapes. HBS Working Paper 11-061, Harvard Business School, Boston.
Klepper S, Sleeper S (2005) Entry by spin-offs. Management Sci. 51(8):1291–1306.
Krugman P (1991) Geography and Trade (Leuven University Press, Leuven, Belgium).
Lai R, D’Amour A, Fleming L (2009) The careers and co-authorship networks of U.S. patent holders since 1975. Working paper, Harvard Institute for Quantitative Social Science, Harvard Business School, Boston.
Lampe R (2012) Strategic citation. Rev. Econom. Statist. 94(1):320–333.
Manski CF, Lerman SR (1977) The estimation of choice probabilities from choice based samples. Econometrica 45(8):1977–1988.
Manski CF, McFadden D (1981) Alternative estimators and sample designs for discrete choice analysis. Manski C, McFadden D, eds. Structural Analysis of Discrete Data with Econometric Applications (MIT Press, Cambridge, MA), 1–50.
Marshall A (1920) Principles of Economics (Macmillan, London).
Marx M, Strumsky D, Fleming L (2009) Mobility, skills, and the Michigan non-compete experiment. Management Sci. 55(6):875–889.
McCallum J (1995) National borders matter: Canada-U.S. regional trade patterns. Amer. Econom. Rev. 85(3):615–623.
Peri G (2005) Determinants of knowledge flows and their effect on innovation. Rev. Econom. Statist. 87(2):308–322.
Romer PM (1990) Endogenous technological change. J. Political Econom. 98(5, Part 2):S71–S102.
Rosenthal S, Strange W (2001) The determinants of agglomeration. J. Urban Econom. 50(2):191–229.
Rosenthal S, Strange W (2003) Geography, industrial organization, and agglomeration. Rev. Econom. Statist. 85(2):377–393.
Rysman M, Simcoe T (2008) Patents and the performance of voluntary standard-setting organizations. Management Sci. 54(11):1920–1934.
Saxenian AL (1994) Regional Advantage: Culture and Competition in Silicon Valley and Route 128 (Harvard University Press, Cambridge, MA).
Singh J (2005) Collaborative networks as determinants of knowledge diffusion patterns. Management Sci. 51(5):756–770.
Singh J (2007) Asymmetry of knowledge spillovers between MNCs and host country firms. J. Internat. Bus. Stud. 38(5):764–786.
Singh J, Agrawal A (2011) Recruiting for ideas: How firms exploit the prior inventions of new hires. Management Sci. 57(1):129–150.
Sorenson O, Fleming L (2004) Science and the diffusion of knowledge. Res. Policy 33(10):1615–1634.
Thompson P (2006) Patent citations and the geography of knowledge spillovers: Evidence from inventor- and examiner-added citations. Rev. Econom. Statist. 88(2):383–389.
Thompson P, Fox-Kean M (2005) Patent citations and the geography of knowledge spillovers: A reassessment. Amer. Econom. Rev. 95(1):450–460.
Wolf HC (2000) Intra-national home bias in trade. Rev. Econom. Statist. 82(4):555–563.