Visual Analysis of Venture Similarity in Entrepreneurial...



This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT

Visual Analysis of Venture Similarity in Entrepreneurial Ecosystems

Rahul C. Basole, Senior Member, IEEE, Hyunwoo Park, and Raul O. Chao

Abstract: Entrepreneurial ecosystems are vital sources of innovation and critical engines for economic growth. In this study, we use text-based analysis, network visualizations, and topic modeling of nearly 60 000 venture business descriptions (i.e., how ventures present technologies and products to key stakeholders including customers, employees, and investors) to examine the structure of 35 global entrepreneurial ecosystems. Rather than using predefined industry classifications, we allow the structural configurations to emerge endogenously, revealing a more variegated perspective of venture strategic positioning in entrepreneurial ecosystems. Our study makes several important contributions. First, by examining strategic positioning statements of geographically defined ventures, we contribute to and advance our understanding of the geography of innovation and structure of entrepreneurial ecosystems. Our results indicate that there are wide differences in entrepreneurial ecosystem size, structure, composition, and venture strategic positioning. Second, methodologically, we use novel computational approaches and introduce visualization as a powerful means to understand entrepreneurial ecosystems. Third, our results show that ventures from widely different industries often use similar position statements, thus highlighting that ecosystems are indeed not just defined by industries, but also by strategic positioning. We conclude with theoretical and managerial implications.

Index Terms: Cluster analysis, entrepreneurial ecosystems, strategic positioning, text analytics, visualization.

I. INTRODUCTION

Entrepreneurial ecosystems have been a topic of great interest to both innovation and technology management scholars, as well as entrepreneurs, policymakers, and venture capitalists [1]–[4]. This interest is in part fueled by the fact that entrepreneurial ecosystems are not just vital sources of innovation but also critical engines for economic growth [5]. Arguably, the most prominent example of an entrepreneurial ecosystem is Silicon Valley. It is often touted as the “gold standard” ecosystem, creating many game-changing giants such as Intel, Google, Apple, and Facebook [6] and inspiring the emergence of many other ecosystems across the globe [7].

Manuscript received June 6, 2017; revised January 1, 2018, April 9, 2018, and June 5, 2018; accepted June 22, 2018. This research was supported in part by the Tennenbaum Institute, Georgia Tech, and by the Batten Institute, University of Virginia. Review of this manuscript was arranged by Department Editor N. Joglekar. (Corresponding author: Rahul C. Basole.)

R. C. Basole is with the College of Computing and the Institute for People and Technology, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]).

H. Park is with the Fisher College of Business, Ohio State University, Columbus, OH 43210 USA (e-mail: [email protected]).

R. O. Chao is with the Darden School of Business, University of Virginia, Charlottesville, VA 22903 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

This paper has supplemental downloadable multimedia material available at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TEM.2018.2855435

An entrepreneurial ecosystem can be defined as a set of interdependent organizations that engage in productive entrepreneurship within a geographically defined territory [8], [9]. Productive entrepreneurship refers to the outcome of ambitious entrepreneurs who pursue opportunities to create value-added products and services [10]. The ecosystem metaphor stresses that this value creation is generated not by a single organization but rather within a network of symbiotically interconnected organizations [11]. The spatial context suggests that these entrepreneurial activities typically occur in close geographic proximity, creating valuable agglomeration and network spillover effects [12].

Our understanding of entrepreneurial ecosystems is rooted in two well-established streams of research: 1) the regional development literature; and 2) the strategy literature [13]. The regional development literature has focused on explaining the differential socioeconomic performance of regions, examining concepts like industrial districts, regional industrial clusters, agglomerations, and regional systems of innovation [5], [14], [15]. The strategy literature, on the other hand, has focused on business ecosystems as a form of economic coordination in which firms collaborate and compete with each other to create and appropriate value [11], [16], [17]. While much has been learned about entrepreneurial ecosystems, research to date has predominantly focused on documenting the presence (or absence) of particular ecosystem components, such as access to venture capital funding, proximity to educational institutions, or agglomeration of industries [18], [19]. Existing work has not yet fully exploited insights into the structural nature of entrepreneurial ecosystems and the strategic positioning ventures assume [20], potentially missing important nuances of value configurations that may exist. Prior studies have indeed shown that entrepreneurial ecosystems are actually composed of a diverse set of industries, each providing unique complementary and competing value offerings [21], [22]. However, an industry-centric lens greatly simplifies the characterization of entrepreneurial ecosystems, often leading to geographic/industry stereotyping, such as biotech in Boston, fintech in London, or mobile in Singapore [23], [24].

In this study, we aim to extend our understanding of entrepreneurial ecosystems beyond this industry-centric lens by providing a more granular, structural view of ecosystem value configuration using a visual analytic approach. In contrast to prior work, we allow structural configurations to emerge endogenously from the business descriptions ventures use to describe their strategic positioning. The clusters that we uncover, thus, do not map to existing industry classifications, but provide a more nuanced perspective of entrepreneurial ecosystem activities. By examining the topological structure, we address the recent call to examine the relational organization of entrepreneurial ecosystems [25] and consider structure as an important construct of ecosystems [20]. We pursue this research objective by employing novel computational techniques to analyze publicly available unstructured data of nearly 60 000 ventures in 35 distinct global entrepreneurship ecosystems. Specifically, we derive ecosystem relationships between entrepreneurial ventures based on the similarity of textual content. Rather than just using a network analytic approach, we use a complementary visual lens to graphically depict the structure of entrepreneurial ecosystems, enabling us to detect structural patterns, clusters, and outliers [26]. In doing so, our visual analysis allows us to draw conclusions regarding differences between ventures in an ecosystem as well as differences between the ecosystems themselves.

0018-9391 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Our analysis indicates that the heterogeneity of venture positions within an ecosystem varies considerably depending on the entrepreneurial ecosystem characteristics. In fact, for ventures that exist in larger ecosystems, similarity in business descriptions converges to a global mean. Additionally, we find that entrepreneurial ecosystems located in emerging economies tend to be smaller and tend to have greater dispersion of similarity in terms of business descriptions. In other words, emerging entrepreneurial ecosystems tend to have ventures that position themselves either very similarly or very differently. The key implication of our study is that ventures are constantly balancing legitimacy and differentiation and that this balance becomes more salient as an entrepreneurial ecosystem grows in size.

Our study makes several contributions to the technology management, entrepreneurship, and strategy literature. First, we provide a robust and generalizable technique to describe global entrepreneurial ecosystems with a focus on both traditional measures such as size and industry concentration, as well as more novel measures that capture strategic position and organizational identity [27], [28] relative to other ventures. Positioning and organizational identity are critical as they embody how a venture identifies itself to stakeholders including customers, employees, and investors. Second, by examining strategic positioning statements of geographically defined ventures, we contribute to and advance our understanding of the geography of innovation, the structure of entrepreneurial ecosystems, and the widely debated tradeoff between conformity and differentiation [29], [30]. Third, methodologically, we use novel computational approaches and introduce visualization as a powerful means to understand entrepreneurial ecosystems. In doing so, we address the call to bring an important new methodology to the field of innovation and entrepreneurship [31]. Our approach can be easily applied to other business ecosystem contexts and thus serves as an important foundation for subsequent research. Finally, our results show that ventures from widely different industries often use similar business descriptions, thus highlighting that entrepreneurial ecosystems are indeed not just defined by industries but also by strategic positioning.

The remainder of the paper is structured as follows. We first provide a brief review of related work on entrepreneurial ecosystems and strategic positioning. Next, we describe our methodology, including our extensive data extraction and curation process, our data mining and analysis approach, and visualization of our global entrepreneurial ecosystems. We then present and discuss our results. We conclude with implications, limitations, and future research.

II. RELATED WORK

Our study of venture similarity in entrepreneurial ecosystems demands a review of related work on industrial clusters, regional innovation systems, and business ecosystems as well as strategic positioning of ventures.

A. From Clusters and Regional Innovation Systems to Entrepreneurial Ecosystems

Understanding the structure and dynamics of spatial agglomeration of industrial and economic activities has been a topic of great interest to scholars for many decades. Commencing with Marshall’s pioneering analysis of industrial concentrations in Victorian England leading to “agglomeration economies” [32], subsequent studies have found that firms accrue many advantages from spatial co-location with similar firms, in particular the development of specialized pools of human capital, specialist suppliers, and specialist infrastructure. The Marshallian view is often contrasted with the Jacobian externality perspective [33], which argued that spatial agglomerations of unrelated industries can lead to knowledge spillovers, in which ideas from one industry can be applied in another. However, the evidence remains inconclusive as to whether Marshallian specialization or Jacobian diversification most favors regional innovation [34].

Porter’s seminal work on geographical clusters [35] argued that firms benefited from both ideas, namely local sectoral specialization and knowledge spillovers. However, Saxenian’s [36] groundbreaking study of Route 128 and Silicon Valley showed that different clusters in fact operate in fundamentally different ways, underlining that potentially distinct structure and dynamics are at play. Advocated by Porter and colleagues, the “cluster” idea continued to gain traction in the regional development and economic geography literature, often used as the “holy grail” policy concept [37]–[39].

In parallel, innovation scholars developed the concept of innovation systems to understand the systemic processes that characterize localized knowledge creation and transfer [40], [41]. A key focus of this approach is its emphasis on the relational characteristics between different stakeholders and how they influence the innovation process [42], [43]. The applicability of this lens led to a significant growth in economic geography studies examining a variety of regional innovation systems.

The concept of entrepreneurial ecosystems is a direct result of the prior work on geographical clusters and regional innovation systems. Similar to the other two fields, firms play a central role in entrepreneurial ecosystems. Drawing on the biological metaphor [16], entrepreneurial ecosystems are shaped by complex interactions and interdependencies between entities and are constantly evolving. While introduced by Moore and predominantly used in the strategy domain, the concept of entrepreneurial ecosystems was popularized in the entrepreneurship literature by the pioneering work of Isenberg [44]. Similar to the study of geographical clusters and innovation systems, an entrepreneurial ecosystem adapts the idea of spatial boundedness of economic activities. However, in contrast to the prior work, the emphasis in existing entrepreneurial ecosystems is increasingly on compositional and relational aspects. Despite this interest, the literature still lacks a comprehensive analysis of the structural underpinnings of entrepreneurial ecosystems [25].

B. Strategic Positioning

In order to succeed, ventures must develop appropriately positioned business models. A successful model captures how a venture will make money and sustain profits, how it organizes itself, what core value propositions it provides, and how it will align itself in the market relative to other stakeholders [45]. Strategic positioning is, thus, a central idea in the business model construct.

The literature on strategic positioning is primarily rooted in two domains: strategic management and marketing. The majority of studies focus on developing a theoretical framework or performing cluster analysis to describe the strategic positioning options available to firms. Almost without exception, the studies define strategic positions a priori, forcing an exogenous structure upon the companies being analyzed. An inherent problem with this approach is that the conventional wisdom of the domains themselves (i.e., strategy and marketing) tends to dominate the structures that emerge based on the view of the researcher. In addition, the few empirical studies that exist employ relatively small-scale proprietary datasets or individual case studies. This practice limits the possibility of broader empirical examination, both on a large scale within an ecosystem, as well as between ecosystems themselves.

Early work on strategic positioning began with Miles et al. [46], who studied alternative ways that organizations define their product/market domains. At the same time, Mintzberg [47] put forth the concept of “intended strategies,” which can be interpreted as positioning statements in that they are plans made in advance of specific decisions. Porter [48] was the first to define clear strategic positioning choices when he discussed firm strategies as either narrow or broad in scope and low-cost or differentiated in terms of core capability. The result was strategic positioning based on three possible choices: overall cost leadership, cost focus, and differentiation. Markides [49], [50] adopted a slightly different perspective on strategic positioning, defining answers to questions about “who, what, how” the firm operates. All of the studies mentioned above are frameworks created based on theory or a limited set of individual case analyses. Deephouse [51] was among the first to develop empirical tests of strategic positioning. He studied the balance between differentiation, which reduces competition, and conformity, which provides legitimacy. Our study generalizes and expands this work as we develop a “similarity metric” between firms based on their strategic position.

As a consequence of the early work in strategy, marketing scholars took up the issue of competitive position with a particular focus on market positioning strategies. Hooley et al. [52] described how competitive positioning links the internal capabilities of the firm to external market segments. Blankson and Kalafatis [53], [54] created a typology of marketing positioning strategies. Similar to the early work in strategy, these papers are frameworks that presuppose a specific set of strategic positions. Later work in marketing conducted empirical analysis to study positioning. Kalafatis et al. [55] analyzed the elements that make up a strategic marketing position. Hooley and Greenley [56] analyzed when positioning can deliver a sustained competitive advantage. Both of these articles (and others that followed) used cluster analysis with primary survey data. The problem with this approach is that it forces structure upon the firms in question, rather than allowing that structure to emerge endogenously.

There are two important limitations with the existing work on strategic positioning. The first is that the frameworks or clusters that define groups are chosen a priori by the researcher. This necessarily forces an exogenous structure on the data. Second, if and when empirical analysis is brought to bear, it is done with relatively small proprietary datasets or individual case studies. This limits the generalization and breadth of the empirical results. Our computational approach solves both of these problems as we do not presuppose or force any structure on the data; rather we allow groups and strategic positioning statements to emerge endogenously based on a computational analysis of the similarity between firms in an ecosystem. In addition, we do this with a dataset of nearly 60 000 ventures across 35 ecosystems around the world. The scale of this analysis provides rich opportunity for understanding strategic positioning within an ecosystem as well as, and perhaps more importantly, across ecosystems.

III. METHODOLOGY

Our study uses a three-phase approach (see Fig. 1) for analyzing and visualizing strategic positioning of ventures in global entrepreneurial ecosystems, consisting of data extraction and curation, data mining and analysis, and visualization. Our approach builds on the well-established data-to-knowledge “human-in-the-loop” model by carefully balancing data management, visual encodings, and sensemaking [57]. We elaborate on each of these steps, as well as potential alternatives, in the sections that follow.

A. Data Extraction and Curation

Our study uses Crunchbase,¹ a wiki-style curated open source directory of more than 100 000 global technology companies, people, and investors, as the primary dataset. While other data sources exist (e.g., Thomson VentureXpert, Owler, AngelList, CB Insights), Crunchbase arguably provides the most detailed, up-to-date, and open data on companies, including company description, founding year, names of executive teams, funding rounds and amounts, office locations (city, state) and geographic regions, industry segments, and number of employees. Crunchbase is a community-driven dataset; contributors include executives, entrepreneurs, and investors, who all actively contribute to company profile pages. While data curation is socially driven, the quality and coverage of the data are monitored and updated continuously through several means.² First, it is edited and managed by executives and investors associated with the venture. Second, machine learning algorithms are used to compare data for accuracy and anomalies against other publicly-available information (including corporate websites, analyst reports, and TechCrunch). Lastly, Crunchbase employs a team of global experts and data analysts, who provide manual data validation and curation.³

¹ http://www.crunchbase.com

Fig. 1. Methodology.

We used the Crunchbase application programming interface (API) to extract a complete data dump of 58 880 companies worldwide.⁴ It is important to note a few data nuances. First, the data includes a geographic region field, which captures the broader geographic area rather than traditional city boundaries. For example, San Francisco, San Jose, and Palo Alto, CA, USA, are all part of the San Francisco Bay Area. Second, business categories are self-reported and curated using a crowdsourcing approach. Each company contains multiple business categories. Third, the business description field contains rich textual, unstructured data about the organization.

Given our interest in understanding venture positioning in global entrepreneurial ecosystems, our aim was to constrain our analysis to a select set of geographically-defined regions with particularly active levels of entrepreneurship. We based our selection on an assessment of well-represented regions in the Crunchbase dataset as well as the most prominent startup regions identified in established industry reports (e.g., Compass Report⁵ and the 2017 Startup Genome Report⁶). We intentionally did not limit ourselves to North American and European regions, but also included prominent entrepreneurial ecosystems in South America, the Middle East, Asia, and Australia for a truly global comparison. Our final region list contains a diverse, global set of 35 established and emerging entrepreneurial ecosystems, shown in Table I. The list contains 11 U.S. ecosystems and 24 non-U.S. ecosystems. We corroborated the validity and coverage of this list with venture capital investors and entrepreneurs.

² We acknowledge that the Asian entrepreneurial venture market is potentially underrepresented due to both language and cultural barriers.

³ To assess the veracity of business descriptions, we compared the business description provided by Crunchbase with actual corporate website descriptions for 50 randomly selected firms (see Appendix for details).

⁴ Extracted on November 1, 2015.

⁵ http://startup-ecosystem.compass.co/ser2015/

⁶ https://startupgenome.com/

With a focus on these 35 global entrepreneurial ecosystems, our sample was reduced to 30 081 companies. We dropped companies with no business description and/or business categories. Our final sample ultimately consisted of 24 068 operating ventures in 35 regions. Table I provides descriptive statistics of our sample by entrepreneurial ecosystem.
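The two filtering steps described above (restricting to the selected regions, then dropping records without a description or categories) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the field names `region`, `description`, and `categories`, and the example records are hypothetical placeholders rather than the actual Crunchbase schema.

```python
# Hypothetical records; field names are assumptions, not the Crunchbase schema.
companies = [
    {"name": "A", "region": "SF Bay Area", "description": "Cloud analytics platform", "categories": ["Analytics"]},
    {"name": "B", "region": "SF Bay Area", "description": "", "categories": ["Mobile"]},
    {"name": "C", "region": "Boston", "description": "Genomics tools", "categories": []},
    {"name": "D", "region": "Elsewhere", "description": "Food delivery", "categories": ["Food"]},
    {"name": "E", "region": "Boston", "description": "  ", "categories": ["Biotech"]},
    {"name": "F", "region": "Boston", "description": "Drug discovery software", "categories": ["Biotech", "Software"]},
]

# Illustrative stand-in for the list of 35 selected ecosystems.
SELECTED_REGIONS = {"SF Bay Area", "Boston"}


def filter_sample(records, regions):
    """Return (records in selected regions, records that are also usable)."""
    # Step 1: keep only companies located in one of the selected regions.
    in_region = [r for r in records if r.get("region") in regions]
    # Step 2: drop companies lacking a non-blank description or any category.
    usable = [
        r for r in in_region
        if (r.get("description") or "").strip() and r.get("categories")
    ]
    return in_region, usable


in_region, usable = filter_sample(companies, SELECTED_REGIONS)
print(len(companies), len(in_region), len(usable))
```

Keeping the two steps separate makes it easy to report the intermediate and final sample sizes, as the text does.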

B. Data Mining and Analysis

1) Text Mining of Venture Position Statements: The company description field in Crunchbase contains information on a venture’s activities as well as the particular value it provides to various stakeholders. More broadly, a company description can be considered a proxy for a venture’s position statement as outlined earlier. Company descriptions are provided in natural language form. In order to convert this data into useful information, text analytic methods must be applied.

All text analytic methods convert natural language text blocks into a set of words. Following convention, we first remove all stop words. Since not all words in a position statement carry the same level of importance, we apply a weighting scheme to the set of extracted words. With regard to the distinguishing power of a term, we face a tradeoff in term frequency. The more frequently a term appears in a statement, the more likely the term characterizes some piece of information for that statement. However, if the term appears across all statements, it loses its distinguishing power. In order to balance term frequency within and across statements, we utilize term frequency-inverse document frequency (TF-IDF), a well-established weighting method, which normalizes term frequency by the rarity of the term across all documents. We use the gensim Python library for the implementation of the algorithm.

TABLE I. DESCRIPTIVE STATISTICS

More formally, we compute

$$w_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_{t,D} = \mathrm{freq}_{t,d} \times \log_2 \frac{|D|}{n_t} \tag{1}$$

where $t$ is the focal term, $d$ is the focal document, and $D$ is the total set of documents; $n_t$ is the number of documents that contain term $t$. Here, $\mathrm{tf}_{t,d}$ stands for “term frequency of $t$ in $d$” and $\mathrm{idf}_{t,D}$ stands for “inverse document frequency of $t$ in $D$.” According to this equation, the term frequency is computed as the raw count of the term $t$ in the document $d$, and the inverse document frequency is the logarithm of the inverse of the fraction of documents that contain the term $t$. Thus, this weight combines a local measure (i.e., term frequency) and a global measure (i.e., inverse document frequency).⁷
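As a minimal sketch of the weighting in Eq. (1), the following pure-Python example computes raw-count term frequency times the base-2 logarithm of $|D|/n_t$. It is not the gensim-based implementation used in the study; the tokenizer, the abbreviated stop-word list, and the three sample descriptions are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def tfidf_vectors(descriptions):
    """Return one {term: weight} dict per description, following Eq. (1)."""
    docs = [Counter(tokenize(text)) for text in descriptions]
    n_docs = len(docs)  # |D|
    # n_t: number of documents that contain term t
    doc_freq = Counter(term for doc in docs for term in doc)
    return [
        {t: freq * math.log2(n_docs / doc_freq[t]) for t, freq in doc.items()}
        for doc in docs
    ]


# Made-up descriptions, for illustration only.
descriptions = [
    "Cloud backup and storage for enterprises",
    "Mobile payments platform for retailers",
    "Secure cloud storage platform",
]
weights = tfidf_vectors(descriptions)
```

In this toy corpus, "backup" occurs in one of three descriptions and so receives a higher weight than "cloud", which occurs in two; a term present in every description would receive a weight of zero.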

2) Computation of Venture Similarities: After converting the textual content into weighted vectors, we can quantitatively compare position statements between any two ventures in an entrepreneurial ecosystem using the cosine similarity measure as follows:

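$$\cos(\mathbf{w}_p, \mathbf{w}_q) = \frac{\mathbf{w}_p \cdot \mathbf{w}_q}{\lVert \mathbf{w}_p \rVert \, \lVert \mathbf{w}_q \rVert} \tag{2}$$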

where $\mathbf{w}_p$ and $\mathbf{w}_q$ are the normalized weighted vectors of startups $p$ and $q$, respectively.⁸ Fig. 2 provides an illustrative example of this computation. We perform this similarity computation for each pair of companies within each of the 35 entrepreneurial ecosystems. Since the number of pairs [$N(N-1)/2$] grows quadratically with the number of ventures ($N$), we used high-performance computing infrastructure to accelerate the process.

⁷ There are a few alternative operationalizations of term frequency and inverse document frequency. Interested readers may find more information about the alternatives at https://en.wikipedia.org/wiki/Tf%E2%80%93idf

⁸ There are alternative similarity measures to cosine similarity, such as the Jaccard index. The Jaccard index is designed to compute similarity between sets, while cosine similarity is widely used for computing similarity between vectors with continuous values. We adopt cosine similarity because it has been widely used in natural language processing and information retrieval.
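The cosine similarity of Eq. (2) and the all-pairs computation can be sketched as follows. This is an illustrative pure-Python version operating on sparse term-weight dictionaries, not the high-performance implementation used in the study; the venture names and weights are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(wp, wq):
    """Cosine similarity of two sparse {term: weight} vectors, as in Eq. (2)."""
    dot = sum(weight * wq[term] for term, weight in wp.items() if term in wq)
    norm_p = math.sqrt(sum(v * v for v in wp.values()))
    norm_q = math.sqrt(sum(v * v for v in wq.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector shares no weighted terms with anything
    return dot / (norm_p * norm_q)


def pairwise_similarities(vectors):
    """Similarity for each of the N(N - 1)/2 unordered venture pairs."""
    return {
        (p, q): cosine_similarity(vectors[p], vectors[q])
        for p, q in combinations(sorted(vectors), 2)
    }


# Hypothetical ventures with made-up TF-IDF weights.
vectors = {
    "venture_a": {"cloud": 1.0, "storage": 1.0},
    "venture_b": {"cloud": 1.0, "security": 1.0},
    "venture_c": {"wind": 2.0, "energy": 1.0},
}
similarities = pairwise_similarities(vectors)
```

Because only shared terms contribute to the dot product, ventures with no vocabulary in common score exactly zero, which is consistent with the strongly skewed similarity distribution reported for the ecosystems.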

It should be noted that there are several ways of conceptualizing similarity between ventures, including structural, funding, leadership, or performance similarity. Our aim in this paper was not to develop a comprehensive venture similarity vector, but instead to focus on the similarity in venture descriptions.

3) Construction of Venture Similarity Network: The venture (position statement) similarity network consists of nodes (representing ventures) and links (drawn between two nodes if there is a similarity between two ventures). This resulted in an almost fully connected network in all global ecosystems. An examination of pairwise similarities, however, revealed that similarities ranged from 0% to 100% and exhibited a distribution highly skewed to the right. This suggests that more than 90% of venture pairs are in fact not significantly similar to each other. To reduce the density and generate a more manageable network for subsequent analysis and visualization, we used a conservative link similarity threshold of 15% to eliminate insignificant links between ventures. Our choice of this similarity threshold was grounded in prior work in text mining (e.g., [58] and [59]) and in experimentation and expert interpretation with different threshold levels.
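The thresholding step can be illustrated with a short sketch. It assumes pairwise similarities are available as a dictionary keyed by venture pairs and that a link is kept when the similarity is greater than or equal to the threshold; the function name, data layout, and sample values are hypothetical.

```python
def build_similarity_network(similarities, threshold=0.15):
    """Keep only links whose similarity meets the threshold.

    `similarities` maps (venture_p, venture_q) pairs to a score in [0, 1].
    Returns an adjacency dict {venture: {neighbor: similarity}}; ventures
    that lose all of their links remain in the dict as isolated nodes.
    """
    network = {}
    for (p, q), score in similarities.items():
        network.setdefault(p, {})
        network.setdefault(q, {})
        if score >= threshold:
            network[p][q] = score
            network[q][p] = score
    return network


# Made-up similarity scores, for illustration only.
similarities = {("a", "b"): 0.42, ("a", "c"): 0.03, ("b", "c"): 0.15}
network = build_similarity_network(similarities)
link_count = sum(len(neighbors) for neighbors in network.values()) // 2
```

Keeping the similarity as a link attribute preserves the information needed later for weighted clustering and for scaling link thickness in the visualization.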

4) Modularity-Based Cluster Analysis: With each entrepreneurial ecosystem represented as a network, many different structural characteristics can be computed to build an understanding of the dynamics of the ecosystem. As we are interested in the macroscopic composition of venture positioning in each entrepreneurial ecosystem, the detection of clusters and subcommunities is of importance [60], [61]. One prominent approach to identifying clusters is to compute the modularity of the network [62]. Modularity measures the strength of division of a network into modules (or groups, clusters, or communities) [63]. In other words, it enables us to determine whether there are structurally induced communities in the ecosystem, rather than relying on traditional groupings such as industry category.

Fig. 2. Venture similarity computation example.

Given a partition of a network, modularity is computed as the difference between the fraction of links that connect nodes in the same group and the expected fraction if links were generated at random [63]. Modularity is, thus, maximized when a proposed partition groups together nodes that are densely connected to each other and separates nodes that are not frequently linked. Formally, following [64], modularity for a given ecosystem is defined as follows:

$$M = \sum_{i=1}^{|C|} \left( \frac{l_i}{E} - \left( \frac{e_i}{E} \right)^2 \right) \tag{3}$$

where $|C|$ is the total number of identified clusters, $l_i$ is the number of links within the $i$th cluster, $e_i$ is the number of all links (local and bridging) connecting to ventures in the $i$th cluster, and $E$ is the total number of ecosystem connections.
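To make Eq. (3) concrete, the sketch below evaluates it for a given partition of an unweighted network. It follows the equation as printed and reads $e_i$ as the number of links with at least one endpoint in cluster $i$, which is an interpretive assumption. The widely used Newman-Girvan form instead divides the summed node degrees of a cluster by $2E$, so values from this sketch need not match those reported by standard graph libraries. The example network is hypothetical.

```python
def modularity(edges, cluster_of):
    """Evaluate Eq. (3) for an undirected, unweighted network.

    `edges` is a list of (u, v) links; `cluster_of` maps each node to a label.
    Assumption: e_i counts every link with at least one endpoint in cluster i.
    """
    total = len(edges)  # E
    if total == 0:
        return 0.0
    within = {}    # l_i: links with both endpoints in cluster i
    touching = {}  # e_i: links with at least one endpoint in cluster i
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        for c in {cu, cv}:
            touching[c] = touching.get(c, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(
        within.get(c, 0) / total - (touching[c] / total) ** 2 for c in touching
    )


# Hypothetical network: two triangles joined by a single bridging link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

m_split = modularity(edges, two_clusters)
m_single = modularity(edges, one_cluster)
```

Splitting the two triangles into separate clusters yields a positive score, whereas placing all nodes in a single cluster yields zero, which matches the intuition that modularity rewards partitions with dense internal and sparse external connections.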

While several different modularity algorithms exist, we chose the Louvain modularity-based clustering algorithm [62] to identify the position statement community structure of each entrepreneurial ecosystem because of its ability to detect communities efficiently in large graphs and its implementation in our visualization software.

5) Topic Modeling: While the computation of the clusters provides an understanding of the overall community structure of an ecosystem, it provides little insight into what characterizes a cluster. To provide textual labels for each cluster, we again leverage the position statements of ventures within a given cluster and use topic modeling techniques to provide human-interpretable labels.

The weights of the keywords that best describe an ecosystem cluster are again computed using the TF-IDF method. The one adjustment we make for cluster labeling is that we treat the company descriptions of the entire set of ventures in a cluster, rather than an individual company description, as a document. More formally, we compute

$$w_{t,c} = \mathrm{tf}_{t,c} \times \mathrm{idf}_{t,C} = \mathrm{freq}_{t,c} \times \log_2 \frac{|C|}{n_t} \tag{4}$$

where $t$ and $c$ are the focal term and the focal cluster, respectively; $|C|$ is again the total number of identified clusters and $n_t$ is the number of clusters that contain term $t$. We give a higher weight to terms that appear frequently in a given cluster, but the weight is lowered if the term appears in many other clusters as well. After assigning a weight to each term for a given cluster, we extract the top five keywords based on the TF-IDF weight. Together, the five keywords describe a particular cluster in a given ecosystem.
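The cluster labeling of Eq. (4) can be sketched as follows, treating the pooled descriptions of each cluster as one document. The function name, the alphabetical tie-breaking rule, and the sample tokens are assumptions made for illustration; the input is assumed to be tokenized with stop words already removed.

```python
import math
from collections import Counter


def cluster_labels(cluster_tokens, top_k=5):
    """Top-k TF-IDF keywords per cluster, treating each cluster as one document.

    `cluster_tokens` maps a cluster id to the pooled, already tokenized
    descriptions of all ventures in that cluster.
    """
    counts = {c: Counter(tokens) for c, tokens in cluster_tokens.items()}
    n_clusters = len(counts)  # |C|
    # n_t: number of clusters whose pooled text contains term t
    cluster_freq = Counter(t for terms in counts.values() for t in terms)
    labels = {}
    for c, terms in counts.items():
        weights = {
            t: freq * math.log2(n_clusters / cluster_freq[t])
            for t, freq in terms.items()
        }
        # Sort by weight (descending), breaking ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        labels[c] = ranked[:top_k]
    return labels


# Made-up pooled tokens for two hypothetical clusters.
cluster_tokens = {
    "storage": ["backup", "storage", "security", "storage", "platform"],
    "energy": ["wind", "energy", "clean", "energy", "platform"],
}
labels = cluster_labels(cluster_tokens, top_k=2)
```

Here "platform" appears in both clusters, so its weight is zero and it never surfaces as a label, while the repeated, cluster-specific terms rank first.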

6) Metrics: In addition to modularity, we compute a number of other well-established network statistics for each entrepreneurial ecosystem, also shown in Table I. The first measure, nodes, is a count of all the ventures in a given ecosystem. The next two metrics provide the number of nodes and links in the largest (main) component of the ecosystem, respectively. A component is a connected set of nodes in a network; the main component is the largest such component in the ecosystem network. All ratio columns in the table give the proportion of nodes or links relative to the preceding column.
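The main-component metrics can be sketched with a breadth-first search over an adjacency dictionary. The function names and the returned field names are hypothetical, and the ratios are computed on the assumption that they compare main-component counts with whole-ecosystem counts.

```python
from collections import deque


def connected_components(network):
    """Return the components of an adjacency dict, largest first."""
    seen = set()
    components = []
    for start in network:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in network[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def main_component_stats(network):
    """Node and link counts of the largest component (needs at least one node)."""
    main = connected_components(network)[0]
    total_links = sum(len(nbrs) for nbrs in network.values()) // 2
    main_links = sum(len(network[n]) for n in main) // 2
    return {
        "nodes": len(network),
        "main_nodes": len(main),
        "node_ratio": len(main) / len(network),
        "main_links": main_links,
        "link_ratio": main_links / total_links if total_links else 0.0,
    }


# Hypothetical ecosystem: one three-venture component, one pair, one isolate.
network = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
stats = main_component_stats(network)
```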

We also computed a set of commonly used entrepreneurial ecosystem measures, including the average number of funding rounds, the average amount of funding (in $ million), the average amount of funding per round (in $ million), and the number of unique industries (a count of the unique number of categories associated with company profiles in each ecosystem). We also computed industry diversity using a modified Blau Index [65] as follows:

$$\text{Blau Index} = 1 - \mathrm{HHI} = 1 - \sum_{j} p_j^2 \tag{5}$$

where $p_j$ is the proportion of category $j$ in the total population of the ecosystem. We use the top 20 categories to compute sensible values for the Blau index. As a robustness check, we also compute the Blau index using the top 10 categories and all categories. The pairwise correlations among the three measures are all above 0.85. Categories are weighted as $1/n_i$ for company $i$ when it has $n_i$ categories.

Fig. 3. Visualization of strategic positioning in the Silicon Valley ecosystem. Nodes represent operating ventures; links represent similarities. Nodes are color encoded by subcommunity.
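A minimal sketch of the diversity measure in Eq. (5) with the $1/n_i$ category weighting is shown below. How the proportions are normalized after truncating to the top categories is not stated in the text, so the renormalization over the retained categories is an assumption, as are the function names and the sample companies.

```python
from collections import defaultdict


def category_shares(company_categories, top_n=None):
    """Weighted category proportions; company i gives 1/n_i to each category.

    Assumes at least one company has at least one category.
    """
    totals = defaultdict(float)
    for categories in company_categories:
        unique = set(categories)
        for category in unique:
            totals[category] += 1.0 / len(unique)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    if top_n is not None:
        ranked = ranked[:top_n]
    mass = sum(weight for _, weight in ranked)
    # Assumption: shares are renormalized over the retained categories.
    return {category: weight / mass for category, weight in ranked}


def blau_index(shares):
    """Eq. (5): one minus the sum of squared category proportions."""
    return 1.0 - sum(p * p for p in shares.values())


# Made-up category tags for four hypothetical companies.
companies = [
    ["software"],
    ["software", "mobile"],
    ["biotechnology"],
    ["software", "analytics"],
]
diversity_all = blau_index(category_shares(companies))
diversity_top2 = blau_index(category_shares(companies, top_n=2))
```

An ecosystem dominated by one category drives the index toward zero, which is the pattern the low diversity scores of biotechnology-heavy ecosystems reflect.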

Table I also presents the most frequently occurring industry in each ecosystem and the proportion of that category in the ecosystem. When computing the top industry, we apply the same category weights.

The mean similarity column shows the mean of the pairwise similarities among the ventures in the ecosystem as a percentage.

C. Visualization

Visual representations are a fundamental component of human learning and understanding [66]. They enable us not only to communicate information or facts but also to create, assess, and transfer insights, experiences, expectations, and perspectives. There is no algorithmic approach to choosing a single best visual representation for a given dataset; instead, there are many different representations available, each useful under different conditions and with its own advantages and limits [57], [67]. The choice is generally guided by the nature of the underlying data and the questions that are being asked. Given that startup position statement similarity and community are key issues of our inquiry, visual representations that can depict interconnectivity, positions, and clusters are particularly suitable [68].

One prominent approach is to use force-directed network layouts, which arrange nodes based on laws of attraction and repulsion from classical physics [69]. The use of a force-based layout is particularly appealing when the motivating issue is to identify central or prominent nodes, peripheral actors, and clusters. While there are many variations (e.g., Force-Atlas, Kamada-Kawai, Yifan Hu, Fruchterman-Reingold [70]), we ultimately chose the OpenORD graph layout [71] implemented in Gephi [72]. The advantage of the OpenORD algorithm is that it allows specification of five distinct phases that enable cluster-differentiating layouts. As some nodes can overlap with each other, we also applied the noverlap algorithm to minimize occlusion.

To make visualizations readable, effective, and memorable, appropriate visual encodings must be selected. Visual encoding refers to the selection of graphical properties of data marks for the network primitives. There are many different ways to visually encode nodes and links, and the choice often depends on the underlying data type and tasks [73]. Nodes are depicted as circles, colored by cluster membership, and sized in proportion to their prominence (as measured by closeness centrality). We use curved rather than straight links to reduce clutter caused by link crossings; the thickness of a link is proportional to the similarity of the two ventures, and its color is interpolated from the colors of the two nodes it connects.

Fig. 4. London ecosystem with a zoom-in on a cluster of ventures from different industries.
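The node-size encoding can be illustrated with a sketch that computes closeness centrality by breadth-first search and maps it linearly to a marker size. It uses the common definition (number of reachable nodes divided by the sum of shortest-path lengths) on an unweighted, connected network; the visualization software used in the study may normalize differently, and the size range is a hypothetical choice.

```python
from collections import deque


def closeness_centrality(network):
    """Closeness of each node in a connected, unweighted adjacency dict."""
    scores = {}
    for source in network:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in network[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total else 0.0
    return scores


def node_sizes(scores, min_size=4.0, max_size=40.0):
    """Linearly map centrality scores to marker sizes (hypothetical range)."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {
        node: min_size + (max_size - min_size) * ((s - low) / span if span else 0.5)
        for node, s in scores.items()
    }


# Hypothetical three-venture chain: "b" sits between "a" and "c".
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = closeness_centrality(network)
sizes = node_sizes(scores)
```

In the chain, the middle venture is closest to all others and therefore receives the largest marker.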

IV. RESULTS AND DISCUSSION

Table I shows a summary of metrics computed for all entrepreneurial ecosystems. The measures highlight that there are wide differences in the size, structure, diversity, and positioning of global entrepreneurial ecosystems. Ecosystems range in size (in terms of number of operating ventures) from the low hundreds (Sao Paulo, Brazil, with 114 ventures) to the thousands (Silicon Valley, USA, with 6 297 ventures). The average ecosystem size is 687, but it is highly skewed by the existence of a few very large ecosystems, as indicated by the high variance. Not surprisingly, some of the largest entrepreneurial ecosystems are based in North America, including New York City, Boston, and Los Angeles. The largest non-U.S. entrepreneurial ecosystem is located in London (England), followed by Toronto (Canada), Tel Aviv (Israel), and Beijing (China).

The top tag-based industries across these entrepreneurial ecosystems include software (21), followed by e-commerce (6), biotechnology (5), and mobile (3). Interestingly, in U.S. entrepreneurial ecosystems where biotechnology is the top industry, it has a disproportionate presence in comparison to other industries and the overall industry diversity is low. For instance, in San Diego, biotechnology represents 25.06% of firms (compared to a global average of 10.86%) and industry diversity is significantly below average at 0.8326 (global average of 0.9120). This same pattern holds true for Boston and Raleigh.

Given the scale of our venture similarity networks, we followed scholarly convention and focused on the main component of each ecosystem for our core analyses and visualizations. The main component captures the largest group of ventures that are linked by direct or indirect ties (one venture is similar to another, which is similar to a third). In this component, all ventures are linked to each other in some way, and any one can be related to any other via a similarity pathway. As our results show, main component sizes range from very small (Munich) to very large (Silicon Valley). Indeed, the ratio of nodes in main components to total nodes ranges from 4.76% to 80.23%, with a global average of 30.64%. We observe very similar proportions when examining links in the overall ecosystem as well as the main component. While we apply 15% as the cutoff threshold for similarities among ventures, a different cutoff threshold will likely change the size of the main component in each ecosystem. The higher the threshold, the more fragmented the ecosystem network and the smaller the main component will be.

Fig. 5. Visualization of strategic positioning in the Boston ecosystem with labeled clusters.

The number of clusters, reflecting the number of strategic position groups in an entrepreneurial ecosystem, ranges from 2 to 30, with a global average of 12.30. The number of unique industries ranges from 158 (in Mumbai) to 723 (in Silicon Valley). Both of these measures are highly positively correlated with ecosystem size.

The average number of funding rounds ranges from 1.63 to 1.93, with a global mean of 1.78. The average funding amount ranges from $8.04 million (in Dublin) to $21.49 million (in Washington DC). The average funding amount per round ranges from $3.26 million (in Munich) to $13.61 million (in Washington DC).

While the funding figures for DC are disproportionately high, and potentially an outlier, they are most likely explained by the resource-intensive defense and security industry that is present in this ecosystem. Indeed, in the visual representation of the DC ecosystem, the most central clusters include security and defense.

While visual representations of business ecosystems have been increasing in recent years [69], [74], [75], most studies focus on relatively well-established interfirm relationship types between stakeholders (e.g., partnerships, investments, customer/supplier, people placements). What is much rarer is to derive ecosystem relationships between entrepreneurial ventures based on similarity of textual content. Our study fills this gap.

We visualized each of the entrepreneurial ecosystems using a network visualization approach. Visual representations not only help communicate complex network structures but also facilitate both comprehension and sense making. Given the size of some of the ecosystems, visual representations are particularly helpful.

Consider the visualization provided in Fig. 3. It presents the structure of venture similarity in the Silicon Valley ecosystem. Nodes represent ventures; links represent similar position statements used by ventures. Positioning statement clusters are grouped by color. As we are using a cluster-emphasizing layout algorithm, our visualizations can be interpreted as follows. First, similar nodes (ventures) typically cluster together. Centrally located nodes are core to the ecosystem, indicating usage of core and bridging strategic positioning language. Peripheral clusters are less central and tend to use more unique strategic positioning statements. Clusters that are closer to each other indicate a high number of overlapping and interrelated statements; similarly, a greater distance between clusters indicates little overlap. Dense clusters contain highly similar strategic position statements; spread-out clusters use highly differentiated language. Lastly, a node bridging two clusters can indicate that a startup potentially synthesizes disparate strategic positioning ideas.

Fig. 6. Association between ecosystem size and ecosystem measures. (a) Clusters. (b) Mean Similarity.

With these interpretation guidelines in mind, we observe that Silicon Valley contains a rich set of unique strategic positioning clusters. Some of these clusters are centrally located, while several others are located peripherally. We also notice some very dense clusters, while others are spread apart. We observe a variety of such patterns across our entrepreneurial ecosystems.⁹

Intuitively, one may think that each cluster represents a specific industry. However, when examining Fig. 4, which depicts a zoomed-in version of the London ecosystem, we can see that there are many different tag-based industry categories within a cluster. In this example, ventures associated with analytics, mobile, design, payments, finance, and hospitality form the blue cluster. This result provides strong evidence that the clusters of strategic positions we uncover are not simply based on industry classification. In fact, it demonstrates that when we allow these positions and clusters to emerge endogenously rather than through a particular industry classification, novel value configurations appear.

⁹ Given the number of visualizations, we only present a subset of illustrative examples here. A complete list of visualizations is provided on request.

To gain a deeper understanding of what the clusters actually mean, we applied topic modeling to each of the clusters within each entrepreneurial ecosystem. Consider the Boston ecosystem visualized in Fig. 5. First, we notice a centrally located cluster, described by words such as backup, storage, security, and virtualization. Boston is well known for its enterprise and security industries, and it appears that their strategic positioning statements are core to the overall ecosystem. We also notice a tightly integrated cluster of energy companies, related to wind, energy, cooling, clean, and vehicle, suggesting a positioning similarity between the energy and automotive industries. What are particularly interesting are the six labeled clusters at the bottom of the ecosystem visualization. Boston’s largest industry is biotechnology, but we see a nuanced differentiation in strategic positioning. One cluster appears to be related to the patient and care-delivery facing side, while others are focused on specific subfields, including therapeutic treatments, cancer, DNA sequencing, and robotics. The proximity of these clusters confirms that they are related, but the varying topic labels reveal that there are some important differences.

With an understanding that ecosystems are characterized by the presence of venture similarity clusters, it is pertinent to explore how this measure relates to other metrics and what differences exist across ecosystems. Fig. 6(a) and (b) show how ecosystem size correlates with the number of clusters and the mean similarity in an ecosystem. Given the skewed distribution of ecosystem size, we used a logarithmic scale for the x-axis. Several interesting observations can be made. For instance, Fig. 6(a) compares the number of clusters against the ecosystem size at our default threshold level (15%).¹⁰ It clearly shows a strong positive correlation between the two measures. Since the x-axis is in log scale, this implies that the number of clusters identified saturates quickly as the ecosystem grows in size.

Fig. 6(b), on the other hand, shows the comparison between the overall mean similarity and the ecosystem size. Here, we observe that the mean similarity of position statements for small ecosystems has a much wider range. As ecosystems grow in size, the strategic positioning similarity of firms within that ecosystem converges to the global mean. This result is interesting as it does not necessarily have to be the case. On the one hand, as an ecosystem grows in size, it becomes attractive to a more diverse set of firms. On the other hand, competition grows, and hence, strategic positioning becomes more similar.

Despite this convergence to the mean, we do, however, observe that there are still significant differences between positioning statements within an entrepreneurial ecosystem, as demonstrated by the number of clusters. It is, thus, evident that ventures are continuously trying to balance legitimacy and uniqueness.

V. CONCLUDING REMARKS

This study defined and applied a rich, data-driven analysis and visualization approach for understanding the structure of venture similarity in global entrepreneurial ecosystems. Fusing data mining, text analytics, and network visualization, we examined the structure of strategic positioning of nearly 60 000 ventures in 35 ecosystems. Our visual analysis reveals that there are wide differences in entrepreneurial ecosystem size, structure, diversity, and positioning.

¹⁰ Appendix B shows the results of our sensitivity analysis at different threshold levels.

Specifically, we find that for entrepreneurial ventures in larger ecosystems, similarity in positioning statements converges to the global mean. We also find that ecosystems located in emerging economies tend to be smaller and have greater dispersion of venture similarity. That is to say, emerging ecosystems tend to have firms that position themselves either very similarly or very differently. The implication is that ventures within entrepreneurial ecosystems are constantly balancing legitimacy and differentiation and that this balance becomes more salient as the ecosystem grows in size. Our ecosystem visualizations reveal that clusters are composed of firms from diverse industries and that some clusters are more proximate to each other than to others, suggesting potentially differentiable intra- and inter-industry dynamics.

While our results are predominantly descriptive in nature (a natural outcome of visual exploratory analysis), our unique data-driven approach provides an important foundation for exciting future technology, innovation, and entrepreneurship research. First, we demonstrate that a data-driven approach provides the ability to study entrepreneurial ecosystems at scale. Second, we show that visualization is largely about hypothesis generation rather than hypothesis testing. We anticipate interesting new empirical studies emerging from our paper. Associating the structural characteristics (e.g., cluster positioning and density) with the financial performance and health of an entrepreneurial ecosystem, for instance, is of great interest to both scholars and practitioners. Another interesting extension of our paper is to understand the evolution of strategic positioning in ecosystems and to identify potential mimetic processes across ecosystems. Given the generalizability of our data-driven approach, another promising avenue for future research would be an examination of entire actual business model descriptions, press releases, and patent announcements of ventures.

Our study also has important managerial implications. The visual text-analytic approach presented in this study provides industry analysts and policy makers with a new methodology for conducting competitive market intelligence, which could be applied and extended to many other contexts. Our specific results will help entrepreneurs and investors understand the variegated nature of entrepreneurial ecosystems, identify venture similarity clusters, and devise effective strategic positions.

We acknowledge that our study is not without limitations. First, we use a socially curated dataset, which may have some quality issues, such as missing or outdated data. We tried to overcome this by extracting the most up-to-date dataset and performing manual spot checks. Second, Crunchbase is a predominantly technology-industry oriented data source. While a wide variety of industries are present, it is quite possible that some industries are underrepresented or missing. Lastly, for manageability of analysis and visualization, we only investigated 35 global entrepreneurial ecosystems. While we believe this set is comprehensive, there are many other ecosystems that exist, including regions in Asia (e.g., Shenzhen). Each of these limitations, however, presents exciting opportunities for future data-driven technology, innovation, and entrepreneurship research.

TABLE II. ECOSYSTEM CLUSTERS AT DIFFERENT THRESHOLD AND RESOLUTION LEVELS

TABLE III. SPEARMAN RANK CORRELATIONS FOR COMBINATIONS OF SIMILARITY THRESHOLD AND RESOLUTION

APPENDIX

A. Similarity of Company Descriptions

To assess the quality of business descriptions, we compared the business description provided by Crunchbase with actual corporate website descriptions for 50 randomly selected firms. Specifically, we used a TF-IDF approach to compare the similarity between two descriptions. If the TF-IDF-based similarity score is greater than or equal to 0.15, the two descriptions are deemed similar. This follows the approach we have used throughout the paper and what is recommended by prior work. Given space limitations, the detailed results of our analysis are provided in an online appendix.¹¹ The descriptions from the two distinct sources are on average 53.89% similar, and three companies have 100% similar descriptions on both the corporate website and Crunchbase. The lowest similarity level is still 19.54%. These results suggest significant content overlap with the Crunchbase entry, giving us confidence about the validity of the data.

¹¹ http:///entsci.gatech.edu/venturesimilarity/e_supplement.pdf

Fig. 7. Association between ecosystem size and ecosystem measures by threshold level at default resolution.

Fig. 8. Association between ecosystem size and the number of clusters by resolution. (a) Threshold 10%. (b) Threshold 15%. (c) Threshold 20%.

B. Sensitivity Analysis

We conducted several sensitivity analyses to ensure the robustness of our results. Specifically, for the network modeling part, we conducted analyses at three different similarity threshold levels (10%, 15%, and 20%) as well as a ±0.5 variation (using 0.1 increments) on the resolution parameter (see [76]) of the Louvain modularity algorithm. The three threshold levels represent a broad spectrum of both more relaxed and more stringent similarity criteria. A 10% threshold level suggests that only 10% of the descriptions need to match in order for a link to be drawn between two firms; a 20% threshold, on the other hand, suggests that at least 20% of the descriptions need to match.

For each configuration of threshold and resolution, we report the mean after 100 replications (see Table II). The sensitivity analysis was implemented using Python¹² and NetworkX.¹³

Our results show that increasing the threshold level generally reduces the number of links, which in turn increases the number of clusters as the network becomes more disconnected (i.e., greater threshold → less similarity → reduction in edges → increase in clusters). Fig. 7 shows the results of our sensitivity analysis with respect to the association between ecosystem size and number of clusters, differentiated by threshold level [corresponding to Fig. 6(a)]. We observe that for a lower threshold level the number of clusters increases for the majority of ecosystems (25/35). As the threshold level increases, we see a lower number of clusters in the main component of each ecosystem. However, this finding is less pronounced, and in some instances reversed, for larger ecosystems, as higher threshold levels do not reduce the main component as significantly as they do for smaller ecosystems. We, thus, observe that for ecosystems with large main components (e.g., London, New York, or Boston) the number of clusters initially increases and then decreases with higher threshold levels. While increasing the threshold level will by definition increase the similarity between linked companies, we lose important information that may exist in the network. We, thus, adopt the 15% threshold level for our main analysis.

To examine the positive relationship between the ecosystem size and the number of clusters, we created small multiples for each combination of resolution and threshold level as shown in Fig. 8. We also computed the Spearman rank correlation to see how stable the correlation is for different levels of resolution. Overall, the Spearman correlation for the whole aggregate data is 0.5995 for N = 1 155. Next, we computed the correlations for each combination of similarity threshold and resolution (see Table III). Together, the correlations from the visuals and analysis confirm the positive relationship between ecosystem size and the number of clusters from the main component.
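For readers unfamiliar with the statistic, a minimal sketch of the Spearman rank correlation follows: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The ecosystem sizes and cluster counts below are hypothetical and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL ecosystem sizes and main-component cluster counts.
sizes = [120, 450, 900, 2000, 5200]
clusters = [4, 6, 5, 9, 12]
print(round(spearman(sizes, clusters), 4))  # 0.9
```

A rank-based measure suits this comparison because ecosystem sizes are highly skewed, and only the ordering of ecosystems, not the magnitude of the size differences, enters the statistic.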

REFERENCES

[1] C. Pitelis, “Clusters, entrepreneurial ecosystem co-creation, and appropriability: A conceptual framework,” Ind. Corp. Change, vol. 21, no. 6, pp. 1359–1388, Dec. 2012.

[2] J. Bell-Masterson and D. Stangler, “Measuring an entrepreneurial ecosystem,” Arlington, VA: Kauffman Foundation, 2015.

[3] B. Feld, Startup Communities: Building an Entrepreneurial Ecosystem in Your City. Hoboken, NJ, USA: Wiley, 2012.

[4] E. Autio, M. Kenney, P. Mustar, D. Siegel, and M. Wright, “Entrepreneurial innovation: The importance of context,” Res. Policy, vol. 43, no. 7, pp. 1097–1108, 2014.

[5] Z. J. Acs and C. Armington, Entrepreneurship, Geography, and American Economic Growth. Cambridge, U.K.: Cambridge Univ. Press, 2006.

[6] D. J. Isenberg, “How to start an entrepreneurial revolution,” Harvard Bus. Rev., vol. 88, no. 6, pp. 40–50, 2010.

[7] V. W. Hwang and G. Horowitt, The Rainforest: The Secret to Building the Next Silicon Valley. Regenwald Los Altos Hills, CA, USA: Createspace Independent Pub., 2012.

[8] B. Cohen, “Sustainable valley entrepreneurial ecosystems,” Bus. Strategy Environ., vol. 15, no. 1, pp. 1–14, 2006.

[9] H. Van de Ven, “The development of an infrastructure for entrepreneurship,” J. Bus. Venturing, vol. 8, no. 3, pp. 211–230, 1993.

¹² https://www.python.org/
¹³ https://networkx.github.io/

[10] W. J. Baumol, “Entrepreneurship: Productive, unproductive, and destructive,” J. Bus. Venturing, vol. 11, no. 1, pp. 3–22, 1996.

[11] M. Iansiti and R. Levien, The Keystone Advantage: What the New Dynamics of Business Ecosystems Mean for Strategy, Innovation, and Sustainability. Boston, MA, USA: Harvard Bus. Press, 2004.

[12] L. A. Plummer and A. Pe’er, “The geography of entrepreneurship,” in Handbook of Entrepreneurship Research. New York, NY, USA: Springer, 2010, pp. 519–556.

[13] Z. J. Acs, E. Stam, D. B. Audretsch, and A. O’Connor, “The lineages of the entrepreneurial ecosystem approach,” Small Bus. Econ., pp. 1–10, 2017.

[14] M. E. Porter, Clusters and the New Economics of Competition, vol. 76. Boston, MA, USA: Harvard Bus. Review, 1998.

[15] B. T. Asheim and M. S. Gertler, “The geography of innovation: Regional innovation systems,” in The Oxford Handbook of Innovation, J. Fagerberg, D. Mowery, and R. Nelson, Eds. Oxford, U.K.: Oxford Univ. Press, 2005, pp. 291–317.

[16] J. F. Moore, The Death of Competition: Leadership and Strategy in the Age of Business Ecosystems. New York, NY, USA: Harper Paperbacks, 1997.

[17] R. Adner, The Wide Lens: A New Strategy for Innovation. London, U.K.: Penguin, 2012.

[18] E. L. Glaeser, Agglomeration Economics. Chicago, IL, USA: Univ. Chicago Press, 2010.

[19] J. Alcacer and W. Chung, “Location strategies for agglomeration economies,” Strategic Manage. J., vol. 35, no. 12, pp. 1749–1761, 2014.

[20] R. Adner, “Ecosystem as structure: An actionable construct for strategy,” J. Manage., vol. 43, no. 1, pp. 39–58, 2017.

[21] M. Delgado, M. E. Porter, and S. Stern, “Clusters and entrepreneurship,” J. Econ. Geography, vol. 10, no. 4, pp. 495–518, 2010.

[22] G. Cattani, J. M. Pennings, and F. C. Wezel, “Spatial and temporal heterogeneity in founding patterns,” Org. Sci., vol. 14, no. 6, pp. 670–685, 2003.

[23] B. Audretsch, “Agglomeration and the location of innovative activity,” Oxford Rev. Econ. Policy, vol. 14, no. 2, pp. 18–29, 1998.

[24] S. Breschi and F. Malerba, “The geography of innovation and economic clustering: Some introductory notes,” Ind. Corp. Change, vol. 10, no. 4, pp. 817–833, 2001.

[25] B. Spigel, “The relational organization of entrepreneurial ecosystems,” Entrepreneurship Theory Practice, vol. 41, pp. 49–72, 2015.

[26] R. C. Basole, A. Srinivasan, H. Park, and S. Patel, “Ecoxight: Discovery, exploration and analysis of business ecosystems using interactive visualization,” ACM Trans. Manage. Inf. Syst., vol. 9, no. 2, 2018, Art. no. 6.

[27] S. Albert and D. A. Whetten, “Organizational identity,” Res. Organizational Behavior, vol. 7, pp. 263–295, 1985.

[28] D. A. Whetten, “Albert and Whetten revisited: Strengthening the concept of organizational identity,” J. Manage. Inquiry, vol. 15, no. 3, pp. 219–234, Sep. 2006.

[29] C. Boone, F. C. Wezel, and A. van Witteloostuijn, “Joining the pack or going solo? A dynamic theory of new firm positioning,” J. Bus. Venturing, vol. 28, no. 4, pp. 511–527, 2013.

[30] J. Tan, Y. Shao, and W. Li, “To be different, or to be the same? An exploratory study of isomorphism in the cluster,” J. Bus. Venturing, vol. 28, no. 1, pp. 83–97, 2013.

[31] D. A. Shepherd, “Party on! A call for entrepreneurship research that is more interactive, activity based, cognitively hot, compassionate, and prosocial,” J. Bus. Venturing, vol. 30, no. 4, pp. 489–507, 2015.

[32] A. Marshall, Principles of Economics: An Introductory Volume. London, U.K.: Macmillan, 1890.

[33] J. Jacobs, The Economy of Cities. New York, NY, USA: Vintage Books, 1969.

[34] G. Van der Panne, “Agglomeration externalities: Marshall versus Jacobs,” J. Evolutionary Econ., vol. 14, no. 5, pp. 593–604, 2004.

[35] M. E. Porter, “Location, competition, and economic development: Local clusters in a global economy,” Econ. Develop. Quarterly, vol. 14, no. 1, pp. 15–34, 2000.

[36] A. Saxenian, Regional Advantage. Cambridge, MA, USA: Harvard Univ. Press, 1996.

[37] D. B. Audretsch, Everything in its Place: Entrepreneurship and the Strategic Management of Cities, Regions, and States. London, U.K.: Oxford Univ. Press, 2015.

[38] D. Isenberg, “The entrepreneurship ecosystem strategy as a new paradigm for economic policy: Principles for cultivating entrepreneurship,” Presentation at the Institute of International and European Affairs, Dublin, Ireland, May 12, 2011.


[39] M. P. Feldman, “The character of innovative places: Entrepreneurial strategy, economic development, and prosperity,” Small Bus. Econ., vol. 43, no. 1, pp. 9–20, 2014.

[40] C. Freeman, “The national system of innovation in historical perspective,” Cambridge J. Econ., vol. 19, no. 1, pp. 5–24, 1995.

[41] B. T. McCann and T. B. Folta, “Performance differentials within geographic clusters,” J. Bus. Venturing, vol. 26, no. 1, pp. 104–123, 2011.

[42] P. Cooke, M. G. Uranga, and G. Etxebarria, “Regional innovation systems: Institutional and organisational dimensions,” Res. Policy, vol. 26, no. 4/5, pp. 475–491, 1997.

[43] P. Cooke, “Regional innovation systems, clusters, and the knowledge economy,” Ind. Corp. Change, vol. 10, no. 4, pp. 945–974, 2001.

[44] D. J. Isenberg, “How to start an entrepreneurial revolution,” Harvard Bus. Rev., vol. 88, no. 6, pp. 40–50, 2010.

[45] M. Morris, M. Schindehutte, and J. Allen, “The entrepreneur’s business model: Toward a unified perspective,” J. Bus. Res., vol. 58, no. 6, pp. 726–735, 2005.

[46] R. E. Miles, C. C. Snow, A. D. Meyer, and H. J. Coleman, “Organizational strategy, structure, and process,” Acad. Manage. Rev., vol. 3, no. 3, pp. 546–562, Jul. 1978.

[47] H. Mintzberg, “Patterns in strategy formation,” Manage. Sci., vol. 24, no. 9, pp. 934–948, May 1978.

[48] M. E. Porter, Competitive Strategy: Techniques for Analyzing Industries and Competition. New York, NY, USA: Free Press, 1980.

[49] C. Markides, “Strategic innovation,” Sloan Manage. Rev., vol. 38, pp. 9–24, 1997.

[50] C. C. Markides, “A dynamic view of strategy,” Sloan Manage. Rev., vol. 40, no. 3, pp. 55–63, 1999.

[51] D. L. Deephouse, “To be different, or to be the same? It’s a question (and theory) of strategic balance,” Strategic Manage. J., vol. 20, no. 2, pp. 147–166, Feb. 1999.

[52] G. Hooley, A. Broderick, and K. Moller, “Competitive positioning and the resource-based view of the firm,” J. Strategic Marketing, vol. 6, no. 2, pp. 97–116, Jul. 1998.

[53] C. Blankson and S. P. Kalafatis, “The development of a consumer/customer-derived generic typology of positioning strategies,” J. Marketing Theory Practice, vol. 9, no. 2, pp. 35–53, 2001.

[54] C. Blankson and S. P. Kalafatis, “The development and validation of a scale measuring consumer/customer-derived generic typology of positioning strategies,” J. Marketing Manage., vol. 20, no. 1/2, pp. 5–43, Feb. 2004.

[55] S. P. Kalafatis, M. H. Tsogas, and C. Blankson, “Positioning strategies in business markets,” J. Bus. Ind. Marketing, vol. 15, no. 6, pp. 416–437, Nov. 2000.

[56] G. Hooley and G. Greenley, “The resource underpinnings of competitive positions,” J. Strategic Marketing, vol. 13, no. 2, pp. 93–116, Jun. 2005.

[57] S. K. Card, J. D. Mackinlay, and B. Shneiderman, Readings in Information Visualization: Using Vision to Think. San Francisco, CA, USA: Morgan Kaufmann, 1999.

[58] G. Hoberg and G. Phillips, “Text-based network industries and endogenous product differentiation,” J. Political Economy, vol. 124, no. 5, pp. 1423–1465, 2016.

[59] R. Feldman and J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge, U.K.: Cambridge Univ. Press, 2007.

[60] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, pp. 814–818, Jun. 2005.

[61] D. J. Ketchen, Jr. and C. L. Shook, “The application of cluster analysis in strategic management research: An analysis and critique,” Strategic Manage. J., vol. 17, no. 6, pp. 441–458, Jun. 1996.

[62] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Statistical Mech.: Theory Experiment, vol. 2008, no. 10, 2008, Art. no. P10008.

[63] M. E. J. Newman, “Modularity and community structure in networks,” Proc. Nat. Acad. Sci. USA, vol. 103, no. 23, pp. 8577–8582, Jun. 2006.

[64] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Phys. Rev. E, Statistical, Nonlinear, Soft Matter Phys., vol. 69, no. 6, Jun. 2004, Art. no. 066133.

[65] J. P. Gibbs and W. T. Martin, “Urbanization, technology, and the division of labor: International patterns,” Amer. Sociol. Rev., vol. 27, no. 5, pp. 667–677, 1962.

[66] R. C. Basole, J. Huhtamaki, K. Still, and M. G. Russell, “Visual decision support for business ecosystem analysis,” Expert Syst. Appl., vol. 65, pp. 271–282, 2016.

[67] J. Heer, M. Bostock, and V. Ogievetsky, “A tour through the visualization zoo,” Commun. ACM, vol. 53, no. 6, pp. 59–67, Jun. 2010.

[68] B. R. Iyer and R. C. Basole, “Visualization to understand ecosystems,” Commun. ACM, vol. 59, no. 11, pp. 27–30, 2016.

[69] R. C. Basole, “Visualization of interfirm relations in a converging mobile ecosystem,” J. Inf. Technol., vol. 24, no. 2, pp. 144–159, Jun. 2009.

[70] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tollis, Graph Drawing: Algorithms for the Visualization of Graphs. Englewood Cliffs, NJ, USA: Prentice-Hall, 1998.

[71] S. Martin, W. M. Brown, R. Klavans, and K. W. Boyack, “OpenOrd: An open-source toolbox for large graph layout,” Proc. SPIE, vol. 7868, Jan. 2011, Art. no. 786806.

[72] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” in Proc. Int. AAAI Conf. Weblogs Soc. Media, 2009.

[73] J. Mackinlay, P. Hanrahan, and C. Stolte, “Show me: Automatic presentation for visual analysis,” IEEE Trans. Visualization Comput. Graph., vol. 13, no. 6, pp. 1137–1144, Jan. 2007.

[74] R. C. Basole et al., “Understanding business ecosystem dynamics: A data driven approach,” ACM Trans. Manage. Inf. Syst., vol. 6, no. 2, pp. 1–32, Jun. 2015.

[75] R. C. Basole, H. Park, and B. C. Barnett, “Coopetition and convergence in the ICT ecosystem,” Telecommun. Policy, vol. 39, no. 7, pp. 537–552, Jun. 2015.

[76] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proc. Nat. Acad. Sci., vol. 104, no. 1, pp. 36–41, 2007.

Authors’ photographs and biographies not available at the time of publication.