The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of...

17
MANAGEMENT SCIENCE Articles in Advance, pp. 1–17 ISSN 0025-1909 (print) ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.1120.1685 © 2013 INFORMS The Emergence of Opinion Leaders in a Networked Online Community: A Dyadic Model with Time Dynamics and a Heuristic for Fast Estimation Yingda Lu Lally School of Management and Technology, Rensselaer Polytechnic Institute, Troy, New York 12180, [email protected] Kinshuk Jerath, Param Vir Singh David A. Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 {[email protected], [email protected]} W e study the drivers of the emergence of opinion leaders in a networked community where users establish links to others, indicating their “trust” for the link receiver’s opinion. This leads to the formation of a network, with high in-degree individuals being the opinion leaders. We use a dyad-level proportional hazard model with time-varying covariates to model the growth of this network. To estimate our model, we use Weighted Exogenous Sampling with Bayesian Inference, a methodology that we develop for fast estimation of dyadic models on large network data sets. We find that, in the Epinions network, both the widely studied “preferential attachment” effect based on the existing number of inlinks (i.e., a network-based property of a node) and the number and quality of reviews written (i.e., an intrinsic property of a node) are significant drivers of new incoming trust links to a reviewer (i.e., inlinks to a node). Interestingly, we find that time is an important moderator of these effects—intrinsic node characteristics are a stronger short-term driver of additional inlinks, whereas the preferential attachment effect has a smaller impact but it persists for a longer time. Our novel insights have important managerial implications for the design of online review communities. Key words : user-generated content; opinion leaders; social networks; network growth; proportional hazard model; weighted exogenous sampling History : Received March 1, 2010; accepted August 29, 2012, by Sandra Slaughter, information systems. Published online in Articles in Advance. 1. Introduction Opinion leaders—individuals who exert a consid- erable amount of influence on the opinions of others—are an important element in the diffusion of information in a community (Gladwell 2000, Rogers 2003). Motivated by the seminal work by Katz and Lazarsfeld (1955), researchers have contributed to our understanding of opinion leaders by systematically analyzing how individuals emerge as opinion leaders in a community (Watts and Dodds 2007), how they facilitate the diffusion of information by their influ- ence on the opinions of others (Ghose and Ipeirotis 2011, Iyengar et al. 2011, Van den Bulte and Joshi 2007, Stephen et al. 2012), what the characteristics of these individuals are (Chan and Misra 1990, Myers and Robertson 1972), and how to identify them, pri- marily with the aim of marketing products through them (Valente et al. 2003, Vernette 2004). With the advent of Web 2.0, websites where con- sumers voluntarily contribute product reviews, such as Epinions (http://www.epinions.com), have pros- pered in the last few years. By sharing their own opinions on these online forums, consumers influ- ence others’ opinions as well. An advantage of such activity being online is that it may be possible to track the flow of influence among the members of the community. For instance, Epinions employs a novel mechanism in which every member of this commu- nity can formally include members whose reviews she trusts in her “web of trust.” This leads to the for- mation of a network of trust among reviewers with high in-degree individuals being the opinion leaders. Various other websites that provide forums for user- generated content provide mechanisms of the above nature under which users can extend links to other users whose opinions or content they value (among other reasons for forming such links), thus leading to a networked community. Examples of such web- sites include The Motley Fool (http://www.fool.com) and Seeking Alpha (http://www.seekingalpha.com) for sharing opinions on topics related to financial mar- kets, YouTube (http://www.youtube.com) for shar- ing videos, IMDb (http://www.imdb.com) and Rotten 1 Published online ahead of print March 18, 2013

Transcript of The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of...

Page 1: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

MANAGEMENT SCIENCEArticles in Advance, pp. 1–17ISSN 0025-1909 (print) � ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.1120.1685

© 2013 INFORMS

The Emergence of Opinion Leaders in a NetworkedOnline Community: A Dyadic Model with Time

Dynamics and a Heuristic for Fast Estimation

Yingda LuLally School of Management and Technology, Rensselaer Polytechnic Institute, Troy, New York 12180,

[email protected]

Kinshuk Jerath, Param Vir SinghDavid A. Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

{[email protected], [email protected]}

We study the drivers of the emergence of opinion leaders in a networked community where users establishlinks to others, indicating their “trust” for the link receiver’s opinion. This leads to the formation of a

network, with high in-degree individuals being the opinion leaders. We use a dyad-level proportional hazardmodel with time-varying covariates to model the growth of this network. To estimate our model, we useWeighted Exogenous Sampling with Bayesian Inference, a methodology that we develop for fast estimation of dyadicmodels on large network data sets. We find that, in the Epinions network, both the widely studied “preferentialattachment” effect based on the existing number of inlinks (i.e., a network-based property of a node) and thenumber and quality of reviews written (i.e., an intrinsic property of a node) are significant drivers of newincoming trust links to a reviewer (i.e., inlinks to a node). Interestingly, we find that time is an importantmoderator of these effects—intrinsic node characteristics are a stronger short-term driver of additional inlinks,whereas the preferential attachment effect has a smaller impact but it persists for a longer time. Our novelinsights have important managerial implications for the design of online review communities.

Key words : user-generated content; opinion leaders; social networks; network growth; proportional hazardmodel; weighted exogenous sampling

History : Received March 1, 2010; accepted August 29, 2012, by Sandra Slaughter, information systems.Published online in Articles in Advance.

1. IntroductionOpinion leaders—individuals who exert a consid-erable amount of influence on the opinions ofothers—are an important element in the diffusion ofinformation in a community (Gladwell 2000, Rogers2003). Motivated by the seminal work by Katz andLazarsfeld (1955), researchers have contributed to ourunderstanding of opinion leaders by systematicallyanalyzing how individuals emerge as opinion leadersin a community (Watts and Dodds 2007), how theyfacilitate the diffusion of information by their influ-ence on the opinions of others (Ghose and Ipeirotis2011, Iyengar et al. 2011, Van den Bulte and Joshi2007, Stephen et al. 2012), what the characteristics ofthese individuals are (Chan and Misra 1990, Myersand Robertson 1972), and how to identify them, pri-marily with the aim of marketing products throughthem (Valente et al. 2003, Vernette 2004).

With the advent of Web 2.0, websites where con-sumers voluntarily contribute product reviews, suchas Epinions (http://www.epinions.com), have pros-pered in the last few years. By sharing their own

opinions on these online forums, consumers influ-ence others’ opinions as well. An advantage of suchactivity being online is that it may be possible totrack the flow of influence among the members of thecommunity. For instance, Epinions employs a novelmechanism in which every member of this commu-nity can formally include members whose reviewsshe trusts in her “web of trust.” This leads to the for-mation of a network of trust among reviewers withhigh in-degree individuals being the opinion leaders.Various other websites that provide forums for user-generated content provide mechanisms of the abovenature under which users can extend links to otherusers whose opinions or content they value (amongother reasons for forming such links), thus leadingto a networked community. Examples of such web-sites include The Motley Fool (http://www.fool.com)and Seeking Alpha (http://www.seekingalpha.com)for sharing opinions on topics related to financial mar-kets, YouTube (http://www.youtube.com) for shar-ing videos, IMDb (http://www.imdb.com) and Rotten

1

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Published online ahead of print March 18, 2013

Page 2: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community2 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

Tomatoes (http://www.rottentomotoes.com) for shar-ing opinions on movies, yelp (http://www.yelp.com)for sharing information on local food and entertain-ment, and, last but not the least, social networks suchas Facebook (http://www.facebook.com).

Among thousands of heterogeneous online review-ers in such communities, which ones emerge asopinion leaders? How do their intrinsic characteris-tics versus their network-level characteristics influ-ence their statuses as opinion leaders? What are themajor factors that influence individuals’ considera-tion of other reviewers’ opinions over time? In thecontext of a networked community with links inthe network denoting opinion seeking,1 these essen-tially become questions regarding the factors influ-encing the evolution of the network. Therefore, weembed influence through opinion sharing in a net-work growth paradigm, and, using a unique dataset from Epinions, we investigate the emergence anddynamics of opinion leadership in a community.2

Several researchers have illustrated that networkstructure-based factors such as a node’s degree, reci-procity, and transitivity have a significant impacton the formation of ties (Barabasi and Albert 1999,Holland and Leinhardt 1972, Jones and Handcock2003, Merton 1968, Narayan and Yang 2007). A promi-nent theory is the “preferential attachment” theory,which suggests that nodes with more existing incom-ing links, as compared with nodes with fewer existingincoming links, have a higher probability of receivingadditional incoming links. However, the effect onnetwork formation of the intrinsic characteristics ofthe nodes themselves is understudied (with a fewnotable exceptions, e.g., Kossinets and Watts 2006,Stephen and Toubia 2009). In our context, character-istics of reviews written serve as natural node char-acteristics. (For instance, is a review written recently,and is it written comprehensively and objectively?)A main objective of our paper is to understand howthese intrinsic node characteristics influence networkevolution.

1 This method of employing the number of incoming links as aproxy for measuring opinion leadership is called the sociometricmethod and has been used widely before in sociology and market-ing (Burt 1999, Iyengar et al. 2011, King and Summers 1970). Thismethod fits our context well, because a larger number of incominglinks can lead to overall higher influence. Reviewers with a largernumber of incoming trust links are easier to find due to their net-work position. In addition, they are trusted by more members inthe community, and this also inspires confidence in the new read-ers, which makes it more likely that they will influence people whofind them. In totality, we can conclude that reviewers who havelarger number of incoming links are the ones with higher opinionleadership.2 A large literature exists on diffusion of information over an exist-ing network or in a community. Note, however, that our work dif-fers from the above because our focus is on the formation of theunderlying network itself.

One of the key features of online review com-munities is that the network structure and individ-ual behavior are dynamically changing over time.For example, over time, reviewers may receive newincoming trust links and also contribute new reviews,both of which increase their attractiveness to othermembers of the community. Compared with offlinesocial networks, the cost of changing structural andbehavioral characteristics is smaller in online settings,and therefore the dynamic properties may becomevery salient. As a result, how the time-changing char-acteristics of individuals influence the formation ofties is a question of great importance in understandinghow online review communities develop, especiallygiven the recent explosion in user-generated content.

To answer these research questions, we developa dyad-level proportional hazard model of networkgrowth and estimate it on the network of moviereviewers (in the “Movies” category) at Epinions.We find that whereas network structure-based factorssuch as preferential attachment and reciprocity aresignificant drivers of network growth, intrinsic nodecharacteristics such as the number of reviews writtenand textual characteristics such as objectivity, read-ability, and comprehensiveness of reviews are alsosignificant drivers of network growth. Interestingly,the recent number of reviews written by a reviewerhas a strong impact on the rate of increase of opinionleadership status for the individual, whereas the pastnumber of reviews written has no statistically signif-icant impact. In contrast, if we also divide the trust-based inlinks for a reviewer into recently obtainedinlinks and past inlinks, we find that both have a sta-tistically significant impact on the rate of increase ofopinion leadership status.

Taken together, our results show that time isan important moderator of the impact of node-based and network structure-based characteristics onthe tie formation process—node-based characteris-tics are significant short-term drivers of additionalinlinks, whereas the network structure-based pref-erential attachment effect is a longer-term but lesseffective driver of additional inlinks. This novel find-ing provides a deeper understanding of how opinionleaders emerge in online communities and contributesto the theory of generative models of large networks.This also has important managerial implications forthe design of opinion-sharing websites, which we dis-cuss later.

To add to the above substantive findings, we alsocontribute to the methodology of handling large-scalesocial network data sets. Review and reviewer char-acteristics change over the time period of our study,and time-varying covariates need to be taken intoaccount when modeling the growth of the social net-work. To deal with the overwhelming computational

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 3: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 3

requirements of a dyad-level proportional hazardmodel with time-varying covariates, we develop anovel Markov chain Monte Carlo (MCMC) adapta-tion of the weighted exogenous sampling method-ology (Manski and Lerman 1977). Our WeightedExogenous Sampling with Bayesian Inference (WESBI)methodology reduces the time of estimation by anorder of magnitude, while still providing accurateestimates. Thus, our methodological contribution isthe development of a fast hierarchical Bayes infer-ence technique for estimating dyad-level networkgrowth models with time-varying covariates. We alsoextend the weighted exogenous sampling methodol-ogy from binary models to duration models. In theonline technical appendix to this paper (available athttp://ssrn.com/abstract=2206016), we report resultsof a comprehensive simulation study covering a largevariety of possible network structures characterizedby different parameter values. For each networkstructure, we show that by sampling a small propor-tion of the total observations, we can recover the truenetwork generating parameters with high accuracyusing WESBI.

The rest of this paper is organized as follows. In §2,we develop the theoretical foundations motivatingour empirical work. In §3, we develop a proportionalhazard model with time-varying covariates to esti-mate the effect of network and reviewer characteris-tics on social network evolution. In §4, we describethe Epinions data set we constructed and our variabledefinitions. In §5, we develop and explain our novelestimation methodology and present the estimationresults of the model on data from the “Movies” cate-gory at Epinions. In §6, we provide several extensionsand robustness checks for our basic model. In §7, weconclude by discussing the implications of our studyand potential future research.

2. Theoretical FoundationsIn this section, we provide theoretical justificationsfor the various concepts and constructs that weincorporate in our network-based model of opinionleadership.

In the past decade, sociologists, physicists, and com-puter scientists have empirically studied networksin such diverse areas as social networks, citationnetworks of academic publications, the World WideWeb network, email networks, router networks,etc. A property frequently identified in networksacross these domains is the “scale-free” property(Dorogovtsev and Mendes 2003). A network is saidto be “scale-free” if its degree distribution followsa power law at least asymptotically (Barabasi andAlbert 1999). Interestingly, we find that the web oftrust network at Epinions is also a scale-free net-work. The most widely accepted network growth

phenomenon that produces a scale-free network is thepreferential attachment (or “rich get richer”) process(Barabasi and Albert 1999). In the context of Epinions,the preferential attachment argument would implythat individuals who already have a high numberof inlinks would be proportionately more likely toreceive new inlinks. An explanation for why the pref-erential attachment effect is observed is that indi-viduals who possess social capital can leverage itto receive more social capital (Allison et al. 1982,Merton 1968). In a community of reviewers, high-status reviewers (ones with a high in-degree) wouldbe considered more attractive for those seeking opin-ions (Bonacich 1987, Gould 2002). This implies thatpeople would like to select high-status individuals,and this process will be self-reinforcing. Furthermore,by design, Epinions prominently displays the reviewsof reviewers with highest in-degrees (i.e., reviews ofthe reviewers with the most number of followers).This provides higher visibility to such reviewers andhence a greater chance of getting new links (Tuckerand Zhang 2010). Motivated by the above arguments,we incorporate the preferential attachment process inour model by assuming that the probability that indi-vidual A forms a link with individual B increases withB’s in-degree.

In social psychology, another network phenomenoncalled dyad-level reciprocity has been considered asone of the key drivers of link formation in networks(Fehr and Gachter 2000, Iacobucci and Hopkins 1992).Reciprocity refers to responding to a positive actionof another individual by a positive action toward thatindividual (Katz and Powell 1955). In the context ofEpinions, we incorporate reciprocity in our model byassuming that individual A is more likely to put indi-vidual B in her web of trust if B has already put A inher web of trust.

Although preferential attachment and reciprocityare network-based effects (node-level and dyad-leveleffects) and have been considered to be importantdrivers of network evolution, they fail to explainmany network dynamics that one observes. For in-stance, an underlying problem with the preferentialattachment framework is that it does not explain whya person could be replaced by another as an opinionleader over time. If the preferential attachment werethe only mechanism, we would expect that a personwith a large number of incoming links will receive aproportionally larger fraction of new incoming links.In other words, an opinion leader will continue asan opinion leader forever without exerting substan-tial effort (even though new opinion leaders mayemerge). A simple examination of the Epinions dataillustrates that this is not the case—specifically, aftera popular reviewer becomes inactive for a while, the

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 4: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community4 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

number of additional incoming links that she obtainsin every period decreases dramatically.

We argue that a node’s “content” (i.e., nonnet-work characteristics) can help us explain such dynam-ics. For instance, if an opinion leader becomes inac-tive and stops writing reviews, others will prefer toseek the opinion of a reviewer who is active andprovides fresh information. In other words, time islikely to be an important moderator of the impacton opinion leadership of node characteristics such asthe number of reviews contributed by an individual.Although the total number of reviews should have animpact because more reviews provide more informa-tion, recently written reviews are likely to have higherimpact because they are more likely to provide newinformation. For instance, new reviews are likely tobe about new items for which few reviews exist, ormay provide newer insights on old items. (Stephenet al. (2012) suggest similar reasoning in an onlinediffusion context.) To understand this, we dividethe reviews written by every reviewer into “recentreviews” (written in the last time period, which is onemonth) and “past reviews” (older than one month)and assess their impact separately. To simultaneouslyunderstand whether time also moderates the impactof preferential attachment, we divide the trust linksobtained by a reviewer into those obtained recently(within the last one month) and those obtained in thepast (older than one month). We can expect recentreviews to significantly influence the current rate ofincoming links and past reviews to not. We can alsoexpect preferential attachment to have a significantinfluence. However, this is still an empirical question(especially the magnitudes of these effects), which weanswer using our formal model.

A related stream of literature has established thatthe attributes of a review such as its readability andcomprehensiveness may affect a reader’s response tothe product and the perception of the reviewer (Ghoseand Ipeirotis 2011, Kim and Hovy 2006, Liu et al.2007, Otterbacher 2009, Zhang and Varadarajan 2006).Reviewers may also express their subjective opinionsor objective facts, and a mix of both may be mostpreferred. In other words, textual characteristics ofreviews can influence opinion leadership, and we testthis formally as well.

Finally, relationships between individuals offline areoften characterized by homophily, which refers to atendency for people who belong to the same demo-graphic or social category, such as age or gender, to beconnected to each other (McPherson et al. 2001). Thereis some uncertainty about the extent to which sharinga demographic or social category produces homophilyin an online context (Van Alstyne and Brynjolfsson2005); it appears that although similarity in demo-graphic categories does not lead to tie formation in an

online context, similarity in certain latent constructs(as measured by expressed characteristics in reviews)leads to tie formation. In the context of Epinions, theexpressed characteristics to measure homophily couldbe the review writing styles. We expect that those pairsof individuals who have similar review writing styleswould be more likely to form ties with each other, andwe incorporate this into our model.

3. Model DevelopmentWe develop a stochastic network growth model con-ceptualized at the dyad level with directional ties.3

Because networks evolve over time, network tie for-mation data is typically right censored. Hence, insteadof modeling tie formation as a discrete-choice process,we model it as a timing process by using a propor-tional hazard model (Greene 2003), that is, there is abaseline hazard rate for tie formation, moderated bydyad- and direction-specific quantities. We describethis below.

Consider the formation of a directed tie from indi-vidual i to individual j . We use the time period forwhich both individuals i and j have been present inthe community as the starting point of the timing pro-cess for this potential tie, and denote the time fromthe start to the current time as t. The hazard rate fortie formation from i to j is denoted as

�ij4t5= �04t5exp8Vijt90

In the above, �04t5 is the baseline hazard rate attime t, which describes the inherent propensity of twoindividuals to form a link without considering otherfactors. We assume that �04t5 follows a Weibull distri-bution to allow for a flexible baseline hazard rate:

�04t5= �0�1t�1−11 where �01�1 > 00

The quantity exp8Vijt9 increases or decreases the base-line hazard rate for the formation of a directedtie from i to j at time t, based on the values oftime-varying dyad- and direction-specific covariates.We interpret Vijt as the “adjustment factor” for thelatent propensity of a node i to extend a link tonode j at time t, conditional on this not having hap-pened yet. This conditional probability of i linking toj increases with Vijt , and it incorporates the variouscovariates that are expected to influence link forma-tion (based on the theory discussed in the previoussection). We let zijt be the set of sender-, receiver-and dyad-specific covariates for the dyad ij at time t.

3 Some other papers that develop stochastic models for networkphenomena include those by Ansari et al. (2011), Braun and Bon-frer (2011), Handcock et al. (2007), Hoff et al. (2002), Robins et al.(2007), and Snijders et al. (2006).

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 5: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 5

Then, the above can be written as Vijt = zijtÂ, where is the vector of coefficients for zijt . We discuss indetail the different covariates included in zijt in §4. Asan example at this point, note that we can incorporatethe preferential attachment process by including thecovariate Degreejt , which is the in-degree of node j attime t. (In other words, if the coefficient for Degreejtis larger, then the probability of i extending a tie to jat time t is higher.)

Whereas the above incorporates observed charac-teristics, we also need to control for unobserved char-acteristics in a dyad. For example, the sender nodescould be inherently more active (or passive), and thereceiver nodes could be inherently more attractive(or unattractive). To account for this, we incorpo-rate node-specific unobserved effects as Vijt = zijt +

ai + bj , where ai is the sender-specific unobservedrandom effect (that accounts for the “activity rate” ofnode i5, and bj is the receiver-specific unobserved ran-dom effect (that accounts for the “attractiveness” ofnode j5. The sender- and receiver-specific effects ofthe same individual are allowed to be correlated witheach other as

(

aibi

)

∼ MVN(

01[

�2a �ab

�ab �2b

])

0

Furthermore, the extant sociology literature consid-ers homophily as a key driver of link formation in asocial network (McPherson et al. 2001), which impliesthat links are formed between similar individuals.We explicitly incorporate both observed and unob-served homophily in our model. The observed sim-ilarity in behavior is captured using dyad-specificvariables in zijt , and the unobserved dyad-specifichomophily is captured by using a dyad-specific unob-served random effect, dij , as Vijt = zijt + ai + bj +

dij , where dij ∼ MVN401�2d 5.

4 Furthermore, we assumethat the dyad-specific unobserved effects are symmet-ric, that is, dij = dji.

We can present Vijt above in a simplified mannerby aggregating the random effects together with thecorresponding covariates as

Vijt = 4xitÂi+ ai5+ 4xjtÂ

j+ bj5+ 4xijtÂ

ij+ dij51

where zijt = 6xit xjt xijt7, and Âi contains coefficientsfor sender-specific covariates, Âj contains coefficientsfor receiver-specific covariates and Âij contains coeffi-cients for dyad-specific covariates. Therefore, xitÂ

i +aiis the sender effect, xjtÂ

j +bj is the receiver effect, andxijtÂ

ij + dij is the dyad effect.

4 A richer approach for capturing unobserved homophily is tocluster individuals in multidimensional space representing latentcharacteristics. See Braun and Bonfrer (2011) for an excellentapplication.

Figure 1 Illustration of Link Formation Time and Censoring TimeUsed in the Model

i puts j in herweb of trust

Reviewer i enterscommunity

Reviewer j enterscommunity

End of observationwindow

Cij = Cji

Tij

Note. In this figure, ij = 1 and kij = floor(Tij ), and j i = 0 and kj i = Cj i .

We now derive the conditional likelihood func-tion for the above model. We fix the unit of timein our model as one month. Our data is right cen-sored because we do not observe whether ties areformed or not after the end of our observation timewindow. Let Cij be the number of time periods forwhich dyad ij has been observed, and let Tij be thelength of time from the starting point to the timeperiod when i extends a tie to j . (Note that Cij andCji are always equal, but Tij is, in general, differentfrom Tji.) We define ij = 1 if Tij ≤ Cij (i.e., if a tieformed within the observation time) and 0 otherwise,and kij = floor4min8Tij1Cij95. We present this graphi-cally in Figure 1.

Using the notation above, the log-conditional-likelihood function (i.e., conditional on knowing aspecific directed dyad’s latent parameters) can bewritten as

logL

=∑

i1j 6=i

{

ij ·log61−exp8−exp6�4kij5+zij1kijÂ+ai+bj+dij797

kij−1∑

t=0

exp6�4t5+zijtÂ+ai+bj +dij 7

}

1 (1)

where �4t5= log8∫ t+1t

�04u5du9. (See Appendix A forthe detailed procedure of deriving this expression.)

Before we proceed further, we make a few notes.First, the above model does not account for unob-served heterogeneity either in the baseline hazard rateor in the coefficients for the covariates. We use thissimple (yet still quite rich) model for our basic analy-sis and then extend it in Online Technical Appendix Bto include both kinds of heterogeneity above. Second,a key distinction of our model from most otherstochastic models of network growth (including theBarabasi and Albert (1999) framework) is that thosemodels typically do not predict which and whentwo nodes will form a link, whereas we model thisexplicitly. Finally, although our model above sharessome commonalities with those of Hoff (2005) andNarayan and Yang (2007), we extend their models in

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 6: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community6 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

many ways—most importantly, we incorporate time-varying covariates, which they do not.

4. Data4.1. Data DescriptionEpinions allows reviewers to post reviews, and allowsthem to put other reviewers whom they trust in theirweb of trust. Reviews are organized by product cate-gories, such as movies, cars, books, music, electronics,home and garden, etc. Reviews in different productcategories may have different properties, and commu-nities focusing on different products may have dif-ferent preferences. For example, reviews that focusprimarily on objective details of products may be pre-ferred for electronics but not to the same extent formovies. To avoid mixing the different preferences ofpeople reading and writing reviews in different prod-uct categories, we focus on the “Movies” review com-munity. We further restrict our focus on registeredmembers who have written at least one review on anymovie to ensure that the individuals in our data setindeed have an expressed interest in movies. We relaxthese constraints on data collection later in §6.

To crawl our data on the network of movie review-ers, we first constructed a comprehensive list offeature films released between 1888 and 2008 aslisted on http://www.imdb.com/year, and took theintersection of this list with movies reviewed onEpinions. This process gave us 19,851 movie titles.Next, we searched for all reviews written for any ofthese movies on Epinions and constructed the list ofreviewers who wrote these reviews. From this list,we selected reviewers who registered at Epinionsbetween January 2002 and December 2008.5 For eachof these reviewers, we collected data on which othersthey added in their web of trust and at what time,and constructed the full network of trust among thesereviewers. In addition, for each reviewer, we crawledthe full text of each review she wrote and the datewhen it was written.

The resulting data set contained 6,705 reviewerswith 2,315 ties among them (out of 44,950,320 dyads)and a total of 27,634 reviews written. We furtherdivided this data set into a calibration sample anda holdout sample. The calibration sample containedreviewers who entered the movie community betweenJanuary 2002 and December 2005 (5,180 reviewers whoformed 1,906 ties with each other and wrote 21,049reviews). The holdout sample, employed to evalu-ate the model’s predictive performance, consisted of

5 We consider individuals who started their activity only after 2002because the information about the dates when web of trust tiesbetween individuals were formed is not available for ties beforeJanuary 2002, which leads to a left-censoring problem.

reviewers who entered the movie community betweenJanuary 2006 and December 2008 (1,525 individualswho formed 160 ties with each other and wrote 6,585reviews).

4.2. Variable DescriptionAs stated before, the variables that we employ can bedivided into three categories: receiver-specific covari-ates, sender-specific covariates, and dyad-specificcovariates.

4.2.1. Receiver-Specific Covariates. This categoryconsists of variables that provide information regard-ing the intended receiver of a potential tie, andincludes the aggregate number of reviews writtenuntil time t−1, the additional number of reviewswritten at time t, the total number of incominglinks until time t−1, the additional incoming links attime t, and the average comprehensiveness, readabil-ity, and objectivity scores across all reviews writtenuntil time t by the receiver. Among these variables,the total number of incoming links until time t−1and the additional incoming links at time t are mea-sures of the opinion leadership status of this receiverat time t. If the preferential attachment effect is promi-nent in our data, then the coefficients for these vari-ables will be positive and significant. The aggregatenumber of reviews written until time t−1 is used tomeasure how active a reviewer has been until timet−1. The “recency” variable, constructed as the addi-tional number of reviews written in time period t(i.e., in the last one month), measures how active areviewer was in the most recent period.

We use the text mining tool LingPipe (Alias-I 2008)to process the texts of the reviews and obtain textproperties such as comprehensiveness, readability,and objectivity for each review. We use the numberof sentences in the text of a review as an indicator ofthe Comprehensiveness of the review—generally longertexts contain more information, and thus are expectedto be more comprehensive (Otterbacher 2009).

We measure the Readability of a review by measur-ing the complexity of its writing style by calculatingthe Gunning fog index (GFI) of the text of the review.This is a widely used measure in linguistics (DuBay2004), and is calculated using the following formula:

Readability = GFI=004∗(average sentence length+number of hard words for each 100 words)1

where a “hard” word is defined as a word with morethan two syllables. Note that a larger value of Read-ability for a review implies that the review is harderto read.

To calculate the Objectivity of each review, we fol-low Pang and Lee (2004) and classify each sentence

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 7: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 7

in the review as an objective or a subjective sen-tence (automated using a high-accuracy support vec-tor machine classifier pretrained for movies on a largemovie data set, developed by Pang and Lee 2004).In this case we follow the standard definition in themachine learning community—an objective sentenceis one that talks about the plotline of the movie, andall other sentences are classified as subjective. Subse-quently, the Objectivity of a review is defined as thetotal number of objective sentences divided by thetotal number of sentences in a review.

Epinions designates certain reviewers as “TopReviewers” and displays this label next to their pro-file. It is reasonable to expect that reviewers witha rank label will obtain more trust links. We there-fore include a covariate, Is Top Reviewer, which indi-cates the rank of a reviewer. Note that viewers donot know the exact rank of each reviewer and onlyobserve whether the reviewer is a “top 10,” “top 100,”or “top 1,000” reviewer, or not a top reviewer at all.Therefore, we code the values of this covariate as 3, 2,and 1 if the reviewer is in the top 10, top 100, or top1,000, respectively, and 0 if the reviewer is not on the“Top Reviewer” list (i.e., we code the rank variableon a log scale based on the range in which the truerank falls).

4.2.2. Sender-Specific Covariates. This categoryconsists of variables that provide information on thesender of a potential tie, and includes the aggre-gate number of reviews written until time t and thetotal number of outgoing links from this sender untiltime t. These variables are employed to control forhow active a sender is. We would expect that senderswho were more active in the past have a higher prob-ability of extending links to other reviewers at a givenpoint in time.

4.2.3. Dyad-Specific Covariates. This categoryconsists of variables that provide information regard-ing the dyad in question and includes measuresfor reciprocity, homophily, and commonly trustedreviewers between the two individuals in the dyad.In our research, we measure reciprocity as a binaryvariable. If tie from j to i already exists at time t,the reciprocity variable equals 1, and 0 otherwise.We include the absolute differences in averagereadability, average objectivity, and average com-prehensiveness of the reviews written by i and j asobservable measures of homophily.

Finally, if the sender and receiver are connectedto the same nodes then, as past research has shown,there is a higher chance of a link being formed (Hillet al. 2006). Therefore, we include as a covariate thenumber of commonly trusted reviewers between thesender and the receiver. Note that whereas our corehazard process treats dyads as independent, introduc-ing this covariate relaxes that assumption.

In Table 1, we provide the variable definitions anddescriptive statistics for these variables for our datafor the “Movies” community.

5. Estimation and Results5.1. Estimation Methodology: WESBIWe have 5,180 individuals in our calibration data set,which generates 26,827,220 dyads. Because we needto calculate the hazard rate for each of 48 time periodsfor each dyad (January 2002 to December 2005), thetotal amount of computation is very time expensive.This is a challenge that is often encountered in largescale dyad-level studies of networks (e.g., Braun andBonfrer 2011).

We develop a new methodology to meet the gapbetween the huge amount of data that needs to beprocessed and the limited computing power at ourdisposal. One of the key characteristics of our dataset is that the proportion of the dyads that actuallyform a tie is very small—only 1,906 ties are formedout of the nearly 27 million ties possible. To strikea balance between accurate estimation and compu-tation time, we adapt the weighted exogenous sam-pling maximum likelihood estimator first developedin the choice-based sampling literature by Manski andLerman (1977) for discrete-choice data. We extend theweighted exogenous sampling concept to timing dataand also develop a Bayesian inference procedure forestimation and name our technique Weighted Exoge-nous Sampling with Bayesian Inference.

To employ this method, we collect all of the dyadsthat actually form ties within the observation timewindow and randomly sample from the dyads thatdo not form a tie within the observation time win-dow. By aggregating these two sets of dyads, we con-struct a much smaller data set (we call this smallerdata set as the sampled data set). And then, instead ofmaximizing the expression in Equation (1), we use thefollowing weighted log-conditional-likelihood functionfor Bayesian inference over our new data set:

logL

=w1

(

4ij=15

{

log[

1−exp8−exp6Á4kij5

+zij1kijÂ+ai+bj +dij 79

]

kij−1∑

t=0

exp[

Á4t5+zijtÂ+ai+bj +dij]

})

+w0

(

4ij=05

{

kij−1∑

t=0

exp6Á4t5+zijtÂ+ai+bj +dij 7

})

1 (2)

where w1 and w0 are the weights of the log-condi-tional-likelihood functions for the ties that were

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 8: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community8 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

Table 1 Variable Definitions and Descriptive Statistics

DescriptiveVariables Definition statisticsa

Receiver characteristicsReceiver’s

PrevAggReviewThe aggregate number

of reviews writtenuntil time t−1

1034 (5.15)

Receiver’s CurReview The additional numberof reviews written attime t

0007 (0.61)

Receiver’sPrevAggOpnLeadership

The aggregate numberof incoming linksuntil time t−1

0068 (15.61)

Receiver’sCurOpnLeadership

The additional numberof incoming links attime t

0002 (0.40)

Comprehensiveness The averagecomprehensivenessof reviews until time t

14041 (17.94)

Objectivity The average objectivityof reviews until time t

0021 (0.21)

Readability The average readabilityof reviews until time t

14006 (11.90)

Top Reviewer Label The rank of the receiveras reviewer on “TopReviewer” list attime t

0.0101 (0.1002)

Sender characteristicsSender’s AggReview The aggregate number

of reviews writtenuntil time t

1041 (5.32)

Sender’sAggOutgoingLink

The aggregate numberof incoming linksuntil time t

0071 (15.63)

Dyad characteristicsDissimilarity in

ObjectivityThe absolute difference

between averageobjectivity of reviewsby sender andreceiver until time t

0002 (0.08)

Dissimilarity inComprehensiveness

The absolute differencebetween averagecomprehensivenessof reviews by senderand receiver untiltime t

1084 (7.95)

Dissimilarity inReadability

The absolute differencebetween averagereadability of reviewsby sender andreceiver until time t

1079 (9.43)

Reciprocity Whether the link fromreceiver to senderexists at time t

0.0003 (0.0160)

Commonly TrustedReviewers

The number ofreviewers trusted byboth sender andreceiver at time t

0.0022 (0.0697)

aNumbers outside parentheses are the means for the “Movies” data set,and those in parentheses are the corresponding standard deviations.

formed and the ties that were not formed, respec-tively. Here, w0 = 41−Q15/41−H15 and w1 =Q1/H1,where Q1 is the fraction of the ties formed in thewhole population, and H1 is the fraction of the tiesformed in the sampled data set.

We estimate the parameters of our model by usingan MCMC hierarchical Bayes estimation procedure,using a Gibbs sampler and the Metropolis–Hastingsalgorithm. The full estimation procedure is providedin Appendix B. In Online Technical Appendix A,we show, using a comprehensive simulation study,that the WESBI method can accurately recover modelparameters in a wide range of settings. Specifically,we find that sampling 10% of the empty dyads (andusing all the dyads that actually formed ties) workswell. Therefore, for the Epinions data set, we sampled10% of the dyads that did not form a link during ourobservation window. This final sampled data set has1,906 established ties and 2,682,531 pairs that did notform a tie. Whereas the estimation for the full data setrequires us to compute the likelihood of tie formationfor 26,827,220 pairs given parameter values in eachMCMC iteration, now we only need to evaluate thelikelihood of tie formation for 2,684,437 pairs in thesampled data set. Commensurate with this reductionin data, we reduce the estimation time by one orderof magnitude while still obtaining accurate parameterestimates.

We highlight WESBI as a powerful estimationmethodology that can be used for speedy and accu-rate estimation in other dyad-level network studiesas well. Nevertheless, it is advisable for future usersof WESBI to test its accuracy in settings that differwidely from those presented in our simulation results.

5.2. Estimation ResultsWe estimated our model in Matlab using the proce-dure in Appendix B. To reduce the autocorrelationbetween draws of the Metropolis–Hastings algorithmand to improve the mixing of the Markov chains, weused an adaptive Metropolis adjusted Langevin algo-rithm (Atchade 2006). We used the first 100,000 drawsfor burn-in and the last 25,000 to calculate the pos-terior distributions. To assess the convergence of theMarkov chains, we ran multiple chains using a setof overdispersed starting values and calculated thewithin-chain variance as well as between-chain vari-ance for the chains for each parameter. The result-ing scale reduction factor (Gelman et al. 2003) foreach parameter is very close to 1. In the first col-umn in Table 2, we present the posterior means of thecoefficients for the data for the “Movies” community.For the estimation, we standardized the values of allcovariates. We discuss these results below.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 9: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 9

Table 2 Parameter Estimates for Networks of Different Communities

Variables Movies Expanded network Cars Home and garden

Receiver characteristicsReceiver’s PrevAggReview 001278 001094 001578 001889Receiver’s CurReview 008981∗∗∗ 005361∗∗∗ 005997∗∗∗ 005046∗∗∗

Receiver’s PrevAggOpnLeadership 004283∗∗∗ 003596∗∗∗ 004996∗∗∗ 003370∗∗∗

Receiver’s CurOpnLeadership 003048∗∗ 002167∗∗∗ 003710∗∗∗ 002961∗∗∗

Comprehensiveness 003681∗ — 001668∗ 000920Objectivity −001706 — — —Readability −001319 — 000855 −001537(Comprehensiveness)2 −004609∗∗∗ — −003571∗∗∗ −003302∗∗∗

(Objecivity )2 −001147 — — —(Readability )2 −005193∗∗∗ — −003886∗∗ −002408∗∗∗

Top Reviewer Label 001478∗∗∗ 001845∗∗∗ 001939∗∗∗ 001648∗∗∗

Sender characteristicsSender’s AggReview 003178∗ 000899 001636 003315∗

Sender’s AggOutgoingLink 001873 002604∗ 002876∗ 001311

Dyad characteristicsDissimilarity in Comprehensiveness −001695∗ — −002447∗ −002875∗∗

Dissimilarity in Objectivity −002079∗ — — —Dissimilarity in Readability −000583 — −001866 −001683∗∗

Reciprocity 003007∗∗∗ 001379∗∗∗ 003679∗∗ 003488∗∗∗

Commonly Trusted Reviewers 002059∗∗∗ 001705∗ 002884∗∗∗ 002224∗∗∗

Hazard rate parametersLog4a05 −1307542∗∗∗ −1704816∗∗∗ −1304542∗∗∗ −1205319∗∗∗

Log4a15 −500568 −408296 −404363 −308514� 2d 001232∗∗∗ 001941∗∗∗ 002006∗∗∗ 003232∗∗∗

� 2a 003650∗∗∗ 003586∗∗∗ 003883∗∗∗ 002846∗∗∗

� 2b 002615∗∗∗ 002205∗∗∗ 004325∗∗∗ 001823∗∗∗

�ab 001068∗∗∗ 001593∗∗∗ 001849∗∗∗ 001072∗∗∗

∗, ∗∗, ∗∗∗ The 90%, 95%, and 99% credible intervals, respectively, do not include zero.

5.2.1. Receiver-Specific Effects. We find that thecoefficients for opinion leadership (both PrevAggOpn-Leadership and CurOpnLeadership) are positive and sig-nificant. This offers evidence for the traditional pref-erential attachment argument where individuals withmore incoming links have a higher probability ofreceiving additional incoming links in the currentperiod, given everything else equal. The coefficientsfor the impact of reviews written by a receiver tellan interesting story. The coefficient of the number ofreviews written in the current period (CurReview) ispositive and significant, whereas the coefficient forthe total number of reviews written until the pre-vious period is insignificant (PrevAggReview). Intu-itively, this indicates that only recent reviews boosta reviewer’s reputation and attract other individualsin the community to put her in their respective websof trust. On the other hand, the reviews written ear-lier do not influence others’ decisions of extendingtrust links to her, and do not contribute to the emer-gence or the maintenance of opinion leadership. Note,however, that the coefficient for CurReview is largerthan the coefficients for both PrevAggOpnLeadershipand CurOpnLeadership.

Taken together, these results tell an interestingstory—recent review activity is a stronger driver of

opinion leadership status than preferential attach-ment, but preferential attachment is a permanenteffect, whereas past review writing activity does nothave a significant effect. This is likely because trustlinks are not explicitly dated and therefore get val-ued as endorsements even if a long time has passed,whereas reviews become less valuable as the noveltyof information they provide reduces as time passes.Therefore, existing opinion leaders (those who havea large number of inlinks) are at an advantage interms of maintaining their position in the network.Contributing new content can also boost an individ-ual’s opinion leadership status; however, this effect isshort lived. If new content leads to new trust linksquickly, then these added inlinks will contribute tofuture opinion leadership increase through the pref-erential attachment effect.

A review’s textual characteristics also have a signif-icant impact on the emergence of opinion leadership.The coefficient for Comprehensiveness is significantand positive, and that of the squared term ofComprehensiveness is significant and negative. Thisindicates that members of the movie review com-munity have an inverse-U-shaped preference wherereviews that are somewhat longer than average lengthare most preferred, whereas reviews that are either too

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 10: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community10 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

long or too short are less preferred. The coefficient ofthe linear term of Readability is insignificant, whereasthe coefficient of the squared term of Readability is neg-ative and significant. This indicates that reviews withan average value of Readability are most preferred,whereas very simple or naïve reviews and very hardto read reviews are less preferred. The Objectivity ofa review does not have an impact, possibly becausereaders may have varied preferences for objective ver-sus subjective reviews, leading to an overall null effect.We also find that a top reviewer label has a significantand positive impact on link formation in a dyad.

5.2.2. Sender-Specific Effects. We find that theaggregate number of reviews written by a sender(AggReview) has positive and significant impact onthe probability that the sender extends ties to otherindividuals, which may indicate that there are somereviewers who are more involved in the community—they write reviews as well as develop their webof trust.

5.2.3. Dyad-Specific Effects. We find that reci-procity has a positive and significant impact on theformation of network ties, which is in agreement withmany other studies. Our results for the dissimilarityof textual characteristics between two reviewers alsosupport the traditional homophily argument. This isclear from the negative coefficients for dissimilarity ofcomprehensiveness and objectivity. We also find thatthe number of commonly trusted reviewers has a sig-nificant and positive impact on the formation of a linkin a dyad.

5.2.4. Baseline Hazard Rate. From the hazardrate parameters in Table 2, we can see that, asexpected, the general tendency of forming linksis relatively small in this online community (�0 =

1006×10−65. Furthermore, we find that the reviewers’baseline hazard rate of forming links decreases overtime 4�1 =0000645, which is similar to the effect ofdecreasing activity over time typically observed forindividual-level activity in the customer-base analysisliterature (e.g., Fader et al. 2005).

5.2.5. Unobserved Random Effects. The fact that�a, �b, and �d are significant indicates that randomeffects at the sender, receiver, and dyad levels existin the community, above and beyond the covariatesthat we use in our model. Moreover, �ab is significantand positive, which suggests that reviewers who areintrinsically more attractive are also more active inextending links to others.

5.3. Model PerformanceTo test the performance of our model, we usetwo alternative models as benchmarks: (1) a time-invariant hazard model with all covariates (as inNarayan and Yang 2007) and (2) a time-varying

hazard model with only network characteristics(and no node-level characteristics) as covariates (i.e.,Receiver’s PrevAggOpnLeadership, Receiver’s CurOpn-Leadership, Sender’s AggOutgoingLink, Reciprocity, andCommonly Trusted Reviewers). Traditional model per-formance statistics that provide accuracy measuresaveraged over all dyads cannot serve as good mea-sures because the ties formed in the network areextremely sparse.6 Hunter et al. (2008) proposed pro-cedures to evaluate how well a model fits real data ina social network context based on key structural prop-erties of the network. Hunter et al. (2008) proposeddegree distribution, dyadwise shared partner distri-bution, and the distribution of geodesic distances astest statistics to assess the goodness of fit of social net-work data. However, Hunter et al. (2008) proposedthese statistics for an undirected network. Because wedeal with a directed network, we use in-degree distri-bution, dyadwise commonly trusted reviewer distri-bution, and the distribution of geodesic distances asour model fit statistics. All the test statistics we reportin this section are with respect to the holdout sample.

We first calculate the values of the test statistics forthe holdout period of the actual network. We thensimulate tie formation in the holdout period using ourfull model and the two benchmark models. We calcu-late the test statistics for each model by running thesimulation 200 times. We compare the distributionsobtained from our full model and the two benchmarkmodels with the true distributions in Figure 2. In eachfigure, the x-axis depicts the test statistic, whereasy-axis depicts the percentage of individuals or dyadscorresponding to the test statistic in the holdout sam-ple (on a log scale). The solid black dots represent thetest statistic from the actual data set, and the boxesand whiskers represent the corresponding statisticsacross the simulated data sets. The whisker representsthe upper and lower limits of the 200 correspondingsimulated network statistics. The box represents the25th and 75th percentiles. If a box is missing for a spe-cific value of a network characteristic, it indicates thatthere is not even a single corresponding observationacross 200 networks. (For example, in the first panelin Figure 2(a), the box for in-degree ≥4 is missing.This indicates that among the 200 simulated networksfor the time-invariant hazard model, no network hasa node with in-degree that is ≥4.)

From Figure 2(a), we can see that the in-degree dis-tribution from the full model is very close to thatfor the actual network. In contrast, the time-invarianthazard model shows significant deviations from theobserved distribution for in-degree ≥3, and for the

6 Even a naïve model that predicts that no pairs form ties has anaccuracy of 99.99%, as only 160 ties are formed among 2,324,100possible pairs in the holdout sample.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 11: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 11

Figure 2 Performance Tests

Time-invarianthazard model

Only networkcharacteristics

Time-varyinghazard model

(a) In-degree distribution

100

In-degree

10–1

10–2

10–3

0 1 2 3 4+ 0 1 2 3 4+ 0 1 2 3 4+

Number of commonly trusted reviewers

0 1 2 3 4+ 0 1 2 3 4+ 0 21 3 4+

Time-invarianthazard model

Only networkcharacteristics

Time-varyinghazard model

(b) Dyadwise commonly trusted reviewer distribution

100

10–1

10–2

10–3

10–4

10–5

10–6

10–7

100

10–1

10–2

10–3

10–4

10–5

10–6

10–7

100

10–1

10–2

10–3

100

10–1

10–2

10–3

10–4

10–5

10–6

10–7

100

10–1

10–2

10–3

10–4

10–5

10–6

10–7

InfLength of geodesic path

(c) Geodesic distance distributionTime-invarianthazard model

Only networkcharacteristics

Time-varyinghazard model

6+54321Inf 6+54321Inf 6+54321

model with only network characteristics included,the predicted in-degree distribution differs signifi-cantly when the in-degree is ≥2. In other words,our full model performs significantly better than thetwo benchmark models in predicting the in-degreedistribution. From Figure 2(b), we can see that theactual data statistics for commonly trusted review-ers lie within the boxes corresponding to the fullmodel, indicating an excellent fit. In comparison, fortime invariant and only network characteristics mod-els, the actual data often lies outside the box or eventhe whiskers. From Figure 2(c), we can see that ourfull model outperforms the two benchmark modelson accurately predicting the geodesic distance distri-bution also.

From Figure 2, we can conclude that our full model(time-varying hazard model with all covariates) notonly performs well in predicting key network statis-tics in the holdout sample, but is also superior tothe two alternative benchmark models. To illustratethe importance of node-level characteristics, we cansee that the performance of the model with only net-work characteristics is always lower than that of ourmodel as well as that of the time-invariant hazardmodel. This emphasizes that node characteristics area major driver of link formation in the network evo-lution process. The time-invariant hazard model per-forms better than the model with only network char-acteristics; however, its performance is significantlyinferior to our full model. The above performancetests strongly indicate that our proposed model per-forms significantly better than the benchmark models,which shows the importance of incorporating bothnode characteristics and dynamics into the model.

6. Extensions and Robustness ChecksIn this section, we extend our basic analysis in threedifferent ways. First, we stratify our data set basedon opinion leadership status and find that the strate-gies of forming links employed by individuals withhigh opinion leadership statuses (HOLSs) are verydifferent from those employed by individuals withlow opinion leadership statuses (LOLSs). Second, weconsider an expanded network by crawling data inde-pendent of categories and also including followers ofreviewers who may not have written any reviews.Third, we conduct our analyses in two other productcategories.7

7 In addition to these extensions, we also estimate a random coef-ficients model to capture the potential unobserved individual het-erogeneity. We find that the impacts of preferential attachmentand recency are qualitatively the same as in the model withhomogenous individuals. Details are available in Online TechnicalAppendix B.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 12: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community12 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

Table 3 Parameter Estimates for the “Movies” Category for Individuals with Different Opinion Leadership

All links that are formed in data set are included Only links that are formed first are included

Variables Low opinion leadership High opinion leadership Low opinion leadership High opinion leadership

Receiver characteristicsReceiver’s PrevAggReview 000347 001961∗ 000358 001972∗

Receiver’s CurReview 006368∗∗∗ 004134∗∗ 006276∗∗∗ 004017∗∗

Receiver’s PrevAggOpnLeadership 003828∗∗ 000583 003811∗∗ 000541Receiver’s CurOpnLeadership 003533∗∗ 000420 003391∗∗ 000392Comprehensiveness 004105∗ 001951∗∗∗ 004224∗ 002126∗∗∗

Objectivity 001452 −001374∗ 001315 −001417∗

Readability 000525 −001271∗ 000414 −001378∗

(Comprehensiveness)2 −005152∗∗∗ −001022∗∗∗ −005241∗∗∗ −001216∗∗∗

(Objectivity )2 000436 −000896 000487 −000802(Readability )2 −004960∗∗∗ −002610∗∗∗ −005128∗∗∗ −002602∗∗∗

Is Top Reviewer 001785∗∗∗ −001432∗∗ 001763∗∗∗ −001491∗∗

Sender characteristicsSender’s AggReview −001019∗∗∗ −002410∗∗∗ −000988∗∗∗ −002429∗∗∗

Sender’s AggOutgoingLink 000059∗∗∗ 000900 000062∗∗∗ 000816

Dyad characteristicsDissimilarity in Comprehensiveness −002319∗ −000633 −002332∗ −000602Dissimilarity in Objectivity −001251∗∗∗ −002205∗∗∗ −001121∗∗∗ −002251∗∗∗

Dissimilarity in Readability −000468 −000006 −000438 −000008Reciprocity 002447∗∗∗ 004094∗∗∗ — —Commonly Trusted Reviewers 001643∗∗∗ 001909∗∗ 001629∗∗∗ 001948∗∗

Hazard rate parametersLog4�05 −1407342∗∗∗ −1104360∗∗∗ −1505124∗∗∗ −1204193∗∗∗

Log4��15 −407773 −503882 −402149 −503251� 2d 001656∗∗∗ 001198∗∗∗ 001643∗∗∗ 001219∗∗∗

� 2a 004253∗∗∗ 003760∗∗∗ 004227∗∗∗ 003817∗∗∗

� 2b 003728∗∗∗ 004617∗∗∗ 003721∗∗∗ 004642∗∗∗

�ab 001651∗∗∗ 001867∗∗∗ 001643∗∗∗ 001899∗∗∗

∗, ∗∗, ∗∗∗ The 90%, 95%, and 99% credible intervals, respectively, do not include zero.

6.1. Analysis with Stratification Based onOpinion Leadership

We use the data set described in §4 and classify allindividuals in our sample into two groups based ontheir opinion leadership statuses. Individuals with<10 incoming links at the end of our calibrationperiod (December 2005) are classified as having lowopinion leadership status (LOLS), and the remain-ing individuals are classified as having high opinionleadership status (HOLS). Based on this, 5,100 and80 individuals are classified in the LOLS and HOLScategories, respectively. We then stratify all dyads intotwo groups based on the type of sender. The firstgroup corresponds to all pairs where the tie senderhas low opinion leadership status, and the secondgroup corresponds to all pairs where the tie senderhas high opinion leadership status. To illustrate howthe behaviors of these two groups of senders differfrom each other, we estimate our model for the twosamples separately. We report the results in the firsttwo columns of Table 3.

We uncover an interesting insight into the con-trasting strategies for extending trust links employedby individuals with low and high opinion leader-ship statuses. Whereas low opinion leadership status

individuals extend links to others who have high pre-vious and current opinion leadership status and aretop reviewers, high opinion leadership status indi-viduals extend links to low-status individuals. Onepotential explanation for this finding is provided byMayzlin and Yoganarasimhan (2012): those with aweak network position (LOLS reviewers) want tosignal their ability by finding and linking to HOLSreviewers, whereas those with a strong network posi-tion (HOLS reviewers) do not want to promote otherstrong individuals (HOLS reviewers) as competitors.In addition, LOLS individuals, who can in fact be con-sidered opinion seekers, are seeking access to high-quality reviews for themselves, which individualsidentified by others as top reviewers or opinion lead-ers can provide. In comparison, the HOLS individualswant to retain their followers and gain even higherleadership status by attracting others. Hence, a highopinion leadership status individual would not preferto extend a link to another high opinion leadershipstatus individual, because she may risk losing her fol-lowers to the other opinion leader.

We now conduct a robustness check to alleviate theconcern that reciprocity drives the results presentedabove. We estimate our model with the same stratifi-cation of the data as above, but for pairs of nodes that

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 13: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 13

have reciprocated links, we include only those linksthat are formed first. In other words, if A and B aretwo nodes with the edges A→B and B→A both exist-ing, and, say, A→B is formed before B→A is formed,then we remove the edge B→A from the data. Byartificially removing all the links that could possiblybe reciprocated, we completely remove reciprocity asa possible factor in link formation.8 We provide theresults of the model estimated on these data in thelast two columns of Table 3. Comparing these esti-mates with the estimates in the first two columns ofTable 3, we find that there is no qualitative differencebetween the two sets of results.

6.2. Analysis for an Expanded,Category-Independent Network with“Followers” Included

In §5, we considered only the “Movies” community.In this section, we test our findings on a much larger,category-independent data set in which we alsoinclude individuals who only passively follow otherreviewers without themselves writing any reviews.To collect this data set, in the first step, we use allindividuals in the “Movies” community (as describedin §4.1) as the seeds for network crawling. To coverthe possibility that some parts of the network areunreachable from the “Movies” community, we fur-ther randomly sample 100 individuals from everyother product review community, such as “Cars,”“Computers and Software,” “Home and Garden,”etc., and include them as part of the seed group aswell in this step. In the second step, we start from thisseed group and collect data on all individuals who arein the webs of trust of the members in the seed group,as well as all individuals who put members in theseed group in their web of trust. These new membersare then included in the seed group. We repeat thesecond step until this crawled network stops expand-ing. Considering individuals who registered on thewebsite between January 2002 and December 2008,we obtain a network with almost twice the numberof nodes as in the calibration data described in §4,and that includes 10,669 individuals with 3,396 ties.Based on this much larger network, we estimate ourmodel (without considering the textual characteris-tics of reviews). We present the results in the secondcolumn of Table 2. These results show that, in thismuch larger network as well, the effect of intrinsicnode characteristics on the dynamics of network evo-lution differs from the effect of network-based nodecharacteristics—whereas the impact of previous opin-ion leadership carries over into future periods, previ-ous reviews written have no significant impact on therate of forming ties.

8 We thank an anonymous reviewer for suggesting this analysis.

6.3. Analysis for Other Product CategoriesTo check the robustness of our estimation results, wereplicated our analysis on the “Cars” and the “Homeand Garden” categories. We construct the data setsfor these two categories by restricting ourselves toreviewers who entered between January 2002 andDecember 2008 and wrote at least one review onthe topic of the associated community.9 The resulting“Cars” reviewer community includes 1,059 individu-als with 225 ties formed within the community, andthe “Home and Garden” community comprises 1,120individuals with 457 ties formed within the commu-nity. We present the results for the “Cars” and the“Home and Garden” communities in the third andfourth columns of Table 2, respectively.

As we can see in Table 2, most of the results thatwe found for the “Movies” community—most not-ably the result that only recent reviews, and notpast reviews, have an impact on opinion leader-ship status, whereas both past and recent trustlinks have an impact—also hold for the “Cars” and“Home and Garden” categories. Note, however, thatin both the “Cars” and “Home and Garden” cate-gories, the recency effect is weaker than that in the“Movies” category. One possibility is that readers inthe “Movies” community care more about movies thatwere released recently rather than about old movies,leading to a stronger recency effect. Interestingly, thefact that this effect is salient in both the “Cars” and“Home and Garden” communities, in which morerecent products are expected to be less important forconsumers than in the “Movies” category, indicatesthat the recency effect argument is applicable in awide range of scenarios.

7. Conclusions and ManagerialImplications

We model opinion leadership in a community usinga social network paradigm. We show that whereasphenomena highlighted in the extant literature, suchas preferential attachment and reciprocity, are impor-tant drivers of network growth, intrinsic properties ofnodes such as recent activity and the style of writ-ing reviews (objectivity, readability and comprehen-siveness) are also very significant drivers of networkgrowth and, in our context, drivers of opinion lead-ership status. Our study is one of the first to investi-gate opinion leadership in a longitudinal setting with

9 We used snowball sampling to collect data for this network, whichimplies that we only detect individuals whom at least one otherperson has included in her web of trust. For the “Movies” category,we could start with a list of movies for which reviews were writtenand detect individuals who wrote reviews but were not connectedto others. Such an exhaustive list of products for “Cars” and “Homeand Garden” is extremely difficult to construct, so we work withthis limited data set for this extension.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 14: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community14 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

specific details about the opinion shared also avail-able (such as the time of opinion sharing and thecontent), and we significantly extend the emerging lit-erature on reputation building in online environments(Forman et al. 2008, Ghose et al. 2009). By incorpo-rating the time dimension into our study, we find thenovel and important result that intrinsic node charac-teristics are a stronger short-term driver of additionalinlinks, whereas the preferential attachment effect hasa smaller impact but it persists for a longer time. Ourresults are robust and hold consistently for the severaldifferent communities and network definitions thatwe consider.

Our findings have several important managerialand design implications for opinion-sharing websites.(Although we discuss the managerial implicationsin the context of Epinions, we believe they will bevalid for the numerous other networked online opin-ion sharing communities as well, such as MotleyFool, Seeking Alpha, IMDB, Yelp, etc.) Because ofthe manner in which Epinions and most other onlinereview communities are currently designed, the pres-ence of dominant reviewers whom a large numberof individuals already trust might hamper the emer-gence of new high-quality reviewers. This is becausepreferential attachment has a persistent impact oninlinks received, whereas review generation does not(unless it leads to new inlinks fairly quickly). There-fore, though it is not impossible for new review-ers who write up-to-date and high-quality reviews tobecome opinion leaders, it is nevertheless quite diffi-cult. A very simple and practical managerial solutionto this issue could be to attach a “lifetime” to the trustlinks, so that these votes of trust can be allowed to“expire” after a certain period of time. This wouldensure that reviewers cannot rest on the opinion lead-ership status that they have earned in the past. Theywill have to constantly share high-quality opinions, orelse have to secede opinion leadership status to newindividuals offering high-quality opinions, which willlead to an overall increase in the quality of informa-tion available in the community.

Furthermore, in any large online social networksuch as Epinions, it is a difficult task for users tofind relevant individuals among thousands of candi-dates for relationship formation. Epinions can lever-age our results in many ways to help reduce the costof such search. For example, it could display the list ofrecently-most-active reviewers along with the review-ers with the highest recent increase in opinion lead-ership. It could also develop and include a recencyscore for each reviewer as additional information inits search results ranking algorithm. Epinions can alsoask readers to rate reviews on different characteristicssuch as comprehensiveness, readability, and objectiv-ity (or automate this process using text mining). It can

then use these results to provide an average scorefor a reviewer on these characteristics. This wouldhelp the reader in deciding whether or not to read areview and whether or not to extend a trust link toa reviewer. Epinions could also provide a search toolthat could allow users to search reviews for a productbased on these desirable characteristics.

Our study contributes not only to furthering ourunderstanding of how opinion leaders emerge in net-worked communities, but also underscores the impor-tance of incorporating node-level characteristics innetwork growth models, a factor that has receivedlimited attention in the extant literature. Our resultsoffer an explanation for why the power law coeffi-cient for the in-degree distribution for the particu-lar online network from Epinions that we work with(having a value of 1.74) is smaller than the power lawcoefficients for in-degree distributions typically pre-dicted by the theoretical preferential attachment lit-erature (between 2 and 4; Barabasi and Albert 1999).(Note that this is true for various other popular onlinecommunities as well. For example, Mislove et al. 2007finds that the power law coefficient for the in-degreedistribution is 1.63 for YouTube and 1.74 for Flickr.)Intuitively, if individuals also take inherent node char-acteristics beyond in-degree (in our case, reviewerand review characteristics) into account when theyform ties, and individuals do not extend links tonodes with inferior node characteristics, then supe-rior node characteristics could help individuals attractadditional incoming links compared with networkswith pure preferential attachment. In this case, thepower law coefficient of the in-degree distributionwill be smaller, as we find it to be. In fact, dif-ferences in the relative importance of node charac-teristics for tie formation across different networksstudied in the extant literature may explain the dif-ferences in their power law coefficients. Followingthe arguments above, communities in which nodecharacteristics are important will have smaller powerlaw coefficients. This suggests that when researchersinvestigate the evolution of a network, they shouldnot focus solely on network characteristics such asdegree, betweenness measures, etc.; they should alsotake into account how characteristics of individualscan influence the evolution dynamics in a social net-work. (Note that theories of diffusion over existingnetworks and formation of networks on a small scaleconsider characteristics of individuals. However, theliterature, cited earlier, on generative models of large-scale networks has largely overlooked the importanceof node characteristics.) Therefore, these findings alsocontribute to the vast literature on scale-free net-works, why their macro-level characteristics may varyacross different settings, and why their degree distri-butions may not always be as skewed as theoreticalmodels based on preferential attachment predict.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 15: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 15

From the methodological perspective, we con-tribute to the literature on networks by developinga proportional hazard model of network evolutionthat is able to capture how time-varying covariatescan influence the probability of forming a directedtie between two nodes in a network. We extend theweighted exogenous sampling maximum likelihoodestimator developed by Manski and Lerman (1977)for binary choice data to duration data. Furthermore,we introduce a hierarchical Bayesian adaptation ofthe weighted exogenous sampling maximum likeli-hood estimator as a fast and effective way of deal-ing with the huge amounts of data that researchersand firms are typically faced with in the estimation ofdyadic models on network data. Often, the solutionemployed is to either simplify the model to be esti-mated or randomly sample a small part of the totalpopulation to reduce computational requirements.Our method, which involves selective sampling fol-lowed by appropriate reweighting of the sampleddyads, will help to reduce the degree to which suchcompromises need to be made. The results from oursimulation show that our proposed method can serveas a very effective heuristic when dealing with large-scale network data in a wide range of settings. How-ever, because we do not provide theoretical proofs,we suggest that researchers should check the accuracyof the WESBI method as appropriate for their settingbefore using it.

Our study can motivate future research in sev-eral directions. First, in this study we assume thatchanges over time in review writing styles (which are,in fact, minimal in our data) and in the frequencyof review writing are exogenous. It is possible that areviewer may learn over time and adjust these factorsbased on the readers’ response to her past reviews.A study that investigates reviewer learning would beinfluential in understanding the important but under-studied review-generation phenomenon. Second, ourstratification analysis in §6 indicates that reviewersare strategic in extending trust links to other review-ers based on opinion leadership status. It may beinteresting to investigate this in future studies. Third,we have only captured link formation and havenot looked at link dissolution, because the data thatwould be required are not available to us. Futurestudies can try to collect such data and study thefactors that affect link dissolution. Finally, it may beinteresting to consider the impact of product releasefrequency in a category on review generation and webof trust formation.

AcknowledgmentsAll authors contributed equally and have been listed in arandom order.

Appendix A. Derivation of the Log-Conditional-Likelihood FunctionFor the basic proportional hazard model,

�ij4t5=�04t5exp8zijtÂ91

the probability that the tie from i to j is not formed at timet+1, conditional on the fact that it is not formed yet attime t, is

P4Tij ≥ t+1 �Tt ≥ t5 = exp(

∫ t+1

t�ij4u5du

)

= exp(

−exp4zijtÂ5∫ t+1

t�04u5du

)

0

Here, we require the value of zijt to be invariant betweent and t+1. The conditional probability above can berewritten as

P4Tij ≥ t+1 �Tij ≥ t5=exp(

−exp4zijtÂ+�4t55)

1

where �4t5= log8∫ t+1t �04u5du9.

Let Cij be the length of time for which dyad ij has beenobserved, and let Tij be the length of time from the startingpoint to the time period when i extends a tie to j . Thus,the log-conditional-likelihood function for a data set withN individuals in this basic model is

logL =∑

i1j 6=i

{

ij ·log[

1−exp8−exp6�4kij5+zij1kijÂ79]

kij−1∑

t=0

exp[

�4t5+zijtÂ]

}

1

where ij =1 if Tij ≤Cij (i.e., if a tie formed within the obser-vation time) and 0 otherwise.

Appendix B. MCMC Inference for theTime-Varying Hazard ModelThe steps below provide the details of the estimation pro-cess for the time-varying hazard model with homogeneousconsumer preferences. The procedure of estimating themodel with heterogeneous consumer preferences is verysimilar to the one illustrated below; we discuss the het-erogeneous case in Online Technical Appendix B. For theprocedures below, letters with superscript u represent thevalues of the updated corresponding parameters.

Step 1. Estimate Ã:

Ãu�Â1ai1bi1�01�11dij1data

f 4Ãu�Â1ai1bi1�01�11dij1data5

∝�èÂ0�−1/2 exp

[

− 12 4Ã

u−Ã̄05

′è−1Ã0 4Ã

u−Ã̄05

]

L4Y51

where Ã̄0 and èÃ0 are diffused priors. Because there is noclosed form for this, we use the Metropolis–Hastings algo-rithm to draw from this conditional distribution of Ãu. Theprobability of accepting Ãu is

Pr4acceptance5

=min{exp6−41/254Ãu−�̄05

′è−1Ã0 4Ã

u−�̄057L4Y �Ãu5

exp6−41/254Ã−�̄05′è−1

Ã0 4Ã−�̄057L4Y �Â511}

0

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 16: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community16 Management Science, Articles in Advance, pp. 1–17, © 2013 INFORMS

We define diffuse priors by setting Ã̄0 to be a vector of zerosand èÃ0 =30I .

Step 2. Generate aui , bui :

f 4aui 1bui �Âu1�u

01�u11dij1data5

∝N44aui 1bui �Âu1�u

01�u11dij51èab5L4Y5

∝�èab�−1/2 exp

[

−12 4a

ui 1b

ui 5è

−1ab 4a

ui 1b

ui 5

′]

L4Y50

Because this distribution does not have a closed form, weuse the Metropolis–Hastings algorithm to draw from theconditional distribution of ai, bi: ai, bi is the draw of therandom effect from the previous iteration, and we draw aui ,bui by

[auibui

]

=[

aibi

]

+ã[

ab

]

, where ã[

ab

]

is a draw from N401ã2å5,and ã and å are chosen adaptively to reduce autocor-relation among MCMC draws following Atchade (2006).The probability of accepting this

[auibui

]

, the updated value

for[

aibi

]

is

Pr4acceptance5

=min{

6exp4− 12 4a

ui 1b

ui 5è

−1ab 4a

ui 1b

ui 5

57L4Y �aui 1bui 5

6exp4− 12 4ai1bi5è

−1ab 4ai1bi5

′57L4Y �ai1bi511}

0

Step 3. Generate èuab : è

uab �a

ui 1b

ui

4èuab �a

ui 1b

ui 5∼ IW2

(

7+N1G−10 +

N∑

i=1

4aui 1bui 54a

ui 1b

ui 5

)

1

where IW2 denotes the inverse-Wishart distribution.Step 4. Generate du

ij1duji: d

uij1d

uji ��

u01Â

u1ai1bi1�u11�

2d 1data

f 4duij1d

uji ��

u01Â

u1ai1bi1�u11�

2d 1data5

∝N44duij1d

uji ��

u01Â

u1ai1bi1�u1 51�

2d 5L4Y5

∝�−1d exp

[

− 12 4d

uij +du

ji52�−2

d

]

L4Y50

We use the Metropolis–Hastings algorithm to draw fromthis conditional distribution of du

ij and duji2 dij and dji are

the draws of the unobservable similarity effects from theprevious iteration, and we draw du

ij and duji by

[duijduji

]

=[dijdji

]

+ãd, where ãd is a draw from N401ã2å5, and ã andå are chosen adaptively to reduce autocorrelation amongMCMC draws following Atchade (2006). The probability ofaccepting

[duijduji

]

is

Pr4acceptance5

=min{

6exp4− 12 4d

uij +du

ji5�−2d 57L4Y �du

ij1duji5

6exp4− 12 4dij +dji5�

−2d 57L4Y �dij1dji5

11}

0

Step 5. Generate �ud :

4�ud �du

ij1duji5∼ IW 1

(

1+N4N −1511+

N∑

i=1

N∑

j=11j 6=i

4duij +du

ji52)

1

where IW1 denotes the inverse-Wishart distribution.Step 6. If convergence is not reached, go to Step 1.

ReferencesAlias-I (2008) LingPipe home page. Accessed January 23, 2013,

http://alias-i.com/lingpipe/index.html.

Allison P, Long S, Krauze T (1982) Cumulative advantage andinequality in science. Amer. Sociol. Rev. 47(5):615–625.

Ansari A, Koenigsberg O, Stahl F (2011) Modeling multiple rela-tionships in social networks. J. Marketing Res. 48(4):713–728.

Atchade Y (2006) An adaptive version for the metropolis adjustedLangevin algorithm with a truncated drift. Methodology Com-put. Appl. Probab. 8(2):235–254.

Barabasi A, Albert R (1999) Emergence of scaling in random net-works. Science 286(5439):509–512.

Bonacich P (1987) Power and centrality: A family of measures.Amer. J. Sociol. 92(5):1170–1182.

Braun M, Bonfrer A (2011) Scalable inference of customer similari-ties from interactions data using Dirichlet processes. MarketingSci. 30(3):513–531.

Burt R (1999) The social capital of opinion leaders. Ann. Amer. Acad.Political Soc. Sci. 566(1):37–54.

Chan K, Misra S (1990) Characteristics of the opinion leader: A newdimension. J. Advertising 19(3):53–60.

Dorogovtsev S, Mendes J (2003) Evolution of Networks: From Bio-logical Nets to the Internet and WWW (Oxford University Press,New York).

DuBay W (2004) The principles of readability. Accessed January 23,2013, http://www.impact-information.com/.

Fader P, Hardie B, Lee K (2005) RFM and CLV: Using iso-valuecurves for customer base analysis. J. Marketing Res. 42(4):415–430.

Fehr E, Gachter S (2000) Fairness and retaliation the economics ofreciprocity. J. Econom. Perspect. 14(3):159–181.

Forman C, Ghose A, Wiesenfel B (2008) Examining the relationshipbetween reviews and sales: The role of reviewer identity dis-closure in electronic markets. Inform. System Res. 19(3):291–313.

Gelman A, Carlin J, Stern H, Rubin D (2003) Bayesian Data Analysis(Chapman and Hall/CRC, Boca Raton, FL).

Ghose A, Ipeirotis P (2011) Estimating helpfulness and economicimpact of product reviews: Mining text and reviewer charac-teristics. IEEE Trans. Knowledge Data Engrg. 23(10):1498–1512.

Ghose A, Ipeirotis P, Sundarajan A (2009) The dimensions of rep-utation in electronic markets. Working paper, New York Uni-versity, New York.

Gladwell M (2000) The Tipping Point (Little, Brown and Company,New York).

Gould S (2002) The Structure of Evolutionary Theory (Harvard Uni-versity Press, Boston).

Greene W (2003) Econometric Analysis, 5th ed. (Prentice Hall, UpperSaddle River, NJ).

Handcock M, Raftery A, Tantrum J (2007) Model-based clusteringfor social networks. J. Royal Statist. Soc. A 170(2):301–354.

Hill S, Provost F, Volinsky C (2006) Network-based marketing:Identifying likely adopters via consumer networks. Statist. Sci.22(2):256–276.

Hoff P (2005) Bilinear mixed-effects models for dyadic data. J. Amer.Statist. Assoc. 100(469):286–295.

Hoff P, Raftery A, Handcock M (2002) Latent space approachesto social network analysis. J. Amer. Statist. Assoc. 97(460):1090–1098.

Holland P, Leinhardt S (1972) Reply: Some evidence on the tran-sitivity of positive interpersonal sentiment. Amer. J. Sociol.77(6):1205–1209.

Hunter D, Goodreau S, Handcock M (2008) Goodness of fit of socialnetwork models. J. Amer. Statist. Assoc. 103(481):248–258.

Iacobucci D, Hopkins N (1992) Modeling dyadic interactions andnetworks in marketing. J. Marketing Res. 29(1):5–17.

Iyengar R, Van den Bulte C, Valente T (2011) Opinion leadershipand social contagion in new product diffusion. Marketing Sci.30(2):195–212.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 17: The Emergence of Opinion Leaders in a Networked Online … · Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online Community Management Science, Articles in Advance,

Lu, Jerath, and Singh: Emergence of Opinion Leaders in a Networked Online CommunityManagement Science, Articles in Advance, pp. 1–17, © 2013 INFORMS 17

Jones J, Handcock M (2003) An assessment of preferential attach-ment as a mechanism for human sexual network formation.Proc. Royal Soc. 270(1520):1123–1128.

Katz E, Lazarsfeld P (1955) Personal Influence: The Part Played by Peo-ple in the Flow of Mass Communications (Free Press, Glencoe, IL).

Katz L, Powell J (1955) Measurement of the tendency towards recip-rocation of choice. Sociometry 18(4):403–409.

Kim S, Hovy E (2006) Automatic identification of pro and con rea-sons in online reviews. Annual Meeting of the ACL, Proc. COL-ING/ACL on Main Conf. Poster Sessions, Sydney, Australia.

King C, Summers J (1970) Overlap of opinion leadership acrossconsumer product categories. J. Marketing Res. 7(1):43–50.

Kossinets G, Watts D (2006) Empirical analysis of an evolving socialnetwork. Science 311(5757):88–90.

Liu J, Cao Y, Lin C, Huang Y, Zhou M (2007) Low-quality productreview detection in opinion summarization. Joint Conf. Empiri-cal Methods in NLP and Comput. NLP (Association for Compu-tational Linguistics, Stroudsburg, PA), 334–342.

Manski C, Lerman S (1977) The estimation of choice probabilitiesform choice based samples. Econometrica 45(8):1977–1988.

Mayzlin D, Yoganarasimhan H (2012) Link to success: Howblogs build an audience by promoting rivals. Management Sci.58(9):1651–1668.

McPherson M, Smith-Lovin L, Cook J (2001) Birds of a feather:Homophily in social networks. Annual Rev. Sociol. 27(August):415–444.

Merton R (1968) The matthew effect in science. Science 159(3810):56–63.

Mislove A, Marcon M, Gummadi K, Druschel P, Bhattacharjee B(2007) Measurement and analysis of online social networks.Proc. 7th ACM SIGCOMM Conf. on Internet Measurement (ACM,New York), 29–42.

Myers J, Robertson T (1972) Dimensions of opinion leadership.J. Marketing Res. 9(1):41–46.

Narayan V, Yang S (2007) Modeling the formation of dyadic rela-tionships between consumers in online communities. Workingpaper, Cornell University, Ithaca, NY.

Otterbacher J (2009) “Helpfulness” in online communities: A mea-sure of message quality. Proc. SIGCHI Conf. Human Factor Com-put. Systems (ACM, New York), 955–964.

Pang B, Lee L (2004) A sentimental education. Annual Meeting ACL.Proc. 42nd Annual Meeting Assoc. Comput. Linguistics, Barcelona,Spain.

Robins G, Snijders T, Wang P, Handcock M, Pattison P (2007) Recentdevelopments in exponential random graph 4p∗5 models forsocial networks. Soc. Networks 29(2007):192–215.

Rogers E (2003) Diffusion of Innovation (Free Press, New York).Snijders T, Pattison P, Robins G, Handcock M (2006) New specifica-

tions for exponential random graph models. Sociol. Methodology36(1):99–153.

Stephen A, Toubia O (2009) Explaining the power-law degree in asocial commerce network. Soc. Networks 31(4):262–270.

Stephen A, Dover Y, Muchnik L, Goldenberg J (2012) The effects oftransmitter activity and connectivity on information dissemi-nation over online social networks. Working paper, Universityof Pittsburgh, Pittsburgh.

Tucker C, Zhang J (2010) Growing two-sided networks by adver-tising the user base: A field experiment. Marketing Sci. 29(5):805–814.

Valente T, Hoffman B, Ritt-Olson A, Lichtman K, Johnson A(2003) Effects of a social-network method for group assignmentstrategies on peer-led tobacco prevention programs in schools.Amer. J. Public Health 93(11):1837–1843.

Van Alstyne M, Brynjolfsson E (2005) Global village or cyber-balkans? Modeling and measuring the integration of electroniccommunities. Management Sci. 51(6):851–868.

Van den Bulte C, Joshi Y (2007) New product diffusion with influ-entials and imitators. Marketing Sci. 26(3):400–421.

Vernette E (2004) Targeting women’s clothing fashion opinion lead-ers in media planning: An application for magazine. J. Adver-tising Res. 44(1):90–107.

Watts D, Dodds P (2007) Influentials, networks, and public opinionformation. J. Consumer Res. 34(4):441–458.

Zhang Z, Varadarajan B (2006) Utility scoring of product reviews.Proc. 15th ACM Internat. Conf. Inform. Knowledge Management,Arlington, VA.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.