SSRN-id2358679 (2)

8
 An Empirical Study of the Financial Community Network on Twitter Steve Y. Yang, Sheung Yin Kevin Mo and Xiaodi Zhu Financial Engineering Program Stevens Institute of Technology Hoboken, New Jersey, USA [email protected], [email protected], [email protected]  Abstract    Twitter, one of the several major social media platforms, has been identified as an influential factor to financial markets by multiple academic and professional publications in recent years. The motivation of this study hinges on the growing popularity of the use of social media and the increasing prevalence of its influence among the financial investment community. This paper presents an empirical evidence of a financial community  in Twitter in which users interests align with the financial market. From a large-scale data gathering effort using Twitter API, we establish a methodology in extracting relevant Twitter users to form the financial community, and we present empirical findings of its network characteristics. We find that this financial community behaves similarly to a small-world network, and we further identify groups of critical nodes and analyze their influence within the financial community based on several network centrality measures. Moreover, we document that the sentiment extracted from tweet messages of these critical nodes is significantly correlated with the Dow Jones Industrial Index price and volatility series. By forming a financial community within the Twitter universe, we argue that the critical Twitter users within the financial community provide a better proxy between social sentiment and financial market movement. Hence, sentiment extracted from these critical nodes provides a more robust predictor of financial markets than the general social sentiment. I. INTRODUCTION The motivation of this paper rests on the belief that social mood or sentiment in social media predicates upward or downward movement of the financial markets. Twitter, one of the main social media platforms, has been identified as an influential factor to the financial market with multiple sources of empirical evidence among academic and professional  publications. The emergence of top influencers and the occurrence of the disruptive events such as the 2013 Associated Press Hoax have raised alarm on the potential threat of social media influence to the stability of the financial markets. As a potential solution to understand how Twitter influences financial market sentiment, this study defines a  financial community with Twitter users whose interests align with the financial market. This subset of the Twitter universe, an active investment community, contains more refined and  precise information related to the financial market, therefore its influence to market sentiment can be better captured. Our hypothesis is that financial market movement is a manifestation of the market participants’ beliefs and behaviors toward future outcomes, and the aggregate of the societal mood should present itself as a reliable predictor. Moreover, the investors who are harvesting this information for their daily trading decisions forms the robust linkage between the social mood and financial market asset price movement. In this paper, we seek to identify the corresponding investment community and pinpoint its major influencers in the social networks context. The primary research question is whether the beliefs and behaviors of major key players in such community reveals better signals to financial market movement. From a large-scale data crawling effort, we define a financial community as a group of relevant Twitter users with interests aligned with the financial market. We first identify 50 well- recognized investment experts’ accounts in Twitter and use their common keywords to create the interests of the financial investment community. By constructing the two layers of the experts’ followers, we apply a multitude of rigorous filtering criteria to establish a financial community  based on their persistent interests in the topic of financial investment. In theory, this community can be defined compared with the active financial investment community we desire in which signals of sentiment can be clearly interpreted. After settling on a definition, we examine how messages from key influencers in the community interact with social sentiment/mood that tend to signal an impending upward or downward swing in the market price movement. We use key network metrics such as out-degree centrality and  betweenness centrality to identify the financial community influencers and we conjecture that these key influencers along with their dynamic weight in their opinions will provide better  predictors of financial market movement measures. II. BACKGROUND  A.  Literature of Twitter and Social Media Social media has become a major channel of information worldwide. According to Kaplan and Haenlein, social media is defined as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content” [1]. These user-generated contents serve valuable source of information to business executives and decision- makers due to their profitable applications [1]. For example, Dell listed its continual sales and marketing effort in engaging customers through external social media channels in its

Transcript of SSRN-id2358679 (2)

Page 1: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 1/8

An Empirical Study of the Financial Community Network on Twitter

Steve Y. Yang, Sheung Yin Kevin Mo and Xiaodi ZhuFinancial Engineering ProgramStevens Institute of Technology

Hoboken, New Jersey, [email protected], [email protected], [email protected]

Abstract — Twitter, one of the several major social mediaplatforms, has been identified as an influential factor tofinancial markets by multiple academic and professionalpublications in recent years. The motivation of this studyhinges on the growing popularity of the use of social mediaand the increasing prevalence of its influence among thefinancial investment community. This paper presents anempirical evidence of a fi nancial community in Twitter inwhich users ’ interests align with the financial market.From a large-scale data gathering effort using TwitterAPI, we establish a methodology in extracting relevantTwitter users to form the financial community, and wepresent empirical findings of its network characteristics.We find that this financial community behaves similarly toa small-world network, and we further identify groups ofcritical nodes and analyze their influence within thefinancial community based on several network centralitymeasures. Moreover, we document that the sentimentextracted from tweet messages of these critical nodes issignificantly correlated with the Dow Jones IndustrialIndex price and volatility series. By forming a financialcommunity within the Twitter universe, we argue that thecritical Twitter users within the financial communityprovide a better proxy between social sentiment andfinancial market movement. Hence, sentiment extracted

from these critical nodes provides a more robust predictorof financial markets than the general social sentiment.

I. INTRODUCTION

The motivation of this paper rests on the belief that socialmood or sentiment in social media predicates upward ordownward movement of the financial markets. Twitter, one ofthe main social media platforms, has been identified as aninfluential factor to the financial market with multiple sourcesof empirical evidence among academic and professional

publications. The emergence of top influencers and theoccurrence of the disruptive events such as the 2013Associated Press Hoax have raised alarm on the potentialthreat of social media influence to the stability of the financialmarkets. As a potential solution to understand how Twitterinfluences financial market sentiment, this study defines a

financial community with Twitter users whose interests alignwith the financial market. This subset of the Twitter universe,an active investment community , contains more refined and

precise information related to the financial market, thereforeits influence to market sentiment can be better captured.

Our hypothesis is that financial market movement is amanifestation of the market participants’ beliefs and behaviorstoward future outcomes, and the aggregate of the societalmood should present itself as a reliable predictor. Moreover,the investors who are harvesting this information for theirdaily trading decisions forms the robust linkage between thesocial mood and financial market asset price movement.

In this paper, we seek to identify the correspondinginvestment community and pinpoint its major influencers inthe social networks context. The primary research question iswhether the beliefs and behaviors of major key players in suchcommunity reveals better signals to financial marketmovement. From a large-scale data crawling effort, we definea financial community as a group of relevant Twitter userswith interests aligned with the financial market. We firstidentify 50 well- recognized investment experts’ accounts inTwitter and use their common keywords to create the interestsof the financial investment community. By constructing thetwo layers of th e experts’ followers, we apply a multitude ofrigorous filtering criteria to establish a financial community

based on their persistent interests in the topic of financialinvestment. In theory, this community can be definedcompared with the active financial investment community wedesire in which signals of sentiment can be clearly interpreted.

After settling on a definition, we examine how messagesfrom key influencers in the community interact with socialsentiment/mood that tend to signal an impending upward ordownward swing in the market price movement. We use keynetwork metrics such as out-degree centrality and

betweenness centrality to identify the financial communityinfluencers and we conjecture that these key influencers alongwith their dynamic weight in their opinions will provide better

predictors of financial market movement measures.

II. BACKGROUND

A. Literature of Twitter and Social Media

Social media has become a major channel of information

worldwide. According to Kaplan and Haenlein, social media isdefined as “a group of Internet -based applications that buildon the ideological and technological foundations of Web 2.0,and that allow the creation and exchange of user-generatedcontent” [1]. These user-generated contents serve valuablesource of information to business executives and decision-makers due to their profitable applications [1]. For example,Dell listed its continual sales and marketing effort in engagingcustomers through “external social media channels ” in its

Page 2: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 2/8

2012 annual report [2] and subsequently reported $1 millionadditional revenue due to sales alerts from Twitter [3].

Twitter is a microblogging social media platform thatallows its users to send and read short messages of up to 140characters, commonly kno wn as “tweets”. It has evolved with517 million registered users as of 2012, generating over 340million tweets daily [4] [5]. The statistics is astonishing inrecent web development, and Twitter has been consistentlyranked as one of the top 15 websites with the highest volume[6]. Twitter also demonstrates strong growing presence in thesocial media platforms among the top 15 websites [6].

B. Literature of Twitter Influence to Financial Market

With Twitter becoming one of the largest social media platforms in the world, its influence to the financial markethas been documented. A survey conducted by ThomsonReuters indicates that majority of financial market

professionals and customers use social media for professionalreasons [7]. This is a revealing trend that social media carriesa substantial weight in influencing markets. Empiricalevidences also demonstrate strong correlation between the two

domains. Chen et al. finds that the articles and commentariesavailable on social media platforms contributes to the pricediscovery process and that investor opinions revealed onthese platforms “strongly predict future stock returns andearnings surprises” [8]. In a similar study, Ruiz et al. showedthe correlation between micro-blogging activity and stock-market events, and how it could be applied for developing

profitable strategies [9]. Bollen et al. further suggested thatincluding public mood dimensions enhance the accuracy ofmarket prediction [10]. A similar study conducted by Zhang etal. investigated a set of emotional text Twitter data of six

public opinion time series (dollar, $, gold, oil, job, andeconomy) and found their correlated effects with the financialmarket movement [11].

C. Literature of Social Network and Community Detection

A social network is “a set of nodes representing peo plethat are connected by links showing relations or flows

between them” [12]. A social network can be categorized intodifferent network types, and the identification process dependson network metrics such as average path length and averageclustering coefficient [13]. Large-scale networks acrossdifferent domains were observed to follow similar patterns,such as scale-free distributions, the small-world effect andstrong community structures [14]. The small-world

phenomenon was first proposed by Travers and Milgram that“social networks are in some sense tightly woven, full of

unexpected strands linking individuals seemingly far removedfrom one another in physical or social space” [15]. The small-world networks are often composed of “short chains ofacquaintances” among social participants [16], highlightingkey characteristics of small diameters and small average pathlengths [17]. High clustering coefficient is also noted relativeto “a purely random graph with the same number of links” [18]. In summary, small-world networks are graphs withclustering coefficients much larger than random networks andwith diameters that increase logarithmically with the number

of nodes [19]. Small-world networks are “ globally and locallyefficient” in exchanging information among the network [20]and examples were found in the neural networks,communication, and transport networks [20].

Community is defined as” the set of nodes such that theyinteract with each other more frequently than with those nodesoutside the group” [14]. Once a community is formed, itsaggregate behavior can be insightful in representing specificinterest groups. There are four common approaches to form acommunity: node-centric, group-centric, network-centric andhierarchy-centric [14]. Among the most common, the node-centric method requires each node in a group to satisfy a set ofspecified properties [14]. The selection of the optimalapproach rests on the tradeoff between stability and utility inhow well the community reflects the common interest.Moreover, it facilitates the discovery of clustered group thatachieves similarity among members and dissimilarity amongdifferent communities.

III. DATA COLLECTION

Collecting Twitter data is an important milestone to

gather empirical supporting evidence to construct the financialcommunity and to identify behavioral patterns among its participants. Using Twitter API, we implemented a large-scaledata collection system to collect three main types of data.First, the system gathers the friend-following relationshipinformation between two Twitter users. Given the uniqueidentification number of a specific user, the system cangenerate the identification number for all followers. Thisinformation can be useful in forming the structure of the socialnetwork. Second, the system collects user profile informationincluding its location, language preference, account creationdate, time zone. It also specifies the number of followers, thenumber of friends, and the number of total messages sent. The

profile information has allowed our study understand thedemographics of the financial community and their message

propagation pattern. Lastly, the system tracks tweet andretweet messages by a majority of the members in thefinancial community. These messages can be translated intosentiment and therefore be used for in-depth study related tofinancial phenomenon. By tracking the structural relationamong participants of the community, their profileinformation and messages, the dataset has led to this empiricalstudy related to the proposed financial community examiningits network characteristics and significance in correlating themessages’ sentiment with market movements.

IV. METHODOLOGY

We use a degree-centric community detection method forforming the financial community, which requires eachcommunity member to satisfy a set of specific requirements.We seek to understand the structure and networkcharacteristics of the financial community in the Twitteruniverse through empirical evidence. The objective is tounderstand the static and dynamic properties of the networkand how their message sentiment can be translated intomeasurable financial phenomenon.

Page 3: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 3/8

A. Definition of a Financial CommunityThe financial community is defined as a subset of the

Twitter users with similar interests in the financial market.The objective of establishing such community allows us toextract the most relevant users from the Twitter universerelated to the financial market.

B. Construction of the Financial Community

Identification of 50 well-recognized investment expertsThe proposed financial community begins with a

collection of 50 user accounts which represents well-recognized investment experts and financial news providers(see Appendix A for the detailed account lists). The first listincludes the top 25 traders as the most influential users onTwitter related to the financial market [21]. The selection ofthese traders was based on the frequency of posting, thenumber of followers and the usefulness of their tweets [21].The second list derives from the top 7 financial news

providers who feature financial news in a timely and objectivemanner: Bloomberg, Forbes, Reuters, BusinessWeek,

Financial Times, CNNMoney and CNBC [22]. Their

associated Twitter accounts form the remaining 25 seeds. Theassumption is that the 50 seed accounts specialize indelivering financial-related messages and their followers sharesimilar interest, reflecting a strong likelihood that theirfollowers belong to the active investment community.

Forming the Financial Community NetworkIt is the hypothesis of this study that the community can

be built based on a set of influential seed accounts with theirassociated followers. The community network is formed byunique nodes and linkages. We obtain the pairwise data of thefriend-follower relationships through Twitter API byextracting the followers of the 50 seed accounts. Once these

linkages are identified, an algorithm sorts through the linkagetable by extracting the unique nodes and their secondaryfollowers are captured. The two-layer of the followers of the50 seed accounts forms the basis of the financial community.

Figure 1: Financial Community from Twitter Universe

Filtering Mechanism1. By Language Preference, Location and MessagesWith the goal of extracting the most relevant nodes in the

community, we conduct a filtering procedure to eliminatenodes that might have relations with existing actors but havelimited significance to the network. First, we restrict thedomain to English language although we acknowledge that

this methodology has the capability to apply across differentlanguages. Second, we restrict the location accounts to all U.S.and U.K. domain. Any missing values from the profiles aredisregarded. The rationale behind these two criteria is toensure the quality of each node’s contribution to the network.There exists a tradeoff in encompassing a large network withas many related nodes as possible and in retaining the corequality of nodes with the best resemblance of a financial

community. We aim to find an optimal balance betweenyielding a sufficiently large sample size and revealingsignificant information about the financial market.

2. By Financial-Related TopicsOne of the key objectives in filtering a relevant financial

community is to ensure the messages posted by its memberscontain relevant financial-related keywords. Our approachincludes extracting the top common keywords appeared in thetweet messages collected from the top 50 seed influencersfrom August 3 to September 3, 2013. We group thesekeywords into financial-related topics and apply them toscreen out users whose most recent 20 messages do not belongto the relevant financial topics.

Figure 2: Financial Keyword Mining: Top 50 Words

V.

EMPIRICAL FINDINGS A. Financial Community Summary

We identified a total number of 154,203 Twitter users inthe financial community. We believe that our financialcommunity consists of a robust set of Twitter users whoseinterests are related to the financial market. Whileincorporating additional layers of the relationships, we did notfind significant expansion of the financial community.

B. Financial Community User ClassificationIt is essential to decompose the community into distinct

user groups by their friend-following relationship. We adoptthe Twitter user classification defined by Krishnamurthy et al.

with the three distinctive user types: Broadcaster,Acquaintance and Odd users [23].

1. Broadcaster : The Twitter users who have a higher proportion of followers than people they follow.

2. Acquaintances : The Twitter users who demonstratereciprocity in their relationships.

3. Odd User/Small Cluster : The Twitter users who havea higher proportion of people they follow than theirfollowers.

Page 4: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 4/8

The key metrics for classification is the ratio of thenumber of followers over the number of users who they follow(i.e. friend-following ratio). Broadcasters are defined as theusers who have a friend-following ratio of over 2.00 and aminimum of 300 followers. Users with friend-following ratio(ff_ratio) between 0.50 and 2.00 are labeled as acquaintancesand the rest are labeled as odd users. Our results indicate thatthere are 10,760 broadcasters, 131,105 acquaintances and

12,338 odd users. Majority of the community consists ofacquaintances and the remaining splits about evenly between

broadcasters and odd users.

C. Financial Community DemographicsThe empirical scatterplot of the friend-following

relationship is among the three distinctive user types in thefinancial community. It displays the number of followers andfriends of each member in the community (see Figure 3). First,we construct two lines that represent the thresholds of eachuser type definition, i.e. slope of the follower-friend ratio. Thefirst line defines the boundary between broadcasters andacquaintances (ff_ratio = 2). The nodes above this line are

broadcasters. On the other hand, the second line defines boundary between acquaintances and odd users that all nodes below that line are odds (ff_ratio = 0.5). Thus, the majoritynodes which stay between these two lines are acquaintances.

Figure 3: Scatterplot of Community by friend-follower relationship

Figure 4: Financial Community Growth

Community Composition Empirical FindingsBroadcasters 10,760 (7%)Acquaintances 131,105 (85%)Odd Users 12,338 (8%)

Table 1: Community Composition

The second observation is that most acquaintances arelocated at the diagonal line of the scatterplot (ff_ratio = 1).This demonstrates that the reciprocal relationship is mostcommon among the community population, which isconsistent with empirical finding in the Twitter universe [23].The linear reciprocal relationship can be rationalized in theTwitter universe that a user tends to have the same number offollowers and friends. In addition, it is noted that in the

scatterplot there is a cut-off line at 2,000 friends. It can beexplained by a restriction imposed by Twitter which states thata user cannot follow more than 2,000 accounts if he does nothave a sufficiently larger number of followers.

D. Financial Community Population GrowthWe also examined the evolution of the financial

community over time. We were able to extract the creationdate o f all the community members’ Twitter accounts andconsequently describe the community population growthcurve. Since the initiation of Twitter in June 2006, the networkhas grown stagnantly for the first 2.5 years and rapidly afterSeptember 2008 reaching from 20,000 users to the current154,327 users (see Figure 4).

E. Network Analysis of the Financial CommunityThere are two elements in forming a network: Nodes and

Edges. In this study, the 154,327 user accounts constitute thenodes for the financial community network and their pairwisefriend-following relationships are used as forming the edgesamong the nodes. Due to the friend-follower relationship ofthe community structure, all nodes are connected with at leastone edge and therefore the financial community is a largeconnected network without any isolated nodes.

Table 2: Summary Statistics of Financial Community Network

We compare different types of network structure, and wefind that the financial community best resembles a small-world network. Our community exhibits some of the key

characteristics of a small-world network such as its smalldiameter, and higher clustering coefficient relative to arandom network. The descriptive characteristics are consistentwith our empirical summary statistics of the financialcommunity network. The low diameter at 6.00 illustrates thatthe longest shortest path is only 6 layers from a node to anyother nodes in the network, which is a defining small-worldnetwork property. The diameter value is consistent with theresult of comparable small-world network study at 4.67 and

Page 5: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 5/8

3.43 by Sysomos Inc [24] and Bakhshandeh et al [25]respectively. In addition, the network displays a strongdecaying power law of its degree distribution. This result isconsistent with the findings of literature characterizing socialmedia network as a small-world network.

Figure 5: Degree Distribution

F. Critical Node Analysis in the Financial CommunityIn the financial community, there exist users who play a

central role in the connectedness of the network. These users,known as critical nodes, situate at the most critical locations ofthe community network and therefore bear a large weight inthe network dynamic properties such as connectedness andmessage propagation pattern. Analyzing these nodes isessential for understanding the financial community becausethey represent the most influential users in the community interms of facilitating the message propagation process andstabilizing the network structure.

Through social network analysis, we identify thesecritical nodes by applying centrality measures: out-degreecentrality, betweenness centrality and closeness centrality. Ourdata captures the direction of the friend-follower relationshipwhich contributes to the formation of a directed network. Thethree centrality measures incorporate direction as information

propagates from the sender to their followers. This studytracks the profile information and tweet messages of the top100 users ranked by each of the three centrality measures. Thedescription of each centrality measure is defined as:

1. Out-degree centrality measures the number offollowers a node have in the network.

2. Betweenness centrality captures the number ofshortest paths from all vertices to other nodes in thenetwork that passes through the specific node.

3. Closeness centrality measures how central a node isin a network by the inverse of farness of the node,which can be calculated by the sum of the node’sdistance to all other nodes.

Each group of these critical nodes shares certain commonattributes. The profile data consists of the nodes’ location,username, description, the number of messages they have

posted, and the number of followers and friends. In studyingof these critical nodes, we first examine the three importantattributes: the number of tweeted messages, the number of

followers and the number of friends. Samples of these criticalnodes are provided in Table 3.

First, we look at critical nodes with the highest out-degreecentrality, and we find that these critical nodes are followed

by a large portion of the users in the community. A tweetmessage initiated by this group can be spread to a largedomain in the network. Furthermore, nodes with the highestout-degree centrality can be much more influential than othernodes as their large number of followers can gain significantinterest from the financial community and therefore attractmore users who follow them. From the profile dataset, thesetop nodes with the highest out-degree centrality are

broadcasters who normally have tweeted more than 10,000tweets over the lifetime of the account, a much highertweeting rate than normal broadcasters and the community.They also have a longer account history compared to the othertwo groups of critical nodes since 2007.

The second group of critical nodes, users with the highest betweenness centrality, is equally split between broadcastersand acquaintances with few odd users. This illustrates that thisgroup of critical nodes is not composed of mainly

broadcasters, but nodes that are situated between the mostfollowed broadcasters and least followed odd users. Theirconnectedness is important as they can transmit informationmost effectively within the network. An interestingobservation is that these critical nodes are more specialized inthe financial industry with majority being writers or

publishing editors who are highly interested in financialmarket. Most accounts were created from 2007 to 2009.

The last group of critical nodes, users with the lowestcloseness centrality measure, consists of community memberswho are within short distance to the most central locationwithin the network. These nodes have many followerscompared to those critical nodes measured in out-degreecentrality and betweenness centrality. In the financialcommunity, their messages can take a shorter distance tospread among the network and therefore the underlyingmessage propagation is more direct. Many nodes are observedto have less number of users they follow and less number oftweeted messages, compared to the previous two groups. Mostof these accounts were created from 2008 to 2011.

Critical Nodes Sample UsersOut-DegreeCentrality

@TheEconomist, @BreakingNews, @FinancialTimes,@FortuneMagazine, @CMEGroup

BetweennessCentrality

@themotleyfool, @Vanguard_Group,@ReformedBroker, @TheStreet, @NYSEEuronext

Closeness

Centrality

@YESBANK, @currency4trades, @QNBGroup,

@Bizzun, @ FFinancialGroup, @sobertraderTable 3: Critical Node Sample Users

Based on the location data, we extract the top 3000 userswith the highest betweenness centrality from the financialcommunity in the U.S. and map their population density (seeFigure 6). New York and California are identified as the toptwo states containing the largest number of critical nodes, afact reflective of their wealth distribution and financial andmedia centers. The next level consists of the state of

0

10000

20000

30000

40000

50000

60000

70000

F r e q u e n c y

Degree (Number of Followers)

Degree Distribution

Page 6: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 6/8

Massachusetts, Texas, Illinois and Florida. These are amongthe U.S. states with the largest and wealthiest population.Some of their cities such as Boston and Chicago are majorfinancial hubs. Lastly, the last level comprises mostly stateson the east coast like Pennsylvania, Virginia, Georgia, NewJersey, Maryland and District of Columbia. They tend to besituated nearby New York City and have significant businessties in terms of the number of financial corporation

headquarters. The population map of the critical nodes has twosignificant interpretations: First, it reflects to a certain degreewhere the most influential nodes in the financial communityare most likely located. Second, their tweeting activities mighthave direct implication of the location of the events. Knowingthe location of the source gains a competitive advantage intracking the scope of the events.

Figure 6: Critical Nodes Location in Financial Community

VI. MARKET REGRESSION OF SENTIMENT

A. Objective and Data SourcesThe objective of the regression analysis is to show that

the critical nodes in the financial community have predictive power to the financial market movement. From previoussection, we identify the top 50 seeds and the critical nodes asthe key players. We also extract a list of selected communityusers for performance comparison purposes. Through theirtweet messages, we adopt a methodology in extracting theirsentiment and regress it against the historical daily return andvolatility of the Dow Jones Industrial Index (DJIA). We applythe analysis for both top 50 seeds and three critical nodesgroups during the period from 10/15/2013 to 11/15/2013.

B. Sentiment AnalysisSentiment analysis is applied to determine the sentimental

level of tweet messages. We develop a sentiment analysis

algorithm using a list of positive and negative words from adictionary of sentimental keywords proposed by Hill and Liu[26]. From the tweet messages, the algorithm matches the twoword lists and counts the frequency of positive and negativewords appeared in the message content. A score is assigned toeach tweet message by the number of matches. The algorithmalso adjusts the weight based on the relative importance of thenodes. For instance, the same positive message tweeted by a

broadcaster will have a larger sentiment weight than anacquaintance.

C. Market Regression with the DJIA Price IndexA multiple linear regression model is applied to examine

the relationship between daily market return and Twittermessage sentiment. First, we generate the market return bytaking the log return of market price ( ), so that the

return series pass the unit root test for time series analysis.Second, we ensure the return series with no autocorrelation by

testing the residual of AR model. The dependent variable isthe market return that indicates the trend of the stock market.Furthermore, if a significant correlation is discovered, it maysuggest that price change lags behind the sentimentmovement. In other words, sentiment has predictive powerover returns. The two series, negative and positive sentimentvalues, are used as independent variables. The resultant modelcan be formulated as follows:

1. Regression with Sentiment of Top 50 SeedsRegression Tweet Message (Top50 Seeds)

Market Return Coefficient Prob.Intercept 0.0022 0.1724Return -0.3326 0.0241**

Negative -1.61e-05 0.0014***Positive 9.76e-06 0.0096***

Table 4: Top 50 seeds Regression by Market Return

Figure 7: Market Price vs. Tweet Sentiment (Top 50 Seeds)

The regression result shows that positive and negativesentiment of tweet messages by the top 50 seeds hassignificant impact on market return. The more negative themessage sentiment expresses, the lower the price return is, andvice versa. Negative message sentiment has a stronger marketreturn impact than positive sentiment. In addition, regression

from the critical nodes (see Table 5) shows that negativesentiment has more negative effect on market return than positive sentiment, suggesting the influence of negativesentiment for lower market return. We construct a controlsample user group where the users are selected randomly froma large pool of users we collected. It serves as a control groupfor performance comparison with other groups of criticalnodes. The regression results shows that the three groups ofcritical nodes outperform the selected control group shown bythe higher significance in the regression.

Page 7: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 7/8

2. Regression with Sentiment of Critical Nodes

MarketReturn

Tweet Message(Betweenness Centrality)

Tweet Message(Out-Degree Centrality)

Coeff. Prob. Coeff. Prob.Intercept 0.0055 0.05* 0.0055 0.05*Return -0.3876 0.03** -0.3873 0.03**

Negative -1.22E-05 0.12 -1.23E-05 0.11Positive 3.03E-06 0.40 3.07E-06 0.40

MarketReturn

Tweet Message(Closeness Centrality)

Tweet Messages(Selected Community)

Coeff. Prob. Coeff. Prob.Intercept 0.0034 0.02** 0.0018 0.03**Return -0.3452 0.04** -0.4481 0.02**

Negative -3.58E-05 0.03** -7.52E-07 0.90Positive 1.21E-05 0.32 -5.06E-08 0.99

Table 5: Critical Nodes Regression by Market Return

D. Market Regression with DJIA VolatilityWe also examine the influence of sentiment on market

volatility through using GARCH(1,1) model. The meanequation contains constant element and lag-1 return for the

price return series. The positive and negative sentiment of the

messages are treated as exogenous factors for the varianceequation. The formula is shown below.

{

1. Regression with Sentiment of Top 50 SeedsMarket Volatility Tweet Message (Top 50 Seeds)

Variance Eq. Coefficient Prob.Intercept -1.55E-06 0.8947Resid^2 0.1450 0.8558Garch 0.600 0.7311

Negative 1.15E-08 0.7168Positive -4.66E-09 0.8617Table 6: Top 50 Seeds Regression by Market Volatility

The result shows that message sentiment does not havesignificant influence on market volatility and suggests thatsentiment of tweet messages by the top 50 seed has lesssignificance for volatility movement than market return.

2. Regression with Sentiment of Critical Nodes

MarketVolatility

Tweet Message(Betweenness Centrality)

Tweet Message(Out-Degree Centrality)

Coeff. Prob. Coeff. Prob.Intercept -3.52E-06 0.71 -3.54E-06 0.71Resid^2 0.15 0.82 0.15 0.84Garch 0.60 0.79 0.60 0.79

Negative 7.55E-09 0.005*** 1.4E-08 0.00***

Positive -5.08E-10 0.95 -2.9E-09 0.95Market

Volatility

Tweet Message(Closeness Centrality)

Tweet Messages(Selected Community)

Coeff. Prob. Coeff. Prob.Intercept -1.94E-06 0.78 6.71E-08 0.99Resid^2 0.14 0.82 0.15 0.85Garch 0.59 0.66 0.60 0.79

Negative 1.6E-08 0.85 -1.31E-09 0.96Positive 5.04E-09 0.96 1.75E-09 0.94

Table 7: Critical Nodes Regression by Market Volatility

In conducting the market volatility regression, negativesentiment has significant effect on the change of marketreturns from the betweenness and out-degree centrality groups(see Table 7). The negative sentiment of these two groups ofcritical nodes yields positive volatility effect, suggesting that

bad news on Twitter increase the volatility of price return inthe stock market. On the other hand, positive sentiment doesnot have significant effects on volatility of price return across

all groups. The result indicates that positive messages do notaffect market volatility as much as negative messages.Compared among the four groups of critical nodes, thenegative sentiment of the top out-degree centrality group hasthe largest and most significant impact on the volatility of

price return and provides more superior results. Thisobservation indicates that grouping users in financialcommunity by centrality analysis transforms the regressionresult to be more precise and accurate in explaining marketmovements.

VII. DISCUSSION

The market sentiment regression highlights the significantinfluence of critical nodes towards the financial marketmovement. The regression results show that all three groups ofcritical nodes have better performance than the selectedfinancial community for its predictive power of both the returnand volatility series. Through performance comparison, thesentiment expressed by the critical nodes is more significantand impactful than those by the selected community members.Critical nodes ranked by betweenness, degree and closenesscentrality yield higher negative sentiment coefficients againstmarket return at -1.22E-05, -1.23E-05 and -3.58E-05respectively, all with better significance level compared withthe selected financial community level (see Table 5). Thevolatility regression exhibits similar observation with bettersignificance level by the critical nodes ranked by betweennessand out-degree centrality at 0.005*** and 0.00*** (see Table7). This is consistent with the definition of critical nodes

because of their central contribution to network connectedness.This study shows that the behavior of critical nodes can beused to yield a more re liable indicator for the financial asset’s

price movement. It is worth noting that the current criticalnode analysis does not factor in the effect of tweets being read

by unregistered users who may be active investors. Thischannel of information propagation is achieved by searchingthe Twitter website for specific keywords and the associatedtweets would appear regardless whether the users arefollowers or not. This would facilitate the propagationmechanism of tweets to a wider community, including those

tweets broadcasted by the critical nodes. Measuring theimpact of this unobservable user group to the financial marketwill be a challenging problem, and we plan to address thisissue in the future study.

This empirical study of the financial community has threemajor contributions to the current literature of financial marketand Twitter sentiment. First, it addresses the hypothesis thatfinancial market movement can be a manifestation of themarket partic ipants’ beliefs and behaviors toward future

Page 8: SSRN-id2358679 (2)

8/11/2019 SSRN-id2358679 (2)

http://slidepdf.com/reader/full/ssrn-id2358679-2 8/8

outcomes, and the aggregate of the societal mood can presentitself as a reliable predictor. Second, the concepts ofleveraging critical nodes in the financial community generatea robust linkage between the social mood and financial marketasset price movement. Moreover, the findings of critical nodesserve as an important guidance for regulatory authorities in

paying attention to avoid manipulative or malicious actionssuch as the 2013 Associated Press hacking incident. Lastly,

the empirical study provides insightful observations about thedemographics and network structure of the financialcommunity. By decomposing the community into unique usertypes, beliefs and behaviors of market participants can be

better understood. With the continual growth of the Twitteruniverse, the dynamic characteristics of the financialcommunity can be better understood in terms of its networkstructure and the message sentiment influence towards thefinancial market movement.

VIII. CONCLUSION AND FUTURE RESEARCH WORK

This paper illustrates that a robust correlation can beformed between the social mood and financial market asset

price movement by establishing a financial community. A keyfinding is that the sentiment generated by critical nodes in thefinancial community yields a more reliable indicator for thefinancial asset’s price movement. We find that negativesentiment of the critical nodes has consistent predictive powerover market returns, and it consistently predicates marketvolatility as well. In the sentiment study, we show thatdifferent groups of critical nodes exert different degree ofimpacts to financial asset price and volatility movement.

For future work, this research can be extended into twoareas related to the use of the financial community: First, the

behaviors and beliefs of the critical nodes can be modeled assurrogates for information diffusion in financial markets.Sentiment or mood of the financial community can bemodeled as the aggregate of the individual sentiment withmemories. Swing of community sentiment can then bemodeled as a complex and interactive system. Second, anagent-based simulation can be applied to the network growthand message propagation of the financial community. Bymodeling the dynamic properties of the community, the agent-

based simulation can replicate important events in the socialmedia platform and capture their key factors on the financialmarket movement.

IX. APPENDIX

Top 25 Traders’ Twitter Accounts: @howardlindzon,@alphatrends, @pkedrosky, @grassosteve, @chessNwine,

@allstarcharts, @FOCUS_ON_RISK, @reformedbroker,@harmongreg, @ritholtz, @irenealdridge, @AnneMarieTrades,@optionalpha, @jimcramer, @PeterLBrandt, @1nvestor,@vader7x, @downtowntrader, @smbcapital, @tlmontana,@KeeneOnMarket, @ppearlman, @bespokeinvest, @fibline.

Top 25 Financial News Providers: @BloombergMrkts, @ BloombergNews, @Forbes, @Reuters, @BW, @ftfinancenews,@CNNMoney, @CNBCFastMoney, @WSJ, @YahooFinance,@CNBCWorld, @MarketWatch, @IBDinvestors, @MSN_Money,@FoxBusiness, @RealMarketNews, @NYSE, @TabbFORUM,

@EconBizFin, @CNBCClosingBell, @dowbands,@FXstreetNews, @Street_Insider, @HedgeWorld, @cr_harper.

R EFERENCES [1] A. Kapla and M. Haenlein, "Users of the world, unite! The challenges

and opportunities of Social Media," Business Horizons, vol. 53, no. 1, pp. 59-68, 2010.

[2] Dell Inc., "Form 10-K Annual Report pursuant to section 13 and 15(d),"United States Securities and Exchange Commission, Round Rock,Texas, 2012.

[3] D. Miller, "What Keeps Twitter Chirping Along," 10 December 2008.[Online]. Available: http://www.internetnews.com/webcontent/.

[4] I. Lunden, "Analyst: Twitter Passed 500M Users In June 2012, 140MOf Them In US," 30 July 2012. [Online]. Available:http://http://techcrunch.com/2012/07/30/analyst-twitter-passed-500m-users-in-june-2012-140m-of-them-in-us-jakarta-biggest-tweeting-city/.

[5] Twitter, "Twitter turns six," 21 March 2012. [Online]. Available://blog.twitter.com/2012/twitter-turns-six.

[6] Statistic Brain, "Top Websites By Traffic," 8 May 2012. [Online].Available: http://www.statisticbrain.com/top-us-websites-by-traffic/.

[7] Thomson Reuters, "Social Networking for Market Professionals: Howis it Changing the Face of Financial Markets?," 2010

[8] H. Chen, P. De, Y. Hu and B. Hwang, "Customers as Advisors: TheRole of Social Media in Financial Markets," 2 August 2013. [Online].Available: http://papers.ssrn.com/papers.cfm?abstract_id=1807265.

[9]E. J. Ruiz, V. Hristidis, C. Castillo, A. Gionis and A. Jaimes,"Correlating financial time series with micro-blogging activity," in

Proceedings of the fifth ACM international conference on Web searchand data mining , New York, 2012.

[10] J. Bollen, H. Mao and X. Zeng, "Twitter mood predicts the stockmarket.," Journal of Computational Science, pp. 1-8, 2011.

[11] X. Zhang, H. Fuehres and P. A. Gloor, "Predicting Stock MarketIndicators Through Twitter “I hope it is not as bad as I fear”," Procedia- Social and Behavioral Sciences, vol. 26, pp. 55-62, 2010.

[12] E. Aranguena, C. Manuel and M. Pozo, "Centrality in Directed Social Networks: A Game Theoretic Approach," in AIP Conference , 2010.

[13] Z. Zou, B. Mao, H. Hao, J. Gao and J. Yang, "Regular Small-World Network," Chinese Physics Letter, vol. 26, no. 11, 2009.

[14] L. Tang and H. Liu, Community Detection and Mining in Social Media,Lexington, KY: Morgan & Claypool, 2010.

[15] J. Travers and S. Milgram, "An Experimental Study of the Small WorldProblem," Sociometry, vol. 32, no. 4, pp. 425-443, 1969.

[16] J. Kleinberg, "The small-world phenomenon: an algorithm perspective,"in the thirty-second annual ACM symposium on Theory of computing(pp. 163-170) , 2000.

[17] D. Watts and H. Strogatz, "Collective dynamics of 'small-world'networks," Nature, vol. 393, no. 6684, pp. 440-442, 1998.

[18] M. Jackson, Social and Economic Networks, Princeton, NJ: PrincetonUniversity Press, 2008.

[19] D. Watts, Small Worlds: The Dynamics of Networks between Orderand Randomness, Princeton, NJ: Princeton University Press, 2003.

[20] V. Latora and M. Marchiori, "Efficient Behavior of Small-World Networks," Physical Review Letters, vol. 87, no. 19, 2001.

[21] Options Trading IQ, "Top 25 Traders on Twitter," 13 April 2012.[Online]. Available: http://www.optionstradingiq.com/.

[22] Top Forex News, "Top 7 News Sources for Financial Trader," [Online].Available: http://www.topforexnews.com/.

[23] B. Krishnamurthy, P. Gill and M. Arlitt, "A Few Chirps About Twitter,"in WOSN’08 , Seattle, Washington, 2008.[24] Sysomos Inc., "Six Degrees of Separation, Twitter Style," 1 April 2010.

[Online]. Available: http://www.sysomos.com/insidetwitter/sixdegrees/. [25] R. Bakhshandeh, M. Samadi, Z. Azimifar and J. Schaeffer, "Degrees of

Separation in Social Networks," in International Symposium onCombinatorial Search , 2011.

[26] M. Hill and B. Liu, “ Mining and Summarizing Customer Reviews," inthe ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining , Seattle, Washington, 2004.