A Dream of Predicting Elections and Trading Stocks using Twitter
@yelenamm Yelena Mejova
Yet Another Conference, Moscow, Nov 30 2014
Money and Power
Financial Indexes
• Movie box office sales
• Consumer confidence
• Dow Jones Industrial Average
• Individual stocks
Political Opinion
• Political leaning
• Polarization
• User classification
• Predicting elections!
More…
CIKM 2013 Tutorial
TWITTER AND THE REAL WORLD
with Ingmar Weber
https://sites.google.com/site/twitterandtherealworld/home
Finance, Politics, Public Health, Event Detection
Can I get rich on the stock market?
• Efficient Market Hypothesis:
– Financial markets are information efficient: prices fully reflect all available information
– Cannot be predicted
JUST AS WELL
Answer: NO
• Behavioral Economics: overconfidence, overreaction, information bias…
• Insider trading, governmental manipulation…
• Speculative bubbles: information be damned!
• Bitcoin: where is the value? – pure bubble
A non-random walk down Wall Street (1999) Lo & MacKinlay
Answer: NO MAYBE?
http://www.caymanatlantic.com/ (self-reported gains)
http://dataminr.com/
http://nymag.com/daily/intelligencer/2013/04/bloombergs-vip-terminal-tweeters.html
http://gnip.com/
1. content providers
2. specialized providers
3. data analytics
4. traders
Movies
Hollywood Stock Exchange
Predicting the Future with Social Media @sitaramasur Asur, Huberman @ WI-IAT 2010
• 2.89 million tweets
• 24 movies
Correl (tweet rate & box office gross) = 0.90
Least squares linear regression:
• using previous week’s tweets to predict weekend box office gross: Adj R2 = 0.973
• …and sentiment (positive/negative) score to predict second weekend box office gross: Adj R2 = 0.94
• using previous week’s HSX scores to predict weekend box office gross: Adj R2 = 0.967
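As a rough illustration of the numbers above, here is a minimal sketch of a one-predictor least squares fit and its adjusted R2. The function names and the toy data are assumptions made for this example; they are not the authors' code or data.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjusted_r2(x, y, slope, intercept, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical data: tweets per hour in the week before release
# vs. opening weekend gross (in millions). Values are made up.
tweet_rate = [10, 25, 40, 55, 80, 120]
gross = [3.1, 6.0, 9.2, 11.8, 17.5, 25.0]
slope, intercept = simple_ols(tweet_rate, gross)
print(round(adjusted_r2(tweet_rate, gross, slope, intercept), 3))
```

The adjustment penalizes extra predictors, which matters when adding a sentiment score on top of tweet rate with only 24 movies.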
Consumer Confidence
• Index of Consumer Sentiment (ICS) (Reuters/UMich)
• Economic Confidence Index (ECI) (Gallup)
• Subjectivity Lexicon: Opinion Finder
[some figures from authors’ original slides]
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series @brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)
• High day-to-day volatility
• Average last k days (k = 1, 7, 30)
• Keyword “jobs”
• At k = 15, correlates with ECI (Gallup) at r = 0.731
• Predicting 1 month in the future using previous 15 days
• Correlation with Gallup poll:
– Twitter model: 77.5%
– Poll model: 80.4%
• As Twitter grows, so does its accuracy
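The smoothing-then-correlating idea can be sketched as follows. This is a minimal illustration, assuming a daily sentiment series (for example a positive-to-negative ratio for tweets containing a keyword) and a poll series of the same length; both series here are invented.

```python
from math import sqrt


def moving_average(series, k):
    """Trailing average: each value is averaged with up to k-1 preceding values."""
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - k + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical daily sentiment ratios and poll values (made-up numbers).
daily_sentiment = [1.2, 0.7, 1.5, 0.9, 1.6, 1.1, 1.8, 1.3]
poll = [40, 40, 41, 41, 42, 42, 43, 43]
for k in (1, 3, 5):
    print(k, round(pearson(moving_average(daily_sentiment, k), poll), 3))
```

A larger k trades responsiveness for stability, which is why the choice of window matters so much for the reported correlation.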
Twitter mood predicts the stock market
@jlbollen Bollen, Mao, Zeng @ Journal of Computational Science (2011)
• Opinion Finder: positive / negative
• GPOMS: calm, alert, sure, vital, kind and happy
Twitter 2008 (~10M tweets)
[some figures from authors’ original slides]
DJIA
888 citations!
Slight correlation only with Calm GPOMS mood (0.065 at 6 day lag)
• Tracking stocks $STOCK
Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe @ European Financial Management (2013)
Stocks
• Tweets: Jan 1 – Jun 30, 2010
• S&P100 companies using $STOCK (price change & volume)
• Naïve Bayes classifier trained on 2,500 tweets (buy/sell/hold): 81.2% accuracy
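To make the classification step concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing over buy/sell/hold labels. The class name, the whitespace tokenization, and the five training tweets are assumptions for illustration; the paper's features and training set are not reproduced here.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled tweets (made up for this example).
texts = [
    "buy $AAPL strong earnings",
    "sell $AAPL weak outlook",
    "hold $AAPL waiting for news",
    "buy $MSFT breakout",
    "sell $MSFT downgrade",
]
labels = ["buy", "sell", "hold", "buy", "sell"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("strong breakout buy"))
```

Daily bullishness can then be derived from the counts of predicted buy and sell messages per stock.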
DOMINATED BY FEW “EXPERTS”
• 1.5% posted 53.7% of all messages
– Their quality is not much better!
BULLISH vs. STOCK RETURNS: -0.022 (p<0.05), 0.091 (p<0.001)
VOLUME vs. TRADING VOLUME: 0.073 (p<0.001), 0.312 (p<0.001)
Stocks
• Twitter: Jan 1 – Jun 30, 2010
• 150 (randomly selected) companies in S&P 500
– Daily relative price change
– Traded volume normalized by mean traded volume for that company for entire time period
Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)
[some figures from authors’ original slides]
represent tweets as a GRAPH
constrain graph to a company and a time window
+ similarity nodes connecting very similar tweets (RTs) using Jaccard distance
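A simplified sketch of the similarity edges and of one graph feature (the number of connected components) follows. It models only tweet-to-tweet similarity links; the other node types in the authors' graph are left out, and the 0.7 threshold and whitespace tokenization are assumptions for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def count_components(tweets, threshold=0.7):
    """Link tweets whose similarity reaches the threshold; count components."""
    tokens = [set(t.lower().split()) for t in tweets]
    parent = list(range(len(tweets)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(tweets))})


# Hypothetical tweets about one company within one time window.
window = [
    "$AAPL earnings beat expectations",
    "RT $AAPL earnings beat expectations",
    "$AAPL new phone rumors",
]
print(count_components(window))
```

Computing such a count per company and per time window yields a time series that can be compared against price change and traded volume.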
Trading Simulation
[some figures from authors’ original slides]
• Best performance for vector auto-regression with the number of connected components (proposed)
• It is the only one that obtains a profit, during a period in which the Dow Jones fell 5.8%
Don’t fire your stock broker yet
http://www.nytimes.com/interactive/2012/10/15/business/Declining-US-High-Frequency-Trading.html?ref=business
High-Speed Trading No Longer Hurtling Forward
Computer Flaws Get Wry Smile From Humans Displaced
http://dealbook.nytimes.com/2013/09/19/computer-flaws-get-wry-smile-from-humans-displaced/?ref=highfrequencyalgorithmictrading
How a Trading Algorithm Went Awry
http://online.wsj.com/article/SB10001424052748704029304575526390131916792.html
Can we track & predict political sentiment?
Elections
“the crowning of the Internet as the king of all political media”
“the beginning of the Internet presidency”
- on Obama's 2008 victory, Mitch Wagner, InformationWeek
Transparency
“Instantaneous tweeting of shady government practices -- and the resulting uproar -- means that public bodies are more responsive than ever.”
- Wesley Donehue, CNN
Mobilization
“This exercise of power has produced a template for political action on a massive scale fueled by social media.”
- on PIPA and SOPA, Vivek Wadhwa, Washington Post
bloggeruniversity.wordpress.com
US politics
• Most of the research presented here
• Clear left/right distinction
• Popular political figures
• High(ish) Twitter engagement
REPUBLICAN (right)
DEMOCRAT (left)
• Sampling Twitter for political speech
– general keywords: #current
– event keywords: #debate08, #tweetdebate
– people: obama, romney, merkel
– parties: democrat, republican, pirate
– accounts: wefollow, twellow
– news stories, known URL retweets
• Caveats
– requires expert knowledge
– known best after the event
– selection bias (who do you want to ignore?)
let’s talk politics
1. Text (text classification)
2. Network (label propagation)
political leaning classification
• Bootstrapped hashtag-based sample of political discussion
• Gardenhose Sep 14 - Nov 4, 2010
• Classes: right, left, ambiguous
TEXT-BASED
• remove stopwords, hashtags, mentions, urls, all words occurring once in the corpus
• TFIDF weighting
HASHTAG-BASED
• remove hashtags used by only one user
Predicting the political alignment of twitter users @vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
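The text-based preprocessing and weighting can be sketched as below. The slide's exact TFIDF formula did not survive, so this example assumes one common variant (term frequency normalized by document length, times log(N / document frequency)); the stopword list and sample tweets are likewise placeholders.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "for"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, drop hashtags, mentions, URLs and stopwords."""
    tokens = []
    for tok in text.lower().split():
        if tok.startswith(("#", "@", "http://", "https://")):
            continue
        tok = re.sub(r"[^a-z0-9]", "", tok)
        if tok and tok not in STOPWORDS:
            tokens.append(tok)
    return tokens


def tfidf(docs):
    """Return one {term: weight} dict per document (assumed TF-IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    corpus_counts = Counter(t for doc in tokenized for t in doc)
    # Drop words that occur only once in the whole corpus.
    tokenized = [[t for t in doc if corpus_counts[t] > 1] for doc in tokenized]
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        })
    return vectors


# Hypothetical tweets (one document per user in practice).
docs = [
    "Taxes are too high #tcot",
    "Cut taxes now @user",
    "Healthcare reform now http://t.co/x",
]
print(tfidf(docs))
```

The resulting sparse vectors are the kind of input a text classifier would take.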
NETWORK-BASED
• Label propagation
– Initialize cluster membership arbitrarily
– Iteratively update each node’s label according to the majority of its neighbors
– Ties are broken randomly
• Cluster assignment by majority cluster label (using manually labeled data)
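The steps above can be sketched as follows. This is a minimal illustration on an undirected adjacency dict, not the authors' implementation: the function names, the toy graph, and the choice to keep a node's current label when it is already among the most frequent ones (a common stopping rule) are assumptions.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iters=100, seed=0):
    """adjacency: {node: [neighbors]}. Returns {node: cluster label}."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # arbitrary start: one label per node
    nodes = list(adjacency)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            if labels[node] in candidates:
                continue  # already holds a majority label
            labels[node] = rng.choice(candidates)  # ties broken randomly
            changed = True
        if not changed:
            break
    return labels


def name_clusters(labels, known):
    """Name each cluster by the majority leaning of its manually labeled nodes."""
    votes = {}
    for node, leaning in known.items():
        votes.setdefault(labels[node], Counter())[leaning] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}


# Hypothetical retweet graph: two separate triangles of users.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
clusters = label_propagation(graph)
names = name_clusters(clusters, {"a": "left", "d": "right"})
print({user: names[cluster] for user, cluster in clusters.items()})
```

On real retweet networks the two communities are connected by some edges, so the result can vary between runs.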
political leaning classification
retweet network
• Classifier: Support Vector Machine
political leaning classification
network-based method
SEED-BASED (highly precise)
1. Start with few seed users of known leaning
2. The leaning of their followers is determined by which side they retweet more
3. Propagate users’ leaning to their tweets/hashtags/etc
hashtag accuracy: 98.6%, 93%, 90% (by source)
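The three seed-based steps can be sketched roughly as below. The account names, the data layout, and the decision to leave tied users unlabeled are assumptions made for this example, not details from the paper.

```python
from collections import Counter


def user_leanings(retweets, seed_leaning):
    """retweets: iterable of (user, retweeted_account) pairs.
    A user gets the leaning of the side whose seeds they retweet more."""
    votes = {}
    for user, source in retweets:
        side = seed_leaning.get(source)
        if side is not None:
            votes.setdefault(user, Counter())[side] += 1
    leanings = {}
    for user, counts in votes.items():
        ranked = counts.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            leanings[user] = ranked[0][0]  # ties stay unlabeled (assumption)
    return leanings


def hashtag_leanings(tweets, leanings):
    """tweets: iterable of (user, [hashtags]). Hashtags inherit the majority
    leaning of the labeled users who used them."""
    votes = {}
    for user, tags in tweets:
        side = leanings.get(user)
        if side is None:
            continue
        for tag in tags:
            votes.setdefault(tag, Counter())[side] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in votes.items()}


# Hypothetical seeds and activity.
seeds = {"@dem_seed": "left", "@gop_seed": "right"}
retweets = [
    ("u1", "@dem_seed"), ("u1", "@dem_seed"), ("u1", "@gop_seed"),
    ("u2", "@gop_seed"),
    ("u3", "@dem_seed"), ("u3", "@gop_seed"),
]
users = user_leanings(retweets, seeds)
print(users)
print(hashtag_leanings([("u1", ["#p2"]), ("u2", ["#tcot"]), ("u3", ["#p2"])], users))
```

Leaving ambiguous users out is one way to favor precision over coverage.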
political leaning classification
Political hashtag hijacking in the US
Hadgu, Garimella, Weber @ WWW (2013)
Correlates with ADA (Americans for Democratic Action score):
– Spearman rank order correlation: .44
– Pearson product-moment correlation coefficient: .51
Visualizing media bias through Twitter
@JisunAn An, Cha, Gummadi, Crowcroft, Quercia @ AAAI (2012)
Distance between two media: Jaccard similarity of their audience (co-subscribers)
• Position news sources along the political spectrum by the overlap in their audiences (followers on Twitter)
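A small sketch of the audience-overlap measure follows. The outlet names and follower IDs are invented, and turning the pairwise values into positions on a left/right axis is not shown.

```python
from itertools import combinations


def audience_similarity(followers_a, followers_b):
    """Jaccard similarity of two follower sets (share of co-subscribers)."""
    a, b = set(followers_a), set(followers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarities(audiences):
    """audiences: {outlet: iterable of follower ids} -> {(outlet, outlet): similarity}."""
    return {
        (m1, m2): audience_similarity(audiences[m1], audiences[m2])
        for m1, m2 in combinations(sorted(audiences), 2)
    }


# Hypothetical outlets and follower ids.
audiences = {
    "outlet_a": {1, 2, 3},
    "outlet_b": {2, 3, 4},
    "outlet_c": {7, 8},
}
for pair, sim in pairwise_similarities(audiences).items():
    print(pair, sim)
```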
political leaning classification
• Nov 21, 2013 – Feb 26, 2014
• Classifier trained to identify pro- and anti-protest sentiment
• Twitter, blogs, news, forums, Facebook
political leaning classification
Russia, Ukraine, and the West: Social Media Sentiment in the Euromaidan Protests
@bretling Etling @ Berkman Center Research (2014)
[figure panels: Ukraine, Russia, US & UK]
Does it reflect the overall sentiment of the people?
look who’s talking
• 2010 US Senate special election in Massachusetts
• Silent majority & vocal minority tweet differently (different agendas?)
• Spamming, fake grassroots movements
Vocal Minority versus Silent Majority: Discovering the Opinions of the Long Tail @enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)
number of tweets per user
• Truthiness is a quality characterizing a "truth" that a person making an argument or assertion claims to know intuitively "from the gut" or because it "feels right" without regard to evidence, logic, intellectual examination, or facts.
look who’s talking
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)
Classifying memes for astroturf
Truthy project by Indiana University
look who’s talking
[figure: example meme diffusion networks]
TRUTHY: #ampat, @PeaceKaren_25 & @HopeMarie_25, gopleader.gov, Chris Coons
LEGITIMATE: #Truthy, @senjohnmccain, on.cnn.com/aVMu5y, “Obama said…”
• 2009 German federal elections
elections
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
Tumasjan, Sprenger, Sandner, Welpe @ AAAI (2010)
• sentiment profiles of leading candidates in tweets mentioning them (using LIWC2007)
“The mere number of tweets reflects voter preferences and comes close to traditional election polls”
CONTROVERSY!
638 citations!
elections
Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, Sprenger, Sander, & Welpe, "Predicting elections with twitter: What 140 characters reveal about political sentiment"
@ajungherr Jungherr, Jürgens, Schoen @ SSCR V30/N2 (2012)
“show that the results of TSSW are contingent on arbitrary choices of the authors”
Choice of Parties
If results of polls played a role in deciding upon the inclusion of particular parties, the TSSW method is dependent on public opinion surveys
Choice of Dates
prediction analysis […] between [13.9] and [27.9], the day of the election, produces a MAE of 2.13, significantly higher than the MAE for TSSW
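For reference, the MAE compares each party's predicted vote share with its actual share. The sketch below assumes the prediction is simply each party's share of mentions; party names and numbers are invented and are not the German results.

```python
def tweet_shares(mention_counts):
    """Turn raw mention counts into percentage shares."""
    total = sum(mention_counts.values())
    return {party: 100.0 * n / total for party, n in mention_counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute difference, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical mention counts and election results (percent).
mentions = {"Party A": 50, "Party B": 30, "Party C": 20}
results = {"Party A": 45.0, "Party B": 35.0, "Party C": 20.0}
print(round(mean_absolute_error(tweet_shares(mentions), results), 2))
```

Because both shares are renormalized over whichever parties are included, adding or dropping a party changes every share and therefore the MAE, which is the point of the critique.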
• Non-US elections:
– Irish: On using twitter to monitor political sentiment and predict election results, Bermingham, Smeaton (2011)
  • “Our approach however has demonstrated an error which is not competitive with the traditional polling methods.”
– Dutch: Predicting the 2011 Dutch senate election results with twitter, Sang, Bos (2012)
  • Uses polls for demographic imbalances, yet performance still below traditional polls
– Singapore: Tweets and votes: A study of the 2011 singapore general election, Skoric, Poor, Achananuparp, Lim, Jiang (2012)
  • Not as accurate as traditional polls, performance at local government levels
– New Zealand: Can Social Media Predict Election Results? Evidence from New Zealand, Michael P. Cameron (2013)
  • “the size of the effect is small and it appears that social media presence will therefore only make a difference in closely contested elections”
– many more coming out each day!
elections
Check out Gayo-Avello’s literature surveys!
Metaxas et al. @ SocialCom (2011)
• A method of prediction should be an algorithm finalized before the election
– specify data collection, cleaning, analysis, interpretation…
• Data from social media are fundamentally different than data from natural phenomena
– people change their behavior next time around
– spammers & activists will try to take advantage
• Form a testable theory on why and when it predicts (avoid self-deception!)
• (maybe) Learn from professional pollsters
– tweet ≠ user
– user ≠ eligible voter
– eligible voter ≠ voter
How (Not) To Predict Elections @takis_metaxas Metaxas et al. @ SocialCom (2011)
elections
[from authors’ original slides]
What now?
Now-casting / Fore-casting
Show improvement over baseline or that you could make money / a difference
Publish a paper: let us know! (or go to Wall Street / Political Thinktank)
1. Bullishness is affected more strongly by returns than vice versa
2. Message volume predicts trading volume
3. … but high trading volume and volatility predict message volume more
4. Agreement among traders leads to lower trading volumes
day of the week market index
Fixed-effects panel regressions at 1 and 2 day lags