Posted on 19-Feb-2017
The Potential and Perils of Election Prediction Using Social Media Sources
Federico Nanni and Josh Cowls
University of Mannheim / Comparative Media Studies, MIT
Reasons to be cheerful
+ Social media data is (often) cheap
+ Phone response rates are in decline
+ More granularity available?
[Figure: cost vs. utility of the traditional inferential model and the social media model]
Reasons to be doubtful
- Myriad reliability issues...
  – Difficult to establish the meaning of latent messages
  – Platform-specific behaviours (e.g. hashtags, likes) are not always understood
  – Political discourse is often laced with sarcasm and other non-literal language
- The ethics of collecting and using social media data
Results to date have been mixed...
• A meta-analysis found little evidence that using Twitter to predict elections is better than chance in the aggregate (Gayo-Avello, 2013)
• Nonetheless, social media can provide an ‘early warning system’ for a candidate’s momentum (Jensen and Anstead, 2013)
• Big problem: what’s in a name?
Our approach: intention over attention
• Most models count references to candidates’ or parties’ names – measuring attention
• Other models use sentiment analysis, seeking to ascertain emotional responses to candidates
• We built an intention model, collecting instances of vote declarations for specific candidates
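To make the contrast between the two measures concrete, here is a minimal Python sketch. It assumes tweets have already been collected as plain strings, uses a hypothetical and much-simplified declaration pattern (the slides do not give the actual phrase list), and converts raw counts into each candidate's share of the total. The function names are illustrative only, not the authors' scripts.

```python
import re
from collections import Counter

# Hypothetical, simplified vote-declaration pattern ("I'll vote for", "I'm voting for", ...).
# The study's real phrase list is not given on the slides.
DECLARATION = r"\b(?:i['’]ll|i will|i['’]m|i am)\s+vot(?:e|ing)\s+for\s+"


def shares(counts: Counter, candidates: list[str]) -> dict[str, float]:
    """Convert raw counts into percentage shares across the candidates."""
    total = sum(counts[c] for c in candidates)
    return {c: (100 * counts[c] / total if total else 0.0) for c in candidates}


def attention_shares(tweets: list[str], candidates: list[str]) -> dict[str, float]:
    """Attention: count every tweet that mentions a candidate's name."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for name in candidates:
            if name.lower() in text:
                counts[name] += 1
    return shares(counts, candidates)


def intention_shares(tweets: list[str], candidates: list[str]) -> dict[str, float]:
    """Intention: count only tweets that declare a vote for a named candidate."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for name in candidates:
            if re.search(DECLARATION + re.escape(name.lower()), text):
                counts[name] += 1
    return shares(counts, candidates)
```

On a toy input such as "I'm voting for Tim Farron", "Norman Lamb on the radio today" and "Norman Lamb vs Tim Farron debate tonight", the attention measure splits 50/50 between the two candidates, while the intention measure gives Tim Farron 100% because only one tweet actually declares a vote.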
Case study
• Context: Labour and the Lib Dems required new leaders in 2015 (after a polling fail!)
• Leadership elections conducted in summer 2015
  – Lib Dems: two candidates (Tim Farron, Norman Lamb)
  – Labour: four candidates (Jeremy Corbyn, Andy Burnham, Yvette Cooper, Liz Kendall)
Advantages of our case
• Primary candidates’ names are easier to isolate than ambiguous party names (“Labour”, “Liberal”)
• Party elections are a minority sport – better signal-to-noise ratio?
• Start and end dates were clear; the postal vote system allowed a longer period of decision-making
Method
• Wrote Python scripts to collect tweets that:
  – Mentioned the name of a candidate
  – Included a specific declaration of voting intention (“I’ll vote for...”, “I’m voting for”, etc.)
• Cleaned data:
  – Removed non-declarations (“I’m not voting for...”)
  – Ascertained the preferred candidate in ambiguous cases
• Final dataset: 1,361 valid declarations for the Lib Dem race and 17,617 for Labour
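The collection and cleaning steps above might be sketched as a single classification function. This is an assumption-laden illustration, not the authors' scripts: the phrase list is hypothetical, and the slide says only that ambiguous cases were resolved, not how, so the rule used here (take the first candidate named after a non-negated declaration) is just one possible heuristic.

```python
import re
from typing import Optional

# Hypothetical declaration pattern: "I'll / I will / I'm / I am / I won't [not|never] vote(ing) for".
# The optional "neg" group and the "won't" auxiliary mark non-declarations.
DECLARATION = re.compile(
    r"\bi(?P<aux>['’]ll|\s+will|['’]m|\s+am|\s+won['’]t)\s+"
    r"(?P<neg>(?:not|never)\s+)?vot(?:e|ing)\s+for\s+"
)


def declared_candidate(tweet: str, candidates: list[str]) -> Optional[str]:
    """Return the candidate a tweet declares a vote for, or None if there is no valid declaration."""
    text = tweet.lower()
    for match in DECLARATION.finditer(text):
        # Cleaning step 1: skip non-declarations such as "I'm not voting for ..." or "I won't vote for ...".
        if match.group("neg") or "won" in match.group("aux"):
            continue
        # Cleaning step 2 (assumed heuristic): if several candidates follow the phrase,
        # treat the earliest-mentioned one as the preferred candidate.
        rest = text[match.end():]
        found = [(rest.find(c.lower()), c) for c in candidates if c.lower() in rest]
        if found:
            return min(found)[1]
    return None
```

For example, with the Labour surnames as candidates, "I'm not voting for Corbyn, I'm voting for Burnham" is attributed to Burnham, while "I won't vote for Kendall" yields no declaration at all.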
Analysis (1)
Analysis (2)
Key successes
• The ‘Intention’ model outperformed the ‘Attention’ model in 5 out of 6 races, and in both races overall
• Lib Dem prediction accuracy was close to the traditional margin of error (MOE = 3.5)
• Caught Corbyn’s success to a high degree of accuracy (MOE = 2)
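The slides do not say exactly how these accuracy figures were computed. One common approach, assumed here, is to take the absolute difference in percentage points between each predicted vote share and the official result, average it per race, and count for how many candidates one model lands closer than the other. The helper names below are hypothetical.

```python
def absolute_errors(predicted: dict[str, float], actual: dict[str, float]) -> dict[str, float]:
    """Per-candidate absolute error, in percentage points (assumed error measure)."""
    return {c: abs(predicted[c] - actual[c]) for c in actual}


def mean_absolute_error(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Average of the per-candidate absolute errors for one race."""
    errors = absolute_errors(predicted, actual)
    return sum(errors.values()) / len(errors)


def candidates_won_by(model_a: dict[str, float], model_b: dict[str, float],
                      actual: dict[str, float]) -> int:
    """Count candidates whose share model_a estimates strictly more closely than model_b."""
    err_a = absolute_errors(model_a, actual)
    err_b = absolute_errors(model_b, actual)
    return sum(1 for c in actual if err_a[c] < err_b[c])
```

For instance, with invented placeholder results {"A": 60.0, "B": 40.0} (not the study's data), an estimate of {"A": 57.0, "B": 43.0} has a mean absolute error of 3.0 points, against 10.0 for a flat {"A": 50.0, "B": 50.0}, and is closer for both candidates.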
Reflections and future work
• Tough to generalise successes – specific cases, particular platform. (How) would this work for:
  – Multi-state processes (e.g. US primaries)?
  – General elections?
• Despite ongoing challenges, social media will surely play a key role in the future of accurate election prediction