Sent elect march6-2014


description

Using the SentElect product to forecast election results in India

Transcript of Sent elect march6-2014

Page 1: Sent elect march6-2014

SentElect™: Forecasting Elections based on Sentiments in Social Media

V.S. Subrahmanian SentiMetrix, Inc. & University of Maryland

@vssubrah [email protected]

March 6 2014

© Sentimetrix, All rights reserved, Sentiment Analysis Symposium March 6 2014

This work was performed for Sentimetrix, Inc.

Page 2: Sent elect march6-2014

SentElect™ Election Application

On May 8 2013, Sentimetrix predicted the outcome of the upcoming Pakistan election in front of 100+ people in V.S. Subrahmanian’s keynote at the Sentiment Analysis Symposium in New York City

On May 9, the BBC said the election was too close to call: “Pakistan Elections: Five Reasons why the vote is unpredictable”

Sentimetrix was correct!

Page 3: Sent elect march6-2014

SentElect™

• Currently tracks Twitter feeds on virtually any topic
  – Politicians
  – Political parties
  – Issues (in progress, expected completion April 2014)

• Identifies intensity of sentiment on each topic in each tweet.

• Forecasts trends in terms of expected number of supporters/opponents on Twitter

• Identifies individuals who are most influential in shaping an opinion/trend

• Provides a single dashboard to cover all of this.


Page 4: Sent elect march6-2014

SentElect™

SentElect™ Functionalities and Business Use

• Functionality: Identify sentiment and changes in sentiment on any given topic
  – Business use: Track sentiment on both your political campaign and your competitor’s
• Functionality: Learns a model on “big data” showing how support/opposition to a topic spreads
  – Business use: Understand how your campaign (and your opponent’s) are doing with voters and why
• Functionality: Forecast the expected number of people who will support/oppose a topic
  – Business use: Forecast how many people support/oppose your campaign and/or your opponent’s
• Functionality: Identify the most important individuals responsible for shaping/spreading opinion on a topic
  – Business use: Identify those shaping positive/negative opinion about you and see if you can get them to work on your behalf. Engage with influential Twitter users


Page 5: Sent elect march6-2014

SentElect™ Case Study

• Upcoming Indian election
• Identified 31 entities to track.
• Learned diffusion models from July 15 2013 – Jan 25 2014.
• Tested models on Jan 25-Feb 20 2014 data (~26 days)
• Forecast trends on all 31 entities from Feb 20 2014 to May 15 2014.

• Tested diffusion forecasts on January 25-Feb 20 2014 data with Pearson correlation coefficients consistently over 0.8, usually over 0.9.
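The accuracy measure named above, the Pearson correlation coefficient between forecast and observed counts, can be sketched as follows. This is a minimal illustration only: the slides do not describe how SentElect computes it, and the `observed` and `forecast` series below are hypothetical numbers, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily supporter counts over a held-out test window.
observed = [100, 120, 135, 160, 170]
forecast = [105, 118, 140, 150, 175]
pcc = pearson(observed, forecast)
print(f"PCC = {pcc:.3f}")
```

A value near 1 means the forecast curve rises and falls with the observed curve; it does not by itself show that the forecast levels match the observed levels.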

SUMMARY STATISTICS

• Study reported here uses data from July 2013 to Feb 20 2014
• Forecasts made till May 15 2014.
• 19.5M tweets studied in all
• 16M distinct Twitter accounts
• 40M edge network

Twitter collection done using Twitter ontology and semantic database developed by Rensselaer Polytechnic Institute. [@jahendler]

Page 6: Sent elect march6-2014

BJP Forecast

[Chart: BJP forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• BJP supporters exceed opponents.

• Positives increasing faster than negatives

• Large number of supporters

• Outlook is very good

Page 7: Sent elect march6-2014

Narendra Modi Forecast

[Chart: Narendra Modi forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• Modi supporters exceed opponents.

• Positives increasing faster than negatives

• Outlook is very good

Page 8: Sent elect march6-2014

UPA Forecast

[Chart: UPA forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• UPA opponents outnumber supporters.

• But supporters are catching up.

• Raw numbers much smaller than for BJP.

• Outlook not good.

Page 9: Sent elect march6-2014

Congress Party Forecast

[Chart: Congress Party forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• Interestingly, sentiment on Congress is more positive.

• But very muted in terms of numbers.

• Outlook is not good.

Page 10: Sent elect march6-2014

Rahul Gandhi Forecast

[Chart: Rahul Gandhi forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• Overall, sentiment on Rahul is positive

• Positives outweigh negatives and are growing.

• But negatives are much higher than Modi’s

Page 11: Sent elect march6-2014

Arvind Kejriwal Forecast

[Chart: Arvind Kejriwal forecast; time axis marked July 15 2013, Jan 24 2014, Feb 20 2014, May 15 2014]

OUTLOOK

• Positives and negatives about even as of Feb 20

• But trend shows increasing doubts about Mr. Kejriwal as election time draws near.

Page 12: Sent elect march6-2014

SentElect Summary Statistics

                          BJP     Narendra Modi  UPA    Congress Party  Rahul Gandhi  Arvind Kejriwal
#Supporters Feb 20 2014   193031  68320          42482  7082            66399         31626
#Opponents Feb 20 2014    135077  26868          47893  4177            39641         19964
#Supporters May 15 2014   273119  95006          52736  9592            74773         96931
#Opponents May 15 2014    191171  40466          54189  5060            40389         213784
Accuracy (PCC*) Pos.      0.985   0.83           0.986  0.900           0.936         0.983
Accuracy (PCC*) Neg.      0.984   0.957          0.984  0.931           0.911         0.966

* Pearson Correlation Coefficient
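The supporter and opponent counts in the summary statistics can be turned into the kind of ratios used in the head-to-head comparisons. The sketch below copies the counts from the table; the dictionary layout and the helper names `support_ratio` and `head_to_head` are illustrative assumptions, not part of SentElect.

```python
# (supporters, opponents) per entity, copied from the summary statistics.
# "feb20" is the Feb 20 2014 count, "may15" the May 15 2014 forecast.
COUNTS = {
    "BJP": {"feb20": (193031, 135077), "may15": (273119, 191171)},
    "Narendra Modi": {"feb20": (68320, 26868), "may15": (95006, 40466)},
    "UPA": {"feb20": (42482, 47893), "may15": (52736, 54189)},
    "Congress Party": {"feb20": (7082, 4177), "may15": (9592, 5060)},
    "Rahul Gandhi": {"feb20": (66399, 39641), "may15": (74773, 40389)},
    "Arvind Kejriwal": {"feb20": (31626, 19964), "may15": (96931, 213784)},
}


def support_ratio(entity, date):
    """Supporters divided by opponents for one entity on one date."""
    supporters, opponents = COUNTS[entity][date]
    return supporters / opponents


def head_to_head(a, b, date):
    """Return (supporter ratio, opponent ratio) of entity a relative to b."""
    supporters_a, opponents_a = COUNTS[a][date]
    supporters_b, opponents_b = COUNTS[b][date]
    return supporters_a / supporters_b, opponents_a / opponents_b


for name in COUNTS:
    print(f"{name:16s} feb20={support_ratio(name, 'feb20'):.2f} "
          f"may15={support_ratio(name, 'may15'):.2f}")

# Example: Mr. Kejriwal relative to Mr. Gandhi in the May 15 2014 forecast.
sup, opp = head_to_head("Arvind Kejriwal", "Rahul Gandhi", "may15")
print(f"Kejriwal vs. Gandhi: {sup:.1f}x supporters, {opp:.1f}x opponents")
```

A ratio above 1 from `support_ratio` means supporters outnumber opponents; `head_to_head` compares two entities on each side separately, which is how growing support can coexist with faster-growing opposition.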

Page 13: Sent elect march6-2014

Head to Head: BJP vs. UPA/Congress

• Feb 20 2014:
  – BJP shows almost 4 times as many supporters as Congress/UPA.
  – BJP opponents are less than 3 times as many as Congress/UPA opponents.
  – So BJP is doing well.

• Forecast for May 15 2014:
  – BJP will maintain about 1.5x supporters as compared to opponents.
  – Congress/UPA has slightly more opponents than supporters.

• BJP’s outlook in terms of positives and negatives shows a combined growth.

• But UPA/Congress combined negatives exceed positives.

• And support for UPA/Congress is tepid, raising the question of whether Congress/UPA supporters will show up to vote.

• In general, till May 15 2014, BJP seems to garner more support than Congress/UPA.

[Bar chart: support and opposition for BJP and UPA/Congress as of 2/20 and forecast for 5/15; axis from 0 to 400000]

Page 14: Sent elect march6-2014

Head to Head: Narendra Modi vs. Rahul Gandhi

• Feb 20 2014:
  – Mr. Gandhi and Mr. Modi are about equal in “likes” as of Feb 20 2014, with Mr. Modi having a small [insignificant] lead.
  – But Mr. Gandhi has 1.5x as many opponents in comparison to Mr. Modi.

• May 15 2014:
  – In terms of supporters, Mr. Modi is pulling ahead of Mr. Gandhi with 1.3x supporters compared with Mr. Gandhi.
  – On opponents, we expect them to be even.

• Mr. Modi is likely to pull away ahead of Mr. Gandhi by May 15 2014.

[Bar chart: support and opposition for Modi and Gandhi as of 2/20 and forecast for 5/15; axis ticks at 0, 50000, 100000, 150000]

Page 15: Sent elect march6-2014

Head to Head: Rahul Gandhi vs. Arvind Kejriwal

• Feb 20 2014:
  – Mr. Gandhi has 2x supporters w.r.t. Mr. Kejriwal
  – But he also has 2x opponents w.r.t. Mr. Kejriwal

• May 15 2014:
  – Mr. Kejriwal will have 1.3x supporters w.r.t. Mr. Gandhi [an about turn!]
  – Mr. Kejriwal will have 5x opponents w.r.t. Mr. Gandhi.

• In short, though supporters for Mr. Kejriwal will grow, opponents will increase in number faster.

• Congress/UPA should outperform AAP/Mr. Kejriwal.

[Bar chart: support and opposition for Kejriwal and Gandhi as of 2/20 and forecast for 5/15; axis ticks at 0, 200000, 400000]

Page 16: Sent elect march6-2014

Head to Head: Narendra Modi vs. Arvind Kejriwal

• Feb 20 2014:
  – Mr. Modi has 2x as many supporters as Mr. Kejriwal
  – But also has about 1.4x as many opponents as Mr. Kejriwal

• May 15 2014:
  – Mr. Modi and Mr. Kejriwal will have about the same number of supporters
  – Mr. Kejriwal will have about 5x the number of opponents as Mr. Modi

• Though support for Mr. Kejriwal is growing, opposition is growing at a much faster rate.

• We expect BJP to handily outperform AAP/Mr. Kejriwal.

[Bar chart: support and opposition for Kejriwal and Modi as of 2/20 and forecast for 5/15; axis ticks at 0, 200000, 400000]

Page 17: Sent elect march6-2014

SentElect™: Identifying Key Influencers

Selected topic(s)

Page 18: Sent elect march6-2014

SentElect™: Identifying Key Influencers

Constraints on identifying influential users

Page 19: Sent elect march6-2014

SentElect™: Identifying Key Influencers

List of most influential users on the selected topic – note that the number of followers alone is not an adequate measure

Page 20: Sent elect march6-2014

SentElect™: User Profile

Distribution of topics discussed

Page 21: Sent elect march6-2014

SentElect™: User Profile

List of tweets on selected topics

Tabs allow user to see other tweets

Page 22: Sent elect march6-2014

SentElect™: Sentiment Profile

Average sentiment scores on selected topics range from -1 (max negative) to +1 (max positive)
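An average sentiment score on the -1 to +1 scale can be sketched as a per-topic mean of per-tweet scores. How SentElect actually scores tweets is not described in the slides; the function name `average_sentiment`, the `(topic, score)` input format, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def average_sentiment(scored_tweets):
    """Average per-topic sentiment from (topic, score) pairs.

    Each score is assumed to lie in [-1, 1], where -1 is maximally
    negative and +1 is maximally positive.
    """
    totals = defaultdict(lambda: [0.0, 0])  # topic -> [sum of scores, count]
    for topic, score in scored_tweets:
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} outside [-1, 1]")
        totals[topic][0] += score
        totals[topic][1] += 1
    return {topic: total / count for topic, (total, count) in totals.items()}


# Hypothetical per-tweet scores, not taken from the study.
sample = [("BJP", 0.5), ("BJP", -0.25), ("AAP", 1.0)]
averages = average_sentiment(sample)
print(averages)
```

Because every input score is bounded by -1 and +1, each topic average stays within the same range.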

Page 23: Sent elect march6-2014

SentElect™: Sentiment Profile

Volume of tweets on selected topic

Page 24: Sent elect march6-2014

Forecast Summary


Forecast #1

• Narendra Modi will be India’s next Prime Minister.

Forecast #2

• BJP (by itself) will fall short of a majority in Parliament, securing fewer than 272 seats.

Forecast #3

• Next Indian government will be a BJP-led coalition

Page 25: Sent elect march6-2014

Forecast Risks

• Our forecast can go wrong.
  – Risk #1: Forecasting based on unsupervised learning is difficult at best. No training data connecting votes on the ground in India to number of supporters/opponents on Twitter. Selection bias.
  – Risk #2: Forecast is based on publicly available Twitter data, not on entire Twitter fire-hose.
  – Risk #3: Twitter-based and technology-based risks: geo-location issues, bots/sybils/fake accounts.
  – Risk #4: Changing situation on the ground with new allegations (e.g. corruption) emerging frequently.
  – Risk #5: External events we can’t control for (e.g. terrorist attacks) can dramatically change the electoral landscape.

• Sentimetrix will update its forecasts approximately once every 2-3 weeks on www.sentimetrix.com. Next scheduled update – March 27 2014.


Page 26: Sent elect march6-2014

One Sybil’s strategy: @IsabellaObregom

1. Take tweet from a reputable account:
   – @AapKaJawab, an Aam Aadmi Party enthusiast, retweets: “Arvind Kejriwal breaks into Manna Dey song on brotherhood at swearing-in – http://t.co/bVCHPte60k”

2. Follow link, rewrap in new shortened URL
   – @AapKaJawab’s link leads to an Indian news article
   – @IsabellaObregom shrinks URL with Adf.ly, tweets: “Arvind Kejriwal breaks into Manna Dey song on brotherhood at swearing-in http://t.co/81cq9eyrNh”

3. @IsabellaObregom now paid per click through Adf.ly!

(In early 2014, Adf.ly and Twitter suspended the account – original owner tweeted only in Spanish)


Page 27: Sent elect march6-2014

A larger Sybil network in our dataset

• We found many Sybil/bot accounts

• @Marie____Taylor and @Amy____Jones tweet identically, except for different shortened links.

– Overlapping network of followers

– 100K+ tweets

– Many “smaller” inactive followers, each following 30-40 random people, with 30-40 bot followers.

– Related: @Lea___Smith, @Megan__Martinez, etc…
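One way to surface account pairs that tweet identically apart from their shortened links is to strip URLs and compare the remaining text. The sketch below is a simple heuristic written for illustration and is not the method used in the study; the account names, tweets, and the `min_shared` threshold are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

URL_RE = re.compile(r"https?://\S+")


def normalize(text):
    """Drop URLs, collapse whitespace, and lowercase, so only wording is compared."""
    return " ".join(URL_RE.sub("", text).split()).lower()


def suspicious_pairs(tweets, min_shared=2):
    """Return account pairs sharing at least min_shared URL-stripped texts.

    tweets: iterable of (account, text) pairs.
    """
    accounts_by_text = defaultdict(set)
    for account, text in tweets:
        key = normalize(text)
        if key:  # skip tweets that consist only of a link
            accounts_by_text[key].add(account)
    shared = defaultdict(int)
    for accounts in accounts_by_text.values():
        for pair in combinations(sorted(accounts), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical tweets: acct_a and acct_b differ only in their links.
tweets = [
    ("acct_a", "Big rally today http://t.co/aaa"),
    ("acct_b", "Big rally today http://t.co/bbb"),
    ("acct_a", "Vote counts are in http://t.co/ccc"),
    ("acct_b", "Vote counts are in http://t.co/ddd"),
    ("acct_c", "Big rally today, I was there"),
]
pairs = suspicious_pairs(tweets)
print(pairs)
```

Requiring several shared texts rather than one reduces false positives from ordinary retweets of a popular headline; on a dataset of millions of tweets, the pairwise counting step would need to be restricted to texts shared by only a few accounts.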


Page 28: Sent elect march6-2014


Page 29: Sent elect march6-2014

SentiMetrix Contact Information

• Address

6017 Southport Drive, Bethesda, MD 20814, USA

• E-mail [email protected]

• www.sentimetrix.com

• Telephone +1 240 479 9286

• V.S. Subrahmanian

• Twitter: @vssubrah

• Email: [email protected]

• www.cs.umd.edu/~vs/

• Telephone: +1 301 405 6724
