Insights into socio politics using data analytics

Page 1: Insights into socio politics using data analytics

Insights Into Socio Politics Using Data Analytics

A presentation by

Page 2: Insights into socio politics using data analytics

About Politweet

• Researching the socio-economic and political interests of Malaysians

• Developing analytical tools for Twitter research

• Creating interactive, data-driven sites about socio-economic and political topics

#bdw2013 #bigdataMY

Page 3: Insights into socio politics using data analytics

Today’s Talk

• Overview of our data pipeline

• Building timelines of historical events

• Measuring user opinion

• Measuring political partisanship

• Visualising voter migration

Page 4: Insights into socio politics using data analytics

Page 5: Insights into socio politics using data analytics

Technical Details

• Runs on PostgreSQL, MySQL and PHP on Fedora Linux

• Events
  – 6.3 million tweets from 1.6 million users

• Politicians’ mentions
  – 5.5 million tweets from 385 thousand users

• Tweets related to American elections
  – 12 million tweets from 2 million users

Page 6: Insights into socio politics using data analytics

BUILDING TIMELINES

Page 7: Insights into socio politics using data analytics

Building Timelines

• Tweets as historical record

• Bersih2 rally for electoral reforms – July 9th 2011
  – Goal: to reach Stadium Merdeka
  – 85,372 tweets from 19,190 users
  – 17,452 mentions of locations collected for investigative purposes

Page 8: Insights into socio politics using data analytics

Methodology

1. Identify the most re-tweeted tweet for each hour

2. Identify peak time periods for the event

3. Identify peak time periods for locations

4. View tweeted images for each hour

5. Watch videos that are supported by tweet evidence

6. Combine all this information to establish a timeline, cross-referencing by reading tweets in sequence to help separate rumour from fact
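Steps 1 to 3 of the methodology can be sketched in a few lines of Python. This is only an illustration, not the code behind the talk: the tweet fields (`created_at`, `user`, `text`, `retweet_count`), the function names and the sample tweets are all assumptions made for the example, and location matching is a plain substring search.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_summary(tweets):
    """Per hour of day: tweet count, distinct users, most re-tweeted tweet."""
    by_hour = defaultdict(list)
    for tweet in tweets:
        by_hour[tweet["created_at"].hour].append(tweet)

    summary = {}
    for hour, items in sorted(by_hour.items()):
        top = max(items, key=lambda t: t["retweet_count"])
        summary[hour] = {
            "tweets": len(items),
            "users": len({t["user"] for t in items}),
            "top_tweet": top["text"],
        }
    return summary


def peak_hours(summary, top_n=3):
    """Return the top_n hours with the most tweets, busiest first."""
    ranked = sorted(summary, key=lambda h: summary[h]["tweets"], reverse=True)
    return ranked[:top_n]


def location_mentions_by_hour(tweets, locations):
    """Count, per hour, the tweets whose text mentions each location name."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        text = tweet["text"].lower()
        for place in locations:
            if place.lower() in text:
                counts[tweet["created_at"].hour][place] += 1
    return counts


# Invented sample tweets, for illustration only (not from the dataset).
SAMPLE = [
    {"created_at": datetime(2011, 7, 9, 13, 5), "user": "user_a",
     "text": "Example tweet mentioning Central Market", "retweet_count": 40},
    {"created_at": datetime(2011, 7, 9, 13, 20), "user": "user_b",
     "text": "Example tweet mentioning Puduraya", "retweet_count": 12},
    {"created_at": datetime(2011, 7, 9, 14, 2), "user": "user_a",
     "text": "Example tweet mentioning Stadium Merdeka", "retweet_count": 7},
]

summary = hourly_summary(SAMPLE)
print(peak_hours(summary, top_n=1))
print(location_mentions_by_hour(SAMPLE, ["Puduraya", "Stadium Merdeka"]))
```

The remaining steps (viewing images, watching videos, reading tweets in sequence) are manual review and are not covered by the sketch.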

Page 9: Insights into socio politics using data analytics

#bersih2 Twitter Activity

[Chart: tweets and users per hour on July 9; x-axis: Hour (1 to 24); y-axis: Tweets / Users (0 to 14,000); series: July 9 Users, July 9 Tweets]

Page 10: Insights into socio politics using data analytics

#bersih2 Area Activity

Page 11: Insights into socio politics using data analytics

#bersih2 Timeline

• 8 AM – People making their way to the city; reports of roadblocks

• 9 AM – Arrests being made; police checking ICs at KTM and LRT stations

• 10 AM – More arrests being made at KL Sentral, Masjid Jamek and Sogo; large crowd reported at Masjid Negara; false report of tear gas being fired at KLCC

Page 12: Insights into socio politics using data analytics

#bersih2 Timeline

• 11 AM – 236 people arrested so far; police targeting people in Bersih T-shirts

Page 13: Insights into socio politics using data analytics

#bersih2 Timeline

• 12 PM – More arrests; crowds gathered or moving at the old railway station, Central Market and Petaling Street

Page 14: Insights into socio politics using data analytics

Page 15: Insights into socio politics using data analytics

#bersih2 Timeline

• 1 PM – Tear gas being fired near Central Market; water cannon being used; massive crowd gathered at Jalan Sultan, Puduraya; LRT stations closed

Page 16: Insights into socio politics using data analytics

Page 17: Insights into socio politics using data analytics

#bersih2 Timeline

• 2 PM – Police action continues. The crowd at Puduraya has broken up: one section proceeds to Tung Shin Hospital while the remainder heads to Stadium Merdeka and KLCC.

• The earlier crowd that remained at Jalan Sultan and Jalan Petaling was spared from similar police action.

• Bersih and Pakatan leaders were tear-gassed at KL Sentral, following an attempt to break through the police blockade.

Page 18: Insights into socio politics using data analytics

#bersih2 Timeline

• 2.30 PM – Police action continues. Tear gas is fired into the Tung Shin Hospital grounds. The crowd at Stadium Merdeka remains calm.

• 3 PM – More arrests being made of crowd members at Tung Shin Hospital. The crowd is scattered.

• 4 PM – Crowd begins to disperse in some areas. Large crowd reported at KLCC.

Page 19: Insights into socio politics using data analytics

#bersih2 Area Activity (revisited)

Page 20: Insights into socio politics using data analytics

Crowd Estimation

• Timeline establishes peak period

• Photos determine extents

• Google Maps used to measure area

• Crowd density estimated as average persons per sq. ft.

• Final estimate was 45 – 50 thousand people attended the rally

Page 21: Insights into socio politics using data analytics

Crowd Estimation Sample: Puduraya

Area covered: 127,536 sq. ft.

Estimated crowd: 31,884 people
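The Puduraya sample can be reproduced with simple arithmetic: estimated crowd = area covered × average crowd density. The sketch below is an illustration only. The density is not stated on the slide; the value used here (0.25 persons per sq. ft., or 4 sq. ft. per person) is inferred by dividing the two published figures, and `estimate_crowd` is a hypothetical helper name.

```python
def estimate_crowd(area_sq_ft, persons_per_sq_ft):
    """Estimate attendance as occupied area times average crowd density."""
    if area_sq_ft < 0 or persons_per_sq_ft < 0:
        raise ValueError("area and density must be non-negative")
    return round(area_sq_ft * persons_per_sq_ft)


# Figures from the Puduraya sample slide.
PUDURAYA_AREA_SQ_FT = 127_536
PUDURAYA_ESTIMATE = 31_884

# Assumption: density inferred from the two figures above, not quoted in the talk.
IMPLIED_DENSITY = PUDURAYA_ESTIMATE / PUDURAYA_AREA_SQ_FT  # 0.25 persons per sq. ft.

print(IMPLIED_DENSITY)
print(estimate_crowd(PUDURAYA_AREA_SQ_FT, IMPLIED_DENSITY))
```

Repeating the calculation for each occupied area at the peak period and summing the results would give a rally-wide figure, assuming the areas do not overlap.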

Page 22: Insights into socio politics using data analytics

Himpunan Kebangkitan Rakyat

• People’s Uprising Rally

• January 12th 2013

• Applied the same techniques to build a timeline

Page 23: Insights into socio politics using data analytics

Crowd estimation

Page 24: Insights into socio politics using data analytics

Crowd estimation

Page 25: Insights into socio politics using data analytics

MEASURING USER OPINION

Page 26: Insights into socio politics using data analytics

Measuring User Opinion

• Sentiment analysis on tweets

• Standard approaches
  – Classify sentiment based on words or phrases
  – Use Support Vector Machine (SVM) technique to build topic-specific classifiers

• Demonstration: Tweets on #MansuhPTPTN (Abolish PTPTN)

Page 27: Insights into socio politics using data analytics

Word-based Classifier

Identify keywords to determine sentiment

[Screenshot: 14 tweets, each labelled with the sentiment assigned by the classifier]

Result: 2 positive, 11 neutral, 1 negative

Page 28: Insights into socio politics using data analytics

Word-based Classifier

Let’s add ‘ditahan’ and ‘blacklist’ to the list of negative words

[Screenshot: tweets with updated sentiment labels]

Result: 2 positive, 5 neutral, 6 negative
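A word-based classifier of this kind can be sketched as follows. This is a minimal illustration, not the classifier used in the demonstration: only ‘ditahan’ and ‘blacklist’ come from the slide, while the other keywords, the tie-breaking rule and the example tweet are assumptions made for the example.

```python
import re

# Placeholder keyword lists (assumptions). Only 'ditahan' and 'blacklist'
# are taken from the slide; they are added further down.
POSITIVE_WORDS = {"sokong", "setuju"}
NEGATIVE_WORDS = {"bantah", "tolak"}


def classify(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Label a tweet by counting keyword hits; ties and no hits are neutral."""
    words = re.findall(r"[\w']+", text.lower())
    pos_hits = sum(word in positive for word in words)
    neg_hits = sum(word in negative for word in words)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"


# Invented example tweet, for illustration only.
tweet = "Pelajar ditahan semasa protes #MansuhPTPTN"

# With the initial lists no keyword matches, so the tweet is neutral.
before = classify(tweet)

# After extending the negative list, the same tweet is labelled negative.
extended_negative = NEGATIVE_WORDS | {"ditahan", "blacklist"}
after = classify(tweet, negative=extended_negative)

print(before, after)
```

The example shows why such classifiers tend to report a lot of neutral sentiment: any tweet without a listed keyword falls through to the neutral label.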

Page 29: Insights into socio politics using data analytics

Word-based Classifier

• Word and phrase-based classifiers are good at measuring the ‘mood’ of a tweet

• They often result in a large percentage of neutral sentiment

• Now we try a Support Vector Machine (SVM)

Page 30: Insights into socio politics using data analytics

SVM Approach

Certain phrases are used by supporters of the proposal

Keywords influence results

[Screenshot: tweets labelled with the sentiment assigned by the SVM classifier]

Result: 4 positive, 9 neutral, 1 negative

Page 31: Insights into socio politics using data analytics

SVM Approach

• SVM improves results but requires training sets of data

• Not practical for infrequent topics, such as the PTPTN issue

• For regular issues, constant retraining is required to keep up to date

• Does not reliably tell us the final opinion of the user

Page 32: Insights into socio politics using data analytics

Deducing Final Opinion

If the last tweet was positive, does that imply a positive opinion?

[Screenshot: tweets with per-tweet sentiment labels]

Page 33: Insights into socio politics using data analytics

Our Methodology

1. Collect all tweets from users on a given topic for a fixed length of time

2. A human examines tweets in sequence, on a per-user basis

3. Based on the examination, determine the final opinion of the user

4. Common reasons for supporting or opposing an issue are noted
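The data handling around this methodology can be sketched in Python. The judgement itself stays with the human reviewer; the code only prepares each user's tweets in sequence and tallies the recorded opinions and reasons. The field names, function names, opinion labels and sample records below are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


def tweets_by_user(tweets):
    """Group tweets per user, each user's tweets in chronological order."""
    grouped = defaultdict(list)
    for tweet in tweets:
        grouped[tweet["user"]].append(tweet)
    for items in grouped.values():
        items.sort(key=lambda t: t["created_at"])
    return dict(grouped)


def tally_opinions(annotations):
    """Count final opinions and the reasons noted by the reviewer.

    annotations maps user -> {'opinion': str, 'reasons': [str, ...]}.
    """
    opinions = Counter(a["opinion"] for a in annotations.values())
    reasons = Counter(r for a in annotations.values() for r in a["reasons"])
    return opinions, reasons


# Invented sample records, for illustration only.
sample_tweets = [
    {"user": "user_a", "created_at": datetime(2013, 1, 1, 12, 0), "text": "later tweet"},
    {"user": "user_b", "created_at": datetime(2013, 1, 1, 9, 0), "text": "only tweet"},
    {"user": "user_a", "created_at": datetime(2013, 1, 1, 8, 0), "text": "earlier tweet"},
]
timelines = tweets_by_user(sample_tweets)

# A reviewer would read timelines["user_a"] in order, then record a verdict.
annotations = {
    "user_a": {"opinion": "support", "reasons": ["example reason"]},
    "user_b": {"opinion": "oppose", "reasons": []},
}
opinions, reasons = tally_opinions(annotations)
print(opinions, reasons)
```

Because each user contributes exactly one final opinion, a user who tweets fifty times counts the same as a user who tweets once.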

Page 34: Insights into socio politics using data analytics

Testing Our Method

[Screenshot: one user’s tweets in sequence, with per-tweet sentiment labels]

Researcher determines this user supports the proposal to abolish PTPTN

The opposition to the methods of student activists is noted

This user is not opposed to a reduction in the interest rate instead of outright abolition

Page 35: Insights into socio politics using data analytics

Page 36: Insights into socio politics using data analytics

Opinion-based Sentiment Analysis

• Pro
  – More accurate measurement of sentiment than standard approaches
  – Offers details on why users oppose or support an issue
  – Not influenced by a large volume of tweets

• Con
  – Time-consuming to prepare
  – Requires researchers familiar with the language and the issue

Page 37: Insights into socio politics using data analytics

Geo-located Sentiment Analysis

• Same methodology, but only on geo-located tweets

• Results in sentiment based on location, and how many in the area tweeted about the topic

• Demonstration: Himpunan Kebangkitan Rakyat (People’s Uprising Rally) on January 12th
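The aggregation step can be sketched as below, assuming each reviewed user has already been assigned an area label from their geo-located tweets. How coordinates are mapped to areas is not described in the talk, so that step is left out; the function name, area labels and opinion labels are placeholders.

```python
from collections import Counter, defaultdict


def opinion_by_area(user_opinions):
    """Aggregate reviewed opinions per area.

    user_opinions is a list of (area, opinion) pairs, one pair per user.
    Returns {area: {'users': n, 'opinions': Counter}}.
    """
    areas = defaultdict(Counter)
    for area, opinion in user_opinions:
        areas[area][opinion] += 1
    return {
        area: {"users": sum(counts.values()), "opinions": counts}
        for area, counts in areas.items()
    }


# Invented sample pairs, for illustration only.
result = opinion_by_area([
    ("Area 1", "support"),
    ("Area 1", "oppose"),
    ("Area 2", "support"),
])
print(result)
```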

Page 38: Insights into socio politics using data analytics

Page 39: Insights into socio politics using data analytics

Page 40: Insights into socio politics using data analytics

Plans for the Future

• Build a Malay-language SVM to determine the sentiment of tweets

• Use sampling to estimate the opinion of the Twitter user population

Page 41: Insights into socio politics using data analytics

POLITICAL PARTISANSHIP

Page 42: Insights into socio politics using data analytics

Measuring Political Partisanship

• Who we follow

• Who we mention

Page 43: Insights into socio politics using data analytics

Page 44: Insights into socio politics using data analytics

Who we follow

Page 45: Insights into socio politics using data analytics

Who we mention

Page 46: Insights into socio politics using data analytics

Facebook

Page 47: Insights into socio politics using data analytics

Voter migration

Page 48: Insights into socio politics using data analytics

Contact details

• Facebook : Fb.com/politweet

• Twitter : @politweetorg

• Email : [email protected]