Mining Social Web Data Like a Pro: Four Steps to Success

Mining Social Web Data Like a Pro: Four Steps to Success Presented by Matthew A. Russell "Data Journalism and Interactivity" - GDA Seminar Quito, Ecuador - 20 September 2013

description

GDA Presentation - Quito Ecuador - 20 Sept 2013

Transcript of Mining Social Web Data Like a Pro: Four Steps to Success

Page 1: Mining Social Web Data Like a Pro: Four Steps to Success

Mining Social Web Data Like a Pro: Four Steps to Success

Presented by Matthew A. Russell

"Data Journalism and Interactivity" - GDA Seminar

Quito, Ecuador - 20 September 2013

1

Page 2: Mining Social Web Data Like a Pro: Four Steps to Success

Hola

2

Trained as a Computer Scientist

CTO @ Digital Reasoning Systems

Data Mining, Machine Learning

Principal @ Zaffra

Boutique Consulting

Author @ O'Reilly Media

5 published books on technology

Page 3: Mining Social Web Data Like a Pro: Four Steps to Success

3

Page 4: Mining Social Web Data Like a Pro: Four Steps to Success

Transform Curiosity Into Insight

4

An open source project

http://bit.ly/MiningTheSocialWeb2E

Inherently accessible

Virtual machine & IPython Notebook UX

Turn-key code templates for bootstrapping data science experiments

Think of the book as "premium" support for the OSS project

Page 5: Mining Social Web Data Like a Pro: Four Steps to Success

¿Por qué no Español?

5

Page 6: Mining Social Web Data Like a Pro: Four Steps to Success

Investigative Journalist

6

"A person whose profession it is to

discover the truth and to identify lapses from

it in whatever media may be available."

Page 7: Mining Social Web Data Like a Pro: Four Steps to Success

Data Science

7

Data => Actionable Information

Highly interdisciplinary

Nascent

Necessary

http://wikipedia.org/wiki/Data_science

Page 8: Mining Social Web Data Like a Pro: Four Steps to Success

Digital Signal Explosion

A model for the world: signals and sinks

Growth in data exhaust is accelerating

Digital fingerprints

Software is eating the world

Data mining opportunities galore...

8

Page 9: Mining Social Web Data Like a Pro: Four Steps to Success

Digital Data Stats

100 terabytes of data uploaded daily to Facebook.

Brands and organizations on Facebook receive 34,722 Likes every minute of the day.

According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day

30 Billion pieces of content shared on Facebook every month.

Data production will be 44 times greater in 2020 than it was in 2009

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

9

See http://wikibon.org/blog/big-data-statistics

Page 10: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Is All the Rage

World population: ~7B people

Facebook: 1.15B users

Twitter: 500M users

Google+ 343M users

LinkedIn: 238M users

~200M+ blogs (conservative estimate)

10

Page 11: Mining Social Web Data Like a Pro: Four Steps to Success

But Why Is It All the Rage?

It satisfies fundamental human desires

We want to be heard

We want to satisfy our curiosity

We want it easy

We want it now

11

Page 12: Mining Social Web Data Like a Pro: Four Steps to Success

Social Network Mechanics

[Figure: social network diagram with nodes labeled Roberto Mercedes, Jorge, Ana, Nina]

Page 13: Mining Social Web Data Like a Pro: Four Steps to Success

Interest Graph Mechanics

[Figure: interest graph linking people (Roberto Mercedes, Jorge, Ana, Nina) to interests (U2, Juan Luis Guerra)]

Page 14: Mining Social Web Data Like a Pro: Four Steps to Success

A (Social) Interest Graph

[Figure: interest graph linking people (Roberto Mercedes, Jorge, Ana, Nina) to interests (U2, Juan Luis Guerra)]

Page 15: Mining Social Web Data Like a Pro: Four Steps to Success

A (Political) Interest Graph

[Figure: interest graph linking people (Roberto Mercedes, Jorge, Ana, Nina) to candidates (Johnny Araya, Rodolfo Hernández)]

Page 16: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Dimensions

16

Facebook

Account Types: People & Pages

Mutual Connections

"Likes"

"Shares"

"Comments"

Extensive Privacy Controls

Twitter

Account Types: "Anything"

"Following" Relationships

Favorites

Retweets

Replies

(Almost) No Privacy Controls

Page 17: Mining Social Web Data Like a Pro: Four Steps to Success

Why Does This Matter?

"If you can measure it, you can improve it"

Modeling Behavior

Predictive Analysis

Recommending Content

Swaying political situations might just be the ultimate value proposition for social media

17

Page 18: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Analysis Framework

Four Steps To Success

Aspire

Acquire

Analyze

Summarize

Let's step through a trivial example...

18

Page 19: Mining Social Web Data Like a Pro: Four Steps to Success

(1) Aspire

Let's frame a trivial hypothesis to illustrate the four steps...

Frame a hypothesis about some real world phenomenon

For example: "Johnny Araya is a more popular candidate than Rodolfo Hernández"

Let's use social media as a basis of investigation

19

Page 20: Mining Social Web Data Like a Pro: Four Steps to Success

(2) Acquire

Collect the data that you need to test the hypothesis

How?

Use Facebook and Twitter APIs to harvest data about each candidate

Go after low hanging fruit before something more complex

You don't even need to write code to do this (yet)

20

Page 21: Mining Social Web Data Like a Pro: Four Steps to Success

They're both on Facebook

21

http://facebook.com/ElDoctor2014

http://facebook.com/JohnnyArayaMonge

Page 22: Mining Social Web Data Like a Pro: Four Steps to Success

They're both on Twitter

22

@Johnny_Araya

@ElDoctor2014

Page 23: Mining Social Web Data Like a Pro: Four Steps to Success

(3) Analyze

Count, Filter, and Rank the Data

Johnny Araya:

~50k Facebook likes

~14k Twitter followers

Rodolfo Hernández:

~37k Facebook likes

745 Twitter followers

Johnny Araya is indeed more popular in social media
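
The count-and-rank step above can be sketched in a few lines of Python. The figures are the rounded values from this slide, not live API data; the dictionary layout and the rank_by and shares helpers are illustrative assumptions rather than code from the talk.

```python
# Approximate counts taken from the slide (rounded, not fetched from an API).
popularity = {
    "Johnny Araya": {"facebook_likes": 50000, "twitter_followers": 14000},
    "Rodolfo Hernández": {"facebook_likes": 37000, "twitter_followers": 745},
}


def rank_by(metric, data):
    """Return (name, count) pairs sorted from highest to lowest count (hypothetical helper)."""
    pairs = ((name, stats[metric]) for name, stats in data.items())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def shares(metric, data):
    """Return each candidate's percentage share of the combined total for one metric."""
    total = sum(stats[metric] for stats in data.values())
    return {name: 100.0 * stats[metric] / total for name, stats in data.items()}


for metric in ("facebook_likes", "twitter_followers"):
    print(metric, rank_by(metric, popularity), shares(metric, popularity))
```

The percentage shares are the kind of values a pie chart of each network would display.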

23

Page 24: Mining Social Web Data Like a Pro: Four Steps to Success

(4) Summarize

Present the data in a concise and easily understood manner

Charts

Tables

Simple visualizations

Some examples...

24

Page 25: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Popularity: Araya vs Hernández

[Figure: two pie charts, "Twitter Popularity" and "Facebook Popularity", each split into Araya % and Hernández %]

Page 26: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Popularity: Araya vs Hernández

[Figure: bar chart of Twitter followers and Facebook fans for Araya and Hernández; linear vertical axis from 0 to 60,000]

Page 27: Mining Social Web Data Like a Pro: Four Steps to Success

Social Media Popularity: Araya vs Hernández

[Figure: bar chart of Twitter followers and Facebook fans for Araya and Hernández; logarithmic vertical axis from 1 to 100,000]

Page 28: Mining Social Web Data Like a Pro: Four Steps to Success

Twitter Popularity

28

Page 29: Mining Social Web Data Like a Pro: Four Steps to Success

Facebook Popularity

Facebook Likes for Costa Rican Presidential Candidates

JohnnyArayaMonge: 35%
o0oguevaraguth: 17%
luisguillermosolisr: 3%
villaltaJM: 19%
ElDoctor2014: 26%

Page 30: Mining Social Web Data Like a Pro: Four Steps to Success

Reflect and Refine...

Recall the previous hypothesis:

"Johnny Araya is a more popular candidate than Rodolfo Hernández"

What do we know now that we didn't before?

The current state of each candidate's Twitter and Facebook popularity

Let's explore a slightly more complex hypothesis...

Page 31: Mining Social Web Data Like a Pro: Four Steps to Success

(1) Aspire

Redefine the hypothesis:

For example: "Johnny Araya has a more effective social media strategy than Rodolfo Hernández"

Presumably because of his superior social media status at the moment

31

Page 32: Mining Social Web Data Like a Pro: Four Steps to Success

(2) Acquire

Collect the data that you need to test the hypothesis

How? Use APIs to harvest data about each candidate

Let's consider any Facebook posts for 2013

32

Page 33: Mining Social Web Data Like a Pro: Four Steps to Success

Python Source Code

import json
import requests

for candidate in ['JohnnyArayaMonge', 'ElDoctor2014']:

    # Get the data (XXX stands in for a valid access token)
    url = ('https://graph.facebook.com/{0}?'
           'fields=posts.limit(500)&access_token=XXX').format(candidate)
    content = requests.get(url).json()

    # Save the data
    with open(candidate + ".json", "w") as f:
        f.write(json.dumps(content))

Page 34: Mining Social Web Data Like a Pro: Four Steps to Success

(3) Analyze

34

Count, Filter, and Rank the Data

Some more Python source code to crunch the numbers

Extract Facebook likes and shares this year
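
A minimal sketch of this extraction step follows. It is not the code from the talk. It assumes each saved file holds a dictionary with a posts.data list, and that each post carries a created_time string, a type, and optional likes.count and shares.count fields; the actual Graph API response shape may differ, so treat the field names and the summarize_posts helper as assumptions.

```python
from collections import Counter


def summarize_posts(content, since="2013-01-01"):
    """Tally likes, shares, and post types for posts created on or after `since`.

    Assumes ISO 8601 timestamps such as 2013-03-15T00:40:21+0000, so the first
    ten characters can be compared as plain YYYY-MM-DD strings.
    """
    posts = content.get("posts", {}).get("data", [])
    recent = [p for p in posts if p.get("created_time", "")[:10] >= since]
    return {
        "num_posts": len(recent),
        "num_prior_posts": len(posts) - len(recent),
        "post_likes": sum(p.get("likes", {}).get("count", 0) for p in recent),
        "post_shares": sum(p.get("shares", {}).get("count", 0) for p in recent),
        "post_types": Counter(p.get("type") for p in recent).most_common(),
    }


# Made-up sample data in the assumed shape, for illustration only.
sample = {"posts": {"data": [
    {"created_time": "2013-03-15T00:40:21+0000", "type": "photo",
     "likes": {"count": 10}, "shares": {"count": 2}},
    {"created_time": "2013-04-01T12:00:00+0000", "type": "photo",
     "likes": {"count": 5}},
    {"created_time": "2013-05-02T08:30:00+0000", "type": "link",
     "shares": {"count": 1}},
    {"created_time": "2012-12-31T23:59:59+0000", "type": "status",
     "likes": {"count": 7}},
]}}

summary = summarize_posts(sample)
print(summary)
```

To apply it to the saved files, load each one with json.load and pass the resulting dictionary to summarize_posts.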

Page 35: Mining Social Web Data Like a Pro: Four Steps to Success

Facebook Vitals

35

ElDoctor2014
Total Likes: 37495
Num Posts since Jan 1, 2013 (of 500 possible): 436
Total Post Likes: 155473
Total Post Shares: 9684
Oldest Post in Batch: 2013-03-15T00:40:21+0000
Num posts prior to Jan 1, 2013: 0
Avg likes/post: 356.589449541 (0.951032003044%)
Avg shares/post: 22.2110091743 (0.059237256099%)
Post Types: [(u'photo', 286), (u'link', 77), (u'status', 40), (u'video', 32), (u'swf', 1)]

JohnnyArayaMonge
Total Likes: 50301
Num Posts since Jan 1, 2013 (of 500 possible): 205
Total Post Likes: 176161
Total Post Shares: 7542
Oldest Post in Batch: 2013-01-01T07:18:43+0000
Num posts prior to Jan 1, 2013: 190
Avg likes/post: 859.32195122 (1.70835957778%)
Avg shares/post: 36.7902439024 (0.0731401838978%)
Post Types: [(u'photo', 149), (u'status', 38), (u'link', 13), (u'video', 5)]
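
The per-post averages above can be reproduced from the totals. The percentage in parentheses appears to be the per-post average expressed as a share of the page's total likes; that reading is inferred from the numbers, and the per_post_average helper is a hypothetical name used only for this sketch.

```python
def per_post_average(total, num_posts, page_likes):
    """Return (average per post, that average as a percentage of total page likes)."""
    average = total / num_posts
    return average, 100.0 * average / page_likes


# Totals copied from the Facebook vitals above.
hernandez_likes = per_post_average(155473, 436, 37495)
hernandez_shares = per_post_average(9684, 436, 37495)
araya_likes = per_post_average(176161, 205, 50301)
araya_shares = per_post_average(7542, 205, 50301)

print("Hernández avg likes/post: %.2f (%.3f%%)" % hernandez_likes)
print("Araya avg likes/post: %.2f (%.3f%%)" % araya_likes)
```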

Page 36: Mining Social Web Data Like a Pro: Four Steps to Success

(4) Summarize

Present the data in a concise and easily understood manner

Like a table...

36

Page 37: Mining Social Web Data Like a Pro: Four Steps to Success

Metric                        Araya         Hernández
Total Likes                   50,301        37,495
Posts since 1 Jan 13          205           436
Num Prior Posts               190+          0
Earliest Post                 1 Jan 2013    15 March 2013
Post Likes since 1 Jan 13     176,161       155,473
Post Shares since 1 Jan 13    7,542         9,684
Avg Likes per Post            859           356
Avg Shares per Post           36            22


Page 39: Mining Social Web Data Like a Pro: Four Steps to Success

Reflect and Refine...

Recall the hypothesis:

"Johnny Araya has a more effective social media strategy than Rodolfo Hernández because he has more Facebook and Twitter popularity"

What do we know now?

Hernández has Facebook vitals that are quite competitive with Araya

However, Hernández only joined Facebook ~6 months ago!

It would appear that Hernández has the more effective strategy

What is he doing to rise in popularity so quickly?

Page 40: Mining Social Web Data Like a Pro: Four Steps to Success

40

Comparison of Facebook Content

Page 41: Mining Social Web Data Like a Pro: Four Steps to Success

Other Candidates

41

Page 42: Mining Social Web Data Like a Pro: Four Steps to Success

Johnny Araya FB Posts

42

Page 43: Mining Social Web Data Like a Pro: Four Steps to Success

Rodolfo Hernández FB Posts

43

Page 44: Mining Social Web Data Like a Pro: Four Steps to Success

44

Page 45: Mining Social Web Data Like a Pro: Four Steps to Success

Past ~2 Months on Facebook

Candidate                             Aug 2013 FB Likes    Sept 2013 FB Likes    % Change
Johnny Araya                          50,301               53,809                6.97%
Otto Guevara Guth                     24,146               27,675                14.62%
José María Villalta Florez-Estrada    27,262               35,169                29.00%
Dr. Rodolfo Hernández                 37,495               38,298                2.14%
Luis Guillermo Solís Rivera           5,334                6,763                 26.79%
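
The % Change column is a plain relative difference between the two monthly counts. A minimal sketch using the Facebook figures from this slide (the percent_change helper and the dictionary layout are illustrative assumptions):

```python
def percent_change(old, new):
    """Return the relative change from old to new, as a percentage."""
    return 100.0 * (new - old) / old


# (Aug 2013, Sept 2013) Facebook likes, copied from the slide.
facebook_likes = {
    "Johnny Araya": (50301, 53809),
    "Otto Guevara Guth": (24146, 27675),
    "José María Villalta Florez-Estrada": (27262, 35169),
    "Dr. Rodolfo Hernández": (37495, 38298),
    "Luis Guillermo Solís Rivera": (5334, 6763),
}

growth = {name: percent_change(aug, sept)
          for name, (aug, sept) in facebook_likes.items()}

# Rank candidates from fastest to slowest growth.
ranked = sorted(growth.items(), key=lambda pair: pair[1], reverse=True)
for name, pct in ranked:
    print("%-36s %6.2f%%" % (name, pct))
```

Ranking by growth rather than by raw count puts Villalta and Solís ahead of both Araya and Hernández for this period.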

Page 46: Mining Social Web Data Like a Pro: Four Steps to Success

Past ~3 Months on Twitter

Candidate                             Aug 2013    Sept 2013    % Change
Johnny Araya                          14,573      15,506       6.40%
Otto Guevara Guth                     114         159          39.47%
José María Villalta Florez-Estrada    8,160       8,990        10.17%
Dr. Rodolfo Hernández                 745         858          15.17%
Luis Guillermo Solís Rivera           1,192       1,487        24.75%

Page 47: Mining Social Web Data Like a Pro: Four Steps to Success

Facebook and Twitter Compared

Candidate                             % FB Change    % Twitter Change
Johnny Araya                          6.97%          6.40%
Otto Guevara Guth                     14.62%         39.47%
José María Villalta Florez-Estrada    29.00%         10.17%
Dr. Rodolfo Hernández                 2.14%          15.17%
Luis Guillermo Solís Rivera           26.79%         24.75%

Page 48: Mining Social Web Data Like a Pro: Four Steps to Success

Your Imagination Is the Only Limit

Analyze the comments that people are leaving on Facebook pages

Try to ascertain common Facebook fans or Twitter followers amongst candidates

Deduce demographics from social media by synthesizing public data

Theorize about potential "reach" or "influence" using social media

Analyze data in realtime
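
As one example from the list above, finding followers that candidates have in common reduces to a set intersection once follower IDs have been collected. The sketch below uses made-up IDs and a hypothetical common_followers helper; how the IDs are harvested is left out.

```python
from itertools import combinations


def common_followers(followers_by_account):
    """Map each pair of accounts to the set of follower IDs they share."""
    overlaps = {}
    for a, b in combinations(sorted(followers_by_account), 2):
        overlaps[(a, b)] = followers_by_account[a] & followers_by_account[b]
    return overlaps


# Made-up follower IDs, for illustration only.
followers = {
    "Johnny_Araya": {101, 102, 103, 104},
    "ElDoctor2014": {103, 104, 105},
}

overlaps = common_followers(followers)
for pair, shared in overlaps.items():
    print(pair, sorted(shared))
```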

48

Page 49: Mining Social Web Data Like a Pro: Four Steps to Success

Thinking about Reach

49

Think about "liking" and "following" as opt-ins to feeds

Remember: Interest Graphs

Arriving at effective metrics is trickier than it initially seems

Page 50: Mining Social Web Data Like a Pro: Four Steps to Success

Potential Twitter Influence

Metric                 Araya    Hernández
Followers              ~14k     ~750
Theoretical Reach      ~40M     ~550k
Reach (10)             490      673
Reach (100)            289      702
Reach (1000)           2782     X
Reach (10,000)         2832     X
"Suspect" Followers    3,246    94

See also http://wp.me/p3QiJd-2a

Page 51: Mining Social Web Data Like a Pro: Four Steps to Success

Potential Influence

51

Page 52: Mining Social Web Data Like a Pro: Four Steps to Success

Who are Candidates Following?

52

Page 53: Mining Social Web Data Like a Pro: Four Steps to Success

What are Candidates Tweeting?

53

Page 54: Mining Social Web Data Like a Pro: Four Steps to Success

Realtime Analysis

54

Monitor Twitter's firehose for realtime data using filters such as #Syria

Keep in mind the sheer volume of data can be considerable

Analysis at MiningTheSocialWeb.com
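
A minimal sketch of the filtering and counting side of such an analysis, not the code behind the linked posts. It assumes tweets have already been received as dictionaries with an entities.hashtags list and a created_at string in the format Twitter's v1.1 API used (for example "Wed Sep 18 12:34:05 +0000 2013"); connecting to the stream itself is omitted, and the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def has_hashtag(tweet, tag):
    """Return True if the tweet's hashtag entities include `tag` (case-insensitive)."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() == tag.lower() for h in hashtags)


def tweets_per_minute(tweets, tag):
    """Count matching tweets in one-minute buckets keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for tweet in tweets:
        if has_hashtag(tweet, tag):
            created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
            counts[created.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


# Made-up tweets in the assumed shape, for illustration only.
sample_tweets = [
    {"created_at": "Wed Sep 18 12:34:05 +0000 2013",
     "entities": {"hashtags": [{"text": "Syria"}]}},
    {"created_at": "Wed Sep 18 12:34:40 +0000 2013",
     "entities": {"hashtags": [{"text": "syria"}, {"text": "news"}]}},
    {"created_at": "Wed Sep 18 12:35:10 +0000 2013",
     "entities": {"hashtags": [{"text": "news"}]}},
]

counts = tweets_per_minute(sample_tweets, "Syria")
print(counts)
```

Because the function accepts any iterable, it can be fed from a generator that yields tweets as they arrive, which keeps memory use flat when volume is high.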

Page 55: Mining Social Web Data Like a Pro: Four Steps to Success

Mapping #Syria Tweets

55

See http://wp.me/p3QiJd-1t

Page 56: Mining Social Web Data Like a Pro: Four Steps to Success

Temporal Analysis on #Syria

56

Page 57: Mining Social Web Data Like a Pro: Four Steps to Success

Analyzing #Syria Tweet Entities

57

Page 58: Mining Social Web Data Like a Pro: Four Steps to Success

Closing Remarks

Software is the gift that keeps on giving

Code it up once, run it ad infinitum...

Code designed for one account will work for other accounts

Analysis is all about knowing what to count

Coding it up is just the dirty work

Start somewhere and then iteratively explore...then exploit

58

Page 59: Mining Social Web Data Like a Pro: Four Steps to Success

Aspire to Do Great Things

Predicting demographic data such as age or gender is possible for some languages

Time and space are fundamentals for grounding online discussions in reality.

Twitter is about as good as it gets for realtime topical analysis

Think of the world as signal producers and signal collectors

Monitoring breaking news events like #Syria

59

Page 60: Mining Social Web Data Like a Pro: Four Steps to Success

The Tip of the Iceberg

60

Page 61: Mining Social Web Data Like a Pro: Four Steps to Success

Stay in Touch

Website: http://MiningTheSocialWeb.com

Twitter: @ptwobrussell

FB: http://facebook.com/MiningTheSocialWeb

LinkedIn: http://linkedin.com/in/ptwobrussell

Email: [email protected]

61