Why Twitter Is All the Rage: A Data Miner's Perspective

Why Twitter Is All the Rage: A Data Miner's Perspective, Matthew A. Russell, O'Reilly Webcast, 15 Oct 2013

description

A presentation on data mining with Twitter that was originally presented as an O'Reilly webinar. See http://oreillynet.com/pub/e/2928 for the archived webinar video.

Transcript of Why Twitter Is All the Rage: A Data Miner's Perspective

Page 1: Why Twitter Is All the Rage: A Data Miner's Perspective

Why Twitter Is All the Rage: A Data Miner's Perspective

Matthew A. Russell

O'Reilly Webcast

15 Oct 2013

Page 2: Why Twitter Is All the Rage: A Data Miner's Perspective

Hello, My Name Is ... Matthew

Educated as a Computer Scientist

CTO @ Digital Reasoning Systems

Data mining; machine learning

Author @ O'Reilly Media

5 published books on technology

Principal @ Zaffra

Selective boutique consulting

Page 3: Why Twitter Is All the Rage: A Data Miner's Perspective

Transforming Curiosity Into Insight

An open source software (OSS) project

http://bit.ly/MiningTheSocialWeb2E

A book

http://bit.ly/135dHfs

Accessible to (virtually) everyone

Virtual machine with turn-key coding templates for data science experiments

Think of the book as "premium" support for the OSS project

Page 4: Why Twitter Is All the Rage: A Data Miner's Perspective

Overview

Background

Twitter as a data science platform

Politics, influence, world events

Data science tools for mining Twitter

Q&A

Page 5: Why Twitter Is All the Rage: A Data Miner's Perspective

Background

Page 6: Why Twitter Is All the Rage: A Data Miner's Perspective

Data Science

Data => Actionable information

Highly interdisciplinary

Nascent

Necessary

http://wikipedia.org/wiki/Data_science

Page 7: Why Twitter Is All the Rage: A Data Miner's Perspective

Digital Signal Explosion

A model for the world: signals and sinks

Growth in data exhaust is accelerating

Digital fingerprints

"Software is eating the world"

Data mining opportunities galore...

Page 8: Why Twitter Is All the Rage: A Data Miner's Perspective

Digital Data Stats

100 terabytes of data uploaded daily to Facebook.

Brands and organizations on Facebook receive 34,722 Likes every minute of the day.

According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day

30 billion pieces of content shared on Facebook every month.

Data production will be 44 times greater in 2020 than it was in 2009

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

See http://wikibon.org/blog/big-data-statistics
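
As a back-of-the-envelope check on the figures above, the stated growth factors can be converted into implied yearly rates. This is only a sketch: it assumes constant compound growth, and the function names are illustrative, not from the slides.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(annual_factor):
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(annual_factor)


# "44 times greater in 2020 than it was in 2009" spans 11 years.
data_growth = annual_growth_factor(44, 2020 - 2009)

# "doubles every 1.2 years" expressed as a yearly multiplier.
business_growth = annual_growth_factor(2, 1.2)

# "roughly 175 million tweets every day" as an average per-second rate.
tweets_per_second = 175_000_000 / SECONDS_PER_DAY

print(f"Implied data growth: x{data_growth:.2f} per year")
print(f"Implied business data growth: x{business_growth:.2f} per year")
print(f"Average tweet rate: {tweets_per_second:,.0f} per second")
```

The two growth claims come from different estimates, so the implied yearly rates are not expected to agree with each other.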

Page 9: Why Twitter Is All the Rage: A Data Miner's Perspective

Social Media Is All the Rage

World population: ~7B people

Facebook: 1.15B users

Twitter: 500M users

Google+: 343M users

LinkedIn: 238M users

~200M+ blogs (conservative estimate)

Page 10: Why Twitter Is All the Rage: A Data Miner's Perspective

Why Does Social Media Matter?

It's the frontier for predictive analytics

Understanding world events

Swaying political elections

Modeling human behavior

Analyzing sentiment

Making intelligent recommendations

Page 11: Why Twitter Is All the Rage: A Data Miner's Perspective

Twitter Is All the Rage

It satisfies fundamental human desires

We want to be heard

We want to satisfy our curiosity

We want it easy

We want it now

Accessible, rich, and (mostly) "open" data

RESTful APIs and JSON responses

Great proving ground for predictive analytics

Page 12: Why Twitter Is All the Rage: A Data Miner's Perspective

Twitter's Network Dynamics

500M curious users

100M curious users actively engaging

Real-time communication

Short, sweet, ... and fast

Asymmetric Following Model

An interest graph
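
A minimal sketch of the asymmetric following model, assuming a hypothetical in-memory `follows` mapping. The account names and every edge below are invented for illustration. Because a follow does not have to be reciprocated, the graph is directed, and the one-way edges toward popular accounts read more like interests than friendships.

```python
# Hypothetical "who follows whom" data: follower -> set of accounts followed.
follows = {
    "Roberto": {"Mercedes", "U2"},
    "Mercedes": {"Roberto", "Juan Luis Guerra"},
    "Jorge": {"U2", "Juan Luis Guerra"},
    "Ana": {"Jorge", "U2"},
    "Nina": {"Ana"},
}


def followers_of(account, graph):
    """Accounts that follow `account` (incoming edges)."""
    return {src for src, targets in graph.items() if account in targets}


def mutual_follows(graph):
    """Pairs that follow each other, i.e. the symmetric, friend-like edges."""
    pairs = set()
    for src, targets in graph.items():
        for dst in targets:
            if src in graph.get(dst, set()):
                pairs.add(tuple(sorted((src, dst))))
    return pairs


def interest_edges(graph):
    """One-way edges: src follows dst, but dst does not follow back."""
    return {
        (src, dst)
        for src, targets in graph.items()
        for dst in targets
        if src not in graph.get(dst, set())
    }


print("U2 followers:", sorted(followers_of("U2", follows)))
print("Mutual pairs:", sorted(mutual_follows(follows)))
print("One-way edges:", len(interest_edges(follows)))
```

A plain dictionary of sets keeps the sketch dependency-free; a graph library would be a reasonable swap for anything larger.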

Page 13: Why Twitter Is All the Rage: A Data Miner's Perspective

Twitter as a data science platform

Page 14: Why Twitter Is All the Rage: A Data Miner's Perspective

What's in a Tweet?

140 Characters ...

... Plus ~5KB of metadata!

Authorship

Time & location

Tweet "entities"

Replying, retweeting, favoriting, etc.
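
To make the metadata categories above concrete, here is a sketch that parses one tweet and pulls out authorship, time, location, entities, and interaction counts. The payload is invented and heavily trimmed; the field names are the ones used by Twitter's v1.1 JSON format, and `summarize_tweet` is a hypothetical helper, not part of any library.

```python
import json
from datetime import datetime

# Invented, heavily trimmed payload using v1.1-style field names.
raw = """
{
  "id_str": "1",
  "text": "Mining #Twitter data with @example_account",
  "created_at": "Tue Oct 15 17:00:00 +0000 2013",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "coordinates": null,
  "retweet_count": 3,
  "favorite_count": 1,
  "in_reply_to_screen_name": null,
  "entities": {
    "hashtags": [{"text": "Twitter"}],
    "user_mentions": [{"screen_name": "example_account"}],
    "urls": []
  }
}
"""


def summarize_tweet(tweet):
    """Pull out the metadata categories named on the slide."""
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "created": datetime.strptime(
            tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"
        ),
        "located": tweet.get("coordinates") is not None,
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "retweets": tweet.get("retweet_count", 0),
        "favorites": tweet.get("favorite_count", 0),
        "is_reply": tweet.get("in_reply_to_screen_name") is not None,
    }


summary = summarize_tweet(json.loads(raw))
print(summary["author"], summary["hashtags"], summary["created"].isoformat())
```

Entities arrive already parsed, so there is no need to extract hashtags or mentions from the 140-character text with regular expressions.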

Page 15: Why Twitter Is All the Rage: A Data Miner's Perspective

Twitter and Facebook Compared

Twitter

Account Types: "Anything"

"Following" Relationships

Favorites

Retweets

Replies

(Almost) No Privacy Controls

Facebook

Account Types: People & Pages

Mutual Connections

"Likes"

"Shares"

"Comments"

Extensive Privacy Controls

Page 16: Why Twitter Is All the Rage: A Data Miner's Perspective

Social Network Mechanics

[Diagram: Roberto, Mercedes, Jorge, Ana, Nina]

Page 17: Why Twitter Is All the Rage: A Data Miner's Perspective

Interest Graph Mechanics

[Diagram: Roberto, Mercedes, Jorge, Ana, Nina, U2, Juan Luis Guerra]

Page 18: Why Twitter Is All the Rage: A Data Miner's Perspective

A (Social) Interest Graph

[Diagram: Roberto, Mercedes, Jorge, Ana, Nina, U2, Juan Luis Guerra]

Page 19: Why Twitter Is All the Rage: A Data Miner's Perspective

A (Political) Interest Graph

[Diagram: Roberto, Mercedes, Jorge, Ana, Nina, Johnny Araya, Rodolfo Hernández]

Page 20: Why Twitter Is All the Rage: A Data Miner's Perspective

Costa Rican Presidential Candidates

@Johnny_Araya

@ElDoctor2014

Page 21: Why Twitter Is All the Rage: A Data Miner's Perspective

~3 Months on Twitter

Candidate | Aug 2013 | Sept 2013 | % Change
Johnny Araya | 14,573 | 15,506 | 6.40%
Otto Guevara Guth | 114 | 159 | 39.47%
José María Villalta Florez-Estrada | 8,160 | 8,990 | 10.17%
Dr. Rodolfo Hernández | 745 | 858 | 15.17%
Luis Guillermo Solís Rivera | 1,192 | 1,487 | 24.75%
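
The % Change column follows directly from the two monthly counts. The sketch below recomputes it. The dictionary layout is illustrative, and pairing each name with a row assumes the slide lists candidates and figures in the same order.

```python
# (Aug 2013, Sept 2013) counts as transcribed from the slide.
counts = {
    "Johnny Araya": (14573, 15506),
    "Otto Guevara Guth": (114, 159),
    "José María Villalta Florez-Estrada": (8160, 8990),
    "Dr. Rodolfo Hernández": (745, 858),
    "Luis Guillermo Solís Rivera": (1192, 1487),
}


def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0


changes = {name: percent_change(aug, sept) for name, (aug, sept) in counts.items()}

# List candidates from fastest to slowest relative growth.
for name, change in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {change:.2f}%")
```

Relative and absolute growth tell different stories here: the account with the smallest percentage change also gained the most in absolute terms (933), because it started from the largest base.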

Page 22: Why Twitter Is All the Rage: A Data Miner's Perspective

Who are Candidates Following?

Page 23: Why Twitter Is All the Rage: A Data Miner's Perspective

What are Candidates Tweeting?

Page 24: Why Twitter Is All the Rage: A Data Miner's Perspective

Potential Influence

Page 25: Why Twitter Is All the Rage: A Data Miner's Perspective

Potential Twitter Influence

Metric | Araya | Hernández
Followers | ~14k | ~750
Theoretical Reach | ~40M | ~550k
Reach (10) | 490 | 673
Reach (100) | 289 | 702
Reach (1000) | 2782 | X
Reach (10,000) | 2832 | X
"Suspect" Followers | 3,246 | 94

See also http://wp.me/p3QiJd-2a

Page 26: Why Twitter Is All the Rage: A Data Miner's Perspective

Considerations for Measuring Influence

Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all

Inactive or abandoned accounts that can’t influence or be influenced since they are not in use

Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero

The network effects of retweets by accounts that are active and can be influenced to spread a message

See also http://wp.me/p3QiJd-2a
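
The first three considerations can be sketched as a filter over follower records. Everything here is an assumption made for illustration: the `Follower` record, the cut-off values, and the sample accounts are invented, and summing the followers' own follower counts is only one plausible reading of a reach-style figure, not a definition taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    screen_name: str
    followers_count: int      # accounts following this follower
    friends_count: int        # accounts this follower follows
    days_since_last_tweet: int
    flagged_as_spam: bool = False


# Hypothetical cut-offs; tune them against real data before trusting them.
MAX_INACTIVE_DAYS = 90
MAX_FRIENDS = 2000


def is_suspect(f):
    """Spam, inactivity, or following too many accounts to notice any one."""
    return (
        f.flagged_as_spam
        or f.days_since_last_tweet > MAX_INACTIVE_DAYS
        or f.friends_count > MAX_FRIENDS
    )


def second_degree_audience(followers):
    """Sum of the followers' own follower counts (overlap is not removed)."""
    return sum(f.followers_count for f in followers)


followers = [
    Follower("active_fan", 350, 180, 1),
    Follower("bot_4821", 2, 1900, 0, flagged_as_spam=True),
    Follower("gone_quiet", 120, 90, 400),
    Follower("follows_everyone", 5000, 48000, 2),
]

credible = [f for f in followers if not is_suspect(f)]
print("Suspect followers:", len(followers) - len(credible))
print("Raw second-degree audience:", second_degree_audience(followers))
print("Credible second-degree audience:", second_degree_audience(credible))
```

The fourth consideration, the network effect of retweets, is not modeled here; it would require retweet data for each credible follower.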

Page 27: Why Twitter Is All the Rage: A Data Miner's Perspective

Social Media Popularity: Araya vs Hernández

[Charts: Twitter Popularity and Facebook Popularity, each showing Araya % and Hernández %]

Page 28: Why Twitter Is All the Rage: A Data Miner's Perspective

Realtime Analysis: #Syria

Monitor Twitter's firehose for realtime data using filters such as #Syria

Keep in mind the sheer volume of data can be considerable

Analysis at MiningTheSocialWeb.com
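
A sketch of the filtering step, assuming tweets arrive as already-parsed dictionaries. The in-memory `sample_stream` stands in for a live connection, and both helper functions are hypothetical. Processing lazily with a generator is one way to cope with the volume, since matching tweets are handled one at a time and the stream is never held in memory.

```python
def has_hashtag(tweet, tag):
    """True if the tweet's entities include `tag` (case-insensitive, no '#')."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() == tag.lower() for h in hashtags)


def filter_stream(tweets, tag, limit=None):
    """Lazily yield matching tweets, stopping after `limit` matches if given."""
    matched = 0
    for tweet in tweets:
        if has_hashtag(tweet, tag):
            yield tweet
            matched += 1
            if limit is not None and matched >= limit:
                return


# Stand-in for a live stream; a real client would yield parsed JSON tweets.
sample_stream = iter([
    {"text": "News from #Syria", "entities": {"hashtags": [{"text": "Syria"}]}},
    {"text": "Lunch time", "entities": {"hashtags": []}},
    {"text": "#syria update", "entities": {"hashtags": [{"text": "syria"}]}},
    {"text": "Message without an entities field"},
])

matches = list(filter_stream(sample_stream, "Syria"))
print(len(matches), "matching tweets")
```

Messages without an `entities` field are skipped rather than treated as errors, since a stream can carry items other than ordinary tweets.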

Page 29: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 30: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 31: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 32: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: What?

See http://wp.me/p3QiJd-1I

Page 33: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: What?

See http://wp.me/p3QiJd-1I

Page 34: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: Where?

See http://wp.me/p3QiJd-1I

Page 35: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: When?

See http://wp.me/p3QiJd-1I

Page 36: Why Twitter Is All the Rage: A Data Miner's Perspective

#Syria: Why?

That's for you (as the data scientist) to decide

Quantitative automation can amplify human intelligence

Qualitative analysis still requires human intelligence

Page 37: Why Twitter Is All the Rage: A Data Miner's Perspective

Data science tools for mining Twitter

Page 38: Why Twitter Is All the Rage: A Data Miner's Perspective

MTSW Virtual Machine Experience

Goal: Make it easy to transform curiosity into insight

Vagrant-based virtual machine

Virtualbox or AWS

IPython Notebook User Experience

Point-and-click GUI

100+ turn-key examples and templates

Social web mining for the masses

Page 39: Why Twitter Is All the Rage: A Data Miner's Perspective

Social Media Analysis Framework

A memorable four-step process to guide data science experiments:

Aspire

Acquire

Analyze

Summarize
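
One way to picture the four steps is as a tiny pipeline. This is a sketch under stated assumptions, not the book's code: the question, the function names, and the hashtag data returned by `acquire` are all invented for illustration.

```python
from collections import Counter

# Aspire: state the question the experiment should answer.
QUESTION = "Which hashtags co-occur most often with #Syria?"


def acquire():
    """Stand-in for an API call; returns invented hashtag lists, one per tweet."""
    return [
        ["Syria", "UN"],
        ["Syria", "news"],
        ["Syria", "UN", "breaking"],
        ["Syria"],
    ]


def analyze(tweets_hashtags, seed="Syria"):
    """Count hashtags that appear alongside the seed hashtag."""
    counts = Counter()
    for tags in tweets_hashtags:
        if seed in tags:
            counts.update(t for t in tags if t != seed)
    return counts


def summarize(counts, top_n=3):
    """Render the most common co-occurring hashtags as plain-text lines."""
    return [f"#{tag}: {n}" for tag, n in counts.most_common(top_n)]


counts = analyze(acquire())
print(QUESTION)
print("\n".join(summarize(counts)))
```

Keeping each step in its own function makes it easy to swap the stand-in `acquire` for a real data source without touching the analysis.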

Page 40: Why Twitter Is All the Rage: A Data Miner's Perspective

Page 41: Why Twitter Is All the Rage: A Data Miner's Perspective

Page 42: Why Twitter Is All the Rage: A Data Miner's Perspective

Page 43: Why Twitter Is All the Rage: A Data Miner's Perspective

Page 44: Why Twitter Is All the Rage: A Data Miner's Perspective

Free Resources

Mining the Social Web 2E Chapter 1 (Chimera)

http://bit.ly/13XgNWR

Source Code (GitHub)

http://bit.ly/MiningTheSocialWeb2E

http://bit.ly/1fVf5ej (numbered examples)

Screencasts (Vimeo)

http://bit.ly/mtsw2e-screencasts

http://MiningTheSocialWeb.com

Page 45: Why Twitter Is All the Rage: A Data Miner's Perspective

Q&A
