Why Twitter Is All the Rage: A Data Miner's Perspective
Why Twitter Is All The Rage: A Data Miner's Perspective
Matthew A. Russell
O'Reilly Webcast
15 Oct 2013
1
Hello, My Name Is ... Matthew
2
Educated as a Computer Scientist
CTO @ Digital Reasoning Systems
Data mining; machine learning
Author @ O'Reilly Media
5 published books on technology
Principal @ Zaffra
Selective boutique consulting
Transforming Curiosity Into Insight
3
An open source software (OSS) project
http://bit.ly/MiningTheSocialWeb2E
A book
http://bit.ly/135dHfs
Accessible to (virtually) everyone
Virtual machine with turn-key coding templates for data science experiments
Think of the book as "premium" support for the OSS project
Overview
Background
Twitter as a data science platform
Politics, influence, world events
Data science tools for mining Twitter
Q&A
4
Background
5
Data Science
6
Data => Actionable information
Highly interdisciplinary
Nascent
Necessary
http://wikipedia.org/wiki/Data_science
Digital Signal Explosion
A model for the world: signal and sinks
Growth in data exhaust is accelerating
Digital fingerprints
"Software is eating the world"
Data mining opportunities galore...
7
Digital Data Stats
100 terabytes of data uploaded daily to Facebook.
Brands and organizations on Facebook receive 34,722 Likes every minute of the day.
According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day
30 Billion pieces of content shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.
8
See http://wikibon.org/blog/big-data-statistics
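A rough back-of-the-envelope check on the stats above, as a minimal sketch. It assumes constant compound growth, which the source figures do not claim; the function names are illustrative only.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time(annual_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2) / math.log(annual_factor)


# "44 times greater in 2020 than it was in 2009" spans 11 years.
overall = annual_growth_factor(44, 2020 - 2009)

# "doubles every 1.2 years" (business data).
business = annual_growth_factor(2, 1.2)

# "roughly 175 million tweets every day", averaged over 86,400 seconds.
tweets_per_second = 175_000_000 / (24 * 60 * 60)

print(f"All data:      x{overall:.2f} per year, "
      f"doubling every {doubling_time(overall):.1f} years")
print(f"Business data: x{business:.2f} per year")
print(f"Tweets:        about {tweets_per_second:,.0f} per second on average")
```

Under that assumption, the 44x figure works out to roughly 1.41x per year (doubling about every 2 years), while doubling every 1.2 years is roughly 1.78x per year. The two stats therefore describe different growth rates for different kinds of data.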
Social Media Is All the Rage
World population: ~7B people
Facebook: 1.15B users
Twitter: 500M users
Google+: 343M users
LinkedIn: 238M users
~200M+ blogs (conservative estimate)
9
Why Does Social Media Matter?
It's the frontier for predictive analytics
Understanding world events
Swaying political elections
Modeling human behavior
Analyzing sentiment
Making intelligent recommendations
10
Twitter Is All the Rage
It satisfies fundamental human desires
We want to be heard
We want to satisfy our curiosity
We want it easy
We want it now
Accessible, rich, and (mostly) "open" data
RESTful APIs and JSON responses
Great proving ground for predictive analytics
11
Twitter's Network Dynamics
500M curious users
100M curious users actively engaging
Real-time communication
Short, sweet, ... and fast
Asymmetric Following Model
An interest graph
12
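One way to picture the asymmetric following model is as a directed graph, where "A follows B" does not imply "B follows A". The sketch below is illustrative only: the names are borrowed from the diagrams later in the deck, and the edges are invented for the example.

```python
from collections import defaultdict

# Hypothetical (follower, followed) pairs; direction matters.
FOLLOWS = [
    ("Ana", "U2"),
    ("Jorge", "U2"),
    ("Ana", "Jorge"),
    ("Jorge", "Ana"),
    ("Nina", "Ana"),
]


def build_graph(edges):
    """Return (following, followers) adjacency maps for a directed graph."""
    following = defaultdict(set)
    followers = defaultdict(set)
    for src, dst in edges:
        following[src].add(dst)
        followers[dst].add(src)
    return following, followers


def is_mutual(following, a, b):
    """True only when a follows b AND b follows a (a symmetric tie)."""
    return b in following.get(a, set()) and a in following.get(b, set())


following, followers = build_graph(FOLLOWS)

print(sorted(followers["U2"]))              # accounts interested in U2
print(is_mutual(following, "Ana", "Jorge"))  # both directions exist
print(is_mutual(following, "Ana", "U2"))     # one-way interest only
```

One-way edges toward accounts such as a band are what make this an interest graph rather than a purely social one.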
Twitter as a data science platform
13
What's in a Tweet?
14
140 Characters ...
... Plus ~5KB of metadata!
Authorship
Time & location
Tweet "entities"
Replying, retweeting, favoriting, etc.
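A minimal sketch of pulling those metadata categories out of a tweet. It assumes the JSON layout of Twitter's v1.1 REST API; the sample below is heavily trimmed and every value in it is a made-up placeholder.

```python
import json

# Trimmed, made-up sample; a real v1.1 tweet carries many more fields.
SAMPLE_TWEET_JSON = """
{
  "id_str": "1",
  "text": "Reading about #Syria with @example_user",
  "created_at": "Tue Oct 15 17:00:00 +0000 2013",
  "coordinates": null,
  "user": {"screen_name": "example_author", "followers_count": 42},
  "entities": {
    "hashtags": [{"text": "Syria"}],
    "user_mentions": [{"screen_name": "example_user"}],
    "urls": []
  },
  "in_reply_to_screen_name": null,
  "retweet_count": 3,
  "favorite_count": 1
}
"""


def summarize_tweet(tweet: dict) -> dict:
    """Group a tweet's metadata the way the slide does."""
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "when": tweet["created_at"],
        "where": tweet.get("coordinates"),  # None unless geotagged
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"]
                     for m in entities.get("user_mentions", [])],
        "reply_to": tweet.get("in_reply_to_screen_name"),
        "retweets": tweet.get("retweet_count", 0),
        "favorites": tweet.get("favorite_count", 0),
    }


summary = summarize_tweet(json.loads(SAMPLE_TWEET_JSON))
for key, value in summary.items():
    print(f"{key:10} {value}")
```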
Twitter and Facebook Compared
15
Twitter                          Facebook
Account Types: "Anything"        Account Types: People & Pages
"Following" Relationships        Mutual Connections
Favorites                        "Likes"
Retweets                         "Shares"
Replies                          "Comments"
(Almost) No Privacy Controls     Extensive Privacy Controls
16
Diagram nodes: Roberto, Mercedes, Jorge, Ana, Nina
Social Network Mechanics
Interest Graph Mechanics
17
Diagram nodes: Roberto, Mercedes, Jorge, Ana, Nina, U2, Juan Luis Guerra
A (Social) Interest Graph
18
Diagram nodes: Roberto, Mercedes, Jorge, Ana, Nina, U2, Juan Luis Guerra
A (Political) Interest Graph
19
Diagram nodes: Roberto, Mercedes, Jorge, Ana, Nina, Johnny Araya, Rodolfo Hernández
Costa Rican Presidential Candidates
20
@Johnny_Araya, @ElDoctor2014
~3 Months on Twitter
21
Candidate                             Aug 2013   Sept 2013   % Change
Johnny Araya                          14,573     15,506      6.40%
Otto Guevara Guth                     114        159         39.47%
José María Villalta Florez-Estrada    8,160      8,990       10.17%
Dr. Rodolfo Hernández                 745        858         15.17%
Luis Guillermo Solís Rivera           1,192      1,487       24.75%
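The % Change column is consistent with the usual (new - old) / old calculation. A minimal sketch that recomputes it from the Aug and Sept counts on this slide; the variable and function names are illustrative only.

```python
# (Aug 2013, Sept 2013) counts as shown on the slide.
COUNTS = {
    "Johnny Araya": (14_573, 15_506),
    "Otto Guevara Guth": (114, 159),
    "José María Villalta Florez-Estrada": (8_160, 8_990),
    "Dr. Rodolfo Hernández": (745, 858),
    "Luis Guillermo Solís Rivera": (1_192, 1_487),
}


def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


changes = {name: f"{pct_change(aug, sept):.2f}%"
           for name, (aug, sept) in COUNTS.items()}

for name, change in changes.items():
    print(f"{name:36} {change:>7}")
```

Note that the largest relative gain belongs to the account with the smallest base, so percent change alone says little about absolute audience size.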
Who are Candidates Following?
22
What are Candidates Tweeting?
23
Potential Influence
24
Potential Twitter Influence
25
                       Araya     Hernández
Followers              ~14k      ~750
Theoretical Reach      ~40M      ~550k
Reach (10)             490       673
Reach (100)            289       702
Reach (1000)           2782      X
Reach (10,000)         2832      X
"Suspect" Followers    3,246     94
See also http://wp.me/p3QiJd-2a
Considerations for Measuring Influence
26
Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all
Inactive or abandoned accounts that can’t influence or be influenced since they are not in use
Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero
The network effects of retweets by accounts that are active and can be influenced to spread a message
See also http://wp.me/p3QiJd-2a
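The considerations above could be turned into a simple filter over an account's followers. This is a hedged sketch, not the method behind the numbers in this deck: the thresholds are arbitrary, "never tweeted" is only a crude stand-in for spam-bot detection, and `days_since_last_tweet` is a hypothetical derived field. The count fields mirror names in Twitter's v1.1 user object.

```python
from dataclasses import dataclass

# Arbitrary illustrative thresholds (assumptions, not from the talk).
MAX_FRIENDS = 2000    # follows so many accounts that yours is rarely seen
MAX_IDLE_DAYS = 90    # treated as inactive or abandoned


@dataclass
class Follower:
    screen_name: str
    followers_count: int        # this follower's own audience
    friends_count: int          # how many accounts this follower follows
    statuses_count: int         # tweets ever posted
    days_since_last_tweet: int  # hypothetical derived field


def is_suspect(f: Follower) -> bool:
    """Flag followers unlikely to be influenced or to spread a message."""
    never_tweeted = f.statuses_count == 0
    idle = f.days_since_last_tweet > MAX_IDLE_DAYS
    follows_too_many = f.friends_count > MAX_FRIENDS
    return never_tweeted or idle or follows_too_many


def adjusted_reach(followers) -> int:
    """Sum the audiences of followers that pass the filter."""
    return sum(f.followers_count for f in followers if not is_suspect(f))


# Made-up sample data.
sample = [
    Follower("active_fan", 300, 150, 1200, 1),
    Follower("egg_account", 0, 40, 0, 400),
    Follower("follows_everyone", 5000, 90000, 800, 2),
    Follower("quiet_but_recent", 50, 200, 30, 10),
]

suspects = [f.screen_name for f in sample if is_suspect(f)]
print(suspects)
print(adjusted_reach(sample))
```

Retweet network effects, the fourth consideration, are not modeled here; that would need actual retweet data rather than profile counts.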
27
Social Media Popularity: Araya vs Hernández
Charts: "Twitter Popularity" and "Facebook Popularity", each comparing Araya % with Hernández %
Realtime Analysis: #Syria
28
Monitor Twitter's firehose for realtime data using filters such as #Syria
Keep in mind the sheer volume of data can be considerable
Analysis at MiningTheSocialWeb.com
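A minimal sketch of the filtering step, assuming tweets have already been received and parsed into dicts with the v1.1 `entities.hashtags` layout. Connecting to a live stream needs credentials and a client library, so that part is left out; the sample tweets are made up.

```python
from collections import Counter


def has_hashtag(tweet: dict, tag: str) -> bool:
    """Case-insensitive check of a tweet's hashtag entities."""
    wanted = tag.lstrip("#").lower()
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() == wanted for h in hashtags)


def filter_stream(tweets, tag: str):
    """Lazily yield only the tweets carrying the given hashtag."""
    for tweet in tweets:
        if has_hashtag(tweet, tag):
            yield tweet


def top_authors(tweets, n: int = 3):
    """Count tweets per author: a first pass at the 'Who?' question."""
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    return counts.most_common(n)


# Made-up sample standing in for a live stream.
incoming = [
    {"user": {"screen_name": "a"}, "entities": {"hashtags": [{"text": "Syria"}]}},
    {"user": {"screen_name": "b"}, "entities": {"hashtags": [{"text": "cats"}]}},
    {"user": {"screen_name": "a"}, "entities": {"hashtags": [{"text": "syria"}]}},
    {"user": {"screen_name": "c"}, "entities": {"hashtags": []}},
]

matches = list(filter_stream(incoming, "#Syria"))
print(len(matches))
print(top_authors(matches))
```

Because `filter_stream` is a generator, it handles one tweet at a time, which matters when the volume is considerable.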
#Syria: Who?
29
See http://wp.me/p3QiJd-1I
#Syria: Who?
30
See http://wp.me/p3QiJd-1I
#Syria: Who?
31
See http://wp.me/p3QiJd-1I
#Syria: What?
32
See http://wp.me/p3QiJd-1I
#Syria: What?
33
See http://wp.me/p3QiJd-1I
#Syria: Where?
34
See http://wp.me/p3QiJd-1I
#Syria: When?
35
See http://wp.me/p3QiJd-1I
#Syria: Why?
36
That's for you (as the data scientist) to decide
Quantitative automation can amplify human intelligence
Qualitative analysis still requires human intelligence
Data science tools for mining Twitter
37
MTSW Virtual Machine Experience
Goal: Make it easy to transform curiosity into insight
Vagrant-based virtual machine
VirtualBox or AWS
IPython Notebook User Experience
Point-and-click GUI
100+ turn-key examples and templates
Social web mining for the masses
38
Social Media Analysis Framework
A memorable four-step process to guide data science experiments:
Aspire
Acquire
Analyze
Summarize
39
40
41
42
43
Free Resources
Mining the Social Web 2E Chapter 1 (Chimera)
http://bit.ly/13XgNWR
Source Code (GitHub)
http://bit.ly/MiningTheSocialWeb2E
http://bit.ly/1fVf5ej (numbered examples)
Screencasts (Vimeo)
http://bit.ly/mtsw2e-screencasts
http://MiningTheSocialWeb.com
44
Q&A
45