
Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*

Indrajit Bhattacharya

Research Scientist, IBM Research, Bangalore

*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya

Workshop on Social Computing, IIT Kharagpur, Oct 5-6 2012

Social Media Analysis: Motivation

Microblogs: Twitter, Facebook, MySpace

Understanding and analyzing topics & trends

Influences on users

Variety of stakeholders

Business

Government

Social scientists

Social Media Analysis: Challenges

Network and Influences on Users

User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]

Dynamic nature

Topics & user personalities evolve over time

Volume of data

Existing approaches fall short

Social Media Analysis: State of the Art

Content Analysis

Ramage ICWSM 2010, Hong SOMA 2010

Variants of LDA

Inferring User Interests

Ahmed KDD 2011, Wen KDD 2010

Individual features such as user activity or network

Patterns in Temporal Evolution

Yang et al. WSDM 2011

Bayesian Non-parametric Models

Choosing the number of components in a mixture model

Particularly severe problem for large data volumes such as for social media data

Bayesian solution

Infinite dimensional prior

Allows the number of mixture components to grow with data size

Cannot capture richness of social media data

Algorithms often not scalable
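To make "grow with data size" concrete: under one standard infinite-dimensional prior, the Chinese Restaurant Process covered later in the talk, the expected number of occupied tables (mixture components) after n customers is the sum over i = 0..n-1 of alpha / (alpha + i), which grows roughly like alpha log n. The sketch below only evaluates that textbook formula; the function name and the value alpha = 1.0 are illustrative assumptions, not taken from the slides.

```python
def expected_num_tables(n: int, alpha: float) -> float:
    """Expected number of occupied tables after n customers in a CRP(alpha).

    Customer i+1 (with i customers already seated) opens a new table with
    probability alpha / (alpha + i); the expectation is the sum of these.
    """
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    # alpha = 1.0 is an arbitrary illustrative choice.
    for n in (100, 10_000, 100_000):
        print(n, round(expected_num_tables(n, alpha=1.0), 2))
```

The component count keeps increasing with n but only logarithmically, so a 100x larger corpus adds a handful of expected components rather than 100x as many.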

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Dirichlet Process (Informal)

Dirichlet Process: Properties

Chinese Restaurant Process (CRP)
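As a reminder of the standard CRP seating rule: a new customer joins existing table k with probability proportional to the number of customers n_k already there, and opens a new table with probability proportional to a concentration parameter alpha. The following is a minimal sketch of that textbook process, not code from the talk; `sample_crp`, `alpha` and `seed` are illustrative names.

```python
import random


def sample_crp(n_customers: int, alpha: float, seed: int = 0) -> list[int]:
    """Seat customers one by one; return the table index chosen by each."""
    rng = random.Random(seed)
    table_sizes: list[int] = []  # table_sizes[k] = customers at table k
    seating: list[int] = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; the last entry (alpha) is "new table".
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[k] += 1
        seating.append(k)
    return seating


if __name__ == "__main__":
    seating = sample_crp(20, alpha=1.0)
    print(seating, "tables used:", max(seating) + 1)
```

In a mixture-model reading, customers are posts and tables are topics, so the number of topics is not fixed in advance.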

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Parallelized Online Inference Algorithm

Experimental Results

Relational Chinese Restaurant Process (RelCRP)

Influence of World-wide Factors

Influence of Personal Preferences

Influence of Friend Network

Influence of Geography

India, China, UK

Aggregating Influences

RelCRP is exchangeable like the CRP

Useful as a prior for infinite mixture model

RelCRP captures influence of one relation on posts

Influences act simultaneously on any user

Aggregated influence pattern is user specific

Different users affected differently by same combination of world-wide and geographic factors
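One way to picture user-specific aggregation is as a weighted mixture: each relation contributes a CRP-style topic preference (proportional to how often the topic was used under that relation, plus a weight alpha for a new topic), and each user has their own weights over relations. This is only an illustrative sketch under that assumption; the slides do not give the exact MRelCRP formulation, and every name here (`relation_topic_probs`, `aggregate_influences`, the `"<new>"` key, the toy counts) is hypothetical.

```python
def relation_topic_probs(counts: dict[str, int], topics: list[str],
                         alpha: float) -> dict[str, float]:
    """CRP-style preference of one relation: counts for old topics, alpha for a new one."""
    total = sum(counts.get(t, 0) for t in topics) + alpha
    probs = {t: counts.get(t, 0) / total for t in topics}
    probs["<new>"] = alpha / total
    return probs


def aggregate_influences(user_weights: dict[str, float],
                         relation_counts: dict[str, dict[str, int]],
                         topics: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Mix per-relation preferences with user-specific weights (assumed to sum to 1)."""
    mixed = {t: 0.0 for t in topics + ["<new>"]}
    for relation, weight in user_weights.items():
        probs = relation_topic_probs(relation_counts[relation], topics, alpha)
        for topic, p in probs.items():
            mixed[topic] += weight * p
    return mixed


if __name__ == "__main__":
    topics = ["sports", "music"]
    # Toy counts: the same world-wide and geographic factors for both users.
    relation_counts = {"world": {"sports": 8, "music": 2},
                       "geo": {"sports": 1, "music": 9}}
    print(aggregate_influences({"world": 0.9, "geo": 0.1}, relation_counts, topics))
    print(aggregate_influences({"world": 0.1, "geo": 0.9}, relation_counts, topics))
```

With identical relation counts, the first user leans toward the world-wide favourite and the second toward the geographic one, which mirrors the point that the same combination of factors affects different users differently.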

Multi-Relational CRP (MRelCRP)

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Evolving Patterns in Social Media

Number of Topics

Topics die and new ones are born

User Personalities

Susceptibility to influence by world-wide, geographic and friends’ preferences

Existing Topic Distributions

Words go out of fashion, new ones enter vocabulary

Topic Characters

Popularity of a topic changes world-wide, and within users' preferences, sub-networks and geographies

Dynamic MRelCRP (DMRelCRP)

User Personality Trends

Evolving Topic Distributions

Topic Character Trends

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Inference and Estimation Tasks

Online Algorithm

Traditional iterative framework does not scale for social media data

Sequential Monte Carlo methods [Canini AIStats ‘09] that rejuvenate some old labels are also infeasible

Online sampling [Banerjee SDM ‘07] does not revisit old labels at all; uses an initial batch phase

Adapt for non-parametric setting
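The idea of online sampling in a non-parametric setting can be sketched as a single pass in which each post is assigned a topic once and never revisited, with a CRP prior allowing a new topic at any step. The code below is a simplified illustration, not the authors' algorithm: it uses a plain CRP over posts and a Dirichlet-smoothed bag-of-words likelihood, omits the initial batch phase and all relational and temporal structure, and every name and default (`online_assign`, `alpha`, `beta`, `vocab_size`) is an assumption.

```python
import math
import random
from collections import Counter


def log_predictive(words, counts: Counter, total: int,
                   beta: float, vocab_size: int) -> float:
    """Log Dirichlet-multinomial predictive of `words` given a topic's word counts."""
    seen = Counter()
    logp = 0.0
    for i, w in enumerate(words):
        logp += math.log((counts[w] + seen[w] + beta)
                         / (total + i + vocab_size * beta))
        seen[w] += 1
    return logp


def online_assign(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """One pass over posts; each post gets one topic label that is never resampled."""
    rng = random.Random(seed)
    topics = []  # each topic: {"counts": Counter, "total": int, "n_posts": int}
    labels = []
    for words in posts:
        # Existing topics: CRP weight n_posts times the word likelihood.
        log_w = [math.log(t["n_posts"])
                 + log_predictive(words, t["counts"], t["total"], beta, vocab_size)
                 for t in topics]
        # New topic: CRP weight alpha times the prior predictive.
        log_w.append(math.log(alpha)
                     + log_predictive(words, Counter(), 0, beta, vocab_size))
        m = max(log_w)  # subtract the max before exponentiating for stability
        weights = [math.exp(x - m) for x in log_w]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topics):
            topics.append({"counts": Counter(), "total": 0, "n_posts": 0})
        topics[k]["counts"].update(words)
        topics[k]["total"] += len(words)
        topics[k]["n_posts"] += 1
        labels.append(k)
    return labels, topics
```

Because old labels are never touched, the cost per post depends only on the current number of topics, which is what makes a streaming setting feasible.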

Multi-threaded Implementation

Sequential online implementation does not scale

Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]

Our algorithm is parallel, online and non-parametric

Explicit consolidation by master thread at the end of each iteration

Only new topics are consolidated

Talk Outline

Background: Chinese Restaurant Processes

CRP with multiple relationships: (RelCRP, MRelCRP)

Dynamic MRelCRP

Multi-threaded Online Inference Algorithm

Experimental Results

Datasets and Baselines

Twitter: 360 million tweets (Jun-Dec 2009)

Facebook: 300,000 posts (public profiles, 3 months)

Latent Dirichlet Allocation (LDA)

[Hong SOMA 2010]

Labeled LDA (L-LDA)

Hashtags as topics [Ramage ICWSM 2010]

Timeline

Dynamic non-parametric topic model [Ahmed UAI 2010]

1 Model Goodness

Perplexity: Ability to generalize to unseen data

Both network and dynamics are important for modeling social media data

Model      Twitter    Facebook
DMRelCRP   1188.29    1562.34
Timeline   1582.86    1802.9
L-LDA      1982.76    -
LDA        2932.06    3602

Perplexity
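For reference, perplexity is conventionally the exponential of the negative average per-token log-likelihood on held-out data, so lower values mean better generalization. The sketch below assumes natural-log token probabilities are already available from some model; `perplexity` and its argument name are illustrative, and the slides do not state the exact evaluation protocol used.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp(-mean log p) over held-out tokens; log probabilities are natural logs."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / n)


if __name__ == "__main__":
    # A model that spreads probability uniformly over 4 words has perplexity 4.
    print(perplexity([math.log(0.25)] * 10))
```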

2 Quality of Discovered Topics

Label assigned to each post indicating category

Distribution over words indicating semantics

A. Clustering posts using topic labels

B. Prediction using topic labels

Predicting post authorship & user commenting activity

C. Major event detection

2A Post Clustering using Topics

Use hashtags as gold standard (for Twitter)

16K posts with hashtags such as #NIPS2009, #ICML2009, #bollywood

DMRelCRP close to L-LDA without using hashtags

DMRelCRP produces ‘finer-grained’ clusters

Model      nMI     R-Index   F1
DMRelCRP   0.93    0.88      0.86
Timeline   0.81    0.72      0.73
L-LDA      1       1         1
LDA        0.55    0.52      0.48

Clustering accuracy (Tw)
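Two of the reported measures can be computed directly from gold labels (here, hashtags) and predicted topic labels. The sketch below gives a pairwise Rand index and a normalized mutual information; the slides do not say which NMI normalization was used, so dividing by the arithmetic mean of the two entropies is an assumption, and the function names are illustrative. The pairwise loop is quadratic, so it is meant for small samples only.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(gold, pred) -> float:
    """Fraction of item pairs on which the two clusterings agree (needs >= 2 items)."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum((gold[i] == gold[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def entropy(labels) -> float:
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(gold, pred) -> float:
    """Mutual information divided by the mean of the two label entropies."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    g, p = Counter(gold), Counter(pred)
    mi = sum(c / n * math.log(c * n / (g[a] * p[b]))
             for (a, b), c in joint.items())
    denom = (entropy(gold) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    gold = ["#NIPS2009", "#NIPS2009", "#bollywood", "#bollywood"]
    print(rand_index(gold, [0, 0, 1, 1]), nmi(gold, [0, 0, 1, 1]))
```

Both measures ignore the actual label names, so topic indices can be compared against hashtags without any alignment step.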

2B Prediction Using Topics

Authorship: Given post and user, predict if author

Commenting activity: Given post and (non-author) user, predict if user comments on that post

DMRelCRP topics lead to more accurate prediction

           Authorship            Commenting
Model      Twitter   Facebook    Twitter   Facebook
DMRelCRP   0.793     0.734       0.683     0.648
Timeline   0.718     0.669       0.582     0.579
L-LDA      0.521     0.432       0.429     0.482
LDA        0.647     -           0.542     -

2C Major Event Detection

3 Analysis of Influences

3A Global Personality Trends

Michael Jackson’s death

FIFA WC

Google Wave

3B Geo-specific Personality Trends

Personality trends very similar in UK and US

Geographic influences high at different epochs

India: world-wide and geographic influences weaker

China: world-wide influence weak, geographic influence strong; stable pattern

3C Topic Character Trends

Scaling with Data Size

Java-based multi-threaded framework; 7 threads

8-core machine with 32 GB RAM

Scales largely because of multi-threading

Summary

First attempt at studying user influences in social media data

New non-parametric model that captures multiple relationships and temporal evolution

Multi-threaded online Gibbs sampling algorithm

Extensive evaluation on large real datasets

Topics lead to better clustering and prediction

Insights on user influence patterns
