Page 1:

Online Bayesian Models for Personal Analytics in Social Media

Svitlana Volkova and Benjamin Van Durme

svitlana@jhu.edu http://www.cs.jhu.edu/~svitlana/

Center for Language and Speech Processing, Johns Hopkins University,

Human Language Technology Center of Excellence

Page 2:

Social Media Predictive Analytics

• Personalized, diverse and timely data
• Can reveal user interests, preferences and opinions

Social Network Prediction App – https://apps.facebook.com/snpredictionapp/

DemographicsPro – http://www.demographicspro.com/

WolframAlpha Analytics – http://www.wolframalpha.com/facebook/

Page 3:

User Attribute Prediction Task

Communications

Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et al., 2014

Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013

Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014

AAAI 2015 Demo (joint work with Microsoft Research): Income, Education Level, Ethnicity, Life Satisfaction, Optimism, Personality, Showing Off, Self-Promoting

Page 4:

Outline

I. Our Approach

II. Dynamic (Streaming) Models

III. Experimental Results

IV. Practical Recommendations

Page 5:

Existing Approaches: ~1K Tweets*

Tweets as a document

How long does it take for an average Twitter user to produce thousands of tweets?

What if we want to make reliable predictions immediately after 10 tweets?

*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

Page 6:

Attributed Social Networks

*Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Zamal et al., 2012; Volkova et al., 2014.

Page 7:

Our Approach

Static (Batch) Predictions
• Offline training
• Offline predictions
• No or limited network information

Streaming (Online) Inference
• Offline training
• Online predictions in time (ACL’14)
• Exploring 6 types of neighborhoods

Dynamic (Iterative) Learning and Prediction
• Online predictions
• Relying on neighbors
+ Iterative re-training
+ Active learning
+ Interactive rationale annotation

① Streaming nature of social media (SM): dynamic training and prediction
② Network structure: joint user-neighbor streams
③ Trade-off between prediction time and model quality

Page 8:

Online Predictions: Iterative Bayesian Updates

[Figure: timeline (axis: Time) with users whose attribute is still unknown marked "?"]
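The iterative Bayesian update named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary attribute (the labels "D" and "R" are placeholders), a uniform prior, and that some classifier supplies a per-tweet likelihood P(tweet | attribute). The function names `update_posterior` and `stream_posteriors` and the likelihood values are hypothetical.

```python
from typing import Dict, Iterable, List


def update_posterior(prior: Dict[str, float],
                     likelihood: Dict[str, float]) -> Dict[str, float]:
    """One Bayes step: posterior(a) is proportional to likelihood(a) * prior(a)."""
    unnormalized = {a: prior[a] * likelihood[a] for a in prior}
    total = sum(unnormalized.values())
    return {a: p / total for a, p in unnormalized.items()}


def stream_posteriors(prior: Dict[str, float],
                      tweet_likelihoods: Iterable[Dict[str, float]]
                      ) -> List[Dict[str, float]]:
    """Return the posterior after each tweet; each posterior becomes the next prior."""
    history = []
    current = dict(prior)
    for lik in tweet_likelihoods:
        current = update_posterior(current, lik)
        history.append(current)
    return history


# Hypothetical per-tweet likelihoods P(tweet | attribute); assumed values.
tweets = [{"D": 0.6, "R": 0.4}, {"D": 0.7, "R": 0.3}, {"D": 0.4, "R": 0.6}]
history = stream_posteriors({"D": 0.5, "R": 0.5}, tweets)
for step, post in enumerate(history, start=1):
    print(step, {a: round(p, 3) for a, p in post.items()})
```

Because each posterior is reused as the next prior, a prediction is available after every tweet rather than only after a full document of tweets has accumulated.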

Page 9:

Iterative Batch Learning

[Figure: timeline (axis: Time) over time steps t1, t2, …, tm, showing labeled users (classes "R" and "D") and unlabeled users ("?")]

Iterative Batch Retraining (IB)

Iterative Batch with Rationale Filtering (IBR)

Page 10:

Rationales

Rationales are explicitly highlighted n-grams in tweets that best justify why the annotators made their labeling decisions
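One way to read "rationale filtering" is as a feature filter that keeps only n-grams annotators highlighted. The sketch below illustrates that reading under stated assumptions: whitespace tokenization, unigrams and bigrams only, and a made-up rationale vocabulary. The helper names `ngrams` and `rationale_features` are hypothetical and do not come from the authors' code.

```python
from collections import Counter
from typing import List, Set


def ngrams(tokens: List[str], max_n: int = 2) -> List[str]:
    """All n-grams of length 1..max_n, joined by single spaces."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def rationale_features(tweet: str, rationales: Set[str],
                       max_n: int = 2) -> Counter:
    """Count only the n-grams that appear in the rationale vocabulary."""
    tokens = tweet.lower().split()
    return Counter(g for g in ngrams(tokens, max_n) if g in rationales)


# Hypothetical rationale vocabulary (not taken from the real annotations).
rationales = {"tax cuts", "healthcare", "vote"}
features = rationale_features("Vote for healthcare not tax cuts", rationales)
print(dict(features))
```

Dropping every n-gram outside the rationale vocabulary is what makes the resulting feature vectors lower-dimensional than a full bag of n-grams.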

Page 11:

Active Learning

[Figure: timeline (axis: Time) of labeled and unlabeled users, with tick marks at 1-Jan-2011, 1-Feb-2011, 1-Nov-2011 and 1-Dec-2011]

Active Without Oracle (AWOO)

Active With Rationale Filtering (AWR)

Active With Oracle (AWO)

Page 12:

Performance Metrics

• Accuracy over time:

• Find optimal models:
– Data stream type (user, friend, user + friend)
– Time (more correctly classified users faster)
– Prediction quality (better accuracy over time)
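The two quantities tracked on this slide, the number of correctly classified users and accuracy over time, can be computed as in the sketch below. The slide's own formula did not survive extraction, so the definition used here is an assumption: accuracy at a time step is the share of correct labels among all users classified up to that step. The name `accuracy_over_time` and the toy decisions are hypothetical.

```python
from typing import Dict, List, Tuple

# (time step at which the user was classified, predicted label, true label)
Decision = Tuple[int, str, str]


def accuracy_over_time(decisions: List[Decision],
                       steps: List[int]) -> Dict[int, Tuple[int, float]]:
    """For each step, return (# correctly classified users so far, accuracy so far)."""
    result = {}
    for t in steps:
        made = [d for d in decisions if d[0] <= t]
        correct = sum(1 for _, pred, gold in made if pred == gold)
        acc = correct / len(made) if made else 0.0
        result[t] = (correct, acc)
    return result


# Toy decisions (assumed values, for illustration only).
decisions = [(1, "D", "D"), (1, "R", "D"), (2, "R", "R"), (3, "D", "D")]
curve = accuracy_over_time(decisions, steps=[1, 2, 3])
for t, (n_correct, acc) in curve.items():
    print(t, n_correct, round(acc, 3))
```

Reporting both numbers matters because a model can look accurate simply by committing to very few users; the count of correctly classified users exposes that trade-off.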

Page 13:

Results: Iterative Batch Learning

[Figure: two panels for the user stream; x-axis: time (Mar, Jun, Sep); y-axes: correctly classified users (50 to 300) and accuracy (0.0 to 1.0)]

IB: higher recall; IBR: higher precision

Time: # correctly classified users increases over time; IB faster, IBR slower

Data stream selection: user + friend stream > user stream

Page 14:

Results: Active Learning

AWOO: higher recall; AWR: higher precision

Time: Unlike IB/IBR models, AWOO/AWR models classify more users correctly faster (in Mar) but then plateau

[Figure: two panels for the user stream; x-axis: time (Mar, Jun, Sep); y-axes: correctly classified users (50 to 300) and accuracy (0.0 to 1.0)]

Page 15:

Results: Model Quality

[Figure: four panels of accuracy (0.5 to 1.0) over time (Mar, Jun, Sep): IB and IBR, user stream; AWOO and AWR, user stream; IB, user + friend stream; AWOO, user + friend stream]

batch < active

user + friend > user

Page 16:

Summary

• Active learning > iterative batch

• Neighbor (N) and user-neighbor (UN) streams > user (U) stream: “neighbors give you away”

• Higher confidence => higher precision, lower confidence => higher recall (as expected)

• Rationales significantly improve results

Page 17:

Practical Recommendations

• If you want to deliver ads fast and can accept less confident user attribute predictions:
– use models with higher recall (AWOO, IB)
– apply a lower decision threshold, e.g., 0.55

• If you want to deliver ads to a true target crowd, but later in time:
– use models with higher precision (AWR, IBR)
– apply a higher decision threshold, e.g., 0.95
– models with rationale filtering (IBR, AWR) require less computation (lower-dimensional feature vectors) and are more accurate, but annotations cost money (Mechanical Turk)

• For highly assortative attributes, e.g., political preference, use a joint user-neighbor stream
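The effect of the decision threshold in these recommendations can be illustrated with a short sketch. It assumes the model exposes a posterior over attribute values and that a user stays unlabeled until the top probability reaches the threshold; the function name `decide` and the posterior values are hypothetical, while 0.55 and 0.95 are the example thresholds from the slide.

```python
from typing import Dict, Optional


def decide(posterior: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable label if it reaches the threshold, else None."""
    label = max(posterior, key=posterior.get)
    return label if posterior[label] >= threshold else None


# Hypothetical posterior for one user after a few tweets.
posterior = {"D": 0.7, "R": 0.3}

fast = decide(posterior, threshold=0.55)     # commits early: favors recall
careful = decide(posterior, threshold=0.95)  # abstains: favors precision

print(fast, careful)
```

With the lower threshold the user is labeled immediately; with the higher one the decision is deferred until more evidence arrives, which is the speed versus confidence trade-off described above.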

Page 18:

Thank you!

Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/

Interested in using our models for your research or collaboration? Code and pre-trained models for inferring demographic attributes, personality and Ekman’s 6 emotions are available on request: svitlana@jhu.edu

AAAI Technical Demo: Inferring Latent User Properties from Texts Published in Social Media

Wednesday, January 28, 6:30 – 8:00, Zilker Ballroom

I am on the job market. Hire me!

Email: svitlana@jhu.edu