Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis...

52
Enhance micro-blogging recommendations of posts with an homophily-based graph Quentin Grossetti 1, 2 Supervised by C´ edric du Mouza 2 , Camelia Constantin 1 and Nicolas Travers 2 1 LIP6 - Universit´ e Pierre Marie Curie - Paris, France 2 CEDRIC Laboratory - CNAM - Paris, France BDA - Novembre 2017 Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 1 / 31

Transcript of Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis...

Page 1: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Enhance micro-blogging recommendations of posts withan homophily-based graph

Quentin Grossetti1,2

Supervised by Cedric du Mouza2,Camelia Constantin1 and Nicolas Travers2

1LIP6 - Universite Pierre Marie Curie - Paris, France

2CEDRIC Laboratory - CNAM - Paris, France

BDA - Novembre 2017

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 1 / 31

Page 2: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

IntroductionContext

Growth of microblogging plateforms since 2000

700 millions of messages/day in 2017

300 millions of messages/day in 2017

70 millions of publications/day in 2017

70 millions of pictures/day in 2017

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 2 / 31

Page 3: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

IntroductionReal life examples

Finding Users of Interest in Micro-blogging Systems (EDBT 2016)

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 3 / 31

Page 4: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

IntroductionReal life examples

Finding Users of Interest in Micro-blogging Systems (EDBT 2016)

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 3 / 31

Page 5: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Problem

How to connect users to relevant messages ?

Recommendation of messages

700M new messages every day

300M of users

Real time

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 4 / 31

Page 6: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Table of contents

1 State of the art

2 Data AnalysisTopologyRetweetsHomophily

3 ApproachSimilarity graphPropagation Model

4 ExperimentsProtocolResultsUpdating strategies

5 Conclusion

6 Annexes

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 5 / 31

Page 7: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the art

Content-based [Lops (2011)]

Method Pros ConsContent-based No need of interactions tweets are hard to describe

Collaborative filtering simple model and good results too large matrixMatrix Factorization efficient to fight sparsity matrix growing too fast

Hybrid systems increase user engagement hard to describe relationshipRandom walks models very cheap low memory

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 6 / 31

Page 8: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the art

Collaborative filtering [Schafer (2007)]

Method Pros ConsContent-based No need of interactions tweets are hard to describe

Collaborative filtering simple model and good results too large matrix

Matrix Factorization efficient to fight sparsity matrix growing too fastHybrid systems increase user engagement hard to describe relationship

Random walks models very cheap low memory

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 6 / 31

Page 9: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the art

Matrix Factorization [Koren (2009)]

Method Pros ConsContent-based No need of interactions tweets are hard to describe

Collaborative filtering simple model and good results too large matrixMatrix Factorization efficient to fight sparsity matrix growing too fast

Hybrid systems increase user engagement hard to describe relationshipRandom walks models very cheap low memory

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 6 / 31

Page 10: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the art

Hybrid systems [Bostandjiev (2010)]

Method Pros ConsContent-based No need of interactions tweets are hard to describe

Collaborative filtering simple model and good results too large matrixMatrix Factorization efficient to fight sparsity matrix growing too fast

Hybrid systems increase user engagement hard to describe relationship

Random walks models very cheap low memory

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 6 / 31

Page 11: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the art

Random walks models [Sharma (2016)]

Method Pros ConsContent-based No need of interactions tweets are hard to describe

Collaborative filtering simple model and good results too large matrixMatrix Factorization efficient to fight sparsity matrix growing too fast

Hybrid systems increase user engagement hard to describe relationshipRandom walks models very cheap low memory

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 6 / 31

Page 12: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

State of the art

State of the artNot only recommendations

User recommendation(topology,content-based,demographic etc...)

Hashtag (Bayesian model,euclidien...)

Timeline Filtering (DeepLearning)

Few papers on tweetsrecommendation except Twitterin 2016

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 7 / 31

Page 13: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis

Data AnalysisDataset

Updated connected component from the graph found in [Kwak (2009)].

No of nodes 2,182,867No of edges 325,451,980No of tweets 2,571,173,369

Avg. out-degree 57.8Avg. in-degree 69.4max out-degree 348,595max in-degree 185,401

Diameter 15Average shortest path 3.7

Table – Twitter dataset characteristics

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 8 / 31

Page 14: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Topology

Data AnalysisTopology

1 2 3 4 5 10 15

100

102

104

106

108

1010

Smallest path

Nu

mb

erof

pat

hs

Figure – Twitter smallest paths distribution

Small world with averagedistance of 3.7

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 9 / 31

Page 15: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Retweets

Data AnalysisRetweets

0 1 2-5 6-50 51-200201-500 500+

103

104

105

106

107

108

109

1010

Number of retweets

Nu

mb

erof

twee

ts

Figure – Distribution of the number ofretweets per tweet

1 retweet - 7%

2-5 retweets - 1%

6+ - 0,2%

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 10 / 31

Page 16: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Retweets

Data AnalysisLifespan

10 100 500 1,000102

103

104

105

106

107

Lifespan (in hours)

Nb

ofm

essa

ges

Figure – Lifespan of a message

< 1hour : 40%

< 3days : 90%

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 11 / 31

Page 17: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Homophily

Data AnalysisHomophily

Distance No of users % Mean similarity

1 3 229 02,65 0,00852 32 668 26,86 0,00143 81 645 67,13 0,00094 3 820 03,14 0,00105 43 00,03 0,00146 1 0 0,0008

Impossible 216 0,18 0,0017

Table – Evolution of the similarity score through distance in the network

sim(u, v) =

∑i∈Lu∩Lv

1log(1+pop(i))

|Lu ∪ Lv |(1)

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 12 / 31

Page 18: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Homophily

Data AnalysisHomophily

0 5 10 15 20 250

0.5

·10−2

Position in the ranking

Ave

rage

scor

e

Distances distribution (%)

Rank Average Distance 1 2 3 4

1 1,55 57,03 31,53 10,64 0,82 1,68 49,60 33,13 16,87 0,43 1,8 42,45 36,02 20,72 0,84 1,86 38,71 38,71 20,56 2,025 1,98 31,44 40,16 27,59 0,81

Table – Link beetween distance in the network and position in the Top-Nranking Top-NEnhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 13 / 31

Page 19: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Data Analysis Homophily

Data AnalysisConclusions

Many conclusions from this analysis :

Freshness is crucial (Messages dies very fast)⇒ real-time recommendation

Few users have high similarity⇒ use transitivity

Distance 2 successfully gather important users⇒ rely on this homophily

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 14 / 31

Page 20: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Similarity GraphBuilding process

U W

Z

V Y

X

Z1

Z2

Z3

Z4

Figure – Twitter Graph

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 15 / 31

Page 21: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Graphe de similariteExemple de construction

U W

Z

V Y

X

Z1

Z2

Z3

Z4

Figure – Twitter Graph

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 15 / 31

Page 22: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Similarity GraphBuilding process

U W

Z

V Y

X

Z1

Z2

Z3

Z4

Figure – Twitter Graph

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 15 / 31

Page 23: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Graphe de similariteExemple de construction

U W

Z

V Y

X

Z1

Z2

Z3

Z4

Figure – Twitter Graph

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 15 / 31

Page 24: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Similarity GraphBuilding process

U Y

V

Z1

sim(u, v)

sim(u, y)

sim(u, z1)

Figure – Similarity Graph

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 15 / 31

Page 25: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Similarity GraphCharacteristics

Twitter Network Similarity Graph

No of nodes 2 182 867 1 149 374No of edges 325,451,980 4 950 417

Avg. similarity score 0.008Mean out-degree 57.8 5.9

Table – Similarity Graph Characteristics

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 16 / 31

Page 26: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Propagation ModelIn a nutshell

p(u, t) =

∑v∈Fu

p(u ← v , t)

|Fu|(2)

With Fu the set of users influential to u and p(u ← v , t) a probabilityestimation that u likes t determined by the behavior of the user v .

p(u ← v , t) = p(v , t)× sim(u, v) (3)

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 17 / 31

Page 27: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Similarity graph

Propagation ModelExample

U W

V Y

X

0.3

0.5

0.1

0.5

0.4 0.8

Figure – Propagation example

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 18 / 31

Page 28: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelExample

U W

V Y

X t1

0.3

0.5

0.1

0.5

0.4 0.8

Figure – Propagation example - a tweet t1 is published

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 18 / 31

Page 29: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelExample

U W

V Y

X t1

0.3

0.5

0.1

0.5

0.4 0.8

Figure – Propagation example - X shares/likes t1

p(x , t1) = 1

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 18 / 31

Page 30: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelExample

U W

V Y

X t1

0.3

0.5

0.1

0.5

0.4 0.8

Figure – Propagation example - Propagation

p(w , t1) =

∑v∈Fw

p(w←v ,t)

|Fw | = 0+1×0.52 = 0.25

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 18 / 31

Page 31: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelExample

U W

V Y

X t1

0.3

0.5

0.1

0.5

0.4 0.8

Figure – Propagation example - Propagation

p(u, t1) = 0.25×0.52 = 0.0625

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 18 / 31

Page 32: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelConvergence

Let n be users (u1, u2, ..., un) :a11pu1 + a12pu2 + ....+ a1npun = b1

a21pu1 + a22pu2 + ....+ a2npun = b2

... = ...an1pu1 + an2pu2 + ....+ annpun = bn

Could also be written as Ap = b with

A =

u1 u2 · · · un

u1 a11 a12 . . . a1n

u2 a21 a22 . . . a2n...

......

. . ....

un an1 an2 . . . ann

p =

p(u1)p(u2)

...p(un)

b =

b1

b2...bn

Because ∀u, v sim(u, v) ≤ 1, |ajj | ≥

∑j 6=i

|aij | for every i , the matrix A is

diagonally dominant.Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 19 / 31

Page 33: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelOptimizations

Speed up the convergence

Let ∆(u, t1) = p(u, t)k+1 − p(u, t)k

If ∆(u, t1) < β we stop the propagation

Limitation of popular messages

If p(u, t) < f (t) no need to propagate.f (t) = 1− kp

kp+pop(t)p

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 20 / 31

Page 34: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Approach Propagation Model

Propagation ModelOptimizations

Speed up the convergence

Let ∆(u, t1) = p(u, t)k+1 − p(u, t)k

If ∆(u, t1) < β we stop the propagation

Limitation of popular messages

If p(u, t) < f (t) no need to propagate.f (t) = 1− kp

kp+pop(t)p

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 20 / 31

Page 35: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Protocol

ExperimentsProtocol

130 Millions of messages shared at least twice

Split the ranked set 90% - 10%

Compute recommendation during this 10% for 1500 random users(500 small, 500 medium, 500 big)

Comparison with

CF : naive collaborative filteringBayes : probabilistic modelGraphJet : Twitter used solution

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 21 / 31

Page 36: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Results

ExperimentsHits

20 40 60 80 100 120 140 160 180 2000

0.5

1

1.5

2

2.5·104

Number of daily recommendations per user

Num

ber

ofhi

ts(×

104)

BayesCF

GraphJetSimGraph

Figure – Hits pour 1500 utilisateurs

Linear growth of CF

Fast growth forSimGraph

GraphJet stuck around5000 hits

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 22 / 31

Page 37: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Results

ExperimentsHits according to user profiles

20 40 60 80 100 120 140 160 180 2000

200

400

600

800

Number of daily recommendations per user

Num

ber

ofhi

ts

BayesCF

GraphJetSimGraph

Figure – 500 small

20 40 60 80 100 120 140 160 180 2000

1,000

2,000

3,000

4,000

5,000

6,000

Number of daily recommendations per user

BayesCF

GraphJetSimGraph

Figure – 500 medium

20 40 60 80 100 120 140 160 180 2000

0.5

1

1.5

·104

Number of daily recommendations per user

BayesCF

GraphJetSimGraph

Figure – 500 big users

small < 50 ; medium < 1000 ; big > 1000Tendencies are very stables no matter the profile of users

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 23 / 31

Page 38: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Results

ExperimentsHits accuracy

20 40 60 80 100 120 140 160 180 200

101

102

Number of daily recommendations per user

Avg

.nu

mb

erof

shar

es

BayesCF

GraphJetSimGraph

Figure – Hits popularity

Bayes targets closemessages

GraphJet targets popularmessages

CF and SimGraph aremixing both popular andclose messages

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 24 / 31

Page 39: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Results

ExperimentsF1 scores

20 40 60 80 100 120 140 160 180 2000

0.2

0.4

0.6

0.8

1·10−2

Number of daily recommendations per user

F1

Sco

re(×

10−

2)

BayesCF

GraphJetSimGraph

Figure – F1 Scores

Small values

Peak around 20recommendations

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 25 / 31

Page 40: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Results

ExperimentsRunning time

init. (per user) init total time time (per message) total time (70 cores //) total time1,149,374 users 13,238,941 Tweets (Trial period) init + recos

Bayes 10ms 0.04h 975ms 51.22h 51.26hCF 8,583ms 39.40h 0.5ms 0.02h 41.01h

SimGraph 311ms 1.41h 38ms 2.00h 3.41h

init. (per user) init total time time (per user) total time (70 cores //) total time1,149,374 users 1,149,374 users * 66 days (Trial period) init + recos

GraphJet 0ms 0h 14ms 4.2h 4.2h

Table – Initialization and recommendation time (in ms)

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 26 / 31

Page 41: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Updating strategies

ExperimentsUpdating strategies

How to update SimGraph ?

Split the last 10% in 2

Evaluate hits prediction impact for the remaining 5% :

do nothingrecompute everythingupdate only weightscrossfold

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 27 / 31

Page 42: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Updating strategies

ExperimentsUpdating strategies

20 40 60 80 100 120 140 160 180 2000

1,000

2,000

3,000

4,000

5,000

6,000

Number of daily recommendations per user

Num

ber

ofhi

ts

recompute everythingdo nothingcrossfold

update weights

Figure – Hits / updating strategies

doing nothing is thesame as updatingweights

crossfold (very cheap)works very well

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 28 / 31

Page 43: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Experiments Updating strategies

Experiments

Convergence property of the SimGraph

Iteration Number of edges

1 4 950 4172 7 519 0313 10 836 1294 11 496 4455 11 678 747

Table – Number of edges evolution through iterations

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 29 / 31

Page 44: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Conclusion

ConclusionContribution

Construction and analysis of a large Twitter dataset

Method relying on homophily to find nearest neighbors at low cost

Construction and optimization of a convergent propagation model

Comparison of the recommendations made by our model with state ofthe art solutions

Possibility for the model to be updated at low cost

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 30 / 31

Page 45: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Conclusion

ConclusionFuture works

Densify points of comparison between users

Burst recommendation bubbles

Work on the crossfold convergence of the model

Add a popularity prediction optimization

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 46: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Conclusion

Thanks for you attention !

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 47: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

ANNEXES

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 48: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

AnnexesLifespan and popularity

100 101 102 103 104100

101

102

103

104

Avg. lifepan (hours)

Avg

.n

um

ber

ofre

twee

ts

Figure – Correlation between lifespan andpopularity

Strong correlation up to103 hours

After a month, thecorrelation fades

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 49: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

AnnexesTopology

0 10 20100

101

102

103

104

105

106

107

108

Shortest distance

Nu

mb

erof

pat

hs

Figure – Smallest path distribution for thesimilarity graph

Diameter of 21 for anaverage path of 7.5

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 50: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

AnnexesSimilarities

0 5 10 15 20 250

0.5

·10−2

Position dans le classement

Sco

rem

oyen

Figure – Score similarity evolution

Really weak scores

Breaks after the fifthmost similar user

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 51: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

AnnexesIntersections

20 40 60 80 100 120 140 160 180 2000

0.2

0.4

0.6

0.8

1

Number of daily recommendations per user

Rat

ioof

hits

inco

mm

onw

ith

Sim

Gra

ph

BayesCF

GraphJetSimGraph

Figure – Parts of hits included in SimGraphEnhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31

Page 52: Enhance micro-blogging recommendations of posts with an ... · 1 State of the art 2 Data Analysis Topology Retweets Homophily 3 Approach Similarity graph Propagation Model 4 Experiments

Annexes

AnnexesNumber of recommendations

20 40 60 80 100 120 140 160 180 2000

20

40

60

80

100

120

140

Number of daily recommendations per user

Num

ber

ofac

tual

reco

mm

enda

tion

s

BayesCF

GraphJetSimGraph

Figure – Recall capacity

CF is less limited

Other methods arebunched together

Threshold effect forSimGraph and Bayes

Enhance micro-blogging recommendations of posts with an homophily-based graph BDA - Novembre 2017 31 / 31