Using a Model of Social Dynamics to Predict Popularity of News Kristina Lerman Tad Hogg USC...


Using a Model of Social Dynamics to Predict Popularity of News

Kristina Lerman, Tad Hogg

USC Information Sciences Institute, HP Labs

WWW 2010

Outline

Introduction
Social News Portal Digg
Social Dynamics of Digg
Model-Based Prediction
Conclusions

Introduction

Popularity of content in social media is unequally distributed.
– 16,000 new stories are submitted to Digg every day, while only a handful become popular

Importance of popularity prediction
– Provide users with tools to identify interesting items
– Enable social media companies to maximize revenue

Findings of past research
– Content quality correlates only weakly with eventual popularity
– Social influence is responsible for the unpredictability of popularity

User Interface of Digg

Popular list
– The front page
– Promoted news

Upcoming list

Friends’ Activity

Inequality of Popularity

Figure: Dynamics of social voting. (a) Evolution of the number of votes received by two front page stories in June 2006. (b) Distribution of popularity of 201 front page stories submitted in June 2006.

Story Data Sets

May
– Submitted to Digg between May 25-27, 2006
– 2152 stories, 1212 distinct users
– 510 stories by 239 users are promoted to the front page

June
– Promoted (popular) subset
    201 stories promoted between June 27-30, 2006
    User name and time stamp of the first 216 votes for each story
– Upcoming subset
    Submitted between June 30, 2006 and July 1, 2006
    159 stories received at least 10 votes

Snapshot of Social Network in Digg

June
– 1020 top-ranked users with their friends and fans
– Augment the network in February, 2008
– Eliminate users who joined Digg after June 30, 2006

May
– Retain only the top 1020 users and their fans
– Assume other users had zero fans
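The step of eliminating late joiners from the augmented network could look roughly like the sketch below. The data layout (`fans_of` as a dict of sets, `joined_on` as a dict of dates) and the function name `snapshot_as_of` are hypothetical; the slides do not say how the network was stored or processed.

```python
from datetime import date

def snapshot_as_of(fans_of, joined_on, cutoff):
    """Approximate an earlier network by dropping users who joined after `cutoff`.

    fans_of:   dict mapping user -> set of that user's fans (hypothetical layout)
    joined_on: dict mapping user -> date the user joined the site
    """
    keep = {user for user, joined in joined_on.items() if joined <= cutoff}
    # Drop late joiners both as network members and as entries in fan sets.
    return {
        user: {fan for fan in fans if fan in keep}
        for user, fans in fans_of.items()
        if user in keep
    }

# Toy data, invented for illustration only.
fans_of = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}
joined_on = {
    "alice": date(2005, 1, 10),
    "bob": date(2006, 3, 2),
    "carol": date(2007, 8, 15),  # joined after the cutoff
}
network = snapshot_as_of(fans_of, joined_on, date(2006, 6, 30))
```

A filter like this removes late users only. Links formed after the cutoff between users who had already joined would remain in the snapshot.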

Stochastic Model of Social Dynamics in Digg

Hogg and Lerman (ICWSM’09)
– The stochastic processes framework relates users’ individual choices to their aggregate behavior.
– Represent user behavior in Digg as transitions between a small number of states

Explanatory power
– Why do some stories accumulate many more votes than others?

Predictive power

Dynamical Model of Social Voting

Rate equation for the number of users who vote for a story:
(vote_rate = interest * visibility)

Initial conditions: s(0) = S (the number of fans of the story’s submitter), N_vote(0) = 1
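To make the form "vote_rate = interest * visibility" concrete, here is a minimal numerical sketch using Euler integration. Only the product form and the initial conditions s(0) = S and N_vote(0) = 1 come from the slide. The visibility term, the update rule for s, and the parameters `general_rate`, `fan_visit_rate` and `fans_per_voter` are illustrative assumptions, not the authors' equations or fitted values.

```python
def simulate_votes(r, S, hours=24.0, dt=0.01,
                   general_rate=5.0, fan_visit_rate=0.1, fans_per_voter=2.0):
    """Euler-integrate dN/dt = r * visibility for one story.

    r: story interest (chance that a user who sees the story votes for it)
    S: number of fans of the submitter
    The three keyword rates are made-up values for illustration.
    Returns a list of (time_in_hours, votes) pairs.
    """
    n_votes = 1.0   # N_vote(0) = 1
    s = float(S)    # s(0) = S: fans who have not yet seen the story (assumed meaning)
    history = [(0.0, n_votes)]
    steps = int(round(hours / dt))
    for k in range(1, steps + 1):
        fan_views = fan_visit_rate * s          # assumed: fans see the story at a constant per-fan rate
        visibility = general_rate + fan_views   # assumed: general browsing plus fan views
        new_votes = r * visibility * dt         # vote_rate = interest * visibility
        n_votes += new_votes
        # Assumed: each new voter brings in new fans, and fans who view the story leave the pool.
        s = max(s + fans_per_voter * new_votes - fan_views * dt, 0.0)
        history.append((k * dt, n_votes))
    return history

final_time, final_votes = simulate_votes(r=0.2, S=50)[-1]
```

Under these assumptions, raising either r or S increases the final vote count, which matches the qualitative roles the slides give to interest and submitter connectivity.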

Model Parameters

Some parameters are measured directly from the May data set.

Story-specific parameters
– r: estimated as the value that minimizes the root-mean-square (RMS) difference between the observed votes and the model predictions.
– S = the number of fans of the story’s submitter
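The RMS-based estimate of r can be sketched as a one-dimensional search. In the sketch below, `toy_votes` is a hypothetical closed-form stand-in for the model's predictions, and the grid search over [0, 1] is an assumed choice of optimizer; neither comes from the slides.

```python
import math

def toy_votes(r, S, t, browse_rate=20.0, fan_rate=0.5):
    """Hypothetical stand-in for the model's predicted vote count at time t (hours)."""
    fans_reached = S * (1.0 - math.exp(-fan_rate * t))
    return 1.0 + r * (browse_rate * t + fans_reached)

def rms_difference(r, S, times, observed):
    """Root-mean-square difference between observed votes and predictions."""
    squared = [(toy_votes(r, S, t) - n) ** 2 for t, n in zip(times, observed)]
    return math.sqrt(sum(squared) / len(squared))

def fit_interest(S, times, observed, grid_size=101):
    """Return the r on an evenly spaced grid over [0, 1] with the smallest RMS difference."""
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return min(candidates, key=lambda r: rms_difference(r, S, times, observed))

# Synthetic "observations" generated from a known r, to check that the fit recovers it.
times = [0.5, 1.0, 2.0, 4.0]
observed = [toy_votes(0.3, 50, t) for t in times]
r_hat = fit_interest(50, times, observed)
```

S is taken as given here, as on the slide, so r is the only free story-specific parameter in the fit.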

Observations on the Model

The correlation between S and r is -0.13

General observations reproduced by the model
– Slow initial growth in votes while the story is on the upcoming list
– More interesting stories are promoted faster and receive more votes
– A story submitted by a poorly connected user tends to need high interest to be promoted (Lerman, 2007)
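For reference, a correlation such as the one between S and r on this slide can be computed as below, assuming the reported figure is a Pearson correlation (the slide does not name the coefficient). The sample values are invented and are not the Digg data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented sample: submitter fan counts and fitted interest values.
fan_counts = [5, 40, 120, 300, 15]
interest = [0.6, 0.3, 0.2, 0.25, 0.5]
rho = pearson(fan_counts, interest)
```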

Applications of the Model

Estimating inherent story quality from the evolution of its observed popularity

Predicting a story’s eventual popularity based on the early reaction of users to the story

Story Quality Estimation

A wide range of interestingness to users

Well fit by a lognormal distribution

Examples
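A lognormal fit of the kind mentioned on this slide can be sketched by taking logarithms and estimating a normal distribution. The slides do not say which fitting procedure was used; the sketch assumes maximum-likelihood estimates, and the function name and sample values are made up for illustration.

```python
import math

def fit_lognormal(values):
    """Maximum-likelihood lognormal fit: mean and standard deviation of log(values).

    All values must be positive, otherwise math.log raises ValueError.
    """
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    variance = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(variance)

# Invented interest values, not estimates from the Digg data.
mu, sigma = fit_lognormal([0.05, 0.1, 0.2, 0.4, 0.8])
median_interest = math.exp(mu)  # the median of a lognormal distribution is exp(mu)
```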

Predicting Final Popularity of Stories

Correlations are 0.87 and 0.49, respectively.

Strong prediction in popularity rating

Comparison with Social Influence Only Prediction

Decision tree classifier based on social influence
– Two features: 1. number of fan votes received within the first 10 votes; 2. number of submitter’s fans

Model-based prediction outperforms the decision tree classifier
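The two features used by the decision tree classifier could be extracted roughly as follows. The sketch assumes that a "fan vote" is a vote by a user who is a fan of the submitter or of an earlier voter, and that `voters` lists votes in time order without the submitter's own vote. The slide does not spell out either point, and the function name and data layout are hypothetical.

```python
def social_influence_features(submitter, voters, fans_of, first_n=10):
    """Return (fan votes among the first `first_n` votes, submitter's fan count).

    voters:  user names in voting order, excluding the submitter (assumption)
    fans_of: dict mapping user -> set of that user's fans (hypothetical layout)
    """
    submitter_fans = fans_of.get(submitter, set())
    exposed = set(submitter_fans)  # users who can see the story via a friend's activity
    fan_votes = 0
    for voter in voters[:first_n]:
        if voter in exposed:
            fan_votes += 1
        # After voting, the voter's own fans can see the story too.
        exposed |= fans_of.get(voter, set())
    return fan_votes, len(submitter_fans)

# Toy network, invented for illustration only.
fans_of = {"alice": {"bob", "carol"}, "bob": {"dave"}}
features = social_influence_features(
    "alice", ["bob", "erin", "dave", "carol"], fans_of)
```

The resulting pair can then be passed to any decision tree implementation as a two-column feature vector.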

Conclusions

Research has shown that popularity is weakly related to inherent content quality, and that social influence leads to an uneven distribution of popularity and makes it difficult to predict.

We claim that the model of social dynamics, developed in earlier work, can quantitatively characterize the evolution of popularity of items in Digg.

How interesting a story is and how connected the submitter is together fully determine the evolution of the number of received votes.