Using a Model of Social Dynamics to Predict Popularity of News Kristina Lerman Tad Hogg USC...
Using a Model of Social Dynamics to Predict Popularity of News
Kristina Lerman, Tad Hogg
USC Information Sciences Institute, HP Labs
WWW 2010
Outline
Introduction
Social News Portal Digg
Social Dynamics of Digg
Model-Based Prediction
Conclusions
Introduction
Popularity of content in social media is unequally distributed.
– 16,000 new stories are submitted to Digg every day, but only a handful become popular
Importance of popularity prediction
– Provide users with tools to identify interesting items
– Enable social media companies to maximize revenue
Findings of past research
– Content quality correlates only weakly with eventual popularity
– Social influence is responsible for the unpredictability of popularity
User Interface of Digg
Popular list
– The front page
– Promoted news
Upcoming list
Friends’ Activity
Inequality of Popularity
Figure: Dynamics of social voting. (a) Evolution of the number of votes received by two front page stories in June 2006. (b) Distribution of popularity of 201 front page stories submitted in June 2006.
Story Data Sets
May
– Submitted to Digg between May 25-27, 2006
– 2152 stories, 1212 distinct users
– 510 stories by 239 users were promoted to the front page
June
– Promoted (popular) subset
  201 stories promoted between June 27-30, 2006
  User name and time stamp of the first 216 votes for each story
– Upcoming subset
  Submitted between June 30, 2006 and July 1, 2006
  159 stories received at least 10 votes
Snapshot of Social Network in Digg
June
– 1020 top-ranked users with their friends and fans
– Augment the network in February, 2008
– Eliminate users who joined Digg after June 30, 2006
May
– Retain only the top 1020 users and their fans
– Assume other users had zero fans
Stochastic Model of Social Dynamics in Digg
Hogg and Lerman (ICWSM’09)
– The stochastic processes framework relates users’ individual choices to their aggregate behavior.
– Represent user behavior in Digg as transitions between a small number of states
Explanatory power
– Why do some stories accumulate many more votes than others?
Predictive power
Dynamical Model of Social Voting
Rate equation for the number of users who vote for a story: (vote_rate = interest * visibility)
s(0) = S (the number of fans of the story’s submitter)
N_vote(0) = 1
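The relation vote_rate = interest * visibility can be illustrated with a minimal numerical sketch. The transcript does not give the form of the visibility term, so the `visibility` callable, its arguments, the step size, and the constant value used in the example are all assumptions made only for illustration. `r` stands for the story's interest, and the starting value follows N_vote(0) = 1.

```python
from typing import Callable, List


def simulate_votes(r: float,
                   visibility: Callable[[float, float], float],
                   steps: int,
                   dt: float = 1.0) -> List[float]:
    """Euler-integrate dN_vote/dt = r * visibility(t, N_vote)."""
    n_vote = 1.0  # initial condition N_vote(0) = 1
    trajectory = [n_vote]
    for k in range(steps):
        t = k * dt
        # vote_rate = interest * visibility
        n_vote += r * visibility(t, n_vote) * dt
        trajectory.append(n_vote)
    return trajectory


# Hypothetical visibility: a constant 10 users see the story per time unit.
votes = simulate_votes(r=0.5, visibility=lambda t, n: 10.0, steps=4)
print(votes)
```

A more realistic visibility function would depend on where the story is listed and on the number of fans S who can see it. That dependence is not recoverable from the transcript and is left out here.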
Model Parameters
Some parameters are measured directly from the May data set.
Story specific parameters
– r: estimated as the value that minimizes the root-mean-square (RMS) difference between the observed votes and the model predictions.
– S = the number of fans of the story’s submitter
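Estimating r by minimizing the RMS difference can be sketched as a simple search over candidate values. This is a minimal illustration, not the authors' fitting procedure. The grid search, the candidate range, and `toy_model` (a linear stand-in for the real vote model) are assumptions.

```python
import math
from typing import Callable, List, Sequence


def rms_difference(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square difference between two equally long series."""
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))


def fit_interest(observed: Sequence[float],
                 model: Callable[[float], List[float]],
                 candidates: Sequence[float]) -> float:
    """Return the candidate r whose model prediction is closest in RMS."""
    return min(candidates, key=lambda r: rms_difference(observed, model(r)))


def toy_model(r: float) -> List[float]:
    # Hypothetical stand-in for the vote model: linear growth in time.
    return [1.0 + r * 10.0 * t for t in range(5)]


observed_votes = toy_model(0.3)  # synthetic "observations"
candidates = [i / 100 for i in range(1, 101)]
best_r = fit_interest(observed_votes, toy_model, candidates)
print(best_r)
```

With real data, `observed_votes` would be the recorded vote counts of one story, and `model` would integrate the rate equation for a given r and the submitter's S.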
Observations on the Model
The correlation between S and r is -0.13.
General observations reproduced by the model
– Slow initial growth in votes while the story is on the upcoming list
– More interesting stories are promoted faster and receive more votes
– A story submitted by a poorly connected user tends to need high interest to be promoted (Lerman, 2007)
Applications of the Model
Estimating inherent story quality from the evolution of its observed popularity
Predicting a story’s eventual popularity based on the early reaction of users to the story
Story Quality Estimation
Stories show a wide range of interestingness to users
The distribution is well fit by a lognormal distribution
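One simple way to fit a lognormal distribution to a set of interest estimates is to take logarithms and compute their mean and standard deviation, which are the maximum-likelihood parameters. The sketch below assumes that approach. The transcript does not say how the fit was done, and the `r_values` list is made-up illustration data, not values from the study.

```python
import math
import statistics
from typing import Sequence, Tuple


def fit_lognormal(values: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood lognormal fit: mean and std of the log-values."""
    logs = [math.log(v) for v in values]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs)  # population std is the MLE
    return mu, sigma


# Made-up interest estimates, for illustration only.
r_values = [0.05, 0.1, 0.2, 0.4, 0.8]
mu, sigma = fit_lognormal(r_values)
print(mu, sigma)
```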
Predicting Final Popularity of Stories
Correlations are 0.87 and 0.49, respectively.
Strong prediction in popularity rating
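A correlation between predicted and observed final vote counts can be computed as follows. This sketch assumes a Pearson correlation, which the transcript does not specify. The `predicted` and `actual` lists are made-up numbers, unrelated to the reported 0.87 and 0.49.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up predicted and final vote counts for four stories.
predicted = [100.0, 200.0, 300.0, 400.0]
actual = [110.0, 190.0, 330.0, 380.0]
corr = pearson(predicted, actual)
print(round(corr, 3))
```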
Comparison with Social Influence only Prediction
Decision tree classifier based on social influence
– Two features: (1) number of fan votes received within the first 10 votes; (2) number of submitter’s fans
Model-based prediction outperforms the decision tree classifier
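The two social-influence features could be extracted from a story's vote history roughly as shown below. This is a sketch under assumptions: "fan votes" is taken to mean votes cast by fans of the submitter, and the function name, argument names, and user IDs are hypothetical. The classifier itself is not reproduced.

```python
from typing import Sequence, Set, Tuple


def social_influence_features(voters: Sequence[str],
                              submitter_fans: Set[str],
                              window: int = 10) -> Tuple[int, int]:
    """Return (fan votes among the first `window` votes, number of fans)."""
    early_voters = voters[:window]
    fan_votes = sum(1 for user in early_voters if user in submitter_fans)
    return fan_votes, len(submitter_fans)


# Hypothetical vote history, in time order, and the submitter's fans.
voters = ["u1", "u7", "u3", "u9", "u2", "u8", "u5", "u4", "u6", "u10", "u11"]
fans = {"u3", "u5", "u11", "u42"}
features = social_influence_features(voters, fans)
print(features)
```

Here `u11` is a fan but votes eleventh, so it falls outside the 10-vote window and is not counted.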
Conclusions
Research has shown that popularity is weakly related to inherent content quality, and that social influence leads to an uneven distribution of popularity and makes it difficult to predict.
We claim that the model of social dynamics, developed in earlier work, can quantitatively characterize the evolution of popularity of items in Digg.
How interesting a story is and how connected the submitter is fully determine the evolution of the number of received votes.