Influence in Social Media

CS 599: Social Media Analysis, University of Southern California. Influence in Social Media. Kristina Lerman, University of Southern California.


Page 1: Influence in Social Media

CS 599: Social Media Analysis

University of Southern California

Influence in Social Media

Kristina Lerman, University of Southern California

Page 2: Influence in Social Media

Identifying influential users in social networks

Page 3: Influence in Social Media

Hard problem for large online social networks

Page 4: Influence in Social Media

Topics covered

How do you quantitatively measure influence in social networks? What metrics should you use?

• Cha et al. 2010. “Measuring User Influence in Twitter: The Million Follower Fallacy”, in ICWSM.

• Bakshy et al. 2011. “Everyone’s an Influencer: Quantifying Influence on Twitter”, in WSDM.

• Ghosh & Lerman. 2010. “Predicting Influential Users in Online Social Networks”, in SNA-KDD workshop.

Page 5: Influence in Social Media

Measuring User Influence in Twitter: The Million Follower Fallacy

Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, Krishna P. Gummadi

Niki Parmar, University of Southern California

Page 6: Influence in Social Media

Problem

• The notion of influence is important and long studied
- Business, fashion, voting trends, marketing, etc.

• Measuring influence is difficult
- Involves human choices and societies, which are complex

• In the modern view, the key factors in measuring influence are
- Interpersonal relationships among users
- Readiness to adapt to change

• Leads to information diffusion, trend setting, friendship, news, gossip, controversial issues, ...

• Can initiate large-scale chain reactions

Page 7: Influence in Social Media

Influence in Social Media

• Online communication is the new way to receive information

• Influence: “the power or capacity of causing an effect in indirect or intangible ways.”

• Twitter usage
- 271 million monthly active users
- 500 million Tweets sent per day

• Understand the roles of different users
- Through followers, retweets, identity value

• Effective advertising on Twitter using influential users
- Target only a few influential people to reach the majority of users

Page 8: Influence in Social Media

Measuring Influence on Twitter

• Comparison of three influence measures
- Indegree influence: number of followers of the user, indicating the size of the audience for that user
- Retweet influence: number of retweets containing one’s name, indicating the ability to generate content with pass-along value
- Mention influence: number of replies to the user, indicating the ability of the user to engage others in a conversation

• Measure dynamics of user influence over topics and time.

• How to gain and maintain influence? How can ordinary users be influential?
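The three measures can be sketched on toy data as follows. This is a minimal illustration, not the procedure used by Cha et al.: the tweets, the follower lists, and the two regular expressions (retweets spotted by an `RT @name` marker, mentions by any other `@name`) are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical toy data: (author, text) pairs and a follower list per user.
tweets = [
    ("alice", "Big news today http://example.com"),
    ("bob", "RT @alice Big news today http://example.com"),
    ("carol", "@alice what do you think about this?"),
    ("dave", "RT @alice Big news today http://example.com"),
    ("alice", "@carol thanks for asking"),
]
followers = {"alice": ["bob", "carol", "dave"], "bob": ["alice"], "carol": []}

# Assumed conventions for this sketch only.
RETWEET = re.compile(r"\bRT @(\w+)")
MENTION = re.compile(r"@(\w+)")


def influence_measures(tweets, followers):
    """Return (indegree, retweets, mentions) counters keyed by user name."""
    indegree = Counter({user: len(fans) for user, fans in followers.items()})
    retweets = Counter()
    mentions = Counter()
    for author, text in tweets:
        retweeted = set(RETWEET.findall(text))
        for user in retweeted:
            if user != author:
                retweets[user] += 1
        # Names that appear only as "RT @name" are not counted as mentions.
        for user in set(MENTION.findall(text)) - retweeted:
            if user != author:
                mentions[user] += 1
    return indegree, retweets, mentions


indegree, retweets, mentions = influence_measures(tweets, followers)
print(indegree.most_common(1), retweets.most_common(1), mentions.most_common(1))
```

Keeping the three counters separate makes it easy to rank users under each measure and compare the rankings afterwards.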

Page 9: Influence in Social Media

Data Collection

• Collected 54,981,152 in-use user accounts, which were connected to each other by 1,963,263,821 social links.

• Gathered the tweets generated by all users since the early days of the service, which amount to 1,755,925,520 tweets.

• Focus on the largest component of the network, which contains 99% of the links and tweets.

• After filtering, measure influence for 6,189,636 users against the entire set of users.

Page 10: Influence in Social Media

Comparing User Influence

Page 11: Influence in Social Media

Comparing User Influence

Top users for each measure

• Followers count – Public figures and news sources
- Get a lot of attention from their audience

• Retweets – Content aggregation services like news channels and news sources, businessmen
- They are about the content and contain URLs

• Mentions – Celebrities, known figures
- Ordinary users show great passion for celebrities

Page 12: Influence in Social Media

Comparing User Influence

• Correlation across all users is high because the set includes the least influential users, who have no retweets or mentions and are not well connected.

• Indegree has a low correlation with retweets and mentions. A large number of followers does not translate into influence: the Million Follower Fallacy.

• Retweets are content driven and mentions are identity driven. The two are highly correlated and have a greater impact on influence.

Page 13: Influence in Social Media

Influence Across Topics

• Measure across three diverse topics: the Iranian presidential election, the outbreak of the H1N1 influenza, and the death of Michael Jackson.

• Distribution of user ranks for retweets and mentions follows power law pattern!

Page 14: Influence in Social Media

Influence Across Topics

• Measure the variation of a user’s influence across the three topics using Spearman’s rank correlation coefficient

• Observed strong correlation between topics

• Most influential users hold influence across a range of topics. Top users are similar for all three topics. Correlation is high for the top 1%.
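Spearman’s rank correlation is the Pearson correlation of the two rank vectors, and a small standard-library sketch makes that concrete. The per-topic scores below are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical retweet counts of the same five users on two topics.
iran = [120, 80, 40, 10, 5]
h1n1 = [90, 100, 30, 8, 2]
print(spearman(iran, h1n1))
```

Because only ranks enter the calculation, the coefficient is insensitive to the heavy-tailed raw counts and reflects only whether the same users rank highly on both topics.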

Page 15: Influence in Social Media

Influence over time – Rise or Fall?

• Engagement of top influentials
- Top 100 users based on three measures
- Calculate the probability P that a random tweet posted on Twitter during a 15-day period is a retweet (or a mention) of that user
- Calculate this over an 8-month period and normalize by the total tweets
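A hedged sketch of the probability P described above: tweets are bucketed into consecutive 15-day windows, and within each window the number of tweets that retweet (or mention) the user is divided by the total number of tweets. The event format, the function name `windowed_probability`, and the sample dates are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

WINDOW_DAYS = 15  # window length stated on the slide


def windowed_probability(events, user, start):
    """Return {window index: P} for one user.

    events: (day, target) pairs, one per tweet, where target is the user being
    retweeted (or mentioned) and None for every other tweet.
    start: first day of window 0.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, target in events:
        window = (day - start).days // WINDOW_DAYS
        totals[window] += 1
        if target == user:
            hits[window] += 1
    # Normalise by the total number of tweets seen in each window.
    return {w: hits[w] / totals[w] for w in sorted(totals)}


# Hypothetical stream of eight tweets over one month.
events = [
    (date(2009, 1, 2), "alice"), (date(2009, 1, 3), None),
    (date(2009, 1, 10), None), (date(2009, 1, 14), "alice"),
    (date(2009, 1, 20), "alice"), (date(2009, 1, 25), None),
    (date(2009, 1, 28), None), (date(2009, 1, 30), None),
]
print(windowed_probability(events, "alice", date(2009, 1, 1)))
```

Plotting the returned values against the window index gives the rise-or-fall curve for a user over the observation period.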

Page 16: Influence in Social Media

Influence of Ordinary Users

• Gather the top 20 users for each topic who tweet about only one topic

• Calculate probability P again for these 60 users over 8 months
- Increase in retweets and mentions over the time period
- Influentials spread information about protests, controversial news, etc.
- Users limited to a single topic show the largest increase in influence scores

Page 17: Influence in Social Media

Conclusion

• Capture different perspectives of influence: indegree, retweets, mentions

• A user’s popularity is not related to influence. Different groups of influential users emerge depending on content and name value

• Most influential users hold influence over a variety of topics

• Top Twitter users had a disproportionate amount of influence, indicating a power-law distribution

• Users need to self-advertise and sustain continuous effort and involvement to become influential over time

Page 18: Influence in Social Media

Thank You!

#Questions?

Page 19: Influence in Social Media

Everyone’s an influencer [Bakshy et al.]

• Questions
– What makes some content spread far but not others?
– Can we reliably identify influential users on Twitter?

• Findings
– Largest cascades are generated by users who have generated them in the past
– Content matters? Positive URLs tend to spread farther
– But we cannot reliably predict which user or URL will generate a large cascade

Page 20: Influence in Social Media

Word-of-mouth diffusion & Influence

• Diffusion: mechanism for information spread on networks
– A diffusion event is a cascade

An influencer is someone who can consistently trigger large cascades

Page 21: Influence in Social Media

Marketing

• Can we maximize cascades by seeding information (or product) with certain influential nodes?
– These nodes can influence disproportionately many others

• What characteristics of influentials can help identify them?
– Credibility
– Expertise
– Enthusiasm
– Centrality?

• Need a large-scale, unbiased sample of observed cascades on a social network
– Not only case studies of the most successful cascades

Page 22: Influence in Social Media

Twitter

• Users tweet short messages
– Retweet posts of others
– Tweets may contain URLs
  • Distinct markers that allow us to track diffusion

• Social networks
– Users follow ‘friends’ to see their tweets
  • “who listens to whom”

• Compare impact of different users by measuring observed activity on Twitter
– User who “seeds” content (URL)
– Seed’s influence is measured by the number of users connected to her who subsequently retweet the URL

Page 23: Influence in Social Media

Data set

• Tweets (diffusion events)
– 87M tweets containing a bit.ly URL that were broadcast Sep 13, 2009 – Nov 15, 2009
– 1.6M seed users active both months who initiated 74M cascades (46.3 cascades each on average)

• Follower graph (social network)
– Collect followers of every active user, and their followers, and so on
– 56M users and 1.7B edges

Page 24: Influence in Social Media

Computing influence on Twitter

• Influence on Twitter = causing others to propagate information (URL) to their followers

• User A influenced user B to retweet the URL if
– User A tweets first
– User A is a friend of B (B is a follower of A)
– But what if more than one friend tweets before B?
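The open question of several friends posting before B can be handled in more than one way. The sketch below shows three possible attribution rules (credit the first friend, credit the last friend, or split the credit equally); the rule names and the function `attribute_influence` are assumptions for illustration, and the slide does not say which rule was used.

```python
def attribute_influence(b_friends, earlier_posters, rule="first"):
    """Return {friend: credit} for B's adoption of a URL.

    b_friends: set of users B follows.
    earlier_posters: users who posted the URL before B, ordered by time.
    rule: "first", "last", or "split" (illustrative names).
    """
    # Keep only B's friends; a friend who posted twice counts once,
    # positioned at their first post.
    candidates = list(dict.fromkeys(u for u in earlier_posters if u in b_friends))
    if not candidates:
        return {}  # nobody B follows posted earlier: B starts a new cascade
    if rule == "first":
        return {candidates[0]: 1.0}
    if rule == "last":
        return {candidates[-1]: 1.0}
    if rule == "split":
        share = 1.0 / len(candidates)
        return {u: share for u in candidates}
    raise ValueError(f"unknown rule: {rule}")


# B follows a and c; a, x and c posted the URL (in that order) before B did.
for rule in ("first", "last", "split"):
    print(rule, attribute_influence({"a", "c"}, ["a", "x", "c"], rule))
```

Whatever rule is chosen, the credits for one adoption sum to one, so cascade sizes stay comparable across rules.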

Page 25: Influence in Social Media

Some hard facts about cascades

• Most URLs do not spread at all!

Page 26: Influence in Social Media

Predicting individual influence

• User’s influence = log(average size of cascades user seeds)

• What attributes of a user consistently predict influence?
– Seed user attributes
  • # of followers
  • # friends
  • # tweets
  • Date of joining
– Past influence of seed users
  • Average, minimum, maximum total influence
  • Average, minimum, maximum local influence
– Local influence is the number of retweets by seed’s followers

• Train regression tree model on these attributes to predict future influence
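To make the prediction setup concrete, the sketch below computes the influence target defined on the slide and then finds the single best split that a regression tree would place at its root. It is a simplified illustration under assumptions: a full tree would apply `best_split` recursively, the feature names are made up, and the four example users are invented.

```python
import math


def influence(cascade_sizes):
    """Influence as defined on the slide: log of the average cascade size."""
    return math.log(sum(cascade_sizes) / len(cascade_sizes))


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (error, feature, threshold) minimising the total squared error.

    rows: list of dicts mapping feature name -> numeric value.
    targets: influence value to predict for each row.
    """
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    return best


# Hypothetical seed users: attributes and the influence observed later.
rows = [
    {"followers": 10, "past_local_influence": 0.0},
    {"followers": 5000, "past_local_influence": 0.1},
    {"followers": 20, "past_local_influence": 2.0},
    {"followers": 8000, "past_local_influence": 2.5},
]
targets = [0.0, 0.1, 1.0, 1.2]
print(best_split(rows, targets))
```

In this toy data the past-influence feature separates high and low targets more cleanly than the follower count, so it is chosen for the root split.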

Page 27: Influence in Social Media

Regression tree (part)

Most informative features: past local influence, # followers

Page 28: Influence in Social Media

Influence as a function of most predictive features

[Figure panels: All users; Top 25]

Page 29: Influence in Social Media

Prediction performance of the regression tree

Large cascades are driven by previously influential users. But the extreme rarity of such cascades means that most users with these attributes will not be successful.

[Figure annotation: Most cascades]

Page 30: Influence in Social Media

Does content matter?

Do YouTube videos spread farther than niche news?

• Manually classify 1000 URLs using Amazon Mechanical Turk
– Type of URL
  • Spam/not spam/unsure
  • Media sharing/social networking
  • Blog/forum
  • News/mass media
  • Other
– Category of URL
  • Lifestyle, tech, offbeat, entertainment, gaming, science, news, business, sports, other

Page 31: Influence in Social Media

Average cascade size of types of content

Page 32: Influence in Social Media

Impact of content on cascade size

• Content judged by humans to be more interesting, or to elicit more positive feeling, spreads a little further on average

• However, none of the content features had predictive value!

Page 33: Influence in Social Media

Summary

• What features help us predict whether a cascade will reach many people?
– Content itself has no predictive power
– Seed user’s features are somewhat predictive
  • Number of followers
  • Number of followers who retweet seeder’s posts

• Targeting influencers
– What viral marketing strategies reach a wide audience at minimal cost?
– May be more cost-effective to target many “ordinary influencers” than a few highly influential users
– Moot point, since “social epidemics” are so rare

Page 34: Influence in Social Media

Predicting influential users in online social networks [Ghosh & Lerman]

• Questions
– Does network structure predict influence?
– Which metric should we use to measure centrality in a particular social network?

• Findings
– The choice of the metric depends on the nature of interactions
– Fundamental relationship between dynamic processes and measurement of network structure

Page 35: Influence in Social Media

Centrality

• SNA metrics examine the topology of the network to identify important, or central, nodes

[Figure: the same five-node example network (nodes 1 to 5) shown under four centrality metrics: Degree; Betweenness [Freeman, 1977]; PageRank [Brin et al., 1998]; Alpha-centrality [Bonacich, 1987]]

Claim: The nature of interactions between nodes affects how we measure network structure

Consequences for network analysis

Page 36: Influence in Social Media

The Gossips, 1948. Norman Rockwell (American, 1894-1978)

Page 37: Influence in Social Media

War News from Mexico, 1848. Richard Caton Woodville (American, 1825-1855)

Page 38: Influence in Social Media

Types of interactions

• Conservative interactions
– One-to-one, e.g., phone calls
– Modeled by random walk

• Non-conservative interactions
– One-to-many, e.g., epidemic, information diffusion
– Modeled by contact process

[Figure: the five-node example network (nodes 1 to 5), drawn once for each class of interaction]

Two classes of interactions between network nodes

Page 39: Influence in Social Media

Centrality metrics

• Conservative interactions
– Random walk-based metrics, e.g., PageRank, …

• Non-conservative interactions
– Path-based metrics, e.g., Alpha-centrality, …

[Figure: the five-node example network (nodes 1 to 5), drawn once per metric. Node size ~ centrality]

Centrality identifies important nodes in the network, e.g., those that are often visited by a process
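The difference between the two families of metrics can be sketched on a toy graph. In the PageRank iteration a node divides its score among its out-links, so the total score is conserved. In the Alpha-centrality iteration, written here as c = e + αAᵀc with e = 1 for every node, a node passes its full (attenuated) score along every out-link, so the total grows. The graph, the parameter values, and the function names are assumptions for illustration, and the Alpha-centrality iteration converges only when α is below the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for PageRank: each node splits its score among its out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling node: spread uniformly
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def alpha_centrality(out_links, alpha=0.2, iterations=100):
    """Iterate c = e + alpha * A^T c: each node copies its score to every out-link."""
    nodes = list(out_links)
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new = {v: 1.0 for v in nodes}  # exogenous term e = 1 for every node
        for v in nodes:
            for w in out_links[v]:
                new[w] += alpha * score[v]
        score = new
    return score


# Hypothetical five-node graph; an edge v -> w means information flows from v to w.
graph = {1: [2, 3], 2: [4], 3: [4, 5], 4: [], 5: []}
pr = pagerank(graph)
ac = alpha_centrality(graph)
print(sorted(pr, key=pr.get, reverse=True))
print(sorted(ac, key=ac.get, reverse=True))
print(sum(pr.values()), sum(ac.values()))
```

Printing the two sums shows the distinction directly: the PageRank scores always add up to one, while the Alpha-centrality scores add up to more than the number of nodes.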

Page 40: Influence in Social Media

Which centrality metric is right?

• Empirical study of influence in social media

• Data from Digg and Twitter about how information (URL) spreads on the follower graph

[Figure: a submitter and the submitter's followers]

Page 41: Influence in Social Media

Ground truth

• Re-broadcasting (retweeting) provides ground truth for measuring influence

• Empirical measure of influence/importance
1. Average number of re-broadcasts by followers
2. Average size of cascades a node triggers

• Rank nodes by the empirical measure (ground truth)

• Compare rankings produced by centrality metrics to the ground truth

Page 42: Influence in Social Media

Statistical significance of the influence metric

URN MODEL (Hypergeometric Dist.)

Users in OSN (N) ↔ Balls in the urn (N)
Fans of submitter in OSN (K) ↔ White balls in the urn (K)
No. of users who voted (n) ↔ No. of balls picked (n)
No. of fans who voted (k) ↔ No. of white balls picked (k)

$$P(X = k \mid K, N, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

[Figure: a submitter's post and the submitter's fans]
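The urn model can be evaluated directly with binomial coefficients. The sketch below computes the hypergeometric probability P(X = k | K, N, n) and, as one possible way to read "statistical significance", the upper-tail probability of seeing at least k fan votes by chance. The function names and the example numbers are assumptions for illustration.

```python
from math import comb


def hypergeom_pmf(k, K, N, n):
    """P(X = k | K, N, n): exactly k of the n voters are fans of the submitter."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k, K, N, n):
    """P(X >= k): at least k fans among the n voters by chance alone."""
    return sum(hypergeom_pmf(i, K, N, n) for i in range(k, min(K, n) + 1))


# Hypothetical numbers: 10 users, 4 of them fans, 3 voters, 2 of whom are fans.
N, K, n, k = 10, 4, 3, 2
print(hypergeom_pmf(k, K, N, n))
print(hypergeom_tail(k, K, N, n))
```

A small tail probability means that the observed number of fan votes is unlikely under random voting, which supports reading the fan votes as the submitter's influence.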

Page 43: Influence in Social Media

Metric is statistically significant

[Figure panels: Digg; Twitter]

Page 44: Influence in Social Media

Which centrality metric is right for social media?

Correlation between the ground truth and rankings predicted by Alpha-Centrality and PageRank

Non-conservative Alpha-Centrality best predicts node centrality (since information flow in social media is non-conservative)

[Figure panels: Digg; Twitter]

Page 45: Influence in Social Media

Summary

• Network structure measurements
– How we measure network structure depends on the nature of interactions between nodes
– Affects how we compute centrality, strength of ties, and communities