Managing Social Communities - Uni Koblenz-Landaustaab/Research/Talks/20110706...Managing Social...
Web Science & Technologies University of Koblenz ▪ Landau, Germany
Managing Social Communities
Steffen Staab
Acknowledgements to ROBUST Project team & WEST Team, in particular
K. Dellschaft, J. Kunegis, F. Schwagereit
Steffen Staab [email protected]
Web Science Doctoral Summer School 2
Semantic Web
Web Retrieval
Interactive Web
Multimedia Web
Software Web
Institut WeST – Web Science & Technologies
eGovernment eMedia eScience eOrganizations ePerson
Institute for Computer Science
Institute for Information Systems
Leibniz Institute for Social Sciences (GESIS)
Plan for this Talk
1 Web
2 Science
Social Communities
…are everywhere
Content, User & Networks Analysis Understanding, response time
Opportunities Open innovation, improved user support,… increase business value
Data Storage and Processing
Scalability, heterogeneity
Business Value Product support & innovation, CRM, Expertise management, Marketing, Advertising
Online Communities Intranet, Extranet, Internet
Risks Bad content quality, social ill behavior,… jeopardize business value
Large-scale Testbeds
SAP (B2B) Community Network
IBM (E2E) Developer Network
Polecat (C2C)
2009 99K accounts
2013 800K accounts
2009 1.5M users 150K access/day
2013 5M users 1200K accesses/day
2009 …
2013 millions posts/day 1TB data/day
Business Partner Network CRM for IT
Online Marketing
Corporate Knowledge Management
SAP Business Partner Use Case
SAP Developer Network
SAP                                      2007   2009   2013
Posts per day                            5000   6000   7000
Size of user generated content (posts)   1M     4M     10.0M
Number of users                          1M     1.7M   4.8M
ROBUST: IBM Employee Use Case

Business Data           Created per day          Number of users
                        2007   2009   2013       2007      2009      2013
IBM Activities Entry    700    2750   5000       53200     143600    200000
IBM Blogs Entries       120    30     60         34600     77750     100000
IBM Communities         3      23     50         3000      181950    250000
IBM Bookmarks           800    900    1000       8500      22400     50000
IBM Wikis               NA     40     100        NA        35450     100000
IBM Files               NA     290    1000       NA        45160     100000
IBM Overall             1623   4033   7210       500000*   500000*   500000*
Risks in Online Communities
Definition: Risk
• Probability of an event occurring
• Impact of the event occurring
Risk management
• Process for managing costs, benefits and likelihoods
• Detect high impact risks in time even if they generate expensive false alarms
• Ignore very low impact risks even if they can be reliably detected
Types of risks
• Non-compliance with the community policies/polity
• Scamming or spamming behavior
• Lower involvement and productivity
• Decrease of user satisfaction
• Loss of community dynamics
SAP: SCN Award Points Scamming
• Experts' reputation decreases
• Business users leave the forum
Web: Public communities
• Death of TechCrunch forum due to spam and lack of management
Loss of 1% of experts: loss of high revenue
Loss of 10% of lurkers: low impact
[Diagram labels: Cost, Benefit, Likelihood]
Communities: dynamics and confidentiality
ROBUST supports decision making for users, hosts and service providers
Managing growth & decline
• Identify, encourage, safeguard core users
• Social matching
• Define/maintain etiquette and policies
• Manage negative behavior and conflicts
• Content matching
• Recognize, categorize decline and growth
• Redirect users to other communities
Merging communities
• Cross-community topic detection to stimulate inter-community interactions
Splitting communities
• Identification of clusters/compartments of members that can be separated
Agenda
• Risks and Opportunities in Social Communities: the ROBUST project
• Many related Talks in this Summer School
Robust partners
  Alani: Monitoring and analysis of social networks
  Karnstedt: User churn
Closely related
  Greene: Network Analysis
  Bernstein: Scalable infrastructures
But here comes the biased account from work in our institute
Image of a black hole
Flickr cc, Jan 7 2009 by thebadastronomer
Agenda
• Risks and Opportunities in Social Communities: the ROBUST project
• Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
  • Modeling dynamic system at micro level, understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action
Better understanding of the tagging process
Cooperative classification of resources
Which factors influence the tagging process?
• Background knowledge of the user?
• Tag assignments of other users?
Hypothesis: Tagging involves imitation of other users AND selection of tags from background knowledge of users.
Methodology
Conceptualization
Own Knowledge
Shared terminology
Something else? User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Knowledge
Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Comparison of Statistics
Components of Analysis
Properties of Tag Streams (observations in the real world)
• Stream view of Folksonomies
• Co-occurrence streams
• Resource streams
Dynamic model for Tagging Systems (stochastic models of influence)
• Simulating background knowledge
• Simulating tag imitation
Simulation Results (which models best fit the reality?)
• Co-occurrence streams
• Resource streams
Stream Views of a Folksonomy
Folksonomies:
• Vertices: Users, tags, resources
• Edges: Tag assignments
Postings:
• Tag assignments of a user to a single resource
• Can be ordered according to their time-stamp
Co-occurrence Streams
Co-occurrence Streams:
• All tags co-occurring with a given tag in a posting
• Ordered by posting time

Co-occurrence stream for 'apple':
{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}
Resulting stream: tree, mac, ibook, macintosh, stevejobs

Tag    |Y|         |U|       |T|       |R|
ajax   2,949,614   88,526    41,898    71,525
blog   6,098,471   158,578   186,043   557,017
xml    974,866     44,326    31,998    61,843
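The construction of a co-occurrence stream can be sketched in a few lines of Python. This is only an illustration: the `Posting` tuple and the function name `cooccurrence_stream` are assumptions made for this example, and tags inside a posting are kept in the order shown on the slide.

```python
from typing import List, NamedTuple, Tuple


class Posting(NamedTuple):
    """One posting: a user tags a single resource at a given time (hypothetical layout)."""
    user: str
    resource: str
    tags: Tuple[str, ...]
    time: str  # "HH:MM"; same-format strings sort chronologically


def cooccurrence_stream(postings: List[Posting], tag: str) -> List[str]:
    """Return all tags co-occurring with `tag`, ordered by posting time."""
    stream: List[str] = []
    for posting in sorted(postings, key=lambda p: p.time):
        if tag in posting.tags:
            # Keep every other tag of the posting, drop the given tag itself.
            stream.extend(t for t in posting.tags if t != tag)
    return stream


postings = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]
apple_stream = cooccurrence_stream(postings, "apple")
```

With the three postings from the slide, `apple_stream` contains tree, mac, ibook, macintosh, stevejobs in that order.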
Properties of Co-occurrence Streams – Tag Growth
linear growth
Properties of Co-occurrence Streams – Tag Frequencies
power law
Resource Streams
Resource Streams:
• All tags assigned to a resource
• Ordered by posting time

Resource stream for 'r2':
{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}
Resulting stream: apple, mac, ibook, apple, macintosh, stevejobs
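A resource stream can be built the same way. The sketch below assumes postings are plain tuples of (user, resource, tags, time); that layout and the function name `resource_stream` are illustrative choices, not part of the original material.

```python
from typing import List, Tuple

# (user, resource, tags, time) -- a hypothetical posting layout for this sketch
PostingTuple = Tuple[str, str, Tuple[str, ...], str]


def resource_stream(postings: List[PostingTuple], resource: str) -> List[str]:
    """Return all tags assigned to `resource`, ordered by posting time."""
    stream: List[str] = []
    for _user, res, tags, _time in sorted(postings, key=lambda p: p[3]):
        if res == resource:
            stream.extend(tags)  # repeated tags are kept: the stream is a sequence
    return stream


example_postings: List[PostingTuple] = [
    ("mackz", "r1", ("apple", "tree"), "13:25"),
    ("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    ("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]
r2_stream = resource_stream(example_postings, "r2")
```

Note that 'apple' appears twice in the stream for 'r2', because two users assigned it to the same resource.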
Properties of Resource Streams – Tag Frequencies
Simulating the Evolution of Tag Streams
Simulating tag streams
Which of my concepts represent this web page? How do I tag this web page?
Which combination of inspirations produces the same statistics as those observed for Delicious?
Inspiration for conceptualization from:
1. Most popular tags
2. Most recently used tags
3. Tags used for this resource
4. Tags co-occurring with similar text documents
5. Creating completely new tags
6. …
The Delicious User Interface
Imitating previous tag assignments:
• Recommended tags: Intersection of tags of a user and tags already assigned to the resource.
• Your tags: Tags of the user.
• Popular tags: 7 most popular tags assigned to the resource.
Simulating a Tag Stream
p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
n: Number of visible previous tags.
h: Maximal number of previous tag assignments used for determining ranking of the n distinct tags.
• Start with empty tag stream
• Each simulation step appends a new tag assignment
• Simulation of a single tag assignment:
Modeling Background Knowledge
PBK: Probability of selecting from background knowledge
p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
p(w|r): Probability of selecting word w for resource r.
[Diagram labels: Text Corpora, Del.icio.us]
Modeling Tag Imitation
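PI = 1 – PBK: Probability of imitating a previous tag assignment
n: Number of visible top-ranked tags
h: Maximal number of previous tag assignments used for determining ranking of the n distinct tags
[Diagram: with probability PBK a tag is selected from background knowledge; with probability 1 – PBK one of the top-ranked tags 1 … n, ranked over the previous tag assignments t-1 … t-h, is imitated]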
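The interplay of the parameters PBK, n and h can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the implementation behind the slides: the background knowledge p(w|t) is stood in for by an arbitrary weighted vocabulary, and the imitated tag is drawn from the n most frequent tags of the last h assignments with probability proportional to frequency, which is an assumption of this example. The function name `simulate_tag_stream` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def simulate_tag_stream(steps: int, p_bk: float, vocabulary: Sequence[str],
                        weights: Sequence[float], n: int, h: int,
                        seed: int = 0) -> List[str]:
    """Simulate a tag stream mixing background knowledge and imitation.

    p_bk    -- probability of selecting from background knowledge (PBK)
    weights -- stand-in for p(w|t), one weight per vocabulary word (assumption)
    n       -- number of visible top-ranked tags (n >= 1)
    h       -- number of previous tag assignments used for the ranking (h >= 1)
    """
    rng = random.Random(seed)
    stream: List[str] = []                      # start with an empty tag stream
    for _ in range(steps):                      # each step appends one tag assignment
        window = stream[-h:]
        if not window or rng.random() < p_bk:
            # Background knowledge: draw a word according to the stand-in p(w|t).
            tag = rng.choices(vocabulary, weights=weights, k=1)[0]
        else:
            # Imitation (PI = 1 - PBK): pick among the n top-ranked recent tags.
            top = Counter(window).most_common(n)
            tags, counts = zip(*top)
            tag = rng.choices(tags, weights=counts, k=1)[0]
        stream.append(tag)
    return stream


vocab = [f"tag{i}" for i in range(100)]
zipf_weights = [1.0 / (rank + 1) for rank in range(len(vocab))]  # illustrative choice
simulated = simulate_tag_stream(1000, 0.2, vocab, zipf_weights, n=7, h=100)
```

Setting `p_bk` close to 1 makes the stream follow the background distribution, while `p_bk = 0` makes every step after the first imitate earlier tags, so varying it shows how strongly imitation shapes the tag frequencies.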
Overall Scheme
Conceptualization
Own Knowledge
Shared terminology
Something else? User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Knowledge Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Comparison of Statistics
Simulating Co-occurrence Streams
Tag growth: Influenced by PBK and p(w|t)
Tag Frequencies: Influenced by PBK, p(w|t), n, h
• n: Semantic breadth of a topic (blog: 100 tags, ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
• h: No hint for realistic values. Good guesses may be 500 and 1000.
Co-occ. Streams – Simulated Tag Growth
Co-occ. Stream – Simulated Tag Frequencies
Simulating Resource Streams
PI and PBK: Values comparable to co-occurrence streams
p(w|r): Approximated by p(w|t)
n: 7 tags are visible (cf. Delicious user interface)
h: Smaller value than for co-occurrence streams
Res. Streams – Simulated Tag Frequencies
Lessons learned

Black holes do not only eat mass, they also dissolve by emitting radiation
Imitation AND background knowledge are needed for explaining properties of tag streams
Probability of imitating previous tag assignments: ~70-90%
[Dellschaft+Staab, ACM Hypertext 2008]

Model                         Frequency Rank:       Frequency Rank:     Tag Growth
                              Co-occur. Streams     Resource Streams
Polya Urn Model               o                     o                   fixed size
Simon Model                   o                     o                   linear
YS Model w/ Memory            +                     o                   linear
Halpin et al. Model           o                     o                   linear
Our Model (Epistemic Model)   +                     +                   power-law
Solar System
Flickr, cc Sep 1 2008 by Image Editor
[Image labels: Jupiter, Saturn, Neptune, Uranus]
Overall Scheme
Conceptualization
Own Knowledge
Shared terminology
Something else? User interface
Tagging Behavior
Joint Stochastic Model
Model of Own Knowledge Model of Sharing
Model of User Interface Influence
Simulated Tagging Behavior
Comparison of Statistics
What is our Uranus?
What is this?
Uranus = Spam
Effect of removing 257 spammers out of 12,777 users from the ‘bookmark’ stream
[Dellschaft+Staab, WebSci 2010]
Why care? The Bibsonomy Example
Complete snapshot of Bibsonomy system
Manually labeled ground truth of spammers in the data set

               Users    Tags      Resources   TAS
Spammers       29,248   297,846   1,197,354   13,258,759
Non-Spammers   2,467    61,154    234,143     816,196
Why care? The Delicious Example
Crawled during the TAGora Project
Amount of spammers not known exactly
Estimation based on random sample of 500 users: with 95% probability, between 1,972 and 12,949 spammers
Delicious most likely already applies spam detection
Why care about ~1.5% spammers in Delicious?

Users     Tags        Resources    TAS
532,938   2,482,850   18,778,566   140,305,446
Filtering Results (Users)
[Bar chart "Number of Spammers and Non-Spammers": categories 1 to 20 on the x-axis, counts from 0 to 16000 on the y-axis, series Spammer and Non-Spammer]
Filtering Results (Tag Assignments)
[Bar chart "Filtered and unfiltered number of TAS": categories 1 to 20 on the x-axis, counts from 0 to 450000 on the y-axis, series Spam and Non-Spam]
That’s why
Effect of removing 257 spammers out of 12,777 users from the ‘bookmark’ stream
How statistically significant is the epistemic model for normal users?
Lessons learned
Uranus was discovered because it affected Neptune
Pluto was discovered because it affected Uranus!
Spammers can be discovered by their behavior, even if you do not know what kind of spam they are producing!
How do constellations in the sky evolve?
http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/
Example: Network
Person Friendship
SUGGESTING WHOM TO LINK TO NEXT
Use Networks for Recommendation
Goal: Predict who a person will add as friend
Facebook's algorithm: find friends-of-friends
→ Problem: Rest of the network is ignored!
Algebraic Graph Theory
Represent a network by an adjacency matrix A:
• A_ij = 1 when i and j are connected
• A_ij = 0 when i and j are not connected
• A is square and symmetric.

Example graph with nodes 1 to 6 and edges 1–2, 2–3, 2–4, 3–4, 4–5, 5–6:

$$
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$
Baseline: Friend of a Friend Model
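Count the number of ways a person can be found as the friend of a friend. Consider the matrix product $AA = A^2$:

$$
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}^{2}
=
\begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 3 & 1 & 1 & 1 & 0 \\
1 & 1 & 2 & 1 & 1 & 0 \\
1 & 1 & 1 & 3 & 0 & 1 \\
0 & 1 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
$$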
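The friend-of-a-friend count can be reproduced with NumPy. The sketch below uses the six-node example graph; the helper name `friend_of_friend_candidates` is an assumption made for this illustration.

```python
import numpy as np

# Adjacency matrix of the six-node example graph (nodes 1..6, 0-based indices in code).
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
])

# (A @ A)[i, j] counts the common neighbours of i and j,
# i.e. the number of ways j is reached as a friend of a friend of i.
A2 = A @ A


def friend_of_friend_candidates(adjacency: np.ndarray, user: int) -> dict:
    """Map each non-friend of `user` (1-based) to its friend-of-a-friend count."""
    i = user - 1
    counts = (adjacency @ adjacency)[i]
    return {j + 1: int(counts[j]) for j in range(len(adjacency))
            if j != i and adjacency[i, j] == 0 and counts[j] > 0}


candidates_for_4 = friend_of_friend_candidates(A, 4)
```

The diagonal of `A2` holds each node's number of friends, which is why only off-diagonal entries for non-friends are used as candidates.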
Eigenvalue Decomposition
Write the matrix A as a product:
$$A = U \Lambda U^T$$

where
• U are the eigenvectors: $U^T U = I$
• Λ are the eigenvalues: $\Lambda_{ij} = 0$ when $i \neq j$
Computing A2
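Use the eigenvalue decomposition $A = U \Lambda U^T$:

$$A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$$

Exploit U and Λ:
• $U^T U = I$ because U contains orthonormal eigenvectors
• $(\Lambda^2)_{ii} = \Lambda_{ii}^2$ because Λ is diagonal and contains the eigenvalues
Result: Just square all eigenvalues!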
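The identity can be checked numerically. The sketch below uses `numpy.linalg.eigh`, which is suited to symmetric matrices and returns orthonormal eigenvectors; the six-node example matrix is repeated so that the block is self-contained.

```python
import numpy as np

# Six-node example adjacency matrix (symmetric, so eigh applies).
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

eigenvalues, U = np.linalg.eigh(A)           # A = U diag(eigenvalues) U^T

# U^T U = I: the eigenvectors are orthonormal.
identity_check = U.T @ U

# Squaring only the eigenvalues reproduces A^2.
A2_spectral = U @ np.diag(eigenvalues ** 2) @ U.T
A2_direct = A @ A
```

Both routes give the same matrix up to floating-point error; the spectral route pays off when many different functions of A have to be evaluated from one decomposition.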
Friend of a Friend of a Friend
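Compute the number of friends-of-friends-of-friends:

$$A^3 = U \Lambda U^T U \Lambda U^T U \Lambda U^T = U \Lambda^3 U^T$$

For the example graph with nodes 1 to 6:

$$
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}^{3}
=
\begin{pmatrix}
0 & 3 & 1 & 1 & 1 & 0 \\
3 & 2 & 4 & 5 & 1 & 1 \\
1 & 4 & 2 & 4 & 1 & 1 \\
1 & 5 & 4 & 2 & 4 & 0 \\
1 & 1 & 1 & 4 & 0 & 2 \\
0 & 1 & 1 & 0 & 2 & 0
\end{pmatrix}
$$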
Matrix Exponential
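The matrix exponential can be written as a power sum with decreasing coefficients:

$$\exp(A) = I + A + \tfrac{1}{2} A^2 + \tfrac{1}{6} A^3 + \dots$$

Example graph with nodes 1 to 7:

$$
\exp \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
1.66 & 1.72 & 0.93 & 0.98 & 0.28 & 0.06 & 0.01 \\
1.72 & 3.57 & 2.70 & 2.93 & 1.04 & 0.29 & 0.06 \\
0.93 & 2.70 & 2.86 & 2.71 & 0.99 & 0.28 & 0.06 \\
0.98 & 2.93 & 2.71 & 3.63 & 1.95 & 0.76 & 0.22 \\
0.28 & 1.04 & 0.99 & 1.95 & 2.35 & 1.59 & 0.64 \\
0.06 & 0.29 & 0.28 & 0.76 & 1.59 & 2.23 & 1.38 \\
0.01 & 0.06 & 0.06 & 0.22 & 0.64 & 1.38 & 1.59
\end{pmatrix}
$$

Recommendations for user ④: ① > ⑥ > ⑦ (scores 0.98, 0.76, 0.22)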
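The recommendation for user ④ can be reproduced by applying the exponential to the eigenvalues only. This is a minimal sketch using NumPy's symmetric eigensolver; the function names `matrix_exponential` and `recommend` are assumptions made for this example.

```python
import numpy as np

# Seven-node example graph: edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6, 6-7.
A7 = np.array([
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
], dtype=float)


def matrix_exponential(adjacency: np.ndarray) -> np.ndarray:
    """exp(A) = U exp(Lambda) U^T for a symmetric matrix A."""
    eigenvalues, U = np.linalg.eigh(adjacency)
    return U @ np.diag(np.exp(eigenvalues)) @ U.T


def recommend(adjacency: np.ndarray, user: int) -> list:
    """Rank the non-friends of `user` (1-based) by their exp(A) score, best first."""
    i = user - 1
    scores = matrix_exponential(adjacency)[i]
    candidates = [j for j in range(len(adjacency))
                  if j != i and adjacency[i, j] == 0]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return [j + 1 for j in candidates]


E = matrix_exponential(A7)
ranking_for_4 = recommend(A7, 4)
```

Unlike the plain friend-of-a-friend count, the exponential also gives node 7 a nonzero score, because paths of every length contribute with decreasing weight.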
Why the Matrix Exponential
$A^n$ = Number of paths of length n
$aA^2 + bA^3 + cA^4 + \dots$ = Number of paths, weighted by path length
→ New edges more likely to appear when there are many paths already
→ When a > b > c > … > 0, short paths are weighted more
Computing Power Series
Let p(A) be a power series:

$$
\begin{aligned}
p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
     &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
     &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
     &= U\,p(\Lambda)\,U^T
\end{aligned}
$$

Therefore:
Power series change only the eigenvalues!
TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE
Diversity
• Many, equally-sized subcommunities
• High entropy
• ‘Flat’ structure
Regularity
• Few large subcommunities
• Low entropy
• Many ‘hubs’
Network Evolution
• How did a network look at time t?
• Idea: Observe the change of diversity/regularity over time
Outline
1. Power-law exponent
2. Weighted spectral distribution
3. Network entropy
4. Network rank
1. Power-law Exponent
Number of neighbors is unevenly distributed:
• Results in a power-law (Newman 2006)
• Higher exponent γ denotes less regularity

$$C(n) \sim n^{-\gamma}$$

Epinions trust network (Massa et al. 2005)
1. Power-law Exponent over Time
γ shrinks ⇒ Network becomes more regular
Epinions trust network (Massa et al. 2005)
2. Weighted Spectral Distribution
• Consider the n×n matrix N defined by

$$
N_{ij} = \begin{cases}
1 / \sqrt{d(i)\,d(j)} & \text{when } (i,j) \text{ is an edge} \\
0 & \text{otherwise}
\end{cases}
$$

• Then the distribution of the eigenvalues of N is called the weighted spectral distribution (WSD) (Fay et al. 2010)
• Eigenvalues nearer to ±1: diversity
• Eigenvalues nearer to 0: regularity
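The eigenvalues underlying the WSD can be computed directly from an adjacency matrix. The sketch below assumes that d(i) denotes the degree of node i and that the graph has no isolated nodes; the function name `wsd_eigenvalues` is hypothetical.

```python
import numpy as np


def wsd_eigenvalues(adjacency: np.ndarray) -> np.ndarray:
    """Eigenvalues of N, where N_ij = 1 / sqrt(d(i) d(j)) for edges and 0 otherwise.

    Assumes a symmetric 0/1 adjacency matrix in which every node has degree > 0.
    """
    degrees = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    N = adjacency * np.outer(inv_sqrt, inv_sqrt)
    return np.linalg.eigvalsh(N)              # ascending order, N is symmetric


example = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)
spectrum = wsd_eigenvalues(example)
```

All eigenvalues of N lie in [-1, 1]; a histogram of `spectrum` taken at successive snapshots of a network would show whether the distribution shifts towards zero over time.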
2. Weighted Spectral Distribution over Time
• The WSD shifts towards zero ⇒ The network becomes regular
CiteULike user–tag network (Emamy et al. 2007)
3. Network Entropy
• Write the graph G as a sum of subgraphs $G_k$:

$$G = G_1 \cup G_2 \cup \dots \cup G_r$$

Each $G_k$ has weighted edges, with total weight $\lambda_k$
• When picking an edge from G at random, the probability of it being in community $G_k$ is

$$\lambda_k / (\lambda_1 + \lambda_2 + \dots + \lambda_r) = \lambda_k / L$$

• The entropy of this distribution is (Kunegis et al. 2011)

$$H(G) = -\sum_k (\lambda_k / L) \log (\lambda_k / L)$$

• Entropy: Effective number of subcommunities
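The entropy formula is easy to evaluate once the total weights λ_k are known. In the sketch below the weights are simply passed in as a list, since the slide does not say how the decomposition is obtained; reading exp(H) as the effective number of subcommunities is a standard interpretation of entropy and is an assumption here. The function name `network_entropy` is hypothetical.

```python
import math
from typing import Sequence


def network_entropy(weights: Sequence[float]) -> float:
    """H(G) = -sum_k (lambda_k / L) log(lambda_k / L), with L = sum_k lambda_k."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


# Four equally weighted subcommunities: maximal entropy for r = 4.
uniform_entropy = network_entropy([1.0, 1.0, 1.0, 1.0])
effective_number = math.exp(uniform_entropy)

# One dominant subcommunity: entropy close to zero.
skewed_entropy = network_entropy([97.0, 1.0, 1.0, 1.0])
```

For equal weights the effective number equals r, and it drops towards 1 as one subcommunity dominates.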
3. Network Entropy over Time
Entropy is constant ⇒ Constant number of communities
[Plot: Entropy H(G) over Time (t), shown on an absolute scale and zoomed]
Enron email network (Klimt et al. 2004)
4. Network Rank
Decompose network into subcommunities:

$$G = G_1 \cup G_2 \cup \dots \cup G_r$$

The rank r is a measure of diversity:

$$\operatorname{rank}(G) = r$$

Weighted rank:

$$\operatorname{rank}^*(G) = \sum_k |G_k| \,/\, |G_1|$$

Robust measure of diversity (Kunegis et al. 2011)
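A small numeric illustration of the weighted rank, assuming the sizes |G_k| are given and that G_1 is the largest subcommunity (the slide does not state the ordering, so this is an assumption of the sketch). The function name `weighted_rank` is hypothetical.

```python
from typing import Sequence


def weighted_rank(sizes: Sequence[float]) -> float:
    """rank*(G) = sum_k |G_k| / |G_1|, taking G_1 as the largest subcommunity."""
    largest = max(sizes)
    return sum(sizes) / largest


equal_sizes = weighted_rank([3.0, 3.0, 3.0])      # equals the plain rank r = 3
with_tiny_part = weighted_rank([3.0, 3.0, 3.0, 0.03])
```

Adding a tiny fourth subcommunity raises the plain rank from 3 to 4 but the weighted rank only from 3.0 to about 3.01, which illustrates why the weighted variant is the more robust measure.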
4. Network Rank over Time
• Increasing network rank: increasing diversity
• Shrinking network rank: shrinking diversity
[Plot: Network rank rank∗(G) over Time (t)]
Enron email network (Klimt et al. 2004)
More Network Rank Plots
[Plots of network rank over time for:]
• hep-th citations
• Wikipedia elections
• Epinions trust network
• frwikibooks edits
• MIT conference contacts
• YouTube social network
(biased towards good examples of convex evolution)
Conclusion
• Power-law exponent shrinks – Connection diversity shrinking
• Weighted spectral distribution shifts to zero – Emerging main components
• Entropy is constant – Effective number of communities is constant
• Network rank increases, then shrinks – Two-phase model of expansion
Watch out!
KONECT – Koblenz Network Collection
http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf
Coming soon!
Follow #ictrobust or @kunegis or @ststaab
Why has the sky the density it has?
Flickr, cc Oct 14, 2007, Michael Donough
Why do tagging systems have so little spam?
User Roles
Content Quality
Community Policy
Content Process
Administrative Process
Yahoo Answers
• Ensure quality of user generated content
• Use of administrators and community moderators
How?
• Policy influences community processes
Communities need Governance
Steering and coordinating actions of community members
Goal: Successful and flourishing community
• High quality user-generated content
• Active community members
[ http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/ ]
[Benz2004]
Motivation
Different types of Web communities
User-generated content (video, photos, comment, article, questions, answers, posting, review text)
What are the most successful means of governance for user-generated content?
Analyze successful platforms and compare their means of governance!
Means of Governance
1. Direct intervention of community owner
   Affecting content or users based on apparent properties
2. Functionality of the community platform
   • Assessment: Ratings, Text Reviews, Bookmarks, Abuse Reports
   • Selection & Ranking: Ratings, Time, Views, Replies, Score
   • Hide Low Quality
   • Content Modification
   • Complex User Roles
[Diagram labels: User-generated Content, Community Member]
Method
Selection of 250 most prominent web sites with community functionality according to Alexa Page Rank
Clustering web sites in four groups according to purpose:
• Social Media
• Editorial News
• Social Networking
• Social Reviewing
Top-5 web sites of each group analyzed (*)
Key Results
(1) Abuse Reports are a successful means of governance.
• 16 occurrences
• Restricted to filter out unwanted content
• Staff needed – expensive but efficient [Schwagereit2010]
(2) Simple ratings are dominant – but battle between “Like” and “Like/Dislike”
• “Like”: 9 occurrences
• “Like/Dislike”: 7 occurrences
• Tradeoff between simplicity and improved ranking ability
Key Results
(3) Creation time is the most frequently implemented ranking criterion
• 18 occurrences
• Others: score: 8, ratings: 6
• Important content is renewed - unimportant content will be forgotten
(4) Content modification and user roles are rarely implemented
• 2 occurrences
• Requires complex role system and users who understand it
GOVERNANCE MODEL: DEEP DIVE - SIMULATION
Methodology Principle
1. Define a Web Community model (Lycos IQ, Yahoo Answers…)
2. Adapt this model to an existing community
3. Estimate parameters
4. Define quality measure
5. Simulate community behaviour
6. Compare simulation results with real data
7. Analyze quality measures with respect to variations of CoSiMo parameters
Dataset Lycos IQ
Time Period: 909 days
Users: 34,327
Administrators: 36
Questions: 1,031,982
Answers: 2,996,446
Deleted non-compliant Answers: 21,139
Observed parameters (input to simulation)
[Chart: Number of Users (logarithmic scale, 1 to 100000) by Answers per year (bins of 1000, from 0-999 upwards) and Rate of Compliant Answers (bins of 0.1, from 0.0-0.09 to 0.9-1.0)]
Example Behaviors and Example Policies
Behaviors of Ordinary Users:
• Create new postings
• Read existing postings
• Report non-compliant postings OR give bonus points to poster
Moderator Users:
• Create new postings
• Read existing postings
• Delete non-compliant posting OR give bonus points to poster
Administrators:
• Read existing postings
• Delete non-compliant postings
Reading Policies for Administrators:
• PA: random selection of postings
• PB: random selection of postings that no other administrator has examined so far
• PC: selection of postings that were most often reported by users for being non-compliant
Promotion Policy:
• PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points
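The reading and promotion policies can be written down as small selection functions. This is an illustrative sketch only, not code from CoSiMo: the `Posting` class, its fields and all function names are assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Posting:
    """Hypothetical posting record used only in this sketch."""
    posting_id: int
    reports: int = 0          # number of user reports for non-compliance
    examined: bool = False    # True once some administrator has read it


def policy_pa(postings: List[Posting], rng: random.Random) -> Posting:
    """PA: random selection of postings."""
    return rng.choice(postings)


def policy_pb(postings: List[Posting], rng: random.Random) -> Optional[Posting]:
    """PB: random selection among postings no administrator has examined so far."""
    unexamined = [p for p in postings if not p.examined]
    return rng.choice(unexamined) if unexamined else None


def policy_pc(postings: List[Posting]) -> Posting:
    """PC: selection of the posting most often reported by users."""
    return max(postings, key=lambda p: p.reports)


def is_promoted(bonus_points: int, x: int) -> bool:
    """PM-X: an ordinary user becomes a moderator with at least X bonus points."""
    return bonus_points >= x


rng = random.Random(42)
queue = [Posting(1, reports=0, examined=True),
         Posting(2, reports=5),
         Posting(3, reports=1)]
pb_choice = policy_pb(queue, rng)
pc_choice = policy_pc(queue)
```

In a full simulation an administrator would mark the selected posting as examined and delete it if it is non-compliant; those steps are left out here.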
How many administrators are needed?
[Surface plot: Recent Posting Quality (bands from 0.65 to 1.05) over Number of Administrators and Additional non-compliant Postings (per day)]
Fighting spam with administrators…
[Surface plot: Recent Posting Quality (bands from 0.99 to 1) over Applied Policies and Number of Administrators]
Variation of policies and number of administrators
• Efficient policies result in high quality content
• A minimum of 18 administrators is needed
• Many moderators are needed to bring the quality to a high level
Fighting spam with user moderators…
[Surface plot: Recent Posting Quality (bands from 0.85 to 1) over Applied Policies (PA, PA+PB, PA+PB+PC, and PA+PB+PC+PM-X for various X) and Additional non-compliant Postings (per day)]
Variation of policies and posting quality
• A limited number of administrators has a limited capacity of filtering a surge of non-compliant postings
• Moderators are helping to increase quality
Lessons Learned
• Strategy of selecting questionable postings is crucial
• Reporting by normal users is the most effective strategy
• Moderators are not as effective as expected if they hunt only incidentally for non-compliant content
• Sufficiently strong requirements regarding moderator profiles lead to high quality of moderators
• Policies for promoting users need to be based on a criterion that is time dependent
Are we satisfied here? No! Not by far!
Understand how and why users tag or tweet?
-> What are people's limitations that affect the system?
-> Psychology and Sociology!
What are their legal boundaries?
-> How can you shape the systems?
-> Law!
What are organizations' incentives?
-> Why and how do organizations participate?
-> Nice example: open source
-> Economics
References
J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. World Wide Web Conf., pp. 741–750, 2009.
J. Kunegis, A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.
J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.
J. Kunegis, D. Fay, C. Bauckhage. Network growth and the spectral evolution model. In Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.
B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pp. 37–42, 2009.
K. Dellschaft, S. Staab. An Epistemic Dynamic Model for Tagging Systems. HYPERTEXT 2008, Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, June 19-21, 2008 - Pittsburgh, Pennsylvania, USA.
K. Dellschaft, S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In: Proc. of WebSci-2010, Raleigh, April, 2010.
F. Schwagereit, S. Sizov, S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In: Proc. of WebSci-2010, Raleigh, US, April, 2010.