Dynamics of Conversations
ACM SIGKDD ’10
By Ravi Kumar, Mohammad Mahdian, & Mary McGlohon
Presented by Annie T. Chen on March 29, 2011.
Overview
RQ: What is the structure of online conversations?
Method:
Proposed a simple mathematical model for the structure of conversations
Extended it to account for factors such as recency and author identity that may affect conversations
Compared the predictions of these models back to the empirical data for three datasets: Usenet groups, Yahoo! Groups, and Twitter
Properties of Conversations
Size and depth of thread
  Depth: length of the longest path from the root to a leaf in a thread
  Size is roughly quadratic in depth
Degree distribution p
  Close to a power law: $p(k) \propto k^{-\gamma}$ for some $\gamma > 2$
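The properties above can be computed directly from a thread's reply structure. The following is a minimal sketch, assuming a thread is stored as a hypothetical `parent` list in which `parent[i]` is the index of the message that message `i` replies to (`None` for the root) and parents always precede their replies. The function name `thread_stats` is illustrative and not taken from the paper.

```python
from collections import Counter

def thread_stats(parent):
    """Return (size, depth, p) for one non-empty thread.

    parent[i] is the index of the message that message i replies to,
    or None for the root. Parents are assumed to appear before replies.
    """
    n = len(parent)
    depth = [0] * n          # depth[i] = distance from the root to message i
    children = Counter()     # children[i] = number of direct replies to message i
    for i, par in enumerate(parent):
        if par is not None:
            depth[i] = depth[par] + 1
            children[par] += 1
    # p(k): fraction of messages that have exactly k direct replies
    degree_counts = Counter(children[i] for i in range(n))
    p = {k: c / n for k, c in sorted(degree_counts.items())}
    return n, max(depth), p

# Toy thread: message 0 is the root, 1 and 2 reply to 0, 3 replies to 1, 4 replies to 3.
size, depth, p = thread_stats([None, 0, 0, 1, 3])
```

For the toy thread, the size is 5, the depth is 3, and the degree distribution puts weight 0.4 on zero replies, 0.4 on one reply, and 0.2 on two replies.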
Branching Process Model (BP-Model) - 1
The Galton-Watson branching process is a classic model for generating a random tree.
At each step i of the process, each node generates a number of children drawn from the distribution p
p(k): fraction of nodes with k children in the data
$Z_i$: number of nodes at the ith level of the thread
Let $\mu = E[p]$, the mean of the distribution p
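A Galton-Watson thread can be simulated in a few lines. This is a minimal sketch, assuming the empirical distribution p is given as a dictionary from child count k to probability p(k). The names `mean_offspring` and `simulate_bp_thread`, the `max_nodes` safety cap, and the example distribution are illustrative assumptions, not values from the datasets.

```python
import random

def mean_offspring(p):
    """Mean of the child-count distribution p, given as {k: p(k)}."""
    return sum(k * prob for k, prob in p.items())

def simulate_bp_thread(p, max_nodes=100_000, rng=None):
    """Simulate one Galton-Watson tree; return the level sizes [Z_0, Z_1, ...]."""
    rng = rng or random.Random()
    ks = list(p.keys())
    weights = list(p.values())
    levels = [1]                      # Z_0 = 1: the root message
    while sum(levels) < max_nodes:    # safety cap (an assumption, not part of the model)
        # Every node on the current level draws its child count independently from p.
        children = sum(rng.choices(ks, weights=weights, k=levels[-1]))
        if children == 0:
            break                     # the process has died out
        levels.append(children)
    return levels

# Hypothetical distribution with mean 0.6 (less than 1, so threads die out).
p = {0: 0.6, 1: 0.25, 2: 0.1, 3: 0.05}
rng = random.Random(0)
sizes = [sum(simulate_bp_thread(p, rng=rng)) for _ in range(20_000)]
mean_size = sum(sizes) / len(sizes)
```

Averaging the total size over many simulated threads gives an estimate that can be compared with the expected thread size of the branching process.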
Branching Process Model (BP-Model) - 2
According to the definition of a branching process, it can be shown that: $E[Z] = (1-\mu)^{-1}$, where $Z = \sum_i Z_i$ is the total size of the thread
Since $\mu < 1$ for all datasets, the branching process dies out.
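The expected-size formula follows from a short geometric-series argument. The sketch below writes $\mu$ for the mean of p and $Z$ for the total thread size; these symbols are assumed here where the slide text lost them.

```latex
\begin{align*}
E[Z_0] &= 1, \qquad E[Z_i] = \mu \, E[Z_{i-1}] = \mu^{i}, \\
E[Z] &= \sum_{i \ge 0} E[Z_i] = \sum_{i \ge 0} \mu^{i} = \frac{1}{1-\mu} \qquad (\mu < 1).
\end{align*}
```

As an illustrative value (not taken from the datasets), $\mu = 0.6$ gives an expected thread size of $1/(1-0.6) = 2.5$ messages.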
[Figure: empirical vs. simulated panels]
Branching Process Model (BP-Model) - 3
Problems with the BP-Model:
Model is not generative (degree distributions are stipulated)
Model does not capture the depth distributions that are observed in reality
Number of children is determined by a single distribution
Timestamps are left out
T-Model
Concept: new messages receive more attention than old ones
The probability of adding a child to v is proportional to some function $h(\deg_v, r_v)$ of the degree and recency of v
The probability of death (the thread ending) is proportional to a constant
$h(\deg_v, r_v) = \alpha \deg_v + \tau^{r_v}$ for constants $\alpha \ge 0$ and $\tau \in (0,1)$
Thus, both degree and recency play a role in generating different types of threads
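The T-Model growth process can be sketched as a loop that, at each step, either attaches a new message to an existing one or ends the thread. This is a rough sketch under several assumptions: h is taken to be alpha * deg_v + tau ** r_v, deg_v is counted as the number of replies v has received, r_v is the number of steps since v arrived, and the death constant is called `delta`. The function name and the `max_size` cap are hypothetical.

```python
import random

def simulate_t_model(alpha, tau, delta, max_size=1000, rng=None):
    """Grow one thread; return a parent list (parent[0] is None for the root)."""
    rng = rng or random.Random()
    parent = [None]   # node 0 is the root message
    degree = [0]      # replies received so far (assumed meaning of deg_v)
    born = [0]        # arrival step of each node
    t = 0
    while len(parent) < max_size:   # cap is a safety assumption, not part of the model
        t += 1
        # Attachment weight of each node, with recency r_v = t - born[v].
        weights = [alpha * degree[v] + tau ** (t - born[v])
                   for v in range(len(parent))]
        # The extra option None means "the thread dies", with weight delta.
        options = list(range(len(parent))) + [None]
        choice = rng.choices(options, weights=weights + [delta], k=1)[0]
        if choice is None:
            break
        parent.append(choice)       # new message replies to the chosen node
        degree[choice] += 1
        degree.append(0)
        born.append(t)
    return parent

thread = simulate_t_model(alpha=0.5, tau=0.9, delta=0.3, max_size=200,
                          rng=random.Random(0))
```

In this sketch, raising `alpha` concentrates replies on already popular messages, while `tau` controls how quickly a message's attachment weight decays with age.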
TI-Model - 1
The TI-Model was developed to model author identity.
Concept: authors tend to respond to responses to their own earlier messages.
Based on the Pólya urn model
Original Pólya urn problem:
Initially, an urn has x balls of color 1 and y balls of color 2. At each time t, one ball is drawn out and returned to the urn with another ball of the same color.
“Rich get richer” process
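The urn process described above is easy to simulate. A minimal sketch, in which the function name `polya_urn` and its parameters are illustrative:

```python
import random

def polya_urn(x, y, steps, rng=None):
    """Run a two-color Polya urn; return the final counts [color 1, color 2]."""
    rng = rng or random.Random()
    counts = [x, y]
    for _ in range(steps):
        total = counts[0] + counts[1]
        # Draw a ball uniformly at random: color 1 with probability counts[0] / total.
        color = 0 if rng.random() * total < counts[0] else 1
        # Return it together with one more ball of the same color.
        counts[color] += 1
    return counts

final = polya_urn(1, 1, 100, rng=random.Random(1))
```

Each draw makes the drawn color slightly more likely to be drawn again, which is the "rich get richer" effect.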
TI-Model - 2
New message v arrives with u = parent(v)
“Identity copying” effect: the author of v is either
  an author on path(parent(u)), or
  a random author
[Figure: empirical vs. simulated panels]
Examples
Usenet Yahoo! Groups Twitter
Usenet
[Figure: empirical vs. simulated panels]
Usenet
Group                                   α
It.discussioni.leggende.metropolitane   10
It.politica.polo                        10
Rec.games.chess.politics                3
Bln.politik.rassismus                   2
Sk.politics                             1.5
High α: higher degree of preferential attachment
The top groups tended to be politically related
Group                    τ
fa.linux.kernel          0.98
uk.politics.electoral    0.98
rec.arts.drwho           0.97
uk.politics.crime        0.97
chile.soc.politica       0.96
High τ: high recency effect
Lower-traffic groups had a higher recency effect
Usenet: identity copying rates
High value (low copying rate): new authors tend to join in often
Low value (high copying rate): authors of posts tend to have already authored an earlier post
High value (low copying rate):
  or.politics
  alt.fan.cecil-adams
  alt.marketplace.online.ebay
  pl.misc.kolej
  rec.arts.sf.written
Low value (high copying rate):
  linux.debian.bugs.dist
  microsoft.public.excel.misc
  microsoft.public.excel.programming
  nctu.talk
  tw.bbs.campus.nctu
Yahoo! Groups
Groups with “bushy” threads and high recency effects
Groups with α = 10:
  indianmedical
  IllinoisSpeakers
  DetectiveRichardHead
  Bodybuildersaverageguys
  villageDesign
Groups with τ = 0.99:
  NorthCarolinaSpeakers
  stbaseliosorthodoxchurch
  LostnFoundEvents
  PatriceVinci
  molecular-biology-notebook
Twitter
Groups with “bushy” threads and high recency effects
Groups with α = 10:
  #mustsee
  #twitterinreallife
  #readingrainbow
  #whathappenswhen
  #vogueevolution
Groups with τ = 0.99:
  #yankees
  #warriors
  #tiff09
  #iranelectioni
  #followfriday
Conclusion
Employed various mathematical models to simulate patterns in online conversations
Strengths:
  Incorporated time and author identity in the models
  Were able to predict patterns that were found in actual datasets
Weaknesses / further directions:
  Explanatory power: how well do these models explain differences between conversational environments and/or networks?
  Could incorporate other elements of conversation:
    • Topics
    • Structural/semantic components of messages
    • Actor characteristics/roles
  How well do these models emulate different types of communication tools, e.g. Twitter?
References
Aldous, D. (2003). Lecture 2: Branching Processes. Accessed March 29, 2011 at http://www.stat.berkeley.edu/~aldous/Networks/lec2.pdf.
Kumar, R., Mahdian, M., & McGlohon, M. (2010). Dynamics of conversations. ACM SIGKDD 2010.
Zhu, T. (2009). Nonlinear Polya Urn Models and Self-Organizing Processes. Accessed March 29, 2011 at http://www.math.upenn.edu/grad/dissertations/tongzhudissertation.pdf.