Event summarization using tweets

Event Summarization using Tweets. Deepayan Chakrabarti and Kunal Punera, Yahoo! Research


Transcript of Event summarization using tweets

  • 1. Event Summarization using Tweets. Deepayan Chakrabarti and Kunal Punera, Yahoo! Research

  • 2. Abstract: For some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. The proposed solution is based on learning the underlying hidden state representation of the event via Hidden Markov Models.
  • 3. Introduction: One-shot events. Events that have structure or are long-running: (a) the most recent tweets could be repeating the same information about the event; (b) most users would be interested in a summary of the occurrences in the game so far.
  • 4. Introduction: Our goal is to extract a few tweets that best describe the chain of interesting occurrences in that event. A two-step process: (1) segment the event timeline; (2) pick key tweets to describe each segment.
  • 5. Introduction, challenges: Events are typically bursty. Separate sub-events may not be temporally far apart. Previous instances of similar events are available. Tweets are noisy. Strong empirical results.
  • 6. Characteristics of Sports Coverage in Tweets
  • 7. Characteristics of Sports Coverage in Tweets
  • 8. Characteristics of Sports Coverage in Tweets. Some issues of this data: (1) sub-events are marked by increased frequency of tweets; (2) boundaries of sub-events also result in a change in the vocabulary of tweets.
  • 9. Algorithms. Baseline: SUMMALLTEXT. Associate with each tweet a vector of the TF-log IDF of its constituent words. Use cosine distance. Select those tweets which are closest to all other tweets from the event.
  • 10. Algorithms
  • 11. Algorithms. Several defects: (1) O(|Z|^2) computations; (2) heavily biased towards the most popular sub-event.
  • 12. Algorithms. Baseline: SUMMTIMEINT. (1) Split up the duration into equal-sized time intervals; (2) select the key tweets from each interval. Two extra parameters: (1) a segmentation TS of the duration of the event into equal-time windows; (2) the minimum activity threshold l.
  • 13. Algorithms
  • 14. Algorithms. Defects: burstiness of tweet volume; multiple sub-events in the same burst; cold start.
  • 15. Algorithms. Our approach: SUMMHMM. Background on HMMs: N states labeled S1, ..., SN; a set of observation symbols v1, ..., vM; the probability b_i(k) of observing symbol v_k in state S_i; the probability a_ij of a transition from state S_i to state S_j; the initial state probability π_i.
  • 16. Algorithms. Each state: one class of sub-events. The symbols: the words used in tweets. The variation in symbol probabilities across different states: the different language models used by the Twitter users. The transitions between states model the chain of sub-events over time.
  • 17. Algorithms. Our modifications: outputs per time step (a multiset of symbols); detecting bursts in tweet volume; combining information from multiple events.
  • 18. Algorithms. Three sets of symbol probabilities: (1) (s), which is specific to each state but is the same for all events; (2) (sg), which is specific to a particular state for a particular game; (3) (bg), which is a background distribution of symbols over all states and games.
  • 19. Algorithms. Algorithm summary: the input is multiple events of the same type. The EM algorithm learns the model parameters that best fit the data. The optimal segmentation is found with the standard Viterbi algorithm.
  • 20. Algorithms: standard Viterbi algorithm
  • 21. Algorithms
  • 22. Experiments. Experimental setup: professional American football, Sep 12th, 2010 to Jan 24th, 2011; over 440K tweets over 150 games, for an average of around 1760 tweets per game.
  • 23. Experiments. Manual ground truth construction: each output tweet was matched with the happenings in the game and labeled as Comment-Play, Comment-Game, or Comment-General.
  • 24. Experiments. Play-by-play performance: recall, precision. Summary construction.
  • 25. Evaluation at operating point.
  • 26. Conclusion: We proposed an approach based on learning an underlying hidden state representation of an event.
  • 27. Towards Twitter Context Summarization with User Influence Models
  • 28. Abstract: Traditional summarization techniques only consider text information. We study how user influence models, which project user interaction information onto a Twitter context tree, can help Twitter context summarization within a supervised learning framework.
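As a rough illustration of the segmentation step in the first talk (slides 15 to 20), the sketch below runs the standard Viterbi algorithm over time steps that each emit a multiset of words. It is not the authors' implementation: the state names, the toy probabilities, and the assumption that words within a time step are independent given the state are all made up for the example, and the three-way mixture of symbol probabilities from slide 18 is left out.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the most likely state sequence for a list of time steps.

    obs       : list of time steps; each time step is a list (multiset) of words
    states    : list of state names
    log_init  : dict state -> log initial probability
    log_trans : dict state -> dict state -> log transition probability
    log_emit  : dict state -> dict word -> log emission probability
    """
    def step_score(state, words):
        # Assumption for this sketch: words in one time step are
        # independent given the state, so their log probabilities add up.
        return sum(log_emit[state][w] for w in words)

    # best[t][s] is the best log score of any path that ends in state s at time t.
    best = [{s: log_init[s] + step_score(s, obs[0]) for s in states}]
    back = []  # back[t - 1][s] is the predecessor of s on that best path
    for words in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + step_score(s, words)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model: "quiet" stretches versus a "play" sub-event.
log = math.log
states = ["quiet", "play"]
log_init = {"quiet": log(0.8), "play": log(0.2)}
log_trans = {
    "quiet": {"quiet": log(0.7), "play": log(0.3)},
    "play": {"quiet": log(0.4), "play": log(0.6)},
}
log_emit = {
    "quiet": {"game": log(0.8), "touchdown": log(0.2)},
    "play": {"game": log(0.2), "touchdown": log(0.8)},
}
# Each inner list holds the words tweeted during one time step.
obs = [["game"], ["game", "game"], ["touchdown", "touchdown"],
       ["touchdown"], ["game"]]

path = viterbi(obs, states, log_init, log_trans, log_emit)
print(path)
```

Consecutive time steps that share a state form one segment, and key tweets would then be picked per segment. Working in log space avoids underflow when a time step contains many words.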
  • 29. Introduction: A Twitter context tree is defined as a tree structure of tweets which are connected by reply relationships; the root of a context tree is its original tweet. Two types of user influence models: the pair-wise user influence model and the global user influence model. Granger Causality influence model. PageRank algorithm.
  • 30. Twitter Context Tree Analysis: the temporal growth of the Twitter context tree.
  • 31. Twitter Context Tree Analysis: whether the tree structure can help the summarization task.
  • 32. User Influence Models. Granger Causality influence model: a time series x is said to Granger-cause another time series y if and only if regressing for y in terms of both past values of y and x is statistically significantly more accurate than regressing for y in terms of past values of y only. Let
  • 33. User Influence Models. Lasso-Granger method: Lag(X, T) denotes the lagged version of data X; FullyConnectedFeatureGraph(X) denotes the fully connected graph defined over the features; Lasso(y, Xlag) denotes the set of temporal variables receiving a non-zero coefficient from the Lasso algorithm.
  • 34. User Influence Models. PageRank influence model: each user u has a directed edge to each user v if u has replied to or retweeted v's tweet, which gives a global user graph G.
  • 35. Summarization Method. Text-based signals: TFIDF.
  • 36. Summarization Method. Popularity signals: number of replies, number of retweets, and number of followers for a given tweet's author.
  • 37. Summarization Method. Temporal signals: (1) fit the age of tweets in a context tree to an exponential distribution; (2) for each tweet, compute its temporal signal as the likelihood of sampling its age from the fitted exponential distribution.
  • 38. Supervised Learning Framework: Gradient Boosted Decision Tree (GBDT) algorithm.
  • 39. Editorial Data Set: 10 Twitter context trees from March 7th to March 20th, 2011; 4 are initiated by Lady Gaga and 6 are initiated by Justin Bieber. Procedure: (1) read the root tweet; (2) scan through all candidate tweets; (3) select 5 to 10 tweets.
  • 40. Editorial Data Set
  • 41. Experiments: evaluation metrics.
  • 42. Methods for Comparison: Centroid; SimToRoot; Linear; Mead; LexRank; SVD; ContentOnly; ContentAttribute; AllNoGranger; All.
  • 43. Experimental Results: overall comparison.
  • 44. Conclusion: User influence information is very helpful for generating a high-quality summary for each Twitter context tree. All signals are converted into features, and we cast Twitter context summarization into a supervised learning problem.
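The temporal signal from slide 37 of the second talk can be sketched in a few lines. The slides only say that tweet ages are fitted to an exponential distribution. This sketch assumes a maximum-likelihood fit (rate = 1 / mean age) and uses the exponential density at a tweet's age as the "likelihood of sampling its age". The function names and the ages (in hours) are hypothetical.

```python
import math


def fit_exponential_rate(ages):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean age."""
    if not ages or any(a < 0 for a in ages):
        raise ValueError("ages must be a non-empty list of non-negative numbers")
    mean_age = sum(ages) / len(ages)
    if mean_age == 0:
        raise ValueError("mean age must be positive")
    return 1.0 / mean_age


def temporal_signal(age, rate):
    """Exponential density at the given age: rate * exp(-rate * age)."""
    return rate * math.exp(-rate * age)


# Hypothetical ages (hours since the root tweet) of tweets in one context tree.
ages_hours = [0.5, 1.0, 1.5, 5.0]

rate = fit_exponential_rate(ages_hours)
signals = [temporal_signal(age, rate) for age in ages_hours]

for age, signal in zip(ages_hours, signals):
    print(f"age={age:.1f}h signal={signal:.4f}")
```

Under this fit, earlier replies receive a higher signal than later ones. The resulting value would be one feature among the text-based, popularity, and influence signals passed to the supervised learner.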