A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection...
Transcript of A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection...
![Page 1: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/1.jpg)
A Visual Framework for Clustering
Memes in Social Media
Authors: Anh Dang1, Abidalrahman Moh’d1, Anatoliy Gruzd2, Evangelos Milios1, and Rosane Minghim3
Presenter: Anh Dang
Advanced in Social Networks Analysis and Mining 2015
Date: August 28, 2015
1 {anh,amohd,eem}@cs.dal.ca, Dalhousie University
2 [email protected], Ryerson University
3 [email protected], University of Sao Paulo-USP
1
![Page 2: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/2.jpg)
Overview
• Problem
• Motivation
• Related work
• Contributions
• The meme detection framework
• Internal Centrality-Based Weighting
• Similarity Score Reweighting with Relevance User Feedback
• Results
• Conclusions
2
![Page 3: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/3.jpg)
Problem
• A meme is a unit of information that can spread from person to person through online social networks (OSNs).
• OSNs can be used for spreading spam, slander, and rumors. • Swine flu pandemic memes in
2009.
• How to detect memes or emerging events in OSNs effectively?
3
![Page 4: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/4.jpg)
Motivation
• Detecting memes in OSNs is a challenging task due to the limitations of textual content (e.g., Tweets 140 character limit).
• Manually labelling memes is infeasible on a large scale.
• Clustering is a simple and efficient approach.
• Lexical content similarity does not work well for memes that are related but not using the same words.
• Polysemy or synonymy.
4
“Meme about Obama”
![Page 5: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/5.jpg)
Related Work
• Leskovec et al. (2009) proposed Meme-tracking system.
• studied the signature path and topic of each meme by grouping similar tweets in Twitter.
• Cataldi et al. (2010) monitored the real-time spread of emerging memes in Twitter.
• constructed a navigable topic graph connecting related memes.
• JafariAsbagh et al. (2014) introduced the problem of meme clustering.
• grouped structurally related tweets in Twitter.
5
![Page 6: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/6.jpg)
Our contributions
• Automatically annotate a meme dataset about five popular topics in Reddit.
• Leverage various types of external content for computing similarity between social network text.
• Propose, implement, and evaluate two similarity combination strategies for meme clustering tasks.
• Internal Centrality-Based Weighting (ICW).
• Similarity Score Reweighting with Relevance User Feedback (SSR).
• Explore the use of semantic similarity score for meme clustering tasks.
6 1. GTM - http://ares.research.cs.dal.ca/gtm/
![Page 7: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/7.jpg)
Meme Detection Framework
7
The meme detection framework is to:
• collect and analyze data in Reddit automatically.
• cluster similar Reddit submissions into the same meme using semantic
similarity score.
![Page 8: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/8.jpg)
Reddit submission
• A Reddit submission contains the following elements:
8
Title
Comments
Url or Image
![Page 9: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/9.jpg)
External Content
URL:
IMAGE:
9
![Page 10: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/10.jpg)
Similarity Measures
Google-trigram method (GTM1) is used to:
• compute the semantic similarity between two submissions.
• produce a score from 0 to 1 between two texts.
• Title similarity: is the GTM score between two titles.
• Comment similarity: is the GTM score between their comments that are concatenated together.
• URL similarity: is the GTM score between their URL texts.
• Image similarity: is the GTM score between their image contents.
10 1. GTM - http://ares.research.cs.dal.ca/gtm/
![Page 11: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/11.jpg)
Combinations
Explore different similarity score combination strategies:
• MAX: selects the highest value among the four pairwise
similarity scores.
• AVG: calculates the average of the four pairwise similarity
scores.
• LINEAR: experiments a linear combination among the
four pairwise similarity scores.
11
![Page 12: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/12.jpg)
Combinations (cont.)
• Internal Centrality-Based Weighting:
• calculates the weight factors for each element of a submission
by considering its surrounding context between two
submissions.
• The algorithm:
• For each weight factor and each submission:
• compute the semantic similarity score between the selected element with the
other 3 elements.
• Multiple the calculated values between two submissions.
• Normalize the scores for all the four weight factors.
12
![Page 13: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/13.jpg)
Combinations (cont.)
• Similarity Score Reweighting with Relevance User
Feedback
13
Two selected submissions
Weights
![Page 14: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/14.jpg)
Evaluation Metrics
• 2439 submissions, 1003 URLs, and 230 images about 5 popular topics in Reddit from Oct to Nov 2014.
• Results returned from each search were labeled with their corresponding keywords.
• Hierarchical clustering is used for the evaluation.
• Purity is used as an evaluation metric.
• The number of correctly assigned submissions.
14
The experiment ground-truth dataset
![Page 15: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/15.jpg)
Semantic vs. Syntactical
15
• TF-IDF and Euclidean distance are used as a baseline similarity distance.
• GTM as a similarity distance outperforms Euclidean-based TF-IDF vectors.
• GTM scales better with an increasing vocabulary size.
GTM vs. Baseline
![Page 16: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/16.jpg)
Experiment – Combination
16
ICW outperforms MAX and AVG.
ICW and SSR.
SSR improves the
clustering results
for ICW, MAX
and AVG.
![Page 17: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/17.jpg)
Conclusions
• Present a framework to tackle the problem of meme
detection in Reddit.
• Define and propose several pairwise similarity scores
between elements of two submissions using
semantic similarity.
• The proposed Internal Centrality-Based Weighting
and Similarity Score Reweighting with Relevance
User Feedback strategy achieves the best result.
17
![Page 18: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/18.jpg)
Future Work
• Extend this proposed framework to other social
network websites, such as Twitter, Facebook, and
Google Plus.
• Compare the spread of rumour-driven memes
between Reddit and other OSNs.
• provide a more holistic view of rumour spread.
• Visualize how a rumour related meme is discussed
and spread in Reddit.
18
![Page 19: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/19.jpg)
References
• J. Leskovec, L. Backstrom, and J. Kleinberg, “Meme-tracking and the dynamics of the news cycle” (2009). In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 497–506.
• M. Cataldi, L. Di Caro, and C. Schifanella, “Emerging topic detection on twitter based on temporal and social terms evaluation” (2010). In Proceedings of the Tenth International Workshop on Multimedia Data Mining. ACM, 2010, p. 4.
• M. JafariAsbagh, E. Ferrara, O. Varol, F. Menczer, and A. Flammini,“Clustering memes in social media streams” (2014). Social Network Analysis and Mining, vol. 4, no. 1, 2014
19
![Page 20: A Visual Framework for Clustering Memes in Social Mediaanh/wp-content/uploads/2016/... · detection on twitter based on temporal and social terms evaluationÓ (2010). In Proceedings](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f0ee1ad7e708231d441659b/html5/thumbnails/20.jpg)
• Questions
20