Social Event Detection using Multimodal Clustering and Integrating Supervisory Signals
ACM International Conference on Multimedia Retrieval
Hong Kong, Jun 5-8, 2012
Social event detection using multimodal clustering and integrating supervisory signals
Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI)
mklab.iti.gr socialsensor.eu #2
Social Events in Multimedia
Event detection in multimedia:
• Real-world events → attendants take photos → captured photos are shared on social networks
• Multimedia collection → find groups of images depicting the real-world events
Example event types: soccer, music
Problem Setting & Formulation
• Collection of images + metadata
– Metadata typically include tags, geotagging information, timestamp, owner
– Metadata can be noisy or missing
– A set of feature vectors can be extracted from each image and its metadata
• Problem:
– Find groups of images such that each group depicts a unique social event
Essentially, an image clustering problem.
The Role of Different Features
• Visual similarity: Images look similar
• Spatial-temporal context: Images were captured at approximately the same location and time
• Tags: Users have annotated images using similar tags
• Same owner: Photos captured by the same person
PROBLEM: We don’t know what matters most
Heuristics-based Approaches
• Rely on online sources and text metadata [Ruocco & Ramampiaro, 2011; Liu et al., 2011b]
– structured data about events may not be available in online sources
– for many images, text metadata can be of low quality
• Use heuristics [Liu et al., 2011a; Papadopoulos et al., 2011] (e.g. “all photos taken by the same user on the same day → same event”)
– such heuristics are manually constructed in ad hoc ways
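A minimal sketch of the quoted heuristic, to make its ad hoc nature concrete. The photo records, field order and the function name `group_by_owner_and_day` are hypothetical and not taken from the cited systems.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical photo records: (photo_id, owner, capture timestamp).
photos = [
    ("p1", "alice", datetime(2009, 5, 31, 18, 5)),
    ("p2", "alice", datetime(2009, 5, 31, 19, 40)),
    ("p3", "bob", datetime(2009, 5, 31, 18, 30)),
    ("p4", "alice", datetime(2009, 6, 1, 10, 0)),
]

def group_by_owner_and_day(records):
    """Apply the 'same user, same day -> same event' heuristic."""
    groups = defaultdict(list)
    for photo_id, owner, taken in records:
        groups[(owner, taken.date())].append(photo_id)
    return list(groups.values())

events = group_by_owner_and_day(photos)
print(events)
```

In this toy data, p3 lands in its own group although it was taken within half an hour of p1, simply because it has a different owner. A hand-written rule of this kind cannot merge the two without further manual tuning.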
Multimodal Clustering Approaches
Existing approaches:
• May utilize early/late fusion strategies. The final result will depend heavily on the fusion weights [Cai et al., 2011]
– It may be difficult to determine appropriate weights, either manually or using a search procedure.
• May attempt to estimate generative models or minimize the disagreement between the clusterings according to different modalities [Bekkerman & Jeon, 2007; Khalidov et al., 2011]
– Some modalities are more important than others when desired clusters correspond to specific concepts.
Creating clusters that correspond to semantically different concepts requires putting more emphasis on the appropriate features.
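The dependence on fusion weights can be illustrated with a toy example. This is a sketch under stated assumptions: a plain weighted sum of per-modality distances stands in for a fusion rule (it is not necessarily what the cited methods do), and all distances and weights are invented.

```python
# Toy per-modality distances from a query image to two candidates "a" and "b",
# each already scaled to [0, 1]. All values are made up for illustration.
distances = {
    "visual": {"a": 0.2, "b": 0.7},
    "time": {"a": 0.9, "b": 0.1},
}

def fuse(distances, weights):
    """Weighted sum of per-modality distances (one simple fusion rule)."""
    candidates = next(iter(distances.values())).keys()
    return {
        c: sum(weights[m] * distances[m][c] for m in distances)
        for c in candidates
    }

def nearest(distances, weights):
    """Candidate with the smallest fused distance."""
    fused = fuse(distances, weights)
    return min(fused, key=fused.get)

visual_heavy = nearest(distances, {"visual": 0.8, "time": 0.2})
time_heavy = nearest(distances, {"visual": 0.2, "time": 0.8})
print(visual_heavy, time_heavy)
```

The nearest neighbour flips between the two weightings, so any clustering built on the fused distance would change with it.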
Baseline Multimodal Clustering
early fusion
Rationale of Proposed Approach
• What if during the clustering procedure we take into account a relevant example clustering?
• This would essentially integrate a supervisory signal in the multimodal clustering procedure.
How to do this?
• Essentially, we want to define what it means for two items expressed in multiple modalities to belong to the same cluster, and then try to learn this from example clusterings.
Proposed Approach
1. For the items in the input clustering for our task, compute the distances between all pairs of items for all modalities.
2. For each pair of items, compile the distances (for all modalities) into a vector. Label each pair as positive (same cluster) or negative (different cluster).
3. Train a classifier to predict a “same cluster” relationship for pairs of items.
4. For each item in the test set to be clustered, compute the “same cluster” relationship to the other items using that classifier.
5. Form an “indicator vector” for each item to be clustered that summarizes its “same cluster” relationship to the other items to be clustered.
6. Cluster indicator vectors (e.g. using k-means) to determine the final multimodal clustering.
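Steps 1 to 3 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the slides do not name the classifier, so a hand-rolled logistic regression stands in for it, only two modalities are used, and the toy items, the one-day cap on time differences and all function names are assumptions.

```python
import math
from itertools import combinations

# Toy training items: (event label, upload time in hours, tag set).
# All values are invented for illustration.
train = [
    ("e1", 0.0, {"soccer", "match"}),
    ("e1", 1.0, {"soccer", "goal"}),
    ("e1", 0.5, {"soccer", "match", "goal"}),
    ("e2", 50.0, {"concert", "live"}),
    ("e2", 51.0, {"concert", "band"}),
    ("e2", 50.5, {"concert", "live", "band"}),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pair_distances(a, b):
    """Steps 1-2: one distance per modality, compiled into a vector."""
    time_d = min(abs(a[1] - b[1]) / 24.0, 1.0)  # hours, capped at one day
    # Cosine distance between binary tag vectors.
    tag_d = 1.0 - len(a[2] & b[2]) / math.sqrt(len(a[2]) * len(b[2]))
    return [time_d, tag_d]

pairs = list(combinations(train, 2))
X = [pair_distances(a, b) for a, b in pairs]
y = [1 if a[0] == b[0] else 0 for a, b in pairs]  # 1 = same cluster

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Step 3: batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                  for xi, yi in zip(X, y)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * xi[j] for e, xi in zip(errors, X)) / len(X)
        b -= lr * sum(errors) / len(X)
    return w, b

def same_cluster_probability(model, a, b):
    """Predicted probability that items a and b belong to the same event."""
    w, bias = model
    z = sum(wj * xj for wj, xj in zip(w, pair_distances(a, b))) + bias
    return sigmoid(z)

model = train_logreg(X, y)
print(same_cluster_probability(model, train[0], train[1]))  # same event
print(same_cluster_probability(model, train[0], train[3]))  # different events
```

The point of the construction is that the classifier, rather than a hand-picked weight vector, decides how much each modality's distance counts towards a "same cluster" decision.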
Overview of Proposed Approach
[Figure: pipeline of the proposed approach (supervised fusion), with stages annotated by steps 1-2, 3-4 and 5-6]
Indicator Vectors
Indicator vectors of items that correspond to the same cluster should be more similar to each other than to indicator vectors of items that do not correspond to the same cluster.
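A small sketch of why clustering indicator vectors works even with imperfect pairwise predictions. The score matrix below is invented and stands in for classifier outputs (whether the authors use binary decisions or scores is not stated on the slide), and the tiny k-means with farthest-point initialisation is a simplification chosen to keep the run deterministic.

```python
# Hypothetical "same cluster" scores for six items; row i is the indicator
# vector of item i. Items 0-2 and 3-5 form the two true events.
scores = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.2, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.6, 0.2, 0.1],  # misleading score towards item 3
    [0.1, 0.2, 0.6, 1.0, 0.9, 0.8],
    [0.2, 0.1, 0.2, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.8, 0.9, 1.0],
]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors,
                       key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

labels = kmeans(scores, k=2)
print(labels)
```

Item 2 has one misleading pairwise score, but its indicator vector as a whole is still much closer to those of items 0 and 1, so it ends up in the right cluster.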
Evaluation - Dataset/Features
• Benchmark dataset:
MediaEval Social Event Detection 2011
• 36 social events of two types (soccer, music) comprising 2,074 Flickr images
• Features [distance]:
– SIFT BoW [cosine similarity]
– Time uploaded [absolute difference in hours]
– Tags [cosine similarity]
– Geo-location (for ~20% of images) [geodesic distance]
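The per-modality measures listed above could be computed along these lines. This is a sketch with assumptions: tag and BoW vectors are represented as sparse count dictionaries, the geodesic distance is approximated by the haversine formula on a spherical Earth, and returning `None` for a missing location is one possible convention rather than the authors' documented choice.

```python
import math
from collections import Counter
from datetime import datetime

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hours_apart(t1, t2):
    """Absolute difference between two datetimes, in hours."""
    return abs((t1 - t2).total_seconds()) / 3600.0

def geodesic_km(p1, p2, radius_km=6371.0):
    """Haversine distance between (lat, lon) pairs; None if one is missing."""
    if p1 is None or p2 is None:
        return None
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

tags_a = Counter(["soccer", "barcelona"])
tags_b = Counter(["soccer", "final"])
print(cosine_similarity(tags_a, tags_b))
print(hours_apart(datetime(2009, 5, 28, 20, 0), datetime(2009, 5, 28, 23, 30)))
print(geodesic_km(None, (41.38, 2.17)))
```

Since only about 20% of the images carry a location, the geo distance is undefined for most pairs, and any downstream model has to cope with that gap.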
Evaluation - Protocol
• Split the set of events into two random halves (50-50). One half is used for training the classifier, the other for testing clustering accuracy.
• Evaluated against a multimodal spectral clustering approach that uses a sort of early fusion strategy. A search over the space of fusion parameters was executed.
• 10 random runs were executed: in each run, a separate random subset of the events was used for training and the rest was used for testing.
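The splitting protocol can be sketched as below. The event count of 36 comes from the dataset description; the integer event ids, the use of the run index as random seed and the function name `split_events` are assumptions made for illustration.

```python
import random

def split_events(event_ids, seed):
    """Randomly split event ids into two equally sized halves."""
    shuffled = list(event_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

event_ids = range(36)  # the benchmark contains 36 events
runs = [split_events(event_ids, seed=run) for run in range(10)]

for train_events, test_events in runs:
    # Images of train_events would train the "same cluster" classifier;
    # images of test_events would then be clustered and scored.
    assert not set(train_events) & set(test_events)
```

Splitting by event rather than by image keeps every test event entirely unseen during training.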
Evaluation - Results (1)
• Best NMI achieved by proposed approach
Evaluation - Results (2)
• Average and std. deviation of NMI achieved by tested methods
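For reference, NMI and its summary statistics over runs can be computed as in the sketch below. The slides do not say which normalisation was used, so the geometric-mean variant here is an assumption, as is the use of the sample standard deviation. The label lists are toy data, not results from the evaluation.

```python
import math
import statistics
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(truth, predicted):
    """Mutual information normalised by sqrt(H(truth) * H(predicted))."""
    n = len(truth)
    h_t, h_p = entropy(truth), entropy(predicted)
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one side is a single cluster.
        return 1.0 if h_t == h_p else 0.0
    joint = Counter(zip(truth, predicted))
    count_t, count_p = Counter(truth), Counter(predicted)
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    return mi / math.sqrt(h_t * h_p)

# Toy ground truth and three toy clusterings standing in for separate runs.
truth = [0, 0, 0, 1, 1, 1]
toy_runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
scores = [nmi(truth, run) for run in toy_runs]
print(statistics.mean(scores), statistics.stdev(scores))
```

NMI is invariant to how cluster ids are named, which is why the third toy run scores the same as the first.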
Example Results (1)
• Proposed method: Correctly found three photos
• Baseline: Apart from the three photos, it also included irrelevant ones (e.g. other soccer events, a concert)
Event: CE Sabadell - Real Unión de Irún, 31 May 2009
Example Results (2)
• Proposed method: Failed to include all relevant photos in a single cluster (it split them into three), but at least each of the three clusters contained only relevant ones.
• Baseline method: Not only split the photos into three clusters, but also included many irrelevant ones in each cluster.
Event: Barcelona FC triple celebration, 28 May 2009
Conclusions
Proposed an approach for multimodal clustering, with an application to event detection in multimedia.
Advantages
• Does not rely on ad-hoc fusion strategies.
• Matches implicit semantics of example clusterings.
• Naturally handles missing modalities.
Disadvantages
• Computationally expensive:
– computation of N² “same cluster” relationships
– clustering of N-dimensional vectors
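A back-of-the-envelope count shows the scale involved. It assumes, purely for illustration, that all 2,074 benchmark images were clustered at once (the evaluation itself clusters only the test half) and that the classifier is symmetric, so unordered pairs suffice.

```python
n_images = 2074  # size of the benchmark collection

# Full N x N matrix of "same cluster" relationships (one row per indicator vector).
ordered_pairs = n_images ** 2

# If the relationship is symmetric, only unordered pairs need classifying.
unordered_pairs = n_images * (n_images - 1) // 2

print(ordered_pairs, unordered_pairs)
```

Even at this modest collection size, the classifier is invoked millions of times, and each item is represented by a vector with as many dimensions as there are items.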
Future Work
• Study how larger-scale training (last.fm, upcoming, eventful) affects performance
• Reduce “same-cluster” feature space (to K << N²)
– Representative image selection
– Dimensionality reduction
• Integrate event selection step in the proposed approach (currently it considers all images as belonging to events).
• Participate in MediaEval SED 2012!
Acknowledgement
Questions
Further contact: [email protected] / [email protected]
Follow: @socialsensor_ip @sympapadopoulos @kompats
Previous Work (1)
• Multimodal spectral clustering
X. Cai, F. Nie, H. Huang, F. Kamangar (2011) Heterogeneous image feature integration via multi-modal spectral clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1977-1984
• Probabilistic Bayesian network approach
V. Khalidov, F. Forbes, R.P. Horaud (2011) Conjugate mixture models for clustering multimodal data. In Neural Computation, 23(2):517-557
• Combinatorial Markov Random Fields
R. Bekkerman, J. Jeon (2007) Multi-modal clustering for multimedia collections. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8
Previous Work (2)
• MediaEval SED 2011
M. Brenner, E. Izquierdo (2011) Mediaeval benchmark: Social event detection in collaborative photo collections. In MediaEval SED.
X. Liu, B. Huet, R. Troncy (2011) Eurecom @ MediaEval 2011 social event detection task. In MediaEval SED.
S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali (2011) CERTH @ MediaEval 2011 social event detection task. In MediaEval SED.
M. Ruocco, H. Ramampiaro (2011) NTNU @ MediaEval 2011 social event detection task. In MediaEval SED.
Y. Wang, L. Xie, H. Sundaram (2011) Social event detection with clustering and filtering. In MediaEval SED.