April 2014 SEWM 2014 1
Event Detection from Social Media:
User-centric Parallel Split-n-merge and
Composite Kernel
Truc-Vien T. Nguyen, University of Lugano, Switzerland
Minh-Son Dao, University of Information Technology, Vietnam
Riccardo Mattivi, Trento University, Italy
Francesco G.B. De Natale, Trento University, Italy
SEWM – ICMR 2014, Glasgow, UK
Outline
- Social Event and Web Media
- User-centric Parallel Split-n-merge for Event Clustering
- Composite Kernel for Event Classification
- Ongoing Work
- Conclusion
- Tsunami, Miyagi, Japan, Mar 11, 2011
Observations
- Time-Location: users cannot attend two events at the same time at places far away from each other.
- Theme: users in the same community tend to tag the same event with similar words.
- Users tend to take a series of images within a short time interval for whatever they pay attention to.
- Images related to an event of a given type share common visual features that are characteristic of that event type.
Together: Spatio-Temporal-Theme
User-centric Parallel Split-n-merge
1. Web media collection A crawled from social networks
2. Convert A to a UT-image
3. Split each row of the UT-image into clusters {bi}
4. Merge {bi} using {location, time, theme}
5. Merge {bi} using {location, time, theme} and common sense
6. Merge {bi} using visual information
UT-Image
The UT-image is a 2D grid whose rows are users and whose columns are time. Each pixel stores the photo metadata: photo_url, username, dateTaken, title, description, tags, locations. Sort each row by time; pixels (in the same row) that have no time information are grouped and put at the beginning of the row.
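The row construction described above can be sketched as follows; this is a minimal sketch, and the `Photo` record and its field names are assumptions based on the metadata listed on the slide:

```python
from dataclasses import dataclass
from typing import Optional
from collections import defaultdict

@dataclass
class Photo:
    # Metadata stored in each UT-image pixel (field names assumed).
    photo_url: str
    username: str
    date_taken: Optional[float] = None  # epoch seconds; None if missing
    title: str = ""
    description: str = ""
    tags: tuple = ()
    location: Optional[tuple] = None    # (lat, lon)

def build_ut_image(photos):
    """One row per user; each row sorted by time, with photos that
    lack a timestamp grouped at the beginning of the row."""
    rows = defaultdict(list)
    for p in photos:
        rows[p.username].append(p)
    for user, row in rows.items():
        no_time = [p for p in row if p.date_taken is None]
        timed = sorted((p for p in row if p.date_taken is not None),
                       key=lambda p: p.date_taken)
        rows[user] = no_time + timed
    return rows
```

Grouping untimed photos at the start of each row keeps the later split-by-time step free of missing-value handling.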
Split by TIME
- If there is no time information, each pixel is treated as one cluster.
- If there is time information, the row is split into clusters along the time axis.
Merge by spatio-time-theme
For each selected cluster bk, create:
- the time-taken boundary Tk
- the location union Lk
- the document (tag, title, description) Dk

For any pair of clusters (bk, bl), merge if 2 of the 3 following conditions hold:
- Tdistance(Tk, Tl) ≤ α
- Ldistance(Lk, Ll) ≤ β
- JaccardIndex(Dk, Dl) ≥ γ
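The 2-of-3 merge test can be sketched as below. This is a minimal sketch under stated assumptions: the slides do not define Tdistance or Ldistance, so here they are taken as the gap between time intervals (in hours) and the haversine distance between location centroids (in km); the documents Dk are treated as word sets.

```python
import math

def t_distance(T1, T2):
    """Gap between two time-taken boundaries (intervals), in hours."""
    (s1, e1), (s2, e2) = T1, T2
    return max(0.0, max(s1, s2) - min(e1, e2)) / 3600.0

def l_distance(L1, L2):
    """Haversine distance in km between two (lat, lon) centroids."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*L1, *L2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def jaccard(A, B):
    A, B = set(A), set(B)
    return len(A & B) / len(A | B) if A | B else 0.0

def should_merge(Tk, Tl, Lk, Ll, Dk, Dl, alpha=24.0, beta=5.0, gamma=0.2):
    """Merge clusters bk, bl if at least 2 of the 3 conditions hold."""
    votes = sum([
        t_distance(Tk, Tl) <= alpha,   # time gap <= alpha hours
        l_distance(Lk, Ll) <= beta,    # location distance <= beta km
        jaccard(Dk, Dl) >= gamma,      # document similarity >= gamma
    ])
    return votes >= 2
```

The default thresholds mirror the first run reported later (α = 24 h, β = 5 km, γ = 0.2).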
Merge by common-sense
- Compute tf-idf on Dk and select the most COMMON keywords to create NDk.
- For any pair of clusters (bk, bl), merge if JaccardIndex(NDk, NDl) ≥ γ.
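The common-sense step can be sketched with a hand-rolled tf-idf; keeping the top-k words per cluster (k = 10 by default here) is an assumption, as the slides do not state how many keywords form NDk:

```python
import math
from collections import Counter

def top_keywords(docs, k=10):
    """docs: one token list per cluster document Dk.
    Returns, per cluster, the set NDk of the k highest tf-idf words."""
    n = len(docs)
    df = Counter()                       # document frequency per word
    for doc in docs:
        df.update(set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf}
        keyword_sets.append(set(sorted(scores, key=scores.get, reverse=True)[:k]))
    return keyword_sets

def merge_by_keywords(NDk, NDl, gamma=0.2):
    """Merge rule: Jaccard index of the keyword sets >= gamma."""
    union = NDk | NDl
    return bool(union) and len(NDk & NDl) / len(union) >= gamma
```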
Merge with Visual features
For any pair of clusters (bk, bl), merge if JaccardIndex(BoWk, BoWl) ≥ θ.
Results – Events clustering
MediaEval 2013 dataset and participants
Results – Events Clustering
- Run 1 (split, then merge by spatio-time-theme): α = 24 hours, β = 5 km, γ = 0.2
- Run 2 (as Run 1): α = 8 hours, β = 2 km, γ = 0.2
- Run 3 (as Run 1, plus common-sense merging)
- Run 4 (as Run 3, plus visual features): θ = 0.3
Classification Problems
Supervised learning: learn a function f : X → Y from examples {(xi, yi)}, i = 1, …, n
Binary classification: Y = {−1, +1}
Multi-class classification: Y = {1, 2, …, k}
Event classification: each member of X has a set of features
SVM- Multiclass Classification
Support Vector Machines (SVMs):
- Binary classifiers
- Compute a function (kernel) between each pair of samples
- One vs. Rest strategy for multi-class classification
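The One-vs-Rest scheme can be sketched as follows. This is a minimal sketch: one binary scorer is trained per class and the class with the highest score wins; the centroid-based scorer here is a toy stand-in for a binary SVM, not the classifier used in the paper.

```python
class OneVsRest:
    """One-vs-Rest: train one binary scorer per class; predict the
    class whose scorer returns the highest score for a sample."""
    def __init__(self, fit_binary):
        self.fit_binary = fit_binary   # trains on (samples, +1/-1 labels)
        self.scorers = {}

    def fit(self, X, y):
        for c in set(y):
            labels = [1 if yi == c else -1 for yi in y]
            self.scorers[c] = self.fit_binary(X, labels)
        return self

    def predict(self, x):
        return max(self.scorers, key=lambda c: self.scorers[c](x))

def centroid_scorer(X, labels):
    """Toy binary learner: score = negative squared distance to the
    centroid of the positive class (stand-in for an SVM)."""
    pos = [x for x, l in zip(X, labels) if l == 1]
    centroid = [sum(c) / len(pos) for c in zip(*pos)]
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, centroid))
```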
Event Categories
Class Event Type
0 Conference
1 Fashion
2 Concert
3 Non_event
4 Sports
5 Protest
6 Other
7 Exhibition
8 Theater_dance
Composite Kernel
CK(E1, E2) = α · KT(E1, E2) + (1 − α) · KV(E1, E2)

where KT is the kernel over text features, KV is the kernel over visual features, and α is a weighting coefficient.
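The composite kernel is a convex combination of the two base kernels. A minimal sketch, assuming events carry a bag of text words and a bag of visual words and using a simple Jaccard set kernel for both (the actual kernels and the value of α are not those of the paper):

```python
def composite_kernel(k_text, k_visual, alpha=0.5):
    """CK(E1, E2) = alpha * KT(E1, E2) + (1 - alpha) * KV(E1, E2)."""
    def ck(e1, e2):
        return alpha * k_text(e1, e2) + (1 - alpha) * k_visual(e1, e2)
    return ck

def jaccard_kernel(A, B):
    # Simple set kernel used here for both text and visual bags of words.
    A, B = set(A), set(B)
    return len(A & B) / len(A | B) if A | B else 0.0
```

With events represented as `(text_words, visual_words)` pairs, `k_text` and `k_visual` just project out the relevant bag before applying the set kernel.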
Text Features
NLP basic features: the word, its lower-case, four prefixes, four suffixes, orthographic feature, word form feature.
Ontological features: obtained by matching wi against a knowledge base, e.g., “Washington” -> City
Encyclopedic features: obtained by associating wi with Wikipedia, e.g., “Washington” -> http://en.wikipedia.org/wiki/Washington,_D.C.
An excerpt from the ontology
Visual Features
- Dense RGB-SIFT features
- SVM with histogram intersection kernel
- The SVMs have been trained with the images given in the SED training set
- Bag-of-words codebook with 4096 visual words
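The histogram intersection kernel compares two bag-of-words histograms by summing the bin-wise minima. A minimal sketch (in practice the 4096-bin histograms are typically normalized before comparison, which the slides do not specify):

```python
def histogram_intersection(h1, h2):
    """K(h1, h2) = sum_i min(h1[i], h2[i]) over two equal-length
    bag-of-visual-words histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Identical histograms score their total mass; disjoint histograms score 0, so the kernel grows with overlap.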
Results – Events Classification
Run with the test set
Cross-validation on the training set
Ongoing work
Pipeline: web media -> events clustering -> events classification (training data)
- Build a set of instances of events
- Gain the ability to automatically annotate events
- Extend to automatically annotating images
- Topic modeling (applied to the set of documents Dk) to name clusters
- Improve event clustering quality
Conclusion
1. Event clustering
- Simple and easy to develop
- Can be developed to run in parallel mode
- Need to find a way to automatically adjust the parameters

2. Event classification
- Composite kernel combining both text and visual features
- The combination has proved its robustness, with a significant improvement in performance (from 45.83% to 53.58% with basic features, and from 47.61% to 54.86% with our new features)
- Encyclopedic knowledge such as Wikipedia could provide a great additional resource
Thanks for your attention
Q & A
Features
wi is a word from the title, description, or tags of each event
li is the word wi in lower-case
p1i, p2i, p3i, p4i are the four prefixes of wi
s1i, s2i, s3i, s4i are the four suffixes of wi
fi is the part-of-speech of wi
gi is the orthographic feature that tests whether a word is all upper-case, initial-letter upper-case, or all lower-case.
ki is the word form feature that tests whether a token is a word, a number, a symbol, or a punctuation mark.
oi is the ontological feature. We used an ontology and knowledge base that contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, wi is matched to the deepest subsumed child class.
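The lexical features listed above (lower-case form, four prefixes, four suffixes, orthographic and word-form tests) can be sketched as a single extraction function; the feature names and regular expressions are illustrative, not from the paper:

```python
import re

def lexical_features(w):
    """Extract the basic NLP features of a token w."""
    return {
        "word": w,
        "lower": w.lower(),
        # Four prefixes p1..p4 and four suffixes s1..s4 of the token.
        **{f"p{k}": w[:k] for k in range(1, 5)},
        **{f"s{k}": w[-k:] for k in range(1, 5)},
        # Orthographic feature: casing pattern of the word.
        "ortho": ("all_upper" if w.isupper()
                  else "init_upper" if w[:1].isupper()
                  else "all_lower" if w.islower()
                  else "mixed"),
        # Word-form feature: word, number, punctuation, or symbol.
        "form": ("number" if re.fullmatch(r"\d+([.,]\d+)?", w)
                 else "word" if re.fullmatch(r"[^\W\d_]+", w)
                 else "punct" if re.fullmatch(r"[.,;:!?\"'()\[\]-]+", w)
                 else "symbol"),
    }
```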