April 2014 SEWM 20141 Event Detection from Social Media: User-centric Parallel Split-n-merge and...

April 2014 SEWM 2014 1

Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel

Truc-Vien T. Nguyen, Lugano University, Switzerland

Minh-Son Dao, University of Information Technology, Vietnam

Riccardo Mattivi, Trento University, Italy

Francesco G.B. De Natale, Trento University, Italy

SEWM – ICMR – 2014, Glasgow, UK

Outline

- Social Event and Web-media
- User-centric Parallel Split-n-merge for Events Clustering
- Composite Kernel for Event Classification
- Ongoing work
- Conclusion



- Tsunami, Miyagi, Japan, Mar 11, 2011

Observations

- Time-Location: a user cannot attend two events at the same time when their locations are far apart
- Theme: users in the same community tend to tag the same event with similar words
- Users tend to take a series of images within a short time interval of whatever holds their attention
- Images related to an event of a given type share common visual features that are characteristic of that event type

Spatio-Temporal-Theme


User-centric Parallel Split-n-merge


1. Web media collection A crawled from Social Networks
2. Convert A to UT-image
3. Split each row of UT-image into clusters {bi}
4. Merge {bi} using {location, time, theme}
5. Merge {bi} using {location, time, theme} and Common-sense
6. Merge {bi} using visual information

UT-Image


Pixel fields: photo_url, username, dateTaken, title, description, tags, locations

Axes: users, time

Sort by time within each row. Pixels in the same row that have no time information are grouped and placed at the beginning of the row.
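The row ordering described above can be sketched in a few lines. This is only an illustration, assuming that each row of the UT-image holds the photos of one user (suggested by the "user-centric" name and the axis labels, but not stated explicitly on the slide). The function name `build_ut_image` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_ut_image(photos):
    """Group photos into one row per user (assumed layout).

    Within a row, photos without a dateTaken value come first,
    followed by the remaining photos in ascending time order.
    """
    rows = defaultdict(list)
    for photo in photos:
        rows[photo["username"]].append(photo)

    ut_image = {}
    for user, row in rows.items():
        untimed = [p for p in row if p.get("dateTaken") is None]
        timed = sorted(
            (p for p in row if p.get("dateTaken") is not None),
            key=lambda p: p["dateTaken"],
        )
        ut_image[user] = untimed + timed
    return ut_image


# Hypothetical sample records using the field names from the slide.
photos = [
    {"photo_url": "u/1", "username": "ann", "dateTaken": datetime(2013, 5, 1, 12, 0)},
    {"photo_url": "u/2", "username": "ann", "dateTaken": None},
    {"photo_url": "u/3", "username": "ann", "dateTaken": datetime(2013, 5, 1, 9, 0)},
    {"photo_url": "u/4", "username": "bob", "dateTaken": datetime(2013, 5, 2, 8, 0)},
]
ut = build_ut_image(photos)
print([p["photo_url"] for p in ut["ann"]])  # ['u/2', 'u/3', 'u/1']
```

Because every row is built independently of the others, rows could be processed in parallel, which fits the "parallel" part of the method's name.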

Split by TIME

If no time information, each pixel is treated as one cluster

If there is time information

Merge by spatio-time-theme


For each selected cluster bk, create:

- Time-taken boundary Tk
- Location union Lk
- Document (tag, title, description) Dk

For any pair of clusters (bk, bl), merge if at least two of the following three conditions hold:

- Tdistance(Tk, Tl) ≤ α
- Ldistance(Lk, Ll) ≤ β
- JaccardIndex(Dk, Dl) ≥ γ
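The pairwise rule above can be sketched as follows. The slide does not define Tdistance or Ldistance, so this sketch makes its own assumptions: Tdistance is the gap in hours between two (start, end) intervals, and Ldistance is the smallest great-circle distance in km between any two points of the location sets. The rule is read as "at least two of three". All function names and the sample clusters are hypothetical. The default thresholds are the values reported for the first run.

```python
import math


def jaccard(a, b):
    """Jaccard index of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def t_distance(tk, tl):
    """Assumed definition: gap in hours between two (start, end)
    intervals, 0 when they overlap."""
    (s1, e1), (s2, e2) = tk, tl
    return max(0.0, max(s1, s2) - min(e1, e2))


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def l_distance(lk, ll):
    """Assumed definition: smallest distance between any two points
    of the two location sets; infinite if either set is empty."""
    if not lk or not ll:
        return math.inf
    return min(haversine_km(p, q) for p in lk for q in ll)


def should_merge(bk, bl, alpha=24.0, beta=5.0, gamma=0.2):
    """Merge when at least two of the three conditions hold."""
    votes = [
        t_distance(bk["T"], bl["T"]) <= alpha,
        l_distance(bk["L"], bl["L"]) <= beta,
        jaccard(bk["D"], bl["D"]) >= gamma,
    ]
    return sum(votes) >= 2


# Hypothetical clusters: times in hours, locations as (lat, lon).
bk = {"T": (0.0, 3.0), "L": {(55.8642, -4.2518)}, "D": {"concert", "glasgow", "rock"}}
bl = {"T": (5.0, 6.0), "L": {(55.8600, -4.2500)}, "D": {"football", "match"}}
bm = {"T": (100.0, 101.0), "L": {(48.8566, 2.3522)}, "D": {"concert", "paris"}}

print(should_merge(bk, bl))  # True: close in time and space
print(should_merge(bk, bm))  # False: only the theme condition holds
```

Voting over the three conditions lets a pair merge even when one cue is missing or noisy, for example when two clusters are close in time and space but tagged with different words.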

Merge by common-sense


Compute tf-idf on Dk and select the most common keywords to create NDk

For any pair of clusters (bk, bl), merge if JaccardIndex(NDk, NDl) ≥ γ
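A minimal sketch of this step, under stated assumptions: each Dk is already tokenized into a list of words, "most common keywords" is read as the n terms with the highest tf-idf score, and a smoothed idf is used. The cut-off n, the idf variant, and the function names are illustrative choices that are not given on the slide.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_keywords(docs, n=3):
    """For each token list in docs, return the set of its n terms
    with the highest tf-idf score (ties broken alphabetically)."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    total = len(docs)
    keyword_sets = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc))
            * (math.log((1 + total) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keyword_sets.append(set(ranked[:n]))
    return keyword_sets


# Hypothetical cluster documents D0, D1, D2.
docs = [
    ["concert", "rock", "glasgow", "rock"],
    ["rock", "concert", "stage"],
    ["football", "match", "stadium"],
]
nd = top_keywords(docs, n=3)
gamma = 0.2
print(jaccard(nd[0], nd[1]) >= gamma)  # True
print(jaccard(nd[0], nd[2]) >= gamma)  # False
```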

Merge with Visual features


For any pair of clusters (bk, bl), merge if JaccardIndex(BoWk, BoWl) ≥ θ

Results – Events clustering

MediaEval 2013 dataset and participants

Result - Events Clustering

- The first run (split, then merge by spatio-time-theme): α = 24 hours, β = 5 km, γ = 0.2
- The second run (as the first): α = 8 hours, β = 2 km, γ = 0.2
- The third run: as the first, plus common-sense merging
- The last run: as the third, plus visual features, θ = 0.3

Classification Problems

Supervised Learning: learn a function $f: \mathcal{X} \to \mathcal{Y}$ from examples $\{(x_i, y_i)\}_{i=1}^{n}$

Binary Classification: $\mathcal{Y} = \{-1, +1\}$

Multi-class Classification: $\mathcal{Y} = \{1, 2, \ldots, k\}$

Event Classification: each member of $\mathcal{X}$ has a set of features

SVM - Multiclass Classification

- Support Vector Machines (SVMs): binary classification, computing a function (kernel) between each pair of samples
- One vs. Rest: multi-class classification

Event Categories

Class Event Type

0 Conference

1 Fashion

2 Concert

3 Non_event

4 Sports

5 Protest

6 Other

7 Exhibition

8 Theater_dance

Composite Kernel

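The composite kernel combines a kernel over text features and a kernel over visual features through a coefficient:

$$CK(E_1, E_2) = \lambda \cdot K_T(E_1, E_2) + (1 - \lambda) \cdot K_V(E_1, E_2)$$

where $K_T$ is computed on the text features, $K_V$ on the visual features, and $\lambda$ is the coefficient.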
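As a small illustration, the combination can be computed entry by entry on two precomputed Gram matrices. This sketch assumes the convex form coeff · K_T + (1 − coeff) · K_V with a coefficient between 0 and 1. The function name, the parameter name `coeff`, and the sample matrices are hypothetical.

```python
def composite_kernel(k_text, k_visual, coeff):
    """Entry-wise coeff * K_T + (1 - coeff) * K_V for two Gram
    matrices of the same shape, given as lists of rows."""
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must lie in [0, 1]")
    if len(k_text) != len(k_visual):
        raise ValueError("kernel matrices must have the same shape")
    return [
        [coeff * t + (1.0 - coeff) * v for t, v in zip(row_t, row_v)]
        for row_t, row_v in zip(k_text, k_visual)
    ]


# Hypothetical 2x2 Gram matrices for two events E1 and E2.
k_text = [[1.0, 0.2], [0.2, 1.0]]
k_visual = [[1.0, 0.6], [0.6, 1.0]]
ck = composite_kernel(k_text, k_visual, coeff=0.5)
print(ck)
```

A weighted sum of two valid kernels with non-negative weights is again a valid kernel, so the resulting matrix can be passed to any SVM implementation that accepts precomputed kernels.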

Text Features

NLP basic features: the word, its lower-case form, four prefixes, four suffixes, orthographic feature, word form feature.

Ontological features: obtained by matching wi with a knowledge base, e.g. “Washington” -> City

Encyclopedic features: obtained by associating wi with Wikipedia, e.g. “Washington” -> http://en.wikipedia.org/wiki/Washington,_D.C.

An excerpt from the ontology

Visual Features

- Dense RGB-SIFT
- SVM with histogram intersection kernel
- The SVMs were trained with the images given in the SED training set
- Codebook for the bag of words with 4096 visual words
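For reference, the histogram intersection kernel of two bag-of-words histograms is the sum of their bin-wise minima. The sketch below uses tiny 4-bin histograms for readability, whereas the slide's codebook has 4096 visual words. The function names and sample counts are hypothetical.

```python
def histogram_intersection(h, g):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h, g))


def gram_matrix(histograms):
    """Pairwise kernel values for a list of histograms."""
    return [
        [histogram_intersection(h, g) for g in histograms]
        for h in histograms
    ]


# Hypothetical visual-word counts for two images.
h1 = [3, 0, 2, 5]
h2 = [1, 4, 2, 2]
print(histogram_intersection(h1, h2))  # 5
print(gram_matrix([h1, h2]))           # [[10, 5], [5, 9]]
```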

Results – Events Classification

Run with test-set

cross-validation on the training set

cross-validation on the training set

Ongoing work

Events clustering

Web media

Events classification

Training data

- Set of instances of events
- Able to annotate events automatically
- Extend to automatic image annotation

Topic modeling (applied to the set of documents Dk)

name clusters

classifiers

events

Improve events clustering quality

Conclusion

1. Event clustering
- Simple and easy to develop
- Can be extended to run in parallel
- Need to find a way to adjust the parameters automatically

2. Event classification
- The composite kernel combines both text and visual features
- The combination proved robust, with a significant improvement in performance (from 45.83% to 53.58% with basic features, and from 47.61% to 54.86% with our new features)
- Encyclopedic knowledge, such as Wikipedia, could provide a valuable additional resource

Thanks for your attention

Q & A


Features

wi is the text of the title, description, or tag in each event

li is the word wi in lower-case

p1i, p2i, p3i, p4i are the four prefixes of wi

s1i, s2i, s3i, s4i are the four suffixes of wi

fi is the part-of-speech of wi

gi is the orthographic feature that tests whether a word is all upper-cased, initial-letter upper-cased, or all lower-cased.

ki is the word form feature that tests whether a token is a word, a number, a symbol, or a punctuation mark.

oi is the ontological feature. We used an ontology and knowledge base that contains 355 classes, 99 properties, and more than 100,000 entities. Given a full ontology, wi is matched to the deepest subsumed child class.
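The surface features in this list (li, the prefixes and suffixes, gi, ki) can be sketched without external resources. The sketch assumes the four prefixes and suffixes are the first and last 1 to 4 characters of the word. The category labels, the punctuation set, and the function names are hypothetical. The part-of-speech feature fi and the ontological feature oi are left out because they need a tagger and the knowledge base.

```python
# Assumed split between punctuation marks and other symbols.
PUNCTUATION = set(".,;:!?\"'()[]{}-")


def orthographic(word):
    """g_i: capitalization pattern of the word."""
    if word.isupper():
        return "ALL_UPPER"
    if word[:1].isupper():
        return "INIT_UPPER"
    if word.islower():
        return "ALL_LOWER"
    return "OTHER"


def word_form(token):
    """k_i: word, number, punctuation mark, or symbol."""
    if token.isalpha():
        return "WORD"
    if token.isdigit():
        return "NUMBER"
    if token and all(ch in PUNCTUATION for ch in token):
        return "PUNCT"
    if token and not any(ch.isalnum() for ch in token):
        return "SYMBOL"
    return "OTHER"


def text_features(word):
    """Surface features of one token w_i."""
    return {
        "w": word,
        "l": word.lower(),
        "p": [word[:n] for n in range(1, 5)],   # p1..p4
        "s": [word[-n:] for n in range(1, 5)],  # s1..s4
        "g": orthographic(word),
        "k": word_form(word),
    }


feats = text_features("Washington")
print(feats["p"])  # ['W', 'Wa', 'Was', 'Wash']
print(feats["s"])  # ['n', 'on', 'ton', 'gton']
print(feats["g"], feats["k"])  # INIT_UPPER WORD
```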
