Page 1

April 2014 SEWM 2014 1

Event Detection from Social Media:

User-centric Parallel Split-n-merge and

Composite Kernel

Truc-Vien T. Nguyen, Lugano University, Switzerland

Minh-Son Dao, University of Information Technology, Vietnam

Riccardo Mattivi, Trento University, Italy

Francesco G.B. De Natale, Trento University, Italy

SEWM – ICMR 2014, Glasgow, UK

Page 2

Outline

- Social Events and Web Media
- User-centric Parallel Split-n-merge for Event Clustering
- Composite Kernel for Event Classification
- Ongoing work
- Conclusion

Page 3

- Tsunami, Miyagi, Japan, Mar 11, 2011

Page 4

Observations

- Time-location: users cannot attend two events at the same time at places that are far away from each other
- Theme: users in the same community tend to TAG the same event with similar words
- Users tend to take a series of images within a short time interval of whatever they are paying attention to
- Images related to an event of a given type share some common visual features that are characteristic of that event type

Spatio-Temporal-Theme

Page 5

User-centric Parallel Split-n-merge

1. Web media collection A, crawled from social networks
2. Convert A to a UT-image
3. Split each row of the UT-image into clusters {bi}
4. Merge {bi} using {location, time, theme}
5. Merge {bi} using {location, time, theme} and common sense
6. Merge {bi} using visual information

Page 6

UT-Image

Pixel fields: photo_url, username, dateTaken, title, description, tags, locations

Rows: users; columns: time

Each row is sorted by time. Pixels in the same row that have no time information are grouped and put at the beginning of the row.
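A minimal Python sketch of building the UT-image as described on this slide: one row per user, timed pixels sorted by time, and untimed pixels grouped at the start of the row. The `Photo` record and `build_ut_image` are hypothetical names; only the field names come from the slide, and each row is modelled as a plain list rather than an image array.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Photo:
    """Hypothetical record holding the fields listed for a UT-image pixel."""
    photo_url: str
    username: str
    date_taken: Optional[datetime]  # None when the item has no time information
    title: str = ""
    description: str = ""
    tags: List[str] = field(default_factory=list)
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)


def build_ut_image(photos: List[Photo]) -> Dict[str, List[Photo]]:
    """Return one row per user: untimed pixels first, then timed pixels by time."""
    rows: Dict[str, List[Photo]] = defaultdict(list)
    for photo in photos:
        rows[photo.username].append(photo)

    ut_image: Dict[str, List[Photo]] = {}
    for user, items in rows.items():
        untimed = [p for p in items if p.date_taken is None]
        timed = sorted(
            (p for p in items if p.date_taken is not None),
            key=lambda p: p.date_taken,
        )
        ut_image[user] = untimed + timed
    return ut_image
```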

Page 7

Split by TIME

If there is no time information, each pixel is treated as one cluster

If there is time information
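The transcript does not preserve the rule used when time information is present. The sketch below assumes a simple gap rule: a new cluster starts whenever two consecutive timed pixels are more than a hypothetical `max_gap` apart. This is motivated by the observation that users take series of images within short intervals, but it is an illustration, not the authors' criterion. Because rows (users) are independent, they are split in parallel here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# A pixel is (photo_id, time_taken or None); a row is one user's pixels,
# already ordered as in the UT-image (untimed first, then by time).
Pixel = Tuple[str, Optional[datetime]]


def split_row(row: List[Pixel], max_gap: timedelta = timedelta(hours=1)) -> List[List[Pixel]]:
    """Split one row into clusters; max_gap is a hypothetical parameter."""
    clusters: List[List[Pixel]] = []
    current: List[Pixel] = []
    for pixel in row:
        _, taken = pixel
        if taken is None:
            # No time information: the pixel is treated as one cluster.
            clusters.append([pixel])
            continue
        # Assumed rule: a gap larger than max_gap closes the current cluster.
        if current and taken - current[-1][1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pixel)
    if current:
        clusters.append(current)
    return clusters


def split_ut_image(rows: List[List[Pixel]], max_gap: timedelta = timedelta(hours=1),
                   workers: int = 4) -> List[List[Pixel]]:
    """Split all rows in parallel and return the flat list of clusters {bi}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_row = pool.map(lambda r: split_row(r, max_gap), rows)
        return [cluster for clusters in per_row for cluster in clusters]
```

A thread pool keeps the sketch short; for CPU-bound work on a large collection, a process pool or a distributed framework would be a more natural fit for the parallel mode.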

Page 8

Merge by spatio-time-theme

For each selected cluster bk, create:

- Time-taken boundary Tk
- Location union Lk
- Document (tags, title, description) Dk

For any pair of clusters (bk, bl), merge if at least two of the following three conditions hold:

- Tdistance(Tk, Tl) ≤ α
- Ldistance(Lk, Ll) ≤ β
- JaccardIndex(Dk, Dl) ≥ γ
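A sketch of the pairwise two-of-three merge test. The slide does not define Tdistance or Ldistance, so both are assumptions here: Tk is taken as the (earliest, latest) time-taken interval and Tdistance as the gap in hours between two intervals (0 if they overlap); Lk is a set of (lat, lon) points and Ldistance is the closest great-circle (haversine) distance in km between the two sets. Dk is treated as a set of words. The defaults for α, β, γ are those of the first reported run (24 hours, 5 km, 0.2).

```python
import math
from datetime import datetime
from typing import Optional, Set, Tuple

TimeBoundary = Optional[Tuple[datetime, datetime]]  # (earliest, latest) time taken
LatLon = Tuple[float, float]


def t_distance(tk: TimeBoundary, tl: TimeBoundary) -> float:
    """Assumed Tdistance: gap in hours between two time boundaries (0 if they overlap)."""
    if tk is None or tl is None:
        return math.inf
    (start_k, end_k), (start_l, end_l) = tk, tl
    gap = max(start_k, start_l) - min(end_k, end_l)
    return max(gap.total_seconds(), 0.0) / 3600.0


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def l_distance(lk: Set[LatLon], ll: Set[LatLon]) -> float:
    """Assumed Ldistance: closest pair of locations across the two unions, in km."""
    if not lk or not ll:
        return math.inf
    return min(haversine_km(p, q) for p in lk for q in ll)


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(ck: dict, cl: dict, alpha_h: float = 24.0,
                 beta_km: float = 5.0, gamma: float = 0.2) -> bool:
    """Merge when at least two of the three conditions hold."""
    votes = [
        t_distance(ck["T"], cl["T"]) <= alpha_h,
        l_distance(ck["L"], cl["L"]) <= beta_km,
        jaccard(ck["D"], cl["D"]) >= gamma,
    ]
    return sum(votes) >= 2
```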

Page 9

Merge by common-sense

Compute tf-idf on Dk and select the most COMMON keywords to create NDk

For any pair of clusters (bk, bl), merge if JaccardIndex(NDk, NDl) ≥ γ
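A sketch of the common-sense step, reading "most COMMON keywords" as the top-k words by tf-idf score within each cluster document Dk; both that reading and the value of k are assumptions. Each Dk is a list of tokens, and the smoothed idf, log((1+n)/(1+df)) + 1, is borrowed from common practice rather than from the slides.

```python
import math
from collections import Counter
from typing import List, Set


def top_keywords(docs: List[List[str]], k: int = 5) -> List[Set[str]]:
    """Return the k highest tf-idf words of each cluster document Dk (as NDk)."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per cluster
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        # Smoothed idf (an assumption) keeps words present everywhere positive.
        scores = {w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1.0)
                  for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        keywords.append(set(ranked[:k]))
    return keywords


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_common_sense(nd_k: Set[str], nd_l: Set[str], gamma: float = 0.2) -> bool:
    return jaccard(nd_k, nd_l) >= gamma
```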

Page 10

Merge with Visual features

For any pair of clusters (bk, bl), merge if JaccardIndex(BoWk, BoWl) ≥ θ
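The slide does not say how the Jaccard index is computed on bag-of-words vectors. One option, assumed here, is the weighted (generalized) Jaccard index, the sum of element-wise minima over the sum of maxima of the visual-word counts; plain set Jaccard over the visual words present would be the other natural reading. `bow`, `weighted_jaccard` and `merge_visual` are hypothetical names; θ = 0.3 is the value used in the last reported run.

```python
from collections import Counter
from typing import Iterable


def bow(visual_words: Iterable[int]) -> Counter:
    """Bag of visual words of a cluster: counts of codeword ids over all its images."""
    return Counter(visual_words)


def weighted_jaccard(h1: Counter, h2: Counter) -> float:
    """Generalized Jaccard index: sum of minima over sum of maxima."""
    keys = set(h1) | set(h2)
    num = sum(min(h1[k], h2[k]) for k in keys)  # missing keys count as 0
    den = sum(max(h1[k], h2[k]) for k in keys)
    return num / den if den else 0.0


def merge_visual(bow_k: Counter, bow_l: Counter, theta: float = 0.3) -> bool:
    return weighted_jaccard(bow_k, bow_l) >= theta
```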

Page 11

Results – Events clustering

MediaEval 2013 dataset and participants

Page 12

Result - Events Clustering

- The first run (split, then merge by spatio-time-theme): α = 24 hours, β = 5 km, γ = 0.2
- The second run (as the first): α = 8 hours, β = 2 km, γ = 0.2
- The third run: as the first, plus common-sense merging
- The last run: as the third, plus visual features, θ = 0.3

Page 13

Classification Problems

Supervised Learning: learn a function $f: \mathcal{X} \to \mathcal{Y}$ from examples $\{(x_i, y_i)\}_{i=1}^{n}$

Binary Classification: $\mathcal{Y} = \{-1, +1\}$

Multi-class Classification: $\mathcal{Y} = \{1, 2, \dots, k\}$

Event Classification: each member of $\mathcal{X}$ has a set of features

Page 14

SVM- Multiclass Classification

- Support Vector Machines (SVMs): binary classification, computing a function (kernel) between each pair of samples
- One vs. Rest: multi-class classification
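A sketch of the one-vs-rest decision rule with a generic kernel: one binary SVM per class ("this class" vs. "all others"), and the class with the largest decision value wins. Training is omitted; the support vectors, dual coefficients (Lagrange multiplier times label) and bias of each binary SVM are assumed to come from any standard SVM solver, and `linear_kernel` is only a placeholder for the kernels used in this work.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
# Hypothetical trained binary SVM: (support vectors, dual coefficients, bias b).
BinarySVM = Tuple[List[Vector], List[float], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def svm_decision(x: Vector, model: BinarySVM, kernel: Kernel) -> float:
    """f(x) = sum_i coef_i * K(s_i, x) + b for one binary SVM."""
    support, dual_coef, bias = model
    return sum(c * kernel(s, x) for s, c in zip(support, dual_coef)) + bias


def one_vs_rest_predict(x: Vector, models: Dict[int, BinarySVM], kernel: Kernel) -> int:
    """Each class has a 'class vs. rest' SVM; predict the class with the largest score."""
    scores = {label: svm_decision(x, m, kernel) for label, m in models.items()}
    return max(scores, key=scores.get)
```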

Page 15

Event Categories

Class Event Type

0 Conference

1 Fashion

2 Concert

3 Non_event

4 Sports

5 Protest

6 Other

7 Exhibition

8 Theater_dance

Page 16

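Composite Kernel

$$CK(E_1, E_2) = \alpha\, K_T(E_1, E_2) + (1 - \alpha)\, K_V(E_1, E_2)$$

where $K_T$ is the kernel on text features, $K_V$ the kernel on visual features, and $\alpha$ the combination coefficient.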
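A sketch of the composite kernel as an entry-wise combination of two precomputed Gram matrices, one from the text kernel K_T and one from the visual kernel K_V. The function name and the plain-list matrix representation are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def composite_kernel(k_text: Matrix, k_visual: Matrix, alpha: float) -> Matrix:
    """CK = alpha * K_T + (1 - alpha) * K_V, computed entry by entry on Gram matrices."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(k_text) != len(k_visual) or any(len(r) != len(s) for r, s in zip(k_text, k_visual)):
        raise ValueError("Gram matrices must have the same shape")
    return [
        [alpha * t + (1.0 - alpha) * v for t, v in zip(row_t, row_v)]
        for row_t, row_v in zip(k_text, k_visual)
    ]
```

Because α and 1 − α are non-negative, CK is a weighted sum of positive semi-definite kernels and is therefore itself a valid kernel, so the resulting matrix can be given to any SVM implementation that accepts precomputed kernels.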

Page 17

Text Features

NLP basic features: the word, its lower-case form, four prefixes, four suffixes, an orthographic feature, and a word-form feature.

Ontological features: obtained by matching wi against a knowledge base, e.g. “Washington” → City

Encyclopedic features: obtained by associating wi with Wikipedia, e.g. “Washington” → http://en.wikipedia.org/wiki/Washington,_D.C.
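A toy sketch of the ontological and encyclopedic lookups. The ontology, knowledge base and Wikipedia index below are tiny hypothetical dictionaries standing in for the real resources (355 classes, 99 properties, and more than 100,000 entities); following the description of oi, a word is mapped to the deepest class it is matched to.

```python
from typing import Dict, Optional, Set

# Hypothetical toy ontology: class -> parent class (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Location": "Thing",
    "City": "Location",
    "Person": "Thing",
}
# Hypothetical toy knowledge base: lower-cased surface form -> matched classes.
KB: Dict[str, Set[str]] = {"washington": {"Location", "City"}}
# Hypothetical toy encyclopedic index: lower-cased surface form -> Wikipedia page.
WIKI: Dict[str, str] = {"washington": "http://en.wikipedia.org/wiki/Washington,_D.C."}


def depth(cls: str) -> int:
    """Number of edges from cls up to the root of the ontology."""
    d = 0
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        d += 1
    return d


def ontological_feature(word: str) -> Optional[str]:
    """oi: the deepest class matched to the word, or None if there is no match."""
    classes = KB.get(word.lower())
    if not classes:
        return None
    return max(classes, key=lambda c: (depth(c), c))


def encyclopedic_feature(word: str) -> Optional[str]:
    """Wikipedia page associated with the word, or None."""
    return WIKI.get(word.lower())
```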

Page 18

An excerpt from the ontology

Page 19

Visual Features

- Dense RGB-SIFT descriptors
- SVM with histogram intersection kernel
- The SVMs were trained on the images given in the SED training set
- Bag-of-words codebook with 4096 visual words
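Dense RGB-SIFT extraction itself is not shown (it needs an image library); descriptors are assumed to be given as plain vectors, and the codebook (4096 visual words in this setup) is assumed to be already learned. The sketch shows hard assignment of each descriptor to its nearest codeword and the histogram intersection kernel, K(h1, h2) = sum_j min(h1_j, h2_j). The function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def quantize(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1: Vector, h2: Vector) -> float:
    """K(h1, h2) = sum_j min(h1_j, h2_j)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```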

Page 20

Results – Events Classification

Run with test-set

cross-validation on the training set

cross-validation on the training set

Page 21

Ongoing work

Events clustering

Web media

Events classification

Training data

- Set of instances of events
- Ability to annotate events automatically
- Extend to automatic image annotation

Topic modeling (applied to the set of documents Dk)

name clusters

classifiers

events

Improve event clustering quality

Page 22

Conclusion

1. Event clustering
- Simple and easy to develop
- Can be developed to run in parallel mode
- Need to find a way to adjust parameters automatically

2. Event classification
- Composite kernel combining both text and visual features
- The combination proved robust, with a significant improvement in performance (from 45.83% to 53.58% with basic features, and from 47.61% to 54.86% with our new features)
- Encyclopedic knowledge such as Wikipedia could provide a great additional resource
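The reported figures translate into the following absolute (percentage-point) and relative gains; this is plain arithmetic on the numbers above, and the slides do not state which metric the percentages measure.

```latex
\begin{aligned}
\text{basic features: } & 53.58 - 45.83 = 7.75 \text{ points}, & \tfrac{7.75}{45.83} &\approx 16.9\% \text{ relative} \\
\text{new features: }   & 54.86 - 47.61 = 7.25 \text{ points}, & \tfrac{7.25}{47.61} &\approx 15.2\% \text{ relative}
\end{aligned}
```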

Page 23

Thanks for your attention

Q & A

Page 24

Features

wi is a word from the title, description, or tags of each event

li is the word wi in lower case

p1i, p2i, p3i, p4i are the four prefixes of wi

s1i, s2i, s3i, s4i are the four suffixes of wi

fi is the part-of-speech of wi

gi is the orthographic feature that tests whether a word is all upper-case, has an upper-case initial letter, or is all lower-case.

ki is the word-form feature that tests whether a token is a word, a number, a symbol, or a punctuation mark.

oi is the ontological feature. We used an ontology and knowledge base that contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, wi is matched to the deepest subsumed child class.
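A minimal sketch of the basic per-word features li, p1i to p4i, s1i to s4i, gi and ki. Assumptions: the four prefixes and suffixes are taken as those of length 1 to 4; the label strings (`ALL_UPPER`, `NUMBER`, ...) are hypothetical; the part-of-speech feature fi is omitted because it requires a tagger; symbol vs. punctuation is decided with Unicode general categories.

```python
import re
import unicodedata
from typing import Dict


def orthographic(word: str) -> str:
    """gi: all upper-case, upper-case initial, all lower-case, or mixed."""
    if word.isupper():
        return "ALL_UPPER"
    if word[:1].isupper():
        return "INIT_UPPER"
    if word.islower():
        return "ALL_LOWER"
    return "MIXED"


def word_form(token: str) -> str:
    """ki: word, number, punctuation, symbol (or OTHER for mixed tokens)."""
    if token.isalpha():
        return "WORD"
    if re.fullmatch(r"\d+(?:[.,]\d+)*", token):
        return "NUMBER"
    categories = {unicodedata.category(ch)[0] for ch in token}
    if categories == {"P"}:
        return "PUNCT"
    if categories == {"S"}:
        return "SYMBOL"
    return "OTHER"


def basic_features(word: str) -> Dict[str, str]:
    """wi, li, p1i..p4i, s1i..s4i, gi and ki for one word."""
    feats = {"w": word, "l": word.lower()}
    for n in range(1, 5):
        feats[f"p{n}"] = word[:n]
        feats[f"s{n}"] = word[-n:]
    feats["g"] = orthographic(word)
    feats["k"] = word_form(word)
    return feats
```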