Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Institute for Web Science & Technologies, University of Koblenz ▪ Landau, Germany. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. Christoph Carl Kling, Jérôme Kunegis, Sergej Sizov, Steffen Staab

description

Nowadays, large collections of photos are tagged with GPS coordinates. The modelling of such large geo-tagged corpora is an important problem in data mining and information retrieval, and involves the use of geographical information to detect topics with a spatial component. In this paper, we propose a novel geographical topic model which captures dependencies between geographical regions to support the detection of topics with complex, non-Gaussian distributed spatial structures. The model is based on a multi-Dirichlet process (MDP), a novel generalisation of the hierarchical Dirichlet process extended to support multiple base distributions. We therefore call our method the MDP-based geographical topic model (MGTM). We show how to use an MDP to dynamically smooth topic distributions between groups of spatially adjacent documents. In systematic quantitative and qualitative evaluations using independent datasets from prior related work, we show that such a model can exploit the adjacency of regions and leads to a significant improvement in the quality of topics compared to the state of the art in geographical topic modelling.

Transcript of Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Page 1: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Institute for Web Science & Technologies University of Koblenz ▪ Landau, Germany

Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Christoph Carl Kling, Jérôme Kunegis, Sergej Sizov, Steffen Staab

Page 2: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Detecting Non-Gaussian Geographical Topics, Christoph Carl Kling

Outline

1) Motivation
2) Existing approaches
3) Our approach
4) Evaluation

Page 3: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Motivation

Page 4: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Topics in topic modelling:

Latent variables that explain the co-occurrence of words in documents.

Page 5: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Topics in topic modelling:

Latent variables that explain the co-occurrence of words in documents.

Geographical topics:

Latent variables that explain the co-occurrence of words both in documents and in the geographical space.

Page 6: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: photos placed by geo-coordinates, each labelled with its tags: fish, rice; seafood, fish; seafood, shrimp; lobster, wine; seafood, fish, salmon; fish, salmon, wine; seafood, shrimp; lobster, seafood, shrimp; coffee; coffee, wine; coffee; wine; wine; pizza, wine; pizza, wine; pasta, wine; pasta, shrimp; lobster, shrimp; seafood, shrimp]

Tagged photographs with geo-coordinates

Page 7: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the geotagged photos, each labelled with its tags: fish, rice; seafood, fish; seafood, shrimp; lobster, wine; seafood, fish, salmon; fish, salmon, wine; seafood, shrimp; lobster, seafood, shrimp; coffee; coffee, wine; coffee; italian, wine; wine; pizza, wine; italian, pizza, wine; pasta, wine; pasta, shrimp; seafood, shrimp; lobster, shrimp]

Topic word lists:

seafood, fish, lobster, shrimp, crab, wine, salmon
wine, pizza, coffee, italian, pasta

Page 8: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Existing Approaches

Page 9: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7, with five topic word lists]

shrimp, fish, rice, seafood, lobster
wine, pizza, coffee, italian, pasta
fish, seafood, salmon, shrimp, wine
seafood, shrimp, lobster
lobster, seafood, fish, salmon, wine

GeoFolk, S. Sizov 2010

Page 10: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7, with two topic word lists]

seafood, fish, lobster, shrimp, crab, wine, salmon
wine, pizza, coffee, italian, pasta

LGTA, Z. Yin et al., 2011

Page 11: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7, with two topic word lists]

seafood, fish, lobster, shrimp, crab, wine, salmon
wine, pizza, coffee, italian, pasta

A. Ahmed, L. Hong and A. Smola, 2013

Page 12: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Our Approach

Page 13: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions

wikipedia.org

Page 14: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7]

Clustering: e.g. mixture of Gaussian/Fisher distributions
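As a rough illustration of the Gaussian option named on this slide, the sketch below fits a mixture of isotropic 2-D Gaussians with EM in plain Python. It treats coordinates as planar, which is a simplification (the Fisher distribution mentioned on the slide would be the spherical counterpart). The function name `fit_spherical_gmm`, the initial means and the toy points are assumptions made for this example, not part of the presented model.

```python
import math

def fit_spherical_gmm(points, init_means, n_iter=25):
    """Fit a mixture of isotropic 2-D Gaussians with EM.

    Returns (weights, means, variances, responsibilities), where the
    responsibilities come from the final E-step.
    """
    k = len(init_means)
    means = [tuple(m) for m in init_means]
    weights = [1.0 / k] * k
    variances = [1.0] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x, y in points:
            dens = [
                w * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2.0 * var))
                / (2.0 * math.pi * var)
                for w, (mx, my), var in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            mx = sum(r[c] * p[0] for r, p in zip(resp, points)) / n_c
            my = sum(r[c] * p[1] for r, p in zip(resp, points)) / n_c
            sq = sum(
                r[c] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                for r, p in zip(resp, points)
            )
            weights[c] = n_c / len(points)
            means[c] = (mx, my)
            # Both dimensions share one variance; the floor avoids collapse.
            variances[c] = max(sq / (2.0 * n_c), 1e-6)
    return weights, means, variances, resp

# Hypothetical planar coordinates forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
weights, means, variances, resp = fit_spherical_gmm(points, [(1, 1), (9, 9)])
# Hard cluster label per point: the component with the largest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

The hard labels give each photo a cluster (region) membership. The sketch has no safeguards for empty components or points far from every mean, so it is only suited to small, well-behaved inputs.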

Page 15: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7, with two topic word lists]

seafood, fish, lobster, shrimp, crab, wine, salmon
wine, pizza, coffee, italian, pasta

Page 16: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Page 17: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Adjacency: Delaunay triangulation, K-NN, …
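A minimal sketch of the K-NN option, assuming each cluster is summarised by a planar centroid. The name `knn_adjacency`, the Euclidean metric, the symmetrisation rule and the example centroids are illustrative assumptions; the slides do not specify these details.

```python
import math

def knn_adjacency(centroids, k):
    """Return a symmetric adjacency map: cluster index -> set of neighbour indices."""
    n = len(centroids)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Sort all other clusters by Euclidean distance to cluster i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centroids[i], centroids[j]),
        )
        for j in others[:k]:
            # Symmetrise: i and j are adjacent if either is a
            # k-nearest neighbour of the other.
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

# Hypothetical centroids (x, y); real data would use projected geo-coordinates.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
adjacency = knn_adjacency(centroids, k=1)
```

With `k=1` the four example clusters form a chain: each cluster is linked to its nearest neighbour, and the symmetrisation makes the links bidirectional.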

Page 18: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: the tagged photos from page 7, with two topic word lists]

seafood, fish, lobster, shrimp, crab, wine, salmon
wine, pizza, coffee, italian, pasta

Page 19: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Cluster adjacency → dependencies of document-specific topic distributions

Exchange of topic information between clusters
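To make the idea of exchanging topic information concrete, here is a deliberately simplified sketch: each cluster's topic distribution is mixed with the average distribution of its adjacent clusters. The fixed `self_weight` and the function name `smooth_with_neighbours` are assumptions for illustration only. The MGTM itself smooths dynamically through the multi-Dirichlet process, which this sketch does not implement.

```python
def smooth_with_neighbours(topic_dists, adjacency, self_weight=0.5):
    """Mix each cluster's topic distribution with the mean of its neighbours'."""
    smoothed = {}
    for cluster, dist in topic_dists.items():
        neighbours = adjacency.get(cluster, ())
        if not neighbours:
            # Isolated cluster: nothing to exchange.
            smoothed[cluster] = list(dist)
            continue
        n_topics = len(dist)
        # Average topic distribution over the adjacent clusters.
        mean_nb = [
            sum(topic_dists[nb][t] for nb in neighbours) / len(neighbours)
            for t in range(n_topics)
        ]
        # Convex combination, so the result still sums to one.
        smoothed[cluster] = [
            self_weight * dist[t] + (1.0 - self_weight) * mean_nb[t]
            for t in range(n_topics)
        ]
    return smoothed

# Hypothetical two-topic distributions for a chain of three clusters.
topic_dists = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
smoothed = smooth_with_neighbours(topic_dists, adjacency)
```

In this toy example the two outer clusters each pick up some probability mass for the topic that dominates on the other side of the chain, which is the kind of spread between adjacent clusters the slide refers to.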

Page 20: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Exchange of topic information between clusters

Page 21: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Exchange of topic information between clusters

Page 22: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Exchange of topic information between clusters

Page 23: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Exchange of topic information between clusters

Page 24: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: plate diagram of the MGTM. Symbols shown: γ, H, α0, η, A_l, δ_l, θ_jn, w and the distributions G⁰, G^s, G^d (indexed by l and j); plates L, M, N]

L: #regions
M: #documents in cluster
N: #words in document
G⁰: Global topic distribution
G^s: Cluster-topic distribution
G^d: Document-topic distribution

MGTM

Page 25: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Evaluation

Page 26: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Datasets

Activities: 1,931 photos
Landscape: 5,791 photos
Manhattan: 28,922 photos
Car: 34,707 photos
Food: 151,747 photos

LGTA, Z. Yin et al., 2011

Page 27: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Compared models:

- LGTA: Model with regions
- Basic model: 3-level Hierarchical Dirichlet Process
- MGTM: Basic model plus dynamically smoothed adjacent regions

Page 28: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

[Figure: one plot per dataset: manhattan (100 regions), landscape (200 regions), activities (300 regions), car (500 regions), food (1000 regions)]

Word Perplexity
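For reference, word perplexity is commonly defined as the exponential of the negative average log-probability that a model assigns to held-out word tokens, with lower values being better. The sketch below computes that standard definition. How each model on the slides obtains the per-word probabilities is not stated in the transcript, so `word_probs` is simply assumed to be given.

```python
import math

def word_perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum(log p(w_i))) over N held-out word tokens."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_likelihood = sum(math.log(p) for p in word_probs)
    return math.exp(-log_likelihood / len(word_probs))

# A model that assigns probability 1/4 to every token behaves like a
# uniform choice among four words, so its perplexity is about 4.
uniform = word_perplexity([0.25] * 8)
# A model that assigns higher probabilities gets a lower perplexity.
sharper = word_perplexity([0.5] * 8)
```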

Page 29: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

User Study

Food dataset (1000 regions)
31 participants
Task: intrusion detection
Measure: precision

Precision (avg / median):

Model         4 topics      6 topics      8 topics
LGTA          0.67 / 0.64   0.57 / 0.57   0.60 / 0.58
Basic model   0.45 / 0.57   0.63 / 0.61   0.64 / 0.58
MGTM          0.79 / 0.80   0.82 / 0.81   0.78 / 0.75
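The avg / median figures can be read as summaries of per-participant precision in the intrusion task, where a trial counts as correct when the participant picks the word that does not belong to the topic. The sketch below shows that computation under this assumption. The aggregation over participants, the function name `intrusion_precision` and the responses are hypothetical and do not reproduce the study data.

```python
from statistics import mean, median

def intrusion_precision(answers):
    """Share of trials in which the chosen word was the true intruder."""
    return sum(chosen == intruder for chosen, intruder in answers) / len(answers)

# Hypothetical responses: participant -> [(chosen word, true intruder), ...].
responses = {
    "p1": [("crab", "crab"), ("pizza", "coffee")],
    "p2": [("crab", "crab"), ("coffee", "coffee")],
    "p3": [("wine", "crab"), ("pizza", "coffee")],
}
per_participant = [intrusion_precision(a) for a in responses.values()]
# Summary in the same "avg / median" form as the table.
summary = (mean(per_participant), median(per_participant))
```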

Page 30: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

west.uni-koblenz.de
Research → systems → MGTM

west.uni-koblenz.de liveandgov.eu

Page 31: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Thank you!
Questions?

Contact: [email protected]

Page 32: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

Summary

• Geographical topics often exhibit a complex spatial distribution

• The detection of such complex topics can be supported

• The dynamic smoothing of adjacent regions leads to an evolutionary creation and spread of topics during inference

Page 33: Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

References

Hierarchical Dirichlet processes.
by: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei
In: Journal of the American Statistical Association, Vol. 101 (2006), p. 1566-1581.

GeoFolk: latent spatial semantics in web 2.0 social media.
by: Sergej Sizov
In: WSDM, ACM (2010), p. 281-290.

Geographical topic discovery and comparison.
by: Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang
In: WWW, ACM (2011), p. 247-256.

A Nonparametric Bayesian Model of Multi-Level Category Learning.
by: Kevin Robert Canini and Thomas L. Griffiths
In: AAAI, AAAI Press (2011).