Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections
Institute for Web Science & Technologies University of Koblenz ▪ Landau, Germany
Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections
Christoph Carl Kling, Jérôme Kunegis, Sergej Sizov, Steffen Staab
Detecting Non-Gaussian Geographical Topics 2Christoph Carl Kling
Outline
1) Motivation
2) Existing approaches
3) Our approach
4) Evaluation
Motivation
Topics in topic modelling:
Latent variables that explain the co-occurrence of words in documents.
Geographical topics:
Latent variables that explain the co-occurrence of words both in documents and in the geographical space.
[Figure: map of tagged photographs with geo-coordinates. Each photo carries a few tags, e.g. "fish, rice", "seafood, shrimp", "lobster, wine", "coffee, wine", "pizza, wine", "pasta, shrimp".]
[Figure: the example map of tagged photos with two topic word lists. Topic 1: seafood, fish, lobster, shrimp, crab, wine, salmon. Topic 2: wine, pizza, coffee, italian, pasta.]
Existing Approaches
[Figure: the example map of tagged photos with five topic word lists: (shrimp, fish, rice, seafood, lobster); (wine, pizza, coffee, italian, pasta); (fish, seafood, salmon, shrimp, wine); (seafood, shrimp, lobster); (lobster, seafood, fish, salmon, wine).]
GeoFolk, S. Sizov, 2010
[Figure: the example map of tagged photos with two topic word lists: (seafood, fish, lobster, shrimp, crab, wine, salmon); (wine, pizza, coffee, italian, pasta).]
LGTA, Z. Yin et al., 2011
[Figure: the example map of tagged photos with two topic word lists: (seafood, fish, lobster, shrimp, crab, wine, salmon); (wine, pizza, coffee, italian, pasta).]
A. Ahmed, L. Hong and A. Smola, 2013
Our Approach
Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions
wikipedia.org
[Figure: the example map of tagged photos, illustrating the clustering step.]
Clustering: e.g. a mixture of Gaussian/Fisher distributions
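A minimal sketch of this clustering step, under stated assumptions: instead of fitting a full mixture of Fisher distributions, it uses spherical k-means, which is the hard-assignment special case with equal concentration for all clusters. The function names, the farthest-first initialisation and the fixed iteration count are illustrative choices and are not taken from the slides.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Map a latitude/longitude pair (degrees) to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(points, k, iterations=20):
    """Cluster (lat, lon) points; returns (labels, unit-vector centers)."""
    vectors = [to_unit_vector(lat, lon) for lat, lon in points]
    # Farthest-first initialisation: deterministic, starts from the first point.
    centers = [vectors[0]]
    while len(centers) < k:
        farthest = min(vectors, key=lambda v: max(dot(v, c) for c in centers))
        centers.append(farthest)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment: the closest center has the largest cosine similarity.
        labels = [max(range(k), key=lambda c: dot(v, centers[c]))
                  for v in vectors]
        # Update: each center becomes the normalised mean direction.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old center for an empty cluster
            total = tuple(sum(coord) for coord in zip(*members))
            norm = math.sqrt(dot(total, total))
            if norm > 0:
                centers[c] = tuple(x / norm for x in total)
    return labels, centers

# Hypothetical photo locations: three near New York, three near Paris.
photos = [(40.71, -74.00), (40.73, -73.99), (40.75, -73.98),
          (48.85, 2.35), (48.86, 2.34), (48.87, 2.29)]
labels, centers = spherical_kmeans(photos, k=2)
```

Working on unit vectors rather than raw latitude/longitude avoids distortions near the poles and at the date line, which is the usual reason to prefer Fisher over Gaussian components for geographical data.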
[Figure: the example map of tagged photos with two topic word lists: (seafood, fish, lobster, shrimp, crab, wine, salmon); (wine, pizza, coffee, italian, pasta).]
Adjacency: Delaunay triangulation, k-NN, …
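As an illustration of the k-NN option, the following sketch builds a symmetric adjacency relation between cluster centroids using great-circle distance. The function names and the symmetrisation rule (two clusters are adjacent if either is among the other's k nearest) are assumptions made for this example; the slides do not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def knn_adjacency(centroids, k):
    """Return {cluster index: set of adjacent cluster indices}."""
    n = len(centroids)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: haversine_km(centroids[i], centroids[j]))
        for j in others[:k]:
            # Symmetrise: adjacency is treated as an undirected relation.
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors

# Hypothetical centroids placed along the equator.
centroids = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 10.0)]
adjacency = knn_adjacency(centroids, k=1)
```

A Delaunay triangulation would instead connect clusters that share a triangle edge, which adapts to local density without having to choose k.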
Cluster adjacency → dependencies of document-specific topic distributions
Exchange of topic information between clusters
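The idea of exchanging topic information can be illustrated with a deliberately simplified sketch: each cluster's topic distribution is mixed with the distributions of its adjacent clusters. The fixed `self_weight` and the uniform split over neighbours are assumptions for illustration only; in MGTM the smoothing is dynamic and happens during inference, which this sketch does not reproduce.

```python
def smooth_topics(topic_dists, neighbors, self_weight=0.5):
    """Mix each cluster's topic distribution with those of its neighbours.

    topic_dists: {cluster: list of topic probabilities}
    neighbors:   {cluster: iterable of adjacent clusters}
    """
    smoothed = {}
    for cluster, dist in topic_dists.items():
        adjacent = list(neighbors.get(cluster, ()))
        if not adjacent:
            smoothed[cluster] = list(dist)  # isolated cluster: unchanged
            continue
        share = (1.0 - self_weight) / len(adjacent)
        mixed = [self_weight * p for p in dist]
        for other in adjacent:
            for topic, p in enumerate(topic_dists[other]):
                mixed[topic] += share * p
        smoothed[cluster] = mixed
    return smoothed

# Hypothetical example: three clusters in a chain, two topics.
topic_dists = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
smoothed = smooth_topics(topic_dists, neighbors)
```

Because every output is a convex combination of probability vectors, each smoothed distribution still sums to one. Applying the step repeatedly lets a topic spread along chains of adjacent clusters, so it can cover an irregularly shaped area.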
MGTM
[Figure: graphical model of MGTM in plate notation. Variables: γ, H, α0, G^0, G^s, G^d, A_l, η, δ_l, θ_jn, w. Plates: L, M, N.]
L: #regions
M: #documents in cluster
N: #words in document
G^0: Global topic distribution
G^s: Cluster-topic distribution
G^d: Document-topic distribution
Evaluation
Datasets
Activities: 1,931 photos
Landscape: 5,791 photos
Manhattan: 28,922 photos
Car: 34,707 photos
Food: 151,747 photos
LGTA, Z. Yin et al., 2011
Compared models:
- LGTA: Model with regions
- Basic model: 3-level Hierarchical Dirichlet Process
- MGTM: Basic model plus dynamically smoothed adjacent regions
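For orientation, a three-level Hierarchical Dirichlet Process such as the basic model can be written as the generative sketch below. It follows the standard HDP construction of Teh et al. and reuses the symbols from the MGTM diagram where they appear; the third-level concentration parameter \alpha_1 and the multinomial emission are assumptions, since the slides do not state them.

```latex
\begin{align*}
G^0 &\sim \mathrm{DP}(\gamma, H)
  && \text{global topic distribution} \\
G^s_l &\sim \mathrm{DP}(\alpha_0, G^0)
  && \text{cluster-topic distribution of region } l \\
G^d_j &\sim \mathrm{DP}(\alpha_1, G^s_l)
  && \text{document-topic distribution of document } j \text{ in region } l \\
\theta_{jn} &\sim G^d_j
  && \text{topic of word } n \text{ in document } j \\
w_{jn} &\sim \mathrm{Multinomial}(\theta_{jn})
  && \text{observed tag}
\end{align*}
```

In this basic form each document depends only on its own region. MGTM differs by also letting the adjacent regions influence the document-topic distribution.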
Word Perplexity
[Figure: word perplexity plots, one panel per dataset: manhattan (100 regions), landscape (200 regions), activities (300 regions), car (500 regions), food (1000 regions).]
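The slides do not spell out the formula, so the sketch below uses the standard definition of word perplexity: the exponential of the negative mean log-likelihood of held-out words. Lower values mean the model predicts the held-out tags better. The function name and input format are assumptions for this example.

```python
import math

def word_perplexity(word_probs):
    """Perplexity from the model's predictive probability of each held-out word."""
    if not word_probs:
        raise ValueError("need at least one held-out word")
    log_likelihood = sum(math.log(p) for p in word_probs)
    return math.exp(-log_likelihood / len(word_probs))

# A model that assigns probability 1/4 to every held-out word is as
# uncertain as a uniform choice among four words.
uniform_case = word_perplexity([0.25] * 8)
mixed_case = word_perplexity([0.5, 0.125])
```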
User Study
Food dataset (1000 regions)
31 participants
Task: intrusion detection
Measure: precision

Precision (avg / median):
Model         4 topics      6 topics      8 topics
LGTA          0.67 / 0.64   0.57 / 0.57   0.60 / 0.58
Basic model   0.45 / 0.57   0.63 / 0.61   0.64 / 0.58
MGTM          0.79 / 0.80   0.82 / 0.81   0.78 / 0.75
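A sketch of how such avg / median precision values could be computed for an intrusion-detection task, where each participant must pick the word that does not belong to a topic. It assumes that precision is measured per participant and then aggregated across participants; the slides do not state the aggregation, and the answers below are hypothetical, not data from the study.

```python
from statistics import mean, median

def participant_precision(answers, intruders):
    """Fraction of tasks in which the participant picked the true intruder."""
    correct = sum(1 for a, truth in zip(answers, intruders) if a == truth)
    return correct / len(intruders)

def summarize(all_answers, intruders):
    """Return (average, median) precision across participants."""
    scores = [participant_precision(a, intruders) for a in all_answers]
    return mean(scores), median(scores)

# Hypothetical data: four tasks, three participants.
intruders = ["pizza", "coffee", "rice", "wine"]
all_answers = [
    ["pizza", "coffee", "rice", "wine"],   # 4 of 4 correct
    ["pizza", "pasta", "rice", "salmon"],  # 2 of 4 correct
    ["pizza", "coffee", "rice", "crab"],   # 3 of 4 correct
]
avg_precision, median_precision = summarize(all_answers, intruders)
```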
west.uni-koblenz.de: Research → systems → MGTM
west.uni-koblenz.de liveandgov.eu
Thank you! Questions?
Contact: [email protected]
Summary
• Geographical topics often exhibit a complex spatial distribution
• The detection of such complex topics can be supported
• The dynamic smoothing of adjacent regions leads to an evolutionary creation and spread of topics during inference
References
Hierarchical Dirichlet processes. By: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. In: Journal of the American Statistical Association, Vol. 101 (2006), p. 1566-1581.
GeoFolk: latent spatial semantics in web 2.0 social media. By: Sergej Sizov. In: WSDM, ACM (2010), p. 281-290.
Geographical topic discovery and comparison. By: Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang. In: WWW, ACM (2011), p. 247-256.
A Nonparametric Bayesian Model of Multi-Level Category Learning. By: Kevin Robert Canini and Thomas L. Griffiths. In: AAAI, AAAI Press (2011).