Liangliang Cao - University Of Illinoiscao4/papers/geo_photomemex_talk.pdf · Recommendation...

Post on 16-Jul-2020


Geophoto Memex

Liangliang Cao

What is “Geophoto Memex”?

• Geophoto Memex: Record all the photos that are associated with locations in the world, and provide geographical analytics on request.

• Our related papers:

– ACMMM 2009: “Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression”

– ICASSP 2010: “A worldwide tourism recommendation system based on geotagged web photos”

– SDM 2011: “Diversified Trajectory Pattern Ranking in Geo-tagged Social Media”

– WWW 2011: “Geographical topic discovery and comparison”

Geophotos

• Where is the data from?

− Advanced cameras with GPS receivers

− GPS sensors in smartphones

− Web apps including Google Earth, Flickr, and Twitter

Project Overview

Data Collection → Data Cleaning → Data Analytics

• Data Collection

− Already collected 1M geo-tagged photos

− Aim to collect 100+M geo-photos from Flickr

− Aim to collect more geo-tagged documents from Twitter

• Data Cleaning (Project 1)

− Remove the label ambiguity

− Refine the annotation

• Data Analytics (Project 2, Project 3)

− Tourism recommendation

− Geo-info discovery

− User interest mining

Our Projects

• Project 1: Geographical & Semantic Annotation

• Project 2: Tourism Recommendation

• Project 3: Geographical Topics in Social Media

Geographical & Semantic Annotation

Example user tags: 2006, clouds, sc, d50, mywinners, nikon, pond, reflections, sun, september

• User-provided annotations are usually limited and noisy.

– Labels are often ambiguous or irrelevant.

– Only a small fraction of photos are geo-tagged.

• Is it possible to refine and enrich these annotations?

– Large-scale visual recognition can help.

– We combine visual features and tag features in the classifiers.


Annotation questions:

– What exists in the image?

– Where was the image taken?


• Annotation cues lie in different features.

• We train a model on Flickr data to annotate images automatically.

Example:

– Original tags: 2006, clouds, sc, d50, mywinners, nikon, pond, reflections, sun, september

– New annotation: nature, sky, water

Combining Visual Features and Noisy Labels

• There are multiple features for online images

– Visual features: color, shape, GIST…

– Noisy annotations

• We explore the canonical correlations between multiple features and use them to enrich the annotations.

Canonical Correlation Analysis

Let x and y represent two feature vectors. CCA looks for projection vectors a and b such that the optimal a, b maximize the correlation between x and y in the projected subspaces.

It is easy to show that the solution can be found by solving a generalized eigenvalue decomposition problem.
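For reference, the standard textbook form of CCA can be written as follows. The covariance notation $C_{xx}$, $C_{yy}$, $C_{xy}$ (with $C_{yx} = C_{xy}^\top$) and the symbol $\rho$ for the canonical correlation are assumptions of this sketch and may differ from the notation used in the talk.

```latex
% Projections of the two feature vectors
u = a^\top x, \qquad v = b^\top y

% Optimal directions maximize the correlation of the projections
(a^*, b^*) = \arg\max_{a,\,b}\;
  \frac{a^\top C_{xy}\, b}{\sqrt{a^\top C_{xx}\, a}\;\sqrt{b^\top C_{yy}\, b}}

% Equivalent generalized eigenvalue problem
C_{xy}\, C_{yy}^{-1}\, C_{yx}\, a = \rho^2\, C_{xx}\, a,
\qquad b \propto C_{yy}^{-1}\, C_{yx}\, a
```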


CCA for a Toy Example

Neither of the two dimensions in the original space characterizes the linear correlation. However, after projecting the data into the canonical space, we can see the linear correlation clearly.
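A toy example of this kind can be reproduced numerically. The sketch below is an illustration only: the synthetic data, the helper names (`cov`, `corr`, `first_canonical_corr`) and the closed-form 2×2 solver are assumptions, not taken from the talk. A shared signal `z` is hidden under a strong nuisance component in each view, so a single raw dimension correlates only weakly across views, while the first canonical pair correlates strongly.

```python
import math
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def corr(u, v):
    """Pearson correlation of two sequences."""
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

def cov_matrix(A, B):
    """Cross-covariance matrix between two lists of columns."""
    return [[cov(a, b) for b in B] for a in A]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def first_canonical_corr(X, Y):
    """First canonical correlation and x-side direction for two 2-D views.

    Solves Cxx^-1 Cxy Cyy^-1 Cyx a = rho^2 a in closed form (2x2 case only).
    """
    Cxx, Cyy, Cxy = cov_matrix(X, X), cov_matrix(Y, Y), cov_matrix(X, Y)
    Cyx = [list(row) for row in zip(*Cxy)]
    M = matmul2(matmul2(inv2(Cxx), Cxy), matmul2(inv2(Cyy), Cyx))
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))  # largest eigenvalue
    a = (M[0][1], lam - M[0][0])  # eigenvector, assuming M[0][1] != 0
    return math.sqrt(lam), a

rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]  # shared signal
s = [rng.gauss(0, 3) for _ in range(n)]  # nuisance in view x
t = [rng.gauss(0, 3) for _ in range(n)]  # nuisance in view y

def noisy(values):
    """Add small independent measurement noise."""
    return [v + rng.gauss(0, 0.2) for v in values]

# Each view is stored as a list of two columns.
X = [noisy([si + zi for si, zi in zip(s, z)]), noisy([si - zi for si, zi in zip(s, z)])]
Y = [noisy([ti + zi for ti, zi in zip(t, z)]), noisy([ti - zi for ti, zi in zip(t, z)])]

rho, a = first_canonical_corr(X, Y)
print("raw correlation of first dimensions:", corr(X[0], Y[0]))
print("first canonical correlation:", rho, "direction:", a)
```

The closed-form eigenvalue step only works for two-dimensional views; for real feature vectors a linear algebra library would be the natural choice.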


Logistic Canonical Correlation Regression

• Given multiple features, we can compute the canonical correlation between each feature and a given label.

• To combine the clues from multiple features, we employ the logistic canonical correlation regression (LCCR) model, which is fitted by maximizing the likelihood.

• The estimated function depends on the correlation between the label and the m-th feature and on the parameters of the logistic model.
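One way to read the model above is as a logistic regression whose inputs are per-feature correlation scores. The sketch below illustrates that reading only: the function names, the synthetic scores and the plain gradient-ascent fit are assumptions, and the exact LCCR formulation in the ACMMM 2009 paper may differ.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(w, c):
    """P(label = 1 | correlation scores c), with w[0] as the bias term."""
    return sigmoid(w[0] + sum(wm * cm for wm, cm in zip(w[1:], c)))

def fit_logistic(scores, labels, lr=0.5, epochs=300):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    n, m = len(scores), len(scores[0])
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for c, y in zip(scores, labels):
            err = y - predict(w, c)
            grad[0] += err
            for j, cj in enumerate(c):
                grad[j + 1] += err * cj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

rng = random.Random(0)
labels = [i % 2 for i in range(400)]
# Hypothetical correlation scores for M = 3 features; the third is uninformative.
scores = [[rng.gauss(0.5 if y else -0.5, 0.3),
           rng.gauss(0.2 if y else -0.2, 0.3),
           rng.gauss(0.0, 0.3)] for y in labels]

w = fit_logistic(scores, labels)
accuracy = sum((predict(w, c) > 0.5) == bool(y)
               for c, y in zip(scores, labels)) / len(labels)
print("weights:", w)
print("training accuracy:", accuracy)
```

In this reading, a feature whose correlation score carries little information about the label ends up with a small weight, so the combination leans on the more informative features.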


Dataset

• We collect 380,573 images with tags and GPS records from Flickr.

• The number of tags for each image

– varies from zero to over ten

– the average number is 4.96 tags per image.

• The GPS location:

– The geographic scope is limited to North America (users in other areas may use different languages for tags).

Enhancing Semantic Annotation

• We employ 66 semantic concepts (tags) for semantic annotation: the most popular labels on Flickr.

• We train our LCCR model based on multiple features

– 6 visual features: LAB color histogram, GIST, tiny image, LAB color of tiny image, image projection in PCA and LDA spaces.

– Existing tag features: we remove the terms that are the same as labeling concepts in the process of both training and testing because they are the very annotation we are trying to predict.
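The tag-filtering step can be sketched as below. The function name and the three stand-in concepts are hypothetical; the talk uses 66 concepts that are not listed here.

```python
def strip_target_concepts(tags, concepts):
    """Drop tags that coincide with a concept being predicted (case-insensitive)."""
    targets = {c.lower() for c in concepts}
    return [t for t in tags if t.lower() not in targets]

# Stand-ins for the 66 labeling concepts (assumption for illustration).
concepts = {"nature", "sea", "bird"}
tags = ["canon", "water", "ocean", "nature", "gull"]

tag_features = strip_target_concepts(tags, concepts)
print(tag_features)
```

Applying the same filter at training and test time keeps the tag features from leaking the label that the model is asked to predict.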


More Examples

• Original tags: canon, water, ocean, wildlife, fish, 20d, seagull, feathery friday, camping, gull

– New annotation: bird, nature, sea

• Original tags: river, colors, sony, quebec, minolta, paysage, a100, automne

– New annotation: autumn, fall, landscape

• Original tags: nikon, red, green, usa, flower, purple, october, plants, texture, flora, illinois, natural, pattern, wallpaper

– New annotation: autumn, fall, flower, garden, nature

Evaluation: Geographical Annotation

Evaluation: Semantic Annotation

Our Projects

• Project 1: Geographical & Semantic Annotation

• Project 2: Tourism Recommendation

• Project 3: Geographical Topics in Social Media

Tourism Recommendation from Image Retrieval

[Figure: a query photo, similar photos retrieved from the indexed dataset, and the recommended places.]
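The retrieval idea on this slide, matching a query photo to similar indexed photos and recommending the places they come from, can be sketched as a nearest-neighbour vote. This is a minimal sketch under stated assumptions: the three-dimensional feature vectors, the place names and the function `recommend_places` are hypothetical, and the talk does not specify the features or the ranking rule.

```python
import math
from collections import Counter

def recommend_places(query, index, k=3):
    """Rank places by votes from the k indexed photos nearest to the query."""
    nearest = sorted(index, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(place for _, place in nearest)
    return [place for place, _ in votes.most_common()]

# Hypothetical index of (feature vector, place) pairs.
index = [
    ((0.9, 0.1, 0.0), "beach A"),
    ((0.8, 0.2, 0.1), "beach A"),
    ((0.7, 0.2, 0.2), "beach B"),
    ((0.1, 0.9, 0.3), "mountain C"),
    ((0.0, 0.8, 0.4), "mountain C"),
]

places = recommend_places((0.85, 0.15, 0.05), index)
print(places)
```

A linear scan is enough for a toy index; a collection of millions of photos would need an approximate nearest-neighbour index instead.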

Find Popular Attractions

Popular attractions are usually those with many photos.

(Different colors denote different attractions.)
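Since popular attractions are those with many photos, one simple way to surface them is to count geo-tagged photos per grid cell. The sketch below uses grid binning as a stand-in, because the talk does not say which clustering method produces the colored groups. The coordinates, the cell size and the function `popular_cells` are assumptions for illustration.

```python
from collections import Counter

def popular_cells(photos, cell_deg=0.01, top=3):
    """Count photos per lat/lon grid cell and return the busiest cells.

    Each result is ((cell_lat, cell_lon), photo_count).
    """
    counts = Counter(
        (round(lat / cell_deg), round(lon / cell_deg)) for lat, lon in photos
    )
    return [((i * cell_deg, j * cell_deg), n)
            for (i, j), n in counts.most_common(top)]

# Hypothetical (latitude, longitude) pairs of geo-tagged photos.
photos = [
    (48.8584, 2.2945), (48.8582, 2.2946), (48.8586, 2.2943), (48.8580, 2.2948),
    (48.8606, 2.3376), (48.8607, 2.3374),
    (48.8738, 2.2951),
]

cells = popular_cells(photos)
for center, count in cells:
    print(center, count)
```

A fixed grid splits attractions that straddle a cell boundary, which is one reason a density-based clustering method may be preferable in practice.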

Mining Top Tourist Routes in Big Cities

• London Eye → Big Ben → Downing Street → Horse Guards → Trafalgar Square

• Apple Store → St. Patrick's Cathedral → Rockefeller Center

• Eiffel Tower → Louvre → Notre-Dame
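Routes like these can be approximated by counting how often the same sequence of stops appears across visitors' time-ordered photo locations. The sketch below is plain frequency counting over contiguous sub-routes, not the diversified trajectory pattern ranking of the SDM 2011 paper. The trips and the function `frequent_routes` are hypothetical.

```python
from collections import Counter

def frequent_routes(trips, length=3, min_support=2):
    """Count contiguous sub-routes of a given length across trips."""
    counts = Counter()
    for trip in trips:
        # Collapse consecutive repeats (several photos taken at the same place).
        stops = [p for i, p in enumerate(trip) if i == 0 or p != trip[i - 1]]
        seen = {tuple(stops[i:i + length])
                for i in range(len(stops) - length + 1)}
        counts.update(seen)  # each trip supports a given route at most once
    return [(route, n) for route, n in counts.most_common() if n >= min_support]

# Hypothetical trips: each is one visitor's time-ordered list of photo places.
trips = [
    ["London Eye", "London Eye", "Big Ben", "Downing Street", "Horse Guards"],
    ["London Eye", "Big Ben", "Downing Street", "Trafalgar Square"],
    ["Big Ben", "Downing Street", "Horse Guards", "Trafalgar Square"],
]

routes = frequent_routes(trips)
for route, support in routes:
    print(" → ".join(route), support)
```

Raw frequency tends to return many overlapping variants of the same popular route, which is the kind of redundancy a diversified ranking is meant to reduce.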

Our Projects

• Project 1: Geographical & Semantic Annotation

• Project 2: Tourism Recommendation

• Project 3: Geographical Topics in Social Media

Topics over Geographical Regions [WWW ’11]

Input:

Output:

1. Geographic topics

2. Topics at a location

Motivations

• Goal:

– Analyze the cultural differences around the world

– Explore the hot topics or events in different places

– Compare the popularity of specific products in different regions

• Latent Geographical Topic Analysis

– The topics are generated from regions instead of documents

– If two words are close to each other in space, they are more likely to belong to the same region

– If two words are from the same region, they are more likely to be clustered into the same topic

Latent Geographical Topic Analysis

[Figure: model diagram. Components: region importance (N-d vector); region geo-information (location, shape); region-topic distributions {p(z|r)}; topic-word distributions {p(w|z)}.]
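One way to read these components is as a mixture over regions: a region is picked according to its importance, it generates the document location from a Gaussian, and it generates each word through its topic distribution p(z|r) and the word distributions p(w|z). The sketch below scores a document under that reading. The exact way the pieces combine is an assumption here and may differ from the WWW 2011 paper. The diagonal covariance, the toy parameters and the function names are also hypothetical.

```python
import math

def log_gaussian(loc, mean, var):
    """Log density of an axis-aligned Gaussian (diagonal covariance)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(loc, mean, var))

def doc_log_likelihood(words, loc, regions, p_w_given_z):
    """log p(w_d, l_d) = log sum_r p(r) p(l_d | r) prod_w sum_z p(z | r) p(w | z)."""
    terms = []
    for r in regions:
        lp = math.log(r["weight"]) + log_gaussian(loc, r["mean"], r["var"])
        for w in words:
            lp += math.log(sum(pz * p_w_given_z[z][w]
                               for z, pz in enumerate(r["p_z"])))
        terms.append(lp)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

# Two hypothetical regions, each dominated by a different topic.
regions = [
    {"weight": 0.5, "mean": (0.0, 0.0), "var": (1.0, 1.0), "p_z": [0.9, 0.1]},
    {"weight": 0.5, "mean": (10.0, 10.0), "var": (1.0, 1.0), "p_z": [0.1, 0.9]},
]
# p(w | z) for two topics over a two-word vocabulary.
p_w_given_z = [{"beach": 0.8, "snow": 0.2}, {"beach": 0.2, "snow": 0.8}]

near = doc_log_likelihood(["beach", "beach"], (0.1, -0.2), regions, p_w_given_z)
far = doc_log_likelihood(["beach", "beach"], (10.0, 10.0), regions, p_w_given_z)
print(near, far)
```

The same words score higher at a location whose region favours the matching topic, which is the coupling between space and topics that the bullets above describe.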

Location/Text Perplexity

$$\mathrm{perplexity}_{\text{location/text}}(D_{test}) = \exp\left\{ -\frac{\sum_{d \in D_{test}} \log p(w_d, l_d)}{\sum_{d \in D_{test}} N_d} \right\}$$
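Perplexity of this kind is the exponential of the negative log-likelihood of the test documents, normalized by their total length. A minimal sketch follows, assuming the per-document log-likelihoods log p(w_d, l_d) have already been computed by some model. The function name and the toy numbers are hypothetical.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d, l_d) / sum_d N_d) over a test set."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))

# Sanity check: a model that assigns probability 1/8 to every word
# should have perplexity 8, whatever the document lengths are.
lengths = [3, 5]
logliks = [n * math.log(1 / 8) for n in lengths]

ppl = perplexity(logliks, lengths)
print(ppl)
```

Lower values mean the model assigns higher probability to the held-out documents.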

Geographical Topic Comparison

• Food dataset

Topics over Geographical Regions

[Figure: one panel per topic: Italian food, Japanese food, Chinese food, Spanish food, Mexican food, French food. The redness represents the probability of each topic at a location.]

Distinguish Different Landscapes

[Figure panels: Beach, Desert, Mountains]

Acknowledgement To My Terrific Collaborators

Thomas Huang, Jiawei Han, Zhijun Yin, Jiebo Luo, Andrew Gallagher