Generating Summaries and Visualization for Large Collections of Geo-referenced Photographs
Alexander Jaffe*, Mor Naaman*, Tamir Tassa†, Marc Davis$
*Yahoo! Research Berkeley
†Open University of Israel
$Yahoo! Research
Generating Summaries - Mor Naaman 2
Attraction Map of Paris
Stanley Milgram, 1976. Psychological Maps of Paris
Attraction Map of London
Jaffe et al, 2006.
Information Overload?
Flickr “geotagged”
Overview
• Problem definition
• Intuition for solution
• Algorithm for summarization
• Visualizing the dataset
• Evaluation
• Demo?
Problem Definition
• Dataset: (photo_id, user_id, latitude, longitude) and (photo_id, tag)
• Result: (photo_id, rank)
Given all photos from a geographic region, find a “representative” summary set
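The two input relations can be joined into one record per photo. The following is a minimal sketch, assuming in-memory tuples; the `Photo` class, the `build_photos` helper, and the sample rows are hypothetical and only illustrate the data shape.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    photo_id: int
    user_id: str
    latitude: float
    longitude: float
    tags: set = field(default_factory=set)


def build_photos(locations, tag_rows):
    """Join (photo_id, user_id, lat, lon) rows with (photo_id, tag) rows."""
    photos = {pid: Photo(pid, uid, lat, lon) for pid, uid, lat, lon in locations}
    for pid, tag in tag_rows:
        if pid in photos:  # skip tags whose photo is not in the region
            photos[pid].tags.add(tag)
    return photos


# Hypothetical sample rows, not taken from the dataset.
locations = [(1, "u1", 37.8199, -122.4783), (2, "u2", 37.7952, -122.4028)]
tag_rows = [(1, "golden gate bridge"), (1, "california"), (2, "transamerica")]
photos = build_photos(locations, tag_rows)
```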
Issues to Tackle
• Noisy data
Whatever, color, city, spectrum, santa barbara, california, usa, Lookatme, Herbert Bayer Chromatic Gate
• Photographer biases
  – In locations
  – In tags
• Wrong data
Intuition
More “activity” in a certain location indicates importance of that location
Tags that are unique to a certain location can suggest importance of that location
(Very) Simple Example
Algorithm Overview
1. Hierarchical Clustering of the location data
2. For each cluster, generate cluster score
3. Recursively generate ordering of all photos in each cluster, based on subcluster score and ordering
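Step 3 can be sketched as a recursive merge over the cluster tree. The slides do not give the exact merge rule, so this sketch assumes a simple one: repeatedly take the next photo from the subcluster with the highest score, discounted by how many photos it has already contributed. The `Cluster` class and `rank` function are hypothetical names, and the scores are assumed to be already computed.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    score: float
    photos: list = field(default_factory=list)    # leaf photos, already ordered
    children: list = field(default_factory=list)  # subclusters


def rank(cluster):
    """Return an ordering of every photo under `cluster`."""
    if not cluster.children:
        return list(cluster.photos)
    # Order each subcluster first, then interleave the results.
    queues = [(child.score, rank(child)) for child in cluster.children]
    taken = [0] * len(queues)
    ordering = []
    while True:
        live = [i for i, (_, q) in enumerate(queues) if taken[i] < len(q)]
        if not live:
            break
        # Assumed rule: prefer high scores, discounting subclusters that
        # were already sampled; ties go to the less-sampled subcluster.
        best = max(live, key=lambda i: (queues[i][0] / (taken[i] + 1), -taken[i]))
        ordering.append(queues[best][1][taken[best]])
        taken[best] += 1
    return ordering


# Hypothetical two-subcluster example.
root = Cluster(score=0.0, children=[
    Cluster(score=20.0, photos=["a1", "a2", "a3"]),
    Cluster(score=10.0, photos=["b1", "b2"]),
])
ranking = rank(root)
```

Under this assumed rule the higher-scoring subcluster leads, but the lower-scoring one still appears early, so a short prefix of the ranking covers both locations.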
The Clustered Return of the (Very) Simple Example!
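[Figure: two subclusters with ordered photos (4, 6, 5) and (8, 7), cluster scores 20 and 10, and the merged ordering 4, 8, 6, 5, 7]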
Generating a Summary
• A complete ranking is produced for all photos in the dataset
• An n-photo summary is simply the first n photos in this ranking.
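Because the summary is a prefix of the ranking, producing one is a single slice. A minimal sketch, assuming the ranking is a list of photo ids in rank order; `summarize` and `as_result_rows` are hypothetical helper names.

```python
def summarize(ranking, n):
    """Return the n-photo summary: the first n photos of the full ranking."""
    return ranking[:n]


def as_result_rows(ranking):
    """Emit (photo_id, rank) rows, with rank starting at 1."""
    return [(photo_id, rank) for rank, photo_id in enumerate(ranking, start=1)]


# Hypothetical ranking of five photo ids.
ranking = [4, 8, 6, 5, 7]
top3 = summarize(ranking, 3)
rows = as_result_rows(ranking)
```

One consequence of this design is that summaries are nested: the n-photo summary is always contained in the (n+1)-photo summary.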
Generating Cluster Scores
• Main Factors:
  – Number of photos
  – Relevance (bias) factors
  – “Tag Distinguishability”
  – “Photographer Distinguishability”
Tag Distinguishability
• A measure of uniqueness of concepts represented in the cluster (“document”)
• TF/IDF based
  – Compute frequency of each tag (TF)
  – Compute (inverse) frequency of tag in the rest of the dataset (IDF)
  – Aggregate TF/IDF over all tags in cluster using L2 norm
• Or, if you like formulas:
Read the damn paper!
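For readers who prefer code to formulas, the three TF/IDF steps can be sketched as follows. The exact weighting is in the paper; this sketch assumes raw counts for TF and `1 / (1 + occurrences outside the cluster)` for IDF, which is an illustrative choice and not necessarily the authors' formula. The function name `tag_distinguishability` and the sample tags are hypothetical.

```python
import math
from collections import Counter


def tag_distinguishability(cluster_tags, all_tags):
    """L2 norm of per-tag TF*IDF values for one cluster.

    cluster_tags: tags used inside the cluster, one entry per use.
    all_tags: tags used in the whole dataset (including the cluster).
    """
    tf = Counter(cluster_tags)
    dataset = Counter(all_tags)
    total = 0.0
    for tag, count in tf.items():
        rest = dataset[tag] - count  # uses of the tag outside this cluster
        idf = 1.0 / (1 + rest)       # assumed smoothing, see lead-in
        total += (count * idf) ** 2
    return math.sqrt(total)


# Hypothetical data: "bridge" is local to the cluster, "california" is not.
cluster_tags = ["bridge", "bridge", "california"]
all_tags = cluster_tags + ["california"] * 9 + ["museum"]
score = tag_distinguishability(cluster_tags, all_tags)
generic = tag_distinguishability(["california"] * 3, ["california"] * 10)
```

A cluster dominated by a tag that rarely appears elsewhere scores higher than a cluster whose tags are common across the dataset.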
Summary of San Francisco
[Figure: summary photos labeled Golden Gate Bridge, TransAmerica, AT&T Baseball Park, Golden Gate, Twin Peaks, Golden Gate, Bay Bridge, Ocean Beach, Chinatown]
Progress Bar (almost done)
• Problem definition
• Intuition for solution
• Algorithm for summarization
• Visualizing the dataset
• Evaluation
• Demo?
Tag Maps
• Observation:
  – The algorithm identifies “representative” locations
  – The algorithm identifies unique, important tags
Can be used to visualize the dataset!
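One way to turn these two observations into map labels is sketched below. It is an assumption-laden illustration, not the deployed Tag Maps logic: each cluster is labeled with the tag most concentrated in it, placed at the cluster centroid, and sized in proportion to the cluster score. The dict layout, the `tag_map_labels` name, the tag-selection weight, and the font-size range are all hypothetical.

```python
from collections import Counter


def tag_map_labels(clusters, min_size=10, max_size=36):
    """clusters: dicts with 'points' [(lat, lon)], 'tags' [str], 'score' (> 0)."""
    dataset = Counter(t for c in clusters for t in c["tags"])
    top_score = max(c["score"] for c in clusters)
    labels = []
    for c in clusters:
        local = Counter(c["tags"])
        # Assumed weight: local count times the share of the tag's uses
        # that fall in this cluster; the tag name breaks ties.
        tag = max(local, key=lambda t: (local[t] ** 2 / dataset[t], t))
        lat = sum(p[0] for p in c["points"]) / len(c["points"])
        lon = sum(p[1] for p in c["points"]) / len(c["points"])
        size = min_size + (max_size - min_size) * c["score"] / top_score
        labels.append({"tag": tag, "lat": lat, "lon": lon, "font_size": round(size)})
    return labels


# Hypothetical clusters.
clusters = [
    {"points": [(0.0, 0.0), (2.0, 2.0)], "tags": ["bridge", "bridge", "sf"], "score": 20.0},
    {"points": [(10.0, 10.0)], "tags": ["park", "park", "sf"], "score": 10.0},
]
labels = tag_map_labels(clusters)
```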
Ok, how do we evaluate this?
• Direct human-evaluation of algorithmic results
  – Evaluated Tag Maps with various weighting options
  – Compared summaries to 3 base conditions
• Compared chosen locations to top 15 locations selected by humans (Milgram-style)
Maybe we have time for a demo
Maybe we have time for Q’s
http://zonetag.research.yahoo.com (applied in prototype cameraphone app)
http://blog.yahooresearchberkeley.com (more on this and other topics)