Image Management Using Automatic Recognition Systems
Bongwon Suh
Computer Science Department
Human Computer Interaction Laboratory
University of Maryland at College Park
Dec. 6th, 2004 Palo Alto Research Center - Bongwon Suh 2
Overview
Image Management Problems
  Thumbnail Presentation
  Lack of Metadata
Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation
Motivation
Managing personal documents
  An ever-increasing amount of personal documents
  Sometimes it is easier to find information on the Web than on your local hard disk.
Tools
  Microsoft Longhorn, Apple OS X Tiger
  Google Desktop Search
But what about photos?
Image Management Problems
Document management
  Indexing, organizing, searching, browsing, sharing, and so on
Extend conventional information management principles to image management
  E.g., using image captions and annotated keywords
Two additional challenges for image management
  Thumbnail presentation
  Metadata acquisition
Thumbnails
Bigger thumbnails use more screen space; smaller thumbnails are hard to recognize
Two views for the same directory
Detail View Mode Preview Mode
Lack of Metadata
Metadata
  Pieces of information associated with photos
Digital image
  A stream of color pixels
  Hard to extract useful metadata
When a cat sits quietly, staring off at something, and the tail twitches slowly, the
cat is concentrating on something. If a cat is lashing his tail back and forth quickly, it means
he is annoyed and angry. This is when a cat is likely to bite or scratch.
Extracting “cat”?
Overview
Image Management Problems
  Thumbnail Presentation
  Lack of Metadata
Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation
Zoomable User Interface
2.5-Dimension Environment
  2D + depth
  Zooming and panning for navigation
  ZUIs depend on humans’ ability to remember where things are in space.
PhotoMesa
  Zoomable image browser
  Uses the Treemap algorithm to lay out photos
  Capable of showing thousands of images on the screen
  Commercialized: http://www.photomesa.com
  [Bederson UIST 2001]
Overview
Image Management Problems
  Thumbnail Presentation
  Lack of Metadata
Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation
Thumbnail Cropping
Fit images into limited screen space
Image Shrinking (Plain Thumbnail)
  We lose detailed information
Image Cropping
  We lose part of the information
  Remove the periphery and show the core objects bigger on the screen
  Select the portion of maximal informativeness
  Preserve the recognizability of important objects in thumbnails
Thumbnail Cropping Example
Original Image
Cropped Image (periphery removed)
Generated Thumbnails
Shrinking (subsampling)
Crop first, then shrink the cropped images
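The benefit of cropping before shrinking can be put in numbers. The sketch below is only an illustration under assumed sizes: the function names and the image, crop, and thumbnail dimensions are hypothetical and do not come from the slides.

```python
def thumbnail_scale(width, height, box):
    """Scale factor that fits a width x height image inside a box x box thumbnail."""
    return min(box / width, box / height)


def crop_then_shrink_gain(image_size, crop_size, box):
    """How many times larger objects appear when the image is cropped before shrinking.

    image_size and crop_size are (width, height) tuples; all values are
    illustrative assumptions, not figures from the talk.
    """
    return thumbnail_scale(*crop_size, box) / thumbnail_scale(*image_size, box)


# A 1600x1200 photo whose core object fits in an 800x600 crop, shown at 128x128.
gain = crop_then_shrink_gain((1600, 1200), (800, 600), 128)
print(f"objects appear {gain:.1f}x larger in the cropped thumbnail")
```

Both thumbnails occupy the same screen space; the cropped one simply spends its pixels on the core object instead of the periphery.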
More Examples
Plain Thumbnails Cropped Thumbnails
Both sets use the same amount of screen space
Automatic Thumbnail Cropping
Which part is more informative?
  Need to measure informativeness
Saliency based thumbnail cropping
  Improve cropping by using a dynamic threshold
Face detection based thumbnail cropping
  Applying existing techniques as an example of detecting semantic information in images
Saliency Based Thumbnail Cropping
Saliency
  Visual attention model (color, intensity, and orientation)
  Itti and Koch (1998, 1999)
  Does not need prior knowledge of the images
Assumption
  More saliency implies more informativeness
[Figure: original image and computed saliency map]
Saliency Based Cropping
Find a minimum-size rectangle that contains a certain portion (threshold) of the total saliency
Static threshold algorithm
Brute force algorithm
  Requires exhaustive search
Greedy algorithm
  Keeps increasing the cropping bounds
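A greedy search of this kind can be sketched as follows. This is one possible reading of "keep increasing cropping bounds", not the author's implementation: the function name, the one-row-or-column growth rule, and the list-of-lists saliency map are all assumptions made for illustration.

```python
def greedy_crop(saliency, threshold=0.9):
    """Return (top, left, bottom, right), inclusive, of a rectangle holding
    at least `threshold` of the total saliency (hypothetical helper)."""
    rows, cols = len(saliency), len(saliency[0])
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        return (0, 0, rows - 1, cols - 1)  # nothing salient: keep the whole image

    # Seed the rectangle at the single most salient pixel.
    top, left = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )
    bottom, right = top, left
    inside = saliency[top][left]

    while inside < threshold * total:
        # Saliency gained by growing one row or column on each side.
        gains = {}
        if top > 0:
            gains["up"] = sum(saliency[top - 1][left:right + 1])
        if bottom < rows - 1:
            gains["down"] = sum(saliency[bottom + 1][left:right + 1])
        if left > 0:
            gains["left"] = sum(saliency[r][left - 1] for r in range(top, bottom + 1))
        if right < cols - 1:
            gains["right"] = sum(saliency[r][right + 1] for r in range(top, bottom + 1))
        if not gains:
            break  # the rectangle already covers the whole image
        side = max(gains, key=gains.get)
        inside += gains[side]
        if side == "up":
            top -= 1
        elif side == "down":
            bottom += 1
        elif side == "left":
            left -= 1
        else:
            right += 1
    return (top, left, bottom, right)


demo = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(greedy_crop(demo, threshold=0.9))
```

Unlike the brute force search, a greedy rule like this does not guarantee the minimum-size rectangle; it trades optimality for far fewer candidate rectangles.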
Dynamic Saliency Threshold
The most effective threshold varies from image to image
Scattered saliency
  Only a little can be cut out
Gathered saliency
  Close cutting is possible
[Figure: example images with scattered saliency and with gathered saliency]
Area-Saliency Sum Graph
Compute an optimal cropping rectangle for each saliency threshold
[Figure: two original images with their area-saliency sum graphs; axes: Cropping Area and Sum of Saliency Values inside Area. Marked points: 70% of the saliency contained inside 30% of the area (0.7, 0.3); 80% of the saliency contained inside 50% of the area (0.8, 0.5)]
Cropping With Dynamic Threshold
Find a point of diminishing returns
  Adding small amounts of saliency requires a large increase of the cropping bounds
Binary search for the maximum gradient point
[Figure: Area-Saliency Sum Graph; axes: Cropping Area and Sum of Saliency Values inside Area; the maximum gradient point is marked]
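The idea of a point of diminishing returns can be illustrated on a sampled area-saliency curve. The sketch below uses a plain linear scan for the steepest segment rather than the binary search named on the slide, whose details are not given here; the function name and the sample curve are assumptions for illustration.

```python
def steepest_rise_threshold(curve):
    """Pick a saliency threshold from a sampled area-saliency curve.

    curve: list of (saliency_fraction, area_fraction) pairs sorted by
    saliency_fraction. Returns the saliency fraction at the start of the
    segment where the required area grows fastest, i.e. the last threshold
    before extra saliency becomes expensive. Hypothetical helper.
    """
    best_slope, best_threshold = float("-inf"), curve[0][0]
    for (s0, a0), (s1, a1) in zip(curve, curve[1:]):
        slope = (a1 - a0) / (s1 - s0)  # extra area needed per unit of saliency
        if slope > best_slope:
            best_slope, best_threshold = slope, s0
    return best_threshold


# Illustrative curve: 70% of saliency fits in 30% of the area, 80% in 50%,
# but going from 80% to 90% needs almost the whole image.
sample_curve = [(0.6, 0.2), (0.7, 0.3), (0.8, 0.5), (0.9, 0.9), (1.0, 1.0)]
print(steepest_rise_threshold(sample_curve))
```

For images with gathered saliency the steep rise comes late and the crop is tight; for scattered saliency it comes early and less is cut away.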
Static And Dynamic Threshold
[Figure: Original Image, Static Threshold (90%), and Dynamic Threshold results, each with its Area-Saliency Sum Graph; 0.9 and the maximum gradient point are marked on the graphs. Annotations: "Cutting out too little", "Cutting out too much"]
Face Detection Based Thumbnail Cropping
When semantic information of images can be detected, more efficient cropping is possible
Face Detection: Schneiderman and Kanade (2000)
[Figure: Original Image, Face Detection, Face Detection Based Cropping, Face Detection Based Thumbnail, Plain Thumbnail]
User Study Design
Participants
  Twenty students recruited on campus
Tasks
  Recognition Task
  Visual Search Task
Image Sets
  Animal Set: Common objects
  Corbis Set: Professionally prepared photos
  Face Set: Well-known figures (e.g., entertainers)
Thumbnail Techniques
  No cropping
  Saliency based cropping
  Face detection based cropping
Recognition Task
To measure the effect of thumbnail techniques on object recognition
Target thumbnails were shown for two seconds
Participants were asked to click what they saw.
Measured recognition accuracy: # of right answers / # of total tasks
Face Set Animal Set
Recognition Task Hypothesis
[Figure: hypothesized Recognition Rate (up to 100%) against Thumbnail Size for Thumbnail Technique A and Thumbnail Technique B. Annotations: "Thumbnails are too small anyway"; "Effect on Recognition Rate? Meaningful"; "Big enough to be recognized in both cases"]
Recognition Task Result
All curves are different from each other (p < 0.01).
[Figure: result curves for the Animal Set and the Face Set; legend: Face Detection Based Cropped, Saliency Based Cropped, No Cropping]
Visual Search Task
Find an image that matches a given task description
  Verbal description (except faces)
PhotoMesa interface
Measured browsing completion time
3x3 within-subject ANOVA
  Three thumbnail techniques
  Three image sets
Visual Search Task Result
No cropping vs. saliency based cropping
  18%, 24%, and 23% faster, respectively
  F(1, 190) = 3.82, p = 0.05
Three thumbnail techniques on the Face Set
  Visual search with face detection based thumbnails is 50% faster
  F(2, 87) = 4.56, p = 0.013
[Figure: chart of Browsing Time (sec.)]
Overview
Image Management Problems
  Thumbnail Presentation
  Lack of Metadata
Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation
Acquiring Metadata
From Devices
  Basic information such as date, image size, and so on
  Adding a GPS unit to a digital camera
From Context
  Image from a web page: use captions, surrounding text
Image Analysis
  Color, texture, face, and so on
Manual Annotation
  Most reliable, accurate, and relevant
  Slow, tedious
Automatic Extraction
  Inaccurate, irrelevant
Semi-Automatic Annotation
Incremental and interactive annotation
Appropriate user interfaces are important
[Diagram: Photos ready to be annotated; Automatic Metadata Extraction Manager; Automatic Suggestion with Available Knowledge; Semi-Automatic Annotation Interface (Browse, Search, Annotate); Relevance Feedback (Fix Errors); Update Knowledge]
Relevant Metadata
Chronological order
  Last Halloween
Event information
  Birthday party, camping trip
  Often associated with location
People in photos
[Rodden, CHI2002]
Semi-Automatic PHoto Annotation and Recognition Interface (SAPHARI)
Browse, Search, and Annotate
Image clustering
  Facilitate bulk annotation
  Using available metadata for clustering
    Event grouping
    Face grouping
Treemap layout
Zoomable user interface
Event Based Annotation
Personal photo collection
  Bursty or episodic
Using pause as event boundary
  Event gap detection
  How large is the current temporal gap (compared to neighbors)
\[ \log(t_{i+1} - t_i) \ge K + \frac{1}{2d+1} \sum_{j=-d}^{d} \log(t_{i+j+1} - t_{i+j}) \]
[Platt, 2001]
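The gap test can be sketched in a few lines: a gap between consecutive photos marks an event boundary when its log is at least K above the average log gap of its 2d+1 neighbors. The sketch below is a minimal illustration, not the cited implementation: the function name, the default values of K and d, the clamping of tiny gaps, and the truncated window at the ends of the collection are all assumptions.

```python
import math


def event_boundaries(timestamps, K=math.log(17), d=10):
    """Return indices i such that a new event starts after photo i.

    timestamps: capture times in seconds, sorted ascending.
    K, d: tunable parameters; the defaults here are illustrative assumptions.
    """
    # Gaps between consecutive photos, clamped to 1 second to avoid log(0).
    gaps = [max(b - a, 1.0) for a, b in zip(timestamps, timestamps[1:])]
    log_gaps = [math.log(g) for g in gaps]

    boundaries = []
    for i, log_gap in enumerate(log_gaps):
        # Neighborhood of up to 2d+1 gaps centered on gap i; near the ends the
        # window is truncated and averaged over the gaps that exist.
        window = log_gaps[max(0, i - d):i + d + 1]
        local_mean = sum(window) / len(window)
        if log_gap >= K + local_mean:
            boundaries.append(i)
    return boundaries


# Five photos one minute apart, a one-day pause, then five more photos.
day = 24 * 60 * 60
times = [0, 60, 120, 180, 240, day + 240, day + 300, day + 360, day + 420, day + 480]
print(event_boundaries(times))
```

Comparing in the log domain keeps the test relative: a one-hour pause stands out in a burst of shots taken seconds apart but not in a collection with one photo per day.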
Event Hierarchy
Require different levels of granularity
Coarse grouping
  Summer Camping Trip, June 13th to June 17th
  William’s Birthday, June 23rd
Fine grouping
  Hiking, June 14th
  Canoeing, June 15th
  Santa Cruz, June 16th
  Party at Kinder, 3pm to 4pm
  Family Dinner, 6pm to 8pm
Event Group Example
Provide multiple views of the same photo collection
  Coarse grouping
  Fine grouping
  By month
  By directory
Fixing event group boundaries
  Automatically update event group boundaries at other levels of the event hierarchy
Clothing Based Annotation
Face recognition
  Not applicable to personal photos
  Less than 50% accuracy (even with state-of-the-art systems)
Face detection
  Identifies the location of a face
  Higher accuracy than face recognition
People usually don’t change clothing during a day
  Use clothing information instead of facial information
Human Model
Find the upper body (torso) of a human by using a face detection technique
  Viola-Jones face detector
Convert the torso into a mathematical model
  4D samples (relative distance from the neck, red, green, blue)
  Gaussian sampling (more weight on the center line)
  Build a 4D histogram with the samples
[Figure: detected face, neck, and torso; sampling with more weight on the center axis]
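The torso model can be illustrated with a small histogram builder. This is a sketch under stated assumptions, not the author's code: it weights every pixel by a Gaussian of its horizontal offset from the center line instead of drawing samples from a Gaussian, and the function name, bin count, sigma, and coordinate conventions are all hypothetical.

```python
import math
from collections import defaultdict


def torso_histogram(pixels, bins=4, sigma=0.25):
    """Build a normalized 4D histogram over (distance from neck, r, g, b).

    pixels: iterable of (x_off, y_rel, r, g, b) where
      x_off is the horizontal offset from the torso center line in [-1, 1],
      y_rel is the relative distance below the neck in [0, 1],
      r, g, b are integers in 0..255.
    Pixels near the center line get more weight (assumed Gaussian weighting).
    """
    hist = defaultdict(float)
    for x_off, y_rel, r, g, b in pixels:
        weight = math.exp(-0.5 * (x_off / sigma) ** 2)
        key = (
            min(int(y_rel * bins), bins - 1),  # distance-from-neck bin
            r * bins // 256,                   # red bin
            g * bins // 256,                   # green bin
            b * bins // 256,                   # blue bin
        )
        hist[key] += weight
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()} if total else {}


# Two red pixels on the center line and one blue pixel near the torso edge.
sample_pixels = [(0.0, 0.1, 255, 0, 0), (0.0, 0.1, 250, 5, 5), (0.9, 0.1, 0, 0, 255)]
model = torso_histogram(sample_pixels)
print(model)
```

Down-weighting the edges reduces the influence of background pixels and arms that fall inside the estimated torso box.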
Compute Visual Distance
[Diagram: color pixels sampled from the identified clothing as (y-distance, r, g, b) are converted into a model f(X), a four-dimensional probability density function (pdf) of frequency; f(X) is compared with the pre-identified models gM1(X) ... gMn(X) using the Bhattacharyya distance]
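The comparison step can be sketched with the standard Bhattacharyya distance between two discrete distributions, D = -ln(sum over bins of sqrt(p_i * q_i)). The sketch assumes histograms stored as dictionaries from bin key to probability; the function names, the dictionary representation, and the example labels are illustrative, and the talk may use a different variant of the distance.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalized histograms.

    p, q: dicts mapping a bin key to its probability mass.
    Returns 0.0 for identical histograms and infinity when they share no bins.
    """
    coefficient = sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())
    if coefficient <= 0:
        return math.inf
    return max(0.0, -math.log(coefficient))  # clamp tiny negative rounding errors


def closest_model(sample, models):
    """Name of the pre-identified model nearest to the sampled clothing."""
    return min(models, key=lambda name: bhattacharyya_distance(sample, models[name]))


# Hypothetical pre-identified clothing models and a new sample.
models = {
    "alice": {"red": 0.9, "blue": 0.1},
    "bob": {"green": 1.0},
}
sample = {"red": 1.0}
print(closest_model(sample, models))
```

In a semi-automatic interface the nearest model would only be offered as a suggestion, leaving the user to confirm or correct it.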
User Study Results
Semi-controlled user study
  Seven participants, using their own photo collections
Event annotation
  Event based group vs. user’s own directory
  55% faster, statistically significant
  Valid grouping + zoomable user interface
Face annotation
  Clothing based group vs. manual
  15% faster (not significant)
  Unanimously preferred: F(2, 18) = 21.1, p < 0.01
Clothing Based Group
As the number of faces gets larger, the semi-automatic annotation becomes more efficient.
[Figure: two plots, "Semi-Automatic Face Annotation (Clothing Based)" and "Manual Face Annotation"; x-axis: Number of Annotated Faces (0 to 100); y-axis: Time Per Annotation (sec.) (0 to 7)]
Conclusion
PhotoMesa
  Scale up the size of the image collection that users can comfortably browse
Automatic Thumbnail Cropping
  Create better thumbnails that can fit into limited screen space
Semi-Automatic Photo Annotation
  Help users make accurate annotations with less effort
Research On Other Topics
Popout Prism
  “Overview+Detail” Web Browser
  Apply visual perception principles
  CHI 2002
OZONE
  Zoomable Ontology Browser
  DAML, the Semantic Web
  AVI 2002
Thank You
Benjamin B. Bederson
David W. Jacobs
Haibin Ling
Catherine Plaisant
http://www.cs.umd.edu/~sbw [email protected]