Image Management Using Automatic Recognition Systems

Bongwon Suh

Computer Science Department

Human Computer Interaction Laboratory

University of Maryland at College Park

[email protected]

Dec. 6th, 2004, Palo Alto Research Center

Overview

Image Management Problems
  Thumbnail Presentation
  Lack of Metadata
Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation


Motivation

Managing personal documents
  An ever-increasing amount of personal documents
  Sometimes it is easier to find information on the Web than on your local hard disk.
Tools
  Microsoft Longhorn, Apple OS X Tiger
  Google Desktop Search
But what about photos?


Image Management Problems

Document management
  Indexing, organizing, searching, browsing, sharing, and so on
  Extend conventional information management principles to image management
  E.g., using image captions and annotated keywords
Two additional challenges for image management
  Thumbnail presentation
  Metadata acquisition


Thumbnails

Bigger thumbnails use more screen space; smaller thumbnails are harder to recognize.

[Figure: two views of the same directory, Detail View Mode and Preview Mode]


Lack of Metadata

Metadata
  Pieces of information associated with photos
Digital image
  A stream of color pixels
  Hard to extract useful metadata

When a cat sits quietly, staring off at something, and the tail twitches slowly, the cat is concentrating on something. If a cat is lashing his tail back and forth quickly, it means he is annoyed and angry. This is when a cat is likely to bite or scratch.

Extracting “cat”?


Overview


Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation


Zoomable User Interface

2.5-dimensional environment
  2D + depth
  Zooming and panning for navigation
  ZUIs depend on humans’ ability to remember where things are in space.
PhotoMesa
  Zoomable image browser
  Uses a treemap algorithm to lay out photos
  Capable of showing thousands of images on the screen
  Commercialized: http://www.photomesa.com
  [Bederson UIST 2001]


Overview


Zoomable User Interfaces
Automatic Thumbnail Cropping
Semi-Automatic Photo Annotation


Thumbnail Cropping

Fit images into limited screen space
  Image shrinking (plain thumbnail)
    We lose detailed information
  Image cropping
    We lose part of the information
Remove the periphery and show the core objects bigger on the screen
  Select the portion of maximal informativeness
  Preserve the recognizability of important objects in thumbnails


Thumbnail Cropping Example

[Figure: an original image and a cropped image (periphery removed), each with its generated thumbnail. Plain thumbnail: shrinking (subsampling). Cropped thumbnail: crop first, then shrink the cropped image.]


More Examples

[Figure: plain thumbnails and cropped thumbnails]

Both sets use the same amount of screen space.


Automatic Thumbnail Cropping

Which part is more informative?
  Need to measure informativeness
Saliency based thumbnail cropping
  Improve cropping by using a dynamic threshold
Face detection based thumbnail cropping
  Applying existing techniques as an example of detecting semantic information in images


Saliency Based Thumbnail Cropping

Saliency
  Visual attention model (color, intensity, and orientation)
  Itti and Koch (1998, 1999)
  Does not need prior knowledge of the images
Assumption
  More saliency means more informativeness

[Figure: original image and its computed saliency map]


Saliency Based Cropping

Find a minimum-size rectangle that contains a certain portion (threshold) of the total saliency
Static threshold algorithm
  Brute force algorithm
    Requires exhaustive search
  Greedy algorithm
    Keep increasing the cropping bounds
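The greedy variant can be sketched as follows. This is a minimal illustration, not the implementation from the talk: the slide only says that the cropping bounds keep increasing, so the starting point (the most salient pixel) and the growth rule (extend the side whose new row or column has the highest average saliency) are assumptions, as are the names `greedy_crop` and `saliency`.

```python
def greedy_crop(saliency, threshold=0.9):
    """Grow a rectangle until it holds `threshold` of the total saliency.

    `saliency` is a list of rows of non-negative numbers.
    Returns inclusive bounds (top, left, bottom, right).
    """
    h, w = len(saliency), len(saliency[0])
    target = threshold * sum(map(sum, saliency))

    # Assumed starting point: the single most salient pixel.
    top, left = max(((r, c) for r in range(h) for c in range(w)),
                    key=lambda rc: saliency[rc[0]][rc[1]])
    bottom, right = top, left
    inside = saliency[top][left]

    while inside < target:
        width, height = right - left + 1, bottom - top + 1
        options = []  # (average saliency of the new strip, side, gain)
        if top > 0:
            gain = sum(saliency[top - 1][left:right + 1])
            options.append((gain / width, "top", gain))
        if bottom < h - 1:
            gain = sum(saliency[bottom + 1][left:right + 1])
            options.append((gain / width, "bottom", gain))
        if left > 0:
            gain = sum(saliency[r][left - 1] for r in range(top, bottom + 1))
            options.append((gain / height, "left", gain))
        if right < w - 1:
            gain = sum(saliency[r][right + 1] for r in range(top, bottom + 1))
            options.append((gain / height, "right", gain))
        if not options:  # the rectangle already covers the whole image
            break
        _, side, gain = max(options)
        if side == "top":
            top -= 1
        elif side == "bottom":
            bottom += 1
        elif side == "left":
            left -= 1
        else:
            right += 1
        inside += gain
    return top, left, bottom, right


saliency_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 2, 5, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
print(greedy_crop(saliency_map, threshold=0.9))
```

A greedy rectangle is not guaranteed to be the minimum-size one; the brute force search finds the minimum at the cost of examining every candidate rectangle.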


Dynamic Saliency Threshold

The most effective threshold varies from image to image
  Scattered saliency
    Little can be cut out
  Gathered saliency
    Close cutting is possible

[Figure: example images with scattered saliency and with gathered saliency]


Area–Saliency Sum Graph

Compute an optimal cropping rectangle for each saliency threshold

[Figure: two original images with their area–saliency sum graphs (x-axis: sum of saliency values inside the area; y-axis: cropping area). Annotated points: 70% of the saliency contained inside 30% of the area (0.7, 0.3); 80% of the saliency contained inside 50% of the area (0.8, 0.5).]


Cropping With Dynamic Threshold

Find a point of diminishing returns
  Adding small amounts of saliency requires a large increase of the cropping bounds
Binary search for the maximum gradient point

[Figure: area–saliency sum graph (x-axis: sum of saliency values inside the area; y-axis: cropping area) with the maximum gradient point marked]
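One way to read "point of diminishing returns" on an area–saliency sum graph is the sample where the slope of the curve rises most sharply. The sketch below takes that reading and is an assumption rather than the talk's method: it uses a plain linear scan over sampled points instead of the binary search mentioned on the slide, and the function name and the curve values (other than the 0.7/0.3 and 0.8/0.5 points shown in the talk) are made up for illustration.

```python
def diminishing_returns_point(curve):
    """Pick the sample after which area starts growing fastest.

    `curve` is a list of (saliency_fraction, area_fraction) pairs sorted by
    strictly increasing saliency fraction, with at least three points.
    Returns the pair where the slope increases the most.
    """
    best_index, best_jump = 1, float("-inf")
    for i in range(1, len(curve) - 1):
        (s0, a0), (s1, a1), (s2, a2) = curve[i - 1], curve[i], curve[i + 1]
        slope_before = (a1 - a0) / (s1 - s0)
        slope_after = (a2 - a1) / (s2 - s1)
        jump = slope_after - slope_before
        if jump > best_jump:
            best_index, best_jump = i, jump
    return curve[best_index]


# Hypothetical curve: area needed to contain each fraction of the saliency.
example_curve = [
    (0.5, 0.20), (0.6, 0.25), (0.7, 0.30),
    (0.8, 0.50), (0.9, 0.80), (1.0, 1.00),
]
print(diminishing_returns_point(example_curve))
```

For this curve the chosen threshold is 0.7: up to that point each extra 10% of saliency costs 5% more area, and beyond it the cost jumps to 20%.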


Static And Dynamic Threshold

[Figure: original images compared with crops from a static 90% threshold and from the dynamic threshold. Labels: cutting out too little; cutting out too much. The area–saliency sum graphs mark the 0.9 threshold and the maximum gradient point.]


Face Detection Based Thumbnail Cropping

When semantic information in images can be detected, more efficient cropping is possible
Face detection: Schneiderman and Kanade (2000)

[Figure: original image, face detection, face detection based cropping, face detection based thumbnail, and plain thumbnail]


User Study Design

Participants
  Twenty students recruited on campus
Tasks
  Recognition task
  Visual search task
Image sets
  Animal Set: common objects
  Corbis Set: professionally prepared photos
  Face Set: well-known figures (e.g., entertainers)
Thumbnail techniques
  No cropping
  Saliency based cropping
  Face detection based cropping


Recognition Task

To measure the effect of thumbnail techniques on object recognition
Target thumbnails were shown for two seconds
Participants were asked to click what they saw
Measured recognition accuracy: # of right answers / # of total tasks

[Figure panels: Face Set, Animal Set]


Recognition Task Hypothesis

[Figure: recognition rate (y-axis, up to 100%) against thumbnail size (x-axis) for Thumbnail Technique A and Thumbnail Technique B. Labels: "Thumbnails are too small anyway"; "Effect on recognition rate? Meaningful"; "Big enough to be recognized in both cases".]


Recognition Task Result

All curves differ from each other (p < 0.01)

[Figure: results for the Animal Set and the Face Set. Curves: no cropping, saliency based cropped, face detection based cropped.]


Visual Search Task

Find an image that matches a given task description
  Verbal description (except faces)
  PhotoMesa interface
Measured browsing completion time
3x3 within-subject ANOVA
  Three thumbnail techniques
  Three image sets


Visual Search Task Result

No cropping vs. saliency based cropping
  18%, 24%, and 23% faster, respectively
  F(1, 190) = 3.82, p = 0.05
Three thumbnail techniques on the Face Set
  Visual search with face detection based thumbnails is 50% faster
  F(2, 87) = 4.56, p = 0.013

[Figure: browsing time (sec.)]


Overview


Zoomable User Interfaces Automatic Thumbnail Cropping

Semi-Automatic Photo Annotation


Acquiring Metadata

From devices
  Basic information such as date, image size, and so on
  Adding a GPS unit to a digital camera
From context
  Image from a web page: use captions and surrounding text
Image analysis
  Color, texture, face, and so on
Manual annotation
  Most reliable, accurate, and relevant
  Slow, tedious
Automatic extraction
  Inaccurate, irrelevant


Semi-Automatic Annotation

Incremental and interactive annotation
Appropriate user interfaces are important

[Figure: annotation workflow. Components: Automatic Metadata Extraction Manager; Semi-Automatic Annotation Interface (Browse, Search). Steps: photos ready to be annotated; automatic suggestion with available knowledge; annotate; relevance feedback (fix errors); update knowledge.]


Relevant Metadata

Chronological order
  Last Halloween
Event information
  Birthday party, camping trip
  Often associated with location
People in photos

[Rodden, CHI2002]


Semi-Automatic PHoto Annotation and Recognition Interface (SAPHARI)

Browse, search, and annotate
Image clustering
  Facilitates bulk annotation
  Uses available metadata for clustering
  Event grouping
  Face grouping
Treemap layout
Zoomable user interface


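Event Based Annotation

Personal photo collections
  Bursty or episodic
Using pauses as event boundaries
Event gap detection
  How large is the current temporal gap (compared to its neighbors)?

$$\log(t_{i+1} - t_i) \ge K + \frac{1}{2d+1} \sum_{j=-d}^{d} \log(t_{i+j+1} - t_{i+j})$$

Here $t_i$ is the time at which photo $i$ was taken, $d$ is the number of neighboring gaps considered on each side, and $K$ is a constant threshold.

[Platt, 2001]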
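The gap rule compares the logarithm of each time gap with the average log gap of its neighbors: a gap marks an event boundary when log(gap) is at least K plus the mean of the log gaps in a window of d gaps on each side. A minimal sketch, under stated assumptions: the slide does not give values for K or d, so the defaults here (K = log 17, d = 10) are placeholders; shrinking the window at the ends of the collection and clamping gaps to at least one second (so the logarithm is defined) are also assumptions, and `event_boundaries` is a hypothetical name.

```python
import math


def event_boundaries(timestamps, K=math.log(17), d=10):
    """Return the indices i where a new event starts after photo i.

    `timestamps` are capture times in seconds, sorted in ascending order.
    """
    # Log of each gap between consecutive photos, clamped to >= 1 second.
    log_gaps = [math.log(max(later - earlier, 1.0))
                for earlier, later in zip(timestamps, timestamps[1:])]
    boundaries = []
    for i, log_gap in enumerate(log_gaps):
        # Window of up to d gaps on each side, truncated at the ends.
        window = log_gaps[max(0, i - d):i + d + 1]
        local_average = sum(window) / len(window)
        if log_gap >= K + local_average:
            boundaries.append(i)
    return boundaries


# Two bursts of five photos, one minute apart, separated by one day.
first_burst = [0, 60, 120, 180, 240]
second_burst = [t + 86640 for t in first_burst]
print(event_boundaries(first_burst + second_burst))
```

Because the test is relative to the local average, the same one-hour pause can split events in a densely photographed afternoon and be ignored in a sparsely photographed month.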


Event Hierarchy

Different levels of granularity are required

[Figure: event hierarchy with coarse and fine grouping]
Coarse grouping
  Summer Camping Trip (June 13th – June 17th)
  William’s Birthday (June 23rd)
Fine grouping
  Hiking (June 14th)
  Canoeing (June 15th)
  Santa Cruz (June 16th)
  Party at Kinder (3pm – 4pm)
  Family Dinner (6pm – 8pm)


Event Group Example

Provide multiple views of the same photo collection
  Coarse grouping
  Fine grouping
  By month
  By directory
Fixing event group boundaries
  Automatically updates the event group boundaries at other levels of the event hierarchy


Clothing Based Annotation

Face recognition
  Not applicable to personal photos
  Less than 50% accuracy (even for state-of-the-art systems)
Face detection
  Identifies the location of faces
  Higher accuracy than face recognition
People usually don’t change clothing during a day
  Use clothing information instead of facial information


Human Model

Find the upper body (torso) of a person by using a face detection technique
  Viola-Jones face detector
Convert the torso into a mathematical model
  4D samples (relative distance from the neck, red, green, blue)
  Gaussian sampling (more weight on the center line)
  Build a 4D histogram from the samples

[Figure: detected face, neck, and torso, sampled with more weight on the center axis]
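The torso model can be sketched as a normalized 4D histogram over (relative distance from the neck, red, green, blue). This is an illustration under assumptions, not the talk's implementation: the torso rectangle is taken as given (locating it from a detected face is not shown), and the number of samples, the bin count, and the spread of the Gaussian are all made-up parameters, as is the name `torso_histogram`.

```python
import random
from collections import Counter


def torso_histogram(pixels, torso, n_samples=500, bins=4, seed=0):
    """Build a normalized 4D histogram of torso pixels.

    `pixels[y][x]` is an (r, g, b) tuple with channels in 0..255.
    `torso` is (top, left, bottom, right), inclusive, with `top` at the neck.
    Columns are drawn from a Gaussian centred on the torso's vertical axis,
    so pixels near the center line carry more weight.
    """
    rng = random.Random(seed)
    top, left, bottom, right = torso
    height = bottom - top + 1
    center_x = (left + right) / 2
    sigma = (right - left + 1) / 4  # assumed spread

    counts = Counter()
    drawn = 0
    while drawn < n_samples:
        x = round(rng.gauss(center_x, sigma))
        if not left <= x <= right:
            continue  # reject samples that fall outside the torso
        y = rng.randint(top, bottom)
        r, g, b = pixels[y][x]
        key = ((y - top) * bins // height,  # relative distance from the neck
               r * bins // 256, g * bins // 256, b * bins // 256)
        counts[key] += 1
        drawn += 1
    return {key: count / n_samples for key, count in counts.items()}


# A uniformly red 8x8 image with a torso region below row 2.
red_image = [[(200, 10, 10)] * 8 for _ in range(8)]
model = torso_histogram(red_image, torso=(2, 1, 7, 6))
print(sorted(model))
```

Keeping the distance from the neck as a histogram dimension lets the model distinguish, for example, a striped shirt from a plain one with the same overall colors.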


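Compute Visual Distance

[Figure: color pixels sampled from the identified clothing as (y-distance, r, g, b) are converted into a model f(X), a four-dimensional probability density function (pdf). f(X) is compared with the pre-identified models gM1(X), ..., gMn(X) using the Bhattacharyya distance; the vertical axis of each pdf plot is frequency.]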
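For discrete histograms, the Bhattacharyya coefficient is the sum over bins of sqrt(p * q), and a common form of the distance is its negative logarithm. The sketch below uses that form; which variant the talk used is not stated, so this choice is an assumption, and the function names and example models are hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance between two normalized histograms given as dicts bin -> mass."""
    # Only bins present in both histograms contribute to the coefficient.
    coefficient = sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())
    if coefficient == 0:
        return float("inf")  # no overlap at all
    # Clamp to 1.0 so rounding error cannot produce a negative distance.
    return -math.log(min(coefficient, 1.0))


def closest_model(sample, models):
    """Name of the pre-identified model nearest to `sample`."""
    return min(models, key=lambda name: bhattacharyya_distance(sample, models[name]))


# Hypothetical clothing models over three color bins.
models = {
    "person_a": {"red": 0.8, "blue": 0.2},
    "person_b": {"blue": 0.5, "green": 0.5},
}
sample = {"red": 0.7, "blue": 0.3}
print(closest_model(sample, models))
```

The distance is zero for identical histograms and grows as the overlap shrinks, so the nearest pre-identified model is a natural suggestion for who is wearing the sampled clothing.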


User Study Results

Semi-controlled user study
  Seven participants, using their own photo collections
Event annotation
  Event based group vs. user’s own directory
  55% faster, statistically significant
  Valid grouping + zoomable user interface
Face annotation
  Clothing based group vs. manual
  15% faster (not significant)
  Unanimously preferred: F(2, 18) = 21.1, p < 0.01


Clothing Based Group

As the number of faces gets larger, semi-automatic annotation becomes more efficient.

[Figure: two plots of time per annotation (sec., 0 to 7) against number of annotated faces (0 to 100): Semi-Automatic Face Annotation (Clothing Based) and Manual Face Annotation]


Conclusion

PhotoMesa
  Scales up the size of the image collection that users can comfortably browse
Automatic Thumbnail Cropping
  Creates better thumbnails that fit into limited screen space
Semi-Automatic Photo Annotation
  Helps users make accurate annotations with less effort


Research On Other Topics

Popout Prism
  “Overview+Detail” Web browser
  Applies visual perception principles
  CHI 2002
OZONE
  Zoomable ontology browser
  DAML, the Semantic Web
  AVI 2002


Thank You

Benjamin B. Bederson
David W. Jacobs
Haibin Ling
Catherine Plaisant

http://www.cs.umd.edu/~sbw
[email protected]
