BEYOND SLIDING WINDOW: Object Localization by Efficient Subwindow Search Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann


MOTIVATIONS

To localize the object without exhaustive search

Observation: often, only a small portion of the image contains the object of interest

To find a global optimum in a huge search space

Object detection and retrieval

CONTRIBUTIONS

Efficient (n^2 vs. n^4)

an n x n image contains about n^4 rectangles: n x n possible centers, n possible choices for width, and n for height

Optimal

Versatile

arbitrary objects vs. simple parametric objects in line drawings [4]

flexible in the choice of the cost function vs. L2 distance [13]

Challenge: to find optimal and tight bounds
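The n^4 figure above is an order-of-magnitude count. As a worked check, assuming rectangles are axis-aligned and specified by a top and bottom row and a left and right column of an n x n pixel grid (a parameterization chosen here for illustration), the exact count is:

```latex
\[
  \#\text{rectangles}
  = \underbrace{\frac{n(n+1)}{2}}_{\text{top} \le \text{bottom}}
    \cdot
    \underbrace{\frac{n(n+1)}{2}}_{\text{left} \le \text{right}}
  = \frac{n^2 (n+1)^2}{4}
  \approx \frac{n^4}{4}.
\]
% Example: n = 100 gives 5050^2 = 25,502,500 candidate rectangles.
```

Even a small 100 x 100 image therefore has over 25 million candidate windows, which is why exhaustive evaluation is impractical.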

BRANCH AND BOUND

first proposed by A. H. Land and A. G. Doig in 1960 for discrete programming

a “divide and conquer” approach to optimize some cost function f(x)

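recursively branching and bounding

branching: split S into subsets S_i such that min f(x) over S equals min_i v_i, where v_i is the minimum of f(x) over S_i

bounding: compute the lower and upper bounds of f(x) within S_i

pruning: discard any S_i whose lower bound is no better than the best upper bound found so far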
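The branch, bound, and prune steps above can be sketched on a toy problem. This is a minimal illustration, not the method from the talk: the search space is a range of integers, and the names `branch_and_bound_min`, `f`, and `lower_bound` are hypothetical. It assumes `lower_bound(a, b)` never exceeds the true minimum of `f` on `a..b`.

```python
def branch_and_bound_min(f, lower_bound, lo, hi):
    """Minimize f over the integers lo..hi (inclusive)."""
    # Any feasible point gives an upper bound on the minimum.
    best_x, best_val = lo, f(lo)
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        # Pruning: this subset cannot beat the best value found so far.
        if lower_bound(a, b) >= best_val:
            continue
        if a == b:
            val = f(a)
            if val < best_val:
                best_x, best_val = a, val
            continue
        # Branching: split the subset into two disjoint halves.
        mid = (a + b) // 2
        stack.append((a, mid))
        stack.append((mid + 1, b))
    return best_x, best_val


def f(x):
    return (x - 7) ** 2 + 3


def lower_bound(a, b):
    # For this convex f, the minimum on [a, b] is at the point closest to 7.
    nearest = min(max(7, a), b)
    return f(nearest)


result = branch_and_bound_min(f, lower_bound, -100, 100)
```

The tighter the bound, the more subsets are pruned without being split, which is why the talk singles out the search for tight bounds as the main challenge.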

METHODOLOGY

Cost function

Parameter space

Bounds

BEST FIRST
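A best-first search over sets of rectangles might look like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the quality of a rectangle is taken to be the sum of per-pixel weights inside it, a rectangle set is four intervals for (top, bottom, left, right), and the names `ess`, `upper_bound`, and `box_sum` are hypothetical. Because the goal here is to maximize quality, the search uses an upper bound instead of a lower bound. Sums are computed naively for clarity.

```python
import heapq


def box_sum(grid, top, bottom, left, right):
    """Sum of grid[y][x] inside the box; 0 if the box is empty."""
    return sum(grid[y][x]
               for y in range(top, bottom + 1)
               for x in range(left, right + 1))


def upper_bound(pos, neg, T, B, L, R):
    # Positive weights over the largest rectangle in the set ...
    largest = box_sum(pos, T[0], B[1], L[0], R[1])
    # ... plus negative weights over the smallest one (possibly empty).
    smallest = box_sum(neg, T[1], B[0], L[1], R[0])
    return largest + smallest


def ess(weights):
    """Return (best score, (top, bottom, left, right))."""
    h, w = len(weights), len(weights[0])
    pos = [[max(v, 0) for v in row] for row in weights]
    neg = [[min(v, 0) for v in row] for row in weights]
    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    # heapq is a min-heap, so bounds are stored negated.
    heap = [(-upper_bound(pos, neg, *start), start)]
    while heap:
        neg_bound, state = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:
            # A single rectangle: its bound equals its true score.
            return -neg_bound, tuple(iv[0] for iv in state)
        # Split the widest interval in half.
        i = widths.index(max(widths))
        lo, hi = state[i]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:i] + (part,) + state[i + 1:]
            heapq.heappush(heap, (-upper_bound(pos, neg, *child), child))


weights = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1, -4, -1],
    [-1, -1, -1, -1],
]
score, box = ess(weights)
```

Since the state with the highest bound is always expanded first, the first single rectangle popped from the queue is a global optimum. In this sketch, if every weight is negative the search returns an empty rectangle with score 0.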

BOUNDING I

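a bag of visual words for non-rigid objects

histograms of SIFT prototypes

SVM decision function

bounds

upper bound: take the maximal amount of positive (+) weight, from the largest rectangle in the set, and the minimal amount of negative (-) weight, from the smallest rectangle in the set

integral images make each bound evaluation O(1)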
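The O(1) evaluation can be sketched as follows, assuming the SVM decision function has already been reduced to one weight per pixel (the summed weight of the visual words located there). The helper names `integral_image`, `rect_sum`, and `bow_upper_bound` are hypothetical. Two integral images are built once, one for positive and one for negative weights, after which every bound costs a fixed number of lookups.

```python
def integral_image(grid):
    """ii[y][x] holds the sum of grid[0..y-1][0..x-1]."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (grid[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii


def rect_sum(ii, top, bottom, left, right):
    """Sum inside an inclusive box using four lookups; 0 if empty."""
    if top > bottom or left > right:
        return 0
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


def bow_upper_bound(ii_pos, ii_neg, T, B, L, R):
    """Bound for all boxes with top in T, bottom in B, left in L, right in R."""
    largest = rect_sum(ii_pos, T[0], B[1], L[0], R[1])
    smallest = rect_sum(ii_neg, T[1], B[0], L[1], R[0])
    return largest + smallest


weights = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1, -4, -1],
    [-1, -1, -1, -1],
]
ii_pos = integral_image([[max(v, 0) for v in row] for row in weights])
ii_neg = integral_image([[min(v, 0) for v in row] for row in weights])

# Bound over every rectangle in the image.
bound_all = bow_upper_bound(ii_pos, ii_neg, (0, 3), (0, 3), (0, 3), (0, 3))
# Bound for the single rectangle covering row 1, columns 1..2.
bound_one = bow_upper_bound(ii_pos, ii_neg, (1, 1), (1, 1), (1, 1), (2, 2))
```

For a set that contains a single rectangle, the largest and smallest rectangles coincide, so the bound equals the true score.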


RESULTS

PASCAL VOC 06: 5,304 images with 9,507 objects from 10 categories

1000 visual words from 50,000 SURF descriptors

claim a match when there is > 50% overlap between the detected bounding box and the ground truth

PASCAL VOC 2007: 9,963 images with 24,640 objects
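The 50% matching rule can be sketched as below, assuming overlap is measured the way the PASCAL VOC evaluation does it: area of intersection divided by area of union. The box layout (x_min, y_min, x_max, y_max) and the names `overlap_ratio` and `is_match` are choices made for this illustration.

```python
def overlap_ratio(a, b):
    """Intersection area over union area of two boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the intersection, clamped at zero if disjoint.
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match(detected, ground_truth, threshold=0.5):
    """A detection counts only if the overlap strictly exceeds the threshold."""
    return overlap_ratio(detected, ground_truth) > threshold


half = overlap_ratio((0, 0, 10, 10), (0, 0, 10, 5))
```

Note that a detection covering exactly half of the union does not count as a match, because the criterion is strictly greater than 50%.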

RESULTS

EVALUATION

SPEED

40ms per image on a 2.4 GHz PC

BOUNDING II

spatial pyramid for rigid objects

histograms with spatial information

extensions with ESS (fine-grained pyramids)

SVM decision function

RESULTS

UIUC Car database (side-view, one car per image)

1050 training images (550 positive)

277 test images (170 single-scale + 107 multi-scale)

1000 visual words from 50,000 SURF descriptors

IMAGE PART RETRIEVAL

query-by-example

localized similarity measure

bounds

RESULTS

10,143 keyframes of a movie

return the 100 most relevant images for a query

2 s per returned image

CONCLUSIONS

high speed with global optimum

can be extended to multiple detections, other shapes, and different cost functions