Gang Wang, Derek Hoiem, David Forsyth. INTRODUCTION, APPROACH (implementation detail), EXPERIMENTS, CONCLUSION.

Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines. ICCV 2009, UIUC. Gang Wang, Derek Hoiem, David Forsyth


Page 1:

Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines

ICCV 2009, UIUC

Gang Wang, Derek Hoiem, David Forsyth

Page 2:

INTRODUCTION

APPROACH (implementation detail)

EXPERIMENTS

CONCLUSION

OUTLINE

Page 3:

Using online photo sharing sites → Flickr (groups)

Determine which images are similar and how they are similar

Learn group membership likelihoods

Because of the time it would take to learn these categories, propose a new method for stochastic learning of SVMs using the Histogram Intersection Kernel (HIK): SIKMA

SIKMA combines the ideas of [14] and [18]

Introduction

Page 4:

Related work

Algorithm classes for training very large scale kernel SVMs:

i) Exploit the sparseness of the Lagrange multipliers → SMO [22]

ii) Use stochastic gradient descent, without touching every example

http://0rz.tw/BDHWJ

Kivinen [14] → stochastic gradient method that applies to kernel machines

Maji [18] → very fast evaluation of a histogram intersection kernel

Construction

Page 5:

Flickr provides an organizational structure: how people like to group images

The SIKMA classifier allows efficient and accurate learning of these categories

This property generalizes well, even when the test dataset was not obtained from Flickr

Construction conclusion

Page 6:

Approach (SIKMA)

Suppose we have a list of training examples.

For a test example u, compute the classification score.
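The formulas on this slide did not survive extraction. As a hedged reconstruction in standard kernel-machine notation (the symbols x_t, y_t, α_t and K are assumptions and are not copied from the slide), the training list and the classification score with a histogram intersection kernel can be written as:

```latex
\{(x_1, y_1), \ldots, (x_T, y_T)\}, \qquad y_t \in \{-1, +1\}

f(u) = \sum_{t=1}^{T} \alpha_t \, K(u, x_t), \qquad
K(u, x) = \sum_{d=1}^{D} \min(u_d, x_d)
```

Here T is the number of training examples and D is the feature dimension, matching the symbols used on the later complexity slide.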

Page 7:

Approach (SIKMA)

Approximate the gradient by replacing the sum over all examples (batch) with a sum over some subset, chosen at random. It is usual to consider a single example.

New decision function

It is expensive to calculate f_{t-1}. The NORMA algorithm [14] keeps a set of support vectors of fixed length by dropping the oldest ones.

Doing so comes at a considerable cost in accuracy!
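The truncation strategy described above can be illustrated with a minimal Python sketch. It is not the authors' code: the class name, the hinge loss with margin 1, and the values of eta, lam and budget are all assumptions chosen for illustration. It shows a single-example stochastic update for a kernel machine and the "drop the oldest support vector" rule.

```python
import numpy as np

def hik(u, x):
    """Histogram intersection kernel: sum of element-wise minima."""
    return float(np.minimum(u, x).sum())

class TruncatedKernelSGD:
    """NORMA-style online kernel classifier with a fixed support-vector budget.

    Hypothetical illustration: names and defaults are assumptions.
    """

    def __init__(self, eta=0.1, lam=0.01, budget=100):
        self.eta, self.lam, self.budget = eta, lam, budget
        self.vectors = []  # stored support vectors, oldest first
        self.alphas = []   # their coefficients

    def score(self, u):
        # Cost grows with the number of stored support vectors.
        return sum(a * hik(u, x) for a, x in zip(self.alphas, self.vectors))

    def step(self, x, y):
        x = np.asarray(x, dtype=float)
        margin = y * self.score(x)
        # The regularizer shrinks every existing coefficient.
        self.alphas = [(1.0 - self.eta * self.lam) * a for a in self.alphas]
        if margin < 1.0:  # hinge loss is active: add a new support vector
            self.vectors.append(x)
            self.alphas.append(self.eta * y)
        if len(self.vectors) > self.budget:  # truncate by dropping the oldest
            self.vectors.pop(0)
            self.alphas.pop(0)
```

Every call to `score` touches all stored support vectors, which is why the budget is needed, and discarding old vectors is where the accuracy loss comes from.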

Page 8:

Approach (SIKMA)

D is the feature dimension

Page 9:

                                SIKMA     Conventional SVM solver
Computational complexity        O(TMD)    O(T^2 D)
Space cost                      O(MD)     O(T^2)

Evaluation for each example is O(D)

Approach

T: number of training examples
M: number of quantization bins
D: number of feature dimensions
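One way to see where the O(MD) space, O(MD) per-update and O(D) per-evaluation costs come from is the following Python sketch. It relies on the fact that a histogram intersection kernel machine decomposes into one function per dimension, which can be stored at M quantization levels. This is a simplified illustration under stated assumptions, not the authors' implementation: the class name, the nearest-level lookup (no interpolation), the hinge loss, and the default parameters are all hypothetical.

```python
import numpy as np

class TableHIKClassifier:
    """Stochastic HIK classifier that stores an M x D lookup table
    instead of support vectors (hypothetical sketch)."""

    def __init__(self, n_dims, n_bins=50, max_value=1.0, eta=0.1, lam=0.01):
        self.levels = np.linspace(0.0, max_value, n_bins)  # M quantization levels
        self.table = np.zeros((n_bins, n_dims))            # h_d(level) for each d
        self.dims = np.arange(n_dims)
        self.eta, self.lam = eta, lam

    def _bins(self, u):
        # Map each feature value to its nearest quantization level.
        step = self.levels[1] - self.levels[0]
        idx = np.rint(np.asarray(u, dtype=float) / step).astype(int)
        return np.clip(idx, 0, len(self.levels) - 1)

    def score(self, u):
        # O(D): one table lookup per feature dimension.
        return float(self.table[self._bins(u), self.dims].sum())

    def step(self, x, y):
        x = np.asarray(x, dtype=float)
        margin = y * self.score(x)
        self.table *= 1.0 - self.eta * self.lam  # regularization shrink, O(MD)
        if margin < 1.0:
            # Adding eta * y * K(x, .) changes h_d(level) by eta * y * min(level, x_d).
            self.table += self.eta * y * np.minimum(self.levels[:, None], x[None, :])
```

Over T training examples the updates cost O(TMD) in total, and no support vectors need to be kept or dropped.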

Page 10:

Measuring image similarity

Similarity is measured with a simple Euclidean distance between the SVM outputs.

Since the groups have names, we can also perform text-based queries (get images like "people dancing") and determine how two images are similar
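A minimal Python sketch of this similarity measure follows. It assumes each image is represented by a vector with one classifier output per Flickr group; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def rank_by_group_scores(query_scores, database_scores):
    """Rank database images by Euclidean distance between per-group SVM outputs.

    query_scores: shape (G,), one score per group for the query image.
    database_scores: shape (N, G), one row of scores per database image.
    Returns (indices sorted from most to least similar, distances).
    """
    q = np.asarray(query_scores, dtype=float)
    db = np.asarray(database_scores, dtype=float)
    distances = np.linalg.norm(db - q, axis=1)
    return np.argsort(distances), distances

# Toy usage with two groups and three database images.
order, dist = rank_by_group_scores([1.0, 0.0],
                                   [[0.0, 1.0], [1.0, 0.1], [0.5, 0.0]])
```

Comparing the two score vectors group by group also indicates which groups two images have in common, which is one way to read "how two images are similar".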

Approach

Page 11:

Use four types of features:

SIFT feature: detect and describe local patches

Gist feature: 960-dimensional Gist descriptor

Color feature: RGB space, values range from 1 to 512

Gradient feature: the whole image is represented as a 256-dimensional vector

Implementation detail

Combine the outputs of these four classifiers into a final prediction, using a validation data set

Page 12:

SIKMA Training Time and Test Accuracy

For 103 Flickr categories, using 15,000 to 30,000 positive images and 60,000 negative images.

The average AP over these categories is 0.433
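For readers unfamiliar with the metric, a common way to compute average precision (AP) for one ranked list is sketched below in Python. The slide does not state the exact evaluation protocol, so this is the textbook definition rather than necessarily the one used for the 0.433 figure; the function name is hypothetical.

```python
def average_precision(labels_in_rank_order):
    """Mean of the precision values measured at the rank of each positive item.

    labels_in_rank_order: iterable of 1/0 (or True/False), best-ranked first.
    """
    hits = 0
    precisions = []
    for rank, is_positive in enumerate(labels_in_rank_order, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

The reported number would then be the mean of this value over the 103 categories.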

Page 13:

Experiments

Page 14:

Experiments: image matching with feedback

Select the top five negative examples and five randomly chosen positive examples from among the top 50 ranked images

y_i is 1 if example i is positive, otherwise 0
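The selection step can be sketched in Python as follows. It assumes that "top five negative examples" means the five highest-ranked negatives within the top 50; the function name, argument names and the fixed random seed are hypothetical.

```python
import random

def select_feedback(ranked_ids, is_positive, top_k=50, n_neg=5, n_pos=5, seed=0):
    """Pick feedback examples from the top_k ranked images.

    ranked_ids: image ids, best match first.
    is_positive: mapping from image id to True/False relevance.
    Returns (positives, negatives, labels) where labels[i] is 1 or 0.
    """
    top = ranked_ids[:top_k]
    # Highest-ranked negatives, in rank order.
    negatives = [i for i in top if not is_positive[i]][:n_neg]
    # Positives are sampled at random from the same top_k window.
    candidates = [i for i in top if is_positive[i]]
    rng = random.Random(seed)
    positives = rng.sample(candidates, min(n_pos, len(candidates)))
    labels = {i: 1 for i in positives}
    labels.update({i: 0 for i in negatives})
    return positives, negatives, labels
```

The resulting labels play the role of y_i when the matching function is adjusted with the feedback.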

Page 15:

Experiments

Page 16:

Experiments

Page 17:

Experiments

Page 18:

Experiments: text-based queries

Because each Flickr category can be described with several words, we can support text-based queries.

Given a word query, find the Flickr groups whose descriptions contain that word

Test this on the Corel data set, with two queries: "airplane" and "sunset".
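The query mechanism can be sketched in Python as below. The group names, descriptions and scores are made-up placeholders, and ranking images by the summed classifier scores of the matching groups is an assumption; the slide does not say how the scores of several matching groups are combined.

```python
def groups_matching(query, group_descriptions):
    """Return names of groups whose description contains the query word."""
    word = query.lower()
    return [name for name, text in group_descriptions.items()
            if word in text.lower().split()]

def rank_images_by_query(query, group_descriptions, image_scores):
    """Rank images by the summed scores of all groups matching the query.

    image_scores maps image id -> {group name: classifier score}.
    """
    groups = groups_matching(query, group_descriptions)
    totals = {img: sum(scores.get(g, 0.0) for g in groups)
              for img, scores in image_scores.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical toy data.
descriptions = {"aircraft": "airplane and jet photos", "dusk": "sunset sky"}
scores = {"img1": {"aircraft": 0.9, "dusk": 0.1},
          "img2": {"aircraft": 0.2, "dusk": 0.8}}
ranking = rank_images_by_query("sunset", descriptions, scores)
```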

Page 19:

Conclusion

SIKMA is an algorithm to quickly train an SVM with the histogram intersection kernel using tens of thousands of training examples

Two images that are likely to belong to the same Flickr groups are considered similar.

Experimental results show that matching with prediction features performs better than matching with visual features