Using classifiers to compute similarities between face images. Prof. Lior Wolf, Tel-Aviv University
1
LEARNING VISUAL SIMILARITY USING CLASSIFIERS
Lior Wolf, The Blavatnik School of Computer Science, Tel-Aviv University
Collaborators (students): Yaniv Taigman (face.com), Tal Hassner (Open University), Orit Klipper-Gross (Weizmann Institute), Itay Maoz (Tel-Aviv University)
The Blavatnik School of Computer Science, Tel-Aviv University
An example of higher education in Israel
• A school in the Faculty of Exact Sciences, which also includes Mathematics, Physics, Chemistry, and Geophysics and Planetary Sciences
• Originated in the 1970s as part of the School of Mathematics; a separate school since 2000
• 39 faculty members, ~1000 undergrads, ~200 MSc students, ~70 PhD students, plus post-docs and other research personnel
3
School ranking in the world
• TAU/CS ranked #29 in number of citations, Thomson Scientific, for the years 2000-2010 [Technion #33, Weizmann #72, HebrewU #105]
• TAU/CS ranked #28 in Computer Science by the Shanghai Academic Ranking of World Universities, 2011 [Weizmann #12, Technion #15, HebrewU #21]
• TAU/CS ranked #14 in the world in CS impact, Scientometrics, Vol. 76, No. 2, 2008
• 12 TAU/CS faculty in positions 1-100 in the "list of most central computer scientists in Theory of Computer Science" (Kuhn & Wattenhofer, SIGACT News, Dec '07)
4
Computer vision in search
[Diagram: preprocessing turns raw data (images, video, audio) into information (objects, tags, IDs, context); a query is matched against this information to produce search results.]
5
The pain: too many images
Over 1,000,000,000 photos uploaded each month, shared by 200,000,000+ users; tens of billions served per week. No tags, no photos…
On Facebook: "can I see all my photos?" "tagging takes hours, can you do that for me?"
6
The evolution of perceptual search
• Text-based image search
• Search with basic properties
• Specialization in face identification
• Catalog-based search
• Gist-based image similarity
• Reranking by similarity
From no vision, through low-level and mid-level vision, to high-level vision: scene understanding
7
Photo Finder for Facebook
8
9
THE 1st MOBILE APP TO FIND 3D ITEMS
10
WHAT MAKES IT SO HARD?
High-level vision: what is where? Scene understanding.
"A happy couple walks in a field."
What kind of field? Where? Which season? How old are they? What gender? How attractive? What are they wearing?
11
YaC, Moscow, September 19, 2011
12
13
The Pair-Matching Problem
Training:
14
The Pair-Matching Problem
Training:
Modeling never-before-seen objects. A natural setup for image retrieval with no categories.
15
Instances:
• Face Recognition
• Video Face Recognition
• Document Analysis
• Video Action Recognition
16
The Pair-Matching Problem
Training:
17
Labeled Faces in the Wild (LFW)
Training: 13,000 labeled images of faces collected from the web; 5,749 individuals, 1-150 images per individual.
18
Restricted Protocol
10-fold cross-validation tests on randomly generated splits, each with 300 same pairs and 300 not-same pairs.
19
Pipeline (take 1)*
* "Descriptor Based Methods in the Wild," ECCVw'08
Training (note: no use of identity labels!): for each image pair i = 1, 2, …, compute a similarity score Sim(·, ·) between the two images, labeled same or not-same; a classifier (e.g. SVM) then learns a threshold on these scores.
20
Pipeline (take 1)*
Training with multiple descriptors / similarities: each pair i is represented by a vector of scores (s_{i,1}, s_{i,2}, …, s_{i,n}), one per descriptor-similarity combination, labeled same or not-same, and a classifier (e.g. SVM) is trained on these vectors.
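The per-pair similarity vectors can be sketched as follows; the descriptor names and the two similarity measures here are illustrative placeholders, not the exact set used in the talk:

```python
import numpy as np

def neg_l2(a, b):
    # negative Euclidean distance, so larger = more similar
    return -float(np.linalg.norm(a - b))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

SIMS = [neg_l2, cosine]   # illustrative similarity measures

def similarity_vector(feats1, feats2):
    """One image pair -> vector (s_1, ..., s_n): one score per
    (descriptor, similarity-measure) combination."""
    return np.array([sim(feats1[name], feats2[name])
                     for name in feats1 for sim in SIMS])
```

These vectors, labeled same / not-same, are then fed to a binary classifier such as a linear SVM.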
21
Some Questions
How to represent the images? Grayscale, edge responses [Brunelli & Poggio '93], C1-Gabor [e.g., Riesenhuber & Poggio '99], SIFT [Lowe '04], LBP [e.g., Ojala, Pietikainen & Harwood '96], …
Which similarity to use? L2, correlation, learned metrics [e.g., Bilenko et al. '04, Cristianini et al. '02, Hertz et al. '04, …], "hand-crafted" metrics [e.g., Belongie et al. '01]
Later on: how can subject IDs help improve pair-matching performance?
22
One-Shot Similarity (OSS) Score*
What: a measure of the similarity between two vectors.
Input: the two vectors, and a set of "background samples".
How: use "one-shot learning" (classification with one positive example).
* "Descriptor Based Methods in the Wild," ECCVw'08; "The One-Shot Similarity Kernel," ICCV'09
23
Computing the "One-Shot" Similarity
Given vectors p and q, and a set A of background examples:
Step a: Model1 = train(p, A)
Step b: Score1 = classify(q, Model1)
Step c: Model2 = train(q, A)
Step d: Score2 = classify(p, Model2)
One-Shot-Sim = (Score1 + Score2) / 2
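The two-step procedure above can be sketched in a few lines; a simple midpoint linear discriminant stands in here for the underlying classifier (the papers use LDA or SVM):

```python
import numpy as np

def train(pos, A):
    """One-shot model: a single positive example vs. the background set A.
    A midpoint linear discriminant stands in for LDA/SVM."""
    mu = A.mean(axis=0)
    w = pos - mu                   # direction from background mean to positive
    b = w @ (pos + mu) / 2.0       # decision threshold at the midpoint
    return w, b

def classify(x, model):
    w, b = model
    return float(w @ x - b)        # signed confidence, not a hard label

def one_shot_similarity(p, q, A):
    score1 = classify(q, train(p, A))   # steps a + b
    score2 = classify(p, train(q, A))   # steps c + d
    return (score1 + score2) / 2.0
```

Note that the background set A is unlabeled: only p and q enter as "positive" examples, once each.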
24
Euclidean vs. One-Shot Visualized
Euclidean
One-Shot
25
Computing the "One-Shot" Similarity*
* "The One-Shot Similarity Kernel", ICCV'09
Using LDA as the underlying classifier:

$$\mathrm{OSS}(x_i, x_j, A) = \frac{\left(x_i - \frac{x_j + \mu_A}{2}\right)^\top S_W^+ \left(x_j - \mu_A\right)}{\left\|S_W^+ \left(x_j - \mu_A\right)\right\|} + \frac{\left(x_j - \frac{x_i + \mu_A}{2}\right)^\top S_W^+ \left(x_i - \mu_A\right)}{\left\|S_W^+ \left(x_i - \mu_A\right)\right\|}$$

where $\mu_A$ is the mean of set A, and $S_W^+$ is the pseudo-inverse of the intra-class covariance matrix.
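A minimal sketch of the LDA-based OSS in closed form, assuming the intra-class covariance is estimated from the background set A alone:

```python
import numpy as np

def oss_lda(xi, xj, A):
    """One-Shot Similarity with LDA as the underlying classifier,
    computed in closed form (sketch of the formula above)."""
    mu = A.mean(axis=0)
    S_w = np.cov(A, rowvar=False, bias=True)  # intra-class covariance of A
    S_p = np.linalg.pinv(S_w)                 # pseudo-inverse S_W^+
    def one_side(a, b):
        v = S_p @ (b - mu)                    # LDA projection direction
        return (a - (b + mu) / 2.0) @ v / np.linalg.norm(v)
    return float(one_side(xi, xj) + one_side(xj, xi))
```

In practice $S_W^+$ depends only on A, so it is factored out and computed once for all pairs.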
27
Computing the "One-Shot" Similarity
Using Free-Scale LDA as the underlying classifier:

$$\mathrm{OSS}_{FS}(x_i, x_j, A) = \left(x_i - \frac{x_j + \mu_A}{2}\right)^\top S_W^+ \left(x_j - \mu_A\right) + \left(x_j - \frac{x_i + \mu_A}{2}\right)^\top S_W^+ \left(x_i - \mu_A\right)$$

where $\mu_A$ is the mean of set A, and $S_W^+$ is the pseudo-inverse of the intra-class covariance matrix.
28
Some Properties of the OSS
• Uses unlabeled training data
• OSS based on Free-Scale LDA is a CPD (conditionally positive definite) kernel
• May be efficiently computed. Complexity: $S_W^+$ is independent of the two vectors compared, and so is computed only once; repeated comparisons of a vector $x_i$ to different $x_j$ may be performed in O(n).
29
Metric learning for OSS*
* "One Shot Similarity Metric Learning for Action Recognition", in submission.
Instead of examples $x_i$, use $Tx_i$ for some "optimal" transformation $T$. The transformation $T$ is obtained by a gradient descent procedure that optimizes the score:

$$\mathrm{OSS}_{ML}(x_i, x_j, A, T) = \mathrm{OSS}(Tx_i, Tx_j, TA)$$

$$f(T) = \sum_{\text{same}} \mathrm{OSS}_{ML}(x_i, x_j, A, T) - \sum_{\text{not same}} \mathrm{OSS}_{ML}(x_i, x_j, A, T) - \left\|T - T_0\right\|^2$$
32
The Unrestricted Protocol
10-fold cross-validation tests on randomly generated splits, each with 300 same pairs and 300 not-same pairs. Training now includes subject labels.
33
Multiple One-Shots*
* "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
We now have IDs. How do we use them? Compute multiple OSS scores, each time using examples from a single class.
34
Multiple One-Shots
ID-based OSS
35
Multiple One-Shots
We now have IDs. How do we use them? Compute multiple OSS scores, each time using examples from a single class. Discrimination is based on different sources of variation: subject ID, pose, etc.
36
The Pose Issue
Most confident wrong results
37
Getting Poses
• 7 fiducial points (eyes, mouth, nose) → 14 x,y coordinates
• 14-D vector of alignment errors (similarity transformation)
• Project onto the first principal component
• Bin into 10 classes
To compute pose-based OSS, you need sets of images in the same pose…
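The pose-binning steps above can be sketched as follows; equal-frequency bin edges are an assumption, since the slides only say "bin to 10 classes":

```python
import numpy as np

def pose_bins(err_vectors, n_bins=10):
    """Bin faces into pose classes: project each 14-D alignment-error
    vector onto the first principal component, then quantize.
    Equal-frequency edges are an assumption, not from the slides."""
    X = err_vectors - err_vectors.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[0]                              # first-PC projection
    edges = np.quantile(proj, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(proj, edges)               # bin index 0 .. n_bins-1
```

Images sharing a bin index then serve as a same-pose set for a pose-based OSS.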
38
Multiple One-Shots
Pose-based OSS
39
Multiple One-Shots - Examples
Identity / Pose: 5 ID-based OSS and 5 pose-based OSS scores
40
Pipeline
• Input image pair
• Image alignment (commercial alignment software)
• Feature vectors, using: SIFT [Lowe '04]; LBP [Ojala et al. '96, '01, '02]; TPLBP, FPLBP [Wolf et al. '08]
• PCA + ITML (Information Theoretic Metric Learning [Davis et al. '07])
• Multiple OSS scores (20 subjects, 10 poses)
• SVM classifier
• Output: same \ not-same
47
Pipeline – Multiple Descriptors
Image alignment feeds per-descriptor branches (SIFT, LBP, …); each branch computes feature vectors, then PCA + ITML, then multiple OSS scores; a single SVM classifier combines all scores into the output: same \ not-same.
48
Results
0.7847 ± 0.0051 [WHT '08]
0.8398 ± 0.0035 [WHT '08 + alignment]
0.8517 ± 0.0061 [this work, only LBP]
0.8950 ± 0.0051 [this work, multi-desc.]
0.9753 [Kumar et al. '09 - HUMAN]
49
Pair-Matching of Sets
* "Face Recognition in Unconstrained Videos with Matched Background Similarity," CVPR 2011.
50
Pair-Matching of Sets
Training:
51
Conventional methods
• All-pairs comparison: distances between all frames of the first video and all frames of the second video.
• Pose-based methods: comparing the two most frontal faces in each video, or the two faces with the most similar pose.
• Algebraic set-to-set methods, such as max correlation, projection, and Procrustes.
• Non-algebraic methods, such as PMK and LLC.
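The max-correlation baseline mentioned above can be sketched as the largest canonical correlation between the linear spans of the two frame sets (assuming frames are rows and each set has full row rank):

```python
import numpy as np

def max_correlation(X1, X2):
    """Algebraic set-to-set baseline (sketch): the largest canonical
    correlation between the spans of two frame-descriptor sets,
    i.e. the cosine of the smallest principal angle."""
    def orthobasis(X):
        Q, _ = np.linalg.qr(X.T)   # columns: orthonormal basis of the span
        return Q
    s = np.linalg.svd(orthobasis(X1).T @ orthobasis(X2), compute_uv=False)
    return float(s.max())
```

A score of 1 means the two subspaces share a direction; 0 means they are orthogonal.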
52
Matched B/G similarity
• X1 & X2: sets of video frame descriptors.
• B: background set of faces.

Similarity = MBGS(X1, X2, B):
  B1 = Find_Nearest_Neighbors(X1, B)
  Model1 = train(X1, B1)
  Confidences1 = classify(X2, Model1)
  Sim1 = mean(Confidences1)
  B2 = Find_Nearest_Neighbors(X2, B)
  Model2 = train(X2, B2)
  Confidences2 = classify(X1, Model2)
  Sim2 = mean(Confidences2)
  Similarity = (Sim1 + Sim2) / 2
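A runnable sketch of the MBGS procedure, with a simple linear discriminant standing in for the SVM used in the paper, and the neighbor count k chosen arbitrarily:

```python
import numpy as np

def nearest_neighbors(X, B, k):
    """For each frame in X, collect its k nearest background faces in B."""
    d = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)  # |X| x |B|
    idx = np.unique(np.argsort(d, axis=1)[:, :k])
    return B[idx]

def train(X, neg):
    """Stand-in linear classifier; the paper trains an SVM here."""
    w = X.mean(axis=0) - neg.mean(axis=0)
    b = w @ (X.mean(axis=0) + neg.mean(axis=0)) / 2.0
    return w, b

def classify(X, model):
    w, b = model
    return X @ w - b                 # per-frame signed confidences

def mbgs(X1, X2, B, k=3):
    """Matched Background Similarity between two frame sets (sketch)."""
    sim1 = classify(X2, train(X1, nearest_neighbors(X1, B, k))).mean()
    sim2 = classify(X1, train(X2, nearest_neighbors(X2, B, k))).mean()
    return float((sim1 + sim2) / 2.0)
```

Matching the background to each set keeps the negatives "hard", so the classifier separates the set from its closest impostors rather than from arbitrary faces.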
54
Thank you!
Software available: http://www.cs.tau.ac.il/~wolf