Lecture 13: Visual Recognition – Silvio Savarese


Lecture 13: Visual Recognition
Silvio Savarese, 20-Feb-14

• Announcements

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Challenges

Variability due to:
• Viewpoint
• Illumination
• Occlusions
• Intra-class variability

Challenges: intra-class variation

Basic properties

• Representation

– How to represent an object category; which classification scheme?

• Learning

– How to learn the classifier, given training data

• Recognition

– How the classifier is to be used on novel data

Definition of “BoW”
– Histogram of visual words (codewords)

The BoW pipeline (representation, learning, recognition) – a minimal sketch follows below:
1. Feature detection & representation
2. Codewords dictionary
3. Image representation (histogram of codewords)
4. Category models and/or classifiers
5. Category decision
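A minimal sketch of steps 1–3 of this pipeline, assuming local descriptors (e.g. SIFT) have already been extracted; the k-means codebook, vocabulary size, and descriptor shapes below are illustrative assumptions, not part of the lecture:

```python
# Minimal BoW sketch: quantize local descriptors against a k-means codebook
# and build a normalized codeword histogram per image (assumed shapes).
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_codebook(all_descriptors, vocab_size=200):
    """all_descriptors: (num_descriptors, 128) stack of training descriptors."""
    codebook, _ = kmeans2(all_descriptors.astype(float), vocab_size, minit='points')
    return codebook

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    words, _ = vq(descriptors.astype(float), codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)   # normalized codeword histogram
```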


Classification
• Discriminative methods: nearest neighbors, linear classifiers, SVM
• Generative methods

SVM classification
[Figure: category models Class 1 … Class N and the SVM weight vector w in model space; the query image is assigned to the winning class (pink).]

Caltech 101
• BoW: ~15%

Major drawback of BoW models: they don’t capture spatial information!

Spatial Pyramid Matching
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR 2006.

Histogram intersection: I(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))

Two-level pyramid match: SPM(h1, h2) = I2(h1, h2) + ½ I1(h1, h2)

where I1 is the intersection at the coarse (whole-image) level and I2 the intersection at the finer grid level.
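A hedged sketch of the two-level match above; the per-image codeword indices, the keypoint coordinates normalized to [0, 1), and the 2×2 grid are assumptions for illustration:

```python
# Two-level spatial pyramid match over codeword histograms (sketch).
import numpy as np

def intersect(h1, h2):
    # histogram intersection: I(h1, h2) = sum_i min(h1[i], h2[i])
    return np.minimum(h1, h2).sum()

def cell_histograms(words, xy, vocab_size, cells):
    # one codeword histogram per spatial cell of a cells x cells grid,
    # with keypoint coordinates xy assumed normalized to [0, 1)
    hists = np.zeros((cells * cells, vocab_size))
    col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
    row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
    for w, r, c in zip(words, row, col):
        hists[r * cells + c, w] += 1
    return hists

def spm_two_level(words1, xy1, words2, xy2, vocab_size):
    # coarse level: whole-image histograms, weighted 1/2
    I1 = intersect(cell_histograms(words1, xy1, vocab_size, 1).ravel(),
                   cell_histograms(words2, xy2, vocab_size, 1).ravel())
    # fine level: 2x2 grid of histograms, full weight
    I2 = intersect(cell_histograms(words1, xy1, vocab_size, 2).ravel(),
                   cell_histograms(words2, xy2, vocab_size, 2).ravel())
    return I2 + 0.5 * I1
```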

Caltech 101: pyramid matching results.

Discriminative models
• Support Vector Machines – Guyon, Vapnik; Heisele, Serre, Poggio…
• Boosting – Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006…
• Nearest neighbor (10^6 examples) – Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005…
• Neural networks – LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
• Latent SVM / Structural SVM – Felzenszwalb 00; Ramanan 03…
• Random forests

Slide adapted from Antonio Torralba. Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman.

Lecture 13: Visual Recognition – Silvio Savarese, 20-Feb-14

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Image classification

p(zebra | image) vs. p(no zebra | image)

• Bayes rule:

p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] × [ p(zebra) / p(no zebra) ]

posterior ratio = likelihood ratio × prior ratio
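A toy numeric illustration of this factorization (the probabilities below are made up for illustration, not taken from the lecture):

```python
# posterior ratio = likelihood ratio x prior ratio (illustrative numbers only)
p_img_given_zebra, p_img_given_no_zebra = 0.08, 0.01   # assumed likelihoods
p_zebra, p_no_zebra = 0.05, 0.95                        # assumed priors

likelihood_ratio = p_img_given_zebra / p_img_given_no_zebra   # 8.0
prior_ratio = p_zebra / p_no_zebra                            # ~0.053
posterior_ratio = likelihood_ratio * prior_ratio              # ~0.42 < 1
print("zebra" if posterior_ratio > 1 else "non-zebra")        # -> non-zebra
```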

[Figure: decision boundary separating zebra from non-zebra examples.]

• Modeling the posterior ratio: p(zebra | image) / p(no zebra | image)

Discriminative methods

Generative methods

• Modeling the likelihood ratio and the priors: p(image | zebra) / p(image | no zebra) and p(zebra) / p(no zebra), combined via the same Bayes rule (posterior ratio = likelihood ratio × prior ratio).

Generative models

1. Naïve Bayes classifier
– Csurka, Bray, Dance & Fan, 2004

2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei, Ng & Jordan, 2004
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005

Notation
• w: the collection of all N codewords in the image, w = [w1, w2, …, wN]
• c: the category of the image

The Naïve Bayes model

p(c | w) ∝ p(c) p(w | c) = p(c) p(w1, …, wN | c)

i.e. the prior probability of the object classes times the image likelihood given the class.
[Graphical model: class node c generates the N codewords w.]

The Naïve Bayes model

p(c | w) ∝ p(c) p(w1, …, wN | c)

• Assume that each feature (codeword) is conditionally independent given the class:

p(w1, …, wN | c) = Π_{n=1}^{N} p(wn | c)

so p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c), where p(wn | c) is the likelihood of the nth visual word given the class.

The Naïve Bayes model

p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c)

Example: 2 classes, bananas vs. oranges, represented by a histogram of colors; wi = number of pixels colored yellow in the image.
[Figure: class-conditional likelihoods p(wi | c1) and p(wi | c2) plotted against the percentage of yellow pixels in the image (25%, 50%, 75%).]

The Naïve Bayes model

p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c)

• How do we learn p(wi | cj)? From the empirical frequencies of codewords in images from a given class.

Classification / Recognition

Object class decision: c* = argmax_c p(c | w) = argmax_c p(c) Π_{n=1}^{N} p(wn | c)

Example: 2 classes, bananas vs. oranges. The query image contains a banana. Count how many pixels are yellow (say 60%), then compare the corresponding likelihood values p(wi | c1) and p(wi | c2) under the two class hypotheses: banana wins.
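A hedged sketch of this classifier over codeword histograms (NumPy only); the Laplace smoothing term and the array shapes are assumptions for illustration:

```python
# Naive Bayes over BoW histograms: class-conditional codeword probabilities are
# estimated from empirical frequencies (with Laplace smoothing, an added
# assumption); a query histogram is classified by
# argmax_c  log p(c) + sum_n count_n * log p(w_n | c).
import numpy as np

def train_nb(histograms, labels, num_classes, alpha=1.0):
    """histograms: (num_images, vocab_size) codeword counts; labels: (num_images,) int class ids."""
    vocab_size = histograms.shape[1]
    log_prior = np.zeros(num_classes)
    log_lik = np.zeros((num_classes, vocab_size))
    for c in range(num_classes):
        counts = histograms[labels == c].sum(axis=0)
        log_prior[c] = np.log((labels == c).mean())
        log_lik[c] = np.log((counts + alpha) / (counts.sum() + alpha * vocab_size))
    return log_prior, log_lik

def classify_nb(histogram, log_prior, log_lik):
    # argmax over classes of the log posterior (up to a constant)
    return int(np.argmax(log_prior + log_lik @ histogram))
```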

Summary: Generative models

• Naïve Bayes

– Unigram models in document analysis

– Assumes conditional independence of words given class

– Parameter estimation: frequency counting

Csurka et al. 2004
Results on Csurka’s 7-class dataset: error rates E = 28% and E = 15%.

Generative vs. discriminative

• Discriminative methods
– Computationally efficient & fast

• Generative models
– Convenient for weakly- or un-supervised, incremental training
– Prior information
– Flexibility in modeling parameters

Weaknesses of BoW models
• All spatial arrangements of the same codewords have equal probability under bag-of-words methods, yet location information is important
• No rigorous geometric information about the object components
• Segmentation and localization remain unclear

Lecture 13: Visual Recognition – Silvio Savarese, 20-Feb-14

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Object classification by…
– Principal Component Analysis (PCA)
– Linear Discriminant Analysis (LDA)

Originally introduced for faces:
• Eigenfaces (Turk & Pentland, 1991) and Fisherfaces (Belhumeur et al., 1997)

The space of images (or histograms)

• An image (or histogram) H is a point in a high-dimensional space
– An N × M image is a point in R^{NM}

[Thanks to Chuck Dyer, Steve Seitz, Nishino]

Key idea
• The images H that actually occur are highly correlated.
• So, compress them to a low-dimensional subspace {x̂} that captures the key appearance characteristics of the visual degrees of freedom (DOFs).

• Use PCA for estimating the subspace (dimensionality reduction).
• Compare two objects by projecting the images into the subspace and measuring the Euclidean distance between them.

Eigenfaces [Turk and Pentland 1991]

• PCA computes the n-dimensional subspace (the “face space”) such that the projection of the data points from image space onto the subspace has the largest variance among all n-dimensional subspaces.
• Maximize the scatter of the training images in face space.

PCA projection
[Figure: six 2-D data points (axes x1, x2) and their projection onto the first principal component X1'.]

Use PCA for estimating the subspace
• PCA computes the n-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all n-dimensional subspaces.
[Figure: 2-D data (axes x1, x2) with the 1st and 2nd principal components.]

PCA mathematical formulation

Define an orthonormal transformation W from the m-dimensional input space to an n-dimensional subspace:

y_j = W^T x_j,  j = 1, 2, …, N

Data scatter matrix (measures the scatter of the data):

S_T = Σ_{j=1}^{N} (x_j − x̄)(x_j − x̄)^T

Transformed-data scatter matrix:

S̃_T = Σ_{j=1}^{N} (y_j − ȳ)(y_j − ȳ)^T = W^T S_T W

PCA = eigenvalue decomposition of the data covariance (scatter) matrix:

W_opt = [v1 v2 … vn], the eigenvectors of S_T with the largest eigenvalues.

Projecting onto the eigenfaces
[Figure: image space vs. face space spanned by eigenvectors v1, v2, v3, v4.]
• The eigenfaces v1, …, vK span the space of faces.
• A face x is converted to eigenface coordinates by projecting onto each eigenface: ω_i = v_i^T (x − x̄), i = 1, …, K.

Algorithm

Training (note that each image is flattened into a long vector):
1. Align training images x1, x2, …, xN
2. Compute the average face x̄ = (1/N) Σ_i x_i
3. Compute the difference images x_i − x̄
4. Compute the covariance matrix (total scatter matrix) S_T = Σ_{j=1}^{N} (x_j − x̄)(x_j − x̄)^T
5. Compute the eigenvectors of the covariance matrix S_T
6. Compute the training projections a1, a2, …, aN

Testing:
1. Take a query image x
2. Project x into eigenface space (W = {eigenfaces}) and compute its projection ω
3. Compare the projection ω with all N training projections a_i
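A compact sketch of this recipe (NumPy only), assuming flattened face images and a nearest-neighbor decision in face space; the SVD route and the number of components P are illustrative choices, not the lecture’s prescription:

```python
# Eigenfaces sketch: PCA on mean-centered face vectors, then nearest-neighbor
# matching of the query projection against the training projections.
import numpy as np

def train_eigenfaces(images, P=50):
    """images: (N, h*w) matrix, one flattened face per row."""
    mean_face = images.mean(axis=0)
    X = images - mean_face                        # difference images x_i - x_bar
    # right singular vectors of X = eigenvectors of the scatter matrix X^T X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:P].T                                  # (h*w, P) top-P eigenfaces
    projections = X @ W                           # training projections a_i
    return mean_face, W, projections

def recognize(query, mean_face, W, projections, train_labels):
    omega = (query - mean_face) @ W               # project query into face space
    dists = np.linalg.norm(projections - omega, axis=1)   # Euclidean distances
    return train_labels[np.argmin(dists)]         # label of the closest training face
```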

Illustration of eigenfaces
• Visualization of the eigenvectors: the first 4 eigenvectors from a training set of 400 images (ORL Face Database). Eigenfaces look somewhat like generic faces.
• Selecting only the top P eigenfaces reduces the dimensionality.
• Fewer eigenfaces result in more information loss, and hence less discrimination between faces.

Reconstruction and errors
[Figure: face reconstructions using P = 4, P = 200, and P = 400 eigenfaces.]

Summary for eigenfaces

Pros
• Non-iterative, globally optimal solution

Limitations
• The PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination…

Extensions
• PCA-SIFT: Y. Ke, R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. IEEE CVPR 2004.
• Generalized PCA: R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1–15, 2005.
• TensorFaces: M.A.O. Vasilescu, D. Terzopoulos. "Multilinear Analysis of Image Ensembles: TensorFaces," Proc. 7th European Conference on Computer Vision (ECCV 2002), Copenhagen, Denmark, May 2002.

Linear Discriminant Analysis (LDA)
Fisher’s Linear Discriminant (FLD)

• Eigenfaces exploit the maximum scatter of the training images in face space.
• Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.

Illustration of the projection
Using two classes as an example: [Figure: a poor projection direction vs. a good one in the (x1, x2) plane.]

Variables
• N sample images: {x1, …, xN}
• c classes: {χ1, …, χc}
• Average of each class: μ_i = (1 / N_i) Σ_{x_k ∈ χ_i} x_k
• Total average: μ = (1 / N) Σ_{k=1}^{N} x_k

Scatters
• Scatter of class i: S_i = Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T
• Within-class scatter: S_W = Σ_{i=1}^{c} S_i
• Between-class scatter: S_B = Σ_{i=1}^{c} N_i (μ_i − μ)(μ_i − μ)^T
• Total scatter: S_T = S_W + S_B

Illustration
[Figure: two classes in the (x1, x2) plane showing the class scatters S1 and S2, the within-class scatter S_W = S1 + S2, and the between-class scatter S_B.]

Mathematical formulation (1)
• After projection: y_k = W^T x_k
• Between-class scatter (of the y’s): S̃_B = W^T S_B W
• Within-class scatter (of the y’s): S̃_W = W^T S_W W

Illustration
[Figure: after the projection y_k = W^T x_k, the projected class scatters S̃1 and S̃2 give S̃_W = S̃1 + S̃2, together with the projected between-class scatter S̃_B.]

Mathematical formulation (2)
• The desired projection:

W_opt = argmax_W ( |S̃_B| / |S̃_W| ) = argmax_W ( |W^T S_B W| / |W^T S_W W| )

• How is it found? As generalized eigenvectors:

S_B w_i = λ_i S_W w_i,  i = 1, …, m

• If S_W has full rank, the generalized eigenvectors are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues.
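A hedged sketch of Fisher’s Linear Discriminant as formulated above: build S_W and S_B from labeled samples, then take the leading eigenvectors of S_W^{-1} S_B. The array shapes and the pseudo-inverse fallback (useful when S_W is rank-deficient, as with raw face vectors) are assumptions for illustration:

```python
# FLD sketch: within/between-class scatter matrices and the leading
# generalized eigenvectors, via eigenvectors of pinv(S_W) @ S_B.
import numpy as np

def fld(X, labels, n_components):
    """X: (N, d) samples; labels: (N,) class ids; returns a (d, n_components) projection W."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)        # within-class scatter
        diff = (mu_c - mu)[:, None]
        S_B += len(Xc) * (diff @ diff.T)          # between-class scatter
    # generalized eigenproblem S_B w = lambda S_W w
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(-eigvals.real)             # largest eigenvalues first
    return eigvecs[:, order[:n_components]].real
```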

Results: Eigenface vs. Fisherface
• Variation in facial expression, eyewear, and lighting
• Input: 160 images of 16 people (with/without glasses, 3 lighting conditions, 5 expressions)
• Train: 159 images; Test: 1 image
[Figure: error rate (%) of Eigenface vs. Fisherface.]

Next lecture

• Object detection