Lecture 13 - Visual recognition
Silvio Savarese, 20-Feb-14

• Announcements
• Object classification: bag of words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD
Challenges
Variability due to:
• Viewpoint
• Illumination
• Occlusions
• Intra-class variability

Challenges: intra-class variation
Basic properties
• Representation
– How to represent an object category; which classification scheme?
• Learning
– How to learn the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
Definition of “BoW”
– Histogram of visual words (codewords)

BoW pipeline
• Representation: feature detection & representation → codewords dictionary → image representation
• Learning: category models (and/or) classifiers
• Recognition: category decision
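To make the pipeline concrete, here is a minimal Python sketch of the representation stage. The choices below are illustrative assumptions, not prescribed by the lecture: k-means for the dictionary, k = 200 codewords, and local descriptors (e.g. SIFT, one per detected feature) already extracted.

```python
# Minimal BoW representation sketch (illustrative assumptions: k-means
# codebook, descriptors already extracted by a local feature detector).
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=200):
    """Cluster descriptors from all training images into k codewords."""
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bow_histogram(descriptors, codebook):
    """Represent one image as a normalized histogram of codeword counts."""
    words = codebook.predict(descriptors)  # nearest codeword per descriptor
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()               # normalize for comparability
```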
Classification
• Discriminative methods: nearest neighbors, linear classifier, SVM
• Generative methods
SVM classification
[Figure: model space with learned models w for Class 1 … Class N; a query image is projected into the model space and assigned to the winning class.]
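A minimal sketch of this classification stage, assuming scikit-learn's LinearSVC as the SVM implementation (the lecture does not prescribe a particular one) and BoW histograms as input features:

```python
# Sketch: one-vs-rest linear SVMs trained on BoW histograms.
from sklearn.svm import LinearSVC

def train_svm(train_histograms, train_labels):
    # Internally fits one linear model w per class (one-vs-rest).
    return LinearSVC(C=1.0).fit(train_histograms, train_labels)

def classify(svm, query_histogram):
    # Winning class = largest decision value among the per-class models.
    return svm.predict([query_histogram])[0]
```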
Caltech 101
• BoW: ~15%

Major drawback of BoW models: they don’t capture spatial information!
Spatial Pyramid Matching
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. 2006

Histogram intersection:
$$I(h_1, h_2) = \sum_{i=1}^{N} \min\big(h_1(i),\, h_2(i)\big)$$

Two-level pyramid match kernel (full weight on the fine level, half weight on the coarse one):
$$SPM(h_1, h_2) = I\big(h_1^1, h_2^1\big) + \tfrac{1}{2}\, I\big(h_1^0, h_2^0\big)$$
[Results: pyramid matching on Caltech 101.]
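The two formulas above translate directly into code; a sketch assuming the level-0 (whole image) and level-1 (concatenated 2×2 cell) histograms have already been computed as numpy arrays:

```python
# Histogram intersection and a two-level spatial pyramid match kernel.
import numpy as np

def intersection(h1, h2):
    """I(h1, h2) = sum_i min(h1(i), h2(i))."""
    return np.minimum(h1, h2).sum()

def spm_kernel(h1_l0, h1_l1, h2_l0, h2_l1):
    """Two-level SPM: full weight on the fine level, half on the coarse one."""
    return intersection(h1_l1, h2_l1) + 0.5 * intersection(h1_l0, h2_l0)
```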
Discriminative models
• Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio…
• Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006…
• Nearest neighbor (scales to ~10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005…
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998…
• Latent SVM / Structural SVM: Felzenszwalb 00; Ramanan 03…
• Random forests

Slide adapted from Antonio Torralba. Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman.
Outline (continued)
• Object classification: bag of words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD
Image classification
p(zebra | image) vs. p(no zebra | image)

• Bayes rule:
$$\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$$
[Figure: zebra vs. non-zebra samples separated by a decision boundary.]

• Discriminative methods model the posterior ratio p(zebra | image) / p(no zebra | image) directly.
• Generative methods model the likelihoods and priors and combine them via Bayes rule.
Generative models
1. Naïve Bayes classifier – Csurka, Bray, Dance & Fan, 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei, Ng & Jordan, 2003
– Object categorization: Sivic et al. 2005, Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
Some notation
• w: the collection of all N codewords in the image, w = [w1, w2, …, wN]
• c: category of the image
The Naïve Bayes model

$$p(c \mid w) \propto p(c)\, p(w \mid c) = p(c)\, p(w_1, \dots, w_N \mid c)$$

where p(c) is the prior probability of the object classes and p(w | c) is the image likelihood given the class.

• Assume that each feature (codeword) is conditionally independent given the class:

$$p(w_1, \dots, w_N \mid c) = \prod_{n=1}^{N} p(w_n \mid c)$$

where p(w_n | c) is the likelihood of the nth visual word given the class.
Example: 2 classes, bananas vs. oranges
• Feature: histogram of colors; w_i = number of yellow pixels in the image.
• [Figure: class-conditional likelihoods p(w_i | c_1) and p(w_i | c_2) plotted against the percentage of pixels that are yellow in the image (25%, 50%, 75%).]
Learning
• How do we learn p(w_i | c_j)? From the empirical frequencies of codewords in images from a given class.
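In code, this frequency counting amounts to summing the training histograms of each class and normalizing; a minimal sketch (the Laplace smoothing term alpha is an added assumption to avoid zero likelihoods, not something the slides specify):

```python
# Frequency counting: per-class sums of codeword histograms, normalized.
import numpy as np

def learn_likelihoods(histograms, labels, n_classes, alpha=1.0):
    """Return an (n_classes x vocabulary_size) table of p(w_i | c_j)."""
    V = histograms.shape[1]
    p_w_given_c = np.empty((n_classes, V))
    for c in range(n_classes):
        counts = histograms[labels == c].sum(axis=0) + alpha  # smoothing
        p_w_given_c[c] = counts / counts.sum()
    return p_w_given_c
```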
Classification/Recognition
• Object class decision:
$$c^* = \arg\max_c\, p(c \mid w) \propto p(c) \prod_{n=1}^{N} p(w_n \mid c)$$
Example: 2 classes, bananas vs. oranges
• A query image contains a banana.
• Look at how many pixels are yellow: say 60%.
• Compare the corresponding likelihood values at 60% under the two class hypotheses → banana!
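The decision rule above, computed in log space for numerical stability, reusing the p(w | c) table from the learning sketch (p_c is the vector of class priors):

```python
# Naive Bayes decision: argmax over classes of the log-posterior.
import numpy as np

def classify_nb(histogram, p_w_given_c, p_c):
    """c* = argmax_c [ log p(c) + sum_n count(w_n) * log p(w_n | c) ]."""
    log_post = np.log(p_c) + histogram @ np.log(p_w_given_c).T
    return int(np.argmax(log_post))
```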
Summary: Generative models
• Naïve Bayes
– Unigram models in document analysis
– Assumes conditional independence of words given class
– Parameter estimation: frequency counting

[Results: Csurka et al. 2004, Csurka’s dataset (7 classes); error rates E = 28% and E = 15%.]
Generative vs. discriminative
• Discriminative methods
– Computationally efficient & fast
• Generative models
– Convenient for weakly- or un-supervised, incremental training
– Prior information
– Flexibility in modeling parameters

Weakness of BoW models
• All spatial arrangements of the same codewords have equal probability under a bag-of-words model
• Location information is important
• No rigorous geometric information of the object components
• Segmentation and localization unclear
Outline (continued)
• Object classification: bag of words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD
Object classification by…
– Principal Component Analysis (PCA)
– Linear Discriminant Analysis (LDA)
Originally introduced for faces:
• Eigenfaces (Turk & Pentland, ’91) and Fisherfaces (Belhumeur et al., ’97)
The space of images or histograms
• An image (or histogram) H is a point in a high-dimensional space.
– An N × M image is a point in R^{NM}
[Thanks to Chuck Dyer, Steve Seitz, Nishino]

Key idea
• The images {x̂} in the set of possible images are highly correlated.
• So, compress them to a low-dimensional subspace that captures the key appearance characteristics of the visual DOFs.
Eigenfaces [Turk and Pentland ’91]
• Use PCA to estimate the subspace (dimensionality reduction): compute the n-dim subspace such that the projection of the data points onto it has the largest variance among all n-dim subspaces.
• Compare two objects by projecting the images into this subspace (the “face space”) and measuring the Euclidean distance between them.
• Maximize the scatter of the training images in face space.

PCA projection
[Figure: 2-D data points (x1, x2) projected onto the 1st principal component; the 2nd principal component is orthogonal to it.]
PCA mathematical formulation
Define a transformation W from the original image space to an m-dimensional feature space:
$$y_j = W^T x_j, \qquad j = 1, 2, \dots, N$$
• Measure data scatter with the data scatter matrix:
$$S_T = \sum_{j=1}^{N} (x_j - \bar{x})(x_j - \bar{x})^T$$
• Transformed data scatter matrix:
$$\tilde{S}_T = \sum_{j=1}^{N} (y_j - \bar{y})(y_j - \bar{y})^T = W^T S_T W$$
• PCA = eigenvalue decomposition of the data covariance (scatter) matrix: W = [v_1 v_2 … v_m], the orthonormal eigenvectors of S_T with the largest eigenvalues.
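A numpy sketch of this formulation: center the data, form the scatter matrix S_T, and keep the top-m eigenvectors. (For real images one would avoid forming the full scatter matrix explicitly, e.g. via the Turk-Pentland trick or an SVD; that detail is omitted here.)

```python
# PCA via eigendecomposition of the data scatter matrix.
import numpy as np

def pca_subspace(X, m):
    """X: N x n data matrix, one flattened image per row.
    Returns the mean x_bar and W = [v1 ... vm] (orthonormal columns)."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S_T = Xc.T @ Xc                          # data scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S_T)   # eigenvalues in ascending order
    return x_bar, eigvecs[:, ::-1][:, :m]    # top-m principal components
```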
Projecting onto the eigenfaces
[Figure: image space vs. face space spanned by eigenvectors v1, v2, v3, v4.]
• The eigenfaces v1, …, vK span the space of faces.
– A face x is converted to eigenface coordinates by ω_i = v_i^T (x − x̄), i = 1, …, K.
Algorithm

Training (note that each image is flattened into a long vector!)
1. Align training images x1, x2, …, xN
2. Compute the average face x̄ = (1/N) Σ xi
3. Compute the difference images xi − x̄
4. Compute the covariance matrix (total scatter matrix) $S_T = \sum_{j=1}^{N} (x_j - \bar{x})(x_j - \bar{x})^T$
5. Compute the eigenvectors of the covariance matrix S_T
6. Compute the training projections a1, a2, …, aN

Testing
1. Take a query image x
2. Project x into eigenface space (W = {eigenfaces}) and compute its projection ω
3. Compare the projection ω with all N training projections ai
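Putting the training and testing steps together, a sketch that reuses the pca_subspace helper from the previous block; nearest-neighbor matching in face space implements the comparison in testing step 3:

```python
# Eigenface training (steps 1-6) and recognition (testing steps 1-3).
import numpy as np

def train_eigenfaces(X_train, m):
    """Compute the mean face, the eigenfaces W, and projections a_i."""
    x_bar, W = pca_subspace(X_train, m)
    A = (X_train - x_bar) @ W     # one row of eigenface coordinates per face
    return x_bar, W, A

def recognize(x_query, x_bar, W, A):
    """Project the query; return index of the closest training face."""
    omega = (x_query - x_bar) @ W
    return int(np.argmin(np.linalg.norm(A - omega, axis=1)))  # Euclidean
```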
Illustration of eigenfaces
These are the first 4 eigenvectors from a training set of 400 images (ORL Face Database).
• The visualization of eigenvectors: eigenfaces look somewhat like generic faces.
• Selecting only the top P eigenfaces reduces the dimensionality.
• Fewer eigenfaces result in more information loss, and hence less discrimination between faces.
Reconstruction and errors
[Figure: face reconstructions using P = 4, P = 200, and P = 400 eigenfaces.]
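The reconstruction shown in the figure is just the mean face plus the first P eigenface components, which makes the information loss explicit; a sketch (x̄ and W as in the earlier sketches):

```python
# Reconstruction from the first P eigenfaces:
# x ~ x_bar + sum_{i<=P} omega_i v_i.  Smaller P -> larger error.
def reconstruct(x, x_bar, W, P):
    omega = (x - x_bar) @ W[:, :P]   # coordinates in the first P eigenfaces
    return x_bar + W[:, :P] @ omega  # back-projection into image space
```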
Summary for eigenfaces
Pros
• Non-iterative, globally optimal solution
Limitations
• PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination…

Extensions
• PCA-SIFT: Y. Ke, R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. IEEE CVPR 2004.
• Generalized PCA: R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1–15, 2005.
• TensorFaces: M.A.O. Vasilescu, D. Terzopoulos. Multilinear Analysis of Image Ensembles: TensorFaces. Proc. 7th European Conference on Computer Vision (ECCV ’02), Copenhagen, Denmark, May 2002.
Linear Discriminant Analysis (LDA) / Fisher’s Linear Discriminant (FLD)
• Eigenfaces exploit the maximum scatter of the training images in face space.
• Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.
Illustration of the projection (using two classes as an example)
[Figure: a poor projection mixes the two classes; a good projection separates them.]
Variables
• N sample images: $x_1, \dots, x_N$
• c classes: $\chi_1, \dots, \chi_c$
• Average of each class: $\mu_i = \frac{1}{N_i} \sum_{x_k \in \chi_i} x_k$
• Total average: $\mu = \frac{1}{N} \sum_{k=1}^{N} x_k$
Scatters
• Scatter of class i: $S_i = \sum_{x_k \in \chi_i} (x_k - \mu_i)(x_k - \mu_i)^T$
• Within-class scatter: $S_W = \sum_{i=1}^{c} S_i$
• Between-class scatter: $S_B = \sum_{i=1}^{c} N_i\, (\mu_i - \mu)(\mu_i - \mu)^T$
• Total scatter: $S_T = S_W + S_B$
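These scatter matrices are straightforward to compute; a numpy sketch assuming one flattened image per row of X and integer class labels 0 … n_classes−1:

```python
# Within-class and between-class scatter matrices.
import numpy as np

def scatter_matrices(X, labels, n_classes):
    n = X.shape[1]
    mu = X.mean(axis=0)                      # total average
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for i in range(n_classes):
        Xi = X[labels == i]
        mu_i = Xi.mean(axis=0)               # class average
        Xc = Xi - mu_i
        S_W += Xc.T @ Xc                     # adds scatter of class i
        d = (mu_i - mu)[:, None]
        S_B += Xi.shape[0] * (d @ d.T)       # N_i (mu_i - mu)(mu_i - mu)^T
    return S_W, S_B
```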
Illustration
[Figure: two classes with within-class scatters S_1 and S_2 (S_W = S_1 + S_2) and between-class scatter S_B.]
Mathematical formulation (1)
• After projection: $y_k = W^T x_k$
• Between-class scatter (of the y’s): $\tilde{S}_B = W^T S_B W$
• Within-class scatter (of the y’s): $\tilde{S}_W = W^T S_W W$
Illustration
[Figure: after projection, within-class scatters S̃_1 and S̃_2 (S̃_W = S̃_1 + S̃_2) and between-class scatter S̃_B.]
Mathematical formulation (2)
• The desired projection:
$$W_{opt} = \arg\max_W \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}$$
• How is it found? Generalized eigenvectors:
$$S_B w_i = \lambda_i S_W w_i, \qquad i = 1, \dots, m$$
• If S_W has full rank, the generalized eigenvectors are the eigenvectors of $S_W^{-1} S_B$ with the largest eigenvalues.
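scipy.linalg.eigh solves this generalized symmetric eigenproblem directly when given both matrices; a sketch (it requires S_W to be positive definite, i.e. full rank, matching the condition above). Since S_B has rank at most c − 1, at most c − 1 eigenvalues are nonzero, so only m ≤ c − 1 useful directions exist.

```python
# FLD projection via the generalized eigenproblem S_B w = lambda S_W w.
import numpy as np
from scipy.linalg import eigh

def fld_projection(S_B, S_W, m):
    """Return W_opt = [w_1 ... w_m], the generalized eigenvectors with
    the largest eigenvalues (S_W must be positive definite)."""
    eigvals, eigvecs = eigh(S_B, S_W)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :m]
```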
Results: Eigenface vs. Fisherface
• Variation in facial expression, eyewear, and lighting
• Input: 160 images of 16 people (with/without glasses, 3 lighting conditions, 5 expressions)
• Train: 159 images; Test: 1 image (leave-one-out)
[Figure: error rate (%) of Eigenface vs. Fisherface.]
Next lecture
• Object detection