Lecture 13: Visual Recognition – Silvio Savarese


Lecture 13: Visual Recognition
Silvio Savarese, 20-Feb-14

• Announcements

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Challenges

Variability due to:
• Viewpoint
• Illumination
• Occlusions
• Intra-class variability

Challenges: intra-class variation

Basic properties

• Representation

– How to represent an object category; which classification scheme?

• Learning

– How to learn the classifier, given training data

• Recognition

– How the classifier is to be used on novel data

Definition of “BoW”
– Histogram of visual words (codewords)

The BoW pipeline (representation, learning, recognition) – a minimal sketch follows below:
1. Feature detection & representation
2. Codewords dictionary
3. Image representation (histogram of codewords)
4. Category models and/or classifiers
5. Category decision
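A minimal sketch of steps 1–3 of this pipeline, assuming local descriptors (e.g. SIFT) have already been extracted; the k-means codebook, vocabulary size, and descriptor shapes below are illustrative assumptions, not part of the lecture:

```python
# Minimal BoW sketch: quantize local descriptors against a k-means codebook
# and build a normalized codeword histogram per image (assumed shapes).
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_codebook(all_descriptors, vocab_size=200):
    """all_descriptors: (num_descriptors, 128) stack of training descriptors."""
    codebook, _ = kmeans2(all_descriptors.astype(float), vocab_size, minit='points')
    return codebook

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    words, _ = vq(descriptors.astype(float), codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)   # normalized codeword histogram
```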


Classification
• Discriminative methods: nearest neighbors, linear classifiers, SVM
• Generative methods

SVM classification
[Figure: category models Class 1 … Class N and the SVM weight vector w in model space; the query image is assigned to the winning class (pink).]

Caltech 101
• BoW: ~15%

Major drawback of BoW models: they don’t capture spatial information!

Spatial Pyramid Matching
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR 2006.

Histogram intersection: I(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))

Two-level pyramid match: SPM(h1, h2) = I2(h1, h2) + ½ I1(h1, h2)

where I1 is the intersection at the coarse (whole-image) level and I2 the intersection at the finer grid level.
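A hedged sketch of the two-level match above; the per-image codeword indices, the keypoint coordinates normalized to [0, 1), and the 2×2 grid are assumptions for illustration:

```python
# Two-level spatial pyramid match over codeword histograms (sketch).
import numpy as np

def intersect(h1, h2):
    # histogram intersection: I(h1, h2) = sum_i min(h1[i], h2[i])
    return np.minimum(h1, h2).sum()

def cell_histograms(words, xy, vocab_size, cells):
    # one codeword histogram per spatial cell of a cells x cells grid,
    # with keypoint coordinates xy assumed normalized to [0, 1)
    hists = np.zeros((cells * cells, vocab_size))
    col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
    row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
    for w, r, c in zip(words, row, col):
        hists[r * cells + c, w] += 1
    return hists

def spm_two_level(words1, xy1, words2, xy2, vocab_size):
    # coarse level: whole-image histograms, weighted 1/2
    I1 = intersect(cell_histograms(words1, xy1, vocab_size, 1).ravel(),
                   cell_histograms(words2, xy2, vocab_size, 1).ravel())
    # fine level: 2x2 grid of histograms, full weight
    I2 = intersect(cell_histograms(words1, xy1, vocab_size, 2).ravel(),
                   cell_histograms(words2, xy2, vocab_size, 2).ravel())
    return I2 + 0.5 * I1
```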

Caltech 101: pyramid matching results.

Discriminative models
• Support Vector Machines – Guyon, Vapnik; Heisele, Serre, Poggio…
• Boosting – Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006…
• Nearest neighbor (10^6 examples) – Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005…
• Neural networks – LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
• Latent SVM / Structural SVM – Felzenszwalb 00; Ramanan 03…
• Random forests

Slide adapted from Antonio Torralba. Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman.

Lecture 13: Visual Recognition – Silvio Savarese, 20-Feb-14

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Image classification

p(zebra | image) vs. p(no zebra | image)

• Bayes rule:

p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] × [ p(zebra) / p(no zebra) ]

posterior ratio = likelihood ratio × prior ratio
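A toy numeric illustration of this factorization (the probabilities below are made up for illustration, not taken from the lecture):

```python
# posterior ratio = likelihood ratio x prior ratio (illustrative numbers only)
p_img_given_zebra, p_img_given_no_zebra = 0.08, 0.01   # assumed likelihoods
p_zebra, p_no_zebra = 0.05, 0.95                        # assumed priors

likelihood_ratio = p_img_given_zebra / p_img_given_no_zebra   # 8.0
prior_ratio = p_zebra / p_no_zebra                            # ~0.053
posterior_ratio = likelihood_ratio * prior_ratio              # ~0.42 < 1
print("zebra" if posterior_ratio > 1 else "non-zebra")        # -> non-zebra
```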

[Figure: decision boundary separating zebra from non-zebra examples.]

• Modeling the posterior ratio: p(zebra | image) / p(no zebra | image)

Discriminative methods

Generative methods

• Modeling the likelihood ratio and the priors: p(image | zebra) / p(image | no zebra) and p(zebra) / p(no zebra), combined via the same Bayes rule (posterior ratio = likelihood ratio × prior ratio).

Generative models

1. Naïve Bayes classifier
– Csurka, Bray, Dance & Fan, 2004

2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei, Ng & Jordan, 2004
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005

Notation
• w: the collection of all N codewords in the image, w = [w1, w2, …, wN]
• c: the category of the image

The Naïve Bayes model

p(c | w) ∝ p(c) p(w | c) = p(c) p(w1, …, wN | c)

i.e. the prior probability of the object classes times the image likelihood given the class.
[Graphical model: class node c generates the N codewords w.]

The Naïve Bayes model

p(c | w) ∝ p(c) p(w1, …, wN | c)

• Assume that each feature (codeword) is conditionally independent given the class:

p(w1, …, wN | c) = Π_{n=1}^{N} p(wn | c)

so p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c), where p(wn | c) is the likelihood of the nth visual word given the class.

The Naïve Bayes model

p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c)

Example: 2 classes, bananas vs. oranges, represented by a histogram of colors; wi = number of pixels colored yellow in the image.
[Figure: class-conditional likelihoods p(wi | c1) and p(wi | c2) plotted against the percentage of yellow pixels in the image (25%, 50%, 75%).]

The Naïve Bayes model

p(c | w) ∝ p(c) Π_{n=1}^{N} p(wn | c)

• How do we learn p(wi | cj)? From the empirical frequencies of codewords in images from a given class.

Classification / Recognition

Object class decision: c* = argmax_c p(c | w) = argmax_c p(c) Π_{n=1}^{N} p(wn | c)

Example: 2 classes, bananas vs. oranges. The query image contains a banana. Count how many pixels are yellow (say 60%), then compare the corresponding likelihood values p(wi | c1) and p(wi | c2) under the two class hypotheses: banana wins.
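A hedged sketch of this classifier over codeword histograms (NumPy only); the Laplace smoothing term and the array shapes are assumptions for illustration:

```python
# Naive Bayes over BoW histograms: class-conditional codeword probabilities are
# estimated from empirical frequencies (with Laplace smoothing, an added
# assumption); a query histogram is classified by
# argmax_c  log p(c) + sum_n count_n * log p(w_n | c).
import numpy as np

def train_nb(histograms, labels, num_classes, alpha=1.0):
    """histograms: (num_images, vocab_size) codeword counts; labels: (num_images,) int class ids."""
    vocab_size = histograms.shape[1]
    log_prior = np.zeros(num_classes)
    log_lik = np.zeros((num_classes, vocab_size))
    for c in range(num_classes):
        counts = histograms[labels == c].sum(axis=0)
        log_prior[c] = np.log((labels == c).mean())
        log_lik[c] = np.log((counts + alpha) / (counts.sum() + alpha * vocab_size))
    return log_prior, log_lik

def classify_nb(histogram, log_prior, log_lik):
    # argmax over classes of the log posterior (up to a constant)
    return int(np.argmax(log_prior + log_lik @ histogram))
```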

Summary: Generative models

• Naïve Bayes

– Unigram models in document analysis

– Assumes conditional independence of words given class

– Parameter estimation: frequency counting

Csurka et al. 2004
Results on Csurka’s 7-class dataset: error rates E = 28% and E = 15%.

Generative vs. discriminative

• Discriminative methods
– Computationally efficient & fast

• Generative models
– Convenient for weakly- or un-supervised, incremental training
– Prior information
– Flexibility in modeling parameters

Weaknesses of BoW models
• All spatial arrangements of the same codewords have equal probability under bag-of-words methods, yet location information is important
• No rigorous geometric information about the object components
• Segmentation and localization remain unclear

Lecture 13: Visual Recognition – Silvio Savarese, 20-Feb-14

Outline
• Object classification with bag-of-words models
  – Discriminative methods
  – Generative methods
• Object classification by PCA and FLD

Object classification by…
– Principal Component Analysis (PCA)
– Linear Discriminant Analysis (LDA)

Originally introduced for faces:
• Eigenfaces (Turk & Pentland, 1991) and Fisherfaces (Belhumeur et al., 1997)

The space of images (or histograms)

• An image (or histogram) H is a point in a high-dimensional space
– An N × M image is a point in R^{NM}

[Thanks to Chuck Dyer, Steve Seitz, Nishino]

Key idea
• The images H that actually occur are highly correlated.
• So, compress them to a low-dimensional subspace {x̂} that captures the key appearance characteristics of the visual degrees of freedom (DOFs).

• Use PCA for estimating the subspace (dimensionality reduction).
• Compare two objects by projecting the images into the subspace and measuring the Euclidean distance between them.

Eigenfaces [Turk and Pentland 1991]

• PCA computes the n-dimensional subspace (the “face space”) such that the projection of the data points from image space onto the subspace has the largest variance among all n-dimensional subspaces.
• Maximize the scatter of the training images in face space.

PCA projection
[Figure: six 2-D data points (axes x1, x2) and their projection onto the first principal component X1'.]

Use PCA for estimating the subspace
• PCA computes the n-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all n-dimensional subspaces.
[Figure: 2-D data (axes x1, x2) with the 1st and 2nd principal components.]

PCA mathematical formulation

Define an orthonormal transformation W from the m-dimensional input space to an n-dimensional subspace:

y_j = W^T x_j,  j = 1, 2, …, N

Data scatter matrix (measures the scatter of the data):

S_T = Σ_{j=1}^{N} (x_j − x̄)(x_j − x̄)^T

Transformed-data scatter matrix:

S̃_T = Σ_{j=1}^{N} (y_j − ȳ)(y_j − ȳ)^T = W^T S_T W

PCA = eigenvalue decomposition of the data covariance (scatter) matrix:

W_opt = [v1 v2 … vn], the eigenvectors of S_T with the largest eigenvalues.

Projecting onto the eigenfaces
[Figure: image space vs. face space spanned by eigenvectors v1, v2, v3, v4.]
• The eigenfaces v1, …, vK span the space of faces.
• A face x is converted to eigenface coordinates by projecting onto each eigenface: ω_i = v_i^T (x − x̄), i = 1, …, K.

Algorithm

Training (note that each image is flattened into a long vector):
1. Align training images x1, x2, …, xN
2. Compute the average face x̄ = (1/N) Σ_i x_i
3. Compute the difference images x_i − x̄
4. Compute the covariance matrix (total scatter matrix) S_T = Σ_{j=1}^{N} (x_j − x̄)(x_j − x̄)^T
5. Compute the eigenvectors of the covariance matrix S_T
6. Compute the training projections a1, a2, …, aN

Testing:
1. Take a query image x
2. Project x into eigenface space (W = {eigenfaces}) and compute its projection ω
3. Compare the projection ω with all N training projections a_i
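A compact sketch of this recipe (NumPy only), assuming flattened face images and a nearest-neighbor decision in face space; the SVD route and the number of components P are illustrative choices, not the lecture’s prescription:

```python
# Eigenfaces sketch: PCA on mean-centered face vectors, then nearest-neighbor
# matching of the query projection against the training projections.
import numpy as np

def train_eigenfaces(images, P=50):
    """images: (N, h*w) matrix, one flattened face per row."""
    mean_face = images.mean(axis=0)
    X = images - mean_face                        # difference images x_i - x_bar
    # right singular vectors of X = eigenvectors of the scatter matrix X^T X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:P].T                                  # (h*w, P) top-P eigenfaces
    projections = X @ W                           # training projections a_i
    return mean_face, W, projections

def recognize(query, mean_face, W, projections, train_labels):
    omega = (query - mean_face) @ W               # project query into face space
    dists = np.linalg.norm(projections - omega, axis=1)   # Euclidean distances
    return train_labels[np.argmin(dists)]         # label of the closest training face
```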

Illustration of eigenfaces
• Visualization of the eigenvectors: the first 4 eigenvectors from a training set of 400 images (ORL Face Database). Eigenfaces look somewhat like generic faces.
• Selecting only the top P eigenfaces reduces the dimensionality.
• Fewer eigenfaces result in more information loss, and hence less discrimination between faces.

Reconstruction and errors
[Figure: face reconstructions using P = 4, P = 200, and P = 400 eigenfaces.]

Summary for eigenfaces

Pros
• Non-iterative, globally optimal solution

Limitations
• The PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination…

Extensions
• PCA-SIFT: Y. Ke, R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. IEEE CVPR 2004.
• Generalized PCA: R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1–15, 2005.
• TensorFaces: M.A.O. Vasilescu, D. Terzopoulos. "Multilinear Analysis of Image Ensembles: TensorFaces," Proc. 7th European Conference on Computer Vision (ECCV 2002), Copenhagen, Denmark, May 2002.

Linear Discriminant Analysis (LDA)
Fisher’s Linear Discriminant (FLD)

• Eigenfaces exploit the maximum scatter of the training images in face space.
• Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.

Illustration of the projection
Using two classes as an example: [Figure: a poor projection direction vs. a good one in the (x1, x2) plane.]

Variables
• N sample images: {x1, …, xN}
• c classes: {χ1, …, χc}
• Average of each class: μ_i = (1 / N_i) Σ_{x_k ∈ χ_i} x_k
• Total average: μ = (1 / N) Σ_{k=1}^{N} x_k

Scatters
• Scatter of class i: S_i = Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T
• Within-class scatter: S_W = Σ_{i=1}^{c} S_i
• Between-class scatter: S_B = Σ_{i=1}^{c} N_i (μ_i − μ)(μ_i − μ)^T
• Total scatter: S_T = S_W + S_B

Illustration
[Figure: two classes in the (x1, x2) plane showing the class scatters S1 and S2, the within-class scatter S_W = S1 + S2, and the between-class scatter S_B.]

Mathematical formulation (1)
• After projection: y_k = W^T x_k
• Between-class scatter (of the y’s): S̃_B = W^T S_B W
• Within-class scatter (of the y’s): S̃_W = W^T S_W W

Illustration
[Figure: after the projection y_k = W^T x_k, the projected class scatters S̃1 and S̃2 give S̃_W = S̃1 + S̃2, together with the projected between-class scatter S̃_B.]

Mathematical formulation (2)
• The desired projection:

W_opt = argmax_W ( |S̃_B| / |S̃_W| ) = argmax_W ( |W^T S_B W| / |W^T S_W W| )

• How is it found? As generalized eigenvectors:

S_B w_i = λ_i S_W w_i,  i = 1, …, m

• If S_W has full rank, the generalized eigenvectors are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues.
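A hedged sketch of Fisher’s Linear Discriminant as formulated above: build S_W and S_B from labeled samples, then take the leading eigenvectors of S_W^{-1} S_B. The array shapes and the pseudo-inverse fallback (useful when S_W is rank-deficient, as with raw face vectors) are assumptions for illustration:

```python
# FLD sketch: within/between-class scatter matrices and the leading
# generalized eigenvectors, via eigenvectors of pinv(S_W) @ S_B.
import numpy as np

def fld(X, labels, n_components):
    """X: (N, d) samples; labels: (N,) class ids; returns a (d, n_components) projection W."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)        # within-class scatter
        diff = (mu_c - mu)[:, None]
        S_B += len(Xc) * (diff @ diff.T)          # between-class scatter
    # generalized eigenproblem S_B w = lambda S_W w
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(-eigvals.real)             # largest eigenvalues first
    return eigvecs[:, order[:n_components]].real
```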

Results: Eigenface vs. Fisherface
• Variation in facial expression, eyewear, and lighting
• Input: 160 images of 16 people (with/without glasses, 3 lighting conditions, 5 expressions)
• Train: 159 images; Test: 1 image
[Figure: error rate (%) of Eigenface vs. Fisherface.]

Next lecture

• Object detection