Machine Learning in Machine Vision - ERNETval.serc.iisc.ernet.in/DAV/ML_in_Vision.pdf · Machine...

Machine Learning in Machine Vision R. Venkatesh Babu Video Analytics Lab, SERC Indian Institute of Science, Bangalore


Page 1

Machine Learning in Machine Vision

R. Venkatesh Babu

Video Analytics Lab, SERC

Indian Institute of Science, Bangalore

Page 2

Can Machines Replace Humans?

Semantic Gap

How do we interpret image data?

Page 3

What is an Image?

What do we see?

Page 4

What is an Image?

What do machines see?

Page 5

Semantic Gap

Page 6

Organization

• Machine Vision – Challenges

• Discriminative and Generative Approaches

• ML Applications in Vision

• Deep Learning

• Inspiration from Neuroscience

• Deep Architecture

• Applications

Page 7

Machine Vision – Challenges

Page 8

Challenge 1: viewpoint variation

Michelangelo 1475-1564

Page 9

Challenge 2: illumination

slide credit: S. Ullman

Page 10

Challenge 3: occlusion

Magritte, 1957

Page 11

Challenge 4: scale

slide by Fei-Fei, Fergus & Torralba

Page 12

Challenge 5: deformation

Xu Beihong, 1943

Page 13

Challenge 6: background clutter

Klimt, 1913

Page 14

Challenge 7: intra-class variation of objects

slide by Fei-Fei, Fergus & Torralba

Page 15

Object Categorization

Discriminative model: p(Object | image)

Generative model: p(image | Object)

Slides from: Fei-Fei Li

Page 16

Discriminative

Page 17

Generative

p(image | zebra) p(image | no zebra)
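As a minimal sketch of how the two views connect: a generative model that supplies p(image | zebra) and p(image | no zebra) yields the discriminative quantity p(zebra | image) through Bayes' rule, once a prior p(zebra) is assumed. The function name and the numbers below are hypothetical.

```python
def posterior_from_likelihoods(lik_obj: float, lik_no_obj: float, prior_obj: float) -> float:
    """Bayes' rule: p(object | image) from the two class-conditional likelihoods and a prior."""
    evidence = lik_obj * prior_obj + lik_no_obj * (1.0 - prior_obj)   # p(image)
    return lik_obj * prior_obj / evidence


# Hypothetical values: p(image | zebra) = 0.2, p(image | no zebra) = 0.05, p(zebra) = 0.5
p_zebra = posterior_from_likelihoods(0.2, 0.05, 0.5)
print(round(p_zebra, 3))  # 0.8
```

A discriminative model skips the likelihoods and estimates p(zebra | image) directly.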

Page 18

Object Detection Pipeline

Object Representation: which features are suitable for the task

Learning: which machine learning algorithm to choose

Page 19

Bag-of-words Approach
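A minimal sketch of the bag-of-words representation follows. Each local descriptor is assigned to its nearest "visual word", and the image is summarized by the normalized histogram of word counts. In practice the codebook is usually learned with k-means on training descriptors. Here a random codebook of 128-D (SIFT-sized) vectors stands in for it, which is an assumption for illustration.

```python
import numpy as np


def bag_of_words(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize each descriptor (N x D) to its nearest codeword (K x D) and
    return the L1-normalized histogram of codeword counts (length K)."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                        # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                         # normalize so image size does not matter


rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 128))                 # hypothetical 5-word vocabulary
# Four descriptors lying very close to codewords 0, 0, 2 and 4.
descriptors = codebook[[0, 0, 2, 4]] + 0.01 * rng.normal(size=(4, 128))
hist = bag_of_words(descriptors, codebook)
print(hist)
```

The histogram ignores where each descriptor came from. This gives robustness to some of the challenges above, such as deformation and viewpoint, but spatial layout is lost.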

Page 20

Features

Pixels

Texture

Color Histograms

SIFT/SURF

HOG …

Requirements: invariance to the challenges above (illumination, scale, orientation, …) and a low computational and memory burden

Page 21

Machine Learning Algorithms

Nearest Neighbor

Naïve Bayes

ANN

SVM

AdaBoost

CNN …

Page 22

Face Detection

Neural Network-Based Face Detection

Rowley, Baluja and Kanade, PAMI ’98

Object Detection Using the Statistics of Parts

H. Schneiderman and T. Kanade, CVPR’00, IJCV’04

Robust Real-time Object Detection

Paul Viola and Michael Jones (IJCV’04)

Page 23

Neural Network-Based Face Detection

(Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, PAMI ’98)

Page 24

System

Stage 1: Applies a set of neural network-based filters to an image. The filters examine each location in the image at several scales.

Stage 2: Uses an arbitrator to combine the filter outputs. It merges detections from individual filters and eliminates overlapping detections.
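Merging and pruning overlapping detections is commonly done with greedy non-maximum suppression (NMS). The sketch below shows that generic technique. It is not the arbitration scheme of Rowley et al. Boxes are `(x1, y1, x2, y2)` tuples, and the overlap threshold of 0.3 is an assumed value.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the best-scoring box, discard boxes overlapping it by
    at least iou_thresh, then repeat on the remaining boxes."""
    order = [int(i) for i in np.argsort(scores)[::-1]]   # highest score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep


# Two heavily overlapping detections of one face, plus one separate detection.
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (100, 100, 120, 120)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```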

Page 25

Overview

Page 26

Detection Time

• Number of networks: two

• Image size: 320 x 240 pixels

• 246,766 windows of 20x20 pixels

• Machine: 200 MHz R4400 SGI Indigo 2

• Time taken: approx. 383 seconds (> 6 minutes!)
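The per-window cost follows from the figures above. This assumes the reported time is the total for both networks over all windows.

```latex
\frac{383\ \text{s}}{246{,}766\ \text{windows}} \approx 1.55 \times 10^{-3}\ \text{s} = 1.55\ \text{ms per window},
\qquad
\frac{383\ \text{s}}{60\ \text{s/min}} \approx 6.4\ \text{min}
```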

Page 27

Object Detection Using the Statistics of Parts, H. Schneiderman and T. Kanade, CVPR’00, IJCV’04

Page 28

Object Detection Using the Statistics of Parts

• Represent appearance statistics as a product of histograms.

• Each histogram represents the joint statistics of a subset of wavelet coefficients and their position on the object.

• Use many such histograms, representing a wide variety of visual attributes.
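A minimal sketch of the decision rule this representation supports is given below. It assumes the wavelet coefficients have already been quantized into discrete patterns, and that each table `P(pattern, position | class)` has already been estimated by counting over training data. All names and the toy tables are hypothetical.

The image likelihood under each class is approximated by a product over histograms and positions, so its logarithm becomes a sum. The object is declared present when the likelihood ratio exceeds λ = P(non-object) / P(object).

```python
import numpy as np


def log_likelihood_ratio(pattern_maps, tables_obj, tables_non):
    """Approximate log[P(image | object) / P(image | non-object)] as a sum, over
    histograms k and positions, of log-ratios of table entries.

    pattern_maps[k]: int array (num_positions,) of quantized patterns for subset k.
    tables_*[k]:     array (num_patterns, num_positions) of P(pattern, position | class).
    """
    total = 0.0
    for q, t_obj, t_non in zip(pattern_maps, tables_obj, tables_non):
        pos = np.arange(len(q))
        total += np.sum(np.log(t_obj[q, pos]) - np.log(t_non[q, pos]))
    return total


def is_object(pattern_maps, tables_obj, tables_non, lam=1.0):
    """Bayes decision: likelihood ratio > lam, where lam = P(non-object) / P(object)."""
    return bool(log_likelihood_ratio(pattern_maps, tables_obj, tables_non) > np.log(lam))


# Toy tables: 4 patterns x 3 positions, each table summing to 1.
t_obj = np.full((4, 3), 0.4 / 9)
t_obj[0, :] = 0.2                        # the object tends to produce pattern 0
t_non = np.full((4, 3), 1 / 12)          # non-objects: uniform
tables_obj, tables_non = [t_obj, t_obj], [t_non, t_non]

object_like = [np.zeros(3, dtype=int)] * 2
other = [np.ones(3, dtype=int)] * 2
print(is_object(object_like, tables_obj, tables_non), is_object(other, tables_obj, tables_non))  # True False
```

Each table only models a small subset of attributes jointly with position. This keeps the number of parameters manageable compared with the full joint distribution over all attributes.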

Page 29

Number of orientations

Face – 2

Cars – 8

Page 30

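Bayes optimal classifier

An image is described by n attributes: x1, x2, …, xn

$$\frac{P(\text{Object} \mid x_1,\dots,x_n)}{P(\overline{\text{Object}} \mid x_1,\dots,x_n)} > 1
\iff
\frac{P(x_1,\dots,x_n \mid \text{Object})\,P(\text{Object})}{P(x_1,\dots,x_n \mid \overline{\text{Object}})\,P(\overline{\text{Object}})} > 1
\iff
\frac{P(x_1,\dots,x_n \mid \text{Object})}{P(x_1,\dots,x_n \mid \overline{\text{Object}})} > \frac{P(\overline{\text{Object}})}{P(\text{Object})}$$

There are too many parameters to learn: the full joint likelihood P(x1, …, xn | Object) is impractical to estimate directly.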

SE 263 R. Venkatesh Babu

Page 31

Reported results for faces

Kodak dataset: Test set: 17 images, 46 faces, 36 profile views.

ϒ=λ2


Page 32

A bigger dataset from multiple sources: 208 images, 441 faces, about 347 profile views.

Page 33

Robust Real-time Object Detection Paul Viola and Michael Jones (IJCV’04)

Integral Image with Haar Features

Training via AdaBoost

Speed-up through Attentional cascades

Page 34

Integral Image

The integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive.

Page 35

Rapid evaluation of rectangular features

Using the integral image representation, the sum over any rectangle can be computed in constant time.

For example, the sum inside rectangle D can be computed as:

ii(4) + ii(1) - ii(2) - ii(3)

As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references, respectively.
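A minimal NumPy sketch of the integral image and the four-reference rectangle sum follows. It pads one zero row and column on the top and left so that rectangles touching the image border need no special cases. This padding is an implementation choice, not part of the original definition. The corner numbering in the comments assumes the usual Viola-Jones figure: 1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right of D.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Padded integral image: ii[y + 1, x + 1] = sum of img[0..y, 0..x] (inclusive)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), from four lookups."""
    p1, p2 = ii[y, x], ii[y, x + w]            # corners 1 (top-left), 2 (top-right)
    p3, p4 = ii[y + h, x], ii[y + h, x + w]    # corners 3 (bottom-left), 4 (bottom-right)
    return int(p4 + p1 - p2 - p3)


img = np.arange(1, 17).reshape(4, 4)           # toy 4x4 image with pixel values 1..16
ii = integral_image(img)
print(rect_sum(ii, x=1, y=1, w=2, h=2), img[1:3, 1:3].sum())  # 34 34
```

The cost of `rect_sum` does not depend on the rectangle size. This is what makes exhaustive feature evaluation affordable.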

Page 36

Haar Features

Three rectangle feature types:

• two-rectangle feature type (horizontal/vertical)

• three-rectangle feature type

• four-rectangle feature type

Using a 24x24 pixel base detection window and all possible combinations of horizontal and vertical location and scale of these feature types, the full set contains 49,396 features.

Rectangular features are used instead of more expressive steerable filters because of their extreme computational efficiency.
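The sketch below evaluates one two-rectangle feature at every valid position of an image at once. The feature is the sum of a left w x h box minus the adjacent right w x h box. The two boxes share two corners, which is why only 6 distinct array references are needed. The function name and the left-minus-right sign convention are assumptions for illustration.

```python
import numpy as np


def two_rect_feature_map(img: np.ndarray, w: int, h: int) -> np.ndarray:
    """Left (w x h) box sum minus adjacent right (w x h) box sum, evaluated at
    every top-left position where both boxes fit inside the image."""
    ii = np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))  # padded integral image
    H, W = img.shape
    ys, xs = np.mgrid[0:H - h + 1, 0:W - 2 * w + 1]                   # all top-left corners

    def box(x0, y0):
        # Rectangle sum from four integral-image lookups, vectorized over positions.
        return ii[y0 + h, x0 + w] + ii[y0, x0] - ii[y0, x0 + w] - ii[y0 + h, x0]

    return box(xs, ys) - box(xs + w, ys)


# Toy image: dark left half, bright right half, i.e. a vertical edge at column 4.
img = np.zeros((4, 8), dtype=int)
img[:, 4:] = 10
print(two_rect_feature_map(img, w=2, h=4))  # strongest response where the boxes straddle the edge
```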

Page 37

Scanning at many Scales

At the base scale, objects are detected at 24x24 size.

The image is scanned at 11 scales with a factor of 1.25 (24x24, 30x30, 38x38, 47x47, …).

Conventional approach:

• Compute a pyramid of 11 images, each 1.25 times smaller than the previous

• Requires significant time (< 15 fps)
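The scale sequence above is base × 1.25^k for k = 0, …, 10, rounded to whole pixels. The sketch reproduces it. For context, the Viola-Jones paper avoids the image pyramid by rescaling the detector itself, since an integral-image rectangle sum costs the same at any size. The helper name is hypothetical.

```python
def detector_sizes(base: int = 24, factor: float = 1.25, num_scales: int = 11) -> list:
    """Side length (pixels) of the square detection window at each scale, rounded half up."""
    return [int(base * factor ** k + 0.5) for k in range(num_scales)]


sizes = detector_sizes()
print(sizes)  # [24, 30, 38, 47, 59, 73, 92, 114, 143, 179, 224]
```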

Page 38

AdaBoost: Intuition

K. Grauman, B. Leibe

Figure adapted from Freund and Schapire

Consider a 2-D feature space with positive and negative examples.

Each weak classifier splits the training examples with at least 50% accuracy.

Examples misclassified by a previous weak learner are given more emphasis in future rounds.

Page 39


AdaBoost: Intuition

Page 40


AdaBoost: Intuition

Page 41

AdaBoost Algorithm (Freund & Schapire 1995)

Training examples: {x1, …, xn}

• Start with uniform weights on the training examples.

• Evaluate the weighted error for each feature and pick the best.

• Re-weight: incorrectly classified -> more weight; correctly classified -> less weight.

• The final classifier is a combination of the weak ones, weighted according to the error they had.
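A minimal sketch of the steps above follows. It uses standard discrete AdaBoost with decision stumps (one feature, one threshold, one polarity) as the weak classifiers. The vote weight α = ½ ln((1 − ε)/ε) is the textbook choice. The Viola-Jones paper uses a closely related weighting, so treat the constants here as illustrative.

```python
import numpy as np


def stump_predict(X, j, thr, pol):
    """Weak classifier: +1 where pol * (x_j - thr) >= 0, else -1."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)


def train_adaboost(X, y, rounds=10):
    """X: (n, d) features; y: labels in {-1, +1}. Returns [(feature, thr, pol, alpha), ...]."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                          # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                           # weighted error of every stump
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = stump_predict(X, j, thr, pol)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best                # pick the best weak classifier
        err = np.clip(err, 1e-10, 1 - 1e-10)         # guard against log(0) / division by 0
        alpha = 0.5 * np.log((1 - err) / err)        # lower error -> larger vote
        w = w * np.exp(-alpha * y * pred)            # misclassified up, correct down
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble


def predict(ensemble, X):
    """Sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(X, j, thr, pol) for j, thr, pol, alpha in ensemble)
    return np.sign(score)


# Toy data: only feature 1 separates the classes with a single threshold.
X = np.array([[5.0, 0.0], [3.0, 1.0], [4.0, 2.0], [1.0, 3.0]])
y = np.array([-1, -1, 1, 1])
model = train_adaboost(X, y, rounds=5)
print(model[0][0], predict(model, X))   # first stump uses feature 1; predictions match y
```

In the Viola-Jones setting, each "feature" column would be one Haar-like feature response. Selecting the best stump per round is therefore also a form of feature selection.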

Page 42

Boosting Example

Page 43

First classifier

Page 44

First 2 classifiers

Page 45

First 3 classifiers

Page 46

Final Classifier learned by Boosting

-0.42-0.65+0.92 = -0.15

-0.42+0.65+0.92 = 1.15
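The two sums can be read as weighted votes α_t h_t(x) with h_t(x) ∈ {−1, +1} and α = (0.42, 0.65, 0.92). This is an interpretation, since the accompanying figure is not reproduced. Under it, the final classifier takes the sign of the sum, so the first point is labelled negative and the second positive:

```latex
H(x) = \operatorname{sign}\Big(\sum_{t=1}^{3} \alpha_t\, h_t(x)\Big),
\qquad \operatorname{sign}(-0.15) = -1,
\qquad \operatorname{sign}(1.15) = +1
```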

Page 47

Recall: Perceptron Operation

Equations of the “thresholded” operation:

$$o(x_1, x_2, \dots, x_{d-1}, x_d) =
\begin{cases}
1 & \text{if } w_1 x_1 + \dots + w_d x_d + w_{d+1} > 0 \\
-1 & \text{otherwise}
\end{cases}$$

Page 48

Performance of a 200-feature face detector

The ROC curve of the constructed classifier indicates that a reasonable detection rate of 0.95 can be achieved while maintaining an extremely low false positive rate of approximately 10⁻⁴ (1 in 14,084).

• The first features selected by AdaBoost are meaningful and have high discriminative power.

• By varying the threshold of the final classifier, one can construct a two-feature classifier with a detection rate of 1 and a false positive rate of 0.4.

• Requires 0.7 s to scan a 384x288 image!

Page 49

Speed-up through the Attentional Cascade

• Simple, boosted classifiers can reject many of the negative sub-windows while detecting almost all positive instances.

• A series of such simple classifiers can achieve good detection performance while eliminating the need for further processing of negative sub-windows.

• More difficult examples are faced by deeper classifiers.
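A minimal sketch of the cascade's control flow follows. Each stage is a (score function, threshold) pair, and a sub-window is rejected as soon as any stage scores it below its threshold. The toy stages, which score a window by a single number, are hypothetical stand-ins for boosted classifiers of increasing size.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (score function of a sub-window, stage threshold).
Stage = Tuple[Callable[[float], float], float]


def cascade_classify(window: float, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one sub-window."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i            # early rejection: later, costlier stages never run
    return True, len(stages)           # passed every stage -> reported as a detection


# Toy stages: each "window" is a single number scored directly by every stage.
stages: List[Stage] = [(lambda w: w, 0.2), (lambda w: w, 0.5), (lambda w: w, 0.8)]
windows = [0.05, 0.1, 0.15, 0.3, 0.6, 0.9]
results = [cascade_classify(w, stages) for w in windows]
print(results)
print(sum(n for _, n in results) / len(windows))   # average number of stages evaluated
```

Since most sub-windows in a real image are negatives that leave at the first stage, the average cost per window stays close to the cost of the first, cheapest stage.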

Page 50

Single vs. Cascade Classifier

The cascaded classifier is nearly 10 times faster!

Page 51

Experiments (training dataset)

4,916 positive training examples were hand-picked, aligned, normalized, and scaled to a base resolution of 24x24.

10,000 negative examples were selected by randomly picking sub-windows from 9,500 images that did not contain faces.

Page 52

Results cont.

Page 53

More Detection Examples

Page 54

Practical implementation

Details discussed in the Viola-Jones paper:

• Training time: weeks (with 5k faces and 9.5k non-faces)

• The final detector has 32 layers in the cascade and 4,297 features

• 700 MHz Pentium III processor: can process a 384 x 288 image in 0.067 seconds (in 2002, when the paper was written)

Page 55

Ensemble Tracking, Shai Avidan, CVPR’05

(AdaBoost in tracking)

Page 56

Object Localization

An ensemble of weak learners is used to create a per-pixel confidence map.

The optimal location is found by the mean shift algorithm.

The ensemble is updated at the new location.
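A minimal sketch of the localization step follows. Given a per-pixel confidence map, flat-kernel mean shift repeatedly moves a square window to the confidence-weighted centroid of the pixels it covers. The window size, iteration limit, and the synthetic Gaussian confidence blob are assumptions. In the tracker, the map would come from the boosted ensemble.

```python
import numpy as np


def mean_shift(conf, start, half_size=8, max_iter=20, tol=1e-3):
    """Flat-kernel mean shift on a 2-D confidence map; returns the final (row, col)."""
    H, W = conf.shape
    cy, cx = map(float, start)
    for _ in range(max_iter):
        ry, rx = int(round(cy)), int(round(cx))
        y0, y1 = max(0, ry - half_size), min(H, ry + half_size + 1)
        x0, x1 = max(0, rx - half_size), min(W, rx + half_size + 1)
        patch = conf[y0:y1, x0:x1]
        total = patch.sum()
        if total <= 0:                          # no confidence inside the window: stop
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny, nx = (ys * patch).sum() / total, (xs * patch).sum() / total
        moved = max(abs(ny - cy), abs(nx - cx))
        cy, cx = ny, nx                         # move to the weighted centroid
        if moved < tol:
            break
    return cy, cx


ys, xs = np.mgrid[0:60, 0:80]
conf = np.exp(-((ys - 30) ** 2 + (xs - 40) ** 2) / (2 * 4.0 ** 2))   # toy blob centred at (30, 40)
cy, cx = mean_shift(conf, start=(24, 34))
print(round(cy, 2), round(cx, 2))   # ends close to the blob centre
```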

Page 57

Weak Classifiers

Linear classifiers are used as weak classifiers: each finds the best hyperplane to separate the data.

The strong classifier is computed using AdaBoost, which determines the weight of each weak classifier and trains iteratively on “harder” examples.

Page 58

Experimental Results

Page 59

SVMs in Machine Vision

Ensemble of Exemplar-SVMs for Object Detection and Beyond (Malisiewicz et al., ICCV’11)

Page 60

Discriminative Object Detectors

Linear SVM on HOG

Hard-Negative Mining

Sliding Window Detection

Page 61

Exemplar SVMs

Learn a separate linear SVM for each instance (exemplar) in the dataset.
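A minimal sketch of training one exemplar SVM with hard-negative mining follows. It makes several assumptions:

- Feature vectors (e.g. HOG windows) are already extracted.
- A class-weighted linear SVM trained by plain full-batch subgradient descent stands in for a proper solver.
- The weights `c_pos=0.5` and `c_neg=0.01` are illustrative values that count the single positive much more heavily than each negative.

Each round retrains on a cache of negatives, then adds the highest-scoring (hardest) negatives from the pool.

```python
import numpy as np


def train_weighted_svm(X, y, c, lam=0.01, epochs=1000, lr=0.05):
    """Subgradient descent on lam/2*||w||^2 + sum_i c_i*max(0, 1 - y_i*(w.x_i + b))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                    # examples with nonzero hinge loss
        cy = (c * y)[viol]
        w = w - lr * (lam * w - (cy[:, None] * X[viol]).sum(axis=0))
        b = b + lr * cy.sum()
    return w, b


def exemplar_svm(exemplar, negative_pool, rounds=3, cache_size=50, c_pos=0.5, c_neg=0.01):
    """One linear SVM: a single positive (the exemplar) vs. mined negatives."""
    cache = negative_pool[:cache_size]                # initial negatives
    for _ in range(rounds):
        X = np.vstack([exemplar[None, :], cache])
        y = np.concatenate([[1.0], -np.ones(len(cache))])
        c = np.concatenate([[c_pos], np.full(len(cache), c_neg)])
        w, b = train_weighted_svm(X, y, c)
        scores = negative_pool @ w + b
        hardest = negative_pool[np.argsort(scores)[-cache_size:]]   # highest-scoring negatives
        cache = np.unique(np.vstack([cache, hardest]), axis=0)      # grow the negative cache
    return w, b


rng = np.random.default_rng(0)
negatives = rng.normal(size=(500, 2))   # stand-ins for feature vectors of object-free windows
exemplar = np.array([4.0, 4.0])         # the single positive instance
w, b = exemplar_svm(exemplar, negatives)
print(exemplar @ w + b > 0, np.median(negatives @ w + b) < 0)
```

An ensemble is then one such (w, b) per training instance, each run as a sliding-window detector.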

Page 62
Page 63

Exemplar SVM

Advantages: we can use different features for each exemplar

Adapt features to each exemplar’s aspect ratio

Page 64

Ensemble of Exemplar SVMs

Page 65

Results

Page 66

Image Parsing

Tighe et al., Finding Things: Image Parsing with Regions and Per-Exemplar Detectors,

CVPR’13

Page 67

Results

Page 68

Representation Learning using CNNs

Video Analytics Lab, SERC, IISc

Page 69

Why Deep Learning?

❖ To learn feature hierarchies

❖ In vision

➢ Mainly for recognition

➢ But it is being applied to almost all vision tasks

Page 70

Conventional Recognition approach

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

Features are not learned

Page 71

Conventional Recognition approach

❖ Classifiers are often generic

❖ Features have been the key to progress in recognition so far

❖ Multitude of hand-designed features

➢ SIFT, HOG, LBP, MSER, Color-SIFT etc.

Page 72

But why learn features?

❖ Better performance

❖ New domains where it is unclear how to hand-engineer features

➢ Kinect

➢ Video

➢ Multi spectral

❖ Feature computation time

Page 73

Deep Learning?

Learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text.

Page 74

Hierarchical Structure of Visual Cortex

N. Kruger et al.

Page 75

Lateral Geniculate Nucleus (LGN)

Page 76

Primary Visual Cortex (V1)

David Hubel and Torsten Wiesel won the Nobel Prize for discovering the functional organization and basic physiology of neurons in V1.

• Simple Cells

• Complex Cells

• Hypercomplex Cells

Page 77

Simple Cell: Hubel-Wiesel Model

Page 78

Complex Cell

Page 79

Deep Architecture

Theoretical:

“Many functions can be much more efficiently represented with deeper architectures…” [Bengio & LeCun 2007]

f_l takes as input a datum x_l and a parameter set w_l, and outputs x_{l+1}.
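A minimal sketch of this layered composition, x_{l+1} = f_l(x_l; w_l), follows. Each hypothetical f_l here is an affine map followed by a ReLU. Real CNN layers would use convolution, normalization, and pooling, but the chaining is the same.

```python
import numpy as np


def linear_relu(x, w):
    """Hypothetical layer f_l(x_l; w_l): affine map W x + b followed by a ReLU."""
    W, b = w
    return np.maximum(0.0, W @ x + b)


def forward(x, layers):
    """Apply x_{l+1} = f_l(x_l; w_l) for each (f_l, w_l) in order."""
    for f, w in layers:
        x = f(x, w)
    return x


rng = np.random.default_rng(0)
layers = [(linear_relu, (rng.normal(size=(8, 4)), np.zeros(8))),   # 4 -> 8
          (linear_relu, (rng.normal(size=(3, 8)), np.zeros(3)))]   # 8 -> 3
out = forward(np.ones(4), layers)
print(out.shape)  # (3,)
```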

Page 80

Learning a Hierarchy of Feature Extractors

❖ Each layer extracts features from the output of the previous layer

❖ All the way from pixels to classifier

❖ Layers have (nearly) the same structure

❖ Train all layers jointly

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

Page 81

Learning a Hierarchy of Feature Extractors

❖ Stack multiple stages of simple cells / complex cells layers

❖ Higher stages compute more global, more invariant features

❖ Classification layer on top

Page 82

Natural progression from low-level to high-level structures.

Lower-level representations can be shared across multiple tasks.

Deep architectures can be representationally efficient.

Page 83

Typical CNN Operations

❖ Filtering (Convolution)

❖ Contrast Normalization

❖ Local Pooling (Sub-sampling)

Page 84

2D Convolution

Image from http://developer.amd.com
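A minimal NumPy sketch of 2-D convolution in "valid" mode (no padding, stride 1) follows. It flips the kernel, as in the textbook definition. CNN libraries typically skip the flip and compute cross-correlation instead, which only changes how the learned weights are laid out. The function name and the example are illustrative.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D convolution: flip the kernel, slide it over the image, and
    take the sum of elementwise products at each position."""
    k = kernel[::-1, ::-1]                     # true convolution flips the kernel
    kh, kw = k.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out


# A step edge convolved with a horizontal difference kernel responds only at the edge.
img = np.array([[0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
edges = conv2d_valid(img, np.array([[1.0, -1.0]]))
print(edges)
```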

Page 85

Image Convolution / Filtering

❖ Convolutional

➢ Translation equivariance

➢ Tied filter weights (same at each position: few parameters)

Feature Maps

Page 86

Translation Equivariance

❖ Input translation results in translation of features

➢ Fewer filters needed: no translated replications

➢ But still need to cover orientation/frequency

Convolutional Filters
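The following is a small numerical check of translation equivariance. It assumes circular (wrap-around) borders so that the property holds exactly. With zero padding it holds only away from the image borders. `circular_correlate` is a hypothetical helper.

```python
import numpy as np


def circular_correlate(image, kernel):
    """Cross-correlation with wrap-around borders:
    out[y, x] = sum_{i,j} kernel[i, j] * image[(y + i) % H, (x + j) % W]."""
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
img, ker = rng.normal(size=(6, 6)), rng.normal(size=(3, 3))
shift = (2, 1)
lhs = circular_correlate(np.roll(img, shift, axis=(0, 1)), ker)   # filter the shifted input
rhs = np.roll(circular_correlate(img, ker), shift, axis=(0, 1))   # shift the filtered output
print(np.allclose(lhs, rhs))  # True
```

Because the same weights are applied at every position, one filter detects its pattern wherever it appears. No translated copies of the filter are needed.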

Page 87

CNN: Convolution in 3D

Image from http://deeplearning.net
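A minimal sketch of a convolutional layer on a multi-channel input volume follows. Each filter spans all input channels and produces one feature map. Following common CNN practice, it computes cross-correlation (no kernel flip). The shapes and names are assumptions for illustration. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_layer(x: np.ndarray, filters: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """x: (C, H, W) input; filters: (K, C, kh, kw); bias: (K,).
    Returns K feature maps of shape (H - kh + 1, W - kw + 1)."""
    _, _, kh, kw = filters.shape
    patches = sliding_window_view(x, (kh, kw), axis=(1, 2))   # (C, H', W', kh, kw)
    out = np.einsum("chwij,kcij->khw", patches, filters)      # sum over channels and window
    return out + bias[:, None, None]


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))          # e.g. an 8x8 RGB patch
f = rng.normal(size=(4, 3, 3, 3))       # 4 filters of size 3x3x3
bias = rng.normal(size=4)
maps = conv_layer(x, f, bias)
print(maps.shape)  # (4, 6, 6)
```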

Page 88

Normalization

❖ Contrast normalization

➢ Across feature maps or within the maps

❖ Each feature is scaled by

❖ α and β are parameters, n: size of the local region

❖ Induces local competition between features to explain input
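The following is a sketch of one common form of this normalization. It assumes the cross-map local response normalization used in AlexNet, b_i = a_i / (k + α Σ_j a_j²)^β, where the sum runs over n adjacent feature maps. The additive constant k is part of that assumption and is not named above. The defaults (n = 5, k = 2, α = 1e-4, β = 0.75) are the values reported for AlexNet.

```python
import numpy as np


def local_response_norm(a: np.ndarray, n: int = 5, k: float = 2.0,
                        alpha: float = 1e-4, beta: float = 0.75) -> np.ndarray:
    """Cross-map normalization of a (channels, H, W) activation volume:
    b[i] = a[i] / (k + alpha * sum of a[j]**2 over the n maps centred on i) ** beta."""
    C = a.shape[0]
    sq = a ** 2
    b = np.empty_like(a, dtype=float)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C, i + n // 2 + 1)   # neighbouring maps, clipped
        b[i] = a[i] / (k + alpha * sq[lo:hi].sum(axis=0)) ** beta
    return b


# A strongly active map suppresses its weaker neighbours (local competition).
a = np.array([1.0, 10.0, 1.0]).reshape(3, 1, 1)
b = local_response_norm(a, n=3, k=1.0, alpha=1.0, beta=1.0)
print(b.ravel())
```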

Page 89

Local Pooling

Images by Zhu et al. and http://vaaaaaanquish.hatenablog.com

Page 90

Pooling

❖ Spatial Pooling

❖ Non-overlapping / overlapping regions

❖ Sum or max

❖ Invariance to small transformations

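A minimal sketch of spatial sum and max pooling on a single feature map follows. Regions are `size` x `size`, taken every `stride` pixels, so `stride < size` gives overlapping regions. The helper name and defaults are illustrative. The second example shows the small-shift invariance of max pooling.

```python
import numpy as np


def pool2d(x: np.ndarray, size: int = 2, stride: int = 2, mode: str = "max") -> np.ndarray:
    """Pool a (H, W) feature map with 'max' or 'sum' over size x size regions."""
    if mode not in ("max", "sum"):
        raise ValueError("mode must be 'max' or 'sum'")
    reduce = np.max if mode == "max" else np.sum
    H, W = x.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reduce(x[i * stride:i * stride + size, j * stride:j * stride + size])
    return out


x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 1],
              [0, 0, 5, 6],
              [0, 2, 7, 8]])
print(pool2d(x, mode="max"))   # [[4 1] [2 8]]
print(pool2d(x, mode="sum"))   # [[10 1] [2 26]]

# A peak shifted by one pixel inside the same pooling region gives the same max-pooled output.
a, b = np.zeros((4, 4), dtype=int), np.zeros((4, 4), dtype=int)
a[0, 0], b[1, 1] = 9, 9
print(np.array_equal(pool2d(a), pool2d(b)))  # True
```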

Page 91

Example Nets

Page 92

CNN Applications

❖ Image recognition, speech recognition, photo taggers

❖ Have won several competitions

➢ ImageNet, Kaggle Facial Expression and Multimodal Learning,

German Traffic Signs, Connectomics, Handwriting etc.

❖ Applicable to array data where nearby values are correlated

➢ Images, sound, time-frequency representations, video, volumetric

images, RGB-Depth images etc.

❖ Reading Text in the Wild

❖ One of the few models that can be trained in a purely supervised way

Page 93

Software Tools

Caffe: From Berkeley

Torch7: www.torch.ch

OverFeat: From NYU

Cuda-Convnet: http://code.google.com/p/cuda-convnet/

MatConvnet: CNNs for MATLAB

Theano: http://deeplearning.net/software/theano/