ICCV 2009 recognition and learning object categories p2 c02 - recognizing multiple objects in an...

Multiclass object detection

Context: objects appear in configurations

Generalization: objects share parts

How many categories?

Slide by Aude Oliva

“Muchas” (Spanish for “many”)

How many object categories are there?

Biederman 1987

How many categories?

• Probably this question is not even specific enough to have an answer

Which level of categorization is the right one?

A car is an object composed of: a few doors, four wheels (not all visible at all times), a roof, front lights, a windshield

If you are thinking of buying a car, you might want to be a bit more specific about your categorization level.

?

Entry-level categories (Jolicoeur, Gluck, Kosslyn 1984)

• Typical members of a basic-level category are categorized at the expected level.

• Atypical members tend to be classified at a subordinate level.

A bird

An ostrich

We do not need to recognize the exact category

A new class can borrow information from similar categories

So, where is computer vision?

Well…

Multiclass object detection: the not so early days


Using a set of independent binary classifiers was a common strategy:

• Schneiderman-Kanade multiclass object detection

• Viola-Jones extension for dealing with rotations: two cascades for each view

(a) One detector for each class

There is nothing wrong with this approach if you have access to lots of training data and you do not care about efficiency.

Generalizing Across Categories

Can we transfer knowledge from one object category to another?

Slide by Erik Sudderth

Shared features

• Is learning object class 1000 easier than learning the first?

• Can we transfer knowledge from one object to another?

• Are the shared properties interesting by themselves?

…

Multitask learning

R. Caruana. Multitask Learning. ML 1997

“MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks. It does this by training tasks in parallel while using a shared representation”.

vs.

Sejnowski & Rosenberg 1986; Hinton 1986; Le Cun et al. 1989; Suddarth & Kergosien 1990; Pratt et al. 1991; Sharkey & Sharkey 1992; …

Multitask learning

Primary task: detect door knobs

Tasks used:

• horizontal location of doorknob

• single or double door

• horizontal location of doorway center

• width of doorway

• horizontal location of left door jamb

• horizontal location of right door jamb

• width of left door jamb

• width of right door jamb

• horizontal location of left edge of door

• horizontal location of right edge of door

R. Caruana. Multitask Learning. ML 1997
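The idea of training tasks in parallel on a shared representation can be sketched as a network with one hidden layer used by every task and one linear output per task. This is a minimal sketch, not Caruana's setup: the layer sizes, the tanh nonlinearity, the squared-error loss, and all function names are assumptions made for illustration.

```python
import numpy as np

def init_params(n_inputs, n_hidden, n_tasks, seed=0):
    """Random weights: one shared hidden layer, one output column per task."""
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(scale=0.1, size=(n_inputs, n_hidden)),
        "W_heads": rng.normal(scale=0.1, size=(n_hidden, n_tasks)),
    }

def forward(params, x):
    """Compute the shared hidden code once, then every task output from it."""
    hidden = np.tanh(x @ params["W_shared"])
    outputs = hidden @ params["W_heads"]
    return hidden, outputs

def multitask_loss(params, x, targets):
    """Mean squared error averaged over all samples and all tasks."""
    _, outputs = forward(params, x)
    return float(np.mean((outputs - targets) ** 2))

def gradient_step(params, x, targets, lr=0.05):
    """One gradient-descent step; every task's error reaches W_shared."""
    hidden, outputs = forward(params, x)
    d_out = 2.0 * (outputs - targets) / outputs.size
    grad_heads = hidden.T @ d_out
    d_hidden = (d_out @ params["W_heads"].T) * (1.0 - hidden ** 2)
    grad_shared = x.T @ d_hidden
    return {
        "W_shared": params["W_shared"] - lr * grad_shared,
        "W_heads": params["W_heads"] - lr * grad_heads,
    }

# Toy usage with made-up data: 20 samples, 4 inputs, 3 related tasks.
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 4))
targets = rng.normal(size=(20, 3))
params = init_params(n_inputs=4, n_hidden=5, n_tasks=3)
for _ in range(50):
    params = gradient_step(params, x, targets)
```

The point of the sketch is that the gradient for `W_shared` sums contributions from all task outputs, so the auxiliary tasks shape the features used by the primary task.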

Sharing invariances

S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996

Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks.

Toy world

Without sharing

With sharing

Convolutional Neural Network

Translation invariance is already built into the network

The output neurons share all the intermediate levels

Le Cun et al, 98

Sharing transformations

Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.

Transformations are shared and can be learnt from other tasks.

Sharing in constellation models

Pictorial Structures: Fischler & Elschlager, IEEE Trans. Comp. 1973

Constellation Model: Fei-Fei, Fergus, Perona, ICCV 2003

SVM Detectors: Heisele, Poggio, et al., NIPS 2001

Model-Guided Segmentation: Mori, Ren, Efros, & Malik, CVPR 2004

Reusable Parts

Goal: Look for a vocabulary of edges that reduces the number of features.

Krempp, Geman, & Amit “Sequential Learning of Reusable Parts for Object Detection”. TR 2002

[Plot: number of features vs. number of classes]

Examples of reused parts

Specific feature

Non-shared feature: this feature is too specific to faces.

pedestrian

chair

Traffic light

sign

face

Background class

Shared feature

shared feature

Additive models and boosting

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

• Binary classifiers that share features: screen detector, car detector, face detector

• Independent binary classifiers: screen detector, car detector, face detector
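The contrast between independent and feature-sharing classifiers can be sketched with an additive model whose weak learners are regression stumps shared by a subset of classes. This is a minimal sketch, loosely following the stump form used in joint boosting; the stump parameters below are made up, the field names are hypothetical, and the learning step that chooses which classes share each stump is omitted.

```python
import numpy as np

def shared_stump(v, stump):
    """Evaluate one weak learner for all classes with a single feature test.

    Classes in stump["sharing"] get a * [v_f > threshold] + b.
    All other classes get their own constant from stump["k"].
    """
    fires = 1.0 if v[stump["feature"]] > stump["threshold"] else 0.0
    out = np.array(stump["k"], dtype=float)
    for c in stump["sharing"]:
        out[c] = stump["a"] * fires + stump["b"]
    return out

def classify(v, stumps):
    """Additive model: per-class score is the sum of all weak learners."""
    return sum(shared_stump(v, s) for s in stumps)

# Illustrative classes: 0 = screen, 1 = car, 2 = face (values are invented).
stumps = [
    {"feature": 0, "threshold": 0.5, "a": 2.0, "b": -1.0,
     "sharing": [0, 1], "k": [0.0, 0.0, -0.5]},
    {"feature": 1, "threshold": 0.2, "a": 1.0, "b": -0.5,
     "sharing": [2], "k": [-0.2, -0.2, 0.0]},
]
scores = classify(np.array([0.9, 0.1]), stumps)
```

With shared stumps, the number of feature tests per image window equals the number of stumps, whatever the number of classes; with independent classifiers each class pays for its own set of features.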

50 training samples/class

29 object classes

2000 entries in the dictionary

Results averaged on 20 runs

Error bars = 80% interval


Shared features

Class-specific features

Generalization as a function of object similarities

[Two plots: area under ROC vs. number of training samples per class. Panel titles: 12 viewpoints; 12 unrelated object classes. Annotations: K = 2.1, K = 4.8]


Opelt, Pinz, Zisserman, CVPR 2006

Efficiency Generalization

J. Shotton, A. Blake, R. Cipolla. Multi-Scale Categorical Object Recognition Using Contour Fragments. In IEEE Trans. on PAMI, 30(7):1270-1281, July 2008.

Sharing patches

• Bart and Ullman, 2004

For a new class, use only features similar to features that were good for other classes:

Proposed Dog features

Some more references

• Baxter 1996

• Caruana 1997

• Schapire, Singer, 2000

• Thrun, Pratt 1997

• Krempp, Geman, Amit, 2002

• E.L. Miller, Matsakis, Viola, 2000

• Mahamud, Hebert, Lafferty, 2001

• Fink et al. 2003, 2004

• LeCun, Huang, Bottou, 2004

• Holub, Welling, Perona, 2005

• …

Modeling object relationships

The “guess what I am trying to detect” challenge

The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object it is trying to detect?

What object is the detector trying to detect?


1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard.

The context challenge

How far can you go without using an object detector?


What are the hidden objects?


Chance ~ 1/30000

$p(O \mid I) \propto p(I \mid O)\, p(O)$

Object model: $p(I \mid O)$. Context model: $p(O)$. Here $I$ is the image and $O$ the objects.



Models for $p(O)$: full joint, scene model, approximate joint

Scene model, with scene categories such as office and street:

$p(O) = \sum_s \prod_i p(O_i \mid S = s)\, p(S = s)$
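The scene model treats objects as conditionally independent given the scene, so the prior over objects is a mixture over scenes: the sum over scenes s of p(S=s) times the product over objects i of p(O_i | S=s). A minimal sketch of that computation follows, assuming each O_i is a binary present/absent variable; the two scenes, the two objects, and every probability value are invented for illustration.

```python
import numpy as np

def object_prior(objects, p_scene, p_obj_given_scene):
    """Mixture-over-scenes prior for a binary object-presence vector.

    objects:            shape (n_objects,), 1 = present, 0 = absent
    p_scene:            shape (n_scenes,), p(S = s)
    p_obj_given_scene:  shape (n_scenes, n_objects), p(O_i = 1 | S = s)
    """
    objects = np.asarray(objects, dtype=float)
    p_obj_given_scene = np.asarray(p_obj_given_scene, dtype=float)
    # Bernoulli likelihood of each object's state under each scene.
    per_object = (p_obj_given_scene * objects
                  + (1.0 - p_obj_given_scene) * (1.0 - objects))
    # Product over objects, then weighted sum over scenes.
    per_scene = per_object.prod(axis=1)
    return float(per_scene @ np.asarray(p_scene, dtype=float))

# Invented numbers: scenes = (office, street), objects = (screen, car).
p_scene = [0.5, 0.5]
p_obj_given_scene = [[0.9, 0.1],   # office: screens likely, cars unlikely
                     [0.1, 0.8]]   # street: screens unlikely, cars likely
p_screen_only = object_prior([1, 0], p_scene, p_obj_given_scene)
```

Although objects are independent given the scene, marginalizing over the scene makes them dependent: seeing a screen raises the probability of "office" and so lowers the probability of a car.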

Pixel labeling using MRFs

Enforce consistency between neighboring labels, and between labels and pixels

Carbonetto, de Freitas & Barnard, ECCV’04

Oi

Beyond nearest-neighbor grids

• Most MRF/CRF models assume nearest-neighbor graph topology

• This cannot capture long-distance correlations

Object-Object Relationships

Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF)

He, Zemel & Carreira-Perpinan (04)


[Kumar Hebert 2005]

• Fink & Perona (NIPS 03)

Use output of boosting from other objects at previous iterations as input into boosting for this iteration


Objects in Context

Building, boat, motorbike

Building, boat, person

Water, sky

Road

Most consistent labeling according to object co-occurrences & local label probabilities.

Boat

Building

Water

Road

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007


Objects in Context: Contextual Refinement

Contextual model based on co-occurrences.

Try to find the most consistent labeling with high posterior probability and high mean pairwise interaction. Use a CRF for this purpose.

Boat

Building

Water

Road

Independent segment classification

Mean interaction of all label pairs

Φ(i,j) is basically the observed co-occurrence of labels i and j in the training set.
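The refinement step can be sketched as scoring each joint labeling by its local log-probabilities plus the mean co-occurrence over all label pairs. This is a minimal sketch under stated assumptions: exhaustive search stands in for CRF inference, the scoring function and its `weight` parameter are illustrative rather than the paper's potential, counting a label as co-occurring with itself is an assumption, and the labels and numbers are invented.

```python
import itertools
import numpy as np

def cooccurrence_counts(training_label_sets, n_labels):
    """Count in how many training images each pair of labels appears together."""
    phi = np.zeros((n_labels, n_labels))
    for labels in training_label_sets:
        present = sorted(set(labels))
        for i, j in itertools.combinations_with_replacement(present, 2):
            phi[i, j] += 1
            if i != j:
                phi[j, i] += 1
    return phi

def most_consistent_labeling(local_probs, phi, weight=1.0):
    """Brute-force search over all labelings of the segments.

    local_probs: shape (n_segments, n_labels), per-segment label probabilities
    Score = sum of log local probabilities + weight * mean pairwise phi.
    """
    n_segments, n_labels = local_probs.shape
    best, best_score = None, -np.inf
    for labeling in itertools.product(range(n_labels), repeat=n_segments):
        unary = sum(np.log(local_probs[k, l]) for k, l in enumerate(labeling))
        pairs = list(itertools.combinations(labeling, 2))
        pairwise = np.mean([phi[i, j] for i, j in pairs]) if pairs else 0.0
        score = unary + weight * pairwise
        if score > best_score:
            best, best_score = labeling, score
    return best, best_score

# Invented example: 0 = boat, 1 = water, 2 = motorbike, 3 = road.
phi = cooccurrence_counts([[0, 1], [0, 1], [2, 3], [2, 3]], n_labels=4)
local_probs = np.array([
    [0.05, 0.85, 0.05, 0.05],   # segment 0: clearly water
    [0.40, 0.05, 0.45, 0.10],   # segment 1: motorbike slightly ahead of boat
])
independent = tuple(int(l) for l in local_probs.argmax(axis=1))
refined, _ = most_consistent_labeling(local_probs, phi)
```

In this toy case the independent classification of segment 1 is "motorbike", while the refined labeling switches it to "boat" because boats co-occur with water in the training sets. Exhaustive search is exponential in the number of segments, so it only suits a handful of segments.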

Slide by Gokberk Cinbis

Detecting difficult objects

Office

Maybe there is a mouse

Start recognizing the scene

Torralba, Murphy, Freeman. NIPS 2004.

Detecting difficult objects

First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)

Torralba, Murphy, Freeman. NIPS 2004.

BRF for car detection: topology

Torralba Murphy Freeman (2004)

BRF for car detection: results


A “car” out of context is less of a car

Car Building Road

b F G b F G b F G

From image

From detectors

Thresholded beliefs

Contextual object relationships

Carbonetto, de Freitas & Barnard (2004)

Kumar, Hebert (2005)

Fink & Perona (2003)

E. Sudderth et al. (2005)
