ICCV 2009 recognition and learning object categories p2 c02 - recognizing multiple objects in an...
Multiclass object detection
Context: objects appear in configurations
Generalization: objects share parts
How many categories?
Slide by Aude Oliva
“Muchas” (“Many”)
How many object categories are there?
Biederman 1987
How many categories?
• Probably this question is not even specific enough to have an answer
Which level of categorization is the right one?
Car is an object composed of: a few doors, four wheels (not all visible at all times), a roof, front lights, windshield
If you are thinking of buying a car, you might want to be a bit more specific about your categorization level.
Entry-level categories (Jolicoeur, Gluck, Kosslyn 1984)
• Typical members of a basic-level category are categorized at the expected level.
• Atypical members tend to be classified at a subordinate level.
A bird
An ostrich
We do not need to recognize the exact category
A new class can borrow information from similar categories
So, where is computer vision?
Well…
Multiclass object detection: the not so early days
Using a set of independent binary classifiers was a common strategy:
• Schneiderman-Kanade multiclass object detection
• Viola-Jones extension for dealing with rotations
- two cascades for each view
(a) One detector for each class
There is nothing wrong with this approach if you have access to lots of training data and you do not care about efficiency.
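The one-detector-per-class strategy can be sketched as follows. This is a minimal illustration, not the Schneiderman-Kanade or Viola-Jones detector: the toy "detector" (a difference of class means used as linear weights) and the names `train_binary_detector`, `train_one_per_class`, and `detect` are assumptions made for this example. The point is that every class is trained and evaluated on its own, so both training-data needs and run-time cost grow with the number of classes.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Detector = Tuple[Vector, float]  # (weights, bias) of a toy linear detector


def train_binary_detector(positives: List[Vector], negatives: List[Vector]) -> Detector:
    """Toy binary detector: weights are the difference of the class means."""
    dim = len(positives[0])
    mean_pos = [sum(x[d] for x in positives) / len(positives) for d in range(dim)]
    mean_neg = [sum(x[d] for x in negatives) / len(negatives) for d in range(dim)]
    weights = [p - n for p, n in zip(mean_pos, mean_neg)]
    # Put the decision threshold halfway between the two projected means.
    bias = -0.5 * sum(w * (p + n) for w, p, n in zip(weights, mean_pos, mean_neg))
    return weights, bias


def train_one_per_class(data: Dict[str, List[Vector]]) -> Dict[str, Detector]:
    """Train one independent class-vs-rest detector per class; nothing is shared."""
    detectors = {}
    for name, positives in data.items():
        negatives = [x for other, xs in data.items() if other != name for x in xs]
        detectors[name] = train_binary_detector(positives, negatives)
    return detectors


def detect(detectors: Dict[str, Detector], x: Vector) -> Dict[str, bool]:
    """Run every detector on x; the cost is linear in the number of classes."""
    return {
        name: sum(w * v for w, v in zip(weights, x)) + bias > 0
        for name, (weights, bias) in detectors.items()
    }


# Hypothetical 2-D feature vectors for two classes.
toy_data = {
    "face": [[1.0, 0.0], [0.9, 0.1]],
    "car": [[0.0, 1.0], [0.1, 0.9]],
}
toy_detectors = train_one_per_class(toy_data)
print(detect(toy_detectors, [1.0, 0.0]))
```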
Generalizing Across Categories
Can we transfer knowledge from one object category to another?
Slide by Erik Sudderth
Shared features
• Is learning object class 1000 easier than learning the first?
• Can we transfer knowledge from one object to another?
• Are the shared properties interesting by themselves?
…
Multitask learning
R. Caruana. Multitask Learning. ML 1997
“MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks. It does this by training tasks in parallel while using a shared representation”.
vs.
Sejnowski & Rosenberg 1986; Hinton 1986; Le Cun et al. 1989; Suddarth & Kergosien 1990; Pratt et al. 1991; Sharkey & Sharkey 1992; …
Multitask learning
Primary task: detect door knobs
Tasks used:
• horizontal location of doorknob
• single or double door
• horizontal location of doorway center
• width of doorway
• horizontal location of left door jamb
• horizontal location of right door jamb
• width of left door jamb
• width of right door jamb
• horizontal location of left edge of door
• horizontal location of right edge of door
R. Caruana. Multitask Learning. ML 1997
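The idea of training tasks in parallel over a shared representation can be sketched with a tiny network: one shared hidden layer and one linear output head per task. This is only an illustrative sketch under assumed choices (tanh hidden units, squared error, plain SGD, the names `forward` and `sgd_step`); it is not Caruana's experimental setup. Note how the gradient reaching the shared layer is the sum of the error signals from all tasks.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(shared_w: Matrix, heads: Matrix, x: Vector) -> Tuple[Vector, Vector]:
    """Shared hidden layer followed by one linear output per task."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in shared_w]
    outputs = [sum(a * h for a, h in zip(head, hidden)) for head in heads]
    return hidden, outputs


def sgd_step(shared_w: Matrix, heads: Matrix, x: Vector,
             targets: Vector, lr: float = 0.1) -> float:
    """One in-place SGD step on the summed squared error of all tasks.

    Returns the loss measured before the update.
    """
    hidden, outputs = forward(shared_w, heads, x)
    errors = [o - t for o, t in zip(outputs, targets)]
    # Every task contributes to the gradient of the shared representation.
    grad_hidden = [
        sum(e * head[j] for e, head in zip(errors, heads))
        for j in range(len(hidden))
    ]
    # Task-specific heads: each one sees only its own error.
    for head, e in zip(heads, errors):
        for j, h in enumerate(hidden):
            head[j] -= lr * e * h
    # Shared weights: backpropagate through tanh.
    for j, row in enumerate(shared_w):
        scale = grad_hidden[j] * (1.0 - hidden[j] ** 2)
        for k, v in enumerate(x):
            row[k] -= lr * scale * v
    return 0.5 * sum(e * e for e in errors)


# Hypothetical example: two related tasks trained on one input.
shared = [[0.5, -0.5], [0.3, 0.8]]
task_heads = [[0.1, 0.2], [-0.3, 0.4]]
for _ in range(5):
    print(round(sgd_step(shared, task_heads, [1.0, 0.5], [1.0, 0.0]), 4))
```

In a door-knob setting like the one above, the primary task would be one head and the auxiliary tasks the remaining heads; only the primary head would be used at test time.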
Sharing invariances
S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996
Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks.
Toy world
Without sharing
With sharing
Convolutional Neural Network
Translation invariance is already built into the network
The output neurons share all the intermediate levels
Le Cun et al, 98
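A small 1-D sketch of why weight sharing builds translation handling into a convolutional layer: the same kernel is applied at every position, so shifting the input shifts the response map, and a global max over the map is unchanged. This is an illustration under assumed toy values, not the architecture of Le Cun et al.; the function name `conv1d_valid` is hypothetical.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide one shared kernel over the signal (cross-correlation, no padding)."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(k * signal[i + j] for j, k in enumerate(kernel))
        for i in range(n)
    ]


kernel = [1.0, 0.0, -1.0]
signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same pattern, moved right by 2

response = conv1d_valid(signal, kernel)
response_shifted = conv1d_valid(shifted, kernel)

# The response map moves with the input (equivariance) ...
print(response[:4] == response_shifted[2:])
# ... and a global max-pool over positions does not change (invariance).
print(max(response) == max(response_shifted))
```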
Sharing transformations
Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.
Transformations are shared and can be learnt from other tasks.
Sharing in constellation models
Pictorial Structures: Fischler & Elschlager, IEEE Trans. Comp. 1973
Constellation Model: Fei-Fei, Fergus, Perona, ICCV 2003
SVM Detectors: Heisele, Poggio, et al., NIPS 2001
Model-Guided Segmentation: Mori, Ren, Efros, & Malik, CVPR 2004
Reusable Parts
Goal: Look for a vocabulary of edges that reduces the number of features.
Krempp, Geman, & Amit “Sequential Learning of Reusable Parts for Object Detection”. TR 2002
Plot axes: number of features, number of classes
Examples of reused parts
Specific feature
Non-shared feature: this feature is too specific to faces.
pedestrian
chair
Traffic light
sign
face
Background class
Shared feature
Additive models and boosting
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
• Binary classifiers that share features: screen detector, car detector, face detector
• Independent binary classifiers: screen detector, car detector, face detector
50 training samples/class
29 object classes
2000 entries in the dictionary
Results averaged on 20 runs
Error bars = 80% interval
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
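The contrast between classifiers that share features and independent classifiers can be sketched with an additive model in which each weak learner carries the set of classes that reuse it. This is a simplified sketch under stated assumptions, not the training procedure from the cited papers: the stumps, thresholds, and sharing sets below are hand-picked, hypothetical values, and the names `eval_stump`, `score_all_classes`, and `independent_cost` are made up for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

# (feature index, threshold, value if above threshold, value otherwise)
Stump = Tuple[int, float, float, float]
SharedLearner = Tuple[Stump, FrozenSet[str]]


def eval_stump(stump: Stump, x: List[float]) -> float:
    idx, threshold, above, below = stump
    return above if x[idx] > threshold else below


def score_all_classes(learners: List[SharedLearner], x: List[float],
                      classes: List[str]) -> Dict[str, float]:
    """Additive scores H(x, c): each weak learner is evaluated once
    and its response is added to every class in its sharing set."""
    scores = {c: 0.0 for c in classes}
    for stump, sharing_set in learners:
        response = eval_stump(stump, x)  # computed once, reused by several classes
        for c in sharing_set:
            scores[c] += response
    return scores


def independent_cost(learners: List[SharedLearner]) -> int:
    """Stump evaluations needed if every class kept a private copy of its learners."""
    return sum(len(sharing_set) for _, sharing_set in learners)


classes = ["screen", "car", "face"]
learners = [
    ((0, 0.5, 1.0, -1.0), frozenset({"screen", "car", "face"})),  # generic feature
    ((1, 0.5, 1.0, -1.0), frozenset({"screen", "car"})),          # shared by two classes
    ((2, 0.5, 2.0, -2.0), frozenset({"face"})),                   # class-specific feature
]
x = [0.8, 0.2, 0.6]
print(score_all_classes(learners, x, classes))
print(len(learners), "shared evaluations vs", independent_cost(learners), "independent")
```

With sharing, the number of feature evaluations is the number of weak learners; without it, the same scores would cost one evaluation per (learner, class) pair.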
Shared features
Class-specific features
Generalization as a function of object similarities
Plots: area under ROC vs. number of training samples per class, for 12 viewpoints and for 12 unrelated object classes (K = 2.1, K = 4.8)
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
Opelt, Pinz, Zisserman, CVPR 2006
Efficiency Generalization
J. Shotton, A. Blake, R. Cipolla. Multi-Scale Categorical Object Recognition Using Contour Fragments. In IEEE Trans. on PAMI, 30(7):1270-1281, July 2008.
Sharing patches
• Bart and Ullman, 2004
For a new class, use only features similar to features that were good for other classes:
Proposed Dog features
Some more references
• Baxter 1996
• Caruana 1997
• Schapire, Singer, 2000
• Thrun, Pratt 1997
• Krempp, Geman, Amit, 2002
• E.L. Miller, Matsakis, Viola, 2000
• Mahamud, Hebert, Lafferty, 2001
• Fink et al. 2003, 2004
• LeCun, Huang, Bottou, 2004
• Holub, Welling, Perona, 2005
• …
Modeling object relationships
The “guess what I am trying to detect” challenge
The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object it is trying to detect?
What object is the detector trying to detect?
1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard.
The context challenge
How far can you go without using an object detector?
What are the hidden objects?
Chance ~ 1/30000
$p(O \mid I) \propto p(I \mid O)\, p(O)$
Object model: $p(I \mid O)$. Context model: $p(O)$.
I: image. O: objects.
Options for the context model: full joint, scene model, approx. joint.
Scene model (scenes such as office, street):
$p(O) = \sum_{s} \prod_{i} p(O_i \mid S = s)\, p(S = s)$
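The scene-model factorization of the context prior, p(O) = sum over scenes s of p(S=s) times the product over objects i of p(O_i | S=s), can be evaluated directly. The sketch below assumes binary object-presence variables and uses made-up probabilities for two scenes; the function name `scene_prior` and every number are illustrative assumptions, not values from the slides.

```python
from typing import Dict


def scene_prior(objects_present: Dict[str, bool],
                p_scene: Dict[str, float],
                p_obj_given_scene: Dict[str, Dict[str, float]]) -> float:
    """p(O) = sum_s p(S=s) * prod_i p(O_i | S=s) for binary presence variables."""
    total = 0.0
    for scene, p_s in p_scene.items():
        likelihood = 1.0
        for obj, present in objects_present.items():
            p = p_obj_given_scene[scene][obj]
            likelihood *= p if present else 1.0 - p
        total += p_s * likelihood
    return total


# Hypothetical numbers: keyboards are likely in offices, cars on streets.
p_scene = {"office": 0.5, "street": 0.5}
p_obj_given_scene = {
    "office": {"keyboard": 0.8, "car": 0.1},
    "street": {"keyboard": 0.1, "car": 0.9},
}

joint = scene_prior({"keyboard": True, "car": False}, p_scene, p_obj_given_scene)
print(round(joint, 3))  # 0.5 * (0.8 * 0.9) + 0.5 * (0.1 * 0.1) = 0.365
```

Objects are independent only given the scene; marginalizing over the scene couples them, which is what lets a keyboard make a nearby car less plausible.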
Pixel labeling using MRFs
Enforce consistency between neighboring labels, and between labels and pixels
Carbonetto, de Freitas & Barnard, ECCV’04
Beyond nearest-neighbor grids
• Most MRF/CRF models assume nearest-neighbor graph topology
• This cannot capture long-distance correlations
Object-Object Relationships
Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF)
He, Zemel & Carreira-Perpinan (04)
Object-Object Relationships
[Kumar Hebert 2005]
• Fink & Perona (NIPS 03)
Use output of boosting from other objects at previous iterations as input into boosting for this iteration
Object-Object Relationships
Objects in Context
Building, boat, motorbike
Building, boat, person
Water, sky
Road
Most consistent labeling according to object co-occurrences & local label probabilities.
Boat
Building
Water
Road
A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007
Objects in Context: Contextual Refinement
Contextual model based on co-occurrences.
Try to find the most consistent labeling with high posterior probability and high mean pairwise interaction.
Use CRF for this purpose.
Boat
Building
Water
Road
Independent segment classification
Mean interaction of all label pairs
Φ(i,j) is essentially the observed label co-occurrence in the training set.
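The search for a labeling with high local probability and high mean pairwise interaction can be sketched by brute force on a toy example. This is a simplification, not the CRF inference of the cited paper: it enumerates all labelings, scores each as the sum of log local probabilities plus the mean of a co-occurrence term over all label pairs, and uses made-up values for both. The name `best_labeling` and the dictionary `cooccurrence` (standing in for Φ) are hypothetical.

```python
import itertools
import math
from typing import Dict, FrozenSet, List, Tuple


def best_labeling(local_probs: List[Dict[str, float]],
                  cooccurrence: Dict[FrozenSet[str], float]) -> Tuple[str, ...]:
    """Exhaustively pick the labeling that maximizes
    sum_i log p(label_i | segment_i) + mean over label pairs of phi(label_i, label_j)."""
    best, best_score = None, -math.inf
    for labeling in itertools.product(*[list(p) for p in local_probs]):
        unary = sum(math.log(p[label]) for p, label in zip(local_probs, labeling))
        pairs = list(itertools.combinations(labeling, 2))
        pairwise = 0.0
        if pairs:
            # Pairs absent from the table are treated as neutral (0.0).
            pairwise = sum(cooccurrence.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)
        score = unary + pairwise
        if score > best_score:
            best, best_score = labeling, score
    return best


# Hypothetical segment classifier outputs: segment 0 is ambiguous, segment 1 is not.
local_probs = [
    {"motorbike": 0.55, "boat": 0.45},
    {"water": 0.9, "road": 0.1},
]
# Hypothetical co-occurrence scores, higher for labels often seen together.
cooccurrence = {
    frozenset({"boat", "water"}): 2.0,
    frozenset({"motorbike", "road"}): 2.0,
    frozenset({"motorbike", "water"}): -2.0,
    frozenset({"boat", "road"}): -2.0,
}

print(best_labeling(local_probs, cooccurrence))  # ('boat', 'water')
```

Classified independently, segment 0 would be labeled motorbike; the co-occurrence term flips it to boat because the other segment is almost certainly water. Exhaustive enumeration is exponential in the number of segments, which is why an approximate inference method is needed in practice.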
Slide by Gokberk Cinbis
Detecting difficult objects
Office: maybe there is a mouse
Start recognizing the scene
Torralba, Murphy, Freeman. NIPS 2004.
Detecting difficult objects
First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)
Torralba, Murphy, Freeman. NIPS 2004.
BRF for car detection: topology
Torralba Murphy Freeman (2004)
BRF for car detection: results
Torralba Murphy Freeman (2004)
A “car” out of context is less of a car
Car Building Road
b F G b F G b F G
From image
From detectors
Thresholded beliefs
Contextual object relationships
Carbonetto, de Freitas & Barnard (2004)
Kumar, Hebert (2005)
Torralba, Murphy, Freeman (2004)
Fink & Perona (2003)
E. Sudderth et al. (2005)