High Level Computer Vision Part-Based Models for Object Class Recognition Part 1 Bernt Schiele - [email protected] Mario Fritz - [email protected] https://www.mpi-inf.mpg.de/hlcv

Page 2

High Level Computer Vision - May 23, 2016

Complexity of Recognition

Page 3

Complexity of Recognition

Page 4

Class of Object Models: Part-Based Models / Pictorial Structures

• Pictorial Structures [Fischler & Elschlager 1973]
‣ Model has two components
- parts (2D image fragments)
- structure (configuration of parts)

Page 5

“State-of-the-Art” in Object Class Representations

• Bag of Words Models (BoW)
‣ object model = histogram of local features
‣ e.g. local features around interest points
‣ BoW: no spatial relationships

• Global Object Models
‣ object model = global object feature
‣ e.g. HOG (Histogram of Oriented Gradients)
‣ e.g. HOG: fixed spatial relationships

• Part-Based Object Models
‣ object model = models of parts & spatial topology model
‣ e.g. constellation model or ISM (Implicit Shape Model)
‣ e.g. ISM: flexible spatial relationships

• But: What is the Ideal Notion of Parts here?
• And: Should those Parts be Semantic?

Page 6

Part-Based Models - Overview Today (more next week)

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

Page 7

Manually Selected Parts

• Simplest solution ‣ Let a human expert select a set of parts ‣ (If it doesn’t work, take a different human expert)

Mohan, Papageorgiou, Poggio, ‘01

Page 8

Example 1: Detection by Components

• Application ‣ Pedestrian detection

• Representation by 4 parts
‣ Part candidates are selected by a human expert
‣ Part detectors are learned and applied independently
‣ The “most suitable” head, legs, and arms are identified by the part detectors

Mohan, Papageorgiou, Poggio, ‘01

Page 9

Example 1: Detection by Components

• “Structural model” via a Combination Classifier (stacking)
‣ Part scores are fed into the combination classifier
‣ Combination classifier classifies the pattern as “person” or “non-person”
‣ The person is detected as an ensemble of its parts

Mohan, Papageorgiou, Poggio, ‘01

Page 10

• Detection results

Example 1: Detection by Components

Mohan, Papageorgiou, Poggio, ‘01

Page 11

Example 1: Detection by Components

• Robustness to occlusion ‣ System still detects pedestrians if a part is not visible

Mohan, Papageorgiou, Poggio, ‘01

Page 12

Example 2: Multi-Scale Parts

• Body model
‣ 4 body parts (face, head, upper body, legs)

• 7 part detectors
‣ frontal face & head
‣ profile face & head
‣ frontal upper body
‣ profile upper body
‣ legs

[Figure: example detections for frontal face, profile face, frontal head, profile head, frontal upper body, profile upper body, and legs]

Mikolajczyk, Schmid, Zisserman, ‘04

Page 13

• Computing local features for body parts (BP)
‣ Computing gradient and Laplacian
‣ Estimating and quantizing local orientations
‣ Grouping local orientations into horizontal and vertical triplets

[Figure: gradient orientations, Laplacian, and horizontal and vertical orientation triplets]

Motivated by local shape context, SIFT [D. Lowe], wavelet features [H. Schneiderman]

Mikolajczyk, Schmid, Zisserman, ‘04

Example 2: Multi-Scale Parts
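The feature steps listed on this slide (gradient, Laplacian, quantized orientations, triplets) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the number of orientation bins (`n_bins=4`) and the way three neighbouring bins are packed into one integer code are assumptions made only for illustration.

```python
import numpy as np

def local_orientation_features(image, n_bins=4):
    """Return gradient magnitude, Laplacian, and quantized orientation per pixel."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)            # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    gyy, _ = np.gradient(gy)             # second derivative along y
    _, gxx = np.gradient(gx)             # second derivative along x
    laplacian = gxx + gyy
    # Orientation modulo pi, quantized into n_bins equal sectors (assumed binning).
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.floor(theta / np.pi * n_bins).astype(int) % n_bins
    return magnitude, laplacian, bins

def horizontal_triplets(bins, n_bins=4):
    """Pack each horizontal run of three neighbouring bins into one integer code."""
    left, mid, right = bins[:, :-2], bins[:, 1:-1], bins[:, 2:]
    return (left * n_bins + mid) * n_bins + right
```

Vertical triplets follow from the same function applied to `bins.T`.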

Page 14

• Learning feature occurrence and co-occurrence on training data ‣ Representing the body parts (BP) with feature distributions

‣ Using aligned examples for training

[Figure: positive and negative examples with their occurrence and co-occurrence probability distributions]

Example 2: Multi-Scale Parts

Page 15

• Classifiers ‣ Weak classifiers based on a single feature

‣ Weak classifiers based on feature co-occurrence

‣ Strong classifier trained with AdaBoost

Mikolajczyk, Schmid, Zisserman, ‘04

Example 2: Multi-Scale Parts
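To make the "weak classifiers combined by AdaBoost" idea concrete, here is a minimal sketch of discrete AdaBoost with single-feature threshold stumps. It is a generic textbook version, not the classifier of Mikolajczyk et al.; the function names and the exhaustive threshold search are assumptions, and the co-occurrence weak classifiers from the slide are not modelled.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with single-feature threshold stumps; y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                       # try every feature
            for thr in np.unique(X[:, j]):       # and every observed threshold
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()     # weighted error of this stump
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # weight of the weak classifier
        w *= np.exp(-alpha * y * pred)           # emphasize misclassified examples
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict_adaboost(stumps, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1) for a, j, t, s in stumps)
    return np.where(score >= 0, 1, -1)
```

Each stump here looks at one feature only, which corresponds to the first kind of weak classifier on the slide.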

Page 16

Example 2: Multi-Scale Parts

[Figure: input image, feature scale-space, classifiers (Σ HH, Σ HU, Σ HL), log-likelihood maps for head, upper body, and legs, part detections]

Mikolajczyk, Schmid, Zisserman, ‘04

Page 17

Example 2: Human Detection Results

[Figure: panels (a)-(d) showing individual part detectors and the joint likelihood body detector]

Page 18

• Fawlty Towers

Human Detection Results

Mikolajczyk, Schmid, Zisserman, ‘04

Page 19

Discussion

• Approach ‣ Manually selected set of parts - Specific detector trained for each part

‣ Spatial model trained on part activations

‣ Evaluate joint likelihood of part activations

• Advantages ‣ Parts have intuitive meaning.

‣ Standard detection approaches can be used for each part (e.g. SVMs or AdaBoost).

‣ Works well for specific categories.

• Disadvantages ‣ Parts need to be selected manually

- Semantically motivated parts sometimes don’t have a simple appearance distribution

- No guarantee that some important part hasn’t been missed

‣ When switching to another category, the model has to be rebuilt from scratch.

⇒ Goal: Model that can be automatically learned for many categories

Page 20

Part-Based Models - Overview Today (more next week)

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

Page 21

[Figure: fully connected shape model over parts x1, ..., x6]

Weber, Welling, Perona, ’00; Fergus, Zisserman, Perona, 03

Constellation of Parts

Page 22

Automatic Part Learning

• Basic idea consists of two steps
‣ “Part” candidates in each image
- take the output regions of a scale-invariant interest point detector as part candidates
- the interest point detector “guarantees” (sort of ;-) that similar structures will be detected in all images (keyword: repeatability)
‣ “Part learning”
- find those regions that occur repeatedly on different instances of the same object
- for this: group (= cluster) the extracted regions to find those that are characteristic for the object category

Fergus, Zisserman, Perona, ‘03

Page 23

Representation of Appearance

[Figure: interest point detection, size-normalized 11x11 patch, luminance normalization, projection onto PCA basis, coefficients c1, c2, ..., c15]

Fergus, Zisserman, Perona, ‘03
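The appearance pipeline on this slide (normalized 11x11 patch projected onto a PCA basis with 15 coefficients) might look roughly as follows. This is a sketch under assumptions: patches are taken to be already rescaled to 11x11, "luminance normalized" is read as zero mean and unit variance, and the helper names are hypothetical.

```python
import numpy as np

def normalize_patch(patch):
    """Flatten a patch and normalize it to zero mean and unit variance (assumed scheme)."""
    v = np.asarray(patch, dtype=float).ravel()
    v = v - v.mean()
    std = v.std()
    return v / std if std > 0 else v

def fit_pca(patches, n_components=15):
    """Learn a PCA basis from a collection of equally sized patches."""
    X = np.stack([normalize_patch(p) for p in patches])
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(patch, mean, basis):
    """Return the appearance coefficients c1..ck of one patch."""
    return basis @ (normalize_patch(patch) - mean)
```

For 11x11 patches the basis has shape (15, 121), so every region is described by a 15-dimensional appearance vector.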

Page 24

Selected Features & “Parts” (=feature clusters)

[Figure: interest points and the resulting 100 clusters]

Weber, Welling, Perona, ‘00

Page 25

• Repeating structures (clusters in appearance space and in location space) are more likely to belong to the object category than to the background. ⇒ Clusters should mainly represent objects.

Weakly Supervised Training

200 images containing faces 200 background images

Weber, Welling, Perona, ‘00

Page 26

Constellation Model

• Joint model for appearance and structure (= shape)
‣ X: positions, A: part appearance, S: scale
‣ h: Hypothesis = assignment of features (in the image) to parts (of the model)

[Equation figure: the joint density combines a Gaussian shape pdf, a probability of detection, a Gaussian part appearance pdf, and a Gaussian relative scale pdf over log(scale)]

Weber, Welling, Perona, ’00; Fergus, Zisserman, Perona, 03

Page 27

Training Procedure

• Need to solve two problems
‣ Select a subset of appearance clusters as part candidates
- Greedy strategy
- Start with 3-part model, then test if additional part improves the results
‣ Learn the parameters of their joint probability density over appearance & structure
- Expectation Maximization (EM) algorithm

Weber, Welling, Perona, ‘00

Page 28

• Task: Estimation of model parameters • Chicken and Egg type problem, since we initially know neither:

‣ Model parameters

‣ Assignment of regions to foreground/background

• Let the assignments be a hidden variable and use EM algorithm to learn them and the model parameters

Learning

Page 29

• Find regions: their location, scale & appearance • Initialize model parameters • Use EM and iterate to convergence

‣ E-step: Compute assignments for which regions are foreground/background

‣ M-step: Update model parameters

• Trying to maximize likelihood – consistency in shape & appearance

Learning Procedure
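The E-step / M-step alternation can be illustrated on a deliberately tiny stand-in problem: 1-D feature values that come either from a Gaussian "foreground" or from a uniform "background". This toy mixture is an assumption chosen for brevity; it is not the constellation model, whose hidden variable is a full assignment of regions to parts.

```python
import numpy as np

def em_foreground_background(x, lo, hi, n_iter=100):
    """EM for a mixture of one Gaussian (foreground) and a uniform on [lo, hi] (background)."""
    x = np.asarray(x, dtype=float)
    mu, sigma, pi_fg = x.mean(), x.std() + 1e-6, 0.5   # initialize model parameters
    bg_density = 1.0 / (hi - lo)
    for _ in range(n_iter):
        # E-step: soft assignment of each value to foreground
        fg = pi_fg * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        bg = (1 - pi_fg) * bg_density
        r = fg / (fg + bg)
        # M-step: update parameters from the soft assignments
        pi_fg = r.mean()
        mu = (r * x).sum() / r.sum()
        sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum()) + 1e-6
    return mu, sigma, pi_fg, r
```

The returned `r` plays the role of the hidden foreground/background assignment; neither it nor the parameters are known at the start, which is the chicken-and-egg situation described on the previous slide.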

Page 30

Experiments

• Data sets
‣ Motorbikes, Airplanes, Faces, Cars from side and behind, Spotted cats
‣ and background images
‣ Between 200 and 800 images per category

• Training
‣ 50% of images
‣ position of object unknown within image (called weakly supervised)

• Testing
‣ 50% of images
‣ Simple object present/absent test
‣ ROC equal error rate computed, using background set of images

Fergus, Zisserman, Perona, ‘03
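The ROC equal error rate used in the testing protocol is the operating point where the miss rate on object images equals the false-alarm rate on background images. A minimal sketch, assuming each image has a scalar detection score (the function name and the threshold sweep over observed scores are illustrative choices):

```python
import numpy as np

def roc_equal_error_rate(pos_scores, neg_scores):
    """Error rate at the threshold where false negatives and false positives balance."""
    pos = np.asarray(pos_scores, dtype=float)   # scores of images containing the object
    neg = np.asarray(neg_scores, dtype=float)   # scores of background images
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([pos, neg])):
        fnr = np.mean(pos < t)                  # objects rejected at threshold t
        fpr = np.mean(neg >= t)                 # background images accepted
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, eer = gap, 0.5 * (fnr + fpr)
    return eer
```

With finitely many scores the two rates may never match exactly, so the sketch reports their average at the closest threshold.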

Page 31

Fergus, Zisserman, Perona, ‘03

Example: Motorbikes - Part Hypotheses

Page 32

Fergus, Zisserman, Perona, ‘03

Example: Motorbikes - Learned Parts

Page 33

Equal error rate: 7.5%

Fergus, Zisserman, Perona, ‘03

Motorbikes - Constellation Model

Page 34

Fergus, Zisserman, Perona, ‘03

Background Images

Page 35

Equal error rate: 4.6%

Fergus, Zisserman, Perona, ‘03

Frontal Faces - Constellation Model

Page 36

Equal error rate: 9.8%

Airplanes - Constellation Model

Fergus, Zisserman, Perona, ‘03

Page 37

Equal error rate: 10.0%

Fergus, Zisserman, Perona, ‘03

Spotted Cats - Constellation Model

Page 38

Equal error rate: 9.7%

Fergus, Zisserman, Perona, ‘03

Cars (Rear Views) - Constellation Model

Page 39

Fergus, Zisserman, Perona, ‘03

Robustness of the Algorithm

Page 40

Scale-invariant learning and recognition:

Fergus, Zisserman, Perona, ‘03

ROC equal error rates

Page 41

Discussion

• Advantages ‣ Works well for different object categories ‣ Can adapt to categories where

- Shape/structure is more important - Appearance is more important

‣ Everything is learned from training data ‣ Weakly-supervised training possible

• Disadvantages ‣ Model contains many parameters that need to be estimated

‣ Cost increases exponentially with increasing number of parameters (that is in particular with the # of parts !)

Page 42

Part-Based Models - Today

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

Page 43

Implicit Shape Model: Object Categorization

• Goals ‣ Learn to recognize object categories

‣ Detect and localize them in real-world scenes

‣ Segment objects from background

• Combination with top-down segmentation ‣ Initial hypothesis generation

‣ Category-specific figure-ground segmentation - used to verify object hypothesis

[Figure: example results labelled “cow”, “motorbike”, and “car”]

Page 44

Codebook Representation

• Extraction of local object patches ‣ Interest Points (e.g. Harris detector, Hessian-Laplace, DoG, ...) ‣ inspired by [Agarwal & Roth, 02]

• Collect patches from whole training set ‣ Example:

Page 45

Appearance Codebook

• Clustering Results ‣ Visual similarity preserved ‣ Wheel parts, window corners, fenders, ...


Page 46

Learning the Spatial Layout

• For every codebook entry, store possible “occurrences”
‣ Object identity
‣ Pose
‣ Relative position

• For a new image, let the matched patches vote for possible object positions

Page 47

Implicit Shape Model - Representation

• Learn appearance codebook ‣ Extract patches at DoG interest points

‣ Agglomerative clustering ⇒ codebook

• Learn spatial distributions ‣ Match codebook to training images

‣ Record matching positions on object

[Figure: 105 training images (+ motion segmentation), appearance codebook, spatial occurrence distributions]
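The two learning steps and the later voting stage can be sketched in a few lines. This is a simplified illustration, not the ISM implementation: codebook matching is assumed to be done already (inputs are codebook ids with patch positions), the voting space is a coarse grid rather than the continuous space used in the lecture, and all names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def train_occurrences(training_matches):
    """training_matches: iterable of (codebook_id, patch_xy, object_center_xy)."""
    occurrences = defaultdict(list)
    for entry, patch_xy, center_xy in training_matches:
        # Store where the object center lies relative to the matched patch.
        offset = np.asarray(center_xy, float) - np.asarray(patch_xy, float)
        occurrences[entry].append(offset)
    return occurrences

def vote(test_matches, occurrences, image_shape, cell=10):
    """test_matches: iterable of (codebook_id, patch_xy). Returns a binned vote accumulator."""
    h, w = image_shape
    acc = np.zeros((h // cell + 1, w // cell + 1))
    for entry, patch_xy in test_matches:
        offsets = occurrences.get(entry, [])
        for off in offsets:
            cx, cy = np.asarray(patch_xy, float) + off
            if 0 <= cx < w and 0 <= cy < h:
                # Each matched patch spreads one unit of vote mass over its occurrences.
                acc[int(cy) // cell, int(cx) // cell] += 1.0 / len(offsets)
    return acc
```

The strongest cell of the accumulator is the most supported object center hypothesis; a mean-shift search over continuous votes would replace the grid in a closer reproduction.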

Page 48

Object Detection: ISM (Implicit Shape Model)

• Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

[Figure: object center x_o]

Page 49

Spatial Models for Categorization

• Fully connected shape model
‣ e.g. Constellation Model
‣ Parts fully connected
‣ Recognition complexity: O(N^P)
‣ Method: Exhaustive search

• “Star” shape model
‣ e.g. ISM (Implicit Shape Model)
‣ Parts mutually independent
‣ Recognition complexity: O(NP)
‣ Method: Generalized Hough Transform

[Figure: both shape models drawn over parts x1, ..., x6]
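The gap between the two recognition costs is easy to see by counting. The sketch below reads N as the number of candidate image features and P as the number of model parts (an assumption, since the slide does not define the symbols), and ignores occlusion and uniqueness constraints: exhaustive search over a fully connected model enumerates N^P joint assignments, while independent voting touches each feature-part pair once, N·P in total.

```python
from itertools import product

def count_joint_assignments(n_features, n_parts):
    """Enumerate every way to assign one feature to each part (exhaustive search)."""
    return sum(1 for _ in product(range(n_features), repeat=n_parts))

def count_independent_votes(n_features, n_parts):
    """Each feature is compared with each part once and votes on its own."""
    return sum(1 for _ in product(range(n_features), range(n_parts)))
```

Already for 4 features and 3 parts the counts are 64 versus 12, and the first number grows exponentially with every added part.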

Page 50

Object Categorization Procedure

[Figure: interest points, matched codebook entries, probabilistic voting; an image patch e receives an interpretation I (codebook match), which votes for an object position (o, x)]

Page 51

Object Categorization Procedure

[Figure: interest points → matched codebook entries → probabilistic voting → voting space (continuous) → backprojection of maximum → backprojected hypothesis → refined hypothesis (uniform sampling)]

Page 52

Car Categorization - Qualitative Results

[Figure: 1st, 2nd, 4th, 7th, and 8th hypotheses]

Page 53

Results on Cows

[Figure: original image, interest points, matched patches, probabilistic votes]

Page 54

Results on Cows

[Figure: 1st hypothesis]

Page 55

Results on Cows

[Figure: 2nd hypothesis]

Page 56

Results on Cows

[Figure: 3rd hypothesis]

Page 57

More Results on Cows…

[Figure: 2nd, 14th, 8th, and 16th hypotheses]

Page 58

Detection Results

• Qualitative Performance (UIUC database, 200 cars)
‣ Recognizes different kinds of cars
‣ Robust to clutter, occlusion, low contrast, noise

Leibe, Leonardis, Schiele, ‘04

Page 59

Quantitative Evaluation

• Results on UIUC car database (170 images containing 200 cars)
‣ Good performance, similar to Constellation Model
‣ Still some false positives

Page 60

Scale Invariance

• Scale-invariant feature selection
‣ Scale-invariant interest points
‣ Rescale extracted patches
‣ Match to constant-size codebook

• Generate scale votes
‣ Scale as 3rd dimension in voting space
‣ Search for maxima in 3D voting space

[Figure: search window in the (x, y, s) voting space]

Leibe, Schiele ‘04

Page 61

Qualitative Detection Results

Altogether, objects detected with factor 5.0 scale differences!

scale = 0.75

scale = 3.71

Page 62

Discussion

• Approach: Implicit Shape Model
‣ Generate appearance codebook
‣ Learn spatial occurrence distribution for each codebook entry
‣ Recognition using a probabilistic extension of the Generalized Hough Transform

• Advantages ‣ Highly flexible shape model

‣ Each image feature acts independently

‣ Possible to learn good object models already from very few (50-100) training examples

‣ Recognition is fast!

Page 63

Discussion (2)

• Disadvantages ‣ Each feature acts independently

⇒ Assumption violated if sampled patches overlap

‣ Only loose constraints on object shape

‣ False positives on structured regions of the background

⇒ Hypothesis verification needed

• Idea: Combination with top-down segmentation ‣ Initial hypothesis generation ‣ Category-specific figure-ground segmentation ‣ Hypothesis verification using segmentation

Page 64

"Closing the Loop"

[Figure: interest points → matched codebook entries → probabilistic voting → voting space (continuous) → backprojection of maximum → backprojected hypothesis → refined hypothesis (uniform sampling) → segmentation]

Page 65

Segmentation: Probabilistic Formulation

• Influence of patch e on object hypothesis

• Backprojection to patches e and pixels p:

Leibe, Schiele, ‘03

Page 66

Segmentation: Probabilistic Formulation

• Resolve patches by interpretations (codebook entries) I

[Equation figure: one factor is labelled “segmentation information”, the other “influence on object hypothesis”]

⇒ Store patch segmentation mask for every occurrence position!

Leibe, Schiele, ‘03

Page 67

Segmentation

[Figure: original image, p(figure) map, p(ground) map, and the resulting segmentation]

Page 68

Segmentation

[Figure: original image, p(figure) map, p(ground) map, and the resulting segmentation]

• Interpretation of p(figure) map
‣ per-pixel confidence in object hypothesis
‣ Use for hypothesis verification

Page 69

Top-Down Driven Segmentation

• Example 1:
‣ Pedestrian is segmented out since it does not contribute to the car hypothesis

• Example 2:

[Figure panels: image, hypothesis, segmentation, p(figure); image, p(figure), segmentation, sub-image, contours]

Leibe, Schiele, ‘03

Page 70

Motorbikes: Segmentation Results

Leibe, Schiele, ‘04


Hypothesis Verification: Motivation

• Secondary hypotheses
‣ Desired property of algorithm! ⇒ robustness to partial occlusion
‣ Standard solution: reject based on bounding box overlap ⇒ Problematic - may lead to missing detections!

⇒ Use segmentations to resolve ambiguities instead

Leibe, Leonardis, Schiele, ‘04


Formalization in MDL Framework

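• Savings of a hypothesis [Leonardis, IJCV’95]

$$S_h = K_0 S_{area} - K_1 S_{model} - K_2 S_{error}$$

• with
‣ $S_{area}$: number of pixels $N$ in the segmentation
‣ $S_{model}$: model cost, assumed constant
‣ $S_{error}$: estimate of error, according to

$$S_{error} = \sum_{p \in Seg(h)} \bigl(1 - p(p = \text{figure} \mid h)\bigr)$$

• Final form of equation (after dividing by $K_0$, with the constant $S_{model}$ absorbed into $K_1$)

$$S_h = -\frac{K_1}{K_0} + \left(1 - \frac{K_2}{K_0}\right) N + \frac{K_2}{K_0} \sum_{p \in Seg(h)} p(p = \text{figure} \mid h)$$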
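The final form of the savings can be evaluated directly from a p(figure) map and a segmentation mask. The sketch below is a minimal illustration. The function name `savings` and the default values of the two ratios K1/K0 and K2/K0 are assumptions chosen for the example, not values taken from the lecture.

```python
import numpy as np

def savings(p_fig, seg_mask, k1_over_k0=1.0, k2_over_k0=0.5):
    """Evaluate the final (normalized) form of the savings S_h.

    p_fig:    per-pixel p(p = figure | h) for the hypothesis.
    seg_mask: boolean array, True for pixels in Seg(h).
    The ratio defaults are placeholder values for illustration only.
    """
    n = int(seg_mask.sum())                   # N: pixels in the segmentation
    fig_sum = float(p_fig[seg_mask].sum())    # sum of p(figure) over Seg(h)
    return -k1_over_k0 + (1.0 - k2_over_k0) * n + k2_over_k0 * fig_sum
```

A hypothesis with a large, confidently segmented area gets high savings, while a hypothesis whose segmentation is mostly low-confidence pixels is penalized through the error term.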


Formalization in MDL Framework (2)

• Savings of combined hypothesis

$$S_{h_1 \cup h_2} = S_{h_1} + S_{h_2} - S_{area}(h_1 \cap h_2) + S_{error}(h_1 \cap h_2)$$

• Goal: Find combination (vector m) that best explains the image
‣ Quadratic Boolean Optimization problem [Leonardis et al, 95]
‣ In practice often sufficient to compute greedy approximation
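A greedy approximation to the selection problem can be sketched as follows. The code assumes the pairwise savings are arranged in a symmetric matrix `q`, with the individual savings on the diagonal and half of each pairwise interaction term off the diagonal, so that the total savings of a selection vector m is m^T q m. This matrix layout and the function name `greedy_select` are assumptions for illustration. The sketch is not claimed to be the exact procedure used in the cited work.

```python
import numpy as np

def greedy_select(q):
    """Greedily pick hypotheses that increase the total savings m^T q m.

    q[i, i]: savings of hypothesis i on its own.
    q[i, j]: half of the interaction term between i and j (usually
             negative when the two segmentations overlap).
    Returns the indices of the selected hypotheses in selection order.
    """
    n = q.shape[0]
    selected = []
    remaining = set(range(n))
    while remaining:
        best, best_gain = None, 0.0
        for i in sorted(remaining):
            # Marginal gain of adding i to the current selection.
            gain = q[i, i] + 2.0 * sum(q[i, j] for j in selected)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:   # no remaining hypothesis improves the total
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The loop stops as soon as no remaining hypothesis has a positive marginal gain, so a hypothesis that mostly explains pixels already claimed by a stronger one is rejected.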


Performance after Verification Stage

• Direct Comparison

- 195/200 correct detections, 5 false positives
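As a worked example of these numbers, precision and recall can be computed as below. The sketch assumes that 200 is the number of ground-truth objects in the test set. The function name `precision_recall` is hypothetical.

```python
def precision_recall(true_positives, false_positives, num_ground_truth):
    """Return (precision, recall) for a set of detections."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / num_ground_truth
    return precision, recall

# Assumes 200 ground-truth objects, 195 of them detected, 5 false positives.
precision, recall = precision_recall(195, 5, 200)
```

Under this assumption precision and recall coincide, which is the operating point usually reported as the equal error rate.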


Other Categories: Cows

• Articulated Object Recognition
‣ Use set of cow sequences (from Derek Magee@Leeds)

• Extract frames from subset of sequences

Train on 113 images (+ segmentation)

Leibe, Leonardis, Schiele, ‘04


Cows: Results on Novel Sequences

• Object Detections
‣ Single-frame recognition - No temporal continuity used!

Leibe, Leonardis, Schiele, ‘04


Cows: Results on Novel Sequences (2)

• Segmentations from interest points
‣ Single-frame recognition - No temporal continuity used!

Leibe, Leonardis, Schiele, ‘04


Cows: Results on Novel Sequences (3)

• Segmentations from refined hypotheses
‣ Single-frame recognition - No temporal continuity used!

Leibe, Leonardis, Schiele, ‘04


Another Example

• Object Detections

Leibe, Leonardis, Schiele, ‘04


Another Example (2)

• Segmentations from interest points

Leibe, Leonardis, Schiele, ‘04


Another Example (3)

• Segmentations from refined hypotheses

Leibe, Leonardis, Schiele, ‘04


Robustness to Occlusion

• Quantitative results (14 sequences, 2217 frames total)
‣ No difficulties recognizing fully visible cows (99.1% recall)

‣ Robust to significant partial occlusion!


Application: Pedestrian Detection

• Small training set
‣ 105 images (+motion segmentations), 210 silhouettes

• Challenging real-world test set
‣ 209 crowded scenes containing 595 pedestrians

[Figure: example training and test images]


Problem for Articulated Objects

• Overcomplete Segmentations
‣ Flexible spatial model

‣ Segmentations may contain superfluous body parts ⇒ Score of neighboring hypotheses may be reduced!


Qualitative Results

Leibe, Seemann, Schiele, ‘05


Estimated Articulations

[Figure panels: walking direction, detection scale, body pose, problem cases]


Detections at EER (Equal Error Rate)

Single-frame recognition - no temporal continuity used!

Leibe, Seemann, Schiele, ‘05


Example Detections