
High Level Computer Vision

Part-Based Models for Object Class Recognition Part 1

Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de

https://www.mpi-inf.mpg.de/hlcv

High Level Computer Vision - May 23, 2o16 2

Complexity of Recognition

High Level Computer Vision - May 23, 2o16 3

Complexity of Recognition

High Level Computer Vision - May 23, 2o16

• Pictorial Structures [Fischler & Elschlager 1973] ‣ Model has two components

- parts (2D image fragments) - structure (configuration of parts)

4

Class of Object Models: Part-Based Models / Pictorial Structures

High Level Computer Vision - May 23, 2o16

• Bag of Words Models (BoW) ‣ object model = histogram of local features ‣ e.g. local features around interest points

• Global Object Models ‣ object model = global object feature

‣ e.g. HOG (Histogram of Oriented Gradients)

• Part-Based Object Models ‣ object model = models of parts & spatial topology model ‣ e.g. constellation model or ISM (Implicit Shape Model)

• But: What is the Ideal Notion of Parts here? • And: Should those Parts be Semantic?

“State-of-the-Art” in Object Class Representations

5

BoW: no spatial relationships

e.g. HOG: fixed spatial relationships

e.g. ISM: flexible spatial relationships

High Level Computer Vision - May 23, 2o16 6

Part-Based Models - Overview Today (more next week)

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

High Level Computer Vision - May 23, 2o16 7

Manually Selected Parts

• Simplest solution ‣ Let a human expert select a set of parts ‣ (If it doesn’t work, take a different human expert)

Mohan, Papageorgiou, Poggio, ‘01

High Level Computer Vision - May 23, 2o16 8

Example 1: Detection by Components

• Application ‣ Pedestrian detection

• Representation by 4 parts ‣ Part candidates are selected by a human expert

‣ Part detectors are learned and applied independently

‣ The “most suitable” head, legs, and arms are identified by the part detectors

Mohan, Papageorgiou, Poggio, ‘01

High Level Computer Vision - May 23, 2o16 9

• “Structural model” via a Combination Classifier (stacking)

‣ Part scores are fed into the combination classifier

‣ The combination classifier classifies the pattern as “person” or “non-person”

‣ The person is detected as an ensemble of its parts (see the sketch below)

Mohan, Papageorgiou, Poggio, ‘01

Example 1: Detection by Components
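To make the stacking idea concrete, here is a minimal, hypothetical sketch in Python: four part detectors are assumed to already produce one score each per detection window, and a linear SVM (an arbitrary choice here, not necessarily the classifier used by Mohan et al.) is trained on those four scores to make the final person / non-person decision. All data and names are synthetic stand-ins.

# Sketch of the combination classifier (stacking): each window yields one score per
# part detector (head, legs, left arm, right arm); a second classifier decides
# "person" vs. "non-person" from those four scores. Data below is synthetic.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical part scores: person windows tend to score higher on all four parts.
person_scores = rng.normal(loc=1.0, scale=1.0, size=(200, 4))
non_person_scores = rng.normal(loc=-1.0, scale=1.0, size=(200, 4))

X = np.vstack([person_scores, non_person_scores])
y = np.hstack([np.ones(200), np.zeros(200)])

combo = LinearSVC(C=1.0).fit(X, y)                 # combination classifier on part scores

window_scores = np.array([[0.8, 1.2, -0.3, 0.5]])  # part scores of one detection window
print("person" if combo.predict(window_scores)[0] == 1 else "non-person")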

High Level Computer Vision - May 23, 2o16 10

• Detection results

Example 1: Detection by Components

Mohan, Papageorgiou, Poggio, ‘01

High Level Computer Vision - May 23, 2o16 11

Example 1: Detection by Components

• Robustness to occlusion ‣ System still detects pedestrians if a part is not visible

Mohan, Papageorgiou, Poggio, ‘01

High Level Computer Vision - May 23, 2o16 12

• Body model ‣ 4 body parts

(face, head, upper body, legs)

• 7 part detectors ‣ frontal face & head

‣ profile face & head

‣ frontal upper body

‣ profile upper body

‣ legs


Example 2: Multi-Scale Parts


Mikolajczyk, Schmid, Zisserman, ‘04

High Level Computer Vision - May 23, 2o16 13

• Computing local features for body parts (BP) ‣ Computing gradient and Laplacian ‣ Estimating and quantizing local orientations ‣ Grouping local orientations into horizontal and vertical triplets (see the sketch below)


Motivated by local shape context, SIFT [D. Lowe], wavelet features [H. Schneiderman]

Mikolajczyk, Schmid, Zisserman, ‘04

Example 2: Multi-Scale Parts
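A rough sketch of the feature computation described above, assuming a grayscale image given as a numpy array. The bin count, cell size and the way triplets are grouped are illustrative choices, not the exact parameters of Mikolajczyk et al.; the Laplacian channel is omitted for brevity.

# Sketch: quantize gradient orientations per cell, then group neighbouring cells
# into horizontal and vertical triplets. Parameters are illustrative assumptions.
import numpy as np

def quantized_orientations(img, n_bins=8):
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)      # orientations in [0, pi)
    bins = np.floor(orientation / np.pi * n_bins).astype(int) % n_bins
    return bins, magnitude

def triplet_features(bins, magnitude, cell=8, n_bins=8):
    # Orientation histogram per cell, then concatenate triplets of neighbouring cells.
    h, w = bins.shape
    cells = np.zeros((h // cell, w // cell, n_bins))
    for i in range(h // cell):
        for j in range(w // cell):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = magnitude[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            cells[i, j] = np.bincount(b, weights=m, minlength=n_bins)
    feats = []
    for i in range(cells.shape[0]):
        for j in range(cells.shape[1] - 2):
            feats.append(cells[i, j:j+3].ravel())          # horizontal triplet
    for i in range(cells.shape[0] - 2):
        for j in range(cells.shape[1]):
            feats.append(cells[i:i+3, j].ravel())          # vertical triplet
    return np.array(feats)

img = np.random.rand(64, 32)                               # stand-in for a body-part window
bins, mag = quantized_orientations(img)
print(triplet_features(bins, mag).shape)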

High Level Computer Vision - May 23, 2o16 14

• Learning feature occurrence and co-occurrence on training data ‣ Representing the body parts (BP) with feature distributions

‣ Using aligned examples for training

Positive examples

Occurrence and co-occurrence probability distributions

Negative examples

Example 2: Multi-Scale Parts

High Level Computer Vision - May 23, 2o16 15

• Classifiers ‣ Weak classifiers based on a single feature

‣ Weak classifiers based on feature co-occurrence

‣ Strong classifier trained with AdaBoost

Mikolajczyk, Schmid, Zisserman, ‘04

Example 2: Multi-Scale Parts
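A small sketch of the boosting step, assuming each training window is already described by a feature vector (for instance the orientation triplets above). Synthetic data stands in for real part windows; scikit-learn's AdaBoostClassifier with its default depth-1 decision stumps plays the role of single-feature weak classifiers combined into a strong classifier. Co-occurrence features are not modeled here.

# Sketch of the strong part classifier: AdaBoost over decision stumps, where each
# stump looks at a single feature dimension. All data here is synthetic.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)
X_pos = rng.normal(0.6, 0.2, size=(300, 24))      # hypothetical part windows
X_neg = rng.normal(0.4, 0.2, size=(300, 24))      # hypothetical background windows
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(300), np.zeros(300)])

strong = AdaBoostClassifier(n_estimators=100)     # default weak learner: depth-1 stump
strong.fit(X, y)
print("training accuracy:", strong.score(X, y))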

High Level Computer Vision - May 23, 2o16 16

[Figure: detection pipeline — input image → feature scale-space → head / upper body / legs log-likelihood maps → classifiers → part detections]

Mikolajczyk, Schmid, Zisserman, ‘04

Example 2: Multi-Scale Parts

High Level Computer Vision - May 23, 2o16 17

[Figure: detection results (a)–(d) — individual part detectors vs. joint likelihood body detector]

Example 2: Human Detection Results

High Level Computer Vision - May 23, 2o16 18

• Fawlty Towers

Human Detection Results

Mikolajczyk, Schmid, Zisserman, ‘04

High Level Computer Vision - May 23, 2o16 19

Discussion

• Approach ‣ Manually selected set of parts - Specific detector trained for each part

‣ Spatial model trained on part activations

‣ Evaluate joint likelihood of part activations

• Advantages ‣ Parts have intuitive meaning.

‣ Standard detection approaches can be used for each part (e.g. SVMs or AdaBoost).

‣ Works well for specific categories.

• Disadvantages ‣ Parts need to be selected manually

- Semantically motivated parts sometimes don’t have a simple appearance distribution

- No guarantee that some important part hasn’t been missed

‣ When switching to another category, the model has to be rebuilt from scratch.

⇒ Goal: Model that can be automatically learned for many categories

High Level Computer Vision - May 23, 2o16 20

Part-Based Models - Overview Today (more next week)

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

High Level Computer Vision - May 23, 2o16 21

[Figure: fully connected shape model over parts x1 … x6]

Weber, Welling, Perona, ’00; Fergus, Zisserman, Perona, 03

Constellation of Parts

High Level Computer Vision - May 23, 2o16 22

Automatic Part Learning

• Basic idea consists of two steps ‣ “Part” candidates in each image

- take the output regions of an interest point detector as part candidates (use a scale-invariant interest point detector for this)

- the interest point detector “guarantees” (sort of ;-) that similar structures will be detected in all images (keyword: repeatability)

‣ “Part learning” - find those regions that occur repeatedly on different instances of the same object:

- for this: group (= cluster) the extracted regions to find those that are characteristic for the object category.

Fergus, Zisserman, Perona, ‘03

High Level Computer Vision - May 23, 2o16 23

Representation of Appearance

[Figure: interest point detection → size-normalized 11x11 patch → luminance normalization → projection onto PCA basis (coefficients c1 … c15)]

Fergus, Zisserman, Perona, ‘03
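A minimal sketch of this appearance representation, assuming patches can be cut out around detected interest points; random patch locations stand in for a real (scale-invariant) detector, while the 11x11 patch size, luminance normalization and 15 PCA coefficients follow the slide.

# Sketch: extract 11x11 patches, normalize luminance, project onto a 15-dim PCA basis.
import numpy as np
from sklearn.decomposition import PCA

def sample_patches(img, n=200, size=11):
    h, w = img.shape
    ys = np.random.randint(0, h - size, n)         # stand-in for interest point locations
    xs = np.random.randint(0, w - size, n)
    patches = np.stack([img[y:y+size, x:x+size].ravel() for y, x in zip(ys, xs)])
    # Luminance normalization: zero mean, unit variance per patch.
    patches = patches - patches.mean(axis=1, keepdims=True)
    patches /= patches.std(axis=1, keepdims=True) + 1e-8
    return patches

img = np.random.rand(240, 320)                      # stand-in for a training image
patches = sample_patches(img)
pca = PCA(n_components=15).fit(patches)             # learn the PCA basis on training patches
coeffs = pca.transform(patches)                     # c1 ... c15 per patch
print(coeffs.shape)                                 # (200, 15)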

High Level Computer Vision - May 23, 2o16 24

Selected Features & “Parts” (=feature clusters)

interest points / 100 clusters

Weber, Welling, Perona, ‘00

High Level Computer Vision - May 23, 2o16 25

• Repeating structures (clusters in appearance space and in location space) are more likely to belong to the object category than to the background. ⇒ Clusters should mainly represent objects.

Weakly Supervised Training

200 images containing faces 200 background images

Weber, Welling, Perona, ‘00

High Level Computer Vision - May 23, 2o16 26

Constellation Model

• Joint model for appearance and structure (= shape) ‣ X: positions, A: part appearance, S: scale ‣ h: hypothesis = assignment of features (in the image) to parts (of the model)

[Figure: likelihood factors — probability of detection, Gaussian shape pdf, Gaussian part appearance pdf, Gaussian relative scale pdf (over log-scale)]

Weber, Welling, Perona, ’00; Fergus, Zisserman, Perona, 03
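A toy evaluation of one hypothesis h under a constellation-style model: the score is a sum of Gaussian log-densities for shape (joint part positions with a full covariance, i.e. fully connected), per-part appearance and relative log-scale. All parameter values are invented for illustration, and the detection/occlusion term of the full model is omitted.

# Sketch: log-likelihood of one hypothesis (assignment of image features to parts).
import numpy as np
from scipy.stats import multivariate_normal, norm

P = 3                                               # number of parts (toy value)
mu_shape = np.array([0., 0., 30., 5., 15., 25.])    # mean (x, y) of the P parts, stacked
cov_shape = np.eye(2 * P) * 16.0                    # full covariance -> fully connected model
mu_app = np.zeros((P, 15)); cov_app = [np.eye(15)] * P   # per-part appearance Gaussians
mu_scale = np.zeros(P); sigma_scale = np.full(P, 0.3)    # relative log-scale Gaussians

def hypothesis_log_likelihood(positions, appearances, log_scales):
    ll = multivariate_normal(mu_shape, cov_shape).logpdf(positions.ravel())  # shape term
    for p in range(P):
        ll += multivariate_normal(mu_app[p], cov_app[p]).logpdf(appearances[p])  # appearance
        ll += norm(mu_scale[p], sigma_scale[p]).logpdf(log_scales[p])            # rel. scale
    return ll

pos = np.array([[1., -2.], [29., 6.], [16., 24.]])  # candidate feature positions
app = np.zeros((P, 15)); scales = np.zeros(P)
print(hypothesis_log_likelihood(pos, app, scales))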

High Level Computer Vision - May 23, 2o16 27

Training Procedure

• Need to solve two problems ‣ Select a subset of appearance clusters as part candidates

- Greedy strategy - Start with a 3-part model, then test whether an additional part improves the results

‣ Learn the parameters of their joint probability density over appearance & structure

- Expectation Maximization (EM) algorithm

Weber, Welling, Perona, ‘00

High Level Computer Vision - May 23, 2o16 28

• Task: Estimation of model parameters • Chicken and Egg type problem, since we initially know neither:

‣ Model parameters

‣ Assignment of regions to foreground/background

• Let the assignments be a hidden variable and use the EM algorithm to learn them together with the model parameters

Learning

High Level Computer Vision - May 23, 2o16 29

• Find regions: their location, scale & appearance • Initialize model parameters • Use EM and iterate to convergence

‣ E-step: Compute assignments for which regions are foreground/background

‣ M-step: Update model parameters

• Trying to maximize likelihood – consistency in shape & appearance

Learning Procedure
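A heavily simplified EM sketch of this chicken-and-egg problem: with a 1-D stand-in for feature appearance, the loop alternates between soft foreground/background assignments (E-step) and parameter updates (M-step). The real model is of course far richer (parts, shape, scale), but the structure of the iteration is the same.

# Sketch: EM for a Gaussian "part" plus uniform background, on synthetic 1-D data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(2.0, 0.3, 300),   # "part" features
                       rng.uniform(-5, 5, 700)])    # background clutter

mu, sigma, pi_fg = 0.0, 1.0, 0.5                    # initial model parameters
for _ in range(50):
    # E-step: responsibility that each feature belongs to the foreground part.
    fg = pi_fg * norm(mu, sigma).pdf(data)
    bg = (1 - pi_fg) * (1.0 / 10.0)                 # uniform background density on [-5, 5]
    r = fg / (fg + bg)
    # M-step: re-estimate parameters from the soft assignments.
    mu = np.sum(r * data) / np.sum(r)
    sigma = np.sqrt(np.sum(r * (data - mu) ** 2) / np.sum(r))
    pi_fg = r.mean()

print(f"mu={mu:.2f}, sigma={sigma:.2f}, foreground fraction={pi_fg:.2f}")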

High Level Computer Vision - May 23, 2o16 30

Experiments

• Data sets ‣ Motorbikes, Airplanes, Faces, Cars from side and behind, Spotted cats ‣ and background images ‣ Between 200 and 800 images per category

• Training ‣ 50% of images ‣ position of object unknown within image (called weakly supervised)

• Testing ‣ 50% of images ‣ Simple object present/absent test ‣ ROC equal error rate computed, using background set of images

Fergus, Zisserman, Perona, ‘03

High Level Computer Vision - May 23, 2o16 31

Fergus, Zisserman, Perona, ‘03

Example: Motorbikes - Part Hypotheses

High Level Computer Vision - May 23, 2o16 32

Fergus, Zisserman, Perona, ‘03

Example: Motorbikes - Learned Parts

High Level Computer Vision - May 23, 2o16 33

Equal error rate: 7.5%

Fergus, Zisserman, Perona, ‘03

Motorbikes - Constellation Model

High Level Computer Vision - May 23, 2o16 34

Fergus, Zisserman, Perona, ‘03

Background Images

High Level Computer Vision - May 23, 2o16 35

Equal error rate: 4.6%

Fergus, Zisserman, Perona, ‘03

Frontal Faces - Constellation Model

High Level Computer Vision - May 23, 2o16 36

Equal error rate: 9.8%

Airplanes - Constellation Model

Fergus, Zisserman, Perona, ‘03

High Level Computer Vision - May 23, 2o16 37

Equal error rate: 10.0%

Fergus, Zisserman, Perona, ‘03

Spotted Cats - Constellation Model

High Level Computer Vision - May 23, 2o16 38

Equal error rate: 9.7%

Fergus, Zisserman, Perona, ‘03

Cars (Rear Views) - Constellation Model

High Level Computer Vision - May 23, 2o16 39

Fergus, Zisserman, Perona, ‘03

Robustness of the Algorithm

High Level Computer Vision - May 23, 2o16 40

Scale-invariant learning and recognition:

Fergus, Zisserman, Perona, ‘03

ROC equal error rates

High Level Computer Vision - May 23, 2o16 41

Discussion

• Advantages ‣ Works well for different object categories ‣ Can adapt to categories where

- Shape/structure is more important - Appearance is more important

‣ Everything is learned from training data ‣ Weakly-supervised training possible

• Disadvantages ‣ Model contains many parameters that need to be estimated

‣ Computational cost increases exponentially with the number of parts: the number of feature-to-part assignments that must be searched grows as O(N^P)

High Level Computer Vision - May 23, 2o16 42

Part-Based Models - Today

• Part-Based using Manual Labeling of Parts ‣ Detection by Components ‣ Multi-Scale Parts

• The Constellation Model ‣ automatic discovery of parts and part-structure

• The Implicit Shape Model (ISM) ‣ parts obtained by clustering interest-points

‣ star-model to model configuration of parts

High Level Computer Vision - May 23, 2o16

Implicit Shape Model: Object Categorization

• Goals ‣ Learn to recognize object categories

‣ Detect and localize them in real-world scenes

‣ Segment objects from background

• Combination with top-down segmentation ‣ Initial hypothesis generation

‣ Category-specific figure-ground segmentation - used to verify object hypothesis

43

“cow”

“motorbike”

“car”

High Level Computer Vision - May 23, 2o16 44

Codebook Representation

• Extraction of local object patches ‣ Interest points (e.g. Harris detector, Hessian-Laplace, DoG, ...) ‣ inspired by [Agarwal & Roth, 02]

• Collect patches from whole training set ‣ Example:

High Level Computer Vision - May 23, 2o16

Appearance Codebook

• Clustering Results ‣ Visual similarity preserved ‣ Wheel parts, window corners, fenders, ...

45

High Level Computer Vision - May 23, 2o16

Learning the Spatial Layout

• For every codebook entry, store possible “occurrences”

46

‣ Object identity ‣ Pose ‣ Relative position


For a new image, let the matched patches vote for possible object positions

High Level Computer Vision - May 23, 2o16 47

Implicit Shape Model - Representation

• Learn appearance codebook ‣ Extract patches at DoG interest points

‣ Agglomerative clustering ⇒ codebook

• Learn spatial distributions ‣ Match codebook to training images

‣ Record matching positions on object

[Figure: 105 training images (+ motion segmentation) → appearance codebook → spatial occurrence distributions]
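A sketch of building an ISM-style codebook and occurrence table, assuming patch descriptors and their offsets to the object center are already extracted from training images. Agglomerative clustering with a distance threshold is used in the spirit of the slide, but the descriptor size, threshold and linkage are stand-in choices.

# Sketch: cluster patch descriptors into a codebook, then record, for every codebook
# entry, all relative positions ("occurrences") seen on the training objects.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
descriptors = rng.normal(size=(500, 32))            # hypothetical patch descriptors
offsets = rng.normal(scale=20.0, size=(500, 2))     # (dx, dy) to the object center

clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=6.0,
                                     linkage="average").fit(descriptors)
codebook_ids = clustering.labels_

occurrences = {}
for cid, off in zip(codebook_ids, offsets):
    occurrences.setdefault(cid, []).append(off)     # occurrence table per codebook entry

print(len(set(codebook_ids)), "codebook entries")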

High Level Computer Vision - May 23, 2o16

Object Detection: ISM (Implicit Shape Model)

• Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

48


High Level Computer Vision - May 23, 2o16 49

Spatial Models for Categorization

[Figure: “star” shape model vs. fully connected shape model over parts x1 … x6]

‣ Fully connected shape model: e.g. Constellation Model ‣ Parts fully connected ‣ Recognition complexity: O(N^P) ‣ Method: Exhaustive search

‣ “Star” shape model: e.g. ISM (Implicit Shape Model) ‣ Parts mutually independent ‣ Recognition complexity: O(N·P) ‣ Method: Generalized Hough Transform

High Level Computer Vision - May 23, 2o16 50

Object Categorization Procedure

[Figure: image patch → interest points → matched codebook entries (interpretations I) → probabilistic voting → object position]
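A sketch of the voting step of the (probabilistic) Generalized Hough Transform used for recognition, reusing the occurrence table idea from the previous sketch. Splitting the vote weight uniformly over matched interpretations and stored occurrences is a simplification of the probabilistic formulation.

# Sketch: matched codebook entries cast weighted votes for the object center; the
# strongest cell in a coarse accumulator is returned as the object hypothesis.
import numpy as np

def cast_votes(feature_positions, matched_entries, occurrences):
    votes = []
    for pos, entries in zip(feature_positions, matched_entries):
        w_entry = 1.0 / len(entries)                # split weight over interpretations
        for cid in entries:
            occs = occurrences[cid]
            w = w_entry / len(occs)                 # ... and over stored occurrences
            for off in occs:
                votes.append((pos[0] + off[0], pos[1] + off[1], w))
    return np.array(votes)

def best_hypothesis(votes, bin_size=10):
    # Accumulate votes on a coarse grid and return the strongest cell center.
    xy = np.floor(votes[:, :2] / bin_size).astype(int)
    acc = {}
    for (bx, by), w in zip(map(tuple, xy), votes[:, 2]):
        acc[(bx, by)] = acc.get((bx, by), 0.0) + w
    (bx, by), score = max(acc.items(), key=lambda kv: kv[1])
    return (bx + 0.5) * bin_size, (by + 0.5) * bin_size, score

# Tiny hypothetical example: two features whose votes agree around (100, 80).
occurrences = {0: [(-20.0, 0.0)], 1: [(15.0, -5.0), (10.0, 0.0)]}
votes = cast_votes([(120, 80), (88, 83)], [[0], [1]], occurrences)
print(best_hypothesis(votes))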

High Level Computer Vision - May 23, 2o16 51

Object Categorization Procedure

[Figure: interest points → matched codebook entries → probabilistic voting → continuous voting space → backprojection of maximum → backprojected hypothesis → refined hypothesis (uniform sampling)]

High Level Computer Vision - May 23, 2o16 52

• 1st hypothesis

Car Categorization - Qualitative Results

2nd hypothesis

4th hypothesis

7th hypothesis

8th hypothesis

High Level Computer Vision - May 23, 2o16 53

Original image / Interest points / Matched patches

Results on Cows

Prob. Votes

High Level Computer Vision - May 23, 2o16 54

1st hypothesis

Results on Cows

High Level Computer Vision - May 23, 2o16 55

2nd hypothesis

Results on Cows

High Level Computer Vision - May 23, 2o16 56

Results on Cows

3rd hypothesis

High Level Computer Vision - May 23, 2o16 57

More Results on Cows…

2nd hypothesis / 8th hypothesis / 14th hypothesis / 16th hypothesis

High Level Computer Vision - May 23, 2o16

Detection Results

• Qualitative Performance (UIUC database, 200 cars) ‣ Recognizes different kinds of cars ‣ Robust to clutter, occlusion, low contrast, noise

58

Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 59

• Results on UIUC car database ‣ (170 images containing 200 cars)

‣ Good performance, similar to Constellation Model

‣ Still some false positives

Quantitative Evaluation


High Level Computer Vision - May 23, 2o16 60

• Scale-invariant feature selection ‣ Scale-invariant interest points ‣ Rescale extracted patches ‣ Match to constant-size codebook

• Generate scale votes ‣ Scale as 3rd dimension in voting space

‣ Search for maxima in 3D voting space

Scale Invariance

[Figure: 3-D voting space with axes x, y, s and search window]

Leibe, Schiele ‘04
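The scale-invariant extension can be sketched by letting each vote carry a scale coordinate and searching for maxima in the 3-D (x, y, s) space; here it is assumed, for illustration, that each occurrence also stores the scale at which it was observed during training.

# Sketch: scale votes — offsets are rescaled by the relative scale of the feature
# with respect to the stored occurrence, and that relative scale becomes the s-vote.
def cast_scale_votes(pos, scale, entry_occurrences):
    votes = []
    for (dx, dy, s_occ) in entry_occurrences:       # occurrence: (dx, dy, training scale)
        s_vote = scale / s_occ                      # relative scale of feature vs. occurrence
        votes.append((pos[0] + dx * s_vote, pos[1] + dy * s_vote, s_vote))
    return votes

print(cast_scale_votes((120, 80), 2.0, [(-20.0, 0.0, 1.0)]))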

High Level Computer Vision - May 23, 2o16 61

Qualitative Detection Results

Altogether, objects are detected across scale differences of up to a factor of 5.0!

scale = 0.75

scale = 3.71

High Level Computer Vision - May 23, 2o16 62

Discussion

• Approach: Implicit Shape Model ‣ Generate appearance codebook ‣ Learn spatial occurrence distribution for each codebook entry ‣ Recognition using a probabilistic extension of the Generalized Hough Transform

• Advantages ‣ Highly flexible shape model

‣ Each image feature acts independently

‣ Possible to learn good object models already from very few (50-100) training examples

‣ Recognition is fast!

High Level Computer Vision - May 23, 2o16 63

Discussion (2)

• Disadvantages ‣ Each feature acts independently

⇒ Assumption violated if sampled patches overlap

‣ Only loose constraints on object shape

‣ False positives on structured regions of the background

⇒ Hypothesis verification needed

• Idea: Combination with top-down segmentation ‣ Initial hypothesis generation ‣ Category-specific figure-ground segmentation ‣ Hypothesis verification using segmentation

High Level Computer Vision - May 23, 2o16 64

“Closing the Loop”

[Figure: interest points → matched codebook entries → probabilistic voting → continuous voting space → backprojection of maximum → backprojected hypothesis → segmentation → refined hypothesis (uniform sampling)]

High Level Computer Vision - May 23, 2o16 65

Segmentation: Probabilistic Formulation

• Influence of patch e on object hypothesis

• Backprojection to patches e and pixels p:

Leibe, Schiele, ‘03

High Level Computer Vision - May 23, 2o16 66

Segmentation: Probabilistic Formulation

• Resolve patches by interpretations (codebook entries) I

Segmentation information

Influence on object hypothesis

⇒ Store patch segmentation mask for every occurrence position!

Leibe, Schiele, ‘03
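A sketch of the resulting top-down segmentation: every vote that supports the winning hypothesis backprojects its stored patch segmentation mask, and the per-pixel p(figure) is the weighted average of those masks. Mask shape, weights and the averaging are illustrative simplifications of the probabilistic formulation above.

# Sketch: backproject stored patch masks of supporting votes into a p(figure) map.
import numpy as np

def backproject(img_shape, supporting_votes, patch_size=16):
    p_fig = np.zeros(img_shape)
    weight = np.zeros(img_shape)
    for (x, y), w, mask in supporting_votes:        # mask: patch_size x patch_size in [0, 1]
        x0, y0 = int(x) - patch_size // 2, int(y) - patch_size // 2
        p_fig[y0:y0+patch_size, x0:x0+patch_size] += w * mask
        weight[y0:y0+patch_size, x0:x0+patch_size] += w
    # Weighted average where any vote contributed, zero elsewhere.
    return np.divide(p_fig, weight, out=np.zeros_like(p_fig), where=weight > 0)

mask = np.ones((16, 16))                            # stand-in for a stored figure mask
p = backproject((120, 160), [((60, 40), 0.7, mask), ((70, 44), 0.3, mask)])
print(p[40, 60])                                    # per-pixel confidence at (x=60, y=40)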

High Level Computer Vision - May 23, 2o16 67

[Figure: original image, p(figure) map, p(ground) map, resulting segmentation]

High Level Computer Vision - May 23, 2o16

[Figure: original image, p(figure) map, p(ground) map, resulting segmentation]

• Interpretation of p(figure) map ‣ per-pixel confidence in object hypothesis

‣ Use for hypothesis verification

High Level Computer Vision - May 23, 2o16 69

Top-Down Driven Segmentation

• Example 1:

‣ Pedestrian is segmented out since it does not contribute to the car hypothesis

• Example 2:

[Figures: image, hypothesis, segmentation, p(figure) (Example 1); image, p(figure), segmentation, sub-image, contours (Example 2)]

Leibe, Schiele, ‘03

High Level Computer Vision - May 23, 2o16 70

Motorbikes: Segmentation Results

Leibe, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 71

• Secondary hypotheses ‣ Desired property of algorithm! ⇒ robustness to partial occlusion

‣ Standard solution: reject based on bounding box overlap ⇒ Problematic - may lead to missing detections!

⇒ Use segmentations to resolve ambiguities instead

Hypothesis Verification: Motivation

Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 72

Formalization in MDL Framework

• Savings of a hypothesis [Leonardis, IJCV’95]

S_h = K_0 · S_area − K_1 · S_model − K_2 · S_error

• with ‣ S_area : #pixels N in segmentation

‣ S_model : model cost, assumed constant

‣ S_error : estimate of error, according to

S_error = Σ_{p ∈ Seg(h)} (1 − p(p = figure | h))

• Final form of the equation

S_h = −K_1/K_0 + (1 − K_2/K_0) · N + (K_2/K_0) · Σ_{p ∈ Seg(h)} p(p = figure | h)

High Level Computer Vision - May 23, 2o16 73

Formalization in MDL Framework (2)

• Savings of a combined hypothesis

S_{h1 ∪ h2} = S_h1 + S_h2 − S_area(h1 ∩ h2) + S_error(h1 ∩ h2)

• Goal: Find the combination (vector m) that best explains the image ‣ Quadratic Boolean Optimization problem [Leonardis et al, 95]

‣ In practice it is often sufficient to compute a greedy approximation (see the sketch below)
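A toy sketch of the MDL-based verification using the savings terms defined above: hypotheses are scored by their savings and accepted greedily, discounting image area that is already explained by an accepted, overlapping hypothesis. The constants and the two toy hypotheses are placeholders, and the pairwise combined-savings criterion is reduced to a simple overlap penalty.

# Sketch: savings of a hypothesis and a greedy approximation to hypothesis selection.
import numpy as np

K0, K1, K2 = 1.0, 5.0, 0.9                          # placeholder constants

def savings(p_figure):
    # p_figure: per-pixel p(p = figure | h) over the hypothesized segmentation.
    N = p_figure.size
    s_error = np.sum(1.0 - p_figure)
    return K0 * N - K1 * 1.0 - K2 * s_error         # S_model assumed constant (= 1)

def greedy_select(hypotheses, overlap):
    # hypotheses: list of per-pixel p(figure) maps; overlap[i][j]: #shared pixels.
    chosen = []
    for i in sorted(range(len(hypotheses)), key=lambda i: -savings(hypotheses[i])):
        penalty = sum(overlap[i][j] for j in chosen)    # don't count shared area twice
        if savings(hypotheses[i]) - K0 * penalty > 0:
            chosen.append(i)
    return chosen

h1 = np.full((50, 50), 0.9)                         # strong hypothesis
h2 = np.full((50, 50), 0.2)                         # weak hypothesis overlapping h1
overlap = [[0, 2000], [2000, 0]]
print(greedy_select([h1, h2], overlap))             # -> [0]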

High Level Computer Vision - May 23, 2o16 74

Performance after Verification Stage

• Direct Comparison

195/200 correct detections, 5 false positives

High Level Computer Vision - May 23, 2o16 75

Other Categories: Cows

• Articulated Object Recognition ‣ Use a set of cow sequences (from Derek Magee @ Leeds)

• Extract frames from subset of sequences

Train on 113 images (+ segmentation)

Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 76

Cows: Results on Novel Sequences

• Object Detections ‣ Single-frame recognition - No temporal continuity used!

Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 77

• Segmentations from interest points ‣ Single-frame recognition - No temporal continuity used!

Leibe, Leonardis, Schiele, ‘04

Cows: Results on Novel Sequences (2)

High Level Computer Vision - May 23, 2o16

• Segmentations from refined hypotheses ‣ Single-frame recognition - No temporal continuity used!

78

Cows: Results on Novel Sequences (3)Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16

• Object Detections

79

Another ExampleLeibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16

• Segmentations from interest points

80

Another Example (2)Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16

• Segmentations from refined hypotheses

81

Another Example (3)Leibe, Leonardis, Schiele, ‘04

High Level Computer Vision - May 23, 2o16 82

Robustness to Occlusion

• Quantitative results (14 sequences, 2217 frames total) ‣ No difficulties recognizing fully visible cows (99.1% recall)

‣ Robust to significant partial occlusion!

High Level Computer Vision - May 23, 2o16 83

Application: Pedestrian Detection

• Small training set ‣ 105 images (+motion segmentations), 210 silhouettes

• Challenging real-world test set ‣ 209 crowded scenes containing 595 pedestrians


High Level Computer Vision - May 23, 2o16 84

Problem for Articulated Objects

• Overcomplete Segmentations ‣ Flexible spatial model

‣ Segmentations may contain superfluous body parts ⇒ Score of neighboring hypotheses may be reduced!

High Level Computer Vision - May 23, 2o16 85

Qualitative Results

Leibe, Seemann, Schiele, ‘05

High Level Computer Vision - May 23, 2o16 86

Estimated Articulations

Walking direction

Detection scale

Body pose

Problem cases

High Level Computer Vision - May 23, 2o16 87

Detections at EER (Equal Error Rate)

Single-frame recognition – no temporal continuity used! Leibe, Seemann, Schiele, ‘05

High Level Computer Vision - May 23, 2o16 88

Example Detections