Outline
Pascal VOC challenge framework.
Successful detection methods:
o Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.) - the UoC/TTI method.
o Multiple Kernels for Object Detection (A. Vedaldi et al.) - the Oxford/MSR India method.
A successful classification method:
o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) - the NEC/UIUC method.
Discussion about bias in datasets.
2010 winners overview.
Slide 3
Pascal VOC Challenge Framework
"The PASCAL Visual Object Classes (VOC) Challenge"
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman
Slide 4
Pascal VOC Challenge - The Tasks
Classification task.
Detection task.
Pixel-level segmentation.
Person layout detection.
Action classification in still images.
Slide 5
Classification Task
Predict whether the image contains at least one object of the class, e.g. "at least one bus".
Slide 6
Detection Task
The predicted bounding box should overlap the ground-truth box by at least 50%!!!
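The VOC overlap criterion is computed as intersection-over-union (IoU) of the predicted and ground-truth boxes. A minimal sketch (function names are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred, gt, threshold=0.5):
    """VOC criterion: IoU of prediction and ground truth must be at least 50%."""
    return iou(pred, gt) >= threshold
```

Note that IoU is stricter than simple intersection area: a huge box covering the whole image intersects every ground-truth box, but its union term keeps the ratio low.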
Slide 7
Detection near misses - these didn't fulfill the bounding-box overlap criterion.
Slide 8
Pascal VOC Challenge - The Object Classes
Slide 9
Images retrieved from the Flickr website.
Slide 10
Pixel-Level Segmentation
Per-pixel labels for the image: class segmentation and object segmentation.
Slide 11
Person Layout
Slide 12
Action Classification
Classification among 9 action classes, e.g. speaking on the phone, playing the guitar.
Evaluation
Precision/Recall curves.
Interpolated precision.
AP (Average Precision) - a way to compare between different methods.
Slide 16
Evaluation - Precision/Recall Curves (1)
Practical tradeoff between precision and recall.
Interpolated precision: the interpolated precision at recall r is the maximum precision measured at any recall r' >= r.

Rank:      1    2    3    4    5    6    7    8    9    10
g.t.:      Yes  No   Yes  No   Yes  No   No   No   No   No
Precision: 1/1  1/2  2/3  2/4  3/5  3/6  3/7  3/8  3/9  3/10
Recall:    0.2  0.2  0.4  0.4  0.6  0.6  0.6  0.6  0.6  0.6

(5 ground-truth positives in total, so recall 3/5 = 0.6 after the third hit.)
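The precision/recall values in the table above can be reproduced directly from the ranked list of detections. A small Python sketch (function names are illustrative) that also computes the interpolated precision and the 11-point AP used in VOC evaluations of this era:

```python
def precision_recall(ranked_hits, num_positives):
    """ranked_hits: booleans, True where the ranked detection matches a
    ground-truth object. Returns precision and recall at each rank."""
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_positives)
    return precisions, recalls

def interpolated_precision(precisions, recalls, r):
    """Max precision over all ranks whose recall is at least r."""
    candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
    return max(candidates) if candidates else 0.0

def average_precision(precisions, recalls):
    """11-point AP: mean interpolated precision at recalls 0, 0.1, ..., 1."""
    points = [i / 10 for i in range(11)]
    return sum(interpolated_precision(precisions, recalls, r) for r in points) / 11
```

Running it on the table's ranked list (`[Yes, No, Yes, No, Yes, No, ...]` with 5 positives) reproduces the precision column exactly.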
Slide 17
Evaluation - Precision/Recall Curves (2)
Slide 18
Evaluation - Average Precision (AP)
AP summarizes the precision/recall curve in a single number, used to rank the competing methods.
Slide 19
Successful Detection Methods
Slide 20
UoC/TTI Method - Overview (P. Felzenszwalb et al.)
Joint winner of the 2009 Pascal VOC challenge, together with the Oxford method.
"Lifetime achievement" award in 2010.
Mixture of deformable part models; each component has a global template + deformable parts.
o HOG feature templates.
Fully trained from bounding boxes alone.
Slide 21
UoC/TTI Method - HOG Features (1)
Gradients are computed with the [-1 0 1] filter and its transpose.
Gradient orientation is discretized into one of p values (with soft binning).
Pixel-level features are aggregated into cells of size k (8-pixel cells, k = 8).
18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total!
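The two kinds of orientation bins can be sketched as follows. Note the real implementation uses soft binning (each gradient's vote is split between the two nearest bins); this illustrative sketch uses hard assignment for brevity:

```python
import math

def orientation_bins(dx, dy, p_insensitive=9, p_sensitive=18):
    """Discretize a gradient (dx, dy) into a contrast-sensitive bin
    (full 0..360 degree range) and a contrast-insensitive bin
    (orientation modulo 180 degrees)."""
    angle = math.atan2(dy, dx)
    # contrast-sensitive: 18 bins over the full circle
    a360 = angle % (2 * math.pi)
    bin_sensitive = int(a360 / (2 * math.pi) * p_sensitive) % p_sensitive
    # contrast-insensitive: 9 bins over a half circle
    a180 = angle % math.pi
    bin_insensitive = int(a180 / math.pi * p_insensitive) % p_insensitive
    return bin_sensitive, bin_insensitive
```

Opposite gradients (e.g. dark-to-light vs. light-to-dark edges) land in the same contrast-insensitive bin but in different contrast-sensitive bins, which is exactly the distinction the 18 + 9 split encodes.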
Slide 22
UoC/TTI Method - HOG Features (2)
Slide 23
UoC/TTI Method - HOG Features (3)
Normalization and truncation.
27 bins x 4 normalization factors = a 4x27 matrix.
Dimensionality reduction to 31.
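A sketch of the analytic dimensionality reduction, assuming the 31 features are the 27 per-orientation sums over the 4 normalizations plus 4 per-normalization sums over the orientations (the scaling constants used in the published implementation are omitted here):

```python
def reduce_hog(cell):
    """cell: 4x27 nested list (4 normalization factors x 27 orientation bins).
    Returns a 31-dim feature: 27 sums taken across the normalizations,
    followed by 4 sums taken across the orientation bins."""
    assert len(cell) == 4 and all(len(row) == 27 for row in cell)
    over_norms = [sum(cell[n][o] for n in range(4)) for o in range(27)]
    over_orients = [sum(cell[n][o] for o in range(27)) for n in range(4)]
    return over_norms + over_orients
```

The point of the reduction is that these 31 linear combinations capture almost all the variance of the full 4x27 representation, at a quarter of the cost per cell.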
Slide 24
UoC/TTI Method - Deformable Part Models
Coarse root filter.
High-resolution deformable parts.
Each part is specified by (anchor position, deformation cost, resolution level).
Slide 25
UoC/TTI Method - Mixture Models (1)
Diversity of a rich object category; different views of the same object.
A mixture of deformable part models for each class.
Each deformable part model in the mixture is called a component.
Slide 26
UoC/TTI Method - Object Hypothesis (slide taken from the method's presentation)
Slide 27
UoC/TTI Method - Models (1): a 6-component person model
Slide 28
UoC/TTI Method - Models (2): a 6-component bicycle model
Slide 29
UoC/TTI Method - Score of a Hypothesis (slide taken from the method's presentation)
Slide 30
UoC/TTI Method - Matching (1)
Sliding-window approach.
High-scoring root locations define detections.
Matching is done for each component separately.
(Figure labels: root location, best part location.)
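The matching step can be illustrated with a toy 1-D sketch: each part contributes its best filter response minus a quadratic deformation cost relative to its anchor, and the hypothesis score adds these contributions to the root score. The real method evaluates this for every root location efficiently via generalized distance transforms; the names below are illustrative:

```python
def best_part_placement(part_scores, anchor, deform_cost):
    """For one part: maximize filter score minus quadratic deformation cost
    over candidate positions. part_scores: dict position -> filter score."""
    def penalized(pos):
        dx = pos - anchor
        return part_scores[pos] - deform_cost * dx * dx
    return max(part_scores, key=penalized)

def hypothesis_score(root_score, parts):
    """parts: list of (part_scores, anchor, deform_cost) tuples.
    Total = root score + sum over parts of the best penalized part score."""
    total = root_score
    for part_scores, anchor, deform_cost in parts:
        best = best_part_placement(part_scores, anchor, deform_cost)
        dx = best - anchor
        total += part_scores[best] - deform_cost * dx * dx
    return total
```

A part is allowed to drift from its anchor only when the gain in filter score outweighs the quadratic penalty, which is what makes the parts "deformable" rather than rigid.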
Slide 31
UoC/TTI Method - Matching (2)
Slide 32
UoC/TTI Method - Post-Processing & Context Rescoring (slide taken from the method's presentation)
Slide 33
UoC/TTI Method - Training & Data Mining
Weakly labeled data in the training set.
Latent SVM (LSVM) training, with the part placements z as the latent values.
Training and data mining in 4 stages:
o Optimize the latent values z.
o Optimize the model parameters.
o Add hard negative examples.
o Remove easy negative examples.
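The four-stage loop can be sketched with toy data. Here perceptron-style updates stand in for the actual SVM solver, so this illustrates only the structure (latent assignment, parameter update, hard-negative mining), not the method itself; all names are illustrative:

```python
def lsvm_train(positives, negative_pool, init_w, rounds=4, lr=0.1):
    """Toy sketch of the LSVM training loop.
    positives: list of lists of feature vectors - each positive example has
    several candidate latent placements z; the highest-scoring one is used.
    negative_pool: feature vectors to mine hard negatives from."""
    w = list(init_w)
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    cache = []  # current working set of hard negatives
    for _ in range(rounds):
        # stage 1: optimize latent values z - pick best placement per positive
        latent = [max(placements, key=score) for placements in positives]
        # stage 2: optimize model parameters (perceptron-style stand-in)
        for x in latent:
            if score(x) < 1:                      # positive inside the margin
                w = [wi + lr * xi for wi, xi in zip(w, x)]
        for x in cache:
            if score(x) > -1:                     # negative inside the margin
                w = [wi - lr * xi for wi, xi in zip(w, x)]
        # stages 3 & 4: add hard negatives, drop easy ones
        cache = [x for x in negative_pool if score(x) > -1]
    return w
```

Data mining matters because a detector sees vastly more negative windows than can fit in memory; only the negatives the current model still gets wrong are worth keeping.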
Slide 34
UoC/TTI Method - Results (1)
Slide 35
UoC/TTI Method - Results (2)
Slide 36
Oxford Method - Overview (A. Vedaldi et al.)
Regions with different scales and aspect ratios.
6 feature channels.
3-level spatial pyramid.
Cascade: 3 SVM classifiers with 3 different kernels.
Post-processing.
Slide 37
Oxford Method - Feature Channels
Bag of visual words: SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
Dense words (PhowGray, PhowColor): another set of SIFT descriptors, quantized into 300 visual words.
Histogram of oriented edges (Phog180, Phog360): similar to the HOG descriptor used by the UoC/TTI method, with 8 orientation bins.
Self-similarity features (SSIM).
Slide 38
Oxford Method - Spatial Pyramids
Slide 39
Oxford Method - Feature Vector (chart taken from the method's presentation)
Slide 40
Oxford Method - Discriminant Function (1)
Slide 41
Oxford Method - Discriminant Function (2)
The kernel of the discriminant function is a linear combination of histogram kernels.
The kernel parameters and the combination weights (18 in total - one per feature channel and pyramid level) are learned using MKL (Multiple Kernel Learning).
The discriminant function is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
Slide 42
Oxford Method - Cascade Solution (1)
Exhaustive search over the candidate regions R requires a number of operations that is O(MBN):
o N - the number of regions.
o M - the number of support vectors.
o B - the dimensionality of the histograms.
To reduce this complexity, a cascade is applied:
o The first stage uses a cheap linear kernel.
o The second uses a more expensive and more powerful quasi-linear kernel.
o The third uses the most powerful, non-linear kernel.
Each stage evaluates the discriminant function on a smaller number of candidate regions.
Slide 43
Oxford Method - Cascade Solution (2)

Type          Evaluation Complexity
Linear        O(N)
Quasi-Linear  O(BN)
Non-Linear    O(MBN)

Stage 1 - linear; Stage 2 - quasi-linear; Stage 3 - non-linear.
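The complexity reduction comes from each stage discarding most candidates before the next, more expensive, stage runs. A toy sketch (the scoring functions and keep fractions are illustrative, not the paper's):

```python
def cascade(regions, stages, keep_fractions):
    """Each stage scores the surviving regions with an increasingly
    expensive function and keeps only the top-scoring fraction."""
    survivors = list(regions)
    for score, frac in zip(stages, keep_fractions):
        survivors.sort(key=score, reverse=True)
        keep = max(1, int(len(survivors) * frac))
        survivors = survivors[:keep]
    return survivors
```

The expensive O(MBN) kernel is thus evaluated only on the small set of regions that survived the O(N) and O(BN) stages.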
Slide 44
Oxford Method - Cascade Solution (3) (chart taken from the method's presentation)
Slide 45
Oxford Method - The Kernels
All the aforementioned kernels are of a common form combining two functions f and g.
For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
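For illustration, the standard kernel families these three cascade stages correspond to can be sketched as below; the exact kernels and normalizations used in the paper may differ:

```python
import math

def linear_kernel(h1, h2):
    """Linear: plain dot product of the two histograms."""
    return sum(a * b for a, b in zip(h1, h2))

def chi2_kernel(h1, h2):
    """Quasi-linear (additive): chi-squared similarity, summed per bin."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += 2 * a * b / (a + b)
    return total

def exp_chi2_kernel(h1, h2, gamma=1.0):
    """Non-linear: exponential chi-squared (an RBF over the chi2 distance)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)
```

The quasi-linear kernel still decomposes over histogram bins (hence O(BN) evaluation), while the exponential wrapper does not, which is why the non-linear stage costs O(MBN).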
Slide 46
Oxford Method - Post-Processing
The output of the last stage is a ranked list of 100 candidate regions per image.
Many of these regions correspond to multiple detections of the same object, so non-maxima suppression is used.
At most 10 regions per image remain.
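Greedy non-maxima suppression keeps the highest-scoring box, removes the boxes that overlap it, and repeats. A minimal sketch with the 10-region cap (the overlap threshold is illustrative):

```python
def nms(detections, iou_threshold=0.5, max_keep=10):
    """detections: list of (score, (x1, y1, x2, y2)). Greedily keep the
    highest-scoring box, suppress boxes overlapping it, repeat."""
    def iou(a, b):
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kb) < iou_threshold for _, kb in kept):
            kept.append((score, box))
        if len(kept) == max_keep:
            break
    return kept
```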
Slide 47
Oxford Method - Training/Retraining (1)
Jittered/flipped instances are used as additional positive samples.
Training images are partitioned into two subsets; the classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
Slide 48
Oxford Method - Results (1)
Slide 49
Oxford Method - Results (2)
Slide 50
Oxford Method - Results (3)
Training and testing on VOC2007.
Training and testing on VOC2008.
Training on VOC2008 and testing on VOC2007.
Training and testing on VOC2009.
Slide 51
Oxford Method - Summary
Slide 52
A Successful Classification Method
Slide 53
NEC/UIUC Method - Overview (Xi Zhou, Kai Yu et al.)
A winner in the 2009 Pascal VOC classification challenge.
A framework for classification is proposed:
o Descriptor coding: super-vector coding (the important part!).
o Spatial pyramid pooling.
o Classification: linear SVM.
NEC/UIUC Method - Descriptor Coding (2): Super-Vector Coding
Slide 57
NEC/UIUC Method - Spatial Pooling
Pyramid partitions: 1x1, 2x2 and 3x1.
Y: the set of local descriptors; N: the size of that set.
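Spatial pyramid pooling builds one histogram per spatial cell for each grid and concatenates them all. A sketch assuming the 3x1 level splits the image into three horizontal strips (the exact orientation of that split is an assumption), with hard-assigned codes standing in for the actual super-vector coding:

```python
def spatial_pyramid_pool(descriptors, width, height, codebook_size,
                         grids=((1, 1), (2, 2), (1, 3))):
    """descriptors: list of (x, y, code) with code in [0, codebook_size).
    grids: (cols, rows) partitions of the image. For each grid, build one
    histogram of codes per spatial cell and concatenate everything."""
    feature = []
    for cols, rows in grids:
        hists = [[0] * codebook_size for _ in range(cols * rows)]
        for x, y, code in descriptors:
            c = min(int(x / width * cols), cols - 1)
            r = min(int(y / height * rows), rows - 1)
            hists[r * cols + c][code] += 1
        for h in hists:
            feature.extend(h)
    return feature
```

With the 1x1 + 2x2 + 3x1 partitions the pooled vector is (1 + 4 + 3) = 8 histograms long, so the final feature keeps coarse spatial layout while remaining usable by a linear SVM.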
Slide 58
NEC/UIUC Method - Results (1)
SIFT: 128-dimensional vectors over a grid with a spacing of 4 pixels, on three patch levels (16x16, 25x25 and 31x31).
PCA reduction of dimensionality to 80.
Slide 59
NEC/UIUC Method - Results (2): |C| = 512
Slide 60
NEC/UIUC Method - Results (3): |C| = 2048
Slide 61
NEC/UIUC Method - Results (4)
Slide 62
Bias in Datasets
"Unbiased Look at Dataset Bias"
Antonio Torralba (Massachusetts Institute of Technology), Alexei A. Efros (Carnegie Mellon University)
Slide 63
Name The Dataset
People were asked to guess, based on three images, which dataset the images were taken from.
People who worked in the field got more than 75% correct.
Slide 64
Name The Dataset - The Dataset Classifier
4 classifiers were trained to play the Name The Dataset game.
Each classifier used a different image descriptor:
o 32x32 thumbnail (grayscale and color).
o Gist.
o Bag of HOG visual words.
1000 images were randomly sampled from the training portions of 12 datasets.
The classifier was tested on 300 random images from each of the test sets, repeated 20 times.
Slide 65
Name The Dataset - The Dataset Classifier
The best classifier performs at 39% (chance is about 8%)!!!
Confusion table; recognition performance vs. number of training examples per class.
Slide 66
Name The Dataset - The Dataset Classifier
Car images from different datasets: performance is 61% on car images from 5 different datasets (chance is 20%).
Slide 67
Cross-Dataset Generalization (1)
Training on one dataset while testing on another.
Dalal & Triggs detector (HOG detector + linear SVM) for the detection task.
Bag-of-words approach with a Gaussian-kernel SVM for the classification task.
The car and person objects are used.
Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones.
Each detector (for each dataset) was trained with 100 positive images and 1000 negative ones.
Classification was tested with 50 positive and 1000 negative examples; detection with 10 positive and 20,000 negative examples.
Each classifier/detector was run 20 times and the results averaged.
Slide 68
Cross-Dataset Generalization (2)
Slide 69
Cross-Dataset Generalization (3)
Performance shows a logarithmic dependency on the number of training samples.
Slide 70
Types of Dataset Biases
Selection bias.
Capture bias.
Label bias.
Negative set bias: what the dataset considers to be the rest of the world.
Slide 71
Negative Set Bias - Experiment (1)
Evaluation of the relative bias in the negative sets of different datasets.
Detectors are trained on positives and negatives of a single dataset.
Testing is done on positives from the same dataset and on negatives from all 6 datasets combined.
Each detector was trained with 100 positives and 1000 negatives.
For testing, multiple runs of 10 positive examples against 20,000 negatives were performed.
Slide 72
Negative Set Bias - Experiment (2)
Slide 73
Negative Set Bias - Experiment (3)
A large negative training set is important for discriminating objects with similar contexts in images.
Slide 74
Datasets' Market Value (1)
A measure of the improvement in performance when adding training data from another dataset.
The market value is computed from the shift in the number of training samples, between different datasets, needed to achieve the same average precision.
Slide 75
Datasets' Market Value (2)
This table shows the sample value (market value) for a car sample across datasets.
A sample from the original dataset is worth more than a sample from another dataset!!!
Slide 76
Bias in Datasets - Summary
Datasets, though gathered from the internet, have distinguishable features of their own.
Methods performing well on a certain dataset can perform much worse on another.
The negative set is at least as important as the positive samples in the dataset.
Every dataset has its own market value.
Slide 77
2010 Winners Overview
Slide 78
Pascal VOC 2010 - Winners
Classification winner: NUSPSL_KERNELREGFUSING
Qiang Chen (1), Zheng Song (1), Si Liu (1), Xiangyu Chen (1), Xiaotong Yuan (1), Tat-Seng Chua (1), Shuicheng Yan (1), Yang Hua (2), Zhongyang Huang (2), Shengmei Shen (2)
(1) National University of Singapore; (2) Panasonic Singapore Laboratories
Detection winner: NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Honourable mentions:
MITUCLA_HIERARCHY - Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba (MIT, UCLA)
NUS_HOGLBP_CTX_CLS_RESCORE_V2 - Zheng Song, Qiang Chen, Shuicheng Yan (National University of Singapore)
UVA_GROUPLOC / UVA_DETMONKEY - Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha (University of Amsterdam)