Outline
Pascal VOC challenge framework.
Successful detection methods:
o Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.) - the UoC/TTI method.
o Multiple Kernels for Object Detection (A. Vedaldi et al.) - the Oxford/MSR India method.
A successful classification method:
o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) - the NEC/UIUC method.
Discussion about bias in datasets.
2010 winners overview.
Slide 3
Pascal VOC Challenge Framework
"The PASCAL Visual Object Classes (VOC) Challenge"
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman
Slide 4
Pascal VOC Challenge - The Tasks
Classification task.
Detection task.
Pixel-level segmentation.
Person layout detection.
Action classification in still images.
Slide 5
Classification Task
Predict whether the image contains at least one object of the class, e.g. "at least one bus".
Slide 6
Detection Task
The predicted bounding box should overlap the ground-truth box by at least 50%!!!
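The VOC overlap criterion is computed as intersection-over-union (IoU) of the predicted and ground-truth boxes. A minimal sketch (function names are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred, gt, threshold=0.5):
    """VOC criterion: IoU of prediction and ground truth must be at least 50%."""
    return iou(pred, gt) >= threshold
```

Note that IoU is stricter than simple intersection area: a huge box covering the whole image intersects every ground-truth box, but its union term keeps the ratio low.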
Slide 7
Detection near misses - these didn't fulfill the bounding-box overlap criterion.
Slide 8
Pascal VOC Challenge - The Object Classes
Slide 9
Images retrieved from the Flickr website.
Slide 10
Pixel-Level Segmentation
Per-pixel labels for the image: class segmentation and object segmentation.
Slide 11
Person Layout
Slide 12
Action Classification
Classification among 9 action classes, e.g. speaking on the phone, playing the guitar.
Evaluation
Precision/Recall curves.
Interpolated precision.
AP (Average Precision) - a way to compare between different methods.
Slide 16
Evaluation - Precision/Recall Curves (1)
Practical tradeoff between precision and recall.
Interpolated precision: the interpolated precision at recall r is the maximum precision measured at any recall r' >= r.

Rank:      1    2    3    4    5    6    7    8    9    10
g.t.:      Yes  No   Yes  No   Yes  No   No   No   No   No
Precision: 1/1  1/2  2/3  2/4  3/5  3/6  3/7  3/8  3/9  3/10
Recall:    0.2  0.2  0.4  0.4  0.6  0.6  0.6  0.6  0.6  0.6

(5 ground-truth positives in total, so recall 3/5 = 0.6 after the third hit.)
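The precision/recall values in the table above can be reproduced directly from the ranked list of detections. A small Python sketch (function names are illustrative) that also computes the interpolated precision and the 11-point AP used in VOC evaluations of this era:

```python
def precision_recall(ranked_hits, num_positives):
    """ranked_hits: booleans, True where the ranked detection matches a
    ground-truth object. Returns precision and recall at each rank."""
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_positives)
    return precisions, recalls

def interpolated_precision(precisions, recalls, r):
    """Max precision over all ranks whose recall is at least r."""
    candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
    return max(candidates) if candidates else 0.0

def average_precision(precisions, recalls):
    """11-point AP: mean interpolated precision at recalls 0, 0.1, ..., 1."""
    points = [i / 10 for i in range(11)]
    return sum(interpolated_precision(precisions, recalls, r) for r in points) / 11
```

Running it on the table's ranked list (`[Yes, No, Yes, No, Yes, No, ...]` with 5 positives) reproduces the precision column exactly.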
Slide 17
Evaluation - Precision/Recall Curves (2)
Slide 18
Evaluation - Average Precision (AP)
AP summarizes the precision/recall curve in a single number, used to rank the competing methods.
Slide 19
Successful Detection Methods
Slide 20
UoC/TTI Method - Overview (P. Felzenszwalb et al.)
Joint winner of the 2009 Pascal VOC challenge, together with the Oxford method.
"Lifetime achievement" award in 2010.
Mixture of deformable part models; each component has a global template + deformable parts.
o HOG feature templates.
Fully trained from bounding boxes alone.
Slide 21
UoC/TTI Method - HOG Features (1)
Gradients are computed with the [-1 0 1] filter and its transpose.
Gradient orientation is discretized into one of p values (with soft binning).
Pixel-level features are aggregated into cells of size k (8-pixel cells, k = 8).
18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total!
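The two kinds of orientation bins can be sketched as follows. Note the real implementation uses soft binning (each gradient's vote is split between the two nearest bins); this illustrative sketch uses hard assignment for brevity:

```python
import math

def orientation_bins(dx, dy, p_insensitive=9, p_sensitive=18):
    """Discretize a gradient (dx, dy) into a contrast-sensitive bin
    (full 0..360 degree range) and a contrast-insensitive bin
    (orientation modulo 180 degrees)."""
    angle = math.atan2(dy, dx)
    # contrast-sensitive: 18 bins over the full circle
    a360 = angle % (2 * math.pi)
    bin_sensitive = int(a360 / (2 * math.pi) * p_sensitive) % p_sensitive
    # contrast-insensitive: 9 bins over a half circle
    a180 = angle % math.pi
    bin_insensitive = int(a180 / math.pi * p_insensitive) % p_insensitive
    return bin_sensitive, bin_insensitive
```

Opposite gradients (e.g. dark-to-light vs. light-to-dark edges) land in the same contrast-insensitive bin but in different contrast-sensitive bins, which is exactly the distinction the 18 + 9 split encodes.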
Slide 22
UoC/TTI Method - HOG Features (2)
Slide 23
UoC/TTI Method - HOG Features (3)
Normalization and truncation.
27 bins x 4 normalization factors = a 4x27 matrix.
Dimensionality reduction to 31.
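A sketch of the analytic dimensionality reduction, assuming the 31 features are the 27 per-orientation sums over the 4 normalizations plus 4 per-normalization sums over the orientations (the scaling constants used in the published implementation are omitted here):

```python
def reduce_hog(cell):
    """cell: 4x27 nested list (4 normalization factors x 27 orientation bins).
    Returns a 31-dim feature: 27 sums taken across the normalizations,
    followed by 4 sums taken across the orientation bins."""
    assert len(cell) == 4 and all(len(row) == 27 for row in cell)
    over_norms = [sum(cell[n][o] for n in range(4)) for o in range(27)]
    over_orients = [sum(cell[n][o] for o in range(27)) for n in range(4)]
    return over_norms + over_orients
```

The point of the reduction is that these 31 linear combinations capture almost all the variance of the full 4x27 representation, at a quarter of the cost per cell.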
Slide 24
UoC/TTI Method - Deformable Part Models
Coarse root filter.
High-resolution deformable parts.
Each part is specified by (anchor position, deformation cost, resolution level).
Slide 25
UoC/TTI Method - Mixture Models (1)
Diversity of a rich object category; different views of the same object.
A mixture of deformable part models for each class.
Each deformable part model in the mixture is called a component.
Slide 26
UoC/TTI Method - Object Hypothesis (slide taken from the method's presentation)
Slide 27
UoC/TTI Method - Models (1): a 6-component person model
Slide 28
UoC/TTI Method - Models (2): a 6-component bicycle model
Slide 29
UoC/TTI Method - Score of a Hypothesis (slide taken from the method's presentation)
Slide 30
UoC/TTI Method - Matching (1)
Sliding-window approach.
High-scoring root locations define detections.
Matching is done for each component separately.
(Figure labels: root location, best part location.)
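The matching step can be illustrated with a toy 1-D sketch: each part contributes its best filter response minus a quadratic deformation cost relative to its anchor, and the hypothesis score adds these contributions to the root score. The real method evaluates this for every root location efficiently via generalized distance transforms; the names below are illustrative:

```python
def best_part_placement(part_scores, anchor, deform_cost):
    """For one part: maximize filter score minus quadratic deformation cost
    over candidate positions. part_scores: dict position -> filter score."""
    def penalized(pos):
        dx = pos - anchor
        return part_scores[pos] - deform_cost * dx * dx
    return max(part_scores, key=penalized)

def hypothesis_score(root_score, parts):
    """parts: list of (part_scores, anchor, deform_cost) tuples.
    Total = root score + sum over parts of the best penalized part score."""
    total = root_score
    for part_scores, anchor, deform_cost in parts:
        best = best_part_placement(part_scores, anchor, deform_cost)
        dx = best - anchor
        total += part_scores[best] - deform_cost * dx * dx
    return total
```

A part is allowed to drift from its anchor only when the gain in filter score outweighs the quadratic penalty, which is what makes the parts "deformable" rather than rigid.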
Slide 31
UoC/TTI Method - Matching (2)
Slide 32
UoC/TTI Method - Post-Processing & Context Rescoring (slide taken from the method's presentation)
Slide 33
UoC/TTI Method - Training & Data Mining
Weakly labeled data in the training set.
Latent SVM (LSVM) training, with the part placements z as the latent values.
Training and data mining in 4 stages:
o Optimize the latent values z.
o Optimize the model parameters.
o Add hard negative examples.
o Remove easy negative examples.
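The four-stage loop can be sketched with toy data. Here perceptron-style updates stand in for the actual SVM solver, so this illustrates only the structure (latent assignment, parameter update, hard-negative mining), not the method itself; all names are illustrative:

```python
def lsvm_train(positives, negative_pool, init_w, rounds=4, lr=0.1):
    """Toy sketch of the LSVM training loop.
    positives: list of lists of feature vectors - each positive example has
    several candidate latent placements z; the highest-scoring one is used.
    negative_pool: feature vectors to mine hard negatives from."""
    w = list(init_w)
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    cache = []  # current working set of hard negatives
    for _ in range(rounds):
        # stage 1: optimize latent values z - pick best placement per positive
        latent = [max(placements, key=score) for placements in positives]
        # stage 2: optimize model parameters (perceptron-style stand-in)
        for x in latent:
            if score(x) < 1:                      # positive inside the margin
                w = [wi + lr * xi for wi, xi in zip(w, x)]
        for x in cache:
            if score(x) > -1:                     # negative inside the margin
                w = [wi - lr * xi for wi, xi in zip(w, x)]
        # stages 3 & 4: add hard negatives, drop easy ones
        cache = [x for x in negative_pool if score(x) > -1]
    return w
```

Data mining matters because a detector sees vastly more negative windows than can fit in memory; only the negatives the current model still gets wrong are worth keeping.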
Slide 34
UoC/TTI Method - Results (1)
Slide 35
UoC/TTI Method - Results (2)
Slide 36
Oxford Method - Overview (A. Vedaldi et al.)
Regions with different scales and aspect ratios.
6 feature channels.
3-level spatial pyramid.
Cascade: 3 SVM classifiers with 3 different kernels.
Post-processing.
Slide 37
Oxford Method - Feature Channels
Bag of visual words: SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
Dense words (PhowGray, PhowColor): another set of SIFT descriptors, quantized into 300 visual words.
Histogram of oriented edges (Phog180, Phog360): similar to the HOG descriptor used by the UoC/TTI method, with 8 orientation bins.
Self-similarity features (SSIM).
Slide 38
Oxford Method - Spatial Pyramids
Slide 39
Oxford Method - Feature Vector (chart taken from the method's presentation)
Slide 40
Oxford Method - Discriminant Function (1)
Slide 41
Oxford Method - Discriminant Function (2)
The kernel of the discriminant function is a linear combination of histogram kernels.
The kernel parameters and the combination weights (18 in total - one per feature channel and pyramid level) are learned using MKL (Multiple Kernel Learning).
The discriminant function is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
Slide 42
Oxford Method - Cascade Solution (1)
Exhaustive search over the candidate regions R requires a number of operations that is O(MBN):
o N - the number of regions.
o M - the number of support vectors.
o B - the dimensionality of the histograms.
To reduce this complexity, a cascade is applied:
o The first stage uses a cheap linear kernel.
o The second uses a more expensive and more powerful quasi-linear kernel.
o The third uses the most powerful, non-linear kernel.
Each stage evaluates the discriminant function on a smaller number of candidate regions.
Slide 43
Oxford Method - Cascade Solution (2)

Type          Evaluation Complexity
Linear        O(N)
Quasi-Linear  O(BN)
Non-Linear    O(MBN)

Stage 1 - linear; Stage 2 - quasi-linear; Stage 3 - non-linear.
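The complexity reduction comes from each stage discarding most candidates before the next, more expensive, stage runs. A toy sketch (the scoring functions and keep fractions are illustrative, not the paper's):

```python
def cascade(regions, stages, keep_fractions):
    """Each stage scores the surviving regions with an increasingly
    expensive function and keeps only the top-scoring fraction."""
    survivors = list(regions)
    for score, frac in zip(stages, keep_fractions):
        survivors.sort(key=score, reverse=True)
        keep = max(1, int(len(survivors) * frac))
        survivors = survivors[:keep]
    return survivors
```

The expensive O(MBN) kernel is thus evaluated only on the small set of regions that survived the O(N) and O(BN) stages.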
Slide 44
Oxford Method - Cascade Solution (3) (chart taken from the method's presentation)
Slide 45
Oxford Method - The Kernels
All the aforementioned kernels are of a common form combining two functions f and g.
For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
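For illustration, the standard kernel families these three cascade stages correspond to can be sketched as below; the exact kernels and normalizations used in the paper may differ:

```python
import math

def linear_kernel(h1, h2):
    """Linear: plain dot product of the two histograms."""
    return sum(a * b for a, b in zip(h1, h2))

def chi2_kernel(h1, h2):
    """Quasi-linear (additive): chi-squared similarity, summed per bin."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += 2 * a * b / (a + b)
    return total

def exp_chi2_kernel(h1, h2, gamma=1.0):
    """Non-linear: exponential chi-squared (an RBF over the chi2 distance)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)
```

The quasi-linear kernel still decomposes over histogram bins (hence O(BN) evaluation), while the exponential wrapper does not, which is why the non-linear stage costs O(MBN).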
Slide 46
Oxford Method - Post-Processing
The output of the last stage is a ranked list of 100 candidate regions per image.
Many of these regions correspond to multiple detections of the same object, so non-maxima suppression is used.
At most 10 regions per image remain.
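Greedy non-maxima suppression keeps the highest-scoring box, removes the boxes that overlap it, and repeats. A minimal sketch with the 10-region cap (the overlap threshold is illustrative):

```python
def nms(detections, iou_threshold=0.5, max_keep=10):
    """detections: list of (score, (x1, y1, x2, y2)). Greedily keep the
    highest-scoring box, suppress boxes overlapping it, repeat."""
    def iou(a, b):
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kb) < iou_threshold for _, kb in kept):
            kept.append((score, box))
        if len(kept) == max_keep:
            break
    return kept
```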
Slide 47
Oxford Method - Training/Retraining (1)
Jittered/flipped instances are used as additional positive samples.
Training images are partitioned into two subsets; the classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
Slide 48
Oxford Method - Results (1)
Slide 49
Oxford Method - Results (2)
Slide 50
Oxford Method - Results (3)
Training and testing on VOC2007.
Training and testing on VOC2008.
Training on VOC2008 and testing on VOC2007.
Training and testing on VOC2009.
Slide 51
Oxford Method - Summary
Slide 52
A Successful Classification Method
Slide 53
NEC/UIUC Method - Overview (Xi Zhou, Kai Yu et al.)
A winner in the 2009 Pascal VOC classification challenge.
A framework for classification is proposed:
o Descriptor coding: super-vector coding (the important part!).
o Spatial pyramid pooling.
o Classification: linear SVM.
NEC/UIUC Method - Descriptor Coding (2): Super-Vector Coding
Slide 57
NEC/UIUC Method - Spatial Pooling
Pyramid partitions: 1x1, 2x2 and 3x1.
Y: the set of local descriptors; N: the size of that set.
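Spatial pyramid pooling builds one histogram per spatial cell for each grid and concatenates them all. A sketch assuming the 3x1 level splits the image into three horizontal strips (the exact orientation of that split is an assumption), with hard-assigned codes standing in for the actual super-vector coding:

```python
def spatial_pyramid_pool(descriptors, width, height, codebook_size,
                         grids=((1, 1), (2, 2), (1, 3))):
    """descriptors: list of (x, y, code) with code in [0, codebook_size).
    grids: (cols, rows) partitions of the image. For each grid, build one
    histogram of codes per spatial cell and concatenate everything."""
    feature = []
    for cols, rows in grids:
        hists = [[0] * codebook_size for _ in range(cols * rows)]
        for x, y, code in descriptors:
            c = min(int(x / width * cols), cols - 1)
            r = min(int(y / height * rows), rows - 1)
            hists[r * cols + c][code] += 1
        for h in hists:
            feature.extend(h)
    return feature
```

With the 1x1 + 2x2 + 3x1 partitions the pooled vector is (1 + 4 + 3) = 8 histograms long, so the final feature keeps coarse spatial layout while remaining usable by a linear SVM.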
Slide 58
NEC/UIUC Method - Results (1)
SIFT: 128-dimensional vectors over a grid with a spacing of 4 pixels, on three patch levels (16x16, 25x25 and 31x31).
PCA reduction of dimensionality to 80.
Slide 59
NEC/UIUC Method - Results (2): |C| = 512
Slide 60
NEC/UIUC Method - Results (3): |C| = 2048
Slide 61
NEC/UIUC Method - Results (4)
Slide 62
Bias in Datasets
"Unbiased Look at Dataset Bias"
Antonio Torralba (Massachusetts Institute of Technology), Alexei A. Efros (Carnegie Mellon University)
Slide 63
Name The Dataset
People were asked to guess, based on three images, which dataset the images were taken from.
People who worked in the field got more than 75% correct.
Slide 64
Name The Dataset - The Dataset Classifier
4 classifiers were trained to play the Name The Dataset game.
Each classifier used a different image descriptor:
o 32x32 thumbnail (grayscale and color).
o Gist.
o Bag of HOG visual words.
1000 images were randomly sampled from the training portions of 12 datasets.
The classifier was tested on 300 random images from each of the test sets, repeated 20 times.
Slide 65
Name The Dataset - The Dataset Classifier
The best classifier performs at 39% (chance is about 8%)!!!
Confusion table; recognition performance vs. number of training examples per class.
Slide 66
Name The Dataset - The Dataset Classifier
Car images from different datasets: performance is 61% on car images from 5 different datasets (chance is 20%).
Slide 67
Cross-Dataset Generalization (1)
Training on one dataset while testing on another.
Dalal & Triggs detector (HOG detector + linear SVM) for the detection task.
Bag-of-words approach with a Gaussian-kernel SVM for the classification task.
The car and person objects are used.
Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones.
Each detector (for each dataset) was trained with 100 positive images and 1000 negative ones.
Classification was tested with 50 positive and 1000 negative examples; detection with 10 positive and 20,000 negative examples.
Each classifier/detector was run 20 times and the results averaged.
Slide 68
Cross-Dataset Generalization (2)
Slide 69
Cross-Dataset Generalization (3)
Performance shows a logarithmic dependency on the number of training samples.
Slide 70
Types of Dataset Biases
Selection bias.
Capture bias.
Label bias.
Negative set bias: what the dataset considers to be the rest of the world.
Slide 71
Negative Set Bias - Experiment (1)
Evaluation of the relative bias in the negative sets of different datasets.
Detectors are trained on positives and negatives of a single dataset.
Testing is done on positives from the same dataset and on negatives from all 6 datasets combined.
Each detector was trained with 100 positives and 1000 negatives.
For testing, multiple runs of 10 positive examples against 20,000 negatives were performed.
Slide 72
Negative Set Bias - Experiment (2)
Slide 73
Negative Set Bias - Experiment (3)
A large negative training set is important for discriminating objects with similar contexts in images.
Slide 74
Datasets' Market Value (1)
A measure of the improvement in performance when adding training data from another dataset.
The market value is computed from the shift in the number of training samples, between different datasets, needed to achieve the same average precision.
Slide 75
Datasets' Market Value (2)
This table shows the sample value (market value) for a car sample across datasets.
A sample from the original dataset is worth more than a sample from another dataset!!!
Slide 76
Bias in Datasets - Summary
Datasets, though gathered from the internet, have distinguishable features of their own.
Methods performing well on a certain dataset can perform much worse on another.
The negative set is at least as important as the positive samples in the dataset.
Every dataset has its own market value.
Slide 77
2010 Winners Overview
Slide 78
Pascal VOC 2010 - Winners
Classification winner: NUSPSL_KERNELREGFUSING
Qiang Chen (1), Zheng Song (1), Si Liu (1), Xiangyu Chen (1), Xiaotong Yuan (1), Tat-Seng Chua (1), Shuicheng Yan (1), Yang Hua (2), Zhongyang Huang (2), Shengmei Shen (2)
(1) National University of Singapore; (2) Panasonic Singapore Laboratories
Detection winner: NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Honourable mentions:
MITUCLA_HIERARCHY - Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba (MIT, UCLA)
NUS_HOGLBP_CTX_CLS_RESCORE_V2 - Zheng Song, Qiang Chen, Shuicheng Yan (National University of Singapore)
UVA_GROUPLOC / UVA_DETMONKEY - Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha (University of Amsterdam)