Transcript of Pascal Grand Challenge, Felix Vilensky, 19/6/2011

  • Slide 1
  • Pascal Grand Challenge Felix Vilensky 19/6/2011 1
  • Slide 2
  • Outline: Pascal VOC challenge framework. Successful detection methods: Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.), the UoC/TTI method; Multiple Kernels for Object Detection (A. Vedaldi et al.), the Oxford/MSR India method. A successful classification method: Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.), the NEC/UIUC method. Discussion of bias in datasets. 2010 winners overview.
  • Slide 3
  • Pascal VOC Challenge Framework: "The PASCAL Visual Object Classes (VOC) Challenge", Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman.
  • Slide 4
  • Pascal VOC Challenge Classification Task. Detection Task. Pixel-level segmentation. Person Layout detection. Action Classification in still images. 4
  • Slide 5
  • Classification Task. Example image output: "at least one bus", 100%.
  • Slide 6
  • Detection Task. The predicted bounding box should overlap the ground-truth box by at least 50%! (Example detection shown with 100% confidence.)
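The 50% overlap criterion is the area of the intersection of the two boxes divided by the area of their union. A minimal sketch (the function name and box format are mine, not from the challenge code):

```python
def overlap_ratio(box_a, box_b):
    """VOC-style overlap: area of intersection / area of union.
    Boxes are (xmin, ymin, xmax, ymax); a detection counts as correct
    when this ratio is at least 0.5."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A box shifted halfway off the ground truth overlaps by only 1/3,
# so it would be rejected as a near miss.
print(overlap_ratio((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```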
  • Slide 7
  • Detections: near misses that didn't fulfill the BB overlap criterion.
  • Slide 8
  • Pascal VOC Challenge-The Object Classes 8
  • Slide 9
  • Images retrieved from the Flickr website.
  • Slide 10
  • Pixel Level Segmentation. Example panels: image, class segmentation, object segmentation.
  • Slide 11
  • Person Layout 11
  • Slide 12
  • Action Classification: classification among 9 action classes. Example images: speaking on the phone, playing the guitar (100%).
  • Slide 13
  • Annotation Class. Bounding Box. Viewpoint. Truncation. Difficult (for classification\detection). 13
  • Slide 14
  • Annotation Example 14
  • Slide 15
  • Evaluation: Precision\Recall curves. Interpolated precision. AP (Average Precision): a way to compare different methods.
  • Slide 16
  • Evaluation-Precision\Recall Curves(1). Practical tradeoff between precision and recall; interpolated precision. Example ranked list (5 ground-truth positives in total):
      Rank:       1    2    3    4    5    6    7    8    9    10
      g.t.:       Yes  No   Yes  No   Yes  No   No   No   No   No
      Precision:  1/1  1/2  2/3  2/4  3/5  3/6  3/7  3/8  3/9  3/10
      Recall:     0.2  0.2  0.4  0.4  0.6  0.6  0.6  0.6  0.6  0.6
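The quantities in the table can be computed mechanically from the ranked list; a minimal sketch, assuming the total number of positives is known (the 11-point interpolation below matches the evaluation described in the VOC paper for the early challenges; later editions changed the exact AP formula):

```python
def precision_recall(ranked_hits, n_positives):
    """ranked_hits: True/False per ranked detection (True = matches a
    ground-truth object), exactly as in the table above."""
    tp, prec, rec = 0, [], []
    for rank, hit in enumerate(ranked_hits, start=1):
        tp += hit
        prec.append(tp / rank)
        rec.append(tp / n_positives)
    return prec, rec

def interpolated_ap(prec, rec, levels=11):
    """AP as the mean of interpolated precision (max precision at
    recall >= r) over evenly spaced recall levels."""
    return sum(
        max((p for p, r in zip(prec, rec) if r >= i / (levels - 1)), default=0.0)
        for i in range(levels)
    ) / levels

# The table's example: 5 positives in total, hits at ranks 1, 3 and 5.
prec, rec = precision_recall([True, False, True, False, True,
                              False, False, False, False, False], 5)
print(interpolated_ap(prec, rec))
```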
  • Slide 17
  • Evaluation-Precision\Recall Curves(2) 17
  • Slide 18
  • Evaluation-Average Precision(AP). AP is for determining who's the best.
  • Slide 19
  • Successful Detection Methods 19
  • Slide 20
  • UoC/TTI Method Overview (P. Felzenszwalb et al.). Joint winner of the 2009 Pascal VOC challenge with the Oxford method. Received a "lifetime achievement" award in 2010. A mixture of deformable part models: each component has a global template plus deformable parts, built from HOG feature templates. Fully trained from bounding boxes alone.
  • Slide 21
  • UoC/TTI Method HOG Features(1). [-1 0 1] and its transpose give the gradient. The gradient orientation is discretized into one of p values with soft binning. Pixel-level features are aggregated over cells of size k (8-pixel cells, k=8). 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total.
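A minimal sketch of the pixel-level step, assuming a grayscale numpy image; plain hard binning only, with the soft binning, the contrast-insensitive bins, and the normalization/truncation of the next slides left out:

```python
import numpy as np

def cell_orientation_histograms(gray, cell=8, bins=18):
    """[-1 0 1] gradients, signed orientation discretized into `bins`
    values, gradient magnitudes summed per (cell x cell)-pixel cell."""
    gray = gray.astype(np.float64)
    dx, dy = np.zeros_like(gray), np.zeros_like(gray)
    dx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]      # [-1 0 1] horizontally
    dy[1:-1, :] = gray[2:, :] - gray[:-2, :]      # [-1 0 1] vertically
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)        # signed orientation in [0, 2*pi)
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for i in range(rows * cell):
        for j in range(cols * cell):
            hist[i // cell, j // cell, idx[i, j]] += mag[i, j]
    return hist
```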
  • Slide 22
  • UoC/TTI Method HOG Features(2). (Figure: the 27-bin cell histogram.)
  • Slide 23
  • UoC/TTI Method HOG Features(3). Normalization. Truncation. 27 bins x 4 normalization factors = a 4x27 matrix. Dimensionality reduction to 31.
  • Slide 24
  • UoC/TTI Method Deformable Part Models Coarse root. High-Resolution deformable parts. Part - (Anchor position, deformation cost, Res. Level) 24
  • Slide 25
  • UoC/TTI Method Mixture Models(1). Diversity of a rich object category; different views of the same object. A mixture of deformable part models for each class; each deformable part model in the mixture is called a component.
  • Slide 26
  • UoC/TTI Method Object Hypothesis. (Slide taken from the method's presentation.)
  • Slide 27
  • UoC/TTI Method Models(1) 6 component person model 27
  • Slide 28
  • UoC/TTI Method Models(2) 6 component bicycle model 28
  • Slide 29
  • UoC/TTI Method Score of a Hypothesis. (Slide taken from the method's presentation.)
  • Slide 30
  • UoC/TTI Method Matching(1). Sliding window approach; high-scoring root locations define detections. Matching is done for each component separately. (Figure labels: root location, best part location.)
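A minimal sketch of how one root placement is scored under a single component. The names are mine, a simple quadratic deformation cost stands in for the learned one, and the paper computes the inner maximization for all root locations at once with generalized distance transforms rather than by brute force:

```python
def hypothesis_score(root_resp, part_resps, parts, x0, y0):
    """Score of placing the root at (x0, y0): root filter response plus,
    for each part, its best response minus a penalty for drifting away
    from its anchor (anchors live at twice the root resolution)."""
    score = root_resp[y0][x0]
    for resp, (ax, ay, wx, wy) in zip(part_resps, parts):
        best = float("-inf")
        for y in range(len(resp)):
            for x in range(len(resp[0])):
                ddx, ddy = x - (2 * x0 + ax), y - (2 * y0 + ay)
                best = max(best, resp[y][x] - (wx * ddx * ddx + wy * ddy * ddy))
        score += best
    return score
```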
  • Slide 31
  • UoC/TTI Method Matching(2) 31
  • Slide 32
  • UoC/TTI Method Post Processing & Context Rescoring. (Slide taken from the method's presentation.)
  • Slide 33
  • UoC/TTI Method Training & Data Mining. Weakly labeled data in the training set. Latent SVM (LSVM) training with z as the latent value. Training and data mining in 4 stages: optimize z; optimize the model parameters; add hard negative examples; remove easy negative examples.
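The four stages can be pictured as the loop below. This is only a schematic of the alternation described on the slide; best_placement, fit_svm, mine_hard_windows and score are hypothetical placeholders, not the authors' code:

```python
def train_lsvm(model, positives, negative_images, rounds=4):
    """Schematic alternation: fix the latent values, refit the model,
    then refresh the cache of data-mined negative windows."""
    cache = []                                       # mined negative windows
    for _ in range(rounds):
        # Optimize z: best-scoring latent values (component, root and
        # part placements) for every positive example.
        latent_pos = [best_placement(model, ex) for ex in positives]
        # Optimize the model parameters: a convex problem once z is fixed.
        model = fit_svm(latent_pos, cache)
        # Add hard negatives: high-scoring windows on negative images.
        cache += [w for img in negative_images
                    for w in mine_hard_windows(model, img)]
        # Remove easy negatives: windows scored far below the margin.
        cache = [w for w in cache if score(model, w) >= -1]
    return model
```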
  • Slide 34
  • UoC/TTI Method Results(1) 34
  • Slide 35
  • UoC/TTI Method Results(2) 35
  • Slide 36
  • Oxford Method Overview (A. Vedaldi et al.). Regions with different scales and aspect ratios; 6 feature channels; 3-level spatial pyramid; a cascade of 3 SVM classifiers with 3 different kernels; post-processing.
  • Slide 37
  • Oxford Method Feature Channels. Bag of visual words: SIFT descriptors are extracted and quantized into a vocabulary of 64 words. Dense words (PhowGray, PhowColor): another set of SIFT descriptors is quantized into 300 visual words. Histogram of oriented edges (Phog180, Phog360): similar to the HOG descriptor used by the UoC/TTI method, with 8 orientation bins. Self-similarity features (SSIM).
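The bag-of-visual-words channel boils down to nearest-word assignment followed by counting. A minimal sketch, assuming the descriptors and a learned vocabulary are given as numpy arrays (the descriptor extraction and vocabulary learning are not shown):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """descriptors: (n, d) local descriptors (e.g. SIFT),
    vocabulary: (k, d) visual words; returns an L1-normalized histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)              # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```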
  • Slide 38
  • Oxford Method Spatial Pyramids 38
  • Slide 39
  • Oxford Method Feature Vector. (Chart taken from the method's presentation.)
  • Slide 40
  • Oxford Method Discriminant Function(1) 40
  • Slide 41
  • Oxford Method Discriminant Function(2). The kernel of the discriminant function is a linear combination of histogram kernels. The kernel parameters and the combination weights (18 in total) are learned using MKL (Multiple Kernel Learning). The discriminant function is used to rank candidate regions R by their likelihood of containing an instance of the object of interest.
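The combination itself is simple once the per-channel kernel matrices and the MKL weights are available; a small sketch with assumed names (the MKL optimization that actually learns the weights is not shown):

```python
import numpy as np

def combined_kernel(channel_kernels, weights):
    """Weighted sum of per-channel histogram kernel matrices,
    K = sum_k d_k * K_k, with the weights d_k learned by MKL."""
    return sum(w * K for w, K in zip(weights, channel_kernels))

def rank_regions(K_test_sv, alphas, labels, bias):
    """Schematic SVM scores used to rank candidate regions:
    K_test_sv[i, j] is the combined kernel between test region i
    and support vector j."""
    scores = K_test_sv @ (alphas * labels) + bias
    return np.argsort(-scores)             # best-scoring regions first
```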
  • Slide 42
  • Oxford Method Cascade Solution(1). An exhaustive search over the candidate regions R requires O(MBN) operations, where N is the number of regions, M is the number of support vectors, and B is the dimensionality of the histograms. To reduce this complexity a cascade solution is applied: the first stage uses a cheap linear kernel, the second a more expensive and more powerful quasi-linear kernel, and the third the most powerful non-linear kernel. Each stage evaluates the discriminant function on a smaller number of candidate regions.
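A toy sketch of the cascade idea: re-score the surviving regions with an increasingly expensive classifier and keep only the best few each time. The stage functions and the keep sizes are placeholders; only the structure (cheap to expensive, shrinking candidate set) comes from the slides:

```python
def cascade_rank(regions, stages, keep_sizes):
    """stages: scoring functions ordered cheap -> expensive
    (linear, quasi-linear, non-linear); keep_sizes: how many regions
    survive each stage, e.g. (1000, 200, 100)."""
    for stage, k in zip(stages, keep_sizes):
        regions = sorted(regions, key=stage, reverse=True)[:k]
    return regions
```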
  • Slide 43
  • Oxford Method Cascade Solution(2). Stage 1: linear; Stage 2: quasi-linear; Stage 3: non-linear.
      Type           Evaluation Complexity
      Linear         O(N)
      Quasi-Linear   O(BN)
      Non-Linear     O(MBN)
  • Slide 44
  • Oxford Method Cascade Solution(3). (Chart taken from the method's presentation.)
  • Slide 45
  • Oxford Method The Kernels. All the aforementioned kernels share a common form built from two functions f and g: for linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
  • Slide 46
  • Oxford Method Post-Processing. The output of the last stage is a ranked list of 100 candidate regions per image. Many of these regions correspond to multiple detections of the same object, so non-maxima suppression is used; at most 10 regions per image remain.
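A minimal greedy sketch of the suppression step. The 10-region cap comes from the slide; the 0.5 overlap threshold, the box format and the function names are assumptions:

```python
def non_maxima_suppression(detections, overlap_thresh=0.5, max_keep=10):
    """detections: list of (score, box) with box = (xmin, ymin, xmax, ymax).
    Keep the best-scoring region, drop regions that overlap it too much,
    and repeat until at most `max_keep` regions remain."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kb) < overlap_thresh for _, kb in kept):
            kept.append((score, box))
        if len(kept) == max_keep:
            break
    return kept
```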
  • Slide 47
  • Oxford Method Training/Retraining(1). Jittered\flipped instances are used as positive samples. Training images are partitioned into two subsets; the classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
  • Slide 48
  • Oxford Method Results(1) 48
  • Slide 49
  • Oxford Method Results(2) 49
  • Slide 50
  • Oxford Method Results(3) Training and testing on VOC2007. Training and testing on VOC2008. Training on VOC2008 and testing on VOC2007. Training and testing on VOC2009. 50
  • Slide 51
  • Oxford Method Summary 51
  • Slide 52
  • A Successful Classification Method 52
  • Slide 53
  • NEC/UIUC Method Overview (Xi Zhou, Kai Yu et al.). A winner of the 2009 Pascal VOC classification challenge. A classification framework is proposed: descriptor coding with super-vector coding (the important part), spatial pyramid pooling, and classification with a linear SVM.
  • Slide 54
  • NEC/UIUC Method Notation 54
  • Slide 55
  • NEC/UIUC Method Descriptor Coding(1) Vector Quantization Coding 55
  • Slide 56
  • NEC/UIUC Method Descriptor Coding(2) Super Vector Coding 56
  • Slide 57
  • NEC/UIUC Method Spatial Pooling: 1x1, 2x2 and 3x1 grids. N: the size of a set of local descriptors; Y: the set of local descriptors.
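A minimal sketch of pooling over the 1x1, 2x2 and 3x1 grids, assuming each local descriptor has already been coded into a vector; average pooling, the (x, y) position format and the function name are assumptions made for brevity:

```python
import numpy as np

def spatial_pyramid_pool(codes, positions, image_size,
                         grids=((1, 1), (2, 2), (3, 1))):
    """codes: (N, D) coded descriptors, positions: (N, 2) pixel (x, y),
    image_size: (width, height).  Pools each grid cell and concatenates
    the 1 + 4 + 3 = 8 resulting vectors."""
    w, h = image_size
    pooled = []
    for gx, gy in grids:
        for cy in range(gy):
            for cx in range(gx):
                in_cell = ((positions[:, 0] * gx // w == cx) &
                           (positions[:, 1] * gy // h == cy))
                cell = codes[in_cell]
                pooled.append(cell.mean(axis=0) if len(cell)
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)
```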
  • Slide 58
  • NEC/UIUC Method Results(1). SIFT: 128-dimensional vectors over a grid with a spacing of 4 pixels at three patch levels (16x16, 25x25 and 31x31). PCA reduction of dimensionality to 80.
  • Slide 59
  • NEC/UIUC Method Results(2) |C|=512 59
  • Slide 60
  • NEC/UIUC Method Results(3) |C|=2048 60
  • Slide 61
  • NEC/UIUC Method Results(4) 61
  • Slide 62
  • Bias in Datasets: "Unbiased Look at Dataset Bias", Antonio Torralba (Massachusetts Institute of Technology) and Alexei A. Efros (Carnegie Mellon University).
  • Slide 63
  • Name The Dataset. People were asked to guess, based on three images, which dataset the images were taken from. People who work in the field got more than 75% correct.
  • Slide 64
  • Name The Dataset - The Dataset Classifier. 4 classifiers were trained to play the Name The Dataset game, each using a different image descriptor: 32x32 thumbnail (grayscale and color), Gist, and bag of HOG visual words. 1000 images were randomly sampled from the training portions of 12 datasets. The classifier was tested on 300 random images from each of the test sets, repeated 20 times.
  • Slide 65
  • Name The Dataset - The Dataset Classifier. The best classifier performs at 39% (chance is about 8%)! (Figures: confusion table; recognition performance vs. number of training examples per class.)
  • Slide 66
  • Name The Dataset - The Dataset Classifier. Performance is 61% on car images from 5 different datasets (chance is 20%). (Figure: car images from the different datasets.)
  • Slide 67
  • Cross - Dataset Generalization(1). Training on one dataset while testing on another. The Dalal & Triggs detector (HOG detector + linear SVM) is used for the detection task, and a bag-of-words approach with a Gaussian-kernel SVM for the classification task. The car and person objects are used. Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones; each detector with 100 positive images and 1000 negative ones. Classification is tested with 50 positive and 1000 negative examples, detection with 10 positive and 20,000 negative examples. Each classifier\detector was run 20 times and the results averaged.
  • Slide 68
  • Cross - Dataset Generalization(2) 68
  • Slide 69
  • Cross - Dataset Generalization(3) Logarithmic dependency on the amount of training samples. 69
  • Slide 70
  • Types Of Dataset Biases Selection Bias. Capture Bias. Label Bias. Negative Set Bias- What the dataset considers to be the rest of the world. 70
  • Slide 71
  • Negative Set Bias-Experiment(1) Evaluation of the relative bias in the negative sets of different datasets. Training detectors on positives and negatives of a single dataset. Testing on positives from the same dataset and on negatives from all 6 datasets combined. The detector was trained with 100 positives and 1000 negatives. For testing, multiple runs of 10 positive examples for 20,000 negatives were performed. 71
  • Slide 72
  • Negative Set Bias-Experiment(2) 72
  • Slide 73
  • Negative Set Bias-Experiment(3). A large negative training set is important for discriminating objects with similar contexts in images.
  • Slide 74
  • Datasets Market Value(1). A measure of the improvement in performance when adding training data from another dataset: the market value is the shift in the number of training samples between different datasets needed to achieve the same average precision.
  • Slide 75
  • Datasets Market Value(2). This table shows the sample value (market value) for a car sample across datasets. A sample from another dataset is worth less than a sample from the original dataset!
  • Slide 76
  • Bias In Datasets- Summary. Datasets, though gathered from the internet, have distinguishable features of their own. Methods performing well on a certain dataset can perform much worse on another. The negative set is at least as important as the positive samples in the dataset. Every dataset has its own market value.
  • Slide 77
  • 2010 Winners Overview 77
  • Slide 78
  • Pascal VOC 2010-Winners.
      Classification winner: NUSPSL_KERNELREGFUSING. Qiang Chen(1), Zheng Song(1), Si Liu(1), Xiangyu Chen(1), Xiaotong Yuan(1), Tat-Seng Chua(1), Shuicheng Yan(1), Yang Hua(2), Zhongyang Huang(2), Shengmei Shen(2); (1) National University of Singapore, (2) Panasonic Singapore Laboratories.
      Detection winner: NLPR_HOGLBP_MC_LCEGCHLC. Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences.
      Honourable mentions: MITUCLA_HIERARCHY (Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba; MIT, UCLA); NUS_HOGLBP_CTX_CLS_RESCORE_V2 (Zheng Song, Qiang Chen, Shuicheng Yan; National University of Singapore); UVA_GROUPLOC/UVA_DETMONKEY (Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha; University of Amsterdam).
  • Slide 79
  • NUS-SPL Classification Method 79
  • Slide 80
  • NLPR Detection Method 80
  • Slide 81
  • Thank You. 81