Learning Convolutional Feature Hierarchies for Visual Recognition
Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, Yann LeCun
NIPS 2010
Presented by Bo Chen
Outline
• 1. Drawbacks in the Traditional Convolutional Methods
• 2. The Proposed Algorithm and Some Details
• 3. Experimental Results
• 4. Conclusions
Convolutional Sparse Coding
Drawbacks of patch-level sparse coding:
1. The representations of whole images are highly redundant because training and inference are performed at the patch level.
2. The inference for a whole image is computationally expensive.
Solutions
• 1. Introducing Convolution Operator
• 2. Introducing Nonlinear Encoder Module
Learning Convolutional Dictionaries
• 1. The Boundary Effects Due to Convolutions
Apply a mask to the derivatives of the reconstruction error, where the mask is a term-by-term multiplier that either puts zeros at the boundaries or gradually scales them down.
• 2. Computationally Efficient Derivative
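The boundary mask from item 1 can be illustrated with a small 1D sketch. The slide does not reproduce the masked expression, so the code below rests on two assumptions: the mask multiplies the reconstruction residual term by term before it is correlated with the filter, and it is a linear ramp. The function names and the ramp shape are illustrative, not taken from the paper.

```python
def conv_full(z, d):
    """Full 1D convolution: each code entry z[i] stamps a scaled copy of d."""
    out = [0.0] * (len(z) + len(d) - 1)
    for i, zi in enumerate(z):
        for j, dj in enumerate(d):
            out[i + j] += zi * dj
    return out


def boundary_mask(n, border):
    """Ramp from 0 up to 1 over `border` samples at each end (assumed shape)."""
    return [min(1.0, i / border, (n - 1 - i) / border) for i in range(n)]


def masked_code_gradient(x, d, z, mask):
    """Gradient of 0.5 * ||x - conv_full(z, d)||^2 w.r.t. z, residual masked."""
    recon = conv_full(z, d)
    residual = [m * (xi - ri) for m, xi, ri in zip(mask, x, recon)]
    # Correlating the residual with the filter is the adjoint of conv_full.
    return [-sum(residual[i + j] * d[j] for j in range(len(d)))
            for i in range(len(z))]


x = [1.0] * 8                 # toy signal
d = [1.0, 2.0, 1.0]           # one dictionary element
z = [0.0] * 6                 # code, so that len(z) + len(d) - 1 == len(x)
grad = masked_code_gradient(x, d, z, boundary_mask(len(x), border=2))
```

With this mask, code entries near the two ends receive a smaller gradient than interior entries, which is the intended effect of scaling down the boundaries.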
Learning an Efficient Encoder
1. A New Smooth Shrinkage Operator
2. To aid faster convergence, use the stochastic diagonal Levenberg-Marquardt method to calculate a positive diagonal approximation to the Hessian.
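The shrinkage formula itself did not survive in this transcript. As a rough sketch, assume a smooth soft-threshold of the form sh(s) = sign(s) · [(1/β) · log(exp(βb) + exp(β|s|) − 1) − b], where β > 0 controls the smoothness and b ≥ 0 acts as the threshold. Both the form and the parameter names are assumptions here and should be checked against the paper.

```python
import math


def smooth_shrink(s, beta, b):
    """Smooth soft-threshold: near 0 for |s| << b, near |s| - b for |s| >> b.

    Assumes beta > 0 and b >= 0.
    """
    a = beta * abs(s)
    t = beta * b
    m = max(a, t)
    # log(exp(t) + exp(a) - 1), factored by exp(m) to avoid overflow.
    log_term = m + math.log(math.exp(a - m) + math.exp(t - m) - math.exp(-m))
    return math.copysign(log_term / beta - b, s)
```

Unlike a hard-cornered soft threshold, this function is differentiable everywhere, so β and b could in principle be trained by gradient descent along with the filters.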
Patch Based vs Convolutional Sparse Modeling
The convolution operator enables the system to model local structures that appear anywhere in the signal. The convolutional dictionary does not waste resources modeling similar filter structures at multiple locations. Instead, it models more orientations, frequencies, and different structures, including center-surround filters, double center-surround filters, and corner structures at various angles.
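A minimal sketch of why one convolutional dictionary element covers every location: the image is modeled as a sum of filters convolved with sparse feature maps, so a single filter reused at different positions replaces several shifted patch filters. The function name and the toy filter are illustrative assumptions.

```python
def reconstruct(filters, codes):
    """Sum over k of the full 2D convolution of filter k with feature map k."""
    kh, kw = len(filters[0]), len(filters[0][0])
    zh, zw = len(codes[0]), len(codes[0][0])
    image = [[0.0] * (zw + kw - 1) for _ in range(zh + kh - 1)]
    for filt, code in zip(filters, codes):
        for i in range(zh):
            for j in range(zw):
                if code[i][j] == 0.0:
                    continue  # sparse code: most entries contribute nothing
                for u in range(kh):
                    for v in range(kw):
                        image[i + u][j + v] += code[i][j] * filt[u][v]
    return image


edge = [[1.0, -1.0]]                      # one 1 x 2 edge filter
code = [[1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 2.0, 0.0]]                  # two active positions
image = reconstruct([edge], [code])
```

The same edge filter appears at two positions with two amplitudes; a patch-based dictionary would need a separate shifted copy of the filter for each offset within the patch.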
Multi-Stage Architecture
The convolutional encoder can replace the patch-based sparse coding modules used in multi-stage object recognition architectures. Building on the previous findings, in each stage the encoder is followed by absolute value rectification, contrast normalization, and average subsampling.
Absolute Value Rectification: a simple pointwise absolute value function applied on the output of the encoder.
Contrast Normalization: reduces the dependencies between components (feature maps). When used between layers, the mean and standard deviation are calculated across all feature maps over a 9 × 9 neighborhood in the spatial dimensions.
Average Pooling: a spatial pooling operation that is applied on each feature map independently.
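The three post-encoder steps can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes a uniform (unweighted) window clipped at the image borders and a small `eps` floor on the standard deviation, neither of which is stated on the slide.

```python
import math


def rectify(maps):
    """Pointwise absolute value on every feature map."""
    return [[[abs(v) for v in row] for row in fmap] for fmap in maps]


def contrast_normalize(maps, size=9, eps=1e-6):
    """Subtract the local mean and divide by the local std, both computed over
    all feature maps inside a size x size window (clipped at the borders)."""
    h, w, r = len(maps[0]), len(maps[0][0]), size // 2
    out = [[[0.0] * w for _ in range(h)] for _ in maps]
    for i in range(h):
        for j in range(w):
            vals = [fmap[y][x] for fmap in maps
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            for k, fmap in enumerate(maps):
                out[k][i][j] = (fmap[i][j] - mean) / max(std, eps)
    return out


def average_pool(maps, size, stride):
    """Average size x size windows, taken every `stride` pixels, per map."""
    pooled = []
    for fmap in maps:
        h, w = len(fmap), len(fmap[0])
        pooled.append([[sum(fmap[y][x]
                            for y in range(i, i + size)
                            for x in range(j, j + size)) / size ** 2
                        for j in range(0, w - size + 1, stride)]
                       for i in range(0, h - size + 1, stride)])
    return pooled


def stage(encoder_output, pool_size, pool_stride):
    """Rectification, then contrast normalization, then average pooling."""
    normalized = contrast_normalize(rectify(encoder_output))
    return average_pool(normalized, pool_size, pool_stride)
```

Note that normalization mixes information across feature maps, while rectification and pooling treat each map independently.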
Experiment 1: Object Recognition Using the Caltech 101 Dataset
Preprocessing:
1. 30/30 training/testing split
2. Resize: 151 × 143
3. Local contrast normalization
Unsupervised Training: Berkeley segmentation dataset
Architecture:
First layer: 64 dictionary elements of size 9 × 9; pooling over a 10 × 10 area with a 5-pixel stride.
Second layer: 256 dictionary elements of size 9 × 9, where each dictionary element is constrained to connect to 16 dictionary elements from the first layer; pooling over a 6 × 6 area with a 4-pixel stride.
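To see what these settings imply for the feature-map sizes, here is a short arithmetic sketch. It assumes "valid" convolutions (no padding) and pooling windows that must fit entirely inside the map; the slide states neither, so the resulting sizes are indicative only.

```python
def conv_valid(size, kernel):
    """Output size of a 'valid' convolution with a square kernel."""
    return tuple(s - kernel + 1 for s in size)


def pool(size, window, stride):
    """Output size of pooling with a square window and the given stride."""
    return tuple((s - window) // stride + 1 for s in size)


image = (151, 143)
conv1 = conv_valid(image, 9)       # (143, 135), 64 maps
pool1 = pool(conv1, 10, 5)         # (27, 26)
conv2 = conv_valid(pool1, 9)       # (19, 18), 256 maps
pool2 = pool(conv2, 6, 4)          # (4, 4)
num_features = 256 * pool2[0] * pool2[1]   # 4096
```

Under these assumptions the second stage yields 256 maps of 4 × 4, that is, a 4096-dimensional feature vector per image. Rectification and contrast normalization leave the spatial size unchanged, so they are omitted from the calculation.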
Recognition Accuracy
One Layer
Two Layers
Ours: 65.8% (0.6)
Pedestrian Detection (1)
Original dataset: positive = 2416; negative = 1218
Augmented: positive = 11370 (1000); negative = 9001 (1000)
Layer 1: 32 filters, 7 × 7; Layer 2: 64 filters, 7 × 7; Pooling: 2 × 2
Pedestrian Detection (2)
Conclusions
• 1. Convolutional training of feature extractors reduces the redundancy among filters compared with those obtained from patch-based models.
• 2. Two different convolutional encoder functions were introduced for efficient feature extraction, which is crucial for using sparse coding in real-world applications.
• 3. The proposed sparse modeling system has been successfully applied, through a multi-stage architecture, to object recognition and pedestrian detection problems, and it performed comparably to similar systems.