Digest of Human Detection from CVPR 2015
Jan. 27th, 2016, Daichi SUZUO
Features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Dataset / Benchmark
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
Fundamentals of Human Detection
• Machine-learning-based binary (two-class) classifier
• Sliding window search
(Figure: training: positive-class and negative-class images → convert to image features → train the classifier. Detection: crop a window → feature extraction → classifier → human? / not human?)
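The two fundamentals above (a binary classifier plus sliding-window search) can be sketched as follows. This is only a minimal illustration: `mean_brightness` and `bright_enough` are hypothetical toy stand-ins for a real feature extractor and a trained classifier, and the window size and stride are arbitrary.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (top, left, height, width)


def crop(image: Image, top: int, left: int, h: int, w: int) -> Image:
    """Cut an h x w window out of the image."""
    return [row[left:left + w] for row in image[top:top + h]]


def sliding_window_detect(
    image: Image,
    window: Tuple[int, int],
    stride: int,
    extract: Callable[[Image], List[float]],
    classify: Callable[[List[float]], bool],
) -> List[Box]:
    """Slide a fixed-size window over the image; keep windows judged positive."""
    win_h, win_w = window
    hits: List[Box] = []
    for top in range(0, len(image) - win_h + 1, stride):
        for left in range(0, len(image[0]) - win_w + 1, stride):
            feature = extract(crop(image, top, left, win_h, win_w))
            if classify(feature):
                hits.append((top, left, win_h, win_w))
    return hits


# Toy stand-ins (hypothetical): mean brightness as the "feature",
# a fixed threshold as the "trained classifier".
def mean_brightness(patch: Image) -> List[float]:
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels)]


def bright_enough(feature: List[float]) -> bool:
    return feature[0] > 0.5
```

A real detector would also scan several image scales and merge overlapping hits; both steps are omitted here.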
Image features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
1. Combination Features and Models for Human Detection - Y. Jiang et al.
• Popular HOG feature [Dalal05]: a 1st-order feature
(Figure: input image → differentiate (edge extraction) → 1st-derivative (edge) image → pixel-wise gradient in each “cell” → histogram over orientation θ, weighted by gradient power)
• Idea: how about extending to 0th/2nd order?
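As a rough illustration of the 1st-order case, the sketch below builds the orientation histogram of a single cell: each pixel votes for its gradient orientation θ with a weight equal to its gradient power. The 9 bins, the central-difference derivative, and the unsigned orientation range are assumptions made for this example. Block normalization and vote interpolation, which the full HOG descriptor uses, are left out.

```python
import math
from typing import List

Image = List[List[float]]


def cell_histogram(cell: Image, bins: int = 9) -> List[float]:
    """Orientation histogram of one cell, weighted by gradient magnitude."""
    hist = [0.0] * bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 1st derivative by central differences (border pixels are skipped)
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            power = math.hypot(gx, gy)
            if power == 0.0:
                continue
            # Unsigned orientation theta in [0, pi)
            theta = math.atan2(gy, gx) % math.pi
            index = min(int(theta / math.pi * bins), bins - 1)
            hist[index] += power
    return hist
```

For a cell containing a single vertical edge, all votes fall into the first bin because every gradient points horizontally.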
• 2nd order: HOB – “bar” shape
• Same as HOG, just using 2nd derivative
• 0th order: HOC – color feature
• Convert to the HSI color space; use H as θ and S as power, and ignore I
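The 0th-order color histogram can be sketched in the same way as the gradient histogram, with hue playing the role of θ and saturation the role of power. One loud assumption: Python's standard `colorsys` module offers HSV rather than HSI, so `rgb_to_hsv` is used here as a stand-in, and its saturation definition differs from that of HSI. The bin count is also an arbitrary choice.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B), each in [0, 1]


def hue_histogram(cell: List[Pixel], bins: int = 9) -> List[float]:
    """Histogram over hue, each pixel weighted by its saturation."""
    hist = [0.0] * bins
    for r, g, b in cell:
        # HSV is a stand-in for HSI (assumption); the third value is ignored
        hue, sat, _ignored = colorsys.rgb_to_hsv(r, g, b)
        index = min(int(hue * bins), bins - 1)
        hist[index] += sat
    return hist
```

Gray pixels have zero saturation and therefore contribute nothing, which mirrors how flat regions contribute nothing to a gradient histogram.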
• Combine them into one vector: HOG-III feature
• Train different classifiers from the same HOG-IIIs
• Detect individually, and fuse into one result
(Figure: input image → HOG-III features → detection by Grammar model [Girshick11] and detection by Poselet model [Bourdev10] → fusion → final result)
(This is one of the key processes of the method. Please refer to the original paper for more details.)
Effect of HOG-III
Feature          AP
HOG              45.8%
HOC+HOG+HOB      50.1%
HOG-III          51.3%
Effect of Fusion
Classifier            AP
Grammar model only    45.8%
Poselet model only    47.0%
Fusion                52.3%
Combining HOG-III and Fusion performs best
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Viola-Jones method)
(Figure: input image → extract “Haar-like” features (scalar) → learn decision trees by AdaBoost)
※ A Haar-like feature is the difference between the pixel sums of the white and black regions
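A Haar-like feature of this kind is usually computed with an integral image, so that any rectangle sum costs four lookups. The sketch below shows one two-rectangle feature (white left half minus black right half). The particular layout of the two regions is an assumption chosen for the example, not something taken from the slides.

```python
from typing import List

Image = List[List[float]]


def integral_image(image: Image) -> Image:
    """ii[y][x] = sum of image[0:y][0:x]; has one extra zero row and column."""
    h, w = len(image), len(image[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, top: int, left: int, h: int, w: int) -> float:
    """Sum of the pixels in a rectangle, using four lookups."""
    return (ii[top + h][left + w] - ii[top][left + w]
            - ii[top + h][left] + ii[top][left])


def haar_two_rect(ii: Image, top: int, left: int, h: int, w: int) -> float:
    """Left (white) half minus right (black) half of an h x 2w window."""
    white = rect_sum(ii, top, left, h, w)
    black = rect_sum(ii, top, left + w, h, w)
    return white - black
```

The same `rect_sum` helper also covers plain rectangle sums over a channel, which is the building block of Integral Channel Features.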
(Integral Channel Features)
(Figure: input image → transform into “channels” → extract sums over rectangles → learn decision trees by AdaBoost)
※ A plain sum over a rectangle, unlike Haar-like features
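The "learn decision trees by AdaBoost" step can be sketched with discrete AdaBoost over scalar features. As a simplification, this example uses depth-1 trees (decision stumps) and an exhaustive threshold search. Neither choice is taken from the papers, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)


def stump_predict(x: List[float], feat: int, thresh: float, polarity: int) -> int:
    """Return +polarity if the chosen feature exceeds the threshold, else -polarity."""
    return polarity if x[feat] > thresh else -polarity


def train_adaboost(X: List[List[float]], y: List[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost; labels in y are +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        best = None  # (weighted error, feature index, threshold, polarity)
        for feat in range(len(X[0])):
            for thresh in sorted({x[feat] for x in X}):
                for polarity in (1, -1):
                    err = sum(w for w, x, t in zip(weights, X, y)
                              if stump_predict(x, feat, thresh, polarity) != t)
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, polarity)
        err, feat, thresh, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((feat, thresh, polarity, alpha))
        # Raise the weight of misclassified samples, then renormalize
        weights = [w * math.exp(-alpha * t * stump_predict(x, feat, thresh, polarity))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, f, t, p) for f, t, p, alpha in model)
    return 1 if score > 0 else -1
```

In a detector, each row of `X` would hold the scalar features of one training window (for example rectangle sums), and `y` would mark it as human or not human.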
(Filtered Channel Features)
(Figure: “channels” → apply various filters (convolution) → pick up each pixel value of the filtered channels as a feature → learn decision trees by AdaBoost)
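The filtering step can be sketched as follows: every channel is filtered with every filter in a bank, and each pixel of each filtered channel becomes one scalar feature for boosting. The filters and the "valid" (no padding) border handling are assumptions made for illustration. The sketch computes a correlation, that is, the kernel is not flipped.

```python
from typing import List

Channel = List[List[float]]


def filter_channel(channel: Channel, kernel: Channel) -> Channel:
    """'Valid' 2-D correlation of one channel with one filter (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def filtered_channel_features(channels: List[Channel],
                              filters: List[Channel]) -> List[float]:
    """Every pixel of every filtered channel becomes one scalar feature."""
    features: List[float] = []
    for channel in channels:
        for kernel in filters:
            for row in filter_channel(channel, kernel):
                features.extend(row)
    return features
```

An all-ones kernel reproduces a rectangle sum, so the rectangle-sum features of Integral Channel Features appear here as one particular filter.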
Using 50 filters performs best
Achieved the highest accuracy
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
• Train the detector on CG-based training datasets
(Figure: real background (static image) → annotate → composite CG-based humans → simulated scene)
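The compositing step can be sketched as alpha blending a rendered human onto the real background. Because the placement is chosen by the generator, the bounding-box label comes for free. The opacity mask, the grayscale images, and the function name are assumptions for this example. The annotation of the background scene and the rendering of the CG human are not modeled here.

```python
from typing import List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]   # (top, left, height, width)


def composite(background: Image, human: Image, mask: Image,
              top: int, left: int) -> Tuple[Image, Box]:
    """Alpha-blend a rendered human onto a copy of the background.

    mask holds per-pixel opacity in [0, 1]. Returns the simulated scene and
    the bounding box of the pasted human.
    """
    scene = [row[:] for row in background]   # leave the input untouched
    for y, (h_row, m_row) in enumerate(zip(human, mask)):
        for x, (value, alpha) in enumerate(zip(h_row, m_row)):
            old = scene[top + y][left + x]
            scene[top + y][left + x] = alpha * value + (1.0 - alpha) * old
    return scene, (top, left, len(human), len(human[0]))
```

Repeating this with many poses and positions yields positive samples, while crops of the untouched background can serve as negatives.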
• Not only scene-specific, but also location-specific!
(Figure: training images (~10^3 positives and ~10^3 negatives for each patch) on a grid with overlap (10^2~10^5 patches) → joint classifier ensemble training → scene-specific, location-specific detectors)
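The location-specific idea can be sketched as one detector per patch of an overlapping grid. The patch size, the stride, and the `train_one` callback are hypothetical. Note also that this sketch trains each detector independently, which is a simplification of the joint classifier ensemble training named above.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Box = Tuple[int, int, int, int]   # (top, left, height, width)
Detector = TypeVar("Detector")


def overlapping_grid(img_h: int, img_w: int, patch: int, stride: int) -> List[Box]:
    """Patch boxes on a regular grid; stride < patch makes neighbours overlap."""
    return [
        (top, left, patch, patch)
        for top in range(0, img_h - patch + 1, stride)
        for left in range(0, img_w - patch + 1, stride)
    ]


def train_per_location(boxes: List[Box],
                       train_one: Callable[[Box], Detector]) -> Dict[Box, Detector]:
    """One detector per grid location (independent training, a simplification)."""
    return {box: train_one(box) for box in boxes}
```

At test time, a window would be scored by the detector whose grid location it falls in.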
Effect of location-specific detection
Patch size   # detectors   Avg. Precision
8x8          371           .802
16x16        102           .798
32x32        30            .764
Example of the detection result
Comparison
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
“Convnets still underperform the state of the art” … Really?
→ Deepen the know-how on convnet-based detectors
• Small network (CifarNet) / Big network (AlexNet)
• Window size
• How to collect training images
• Fine-tuning
• Number and Type of layers
• …
Convnet with the best configuration outperforms!
Interesting points:
• The ratio of positives to negatives does not affect the accuracy much
• Data augmentation is effective
• Network size should be chosen according to the amount of training samples
• ...
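To make the data-augmentation point concrete, here is a minimal sketch that produces several jittered copies of one training window by random cropping and horizontal flipping. The slides do not say which augmentations the paper uses, so these two operations, the flip probability, and all names are assumptions.

```python
import random
from typing import List

Image = List[List[float]]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut a randomly placed h x w crop out of the image."""
    top = rng.randint(0, len(image) - h)
    left = rng.randint(0, len(image[0]) - w)
    return [row[left:left + w] for row in image[top:top + h]]


def augment(image: Image, h: int, w: int, copies: int, seed: int = 0) -> List[Image]:
    """Generate jittered (and sometimes mirrored) copies of one sample."""
    rng = random.Random(seed)
    samples: List[Image] = []
    for _ in range(copies):
        sample = random_crop(image, h, w, rng)
        if rng.random() < 0.5:
            sample = flip_horizontal(sample)
        samples.append(sample)
    return samples
```

Fixing the seed keeps the generated training set reproducible between runs.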
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Binary classification is sometimes insufficient…
(Figure: examples of “human” and “not human” windows; the latter are hard negatives)
It is necessary to use semantic information jointly
Classify pedestrians and recognize semantics at once!
Also recognizes current scene semantics
• Pedestrian attribute (e.g. wearing backpack)
• Background attribute (e.g. road, sky, …)
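One common way to learn a main task and auxiliary attribute tasks at once is to minimize a weighted sum of per-task losses. The sketch below shows that idea only. The binary cross-entropy form, the single `attr_weight`, and the attribute names are assumptions and are not claimed to match the TA-CNN objective.

```python
import math
from typing import Dict


def binary_cross_entropy(prob: float, label: int) -> float:
    """Cross-entropy of a predicted probability against a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)   # keep log() finite
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(ped_prob: float, ped_label: int,
                   attr_probs: Dict[str, float],
                   attr_labels: Dict[str, int],
                   attr_weight: float = 0.1) -> float:
    """Pedestrian loss plus down-weighted losses of the labelled attributes."""
    loss = binary_cross_entropy(ped_prob, ped_label)
    for name, label in attr_labels.items():
        # Attributes without a label for this sample are simply skipped
        loss += attr_weight * binary_cross_entropy(attr_probs[name], label)
    return loss
```

Iterating over `attr_labels` rather than `attr_probs` lets a sample carry only some of the attribute labels.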
It is difficult to collect varied (annotated) negatives from one dataset…
→ Transfer from other annotated datasets with TA-CNN
(Please refer to the original and related papers for more details about TA-CNN…)
Comparison with CNN-based methods
Example of detection results
Benchmark / Dataset
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
• Dataset of visible-light and thermal images
Contributions:
• Color and thermal images
• Both training and test data
• Temporally corresponding annotations (tags)
• Large enough
• …
Takeaways
• Human detection is still challenging
• Deep learning does not necessarily solve every problem at the moment
• There are several pieces of knowledge that might be helpful for your research/hobby/…
References / Supplemental materials
2. Filtered Channel Features for Pedestrian Detection
4. Taking a Deeper Look at Pedestrians
• Author's website: http://rodrigob.github.io/
3. Learning Scene-Specific Pedestrian Detectors without Real Data
• Project: http://vishnu.boddeti.net/projects/detection-by-synthesis.html
• YouTube: https://youtu.be/2Jf7faozHUs
5. Pedestrian Detection aided by Deep Learning Semantic Tasks
• Project: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
• Lab: http://rcv.kaist.ac.kr/v2/
All CVPR 2015 papers are available at cv-foundation.org