Digest of Human Detection from CVPR2015

Digest of Human Detection from CVPR 2015 Jan. 27th, 2016, Daichi SUZUO

Page 1: Digest of Human Detection from CVPR2015

Digest of Human Detection from CVPR 2015

Jan. 27th, 2016, Daichi SUZUO

Page 2: Digest of Human Detection from CVPR2015

Digest of Human Detection from CVPR2015

Features

1. Combination Features and Models for Human Detection - Y. Jiang et al.

2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.

Training

3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.

4. Taking a Deeper Look at Pedestrians - J. Hosang et al.

5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.

Dataset / Benchmark

6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.

Page 3: Digest of Human Detection from CVPR2015

Fundamentals of Human Detection

• Machine-learning-based binary classifier

• Sliding window search

Training: Positive class / Negative class → Convert to image feature → Train classifier

Detection: Crop → Feature extraction → Classifier → Human? / Not human?
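The crop, feature extraction and classification loop can be sketched as follows. This is a minimal single-scale illustration, not the pipeline of any particular paper in this digest; the names `sliding_window_detect`, `mean_intensity` and `bright_score` are hypothetical, and the toy "classifier" only thresholds the mean brightness of a crop.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def sliding_window_detect(
    image: Image,
    window: Tuple[int, int],
    stride: int,
    extract: Callable[[Image], List[float]],
    classify: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Score every window position and keep those at or above the threshold."""
    win_h, win_w = window
    img_h, img_w = len(image), len(image[0])
    detections = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            # Crop the window, convert it to a feature, then classify it.
            crop = [row[left:left + win_w] for row in image[top:top + win_h]]
            score = classify(extract(crop))
            if score >= threshold:
                detections.append(((top, left, win_h, win_w), score))
    return detections


def mean_intensity(crop: Image) -> List[float]:
    """Toy feature (assumption): the mean pixel value of the crop."""
    flat = [p for row in crop for p in row]
    return [sum(flat) / len(flat)]


def bright_score(feature: List[float]) -> float:
    """Toy classifier (assumption): the score is the feature itself."""
    return feature[0]


# A 6x6 dark image with one bright 2x2 blob standing in for a "human".
toy = [[0.0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(2, 4):
        toy[r][c] = 1.0

hits = sliding_window_detect(toy, (2, 2), 2, mean_intensity, bright_score, 0.9)
print(hits)
```

A real detector would also scan several image scales and merge overlapping detections, which this sketch leaves out.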

Page 4: Digest of Human Detection from CVPR2015

Image features

1. Combination Features and Models for Human Detection - Y. Jiang et al.

2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.

Page 5: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

• Popular HOG feature [Dalal05]

(Figure: Input image → edge extraction → edge image; for each “cell”, a histogram over orientation θ of the pixel-wise gradient power)

Page 6: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

• Popular HOG feature [Dalal05]: 1st order feature

(Figure: Input image → differentiate → 1st derivative; for each “cell”, a histogram over orientation θ of the pixel-wise gradient power)

Idea: How about extending to 0th/2nd order?
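The per-cell histogram at the heart of HOG can be sketched as below. This is a simplified illustration under stated assumptions: central differences for the gradient, 9 unsigned orientation bins, hard binning, and no block normalization or bin interpolation as used in full HOG implementations. The function name `cell_orientation_histogram` is hypothetical.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientation, weighted by gradient power.

    `cell` is a 2D list of grayscale values. Border pixels are skipped so the
    central difference is always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = math.pi / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # 1st derivative in x
            gy = cell[y + 1][x] - cell[y - 1][x]   # 1st derivative in y
            power = math.hypot(gx, gy)             # gradient magnitude
            if power == 0.0:
                continue
            theta = math.atan2(gy, gx) % math.pi   # orientation in [0, pi)
            index = min(int(theta / bin_width), n_bins - 1)
            hist[index] += power
    return hist


# An 8x8 cell containing a single vertical edge (dark left, bright right).
edge_cell = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
hist = cell_orientation_histogram(edge_cell)
print(hist)
```

For a vertical edge all the gradient power falls into the first bin (horizontal gradient direction), which is the behaviour the figure describes.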

Page 7: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

• 2nd order: HOB – “bar” shape

• Same as HOG, just using the 2nd derivative

• 0th order: HOC – color feature

• Using HSI color space; H as θ, S as power (I is ignored)

(Figure: RGB image → convert to HSI)
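The HOC idea (hue plays the role of θ, saturation plays the role of power) can be sketched as follows. The conversion uses the standard textbook HSI formulas; the bin count of 8 and the function names `rgb_to_hs` and `cell_color_histogram` are assumptions made for illustration, not values taken from the paper.

```python
import math


def rgb_to_hs(r, g, b):
    """Return (hue in radians within [0, 2*pi), saturation) of the HSI model.

    Intensity is not returned because HOC ignores it.
    """
    total = r + g + b
    if total == 0.0:
        return 0.0, 0.0                      # black: hue undefined, no weight
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    denom = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denom == 0.0:
        return 0.0, saturation               # gray: hue undefined, saturation 0
    cos_theta = 0.5 * ((r - g) + (r - b)) / denom
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    hue = theta if b <= g else 2.0 * math.pi - theta
    return hue, saturation


def cell_color_histogram(cell_rgb, n_bins=8):
    """Histogram over hue, with each pixel voting by its saturation."""
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for row in cell_rgb:
        for r, g, b in row:
            hue, saturation = rgb_to_hs(r, g, b)
            index = min(int(hue / bin_width), n_bins - 1)
            hist[index] += saturation
    return hist


# One red, one green, one blue and one gray pixel.
toy_cell = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.5, 0.5, 0.5)],
]
color_hist = cell_color_histogram(toy_cell)
print(color_hist)
```

The gray pixel has zero saturation, so it contributes nothing, which mirrors how a zero-gradient pixel contributes nothing to HOG.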

Page 8: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

• Combine them into one vector: HOG-III feature

Page 9: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

• Train different classifiers from the same HOG-IIIs

• Detect individually, and fuse into one result

Input image → HOG-III features → Detection by Grammar model [Girshick11] / Detection by Poselet model [Bourdev10] → Fusion → Final result

(This is one of the key processes of the method. Please refer to the original paper for more details.)

Page 10: Digest of Human Detection from CVPR2015

1. Combination Features and Models

for Human Detection - Y. Jiang et al.

Effect of HOG-III

Feature        AP
HOG            45.8%
HOC+HOG+HOB    50.1%
HOG-III        51.3%

Effect of Fusion

Classifier               AP
Single use of Grammar    45.8%
Single use of Poselet    47.0%
Fusion                   52.3%

Combining HOG-III and Fusion performs best

Page 11: Digest of Human Detection from CVPR2015

2. Filtered Channel Features

for Pedestrian Detection - S. Zhang et al.

• Extension of “Integral Channel Features” [Dollár09]

• ChnFtrs: Extension of “Viola-Jones method” [Viola02]

(Viola-Jones method)

Input image → Extract “Haar-like” features (scalar) → Learn decision tree by AdaBoost

※ Difference between the sums of the white and black regions
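A Haar-like feature is usually computed with an integral image, so that any rectangle sum costs four lookups. The sketch below shows one two-rectangle feature (white left half minus black right half); the function names are hypothetical and the layout of the rectangles is an assumption chosen for illustration.

```python
def integral_image(image):
    """Summed-area table with one extra row and column of zeros."""
    height, width = len(image), len(image[0])
    ii = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """White (left half) sum minus black (right half) sum."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black


# A 4x4 image whose left half is bright and right half is dark.
toy_image = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
toy_ii = integral_image(toy_image)
feature = haar_two_rect_horizontal(toy_ii, 0, 0, 4, 4)
print(feature)
```

The same `rect_sum` lookup is what makes plain rectangle sums over channels cheap as well.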

Page 12: Digest of Human Detection from CVPR2015

2. Filtered Channel Features

for Pedestrian Detection - S. Zhang et al.

• Extension of “Integral Channel Features” [Dollár09]

• ChnFtrs: Extension of “Viola-Jones method” [Viola02]

(Integral Channel Features)

Input image → Transform → “channel” → Extract sum of rectangle (※ unlike Haar-like features, a plain sum) → Learn decision tree by AdaBoost

Page 13: Digest of Human Detection from CVPR2015

2. Filtered Channel Features

for Pedestrian Detection - S. Zhang et al.

• Extension of “Integral Channel Features” [Dollár09]

• ChnFtrs: Extension of “Viola-Jones method” [Viola02]

(Filtered Channel Features)

“channel” → Apply various filters (convolution) → Pick up pixel value as a feature → Learn decision tree by AdaBoost
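The filtering step can be sketched as below: every channel is filtered with every kernel of a filter bank, and each pixel of each response becomes one scalar feature for the boosted trees. This is a simplified sketch, not the authors' implementation. It uses valid-mode cross-correlation (no kernel flip) in place of true convolution, and the two kernels and the function names are assumptions for illustration.

```python
def filter_channel(channel, kernel):
    """Valid-mode 2D cross-correlation of one channel with one kernel."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(channel) - k_h + 1
    out_w = len(channel[0]) - k_w + 1
    return [
        [
            sum(
                kernel[i][j] * channel[y + i][x + j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def filtered_channel_features(channels, filter_bank):
    """Flatten all filter responses of all channels into one feature vector."""
    features = []
    for channel in channels:
        for kernel in filter_bank:
            response = filter_channel(channel, kernel)
            features.extend(value for row in response for value in row)
    return features


# One toy 3x3 channel and a bank of two hypothetical filters.
toy_channel = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
toy_bank = [
    [[1.0, 1.0], [1.0, 1.0]],  # 2x2 box sum
    [[1.0, -1.0]],             # horizontal difference
]
features = filtered_channel_features([toy_channel], toy_bank)
print(features)
```

With a bank containing only box filters this reduces to rectangle sums over channels, which is why the method reads as a generalization of Integral Channel Features.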

Page 14: Digest of Human Detection from CVPR2015

2. Filtered Channel Features

for Pedestrian Detection - S. Zhang et al.

Using 50 filters performs best

Achieved the highest accuracy

Page 15: Digest of Human Detection from CVPR2015

Training

3. Learning Scene-Specific Pedestrian Detectors

without Real Data - H. Hattori et al.

4. Taking a Deeper Look at Pedestrians - J. Hosang et al.

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Page 16: Digest of Human Detection from CVPR2015

3. Learning Scene-Specific Pedestrian Detectors

without Real Data - H. Hattori et al.

• Train the detector on a CG-based training dataset

(Figure: Real background (static image) → annotate → composite CG-based humans → Simulated scene)

Page 17: Digest of Human Detection from CVPR2015

3. Learning Scene-Specific Pedestrian Detectors

without Real Data - H. Hattori et al.

• Not only scene-specific, but also location-specific!

(Figure: Grid with overlap (10^2 to 10^5 patches) → Training images (~10^3 pos, ~10^3 neg for each patch) → Joint Classifier Ensemble Training → classifiers → scene-specific, location-specific detectors)
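The "grid with overlap" can be pictured with the short sketch below, which lays square patches over the image with a stride smaller than the patch size and looks up which patches (and therefore which location-specific classifiers) cover a given point. Patch size, stride and function names are assumptions for illustration only; the paper's actual grid construction may differ.

```python
def overlapping_grid(img_h, img_w, patch, stride):
    """Return (top, left, height, width) patches; stride < patch gives overlap."""
    return [
        (top, left, patch, patch)
        for top in range(0, img_h - patch + 1, stride)
        for left in range(0, img_w - patch + 1, stride)
    ]


def patches_covering(grid, y, x):
    """Patches whose area contains the point (y, x)."""
    return [
        (top, left, h, w)
        for (top, left, h, w) in grid
        if top <= y < top + h and left <= x < left + w
    ]


# An 8x8 image, 4x4 patches, stride 2: neighbouring patches overlap by half.
grid = overlapping_grid(8, 8, 4, 2)
covering = patches_covering(grid, 3, 3)
print(len(grid), covering)
```

Each patch would get its own classifier, trained only on simulated pedestrians placed at that location.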

Page 18: Digest of Human Detection from CVPR2015

3. Learning Scene-Specific Pedestrian Detectors

without Real Data - H. Hattori et al.

Effect of location-specific detection

Patch size    # detectors    Avg. Precision
8x8           371            .802
16x16         102            .798
32x32         30             .764

(Figure: Example of the detection result)

(Figure: Comparison)

Page 19: Digest of Human Detection from CVPR2015

4. Taking a Deeper Look at Pedestrians - J. Hosang et al.

“Convnets still underperform the state of the art” … Really?

Build up know-how for convnet-based detectors:

• Small network (CifarNet) / Big network (AlexNet)

• Window size

• How to collect training images

• Fine-tuning

• Number and Type of layers

• …

Page 20: Digest of Human Detection from CVPR2015

4. Taking a Deeper Look at Pedestrians - J. Hosang et al.

Convnet with the best configuration outperforms!

Interesting points:

• The pos/neg ratio does not affect

the accuracy much

• Data augmentation is effective

• Network size should be chosen

according to the amount of training samples

• ...

Page 21: Digest of Human Detection from CVPR2015

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Binary-classification is sometimes insufficient…

Human

Not human

(Hard negatives)

It is necessary to use semantic information jointly

Page 22: Digest of Human Detection from CVPR2015

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Classify pedestrians and recognize semantics at once!

Page 23: Digest of Human Detection from CVPR2015

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Classify pedestrians and recognize semantics at once!

Also recognizes current scene semantics

• Pedestrian attribute (e.g. wearing backpack)

• Background attribute (e.g. road, sky, …)
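One common way to learn detection and attributes jointly is to add weighted attribute losses to the pedestrian loss. The sketch below shows that generic multi-task objective with binary cross-entropy; it is an illustration under assumptions, not the exact TA-CNN objective, and the attribute names, weights and function names are hypothetical.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy of a predicted probability against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)   # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(ped_prob, ped_label, attr_probs, attr_labels, attr_weights):
    """Pedestrian loss plus a weighted sum of per-attribute losses."""
    loss = binary_cross_entropy(ped_prob, ped_label)
    for name, prob in attr_probs.items():
        loss += attr_weights[name] * binary_cross_entropy(prob, attr_labels[name])
    return loss


# One training window with a pedestrian attribute and a background attribute.
example_loss = multitask_loss(
    ped_prob=0.5,
    ped_label=1,
    attr_probs={"backpack": 0.5, "road": 0.5},
    attr_labels={"backpack": 1, "road": 0},
    attr_weights={"backpack": 0.5, "road": 0.5},
)
print(example_loss)
```

The weights control how strongly each auxiliary task shapes the shared features relative to the main pedestrian task.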

Page 24: Digest of Human Detection from CVPR2015

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Classify pedestrians and recognize semantics at once!

Difficult to collect various (annotated) negs from one dataset…

Transfer from other annotated datasets by TA-CNN

(Please refer to the original and related papers for more details about TA-CNN…)

Page 25: Digest of Human Detection from CVPR2015

5. Pedestrian Detection aided by

Deep Learning Semantic Tasks - Y. Tian et al.

Comparison with CNN-based methods

Example of detection results

Page 26: Digest of Human Detection from CVPR2015

Benchmark / Dataset

6. Multispectral Pedestrian Detection:

Benchmark Dataset and Baseline - S. Hwang et al.

Page 27: Digest of Human Detection from CVPR2015

6. Multispectral Pedestrian Detection:

Benchmark Dataset and Baseline - S. Hwang et al.

• Dataset of visible-light and thermal images

Contributions:

• Color and thermal images

• Both training and test data

• Annotations with temporal correspondence

• Large enough

• …

Page 28: Digest of Human Detection from CVPR2015

Takeaways

Page 29: Digest of Human Detection from CVPR2015

Takeaways

• Human detection is still challenging

• Deep learning does not necessarily solve

every problem at this moment

• There are several findings that might be helpful

for your research/hobby/…

Page 30: Digest of Human Detection from CVPR2015

References / Supplemental materials

Page 31: Digest of Human Detection from CVPR2015

2. Filtered Channel Features for Pedestrian Detection

4. Taking a Deeper Look at Pedestrians

• Author's website: http://rodrigob.github.io/

3. Learning Scene-Specific Pedestrian Detectors without Real Data

• Project: http://vishnu.boddeti.net/projects/detection-by-synthesis.html

• YouTube: https://youtu.be/2Jf7faozHUs

5. Pedestrian Detection aided by Deep Learning Semantic Tasks

• Project: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/

6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline

• Lab: http://rcv.kaist.ac.kr/v2/

And all the papers of CVPR2015 are available at cv-foundation.org
