컨볼루션 멀티블럭 HOG를 이용한 퍼지신경망 보행자 검출...
Transcript of 컨볼루션 멀티블럭 HOG를 이용한 퍼지신경망 보행자 검출...
ISSN 1975-8359(Print) / ISSN 2287-4364(Online)
The Transactions of the Korean Institute of Electrical Engineers Vol. 66, No. 7, pp. 1117 1122, 2017
http://doi.org/10.5370/KIEE.2017.66.7.1117
Copyright ⓒ The Korean Institute of Electrical Engineers 1117
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/3.0/)which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
컨볼루션 멀티블럭 HOG를 이용한 퍼지신경망 보행자 검출 방법
A Neuro-Fuzzy Pedestrian Detection Method Using Convolutional Multiblock HOG
명 근 우* ․ 곡 락 도* ․ 임 준 식
(Kun-Woo Myung ․ Le-Tao Qu ․ Joon-Shik Lim)
Abstract - Pedestrian detection is a very important and valuable part of artificial intelligence and computer vision. It can be
used in various areas for example automatic drive, video analysis and others. Many works have been done for the pedestrian
detection. The accuracy of pedestrian detection on multiple pedestrian image has reached high level. It is not easily get more
progress now. This paper proposes a new structure based on the idea of HOG and convolutional filters to do the pedestrian
detection in single pedestrian image. It can be a method to increase the accuracy depend on the high accuracy in single
pedestrian detection. In this paper, we use Multiblock HOG and magnitude of the pixel as the feature and use convolutional
filter to do the to extract the feature. And then use NEWFM to be the classifier for training and testing. We use single
pedestrian image of the INRIA data set as the data set. The result shows that the Convolutional Multiblock HOG we proposed
get better performance which is 0.015 miss rate at 10-4 false positive than the other detection methods for example HOGLBP
which is 0.03 miss rate and ChnFtrs which is 0.075 miss rate.
Key Words : Multiblock HOG, INRIA data set, NEWFM, pedestrian detection
Corresponding Author : Dept. of Computer Engineering,
Seongnam, South Korea.
E-mail:[email protected]
* IT College, Gachon University, Seongnam, South Korea
Received : June 7, 2017; Accepted : June 9, 2017
1. Introduction
Pedestrian detection from images or videos is challenging,
however very important in the field of computer vision. It has
significant impact on the automatic driving and artificial
intelligence field. A lot effort has been done for pedestrian
detection [1,2]. The performance of pedestrian detection has
reached a high level. Nowadays pedestrian detection always
focus on the accuracy of full image (multiple pedestrian in
one image) detection value. But it is not easily to increase
that. We propose a method to increase the single pedestrian
image detection accuracy so that we can easy to increase the
accuracy the full image accuracy. To enhance the detection
performance, we should have some robust features to
discriminate the human form clearly. Based on many years of
study, the HOG (histogram of oriented gradient) descriptors
[3], magnitude of pixel [4] and some other descriptors [5,6,7]
show excellent performance in discriminating the human
form. Besides robust features, efficient and accurate classifiers
are also required. The deep learning and convolutional
concept have shown the good performance in pedestrian
detection. We change the extraction method of HOG and
using the convolutional filter. Using multiblock HOG feature
with convolutional filter to do the feature extraction part. And
the performance shows the result is better than original HOG
and other methods such as LBP and CHNFTRs.
2. Related Works
2.1. HOG and Magnitude
The HOG feature is very useful and robust and is used in
pedestrian detection. It provides a dense overlapping
description of image regions. This feature is robust because
local object appearance and shape can often be characterized
well by the distribution of local intensity gradient or edge
directions, even without precise knowledge of the
corresponding gradient or edge positions.
We calculate and count the gradient values of the local
region of an image to be the HOG features. To obtain the
HOG features, as Fig. 1 shows, we normalize the gamma value
and color of the image. Then, we compute the gradient values
of every pixel by the pixel value, as shown in Equations (1)
and (2).
전기학회논문지 66권 7호 2017년 7월
1118
Fig. 1 The Structure of Extracting HOG Feature
(1)
(2)
Here, Gx(x,y), Gy(x,y), and H(x,y) are the horizontal,
vertical, and pixel values of point (x,y).
After calculating the gradient value of every pixel, we
use the value to get the magnitude of gradient and gradient
direction, as shown in Equations (3) and (4).
(3)
tan (4)
Here, Gx(x,y) and Gy(x,y) are the horizontal and vertical
gradient values. G(x,y) is the magnitude of gradient, and
a(x,y) is the gradient direction.
After calculating the magnitude of gradient and gradient
direction, we divide the image into cells and accumulate
weighted votes for gradient orientation over spatial cells,
after that every cell gets a few values as their features;
then, we use every four cells group one block, collect all
the features of these four cells into one feature set and
normalize the value of this feature set. The feature sets of
all blocks are HOG features, and we collect HOG features
for all blocks over a detection window.
Magnitude is the value of pixel gradient. We use HOG to
extract the direction distribution of pixel and magnitude to
show each pixel value and value distribution of pixel.
2.1. NEWFM
NEWFM is a supervised classification neuro-fuzzy system
using the bounded sum of weighted fuzzy membership
functions (BSWFMs) [11,12]. Fig. 2. shows the structure of
NEWFM.
Fig. 2 The Structure of NEWFM
3. Convolutional Multiblock HOG
In this paper we propose Convolutional Multiblock HOG
feature to do pedestrian detection. We use convolutional
filter to extract Multiblock HOG feature and magnitude
feature. And we also use original image to extract these two
feature to make sure get the global feature and edge
enhanced feature to do the pedestrian detection.
The Convolutional Miltiblock HOG is extract Multiblock
HOG from the image processed by convolutional filters.
Multoblock HOG is a feature depend on the idea of HOG
and using filters to do convolutional operation to extraction.
Different from HOG, the Convolutional Multiblock HOG has
three sizes of block without cells in the block like the Fig.
3 shows. There are three sizes of block(image I size, 0.25
image size and 0.0625 image size). Each size of blocks are
not overlapped.
Fig. 3 The Block Size of Multiblock HOG
There is no cell in the block. So the histogram of
orientation is calculated with all the pixel in each blocks.
Trans. KIEE. Vol. 66, No. 7, JUL, 2017
컨볼루션 멀티블럭 HOG를 이용한 퍼지신경망 보행자 검출 방법 1119
Table 1 Algorithm of Mutiblock HOG
1 Proceduce MB_HOG()
{ //extract Multi Block HOG feature
2 Input INRIA person image train data set
//transform the original image to gray image
3 for i=1 to n //n represents the number of the train data set transform_gray(p[i]);// p[i] represents the image of train data set //calculating gradient value and direction of every pixel4 for i=1 to n for j=1 to p for k=1 to q // p,q represent the length and width of image { //h[i][j][k] represents the pixel value in point(j,k) of image[i] //gradx[i][j][k] represents the horizontal gradient value at point(x,y) of image[i] //grady[i][j][k] represents the vertical gradient value at point(x,y) of image[i] gradx[i][j][k]=h[i][j+1][k]-h[i][j-1][k]; grady[i][j][k]=h[i][j][k+1]-h[i][j][k-1];
//grad[i][j][k] is the actual gradient value in point(j,k) of image[i] //angle[i][j][k] is the gradient direction in point(j,k) of image[i] grad[i][j][k]=sqrt(sqr(gradx[i][j][k])+sqr(grady[i][j][k])); angle[i][j][k]=acrtan(grady[i][j][k]/gradx[i][j][k]);1 }5 //set Multi block size width[3],length[3]; //three kinds blocks width and 6 //Calculating histogram of orientation of blcoks for i=1 to 3 //every kinds cells for j=1 to p for k=1 to q { // get histogram of orientation of cells6.1 for m=j to j+width[i]// m,n represents the block range for n= k to k+ength[i] { switch angle[m][n] get ori[m][n] // get point(m,n) orientation //add value to corresponding orientation his[[hcount]ori[m][n]]=his[ori[m][n]]+grad[m][n]; // hcount represents the count number of all histogram }7 Add weight with three direction filters hcount=hcount+1; //every block builds one histogram }
8 Output all histograms of all blcoks(Multi block HOG) }
Depend on the orientation of pixel get the histogram of
blocks. Different from the HOG, the value of histogram is
the number of each orientation pixel not the magnitude
sum of each orientation pixel. After get the histogram of
orientation, After we extract the Multiblock HOG, we select
highest three orientations in every histogram, and use three
filters(horizontal filtering, vertical filtering and diagonal
filtering) to get the distribution of these orientations. Fig. 4
shows the structure of extracting Mulitblock HOG and
calculating the weight shows by the formula.
or
Fig. 4 The Structure of Extracting Multiblock HOG
The alogorithm of Multiblock HOG shows in Table 1.
As we already know the method to extract the Multiblock
HOG, we need to use convolutional filters to process the
image. We use sobel, sharpness and scharr filters to extract
more edge information from the image and get convolutional
images and using max pooing to extract more representive
information. Then we extract Multiblock HOG from the
convolutional images. And we get the Convolutional
Multiblock HOG. Fig. 5 shows the structure of Convolutinal
and pooling part of Convolutional Multiblock feature.
Fig. 5 Convolutional and Pooling Part of Convolutional Feature
4. Experimental Results
4.1. Dataset
The experiment materials are selected from the INRIA
pedestrian dataset. The INRIA pedestrian dataset is collected
as part of research work on the detection of upright people in
the form of image or video. The INRIA pedestrian dataset is
the most popular pedestrian dataset used in pedestrian
detection. It includes 1671 negative samples (non-person
samples) and 902 positive samples (person samples).
전기학회논문지 66권 7호 2017년 7월
1120
Fig. 6 The example of INRIA dataset
4.2. Experiment
As we already know the method to extract the
Convolutional Multiblock HOG and Multiblock HOG. In this
experiment, we use Multiblock HOG and Convolutional
Multiblock HOG to be the feature for the classification. After
extracting feature we use Bhattacharyya Distance to do the
feature selection. Then we use NEWFM to do the
classification. The Fig. 7 shows the experiment structure.
Fig. 7 The Structure of Experiment
4.3. Results
To evaluate the performance of pedestrian detection, we
always use miss rate and false positive value to do it. Miss
rate is also known as false negative rate. It is calculated by
using false negative sample divide all positive samples. False
positive is calculated by using false negative sample divide
all positive samples.
In the experiment, we first compared the performance of
MultiblockHOG with HOG, LBP, and magnitude features
which are widely used in single image pedestrian detection.
The table 1 shows the performance of these features.
feature HOG MultiblockHOG HOG+MagnitudeMultiblockHOG+
magnitude
miss rate(%)
0.45 0.35 0.4 0.33
featureHOG+
LBP
MultiblockHOG
+LBP
HOG+Magnitude
+LBP
MultiblockHOG+
magnitude+LBP
miss rate(%)
0.41 0.39 0.42 0.37
Table 1 The performance of different features
From Table 1 we can see the Multiblock HOG and
magnitdue get better performance than other feature. Then
we try to use the Conolutional Multiblock HOG feature to
see the performance and compare with Multiblock HOG. In
this comparision we use magnitdue in every test. The Table
2 shows the performance. From the Table 2 we can see
using Convolutional Multiblock HOG and Multiblock HOG
together we can get the better result.
featureMultiblock
HOG
Convolutional
Multiblock HOG
Multiblock HOG
+Convolutional
Multiblock HOG
miss
rate(%)0.33 0.3 0.2
Table 2 The performance of Convolutional Multiblock HOG
Fig. 8 The Comparison Results of Convoutional Multiblock
HOG and Other Methods
After using feature selection by Bhattacharyya distance we
use NEWFM to do the classification. We can get the final
result of using convolutional MultiblockHOG and magnitude.
This Fig. 8 shows the performance of method we proposed
and other advanced pedestrian detection methods in single
pedestrian detection using INRIA pedestrian dataset. The
result using miss rate value (the lower the better) at 10-4
false positive value shows the accuracy of pedestrian
detection performance. The result shows the method and
Trans. KIEE. Vol. 66, No. 7, JUL, 2017
컨볼루션 멀티블럭 HOG를 이용한 퍼지신경망 보행자 검출 방법 1121
feature we proposed has better performance than other
method.
6. Conclusion
Comparing the performances of method, the Convolutional
Multiblock HOG shows the better results than other
pedestrian detection method. The Convolutional Multiblock
HOG has lower miss rate than other methods.
From Table 1 we can see using MultiblockHOG and
magniude can get the best performance. Because the
MulitiblockHOG feature shows the pixel gradient distribution
and the magnitude feature shows the pixel value distribution.
And MultiblockHOG not only get the local feature also get the
global feature of the image depend on the multi size blocks.
The noise of color is reduced. So the MultiblockHOG and
manitude feature get better result than other features.
From Table 2 we can see using the Multiblock HOG and
Convolutional Multiblock HOG together can get better result.
Because Convoutional Multiblock HOG get more local feature
information and edge information than Multiblock HOG and
Multiblock HOG feature can get more global feature than
Convolutional Multiblock HOG. So using both two features can
get better result.
Depend on these changes, the convolutional MultiblockHOG
we proposed get better result than other single image
detection method as Figure 3 shows. Because the HOG feature
only focus on the local feature and LBP feature is most focus
on difference of neighbor pixel. The result of our method is
better than them. We also use convolutional layer to get more
representive feature than CHNFTRs, So we get better result
than CHNFTRs.
With the development of pedestrian detection, pedestrian
detection can be more useful in computer vision and artificial
intelligence area.
Acknowledgement
This research was support by Basic Science Research
Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Education, Science and
Technology(2015R1D1A1A09057409)
References
[1] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian
detection: An evaluation of the state of the art. TPAMI
(2011).
[2] P. Dollar, C. Wojek, B. Schiele, and P. Perona,
“Pedestrian detection: a benchmark,” in IEEE Computer
Vision and Pattern Recognition (2009).
[3] N. Dalal and B. Triggs, “Histograms of oriented gradients
for human detection,” in IEEE Computer Vision and
Pattern Recognition (2005).
[4] Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral
channel features. In: BMVC(2009).
[5] P. Sabzmeydani and G. Mori, “Detecting pedestrians by
learning shapelet features,” in IEEE Computer Vision and
Pattern Recognition (2007).
[6] P. Doll´ar, Z. Tu, H. Tao, and S. Belongie, “Feature
mining for image classification,” in IEEE Conf. Computer
Vision and Pattern Recognition (2007).
[7] Z. Lin and L. S. Davis, “A pose-invariant descriptor for
human det. and seg.” in European Conf. Computer Vision
(2008).
[8] C. Papageorgiou and T. Poggio, “A trainable system for
object detection,” Intl. Journal of Computer Vision, vol.
38, no. 1, pp. 15–33, (2000).
[9] J. S. Lim, Finding Features for Real-Time Premature
Ventricular Contraction Detection Using a Fuzzy Neural
Network System, IEEE Transactions on Neural Networks,
pp. 522-527, (2009).
[10] Z. X. Zhang, S. H. Lee, and J. S. Lim, " Detecting
ventricular arrhythmias by NEWFM," Granular Computing,
2008. GrC 2008. IEEE International Conference on, (2008).
[11] J. S. Lim, D. Wang, Y.-S. Kim, and S. Gupta, A neuro-
fuzzy approach for diagnosis of antibody deficiency
syndrome. Neurocomputing 69, Issues 7-9, pp. 969-974,
(2006).
[12] J. S. Lim, T-W Ryu, H-J Kim, and S. Gupta, “Feature
Selection for Specific Antibody Deficiency Syndrome by
Neural Network with Weighted Fuzzy Membership
Functions,” LNCS 3614. pp. 811-820, Springer-Verlag,
Aug (2005).
[13] J. S. Lim and S. Gupta, “Feature Selection Using
Weighted Neuro-Fuzzy Membership Functions,” The
2004 International Conference on Artificial Intelligence
(IC-AI’04), June 21-24, vol. 1, pp. 261-266, Las Vegas,
Nevada, USA, (2004).
[14] Z. X. Zhang, S. H. Lee, and J. S. Lim, "Discrimination of
Ventricular Arrhythmias Using NEWFM," pp. 176-183,
(2008).
[15] Guorong Xuan, Xiuming Zhu, Peiqi Chai,Feature
Selection based on the Bhattacharyya Distance
International Conference on Pattern Recognition(2006).
[16] H. Jun; M. Claudio, The influence of the sigmoid
전기학회논문지 66권 7호 2017년 7월
1122
function parameters on the speed of backpropagation
learning. From Natural to Artificial Neural Computation,
pp. 195–201 (1995).
[17] Y. Wang, F. S. Makedon, J. C. Ford, and J. Pearlman,
“HykGene: A Hybrid Approach for Selecting Marker
Genes for Phenotype Classification Using Microarray
Gene Expression Data,” Bioinformatics vol. 21, pp.
1530-1537, (2005).
[18] X. Wang, T. X. Han, and S. Yan, “An hog-lbp human
detector with partial occlusion handling,” in IEEE Intl.
Conf. Computer Vision(2009).
저 자 소 개
명 근 우 (Kunwoo Myung)
Kunwoo Myung is studying Computer
Engineering in Gachon University, South
Korea. She is interesting in Artificial
Intelligence. Her research focuses on fuzzy
systems.
곡 락 도 (Letao Qu)
Letao Qu received B.S. degree in Computer
Science from Ludong University, China in
2013. He is pursuing master’s course in
Computer Science from Department of
Computer Software at Gachon University,
South Korea. His research focuses on
neuro-fuzzy systems, biomedical prediction systems, and
signal processes.
임 준 식 (Joon S. Lim)
Joon S. Lim received his B.S., M.S., and Ph.D.
degrees in Computer Science from Inha
University, South Korea, The University of
Alabama at Birmingham, and Ph.D. degree
was from Louisiana State University, Baton
Rouge, Louisiana in 1986, 1989, and 1994,
respectively. He is currently a professor in Department of
Computer Software at Gachon University, South Korea. His
research focuses on neuro-fuzzy systems, biomedical
prediction systems, and human-centered systems. He has
authored three textbooks on Artificial Intelligence
Programming (Green Press, 2000), Javaquest (Green Press,
2003), and C# Quest (Green Press, 2006).