
A Robust Multi-Model Approach for Face Detection in Crowd

Sonu Lamba∗, Neeta Nain†, Harendra Chahar‡

Department of Computer Science and Engineering

Malaviya National Institute of Technology, Jaipur, India-302017

[email protected]∗, [email protected]†, [email protected]

Abstract—Estimating the number of people in surveillance areas is essential for monitoring crowded scenes. When the density of a zone rises beyond a certain level, people's safety can be endangered. Human detection is a prerequisite for density estimation, tracking, activity recognition and anomaly detection, even in non-congested areas. This paper presents a robust hybrid approach for face detection in crowds that combines skin color segmentation with a Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) architecture. Initially, image enhancement is performed to improve the detection rate. An edge-preserving pyramidal approach is applied for multiscale representation of an image. Skin color segmentation is done with a combination of the YCbCr and RGB color models, and HOG features are extracted from the segmented skin region. We trained the SVM classifier on the MUCT and FEI databases, which consist of 751 and 2800 face images respectively. The accuracy of this approach is evaluated by testing it on the BAO multiple face database and on various manually collected images captured in surveillance areas. Experimental results demonstrate that supplementing HOG with skin color segmentation increases the detection rate compared with using HOG features only. The proposed approach achieves 98.02% accuracy, which is higher than that of Viola-Jones and a fast face detection method.

Keywords: Skin color segmentation, Oriented gradient feature, RGB and YCbCr, SVM

I. INTRODUCTION

Crowd analysis is important for dealing with the complications that arise naturally in crowds. Its significance can be seen in mass gathering events such as public demonstrations, marathons, concerts, rallies and religious gatherings, which are characterized by flocks of thousands of people. Crowd analysis has a wide range of real-time applications, such as visual surveillance, public space design, crowd modeling and the prevention of crowd calamities. The recurrent and terrible stampedes at pilgrimages and parades require robust techniques for visual analysis of highly dense crowds.

Human detection in crowded areas is a crucial and fundamental task. It is a basic function for density estimation, people tracking, activity recognition and anomaly detection, even in non-crowded regions. Reliable human detection in each frame is the key element of robust human tracking. Even though human detection has been studied extensively, most existing techniques are not appropriate for detecting humans with large variance in appearance. Accurate and reliable detection of people is difficult in visual analysis because of the few pixels per target, perspective effects, high density with heavy occlusion, varied poses, variable appearance, unusual clothing and different camera orientations. A high-density crowd may lead to misclassification of a person, which results in false detections. In mass gatherings, the human body may be partially or fully occluded. The face is the most visible part of the body captured in the images, since cameras are mounted at a height for better surveillance. In general, the histogram of oriented gradients has proven to be a very effective feature for object detection. In this paper, a simple but powerful approach is proposed that makes robust use of HOG features and skin color segmentation for face detection in crowds. The rest of the paper is structured as follows. Section II discusses the background literature. Section III describes the background of HOG in detail, as well as our approach to skin color segmentation and SVM classification for face detection. Section IV describes the performance evaluation with implementation results. Section V concludes the paper.

II. RELATED WORK

Human detection is usually a precursor to many tasks in computer vision. Several methodologies that address the human detection problem are available in the literature. The latest extensive review evaluates numerous state-of-the-art pedestrian detectors and reports their detection rates with respect to the degree of occlusion, perspective effects and accuracy of localization. Its authors concluded that detector performance degrades as the degree of occlusion increases, and that there is a noticeable gap between existing and desired human detection techniques. Geronimo et al. [1] discuss the purpose of pedestrian detection in helping drivers prevent accidents and serious casualties. Because of the several challenges in detection, many methods have been proposed to improve human detection, as given in [2], [3], [4], [5], [6]. In these papers, a part-based detector is applied to infer the occluded region. Deep learning architectures have also recently demonstrated outstanding performance in a variety of vision tasks such as face recognition, object classification and object detection, but deep learning is limited in its current form because almost all of its successful applications use supervised learning with human-annotated data. Because annotated data is required, the regularities of the real world cannot be fully captured. In [7], deep learning achieves a successful face detection rate, but occlusion handling remains a challenging problem. Since the occluder might have arbitrary appearance, occluded objects have significant intra-class variation.

2016 12th International Conference on Signal-Image Technology & Internet-Based Systems

978-1-5090-5698-9/16 $31.00 © 2016 IEEE

DOI 10.1109/SITIS.2016.24

Multiple face detection in crowded areas is still an active topic, even after more than a decade of research. A considerable variety of face detection approaches can be found in the literature. Much work has been done, and many techniques have been proposed, to perform single or profile face detection. We briefly review several works that are closely related to detecting a single face. Alajel et al. [8] offer a face detection methodology based on skin color modeling and the modified Hausdorff distance. They computed the probability of a pixel being skin colored by using a predetermined threshold, followed by face or non-face classification with a template-based object classifier. Paul and Gavrilova [10] discussed an automatic face detection technique based on principal component analysis (PCA). A Gaussian mixture-based skin color model captures the geometric structure of the face. Template-based matching is applied for detection, and PCA is applied to retrieve the most dominant principal components onto which the skin region is projected. In [9], a face detection method based on rectangular Haar features is presented, combining skin color with an improved AdaBoost algorithm.

The algorithms discussed above are adequate for detecting faces in selected images with a decent correct detection rate. However, we observed that several face detection algorithms [8], [9], [10], [11] do not address the detection of multiple faces in a single patch. Comparatively little work has been done on human detection in crowd images, whether face detection [12], [13], [14], [15] or pedestrian detection [16], [17]. To resolve the problem of multiple face detection and to support density analysis in crowd scenes, we propose a face detection approach that incorporates skin color modeling and histogram of oriented gradient features.

III. PROPOSED APPROACH

This section provides a detailed view of the proposed approach. To keep the paper self-contained, we first explain the details of the Histogram of Oriented Gradients [18] that are relevant to our methodology. Afterward, we describe how to extract oriented gradient features from images and train a support vector machine on those features. Next, we present a skin color based segmentation technique to segment skin color regions from testing images. Oriented gradient features are extracted from the segmented skin color regions, which are then classified by the support vector machine. A pyramidal approach is used for multi-scale image representation to overcome perspective effects and the variable scale of faces. Finally, bounding boxes are placed on the detected faces. Two parameters control our approach: the block size with 50% overlap, and the threshold on the skin color segmented region. HOG is chosen as the feature descriptor because it matches the shape and local appearance of faces.

A. Background: Histogram of Oriented Gradients (HOG)

HOG is a feature descriptor, robust to small local geometric and photometric changes, that has been used in computer vision [19], [20], pattern recognition and image processing, as well as in optimization problems, to detect visual objects. HOG has been reported to notably outperform earlier feature sets for object detection. The aim of HOG is to generalize an object in such a way that the object produces nearly the same features when viewed under different conditions. These features are computed on local segments of an image by counting occurrences of gradient orientations. Features such as SIFT, edge orientation histograms and shape contexts, which have been widely used in past decades, are similar to HOG. HOG differs in that it is computed on a dense grid of uniformly spaced cells with overlapping blocks, which improves detection accuracy. The key advantage of HOG is that it describes the local shape and appearance of an object by the distribution of local intensity gradients and edge directions alone, without precise knowledge of the gradient positions.

In our work, skin color region segmentation and HOG features are used for face detection in crowded scenes. HOG offers a robust feature set for differentiating and detecting human faces under different illumination conditions, complex backgrounds and a wide variety of poses. In this paper, skin color segmentation is complementary to the oriented gradient features, and their combination reduces false detections. Skin color segmentation separates skin color pixels from non-skin color pixels. Boundary rules in the RGB and YCbCr color spaces are applied to segment skin color regions. The color range is decided by analyzing various images from existing databases. An overview of the proposed face detection framework is depicted in Fig. 1; it involves three phases: feature extraction, training and testing. Each of these phases is described in the following sections.

B. Feature Extraction

This section gives an overview of feature extraction using HOG, which is summarized in Fig. 2. These features estimate the occurrences of gradient orientations in local parts of a given image. First, the gradient of the image is computed on a dense grid. The image is partitioned into small, uniformly spaced spatial areas called cells. Next, to form the HOG representation, gradient orientations are accumulated into a histogram over all the pixels of every cell. The cell histograms are normalized over slightly larger regions called blocks, which makes the normalized features robust to illumination and shadowing. The normalized blocks are concatenated to form the feature descriptor. The step-by-step procedure for extracting oriented gradient features from every positive and negative training image of the database is given below. The size of each training image is 32 × 24 pixels.

Fig. 1. Histogram of oriented gradient based framework for face detection using skin color segmentation.

[Figure 1 block labels: Test Image; Skin Color Region Segmentation; Feature Extraction; Training Samples (Positive Samples, Negative Samples); Feature Extraction; Training; Classification; Face Detection; Density Estimation; Non-Face]

1) Convolve the image with the 1D centered mask in both the horizontal (Dx) and vertical (Dy) directions, using the filter kernels given in Equation 1. A simple 1D mask works better than larger masks [18].

\[ D_x = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}, \qquad D_y = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}^{T} \tag{1} \]

Further, subdivide the image into cells, where every cell is made up of 4 × 4 pixels and every block is made up of 2 × 2 cells with 50% overlap, as shown in Fig. 3. The choice of cell size depends on the image resolution: if the image resolution and the face size are small, smaller cells such as 4 × 4 or 8 × 8 are preferable, whereas larger cells can be used at very high resolution. For each cell, the gradient magnitude (M) and orientation (O) of its pixels are computed by Equations 2, 3, 4 and 5, where (i, j) are the pixel coordinates of the image I.

\[ g_x(i, j) = I(i, j - 1) - I(i, j + 1) \tag{2} \]

\[ g_y(i, j) = I(i - 1, j) - I(i + 1, j) \tag{3} \]

\[ M(i, j) = \sqrt{g_x(i, j)^2 + g_y(i, j)^2} \tag{4} \]

\[ O(i, j) = \tan^{-1}\left( \frac{g_y(i, j)}{g_x(i, j)} \right) \tag{5} \]

2) Create a histogram for each cell in which every pixel votes for an orientation bin, weighted by its gradient magnitude. For a color image, we select, at each pixel, the channel with the highest gradient magnitude. The histogram bins span 0 to 180 degrees for unsigned gradients and 0 to 360 degrees for signed gradients.

Fig. 2. Detailed description of HOG feature extraction.

3) Next, normalize the gradient strength of each cell by grouping cells into larger, spatially connected blocks, which makes the descriptor robust to contrast and illumination changes. The blocks overlap, so each cell contributes several times to the final HOG feature descriptor. For block normalization, we concatenate all the cell histograms of a block into a single vector whose size is (number of bins) × (number of cells in a block). This vector is normalized with the L2-norm given in Equation 6, where v is the non-normalized feature vector of a block and e is a small positive constant that prevents division by zero in gradient-less blocks. The final HOG feature descriptor of an image is the concatenation of all its normalized block vectors, and the descriptors of all images are collected into an array of feature vectors.

\[ \text{L2-norm:} \quad f = \frac{v}{\sqrt{\|v\|_2^2 + e^2}} \tag{6} \]

Fig. 3. Subdivision of image into blocks of 2 × 2 cells with 50% overlap and cells of 4 × 4 pixels.

[Figure 3 labels: Cell; Block 1; Block 2]

Fig. 4. An example: HOG feature visualization of training images used in face detection.

Visualizing the HOG features helps confirm that the descriptor is working as it should. Examples of such visualizations can be seen in Fig. 4. The extracted features are passed to the support vector machine as training feature vectors.
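The three extraction steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the experiments: the function name `hog_descriptor`, the choice of 9 unsigned orientation bins and the value of `eps` (the constant e of Equation 6) are assumptions, a grayscale image stored as a list of rows is assumed, and the gradient at border pixels is simply taken as zero.

```python
import math


def hog_descriptor(image, cell=4, block=2, bins=9, eps=1e-3):
    """Return a HOG feature vector for a 2D grayscale image (list of rows).

    bins and eps are illustrative assumptions, not values from the paper.
    """
    height, width = len(image), len(image[0])
    n_rows, n_cols = height // cell, width // cell
    bin_width = 180.0 / bins

    # Per-cell orientation histograms, weighted by gradient magnitude.
    hist = [[[0.0] * bins for _ in range(n_cols)] for _ in range(n_rows)]
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = image[i][j - 1] - image[i][j + 1]      # Equation 2
            gy = image[i - 1][j] - image[i + 1][j]      # Equation 3
            magnitude = math.hypot(gx, gy)              # Equation 4
            # Equation 5, folded into [0, 180) for unsigned gradients.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            r, c = i // cell, j // cell
            if r < n_rows and c < n_cols:
                hist[r][c][b] += magnitude

    # A stride of one cell gives 50% overlap for 2 x 2 blocks.
    features = []
    for r in range(n_rows - block + 1):
        for c in range(n_cols - block + 1):
            v = [h for dr in range(block) for dc in range(block)
                 for h in hist[r + dr][c + dc]]
            norm = math.sqrt(sum(x * x for x in v) + eps ** 2)  # Equation 6
            features.extend(x / norm for x in v)
    return features
```

Under these assumptions, a 32 × 24 training image yields 8 × 6 cells and 7 × 5 overlapping blocks, so the descriptor has 35 × 4 × 9 = 1260 entries.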

C. Training via Support Vector Machine

The support vector machine [21] is one of the most widely used supervised learning methods for classification. The main aim of an SVM is to determine an optimal hyperplane that separates the extracted features into different classes. An SVM is a fast classifier because its decision function depends only on a subset of the training data. These samples, called support vectors, lie closest to the optimal hyperplane. In our method, the oriented gradient feature vectors are fed to the SVM together with a label for each class into which we want the data classified. A preliminary model is trained on our dataset. To find a reasonably good model, k-fold cross-validation is performed over the training dataset. Once the best parameters within the given parameter space are found, the model is saved and the output of the training phase is passed to the testing phase. An important consideration is that training data must not be reused as testing instances.
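The k-fold procedure can be illustrated with a small index splitter. This is a sketch only: the helper names `k_fold_indices` and `cross_val_score` are hypothetical, the value of k is not stated in the text, and the callback `evaluate` stands in for training an SVM on the training indices and scoring it on the held-out fold with whatever SVM library is used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_val_score(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, validation) over k folds."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Each candidate parameter setting would be scored with `cross_val_score`, and the setting with the best average kept. Because the folds are contiguous, samples that are stored grouped by class should be shuffled before splitting.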

D. Testing

In the testing phase, skin color segmentation is applied to the testing image to extract the foreground area in which the probability of a human face being present is high. Skin color segmentation is explained in detail below. For each skin color region component, an oriented gradient feature is calculated and fed to the SVM. A dense scale-space pyramid of the testing image is built to decompose the image into multiple scales. At each level of the pyramid, skin color segmentation and oriented gradient feature extraction are applied. In the literature, Gaussian, Laplacian, wavelet and steerable pyramid techniques are used for multiscale image representation. These techniques smooth edges, and such smoothing hurts classification with the HOG descriptor. We therefore used local Laplacian filters [22], which preserve edges at each level of image decomposition. The pre-trained model and the oriented gradient features of each segmented skin color region are supplied to the SVM classifier, which categorizes the region as face or non-face. To localize faces, a bounding box is placed over every classified face in the image.
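As a rough illustration of the multiscale decomposition, the sketch below lists the image sizes visited by a scale-space pyramid. It shows the scale schedule only; the edge-preserving local Laplacian filtering [22] and the resampling itself are not reproduced. The function name `pyramid_sizes`, the scale factor of 1.25 and the stopping rule (stop once a level is smaller than the 32 × 24 training window) are assumptions.

```python
def pyramid_sizes(height, width, scale=1.25, min_height=32, min_width=24):
    """List the (height, width) of each pyramid level, largest first."""
    if scale <= 1.0:
        raise ValueError("scale must be greater than 1")
    sizes = []
    h, w = float(height), float(width)
    # Stop when a level can no longer contain one training-sized window.
    while h >= min_height and w >= min_width:
        sizes.append((int(h), int(w)))
        h /= scale
        w /= scale
    return sizes
```

For each size in the list, the image would be resampled, segmented by skin color, and passed through feature extraction and classification as described above.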

Skin Color Region Segmentation: The skin color regions are extracted by using a combination of boundary rules in the RGB and YCbCr color spaces. Skin color segmentation occurs in two stages: first, a skin color model is established by analyzing training images; second, the color space boundary rules are applied to the testing image to extract skin color regions. These stages are explained below. Before applying skin region segmentation, we apply image enhancement to reduce the effects of varying illumination and to increase detection accuracy. We used a median filter, which smooths noise effectively while preserving the edges of the image. A series of morphological operations is also performed to remove noisy pixels and yield skin color regions without noise and clutter. Morphological opening is applied to remove very small objects that are well below the size of a face while preserving the shape and size of larger objects in the image. A disk-shaped structuring element of radius 6 is used in this case.
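Morphological opening (erosion followed by dilation) with a disk-shaped structuring element can be sketched as follows for a binary mask stored as a list of rows. The function names are illustrative, pixels outside the image are treated as background, and the pure-Python loops are meant to show the operation rather than to be fast enough for full surveillance frames.

```python
def disk(radius):
    """Offsets (dr, dc) of a disk-shaped structuring element."""
    span = range(-radius, radius + 1)
    return [(dr, dc) for dr in span for dc in span
            if dr * dr + dc * dc <= radius * radius]


def _get(mask, r, c):
    """Pixel value, treating everything outside the image as background."""
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and bool(mask[r][c])


def erode(mask, offsets):
    """Keep a pixel only if the whole structuring element fits in the mask."""
    return [[all(_get(mask, r + dr, c + dc) for dr, dc in offsets)
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask, offsets):
    """Set a pixel if the structuring element touches any foreground pixel."""
    return [[any(_get(mask, r + dr, c + dc) for dr, dc in offsets)
             for c in range(len(mask[0]))] for r in range(len(mask))]


def opening(mask, radius=6):
    """Erosion followed by dilation with the same disk."""
    offsets = disk(radius)
    return dilate(erode(mask, offsets), offsets)
```

Components that cannot contain the disk vanish in the erosion step and are not restored by the dilation, which is how blobs well below face size are removed while larger regions keep their extent.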

• Establishment of Skin Color Model: Several images were collected from various sources to determine the skin tone color range, and most color spaces were analyzed to determine that range. The collected images cover a large range of skin tones and were captured under different illumination conditions. In computer vision, many color space models [23] exist, such as RGB, HSV, YCbCr, YUV and YIQ, with variable performance. Selecting an adequate color model for skin segmentation is essential because it can affect detection performance to a large extent. To improve the detection rate, we used a combination of RGB and YCbCr, because YCbCr provides explicit separation of the luminance and chrominance components. Together, these color spaces achieve better segmentation and detection performance.

• Boundary Rules: Skin color boundary rules are defined

for RGB and YCbCr in Equations 7 and 8 respectively.

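\[ W = \begin{bmatrix} R & G & B \end{bmatrix} \]

\[ \mathrm{RGB} = (R > 95) \wedge (G > 40) \wedge (B > 20) \wedge (\max(W) - \min(W) > 15) \wedge (|R - G| \geq 15) \wedge (R > G) \wedge (R > B) \tag{7} \]

\[ \mathrm{YCbCr} = (85 \leq Cb \leq 135) \wedge (10 \leq Cr \leq 45) \wedge (Y \geq 80) \tag{8} \]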
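The RGB rule of Equation 7 translates directly into a per-pixel test. The sketch below assumes 8-bit channel values and reads the rule as requiring a spread of more than 15 between the largest and smallest channel, together with R > G and R > B. The function names are illustrative. The YCbCr rule of Equation 8 is not implemented here; it would be combined with this test by a logical AND after converting each pixel to YCbCr.

```python
def is_skin_rgb(r, g, b):
    """Equation 7: RGB boundary rule for one 8-bit pixel."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) >= 15
            and r > g and r > b)


def skin_mask(image):
    """Apply the rule to an image given as rows of (R, G, B) tuples."""
    return [[is_skin_rgb(*pixel) for pixel in row] for row in image]
```

For example, `skin_mask([[(200, 120, 90), (50, 50, 50)]])` marks the first pixel as skin and rejects the gray one, since the gray pixel fails the R > 95 condition.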


In our skin color segmentation, these two color models are used together, which helps to increase the face detection rate. The segmentation outputs of both color models are depicted in Fig. 5. HOG features are computed from the segmented output and fed into the SVM; the rest of the procedure is the same as explained above for the testing phase. The next section presents the implementation results.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

We tested our approach on the BAO multiple face database and on our manually collected images from crowd surveillance areas. These databases include different illuminations, backgrounds, expressions and occlusions such as glasses and beards. To our knowledge, no publicly available dataset is appropriate for evaluating the efficiency of our proposed approach. We require a crowd surveillance dataset in which cameras point towards the faces, with different lighting conditions, variable poses, cluttered backgrounds, etc.

The proposed face detection technique is evaluated using three performance parameters: correct detection count (CDC), given in Equation 9, false detection count (FDC), given in Equation 10, and miss rate (MR), given in Equation 11. CDC, also called sensitivity, is the percentage ratio of correctly detected faces to the total number of actual faces in the images; FDC is the percentage ratio of false face detections to the total number of actual faces; and MR is the percentage of actual faces that are not detected. The standard metrics used to measure face detection performance are given by the confusion matrix shown in TABLE I.

• Correct Detection Count:

\[ \mathrm{CDC} = \frac{TP}{TP + TN} \times 100 \tag{9} \]

• False Detection Count:

\[ \mathrm{FDC} = \frac{FP}{TP + TN} \times 100 \tag{10} \]

• Miss Rate:

\[ \mathrm{MR} = \frac{TN}{TP + TN} \times 100 \tag{11} \]

• True Positive (TP): correctly detected face count

• True Negative (TN): undetected face count (this quantity is conventionally called a false negative)

• False Positive (FP): count of non-face objects detected as faces

• Total Faces (TF): True Positive (TP) + True Negative (TN)
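Equations 9 to 11 can be checked with a few lines of code. In this sketch the arguments `detected`, `missed` and `false_detections` correspond to TP, TN (in this paper's usage) and FP; the function name is illustrative.

```python
def detection_metrics(detected, missed, false_detections):
    """Return CDC, FDC and MR (Equations 9-11) as percentages."""
    total_faces = detected + missed          # TF = TP + TN
    if total_faces == 0:
        raise ValueError("at least one actual face is required")
    return {
        "CDC": 100.0 * detected / total_faces,
        "FDC": 100.0 * false_detections / total_faces,
        "MR": 100.0 * missed / total_faces,
    }


# 876 of 1121 faces detected, as on the manually collected images.
# The count of 50 false detections is a hypothetical value chosen to be
# consistent with the reported 4.46% FDC; the paper gives only the percentage.
metrics = detection_metrics(detected=876, missed=1121 - 876, false_detections=50)
print({name: round(value, 2) for name, value in metrics.items()})
```

Since CDC and MR share the denominator TP + TN, they always sum to 100%.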

TABLE I
STANDARD CONFUSION MATRIX TERMINOLOGY.

                     Predicted Face    Predicted Non-Face
Actual Face          TP = 98.02        TN = 1.98
Actual Non-Face      FP = 0.72         FN = 0

TABLE II reports the evaluation of our face detection method in terms of CDC, FDC and MR, and compares it with other techniques. Testing is performed on the BAO multiple face database, which contains a total of 157 images with 1500 human faces; 1473 of the 1500 faces were correctly detected. The correct detection count was 98.02% with a false detection count of 0.72%, compared with CDC values of 82.80%, 89.17% and 94.26% for Viola-Jones [24], [25] and [26] respectively. All methods were tested on the BAO multiple face database under different orientation conditions. The methods in [26] and [27] use skin color segmentation together with facial features for face detection. In their approach, facial features such as the eyes, nose and lips must be visible in order to measure the eccentricity, which is used to estimate the probability that a skin color area is a face region. Our algorithm is more efficient, giving very few false detections (0.72%) even when not all of the required facial features are properly visible. This is because we trained the SVM with histogram of oriented gradient features, which describe the local shape and appearance of an object (a face in our case) by the distribution of local intensity gradients and edge directions without precise knowledge of the gradient positions.

TABLE II
COMPARATIVE PERFORMANCE EVALUATION OF PROPOSED METHOD ON BAO MULTIPLE FACE DATABASE

Method                                        CDC (%)   FDC (%)   MR (%)
Viola Jones [24]                              82.80     9.5       17.2
Face detection in color images [25]           89.17     8.1       10.83
Skin segmentation and facial features [26]    94.26     6.7       5.74
Proposed method                               98.02     0.72      1.98

We also tested our approach on manually collected images from crowd surveillance areas, which contain multiple people with dark illumination, clutter and skin-colored backgrounds. This database contains 1121 human faces in 50 images, with an average of 20 to 25 people per image. Our method correctly detects 876 of these faces. The correct detection count is 78.14% with a 21.85% miss rate, as summarized in TABLE III. Face detection using HOG without skin color segmentation gives a correct detection count of only 71.36% on the same image set, as shown in Fig. 8. This shows that skin color segmentation is beneficial for improving the correct detection rate.

TABLE III
PERFORMANCE EVALUATION OF PROPOSED METHOD ON OUR MANUALLY COLLECTED IMAGE DATABASE.

Method                           Total faces   CDC (%)   FDC (%)   MR (%)
HOG                              1121          71.36     4.46      28.63
Skin color segmentation + HOG    1121          78.14     4.46      21.85

The face detection results on the BAO multiple face image database and on our manually collected images are shown in Figs. 6 and 7 respectively.

V. CONCLUSION

This paper presents a robust approach for crowd face detection in surveillance areas. The method combines skin color segmentation and oriented gradient features with SVM classification. Non-skin-colored objects are automatically discarded by skin color segmentation. The remaining objects are classified by the SVM based on histogram of oriented gradient features, which capture shape and appearance. A pyramidal approach is applied to detect faces of small pixel size as well as normally sized ones.

Fig. 5. Example: Skin color segmentation of testing image by using RGB, YCbCr and intersect RGB×YCbCr skin color models.

Fig. 8. Face detection using HOG on our manually collected images.

The proposed approach provides a good detection rate on a diverse variety of images captured in unconstrained illumination conditions. Experimental results demonstrated the efficiency and robustness of this approach under complex backgrounds, dim light, a variety of poses and expressions. The presented methodology offers a 98.02% correct detection rate on the BAO multiple face database under variations of scale and pose, occlusion, and complex, cluttered backgrounds. It also reduces computation, since HOG features are extracted only from the segmented skin regions. A novel method of face detection in crowded scenes is the significant contribution of this paper. The approach is useful in real-time applications with indoor and outdoor surveillance cameras. It is important to note that detection is a prerequisite for all phases of visual crowd analysis, especially density estimation for safety and supervision.

REFERENCES

[1] Geronimo, David, et al. "Survey of pedestrian detection for advanced driver assistance systems." IEEE transactions on pattern analysis and machine intelligence 32.7 (2010): 1239-1258.

[2] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE transactions on pattern analysis and machine intelligence 32.9 (2010): 1627-1645.

[3] Mikolajczyk, Krystian, Cordelia Schmid, and Andrew Zisserman. "Human detection based on a probabilistic assembly of robust part detectors." European Conference on Computer Vision. Springer Berlin Heidelberg, 2004.

[4] Wu, Bo, and Ramakant Nevatia. "Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors." Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1. Vol. 1. IEEE, 2005.

[5] Idrees, Haroon, Khurram Soomro, and Mubarak Shah. "Detecting humans in dense crowds using locally-consistent scale prior and global occlusion reasoning." IEEE transactions on pattern analysis and machine intelligence 37.10 (2015): 1986-1998.

[6] Badal, Tapas, Neeta Nain, and Mushtaq Ahmed. "Video partitioning by segmenting moving object trajectories." Seventh International Conference on Machine Vision (ICMV 2014). International Society for Optics and Photonics, 2015.

[7] Farfade, Sachin Sudhakar, Mohammad J. Saberian, and Li-Jia Li. "Multi-view face detection using deep convolutional neural networks." Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015.

[8] Alajel, Khalid Mohamed, Wei Xiang, and John Leis. "Face detection based on skin color modeling and modified Hausdorff distance." 2011 IEEE Consumer Communications and Networking Conference (CCNC). IEEE, 2011.

[9] Li, Zhengming, Lijie Xue, and Fei Tan. "Face detection in complex background based on skin color features and improved AdaBoost algorithms." Progress in Informatics and Computing (PIC), 2010 IEEE International Conference on. Vol. 2. IEEE, 2010.

[10] Paul, Padma Polash, and Marina Gavrilova. "PCA based geometric modeling for automatic face detection." Computational Science and Its Applications (ICCSA), 2011 International Conference on. IEEE, 2011.

Fig. 6. Example: face detection results of skin + HOG on BAO multiple face image database.

Fig. 7. Example: face detection results of skin + HOG on our manually collected images of crowd surveillance.

[11] Gupta, Sandeep K., et al. "A hybrid method of feature extraction for facial expression recognition." Signal-Image Technology and Internet-Based Systems (SITIS), 2011 Seventh International Conference on. IEEE, 2011.

[12] Viola, Paul, and Michael J. Jones. "Robust real-time face detection." International journal of computer vision 57.2 (2004): 137-154.

[13] Viola, Paul, Michael J. Jones, and Daniel Snow. "Detecting pedestrians using patterns of motion and appearance." International Journal of Computer Vision 63.2 (2005): 153-161.

[14] Wu, Bo, et al. "Fast rotation invariant multi-view face detection based on real adaboost." Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on. IEEE, 2004.

[15] Yang, Ming-Hsuan, Narendra Ahuja, and David Kriegman. "A survey on face detection methods." (1999).

[16] Viola, Paul, Michael J. Jones, and Daniel Snow. "Detecting pedestrians using patterns of motion and appearance." International Journal of Computer Vision 63.2 (2005): 153-161.

[17] Gavrila, Dariu M. "Pedestrian detection from a moving vehicle." European conference on computer vision. Springer Berlin Heidelberg, 2000.

[18] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). Vol. 1. IEEE, 2005.

[19] P. Dollár, S. Belongie, and P. Perona. The Fastest Pedestrian Detector in the West. 2010.

[20] Schwartz, William Robson, et al. "Human detection using partial least squares analysis." 2009 IEEE 12th international conference on computer vision. IEEE, 2009.

[21] Suykens, Johan AK, and Joos Vandewalle. "Least squares support vector machine classifiers." Neural processing letters 9.3 (1999): 293-300.

[22] Paris, Sylvain, Samuel W. Hasinoff, and Jan Kautz. "Local Laplacian filters: edge-aware image processing with a Laplacian pyramid." Communications of the ACM 58.3 (2015): 81-91.

[23] Vezhnevets, Vladimir, Vassili Sazonov, and Alla Andreeva. "A survey on pixel-based skin color detection techniques." Proc. Graphicon. Vol. 3. 2003.

[24] Wang, Yi-Qing. "An Analysis of the Viola-Jones face detection algorithm." Image Processing On Line 4 (2014): 128-148.

[25] Hsu, Rein-Lien, Mohamed Abdel-Mottaleb, and Anil K. Jain. "Face detection in color images." IEEE transactions on pattern analysis and machine intelligence 24.5 (2002): 696-706.

[26] Yadav, Shalini, and Neeta Nain. "Fast Face Detection Based on Skin Segmentation and Facial Features." 2015 11th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS). IEEE, 2015.

[27] Yadav, Shalini, and Neeta Nain. "A novel approach for face detection using hybrid skin color model." Journal of Reliable Intelligent Environments (2016): 1-14.

[28] Gavrila, Dariu M. "Pedestrian detection from a moving vehicle." European conference on computer vision. Springer Berlin Heidelberg, 2000.
