[IEEE 2014 IEEE International Conference on Multimedia and Expo (ICME) - Chengdu, China...

LEARNING FLEXIBLE BLOCK BASED LOCAL BINARY PATTERNS FOR UNCONSTRAINED FACE DETECTION

Zhenhua Chai, Yu Zhang, Zhijun Du, Dong Wang

Media Technology Lab, Huawei Technologies Co., Ltd

Heydi Mendez-Vazquez

Advanced Technologies Application Center (CENATAV), Cuba

ABSTRACT

Face detection has been a very active research topic in recent years. However, when applied to uncontrolled environments, some systems exhibit poor generalization ability. Even though a few existing methods can achieve promising results in some challenging situations, they usually require a high computational cost. This limits their use on mobile platforms, which have limited computational resources and strict power-consumption constraints. In this paper, a novel facial representation method for multi-view face detection in uncontrolled environments is presented. The proposed method, named Flexible Block based Local Binary Patterns (FBLBP), has low storage requirements and is fast to compute, while its performance is comparable with state-of-the-art methods, as demonstrated on the challenging Face Detection Data Set and Benchmark (FDDB).

Index Terms— face detection, structured ordinal features, two-stage learning, boosting

1. INTRODUCTION

Face detection has been one of the hottest research topics in computer vision and pattern recognition in the past few years [1]. This can be attributed to its wide range of applications, especially as a prerequisite component for other applications, e.g. smart cameras, biometrics, video surveillance, digital album management and human-computer interaction (HCI) [2]. The aim of face detection is to find and locate human faces in digital images (or videos), regardless of the face pose or the presence of occlusions. Detecting all the faces in all images without a mistake is still a challenging task. Some examples of faces missed by a very popular and well-optimized face detector (OpenCV) are shown in Fig. 1.

Several algorithms have been proposed to work toward this goal. Early works mainly focused on statistical-learning-based classifiers for frontal face detection; popular classifiers such as neural networks (NN) [3], the sparse network of Winnows (SNoW) [4] and support vector machines (SVM) [5] have been used in the literature for this purpose. Good results have been achieved by those methods, but their applications are greatly limited by their speed. Later, a boosting-based method was proposed by Viola and Jones (VJ) [6]; it exhibits excellent performance in both accuracy and speed and is still one of the most popular methods.

Fig. 1. The challenge of face detection in real life (including pose, backlight, low resolution, occlusion and large expression variations). The haarcascade_frontalface_alt_tree and haarcascade_profileface detectors (OpenCV 2.4.7) detected only two faces in these difficult images.

The main advantages of the VJ method are: 1) the Haar-like features can be computed very fast using the integral image; 2) the AdaBoost algorithm [7] is used both for effective feature selection from a large feature pool and for classifier training; 3) a cascade structure is exploited to filter out most non-face regions in early stages. In recent years, many efforts have been made to improve the VJ model along each of these three directions [2].
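Point 1) above rests on the integral image (summed-area table): after one pass over the image, the pixel sum of any rectangle costs four lookups, whatever its size. The NumPy sketch below illustrates this; it is not code from the paper, and the names `integral_image`, `rect_sum` and `haar_two_rect` are hypothetical.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with an extra zero row and column, shape (H+1, W+1)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Pixel sum of the w x h rectangle with top-left corner (x, y), in four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_two_rect(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: left w x h box minus the adjacent right box."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

# Usage on a synthetic 24x24 window (the detector's base window size).
img = np.arange(24 * 24).reshape(24, 24) % 256
ii = integral_image(img)
print(rect_sum(ii, 3, 5, 5, 5), img[5:10, 3:8].sum())  # the two values agree
```

The zero padding row and column avoid special cases for rectangles touching the image border.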

The features used to describe faces are considered one of the most important aspects of the VJ framework for face detection. Lienhart and Maydt proposed an extended set of Haar-like features with more orientations [8]; the enriched feature set can better describe facial information. Zhao et al. introduced Non-Adjacent Rectangle (NAR) Haar-like features [9] to capture the context information of face images. However, a single Haar-like feature seems too simple to represent a face image, and a detector usually needs thousands of them to achieve reasonable performance. Other works take advantage of co-occurrence information to make the features more discriminative. The multi-block local binary patterns (MBLBP) [10] by Zhang et al. and the locally assembled binary (LAB) features [11] by Yan et al. are two good examples of this kind of method. Both benefit from more complex structures and achieve better performance than the original Haar-like features [10, 11].

It should be noted that all the features mentioned above share a similar attribute: they describe the face image using simple and robust ordinal comparisons. More complex features have also been proposed, e.g. speeded up robust features (SURF) [12], histograms of oriented gradients (HOG) [13], Gabor wavelets [14], etc. Some methods involve more sophisticated and time-consuming structures, such as the deformable part-based model (DPM) [13]. However, although interesting results have been achieved, the computational cost of most of these methods is very high, so in general they are not suitable for real-time and mobile applications.

In this work a new method for face detection is proposed. Following the strategy of the VJ method, a new kind of ordinal feature is proposed to ensure fast feature computation while increasing classification accuracy. The contributions of this work are twofold: 1) a more general ordinal feature, named Flexible Block based Local Binary Patterns (FBLBP), is introduced; and 2) a two-step weak learner algorithm based on the proposed features is presented to avoid a combinatorial explosion.

This paper is organized as follows: in Section 2 the drawbacks of related works are analyzed. In Section 3, the details of the proposed method are introduced. Experimental results on the challenging FDDB [15] are presented in Section 4. Finally, conclusions are given in Section 5.

2. RELATED WORK

There are mainly three methods related to our work. The first is MBLBP [10], which extends the Local Binary Patterns operator [16] by varying the block size involved in local comparisons. A block-based ordinal comparison is more robust to noise than a pixel-based local comparison. Besides, fusing multiple block-based ordinal features allows larger-scale structure information to be captured. Thus, MBLBP performs better than LBP for face detection [10]. However, in MBLBP the feature structure is fixed, and all the neighbor blocks must be adjacent to the center block. The resulting features cannot capture the discriminative ordinal information between two distant regions. Another related method is the joint-Haar feature [17], but it uses a different kind of weak learner and only considers the adjacent-block case.

The last method relevant to our work is the multi-scale structured ordinal features (MSOF) [18]. Even though MSOF overcomes the weakness of the first two methods by considering non-adjacent blocks, it uses a fixed inter-block distance between the center block and the neighbor blocks. Thus, it is not flexible either, as illustrated in Fig. 2 (a).

Fig. 2. The visual differences between a) MSOF and b) the proposed FBLBP. The distances between the center block and the neighbor blocks in MSOF are all the same, while for FBLBP all the inter-block distances and block positions are learned from the training set.
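The structural difference illustrated in Fig. 2 can be made concrete with block coordinates. In the sketch below, a 3x3 grid with zero gap gives an MBLBP-style layout, a positive gap gives an MSOF-style layout with one fixed inter-block distance, and FBLBP simply stores whatever pivot and neighbor positions training selects. The function name, the (x, y) top-left convention and the example coordinates are assumptions for illustration, not the paper's parameterization.

```python
def grid_layout(cx, cy, s, gap):
    """Pivot block at (cx, cy) plus 8 neighbors on a 3x3 grid of s x s blocks.

    gap == 0 gives an MBLBP-style layout (neighbors touch the pivot);
    gap > 0 gives an MSOF-style layout in which every neighbor sits at the same
    fixed offset from the pivot. Coordinates are top-left corners (x, y).
    """
    step = s + gap
    neighbors = [(cx + dx * step, cy + dy * step)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return (cx, cy), neighbors

# FBLBP instead stores an arbitrary pivot and neighbor list chosen by training.
# These coordinates are invented for illustration only.
fblbp_pivot, fblbp_neighbors = (5, 7), [(14, 7), (9, 16), (1, 2)]

print(grid_layout(9, 9, 5, 0))  # MBLBP-like: adjacent 5x5 blocks
print(grid_layout(9, 9, 3, 3))  # MSOF-like: fixed gap of 3 pixels
```

Only the FBLBP list can place neighbors at unequal distances or on one side of the pivot, which is the extra freedom the learning procedure exploits.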

3. FBLBP CASCADE FOR FACE DETECTION

The proposed face detector follows the VJ strategy. First, a novel set of more discriminative features that can be computed very fast using integral images is introduced. Then, a boosting algorithm is used to select the most discriminative features and to construct a binary classifier. A multi-output regression tree is used as the weak learner in the proposed framework, and each weak classifier is trained iteratively in two steps based on a pivot block. Finally, a boosting cascade structure is used to build the strong classifier. The details of each step are described in the following subsections.
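The cascade mentioned above can be sketched as a loop with early rejection: each stage sums its weak-learner outputs and a window is discarded as soon as one stage score falls below that stage's threshold. This is a generic illustration under stated assumptions, not the authors' implementation: weak learners are modeled as arbitrary callables, the type and function names are hypothetical, and the per-stage threshold is our assumption (Algorithm 1 outputs $\operatorname{sign}[F(s)]$, which corresponds to a threshold of 0).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a weak learner maps a window (any object) to a real score f_m(s).
WeakLearner = Callable[[object], float]
Stage = Tuple[Sequence[WeakLearner], float]  # (weak learners, stage threshold)

def cascade_predict(window: object, stages: Sequence[Stage]) -> bool:
    """Return True if the window passes every stage of the cascade."""
    for weak_learners, threshold in stages:
        score = sum(f(window) for f in weak_learners)  # F(s) = sum_m f_m(s)
        if score < threshold:
            return False  # early rejection: later stages are never evaluated
    return True

# Toy usage: windows are plain numbers and each stage has one weak learner.
stages: List[Stage] = [([lambda s: s - 1.0], 0.0), ([lambda s: s - 3.0], 0.0)]
print([cascade_predict(s, stages) for s in (0.0, 2.0, 5.0)])  # [False, False, True]
```

Because most background windows fail an early stage, the average cost per window stays close to the cost of the first few stages.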

3.1. Flexible Block based Local Binary Patterns (FBLBP)

The main motivation for proposing the FBLBP feature is to overcome the fixed-structure problems of MBLBP [10] and MSOF [18]. As shown in Fig. 2 (b), the structure of the proposed FBLBP is more flexible than that of existing approaches, because the inter-block distances and the positions of the compared blocks are learned in the training step.

FBLBP can be computed in a way similar to MBLBP and MSOF. First, the differences between the average intensity of the center block (also called the pivot block) and the average intensities of all neighbor blocks are computed. Then, these values are thresholded into binary codes. Finally, all the binary codes are concatenated to form one FBLBP feature. The details can be found in Fig. 3. All the block-based features can be computed very efficiently using integral images [6].

Fig. 3. FBLBP encoding process. The average intensity value in each block is compared with that of the center block, and the comparison result is then thresholded into a binary value.

To make the block-based local comparisons more discriminative, we define the k-th FBLBP feature as two subclasses, $\mathrm{FBLBP}_{k,1}$ and $\mathrm{FBLBP}_{k,0}$ ($k \in [1, K]$, where $K$ is the total number of FBLBP features), by making an extra sign judgement. The i-th element of $\mathrm{FBLBP}_{k,1}$ or $\mathrm{FBLBP}_{k,0}$ is obtained by formula (1).

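Let $\delta(\cdot)$ be the indicator function (written here as a Kronecker delta), which outputs 1 when its argument is true and 0 otherwise:

$$
\begin{aligned}
\text{sign} = 1:&\quad \mathrm{FBLBP}_{k,1,i} = \delta\big(\mathrm{AvgInt}_{\mathrm{pivot}} - \mathrm{AvgInt}_{\mathrm{neighbor}_i} \ge \theta_{k,1}\big),\\
\text{sign} = 0:&\quad \mathrm{FBLBP}_{k,0,i} = \delta\big(\mathrm{AvgInt}_{\mathrm{pivot}} - \mathrm{AvgInt}_{\mathrm{neighbor}_i} \le \theta_{k,0}\big).
\end{aligned}
\qquad (1)
$$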

With this strategy, the size of the resulting feature set is doubled, which yields a richer feature pool. Besides, the sign bit makes the FBLBP feature more discriminative. This thresholding strategy is very similar to the one used in local ternary patterns (LTP) [19] for face recognition. However, in LTP $\theta_{k,1}$ is set to the negative of $\theta_{k,0}$, and both are fixed for all face regions. In the proposed method, both thresholds can differ for each FBLBP feature, and they are obtained in the learning process described in the next two subsections.
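Formula (1) can be illustrated with a minimal NumPy sketch: block averages come from an integral image, each neighbor contributes one bit, and the bits are concatenated into an integer code. The function names, the (x, y) top-left block convention, the bit order (first neighbor as most significant bit) and the example thresholds are all assumptions for illustration; in the paper $\theta_{k,1}$ and $\theta_{k,0}$ are learned per feature.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero first row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_mean(ii: np.ndarray, x: int, y: int, s: int) -> float:
    """Average intensity of the s x s block whose top-left corner is (x, y)."""
    total = ii[y + s, x + s] - ii[y, x + s] - ii[y + s, x] + ii[y, x]
    return float(total) / (s * s)

def fblbp_code(ii, pivot, neighbors, s, theta, sign):
    """Integer code of one FBLBP feature following formula (1).

    sign == 1: bit i = 1 if AvgInt(pivot) - AvgInt(neighbor_i) >= theta (theta_{k,1})
    sign == 0: bit i = 1 if AvgInt(pivot) - AvgInt(neighbor_i) <= theta (theta_{k,0})
    """
    p = block_mean(ii, *pivot, s)
    code = 0
    for (x, y) in neighbors:
        d = p - block_mean(ii, x, y, s)
        bit = (d >= theta) if sign == 1 else (d <= theta)
        code = (code << 1) | int(bit)  # concatenate bits, first neighbor = MSB
    return code

# Usage: bright pivot at (0, 0), a dark neighbor at (10, 0), a brighter one at (0, 10).
img = np.zeros((24, 24))
img[0:5, 0:5] = 100.0
img[10:15, 0:5] = 200.0
ii = integral_image(img)
print(fblbp_code(ii, (0, 0), [(10, 0), (0, 10)], 5, 10.0, 1))   # sign = 1 code
print(fblbp_code(ii, (0, 0), [(10, 0), (0, 10)], 5, -10.0, 0))  # sign = 0 code
```

With $P$ neighbors each subclass yields a code with $2^P$ possible values, which is the non-metric quantity the weak learner later indexes.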

3.2. Gentleboost Cascade Learning

The flexible structure makes the FBLBP features more distinctive, as described above, but at the same time highly redundant. Hence an efficient algorithm is needed to select a subset of the most significant features. In this paper, GentleBoost [20] is used due to its robustness to outliers. Given a set of labeled training samples $(s_1, y_1), \dots, (s_N, y_N)$, GentleBoost can be viewed as a sequential procedure that fits an additive model $F(s) = \sum_{m=1}^{M} f_m(s)$. In each iteration, the aim is to select the best weak classifier $f_m(s)$, i.e. the one that minimizes the weighted squared error under the current sample distribution. The details can be found in Algorithm 1.

When one stage classifier is obtained, we use bootstrapping to collect more samples and train the classifier of the next stage. The training process for the cascade detector stops when the system performance meets our goal. More details about GentleBoost and the cascade structure can be found in the literature [6, 20].

Algorithm 1: GentleBoost classifier training
1. Initialize the sample weights $w_j = 1/N$, $j = 1, 2, \dots, N$, and the model $F(\cdot) = 0$.
2. Repeat for $m = 1, \dots, M$:
   (a) Fit the regression model $f_m$ of $Y$ to $S$ by weighted least squares, i.e. minimize $J_{wse} = \sum_{j=1}^{N} w_j \big(y_j - f_m(s_j)\big)^2$.
   (b) Update $F(s) \leftarrow F(s) + f_m(s)$.
   (c) Update $w_j \leftarrow w_j e^{-y_j f_m(s_j)}$ and normalize the weights.
3. Output the stage classifier $F(s) = \operatorname{sign}\big[\sum_{m=1}^{M} f_m(s)\big]$.
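Algorithm 1 can be sketched in NumPy as follows. To keep the example self-contained, it uses a weighted least-squares regression stump on continuous feature columns as a stand-in weak learner; this is our substitution, since the paper's weak learner is the FBLBP lookup table of equation (2). Function names are hypothetical.

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted least-squares stump on one feature column: f(x) = a if x > t else b.

    For a fixed split, a and b are the weighted means of y on each side, which is
    the WLS solution. Returns (J_wse, t, a, b) for the best split.
    """
    best = (np.inf, 0.0, 0.0, 0.0)
    for t in np.unique(x):
        right = x > t
        wr, wl = w[right].sum(), w[~right].sum()
        a = (w[right] * y[right]).sum() / wr if wr > 0 else 0.0
        b = (w[~right] * y[~right]).sum() / wl if wl > 0 else 0.0
        err = (w * (y - np.where(right, a, b)) ** 2).sum()
        if err < best[0]:
            best = (err, t, a, b)
    return best

def gentleboost(X, y, M):
    """M rounds of GentleBoost on features X (N x D) with labels y in {-1, +1}."""
    N, D = X.shape
    w = np.full(N, 1.0 / N)          # step 1: uniform weights
    F = np.zeros(N)                  # F(s) tracked on the training samples
    model = []
    for _ in range(M):               # step 2
        fits = [fit_stump(X[:, d], y, w) for d in range(D)]
        d = int(np.argmin([f[0] for f in fits]))   # (a) smallest J_wse
        _, t, a, b = fits[d]
        f = np.where(X[:, d] > t, a, b)
        F += f                       # (b) F(s) <- F(s) + f_m(s)
        w *= np.exp(-y * f)          # (c) reweight ...
        w /= w.sum()                 # ... and normalize
        model.append((d, float(t), float(a), float(b)))
    return model

def predict(model, X):
    """Stage classifier output sign[sum_m f_m(s)] (step 3)."""
    F = sum(np.where(X[:, d] > t, a, b) for d, t, a, b in model)
    return np.sign(F)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
print(predict(gentleboost(X, y, 3), X))
```

Unlike discrete AdaBoost, each round adds a real-valued regression output rather than a weighted ±1 vote, which is why step (a) is a least-squares fit.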

3.3. A two-step weak learner

The flexible structure increases the number of candidate features many times over, so the choice of the pivot block becomes more important than before. Taking the FBLBP feature set with a block size of 5x5 pixels as an example, for a normalized face image of 24x24 pixels the total number of candidate blocks is $(24 - 5 + 1)^2 = 400$. If we assume that each FBLBP feature has eight neighbor blocks, choosing them among all blocks produces an enormous number of FBLBP features ($C_{400}^{8} \approx 1.5 \times 10^{16}$), while under the same conditions the number of original MBLBP features with the same block size is only 400. Hence, an exhaustive search leads to a combinatorial explosion and cannot work in practice.
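The counts above can be checked with a few lines of Python. The sketch assumes, as the text implies, that candidate blocks are all 5x5 positions in a 24x24 window at a one-pixel stride (the stride is our assumption) and that a feature is an unordered choice of 8 blocks; the pair count shows why restricting the first learning step to two blocks is tractable.

```python
from math import comb

window, block = 24, 5
positions = (window - block + 1) ** 2  # top-left corners of a 5x5 block in a 24x24 window
eight_blocks = comb(positions, 8)      # C(400, 8): exhaustive choice of 8 blocks
pairs = comb(positions, 2)             # first step: only a pivot/neighbor pair is searched
print(positions, f"{eight_blocks:.3e}", pairs)
```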

Taking this into account, in the proposed method the feature learning process is divided into two steps, and a greedy solution is used instead of an exhaustive search for the optimum. The first step determines the best positions and the best scale for both the pivot block and the first neighbor block; that is, one temporary weak classifier is trained on every pair of blocks of the same scale, and the pair that minimizes the weighted squared error $J_{wse}$ is kept. The search space is greatly reduced in this case, and the optimization becomes tractable on a present-day personal computer. Since our FBLBP feature is a string of binary values (or, equivalently, an integer code) and this value is non-metric, we use a multi-output regression tree [10] as the weak classifier. Its output can be obtained explicitly by equation (2):

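$$
f_m(\mathrm{FBLBP}^k) = a_p = \frac{\sum_{j=1}^{N} w_j \, y_j \, \delta\big(\mathrm{FBLBP}_j^k = p\big)}{\sum_{j=1}^{N} w_j \, \delta\big(\mathrm{FBLBP}_j^k = p\big)} \qquad (2)
$$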

where $\mathrm{FBLBP}_j^k$ stands for the k-th FBLBP feature of face sample j, and the range of p is determined by the number of neighbor blocks $P$ in one FBLBP feature ($p \in [1, 2^P]$).

After we get the best combination of two blocks, we set one of them as the pivot block and the other as the first neighbor block. This greatly reduces the search space for the next step, because once the block scale is fixed the total number of candidate blocks is reduced; in the previous example of a 24x24 face image, it is less than 576. Then, the next step builds one complete weak classifier by adding more neighbor blocks. In this paper a greedy forward search is used, and the learning process stops when the performance begins to decrease. The details are described in Algorithm 2.
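The weak-learner output of equation (2) is a lookup table: for each code value $p$, the output $a_p$ is the weighted mean of the labels of the samples that produce that code. A minimal NumPy sketch follows; the function names are hypothetical, codes are indexed from 0 to $2^P - 1$ rather than from 1, and returning 0 for codes that no sample produces is our assumption (equation (2) is undefined there).

```python
import numpy as np

def fit_lut(codes: np.ndarray, y: np.ndarray, w: np.ndarray, P: int) -> np.ndarray:
    """Table a with a[p] = sum_j w_j y_j [code_j == p] / sum_j w_j [code_j == p]."""
    num = np.bincount(codes, weights=w * y, minlength=2 ** P)
    den = np.bincount(codes, weights=w, minlength=2 ** P)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def weighted_squared_error(a, codes, y, w) -> float:
    """J_wse = sum_j w_j (y_j - f_m(s_j))^2 with f_m(s_j) = a[code_j]."""
    return float(np.sum(w * (y - a[codes]) ** 2))

codes = np.array([0, 0, 1, 3])           # FBLBP codes of four samples, P = 2
y = np.array([1.0, -1.0, 1.0, -1.0])
w = np.full(4, 0.25)
a = fit_lut(codes, y, w, P=2)
print(a, weighted_squared_error(a, codes, y, w))
```

Because the table treats each code as an unordered category, no metric on the binary string is needed, which matches the motivation given for using a multi-output regression tree.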

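Algorithm 2: Search of candidate blocks

Input: the weights of the training samples $w_j$, $j = 1, 2, \dots, N$
1. Initialize $\mathrm{FBLBP}_k = \emptyset$.
2. Find the best pivot block and first neighbor block by minimizing
   $\{x_p, x_n, err\} = \arg\min \sum_{j=1}^{N} w_j \big(y_j - f_m(x_{p,j} \cup x_{n,j})\big)^2$;
   set $\mathrm{FBLBP}_k = \mathrm{FBLBP}_k \cup \{x_p, x_n\}$.
3. Then, add more blocks of the same size scale:
   while the number of neighbor blocks in $\mathrm{FBLBP}_k$ is less than $T$:
      $err_{old} = err$;
      $\{x_n, err\} = \arg\min \sum_{j=1}^{N} w_j \big(y_j - f_m(\mathrm{FBLBP}_{k,j} \cup x_{n,j})\big)^2$;
      if $err > err_{old}$: break;
      else: $\mathrm{FBLBP}_k = \mathrm{FBLBP}_k \cup x_n$;
   end
4. Output $\mathrm{FBLBP}_k$.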

In addition, a parameter $T$ controls the maximum number of neighbor blocks in each FBLBP feature in order to avoid overfitting. Other strategies, such as backward search and floating search [21], could also be incorporated into this framework.
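The two-step search of Algorithm 2 can be sketched as below for a single block scale. Several simplifications are our assumptions, not the paper's: block averages are given as a precomputed matrix (one column per candidate block), only the sign = 1 rule of formula (1) is used, and one threshold `theta` is shared instead of being learned per feature. Each candidate is scored by fitting the lookup table of equation (2) and measuring $J_{wse}$; all names are hypothetical.

```python
import numpy as np

def lut_error(bits: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Fit the equation-(2) table on binary columns `bits` (N x P) and return J_wse."""
    place = 1 << np.arange(bits.shape[1])[::-1]   # first column = most significant bit
    codes = bits.astype(np.int64) @ place
    num = np.bincount(codes, weights=w * y)
    den = np.bincount(codes, weights=w)
    a = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.sum(w * (y - a[codes]) ** 2))

def greedy_fblbp(means: np.ndarray, y: np.ndarray, w: np.ndarray, T: int, theta: float = 0.0):
    """Greedy pivot/neighbor search; `means` is N x B block averages at one scale."""
    B = means.shape[1]

    def bit(p: int, n: int) -> np.ndarray:
        # sign = 1 rule: 1 if AvgInt(pivot) - AvgInt(neighbor) >= theta
        return (means[:, p] - means[:, n] >= theta)[:, None]

    # Step 2: exhaustive search over ordered (pivot, first neighbor) pairs.
    err, pivot, first = min((lut_error(bit(p, n), y, w), p, n)
                            for p in range(B) for n in range(B) if p != n)
    neighbors = [first]
    # Step 3: greedy forward search with at most T neighbor blocks.
    while len(neighbors) < T:
        current = np.hstack([bit(pivot, n) for n in neighbors])
        candidates = [(lut_error(np.hstack([current, bit(pivot, n)]), y, w), n)
                      for n in range(B) if n != pivot and n not in neighbors]
        if not candidates:
            break
        new_err, best_n = min(candidates)
        if new_err > err:  # stop once the error starts to increase
            break
        err = new_err
        neighbors.append(best_n)
    return pivot, neighbors, err

# Toy data: block 1 is darker than block 0 on faces and brighter on non-faces.
means = np.zeros((8, 4))
means[:, 0] = 10.0
means[:4, 1], means[4:, 1] = 5.0, 15.0
y = np.array([1.0] * 4 + [-1.0] * 4)
w = np.full(8, 1.0 / 8)
print(greedy_fblbp(means, y, w, T=2))
```

In this sketch the error is measured on the training samples, where adding a bit can only refine the code partition and so never increases $J_{wse}$; the loop therefore effectively stops at $T$. Evaluating the stopping test on held-out samples would be one possible alternative.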

4. EXPERIMENTS

To show the advantages of the proposed method, we evaluate the proposed FBLBP cascade detector on the challenging Face Detection Data Set and Benchmark (FDDB) [15]. FDDB is one of the most widely used face detection databases. It contains 2845 images with 5171 faces captured in unconstrained environments. For training, we collected more than 20K face images (both frontal and non-frontal) and background samples from the Internet. During testing we strictly follow the standard test protocol and use the evaluation tools provided with the database. In addition, we use a subset of the training set for cross-validation.

The first selected FBLBP feature is shown in Fig. 4. The pivot block and the neighbor blocks of this most discriminative FBLBP feature are non-adjacent. Thus, the original MBLBP, which only considers adjacent neighbor blocks, does not fully exploit the discriminative power of block-based LBP. Moreover, the figure shows that the positions and distances of the selected neighbor blocks are not fixed as in MSOF. We can also see that the pivot block of the most discriminative feature is located close to the eye region, which agrees with the common intuition that the eyes are the most discriminative part of human faces.

Fig. 4. The first selected FBLBP feature. The pivot block is in red while the neighbor blocks are in green.

We also explore how the maximum number of neighbor blocks, $T$, affects the detection results. As shown in Fig. 5, the best results are obtained for $T = 6$.

Fig. 5. Performance with different maximum numbers of neighbor blocks in one FBLBP feature on the validation set.

Following the FDDB protocol, we have compared the proposed FBLBP with a number of state-of-the-art techniques: the original MBLBP, the non-adjacent rectangle (NAR) Haar-like feature [9], the recent tree-structured model (TSM) [13] and the top five academic methods listed on the FDDB website [15], namely (1) Li's SURF cascade detector from Intel Labs [12]; (2) Jain's detector [22] and (3) Mikolajczyk's detector [23], both of which leverage context information; (4) Subburaman's face detector [24], which performs fast bounding box estimation; and (5) the VJ model [6] as implemented in the latest version of OpenCV. The ROC curves for both the continuous and discrete scores are shown in Fig. 6. It can be seen that our proposed method is comparable to, or even better than, the state-of-the-art methods when the number of false positives is low. Some of the faces detected in difficult images are shown in Fig. 7, including those missed in Fig. 1.

Fig. 6. Performance on FDDB: a) continuous score and b) discrete score.

Fig. 7. Some examples of faces detected by the proposed FBLBP detector.

Finally, the speed of our detector is tested. It achieves 10 fps on a VGA image on a common PC with an i5-2400 processor, without explicit optimization (e.g. SIMD). This is almost the same level as the SURF detector [12] on similar hardware, and much faster than TSM [13]. Note also that our minimum detection window is 24x24, while for both TSM and the SURF detector it is 40x40. This means that faces smaller than 40x40 pixels will be missed completely by their detectors, while our method can detect more faces in low-resolution images, which is important in some cases (e.g. mobile surveillance). Besides, we believe that our detector will be even more competitive when better hardware becomes available.

5. CONCLUSION AND DISCUSSION

In this paper, an efficient and effective ensemble model of flexible block based local binary patterns (FBLBP) is proposed for face detection in uncontrolled environments. We argue that the detection performance improvement over the original MBLBP can be attributed to the more flexible structure. In addition, the extra sign bit makes the FBLBP feature more discriminative. Moreover, the storage requirement is lower than that of the original MBLBP, since we use at most six neighbor blocks ($T = 6$) to construct a weak classifier and need fewer stage classifiers to achieve a training error rate similar to that of MBLBP. More importantly, FBLBP obtains better detection results.

As future work, a skin saliency model will be considered for fast filtering of non-face windows; we believe this will further speed up the system. In addition, research on how to automatically set the number of blocks for each FBLBP feature will also be interesting.

6. REFERENCES

[1] Ming-Hsuan Yang, David J. Kriegman, and Narendra Ahuja, "Detecting faces in images: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 24, no. 1, 2002.

[2] Cha Zhang and Zhengyou Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, June 2010.

[3] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1996, pp. 203–208.

[4] Ming-Hsuan Yang, Dan Roth, and Narendra Ahuja, "A SNoW-based face detector," in Advances in Neural Information Processing Systems, 2000, pp. 855–861.

[5] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: an application to face detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997, pp. 130–136.

[6] Paul Viola and Michael J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[7] Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.

[8] R. Lienhart and J. Maydt, "An extended set of haar-like features for rapid object detection," in International Conference on Image Processing (ICIP), 2002, vol. 1, pp. 900–903.

[9] Xiaowei Zhao, Xiujuan Chai, Zhiheng Niu, Cherkeng Heng, and Shiguang Shan, "Context modeling for facial landmark detection based on non-adjacent rectangle (NAR) haar-like feature," Image and Vision Computing, vol. 30, no. 3, pp. 136–146, 2012.

[10] Lun Zhang, Rufeng Chu, Shiming Xiang, Shengcai Liao, and Stan Z. Li, "Face detection based on Multi-Block LBP representation," in International Conference on Biometrics (ICB).

[11] Shengye Yan, Shiguang Shan, Xilin Chen, and Wen Gao, "Locally assembled binary (LAB) feature with feature-centric cascade for fast and accurate face detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–7.

[12] Jianguo Li, Tao Wang, and Yimin Zhang, "Face detection using SURF cascade," in IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011, pp. 2183–2190.

[13] Xiangxin Zhu and Deva Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2879–2886.

[14] Zhenhua Chai, Zhenan Sun, H. Mendez-Vazquez, Ran He, and Tieniu Tan, "Gabor ordinal measures for face recognition," IEEE Transactions on Information Forensics and Security (TIFS), vol. 9, no. 1, pp. 14–26, 2014.

[15] Vidit Jain and Erik Learned-Miller, "FDDB: A benchmark for face detection in unconstrained settings," Tech. Rep. UM-CS-2010-009, University of Massachusetts, Amherst, 2010.

[16] Timo Ojala, Matti Pietikainen, and Topi Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 24, no. 7, pp. 971–987, 2002.

[17] T. Mita, T. Kaneko, and O. Hori, "Joint haar-like features for face detection," in IEEE International Conference on Computer Vision (ICCV), 2005, pp. 1619–1626.

[18] Shengcai Liao, Zhen Lei, Stan Z. Li, Xiaotong Yuan, and Ran He, "Structured ordinal features for appearance-based object representation," in 3rd International Conference on Analysis and Modeling of Faces and Gestures (AMFG07), 2007, pp. 183–192.

[19] Xiaoyang Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Transactions on Image Processing (TIP), vol. 19, no. 6, pp. 1635–1650, 2010.

[20] Jerome Friedman, Trevor Hastie, and Robert Tibshirani, "Additive logistic regression: a statistical view of boosting," Annals of Statistics, vol. 28, no. 2, pp. 337–407, 2000.

[21] Stan Z. Li and ZhenQiu Zhang, "Floatboost learning and statistical face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 26, no. 9, pp. 1112–1123, 2004.

[22] V. Jain and E. Learned-Miller, "Online domain adaptation of a pre-trained cascade of classifiers," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 577–584.

[23] Krystian Mikolajczyk, Cordelia Schmid, and Andrew Zisserman, "Human detection based on a probabilistic assembly of robust part detectors," in ECCV 2004, vol. 3021 of Lecture Notes in Computer Science, pp. 69–82, 2004.

[24] Venkatesh Bala Subburaman and Sebastien Marcel, "Fast bounding box estimation based face detection," in ECCV Workshop on Face Detection: Where we are, and what next, Sept. 2010.
