

Learned local Gabor patterns for face representation and recognition

Shufu Xie a,b, Shiguang Shan a,*, Xilin Chen a, Xin Meng c, Wen Gao a,d

a Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100190, China
b Graduate University of CAS, Beijing, 100190, China
c NEC (China) Co., Ltd., Beijing, 100084, China
d Institute for Digital Media, Peking University, Beijing, 100190, China

Article info

Article history:

Received 1 November 2008

Received in revised form 19 February 2009

Accepted 20 February 2009

Keywords:

Gabor wavelet

Local patterns

Histogram

Weighting strategy


Abstract

In this paper, we propose Learned Local Gabor Patterns (LLGP) for face representation and recognition. The proposed method is based on Gabor features and the concept of texton, and defines the feature cliques that appear frequently in Gabor features as the basic patterns. Different from Local Binary Patterns (LBP), whose patterns are predefined, the local patterns in our approach are learned from a patch set constructed by sampling patches from Gabor filtered face images. Thus, the patterns in our approach are face-specific and desirable for face perception tasks. Based on these learned patterns, each facial image is converted into multiple pattern maps, and the block-based histograms of these patterns are concatenated together to form the representation of the face image. In addition, we propose an effective weighting strategy to enhance the performance, which makes use of the discriminative powers of different facial parts as well as different patterns. The proposed approach is evaluated on two face databases: FERET and CAS-PEAL-R1. Extensive experimental results and comparisons with existing methods show the effectiveness of the LLGP representation method and the weighting strategy. In particular, heterogeneous testing results show that the LLGP codebook has very impressive generalizability for unseen data.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Face recognition has attracted significant attention due to its potential value in security applications and research fields. Much progress has been made in the last several decades [34]. However, because facial appearance is easily affected by variations in pose, expression, illumination and other factors, face recognition is still an active and challenging research topic. Therefore, many researchers have devoted their efforts to making face recognition systems more robust to these variations. One of the key issues for successful face recognition systems is the development of effective representations.

In the literature, numerous face representation methods have been proposed (e.g., [1,2,6,8,12,15,19,25,33]). Among them, the appearance based approaches seem to dominate up to now. According to how the elements of the face representation are calculated, these methods can be coarsely categorized into global feature based and local feature based methods. Specifically, for a global feature, each element is related to the whole input face image, while for a local feature, every dimension is extracted from some local region of the face image. It is worth pointing out that there is no clear boundary between them: for a local feature, as the size of the region used for feature extraction increases, its "globality" increases correspondingly.


In the face recognition field, many global feature based methods have been studied widely for face representation (e.g., [2,10,11,18,24,25,29]). Among this category, the most representative work might be the subspace based methods, which commonly learn transform matrices from one training set by optimizing different criteria [2,25]. Usually, most of the elements in their transform matrices are non-zero, which implies that the extracted features are related to the whole face image. As a typical subspace method, Eigenfaces compute the principal vectors from one training set and represent the face image as the coefficients of a small set of characteristic facial images [25]. Different from Eigenfaces, Fisherfaces extract the discriminative information from the holistic face images based on Fisher's Linear Discriminant (FLD) analysis, which maximizes the ratio of the determinant of the between-class scatter matrix to that of the within-class scatter matrix in the low dimensional subspace [2]. Similarly, their 2D extensions, e.g., 2D PCA [29], Binary 2D PCA (B-2DPCA) [18] and 2D LDA [10,31], are also global feature based methods. Another kind of method uses frequency coefficients as the features for face representation, such as the Discrete Fourier Transform (DFT) [7] and the Discrete Cosine Transform (DCT) [4].

While global feature based face representations were popular for face recognition, representations based on local features have received more and more attention (e.g., [1,6,12,14,17,27,30]). Generally, local features are extracted from local facial regions. So, they are overall robust to facial variations due to expression, partial occlusion and illumination, since most variations in appearance affect only part of the face region (i.e., only a few dimensions of the local features). Local feature analysis (LFA) pioneered the study of local representations for face recognition, encoding the local topological structure of face images [19]. The 2D Gabor wavelets, whose kernels provide a useful and reasonably accurate description of most spatial aspects of simple receptive fields, exhibit desirable characteristics of spatial locality and orientation selectivity [5]. Thus, Gabor features have been broadly studied and recognized as among the most successful local features for face representation. Many approaches based on Gabor features have been proposed for face recognition, such as Dynamic Link Architecture (DLA) [6], Elastic Bunch Graph Matching (EBGM) [27], the Gabor-Fisher Classifier (GFC) [12], and AdaBoosted Gabor features [30]. More recently, based on the feed-forward model of the primate visual object recognition pathway proposed by Riesenhuber and Poggio [23], S2 Facial Features (S2FF) have also been proposed for face processing [14]. Using the Region Covariance Matrices (RCM) proposed by Tuzel et al. [26], Pang et al. [17] incorporate Gabor features to compute RCMs for human face recognition, and show that their discrimination ability is significantly greater than that of the conventional RCM.

In recent years, a few face representation methods based on local patterns have received much attention (e.g., [1,32,33]). They typically encode the local patterns in a small neighborhood and utilize the block based histograms of these patterns to represent the face image. Evidently, each element of the block based histogram is computed from a certain facial part, and thus this kind of feature belongs to local features. One of the most representative works might be Local Binary Patterns (LBP), which can well describe the micro-patterns of the input image such as flat areas, spots, lines and edges [1]. Recently, local patterns based on Gabor features have also been proposed for face representation, such as Local Gabor Binary Patterns (LGBP) [33] and Histogram of Gabor Phase Patterns (HGPP) [32]. In contrast to these approaches, whose patterns are predefined, the texton based methods typically learn some cluster centers (i.e., textons) and utilize their frequency for classification. In fact, texton based methods have been widely used for texture classification and object categorization due to their robustness to large intra-class variations (e.g., [9,22,28]). More recently, a few methods based on textons have also been studied for face representation. Considering that high-level facial semantic features consist of low-level micro visual structures, Meng et al. propose Local Visual Primitives (LVP) for face modeling and recognition [15]. By learning the centers from the filter response vectors, the Learned Visual Codebook (LVC) [35] and Local Gabor Textons (LGT) [8] are proposed for 3D and 2D face recognition, respectively. For the purpose of classification, the textons should not only characterize each class of object to allow for the discrimination between them but also generalize well beyond the training samples [9].

In spite of the impressive performance of LBP [1] and LGBP [33], their modes or patterns in the neighboring pixels are designed empirically, i.e., based on the ordinal relationship between the intensity of the center pixel and its neighbors. It is still unclear why this empirical design is effective to encode the discriminative local visual modes. To say the least, neither LBP nor LGBP is specifically designed for face recognition, so what is the most appropriate pattern-encoding method for face perception? Keeping this question in mind, in this paper we propose Learned Local Gabor Patterns (LLGP) for face representation. Unlike LBP or LGBP, in our method the patterns are learned by applying a clustering approach to a set of patches sampled from Gabor filtered face images. Then, the face image is encoded into multiple pattern maps based on the learned patterns. Finally, the block-based histograms of the patterns are concatenated together to describe the input face image. In addition, we propose an effective weighting strategy, which utilizes both the discriminative powers of different facial parts and those of the different patterns. The proposed approach is evaluated on the FERET [20] and CAS-PEAL-R1 [3] databases. Experimental results show the effectiveness of the LLGP representation method and the weighting strategy, especially the good generalizing ability of the LLGP representation under heterogeneous testing conditions.

The rest of this paper is organized as follows. Section 2 illustrates the basic idea of our approach. Section 3 presents how to construct LLGP codebooks. Section 4 describes how to apply LLGP to face recognition. Experiments and analysis are presented in Section 5, and Section 6 gives some conclusions and future work.


2. Basic idea of the proposed approach

In this section, we first describe the motivation and basic idea of the proposed approach, then present the flowchart of LLGP for face representation, and finally discuss its differences from previous similar work.

2.1. Motivation

Recently, a few face representation methods based on modes defined in a neighborhood, the so-called local patterns, have been proposed (e.g., [1,15,32,33,35]). Evidently, for this kind of method, how to define the patterns is essentially important. The success of LBP and its various extensions shows that it is a desirable representation to model the distribution modes in neighboring pixels and encode face images with these modes. However, it is worth pointing out that, in LBP and its extensions, the patterns are defined in the same way for face and non-face images. But faces, as a kind of particular object, can be more accurately aligned using a few facial landmarks (e.g., the two eyes) than most generic objects (e.g., chairs, bikes). So, it is necessary to investigate more appropriate pattern design methods for face perception tasks.

As shown in Fig. 1, from the generative perspective, one face image can be represented as a hierarchical decomposition. Specifically, the whole face can be regarded as a composition of some facial parts such as brows, eyes, nose, and mouth. These facial parts are further represented by a few local patterns, which can be viewed as the basic patterns of those patches with all kinds of variations. In some sense, these local patterns can be regarded as the "middle layer" between pixels and facial parts. This implies that their semantic meanings are stronger than the isolated pixel, but much weaker than those of facial parts.

Fig. 1. The illustration of the face hierarchical structure with multiple levels of decomposition.

As to the definition of local patterns, there has been some research (e.g., [1,15,16]). For example, in the LBP operator defined in a 3×3 neighborhood, images are encoded by thresholding the eight neighbors of each pixel with the center pixel value and considering the result as a binary number [16]. Thus, there are 256 distinct patterns in total. In other words, it has an implicit "codebook" predefined empirically, which is not designed specifically for face images. Therefore, it is necessary to design some novel face-specific local patterns. In our opinion, the feature cliques that appear most frequently in facial images might be a better choice as the local patterns since they have the potential to reconstruct facial parts well. Thus, we utilize a clustering approach to construct the codebook, of which each codeword represents a kind of local pattern that frequently appears in human face images. Fig. 2 illustrates the construction process of our codebook, whose codewords are expected to represent micro-structures that frequently appear in face images. Semantically, these micro-structures might correspond to some sub-components, such as eye corners and nose tips.
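To make the contrast with learned codebooks concrete, the following minimal Python/NumPy sketch (the paper itself contains no code, so this is illustrative only) implements the basic 3×3 LBP operator described above, whose 256 patterns form an implicitly predefined codebook.

```python
import numpy as np

def lbp_3x3(image):
    """Basic 3x3 LBP: threshold the 8 neighbors of each pixel against the center
    value and read the binary results as an 8-bit code (256 possible patterns)."""
    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, 1, mode='edge')
    # Offsets of the 8 neighbors, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(img.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = padded[1 + dy:1 + dy + img.shape[0],
                          1 + dx:1 + dx + img.shape[1]]
        codes |= (neighbor >= img).astype(np.uint8) << bit
    return codes  # each pixel now holds one of the 256 predefined patterns
```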

2.2. Overall flowchart of LLGP for face representation

In principle, the local patterns can be learned from the face images directly. Nevertheless, in this paper, inspired by the success of Gabor features for face recognition, we learn the local patterns in the Gabor feature field instead of the original grey-level intensity field. That is the reason why our method is called LLGP. As shown in Fig. 3, the LLGP representation method can be divided into a learning phase (Fig. 3a) and a representing phase (Fig. 3b), which, respectively, constructs the codebooks and encodes the face image.

Specifically speaking, as shown in Fig. 3(a), in the learning phase, each image in the training set is filtered with different Gabor kernels. Then, based on the whole training set, for each Gabor kernel, one patch set is constructed by sampling patches from the corresponding Gabor filtered images. Finally, by applying a clustering approach to each patch set, the feature cliques (i.e., the cluster centers) constitute one LLGP codebook. So, with C Gabor kernels, C LLGP codebooks are obtained.

Fig. 2. The process of learning the codebook.

Fig. 3. Learned Local Gabor Patterns (LLGP) for face representation: (a) learning phase and (b) representing phase.

In the representing phase, as illustrated in Fig. 3(b), the input face image is first filtered with the Gabor kernels as in the learning phase. Then, each Gabor transformed image is encoded into one pattern map by mapping its patches to the corresponding LLGP codebook pixel by pixel. Thus, with C LLGP codebooks, C LLGP maps are obtained. Finally, these pattern maps are spatially divided into many non-overlapping blocks and the histograms of all the blocks are concatenated together to form one histogram sequence, which is used to describe the input face.

2.3. Relation to previous work

In some sense, this paper is an extension of our previous work named LVP [15]. The difference between these two works lies in the following aspects: First, we utilize Gabor features instead of gray-level intensity to learn the local patterns. Since Gabor features capture salient visual properties such as spatial frequency, spatial localization and orientation selectivity, the patterns defined on Gabor features are more effective than those defined on gray features. Second, we propose a simple but effective weighting strategy. The weighting strategy emphasizes the roles of different facial parts as well as different patterns in face recognition, and thus enhances the performance. Third, much more extensive experiments are conducted and the results are compared and analyzed, which show that LLGP achieves much better performance.

The proposed LLGP is also closely related to the LGBP method [33], which can be regarded as a combination of the LBP operator and Gabor features. However, they are evidently different in the pattern design. In LGBP, the patterns are defined by applying the LBP operator to Gabor features. So, its patterns are predefined, e.g., for the 3×3 LBP operator, 256 patterns are defined in total. In LLGP, by contrast, the patterns are learned from one patch set. From the viewpoint of face representation, our patterns are face-specific and more suitable for face perception tasks than the generic codebooks (e.g., LBP).

More recently, LVC [35] and LGT [8] have been proposed for face recognition, which are also based on Gabor features and the texton strategy. However, our method is different from theirs in the feature domain for codebook learning. In their works, the codebooks are based on the Gabor jet, which is defined as the concatenated Gabor wavelet coefficients with different scales and orientations at a pixel. In their codebook learning process, the facial region is divided into many blocks and a jet-based codebook is learnt for each block. Therefore, in their method, the Gabor filters are considered together and the number of codebooks is equal to the number of blocks. However, in our approach, the codebooks are patch-based and Gabor filter specific, i.e., the Gabor filters are considered separately in the process of codebook learning. Therefore, the number of codebooks is equal to the number of Gabor kernels used. This implies that the learned patterns in LLGP are also specific to different scales and orientations. By concatenating the histograms of all the pattern maps, the LLGP representation reflects the feature distribution of different scales and orientations separately. Thus, the LLGP representation has strong representation power, and our comparison experiments on the FERET database also verify that it performs better than LGT.

3. LLGP: codebook construction

Before we present the process of our codebook construction, we briefly review the Gabor features widely used in the face recognition field [6,12,27]. The Gabor wavelet representation is the convolution of the image with the Gabor kernels, which is defined as follows:

$$G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z) \qquad (1)$$

Here, $I(\cdot)$ denotes the input image, $*$ denotes the convolution operator and $\psi_{\mu,\nu}(\cdot)$ denotes the Gabor kernel with orientation $\mu$ and scale $\nu$, as defined in Eq. (2):

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2}\, e^{-\|k_{\mu,\nu}\|^2 \|z\|^2 / (2\sigma^2)} \left[ e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right] \qquad (2)$$

where $z = (x, y)$, $\|\cdot\|$ denotes the norm operator, and the wave vector $k_{\mu,\nu}$ is defined as follows:

$$k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}} \qquad (3)$$

where $k_{\nu} = k_{\max}/f^{\nu}$ and $\phi_{\mu} = \pi\mu/8$, $k_{\max}$ is the maximum frequency, and $f$ is the spacing between different kernels in the frequency domain.
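As a rough illustration, the sketch below builds one Gabor kernel from Eqs. (2)-(3) and convolves an image with it, keeping only the magnitude (as used later for codebook learning). Python with NumPy/SciPy is assumed; the values of σ, k_max and f are the usual choices from the Gabor face-recognition literature and are not stated in this section, so treat them as assumptions. The 32×32 window matches the setting reported in Section 5.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, size=32, sigma=2 * np.pi, kmax=np.pi / 2, f=np.sqrt(2)):
    """Gabor kernel psi_{mu,nu} of Eq. (2); sigma, kmax and f are assumed values."""
    k = (kmax / f ** nu) * np.exp(1j * np.pi * mu / 8)   # wave vector, Eq. (3)
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]              # z = (x, y)
    z2 = x ** 2 + y ** 2
    k2 = np.abs(k) ** 2                                   # ||k_{mu,nu}||^2
    phase = k.real * x + k.imag * y                       # k . z
    return (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2)) * \
           (np.exp(1j * phase) - np.exp(-sigma ** 2 / 2))

def gabor_magnitude(image, mu, nu):
    """|G_{mu,nu}(z)| of Eq. (1): magnitude of the convolution with one kernel."""
    response = fftconvolve(image.astype(np.float64), gabor_kernel(mu, nu), mode='same')
    return np.abs(response)

# The 12 kernels used in the paper: scales nu in {2,3,4}, orientations mu in {0,2,4,6}.
KERNEL_PARAMS = [(mu, nu) for nu in (2, 3, 4) for mu in (0, 2, 4, 6)]
```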

The overall process of constructing the LLGP codebooks is illustrated in Fig. 3(a). For clarity, we further formulate the codebook learning process for the Gabor kernel with scale $\nu$ and orientation $\mu$ as follows:

(1) For each normalized image in the training face image set, calculate its Gabor wavelet representation $G_{\mu,\nu}(\cdot)$. It is worth pointing out that, in order to reduce unnecessary variance in the patches, all the normalized face images should be aligned by fixing the eye centers.

(2) Sample patches of size m×n pixels from all $G_{\mu,\nu}(\cdot)$ in the training set to form the patch set $S_{\mu,\nu}$. If we have N images in the training set and sample c patches from each Gabor filtered image, there are c×N patches in $S_{\mu,\nu}$.

(3) For patch set $S_{\mu,\nu}$, the K-means clustering approach is utilized to construct the LLGP codebook $P_{\mu,\nu} = \{P_{\mu,\nu,0}, P_{\mu,\nu,1}, \ldots, P_{\mu,\nu,K-1}\}$, where $P_{\mu,\nu,i}\ (i = 0, 1, \ldots, K-1)$ denotes the i-th cluster center and K denotes the size of the codebook.

Generally, Gabor kernels of 5 scales and 8 orientations are used in the face recognition field. In this study, in order to reduce both the computational cost and the final feature dimension of the LLGP representation, only 12 Gabor kernels (i.e., 3 scales $\nu \in \{2,3,4\}$ and 4 orientations $\mu \in \{0,2,4,6\}$) are used. So, 12 codebooks are constructed in total. Fig. 4 illustrates the real and imaginary parts of the Gabor kernels used in this paper. Because the magnitudes vary slowly with position and provide a monotonic measure of the image properties [6], only the magnitude features are used to construct the LLGP codebooks in this paper.

Fig. 4. Gabor kernels with 3 scales and 4 orientations used in our method: (a) the real parts; (b) the imaginary parts.

Several other points should be noted about our implementation. First, one of the key issues for the K-means clustering approach [13] is the selection of initial cluster seeds. Nevertheless, our experience shows that the initial seed selection does not affect the recognition accuracy significantly. So, the first K patches sampled from the first image are selected as the initial cluster seeds. Second, in order to alleviate the variations of lighting, all the sampled patches are normalized to zero mean and unit variance. Third, the patch set is constructed by uniformly sampling the patches from each Gabor filtered image. For example, if the sampling grid is set to 8×8 pixels for one image with 80×88 pixels, we can get about 110 patches in total. Under this condition, for one image set with 1000 images, about 110,000 patches can be obtained in total. Hence, a patch set of large size covers patches with more possible variations, and the learned patterns can model the distribution of the patches accurately.
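The following sketch (Python/NumPy assumed; the function names are illustrative, not from the paper) mirrors this procedure for one Gabor kernel: patches are sampled on an 8×8 grid, normalized to zero mean and unit variance, and clustered with a plain Lloyd-style K-means initialized, as described above, with the first K patches.

```python
import numpy as np

def sample_patches(gabor_mag, patch=(5, 5), grid=8):
    """Uniformly sample m x n patches on a grid x grid lattice from one Gabor
    magnitude image; each patch is normalized to zero mean and unit variance."""
    m, n = patch
    patches = []
    for y in range(0, gabor_mag.shape[0] - m + 1, grid):
        for x in range(0, gabor_mag.shape[1] - n + 1, grid):
            p = gabor_mag[y:y + m, x:x + n].astype(np.float64).ravel()
            p -= p.mean()
            s = p.std()
            patches.append(p / s if s > 0 else p)
    return np.array(patches)

def learn_codebook(patch_set, K=70, iters=20):
    """Lloyd-style K-means; the paper fixes the first K sampled patches as the
    initial seeds, so no random initialization is used here."""
    centers = patch_set[:K].copy()
    for _ in range(iters):
        # squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
        d = ((patch_set ** 2).sum(1)[:, None]
             + (centers ** 2).sum(1)[None, :]
             - 2.0 * patch_set @ centers.T)
        labels = d.argmin(axis=1)
        for k in range(K):
            members = patch_set[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers, labels   # codebook P_{mu,nu} and the final patch assignments
```

The returned cluster assignments are also what one would use later to estimate how frequently each pattern occurs (needed for the weighting strategy of Section 4.3).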

One possible improvement to our codebook construction method is to utilize the information of different facial parts, i.e., for each Gabor filtered image, we divide it into many blocks and each block is used to learn the corresponding LLGP codebooks. However, this method results in a great number of codebooks. For example, if each Gabor transformed image is divided into M blocks, there are 12×M LLGP codebooks in total in this paper. In practice, in order to get a suitable division of the facial region, M is often large (e.g., M > 10). This leads to several hundred codebooks being constructed, which is very time-consuming. In our opinion, for the feature set of each Gabor kernel, the sampled patches can be grouped into the same cluster if they have similar visual appearances. Thus, our codebook construction method is a tradeoff between the computation cost and an elaborate face representation.

4. LLGP: face representation and recognition

This section presents how to encode a face image for face recognition based on the learned local pattern codebooks. First, how to encode an input face image into multiple pattern maps is described in Section 4.1. Then, the direct LLGP histogram sequence matching for face recognition is detailed in Section 4.2. Finally, we present the weighted LLGP histogram sequence matching method for face recognition in Section 4.3.

4.1. Image encoding with LLGP

With the LLGP codebooks, face images are then encoded into multiple pattern maps of various scales and orientations. As illustrated in Fig. 3(b), the process of image encoding with LLGP is formulated in detail as follows:

(1) For an input normalized face image, compute its Gabor wavelet representation $G_{\mu,\nu}(\cdot)$.

(2) For each position z in $G_{\mu,\nu}(\cdot)$, sample the patch $p_{\mu,\nu}(z)$ centered at position z, with the same size as the patches sampled when learning the codebooks.

(3) For patch $p_{\mu,\nu}(z)$, calculate its distance from all the codewords $P_{\mu,\nu,i}\ (i = 0, 1, \ldots, K-1)$ in the corresponding LLGP codebook and label the position z with the index of the pattern with the minimal distance. Formally, $L(z) = \arg\min_{0 \le i \le K-1} D(p_{\mu,\nu}(z), P_{\mu,\nu,i})$, where D is the distance measure.

By performing the above operation pixel by pixel, $G_{\mu,\nu}(\cdot)$ is converted into one pattern map $L_{\mu,\nu}(\cdot)$, whose values belong to the interval $[0, K-1]$.

There are several points to be noted during the process of image encoding with LLGP. First, in order to alleviate the variations of illumination, the sampled patches should also be normalized to zero mean and unit variance. Second, the Euclidean distance is used to calculate the distance between the sampled patch and the codewords. Third, when position z is near the boundary of the image, the parts of the sampled patch that do not overlap with the image are filled with zeros.
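A direct (and deliberately unoptimized) NumPy sketch of this per-pixel labeling, under the same assumptions as the earlier snippets (5×5 patches, zero padding at the border, Euclidean distance), might look as follows.

```python
import numpy as np

def encode_pattern_map(gabor_mag, codebook, patch=(5, 5)):
    """Convert one Gabor magnitude image into an LLGP pattern map: every pixel is
    labeled with the index of the nearest codeword (step (3) above)."""
    m, n = patch
    top, left = m // 2, n // 2
    # zero padding so that border pixels still get a full m x n patch
    padded = np.pad(gabor_mag.astype(np.float64),
                    ((top, m - 1 - top), (left, n - 1 - left)), mode='constant')
    H, W = gabor_mag.shape
    labels = np.zeros((H, W), dtype=np.int32)
    cb_sq = (codebook ** 2).sum(axis=1)          # ||P_{mu,nu,k}||^2, reused per pixel
    for y in range(H):
        for x in range(W):
            p = padded[y:y + m, x:x + n].ravel().copy()
            p -= p.mean()
            s = p.std()
            if s > 0:
                p /= s
            # argmin_k ||p - P_k||^2 == argmin_k (||P_k||^2 - 2 p . P_k)
            labels[y, x] = int((cb_sq - 2.0 * codebook @ p).argmin())
    return labels                                # values lie in [0, K-1]
```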

4.2. Block-based LLGP histogram sequence for face representation

The LLGP pattern maps consist of the indices of the cluster centers, so the traditional distance measures (e.g., Euclidean distance) are not suitable to measure the distance between them. For example, the indices "1" and "60" have a large Euclidean distance, but their perceptual similarity might be larger than that between "1" and "2". The histogram, as a tool to extract statistical characteristics, is utilized to represent the pattern distribution in the pattern maps. However, using one single histogram for the whole image leads to a serious loss of spatial information. Therefore, in this paper, a block-based histogram method is exploited, i.e., the pattern maps are divided into multiple non-overlapping blocks and one histogram is computed from each block. Then, all the block based histograms are concatenated together to form a description of pattern map $L_{\mu,\nu}(\cdot)$, which is formally defined as follows:

$$H_{\mu,\nu}(\cdot) = [H_{\mu,\nu,0}, H_{\mu,\nu,1}, \ldots, H_{\mu,\nu,M-1}] \qquad (4)$$

Here, M denotes the number of non-overlapping blocks, and $H_{\mu,\nu,i}\ (i = 0, 1, \ldots, M-1)$ denotes the histogram of the i-th block of pattern map $L_{\mu,\nu}(\cdot)$, defined as follows:

$$H_{\mu,\nu,i}(j) = \sum_{z} B\{L_{\mu,\nu,i}(z) = j\}, \quad j = 0, 1, \ldots, K-1 \qquad (5)$$

where j denotes the index of the pattern, K denotes the size of the LLGP codebook, and $B\{\cdot\}$ is defined as follows:

$$B\{A\} = \begin{cases} 1, & \text{if } A \text{ is true} \\ 0, & \text{else} \end{cases} \qquad (6)$$

Finally, one image I is represented as one histogram sequence, which concatenates all the block based histograms of all the LLGP pattern maps with various Gabor scales and orientations:

$$H(I) = [H_{\mu_0,\nu_0}(I), H_{\mu_0,\nu_1}(I), \ldots, H_{\mu_{o-1},\nu_{s-1}}(I)] \qquad (7)$$

where $\mu_i\ (i = 0, 1, \ldots, o-1)$ and $\nu_j\ (j = 0, 1, \ldots, s-1)$ denote the orientation and scale of the Gabor filters, respectively. Note that the final histogram sequence consists of 12×M histograms since 12 Gabor filters are used in this paper.
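Under the settings used later in the paper (80×88 images, 8×11 blocks, K = 70), the histogram sequence of Eqs. (4)-(7) can be sketched as follows (Python/NumPy assumed, names illustrative).

```python
import numpy as np

def block_histograms(pattern_map, block=(8, 11), K=70):
    """Eqs. (4)-(6): one K-bin histogram per non-overlapping block, concatenated."""
    bh, bw = block
    H, W = pattern_map.shape
    hists = []
    for y in range(0, H - bh + 1, bh):
        for x in range(0, W - bw + 1, bw):
            blk = pattern_map[y:y + bh, x:x + bw]
            hists.append(np.bincount(blk.ravel(), minlength=K))
    return np.concatenate(hists)

def llgp_histogram_sequence(pattern_maps, block=(8, 11), K=70):
    """Eq. (7): concatenate the block histograms of all 12 pattern maps."""
    return np.concatenate([block_histograms(pm, block, K) for pm in pattern_maps])
```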

4.3. Histogram sequence matching for face recognition

After the block-based LLGP histogram sequence is extracted for face representation, in the recognition stage the similarity between two images $I^1$ and $I^2$ can be computed as the similarity between their LLGP histogram sequences by the following direct histogram sequence matching method:

$$S(I^1, I^2) = \sum_{\mu,\nu} \sum_{i} \mathrm{Sim}(H^1_{\mu,\nu,i}, H^2_{\mu,\nu,i}) \qquad (8)$$

Note that histogram intersection, i.e., $\mathrm{Sim}(H^1, H^2) = \sum_{j} \min(H^1(j), H^2(j))$, is used to measure the similarity, and the Nearest Neighbor rule is adopted for the final classification.
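A minimal sketch of this matching rule (histogram intersection plus a nearest-neighbor decision) is given below; the gallery data structure is illustrative only.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Sim(H1, H2) = sum_j min(H1(j), H2(j)); applied to the full concatenated
    sequences, the sum over blocks and pattern maps realizes Eq. (8)."""
    return float(np.minimum(h1, h2).sum())

def identify(probe_hist, gallery_hists, gallery_ids):
    """Nearest Neighbor rule: return the identity of the most similar gallery sample."""
    sims = [histogram_intersection(probe_hist, g) for g in gallery_hists]
    return gallery_ids[int(np.argmax(sims))]
```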

One problem of the above direct LLGP histogram sequence matching is that all the facial parts and patterns are treated equally for face recognition. Previous studies have shown that different facial parts (e.g., eyes, nose) are of different importance for face recognition [1,33]. Therefore, it is desirable to assign different weights to different facial parts according to their discriminative powers.


On the other hand, the importance of different patterns also differs for face recognition. Recently, Pierrard and Vetter showed that it is possible to determine a person's identity based on only a few well-chosen pixels such as scars and birthmarks [21]. So, rare features (e.g., a naevus) are more distinctive than more frequent features (e.g., the plain regions of the skin) for recognition. Thus, we propose an effective weighting strategy that utilizes the discriminative powers of both different facial parts and various local patterns.

Formally, the similarity between two images $I^1$ and $I^2$ is revised as follows:

$$S(I^1, I^2) = \sum_{\mu,\nu} \sum_{i} w_{\mu,\nu,i} \cdot \mathrm{Sim}(H^1_{\mu,\nu,i}, H^2_{\mu,\nu,i}) \qquad (9)$$

where $w_{\mu,\nu,i}$ and $H_{\mu,\nu,i}$, respectively, denote the weight and the weighted histogram of the i-th block of pattern map $L_{\mu,\nu}$. Compared with Eq. (5), here $H_{\mu,\nu,i}$ is redefined as follows:

$$H_{\mu,\nu,i}(j) = \omega_{\mu,\nu,j} \cdot \sum_{z} B\{L_{\mu,\nu,i}(z) = j\}, \quad j = 0, 1, \ldots, K-1 \qquad (10)$$

where $\omega_{\mu,\nu,j}$ denotes the weight for pattern $P_{\mu,\nu,j}$ and is defined as follows:

$$\omega_{\mu,\nu,j} = \frac{1/\mathrm{Percent}(P_{\mu,\nu,j})}{\sum_{j} 1/\mathrm{Percent}(P_{\mu,\nu,j})} \qquad (11)$$

Here, $\mathrm{Percent}(P_{\mu,\nu,j})$ denotes the percentage of the patches quantized into local pattern $P_{\mu,\nu,j}$ in the learning phase and $\sum(\cdot)$ denotes the sum operation. According to Eq. (11), the weights of different patterns are normalized to the interval [0, 1]. Because the LLGP patterns are learned by clustering, we believe that the rareness of a pattern can be reflected by the size of the corresponding cluster. Specifically, if the size of one cluster is large, it might reflect the more frequent features such as the plain regions of the skin; if the size of the cluster is small, it has the potential to represent rare features such as a naevus. Thus, by assigning the weights $\omega_{\mu,\nu,j}$, higher priorities are given to those rare patterns, which can enhance the discriminative power of our representation method.
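In code, the pattern weights of Eq. (11) can be obtained directly from the cluster assignments produced during codebook learning; the sketch below (NumPy, illustrative names) also shows the weighted block histogram of Eq. (10).

```python
import numpy as np

def pattern_weights(training_labels, K=70):
    """Eq. (11): the weight of pattern P_{mu,nu,j} is the normalized inverse of the
    fraction of training patches quantized into it, so rare patterns weigh more."""
    counts = np.bincount(training_labels, minlength=K).astype(np.float64)
    percent = counts / counts.sum()
    inv = np.zeros(K)
    nonzero = percent > 0
    inv[nonzero] = 1.0 / percent[nonzero]        # guard against empty clusters
    return inv / inv.sum()

def weighted_block_histogram(block_labels, omega, K=70):
    """Eq. (10): each bin of the block histogram is scaled by its pattern weight."""
    return omega * np.bincount(block_labels.ravel(), minlength=K)
```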

As to the weights for different facial parts (i.e., blocks in our method), the method presented in [33] is exploited. In detail, given a training face dataset, the similarities of different samples from the same person compose the within-class variations and those from different persons compose the between-class variations. Then, the mean and variance of the within-class and between-class similarities are calculated, respectively. Finally, the weights are calculated based on the FLD criterion as follows [33]:

$$w_{\mu,\nu,i} = \frac{\left(m_{W(\mu,\nu,i)} - m_{B(\mu,\nu,i)}\right)^2}{S^2_{W(\mu,\nu,i)} + S^2_{B(\mu,\nu,i)}} \qquad (12)$$

Here, $m_{W(\mu,\nu,i)}$ and $m_{B(\mu,\nu,i)}$ denote the within-class and between-class means of the i-th block with scale $\nu$ and orientation $\mu$, respectively; $S^2_{W(\mu,\nu,i)}$ and $S^2_{B(\mu,\nu,i)}$, respectively, denote the variance of the within-class similarities and that of the between-class similarities of the i-th block with scale $\nu$ and orientation $\mu$. Of course, these weights are also normalized to the interval [0, 1] before they are used to weight the different blocks.
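The block weights of Eq. (12) can be computed from within-class and between-class similarity samples collected on a training set; a small sketch follows. The paper does not detail how the weights are scaled to [0, 1], so the min-max scaling used here is an assumption.

```python
import numpy as np

def fld_block_weight(within_sims, between_sims):
    """Eq. (12): Fisher-style weight of one block from the within-class and
    between-class block similarities measured on a training set."""
    m_w, m_b = np.mean(within_sims), np.mean(between_sims)
    s2_w, s2_b = np.var(within_sims), np.var(between_sims)
    return (m_w - m_b) ** 2 / (s2_w + s2_b)

def block_weights(within_per_block, between_per_block):
    """One weight per block; min-max scaled to [0, 1] (assumed normalization)."""
    w = np.array([fld_block_weight(wi, bi)
                  for wi, bi in zip(within_per_block, between_per_block)])
    span = w.max() - w.min()
    return (w - w.min()) / span if span > 0 else np.ones_like(w)
```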

5. Experiments

In order to evaluate the proposed approach, we carry out experiments on two large face databases: FERET [20] and CAS-PEAL-R1 [3]. First, we evaluate our approach with different parameters using two probe sets of FERET. Then, we compare the results of our approach with some previous approaches on the FERET and CAS-PEAL-R1 databases. Finally, we test the LLGP representation method when the training set is heterogeneous with respect to the testing set.

In all our experiments, the face images are normalized to the size of 80×88 pixels according to the eye positions provided along with the databases. No photometric normalization is conducted and the size of the Gabor filter window is fixed to 32×32 pixels.

5.1. Evaluation on parameter setting

The FERET was a general evaluation designed to measure the performance of laboratory algorithms [20]. To obtain a robust assessment of performance, algorithms were evaluated against different categories of images including variations such as lighting changes, people wearing glasses, and the time between the acquisition dates of the database images. The FERET database consists of one standard gallery (1196 images of 1196 subjects) and four probe sets: Fb (1195 images of 1195 subjects), Fc (194 images of 194 subjects), Duplicate I (722 images of 243 subjects, abbreviated as DupI), and Duplicate II (234 images of 75 subjects, abbreviated as DupII). Here, we use the standard Gallery and two large probe sets, Fb and DupI, to evaluate our approach with different parameters. Note that, in our experiments, 1002 frontal images of 429 subjects selected from the FERET training CD are used as the training set to learn the codebooks, and the sampling step is 8×8 pixels.

According to Sections 3 and 4, there are mainly three free parameters in LLGP: the size of the patterns (i.e., the patch size, m×n pixels), the size (K) of each LLGP codebook, and the block size for the LLGP histogram. Two experiments are designed on the FERET Fb and DupI probe sets to show, respectively, how the rank-1 recognition rates change with K and several m×n when the block size is fixed (8×11 pixels), and how the recognition accuracies change with various block sizes and some combinations of K and m×n. The results are shown in Fig. 5.

From Fig. 5(a) and (b), we find that, by and large, the performance improves as K increases. However, when K varies from 50 to 100, the performance changes only slightly, especially on the Fb set. As to the size of the local pattern, 5×5 pixels achieves the best result on the Fb set while 3×5 pixels and 5×7 pixels perform best on the DupI set. Nevertheless, by and large, these two parameters do not affect the final results significantly, i.e., we have a large range in which to set them appropriately. From Fig. 5(c) and (d), one can find that, basically, the recognition accuracies decrease as the block size increases. The possible reason is that a smaller block size preserves more spatial information of the features, which is important for face recognition.

Fig. 5. Performance of the LLGP approach on the FERET Fb and DupI probe sets with different parameters. Note that the block size for the LLGP histogram in (a) and (b) is 8×11 pixels.

Fig. 6. The influence of training sets of different sizes on the performance of LLGP.

In order to see the effect of the size of the training set on the performance of LLGP, as shown in Fig. 6, we conduct a series of experiments in which the size of the image set varies from 100 to 1000. Note that, in this experiment, the block size for the LLGP histogram is 8×11 pixels and the sampling grid is also set to 8×8 pixels. One can see that, when the parameters (i.e., m, n, K and the block size) of our approach are the same, the difference between the highest and the lowest recognition rates is lower than 1 percent despite a slight fluctuation in the results, i.e., the method is quite stable even with a small training set.

Considering the different races of human beings on the earth, one might ask why a codebook constructed on a limited image set can represent all faces. In our opinion, there are two main reasons. First, faces can be more accurately aligned (e.g., using the two eyes) than most objects (e.g., chairs, bikes). Thus, all the aligned face images can be further decomposed into one hierarchical structure (Fig. 1). As mentioned in our codebook construction process, the alignment operation alleviates the variations (e.g., rotation) of the same patch and this makes it possible to represent face images using some basic local patterns. Second, the local patterns have weak representation power because the size of the sampled patches is small (e.g., 3×3 pixels). So, even though the patch set is sampled from a training set of limited size, it is possible to learn those basic local patterns.

Table 1
Comparison with some previous approaches on the standard FERET probe sets.

Methods               Fb     Fc     DupI   DupII
G PCA+LDA             0.96   0.84   0.70   0.57
LBP [1]*              0.97   0.79   0.66   0.64
LGBP [33]*            0.94   0.97   0.68   0.53
Weighted LGBP [33]*   0.98   0.97   0.74   0.71
LVP [15]              0.97   0.70   0.66   0.50
Weighted LVP_FR       0.99   0.80   0.70   0.60
HGPP [32]*            0.976  0.989  0.777  0.761
Weighted HGPP [32]*   0.975  0.995  0.795  0.778
LGT [8]*              0.97   0.90   0.71   0.67
LLGP                  0.97   0.97   0.75   0.71
Weighted LLGP_F       0.99   0.99   0.78   0.77
Weighted LLGP_R       0.97   0.98   0.78   0.74
Weighted LLGP_FR      0.99   0.99   0.80   0.78

Note: * denotes that the results are taken directly from the corresponding papers.

Because every position in the Gabor wavelet representation has to be labeled, the size of the patterns affects the computation cost greatly. Furthermore, a codebook of large size leads to a slower clustering procedure, and a smaller block size results in a higher dimensional feature. Thus, considering the tradeoff between the computation cost and the accuracy, we set the parameters of the LLGP approach in the following experiments as: m×n = 5×5, K = 70, and the block size is set to 8×11 pixels.

5.2. Results on FERET

In this subsection, we compare our approach with some previous approaches on the four probe sets of FERET. Note that the same training set as in Section 5.1 is also used to learn the codebooks and all the weights of LLGP. For clarity, hereinafter, "Weighted LLGP_F" denotes the weighted LLGP approach in which only different facial parts are weighted based on the FLD criterion, "Weighted LLGP_R" denotes the weighted LLGP approach in which only different patterns are weighted based on their rareness, and "Weighted LVP_FR" and "Weighted LLGP_FR" mean that both are weighted. The method "G PCA+LDA" denotes the GFC method in [12], i.e., PCA+LDA on down-sampled augmented Gabor features. The results of our method are shown and compared with those of previous methods in Table 1.

From Table 1, we can make several major observations. First, LLGP performs much better than LVP except on the Fb set. This shows that the local patterns defined on Gabor features are more effective for face recognition than those defined on gray-level intensity. Second, LLGP outperforms LGBP and is comparable to HGPP, which shows that the proposed image representation method is effective for face recognition. Third, LLGP achieves better results than LGT. This implies that learning patch-based codebooks by treating different scales and orientations separately, as in LLGP, has stronger representation power than jet-based codebooks. Finally, the weighting strategy based on different facial parts and patterns is quite effective at enhancing the performance. One sees that the results of "Weighted LLGP_FR" on the DupI and DupII sets are about 5 and 7 percentage points higher than those of LLGP, respectively, and "Weighted LVP_FR" also performs much better than LVP.

5.3. Results on CAS-PEAL-R1

In order to further verify the robustness of the proposed LLGP to variations in expressions, accessories, and lighting, more experiments are conducted on the CAS-PEAL-R1 database. The CAS-PEAL-R1 database contains 30,863 images of 1040 subjects, and displays different sources of variation, especially Pose, Expression, Accessories, and Lighting (PEAL) [3]. In this paper, we perform comparison experiments on the three largest probe sets: the Expression (1570 images of 377 subjects), Accessory (2285 images of 438 subjects) and Lighting (2243 images of 233 subjects) subsets. Note that the standard training set (1200 images of 300 subjects) of CAS-PEAL-R1 is used to construct the LLGP codebooks and learn all the weights needed. Fig. 7 shows some example images of these subsets, from which one can see the large intra-class variations, especially on the Lighting set.

Fig. 7. Some example images in the CAS-PEAL-R1 database. Note that the images in the same row are from the same person. (a) Expression; (b) Accessory and (c) Lighting.

Table 2
Comparisons with some previous approaches on the CAS-PEAL-R1 probe sets.

Methods               Expression  Accessory  Lighting
G PCA+LDA [3]*        0.91        0.83       0.45
LGBP [3]*             0.95        0.87       0.51
LVP [15]              0.96        0.86       0.29
Weighted LVP_FR       0.96        0.86       0.33
HGPP [32]*            0.964       0.919      0.617
Weighted HGPP [32]*   0.968       0.925      0.629
LLGP                  0.96        0.90       0.52
Weighted LLGP_F       0.96        0.91       0.50
Weighted LLGP_R       0.97        0.91       0.57
Weighted LLGP_FR      0.98        0.92       0.55

Note: * denotes that the results are taken directly from the corresponding papers.

Table 2 tabulates the comparisons between our approach and some other approaches on the three probe sets of CAS-PEAL-R1. From Table 2, one can reach conclusions similar to those in Section 5.2. One exception is that "Weighted LLGP_F" performs slightly worse than LLGP on the Lighting set. The reason might be that the variations in the Lighting probe set are different from those of the training set, while FLD-based weight learning is prone to over-fitting. However, the weighting strategy based on the rareness of patterns is not affected much because it just reflects the frequency of different patterns. Thus, "Weighted LLGP_R" achieves better results on the Lighting set. Another exception is that "Weighted LVP_FR" performs basically the same as LVP on the Expression and Accessory sets. The reason might be that the limited representation power of LVP means that the learned weights do not reflect the discriminative powers of different blocks and patterns on these two sets. We also find that, on the difficult Lighting set, the best result of our approach is a little lower than that of HGPP. This might be because the learned codebooks consist of general codewords, which do not represent images captured under difficult conditions (e.g., lighting) effectively.

5.4. Generalizability of the learned codebooks

One possible concern about LLGP is the generalizing ability of the learned codebooks. For instance, one may wonder what happens if the images in the training set used for codebook learning are quite different from those of the testing set in terms of population and/or imaging conditions, which we call a heterogeneous training set. To investigate its influence on the performance of LLGP, experiments are conducted by exchanging the training sets of FERET and CAS-PEAL-R1. Specifically, when we test on the FERET probe sets, the LLGP codebooks are constructed from the CAS-PEAL-R1 training set; when we test on the CAS-PEAL-R1 probe sets, the FERET training set is used to construct the LLGP codebooks. Correspondingly, the homogeneous training set denotes the testing in which the "right" training set is used to learn the codebooks. In our experiments, four probe sets, i.e., "FERET: Fb", "FERET: DupI", "CAS-PEAL-R1: Accessory" and "CAS-PEAL-R1: Lighting", are used, and the comparison results are shown in Fig. 8.


From Fig. 8, it is clear that the results of the heterogeneous training test are quite comparable to those of the homogeneous one. In other words, the results of the LLGP approach are not influenced very much by the heterogeneous training sets. Especially on the CAS-PEAL-R1 Lighting set, the results of the heterogeneous test are even a little better than those of the homogeneous test. This is because the learned patterns have weak semantic meanings due to the small size of the patches (e.g., 5×5 pixels). Thus, although the FERET and CAS-PEAL-R1 databases consist of subjects of different populations (e.g., races), the LLGP codebooks constructed on the different training sets might not vary a lot due to this weak semantic meaning. This also shows that the LLGP representation has good generalizing power with different training sets.

Fig. 8. The influence of heterogeneous training sets on the performance of the LLGP approach.

6. Conclusion and future work

In this paper, we propose Learned Local Gabor Patterns (LLGP) to encode the local patterns in Gabor filtered face images. Unlike LBP/LGBP, whose patterns are manually defined, the patterns in our approach are learned from a training set, which implies that they are face-specific and more desirable for face perception tasks. LLGP is evaluated and compared with some existing local pattern based methods, especially LBP and LGBP, on the FERET and CAS-PEAL-R1 databases. Extensive comparison results illustrate the effectiveness and robustness of LLGP as a face representation method. Especially, our heterogeneous testing results also show that the learned codebook can generalize well to unseen face data.

From the results in this study, we can conclude that, under the local pattern framework inspired by LBP, there is still research space to discover more appropriate local pattern encoding methods, at least for face-specific perception tasks, and learning local patterns by patch clustering is one of the possible routes. At the same time, the practice in this paper again proves the significance of Gabor features for face recognition, and their desirable combination with local pattern encoding methods.

One problem of our approach might be that its computation cost is a little high, because each position in the Gabor filtered image has to be mapped to the LLGP codebooks. Thus, one of our future efforts is to reduce the computational cost by utilizing some feature selection strategies, e.g., selecting only the discriminative facial parts to match two images. Another problem of our approach might be its relatively high dimensional histogram feature. This can be alleviated by applying some discriminant analysis approaches (e.g., LDA) to the histogram features.

From the generative viewpoint, this paper makes a preliminary attempt to learn and encode the local patterns in the Gabor domain. Evidently, if more constraints such as discrimination, saliency, or sparseness are imposed in the process of codebook construction, the learned patterns can be more suitable for classification tasks. This will be our main future effort.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China under contract nos. 60772071, 60728203, and 60833013; the Hi-Tech Research and Development Program of China under contract no. 2007AA01Z163; the 100 Talents Program of CAS; and the National Basic Research Program of China (973 Program) under contract 2009CB320902. Portions of the research in this paper use the FERET database of facial images collected under the FERET program, sponsored by the DOD Counterdrug Technology Development Program Office.


References

[1] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 2037–2041.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[3] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, D. Zhao, The CAS-PEAL large-scale Chinese face database and baseline evaluations, IEEE Transactions on Systems, Man and Cybernetics, Part A 38 (1) (2008) 149–161.
[4] Z. Hafed, M. Levine, Face recognition using the discrete cosine transform, International Journal of Computer Vision 43 (3) (2001) 167–188.
[5] J. Jones, L. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, Journal of Neurophysiology 58 (6) (1987) 1233–1258.
[6] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C.v.d. Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Transactions on Computers 42 (3) (1993) 300–311.
[7] J. Lai, P. Yuen, G. Feng, Face recognition using holistic Fourier invariant features, Pattern Recognition 34 (1) (2001) 95–109.
[8] Z. Lei, S. Li, R. Chu, X. Zhu, Face recognition with local Gabor textons, International Conference on Biometrics (2007) 49–57.
[9] T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision 43 (1) (2001) 29–44.
[10] M. Li, B.Z. Yuan, 2D-LDA: a statistical linear discriminant analysis for image matrix, Pattern Recognition Letters 1 (2004) 527–532.
[11] X. Li, S. Lin, S. Yan, D. Xu, Discriminant locally linear embedding with high-order tensor data, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (2) (2008) 342–352.
[12] C. Liu, H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing 11 (4) (2002) 467–476.
[13] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.
[14] E. Meyers, L. Wolf, Using biologically inspired features for face processing, International Journal of Computer Vision (2008) 93–104.
[15] X. Meng, S. Shan, X. Chen, W. Gao, Local visual primitives (LVP) for face modeling and recognition, International Conference on Pattern Recognition (2006) 536–539.
[16] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[17] Y. Pang, Y. Yuan, X. Li, Gabor-based region covariance matrices for face recognition, IEEE Transactions on Circuits and Systems for Video Technology 18 (7) (2008) 989–993.
[18] Y. Pang, D. Tao, Y. Yuan, X. Li, Binary two-dimensional PCA, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (4) (2008) 1176–1180.
[19] P. Penev, J. Atick, Local feature analysis: a general statistical theory for object representation, Network: Computation in Neural Systems 7 (3) (1996) 477–500.
[20] P. Phillips, H. Moon, S. Rizvi, P. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1090–1104.
[21] J. Pierrard, T. Vetter, Skin detail analysis for face recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007.
[22] P. Quelhas, F. Monay, J.M. Odobez, D. Gatica-Perez, T. Tuytelaars, A thousand words in a scene, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (9) (2007) 1575–1589.
[23] M. Riesenhuber, T. Poggio, Hierarchical models of object recognition in cortex, Nature Neuroscience 2 (11) (1999) 1019–1025.
[24] D. Tao, X. Li, X. Wu, S.J. Maybank, Geometric mean for subspace selection, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 260–274.
[25] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.
[26] O. Tuzel, F. Porikli, P. Meer, Region covariance: a fast descriptor for detection and classification, in: Lecture Notes in Computer Science, vol. 3952, ECCV 2006, Part II, pp. 589–600.
[27] L. Wiskott, J. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[28] M. Varma, A. Zisserman, A statistical approach to texture classification from single images, International Journal of Computer Vision 62 (1–2) (2005) 61–81.
[29] J. Yang, D. Zhang, A. Frangi, J. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (1) (2004) 131–137.
[30] P. Yang, S. Shan, W. Gao, S. Li, D. Zhang, Face recognition using Ada-boosted Gabor features, in: Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Korea, May 2004, pp. 356–361.
[31] J. Ye, R. Janardan, Q. Li, Two-dimensional linear discriminant analysis, Advances in Neural Information Processing Systems, 2004.
[32] B. Zhang, S. Shan, X. Chen, W. Gao, Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition, IEEE Transactions on Image Processing 16 (1) (2007) 57–68.
[33] W. Zhang, S. Shan, W. Gao, X. Chen, H. Zhang, Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, International Conference on Computer Vision (2005) 786–791.
[34] W. Zhao, R. Chellappa, P. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Computing Surveys 35 (2003) 399–458.
[35] C. Zhong, Z. Sun, T. Tan, Robust 3D face recognition using learned visual codebook, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007.
