

Pattern Recognition 46 (2013) 434–448

Object recognition using Gabor co-occurrence similarity

Jian Zou a,b,*, Chuan-Cai Liu b, Yue Zhang a,b, Gui-Fu Lu a,b

a School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
b School of Mathematics and Physics, Anhui Polytechnic University, Wuhu 241000, China

Article info

Article history:

Received 14 September 2011

Received in revised form 18 May 2012

Accepted 26 June 2012

Available online 4 July 2012

Keywords:

Object recognition

Gabor magnitude

Co-occurrence matrix

Multinomial manifold

* Corresponding author at: School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. Tel.: +86 18605695628; fax: +86 84315751. E-mail address: [email protected] (J. Zou).

http://dx.doi.org/10.1016/j.patcog.2012.06.018

Abstract

We present an object recognition approach using co-occurrence similarities of Gabor magnitude textures. A novel image descriptor, multichannel Gabor magnitude co-occurrence matrices (MGMCMs), is designed to characterize Gabor textures for object representation and similarity matching. The descriptor generalizes multichannel color co-occurrence matrices (MCMs) by using the robust and discriminative magnitude textures of filtered images. Our approach starts from a Gabor wavelet transformation of each object image. An exploratory learning algorithm is proposed for learning channel-adaptive magnitude truncation and level parameters. This allows us to design a magnitude quantization that reduces the overall bias and peakedness of the resulting feature distributions in each channel, so as to avoid over-sparse co-occurrence distributions on average. Direction-based grouping is adopted to reduce the computational complexity of MGMCM extraction under a specific neighborhood mode on the grouped rescaled magnitude images of each object image. Treating each MGMCM as a probability distribution lying on a multinomial manifold, we represent each object image as a point on a product multinomial manifold. Using multinomial geometry and a metric extension technique, we construct the p-order Minkowski co-occurrence information distance for similarity matching between albums of Gabor magnitude textures. The feasibility and effectiveness of the approach are validated by experimental results on the Yale and FERET face databases, the PolyU palmprint database, the COIL-20 object database and the Zurich buildings database.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Object recognition is the study and classification of an unknown object into one of a set of predefined classes. Oftentimes, it is assumed that the object being observed has been detected, or that there is a single object in the image. Object recognition is currently one of the most actively researched areas of computer vision, image processing and analysis. The main challenge in object recognition arises from varying factors such as shape, scaling, rotation, distortion, illumination and pose, and a successful recognition system should be robust to such changes. According to the way image data are represented, objects can be described by different cues, giving rise to the mainstream model-based, shape-based, and appearance-based recognition methods. Based on the applied features, these methods can be sub-divided into two main classes, i.e., local methods and global methods [1,2].

Texture describes visual information that is related to the local spatial variation of the pixel intensities (or local filter responses) in image (or filtered image) subregions. It is reasonable to assume that the intensity (or filter response) variation of different objects is different; therefore, characterizing texture can aid object recognition [3,4]. Texture analysis using 2D Gabor filters falls into the category of frequency-based approaches, which are based on the premise that texture is an image pattern containing a repetitive structure that can be effectively characterized in the frequency domain. Gabor filters have been found to yield local responses robust to many variations in imaging conditions, including translations, deformations, and background changes [7,8].

Gabor features constructed from post-processed Gabor filter responses have been successfully used in various important computer vision tasks. When the complete set of Gabor filtered images of an image is taken as a whole, the local multivariate and global high-dimensional feature forms pose many challenges to common learning or recognition systems. This situation makes local methods very popular, even though many of them are either time-consuming or require manual annotation, while global methods need to be supported by some compressing or sampling technique [10,11]. Because of the redundant information among different Gabor features, magnitude features are widely used thanks to their easy-to-use univariate positive value space and discriminative texture information [7–11].


Psychological research on human texture perception finds that two homogeneous textures are discriminable if they produce dissimilar marginal distributions [12]. This fact indicates the feasibility of recognizing objects by matching multichannel magnitude distribution similarities under a specific texture representation. Existing problems, however, obstruct this line of application. On the parametric side, Gamma models used to be adopted for modeling Gabor magnitude textures. But it seems impracticable to use an accurate intrinsic metric for model-based matching, because of the complex geometric structure of the Gamma family and the high computational complexity of computing information distances on the Gamma manifold [13,14]. A similar problem also appears when each channel magnitude set is modeled as a sampling realization of a probability density function lying on a nonparametric statistical manifold [15]. On the non-parametric side, multidimensional histograms are widely applied to multichannel data or feature sets. But the loss of spatial information, over-sparseness and high computational cost degrade the performance of such descriptors in similarity matching [9].

Human texture discrimination in terms of texture statistical properties has been investigated over the past few decades. A basic conclusion is that textures in gray-level images are discriminated spontaneously only if they differ in second-order moments. Classical co-occurrence matrix methods aim to characterize image texture using the spatial information of original pixel features [15,16,40]. The most popular second-order statistical features for texture analysis are derived from the gray level co-occurrence matrix (GLCM), originally proposed by Haralick et al. [15], but it is only suitable for single-band images. Rosenfeld et al. [16] generalized the GLCM to the multiband co-occurrence matrix (MBCM), the most appropriate descriptor for multispectral texture in multiband images. Unfortunately, MBCM is impracticable for images with dozens of spectral bands, because the computational complexity of MBCM extraction and dissimilarity quantization grows exponentially with the number of spectral bands and the number of quantization levels of the spectral components [16,17]. Palm [18] first simplified the joint co-occurrence probability model of color texture by introducing integrative color co-occurrence matrices (CCMs) and treating them as a generalization of the GLCM. Similar techniques were exploited by Arvis et al. and Muselet and Macaire [19,20]. According to the mode of the considered spectral band pairs, CCMs can be categorized into single-channel CCMs (SCMs) and multi-channel CCMs (MCMs). Most of the above approaches aim to use vectorized normalized Haralick features, but inherent shortcomings hinder them from performing well in more challenging tasks. The actual problems are the ignorance of the best combination of the existing classical features and the incomplete utilization of the statistical information of the associated co-occurrence matrices. Moreover, co-occurrence information extracted from raw pixel features tends to be affected by changes in imaging conditions. Therefore, traditional co-occurrence matrix methods tend to underperform in recognition tasks where such changes are involved.

Comparative studies have shown that, when applied to texture classification, approaches based on multidimensional co-occurrence matrices have certain advantages over wavelet-based opponents [21,23,40]. This inspires us to develop a more competitive co-occurrence matrix method using robust Gabor magnitude features rather than raw pixel features for recognition tasks. Because the magnitude distributions of filtered images are biased, quantizing the original magnitudes tends to produce over-sparse co-occurrence distributions that are not conducive to significant similarity matching [20]. For this reason, we rescale each magnitude image by a transformation determined by channel-adaptive truncation and level parameters. Each pair of channel-adaptive parameters is the optimal solution of the proposed learning algorithm, which minimizes the cumulative sum of the absolute kurtosis and skewness of a collection of channel magnitude sets. Taking complexity into account, we choose a direction-based grouping scheme to restrict multichannel co-occurrence information extraction to the grouped rescaled magnitude images. Three interpolation-free neighborhood modes are proposed as possible manners of counting co-occurring feature pairs in a local rhombic region of a center pixel on a pair of rescaled magnitude images. Thereby the model of MGMCMs is defined as a generalization of MCMs, and a novel object representation is introduced in which MGMCMs characterize the Gabor magnitude textures. We treat each MGMCM as a probability distribution lying on a multinomial manifold (i.e., a simplex) [21]. Thus, an object image can be further represented as a point on a product multinomial manifold. For algorithmic implementation, the geodesic distance metric of each factor multinomial manifold is extended to its closure. The p-order Minkowski co-occurrence information metric is then built from the extended factor distances for matching purposes.

The main contributions of our method are: (1) a novel image descriptor, MGMCMs, designed to characterize multichannel magnitude texture for object representation and similarity matching; (2) an exploratory learning algorithm for learning channel-adaptive parameters for robust and discriminative co-occurrence information extraction; (3) with multinomial geometry and a metric extension technique, the p-order Minkowski co-occurrence information metric built for magnitude co-occurrence similarity matching using the complete information of MGMCMs.

The rest of this paper is organized as follows. Section 2 gives a brief review of Gabor magnitude and the models of CCMs. Section 3 details the Gabor magnitude co-occurrence probability model. Experiments and analysis are presented in Section 4, and Section 5 gives some conclusions and future work.

2. A brief review of Gabor magnitude and CCMs

2.1. Gabor filter and Gabor magnitude

In the spatial domain, a 2D Gabor filter (kernel, wavelet or function) is a Gaussian kernel function modulated by a sinusoidal plane wave [9,10]:

G_{u,v}(z; u, v, \sigma, k_{u,v}) = \frac{\|k_{u,v}\|^2}{\sigma^2}\, e^{-\|k_{u,v}\|^2 \|z\|^2 / 2\sigma^2} \left[ e^{i k_{u,v} z} - e^{-\sigma^2/2} \right], \qquad (1)

where z = (x, y) represents a 2-dimensional input point. The parameters u and v define the orientation and scale of the Gabor kernel, ‖·‖ denotes the norm operator, and σ refers to the standard deviation of the Gaussian window in the kernel. An attractive mathematical property of Gabor filters is that they minimize the joint uncertainty in space and frequency [5,28]. In general, the filter bank associated with U directions and V scales is employed to characterize image brightness appearance. The wave vector k_{u,v} can be defined as k_{u,v} = k_v e^{iφ_u}, where k_v = k_max / f^v, φ_u = πu/U, k_max is the maximum frequency, and f^v is the spatial frequency between kernels in the frequency domain. The filter bank is denoted as

\{ G_{u,v}(\cdot) : (u,v) \in \{0, 1, \ldots, U-1\} \times \{0, 1, \ldots, V-1\} \}. \qquad (2)

The acquisition of the Gabor magnitude features of an input image involves convolving the image with each filter of a given Gabor filter bank under a fixed convolution mask of size a × b. The magnitude at the point z_0 = (x_0, y_0), obtained by convolving the image I of size A × B with the Gabor filter G_{u,v}(·), can be represented as

O_{u,v}(z_0; u, v, \sigma, k_{u,v}) = \left\| \sum_{x=-b/2}^{b/2-1} \sum_{y=-a/2}^{a/2-1} I(x_0 + x,\, y_0 + y)\, G_{u,v}(x, y; u, v, \sigma, k_{u,v}) \right\|, \qquad (3)

and the (u,v)-channel output magnitude image is then denoted as

O_{u,v}(I) = \{ O_{u,v}(z) : z \in \{1, \ldots, A\} \times \{1, \ldots, B\} \}. \qquad (4)

Thus, the set of Gabor magnitude images of I output from the filter bank described by Eq. (2) can be represented by

O(I) = \{ O_{u,v}(I) : (u,v) \in \{0, 1, \ldots, U-1\} \times \{0, 1, \ldots, V-1\} \}. \qquad (5)

The complexity of Gabor convolution is determined by the size of the convolution mask [23]. Specifically, the complexity of calculating the filter response at one point is O(ab), where a and b are the width and height of the mask. If the convolution is conducted over an entire image of size A × B, the complexity is O(abAB). For extracting the complete set of magnitude images of an original image, the complexity increases linearly with the number of filter channels. The size of the convolution mask is consistently set to a = A/2 and b = B/2 in our work.
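To make the construction above concrete, the following Python/NumPy sketch builds a Gabor kernel and the set of magnitude images of Eqs. (1)–(5). It is a minimal illustration, not the authors' code: the helper names (gabor_kernel, magnitude_images) are ours, a square mask is used for simplicity, and the default parameters anticipate the settings used in Section 4 (σ = 2π, k_max = π/2, f = √2).

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, size, sigma=2*np.pi, k_max=np.pi/2, f=np.sqrt(2), U=8):
    """Eq. (1): complex Gabor kernel for orientation u, scale v on a size x size grid."""
    k = (k_max / f**v) * np.exp(1j * np.pi * u / U)   # wave vector k_{u,v} = k_v e^{i phi_u}
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]           # grid runs -size/2 .. size/2 - 1, as in Eq. (3)
    k2, z2 = np.abs(k) ** 2, x**2 + y**2
    envelope = (k2 / sigma**2) * np.exp(-k2 * z2 / (2 * sigma**2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

def magnitude_images(img, U=8, V=5):
    """Eqs. (3)-(5): magnitudes of the responses of the full U x V filter bank."""
    A, B = img.shape
    size = min(A, B) // 2        # square mask; the paper uses a = A/2 and b = B/2
    return {(u, v): np.abs(fftconvolve(img, gabor_kernel(u, v, size, U=U), mode='same'))
            for u in range(U) for v in range(V)}

Calling magnitude_images on a 48 × 48 crop returns the 40 channel magnitude images O_{u,v}(I) that Section 3 quantizes and groups.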

2.2. Color co-occurrence matrix (CCM)

A CCM is represented as a three-dimensional matrix, where the colors of any pair are captured in the first two dimensions of the matrix and the spatial distance between them is captured in the third dimension [18,41]. Let I be an A × B × K color image whose pixels are each coded by a vector of a K-dimensional color space. Assume that each color component of the color space is quantized into W levels. We denote the kth channel image of I as I^k, and the set of all channel images is denoted as CIS(I) = {I^1, …, I^K}. Given a pair of channel images (I^u, I^v) ∈ CIS(I) × CIS(I) and a local image structure parameterized by a spatial distance d and a neighborhood direction θ, the cell entry of the W × W CCM is defined as

C^{u,v}_{d,\theta}(w, \tilde{w}) = \Pr\{ I^u(z_1) = w \wedge I^v(z_2) = \tilde{w} : |z_1 - z_2|_\theta = d \}. \qquad (6)

In particular, C^{u,v}_{d,θ} degenerates to a GLCM when K = 1 (and then u = v = 1 uniquely). C^{u,v}_{d,θ} denotes a GLCM-like SCM in the case of K ≠ 1 and u = v, and otherwise represents an MCM. Here |z_1 − z_2|_θ denotes the spatial distance between pixel positions z_1, z_2 ∈ {1, …, A} × {1, …, B} along the direction θ. As second-order statistical measurements, the CCMs of a color image characterize the spatial interaction between the color components of the image. The RGB color space is most commonly used for extracting CCMs from color images for analysis and application. For an A × B × K color image, the memory and computational complexities of extracting a single co-occurrence frequency-based CCM are O(W^2) and O(AB), respectively [20,31].

3. Gabor magnitude co-occurrence probability models

Gabor filters have many merits, such as localizability, orientation selectivity and spatial frequency characteristics. These advantages give Gabor features the attractions of rotation invariance and robustness to local distortions caused by variations of illumination, pose, partial occlusion and small scale changes [5–11]. As mentioned previously, there are certain disadvantages to using local or global Gabor features of multichannel filtered images for recognition tasks. The color (or gray level) of each pixel in a color (or grayscale) image depends not only on the reflectance properties of the observed object, but is also a function of both the camera and the illumination. So the CCMs (or GLCMs) are not robust to varying factors (like pose, scale and illumination), though they have commonly served as descriptors combining the color and texture of images [17–20]. To overcome their individual weaknesses, we integrate the two kinds of texture features to build a synergy from their relative advantages.

3.1. Magnitude quantization

The over-sparseness of high-dimensional frequency data causes a larger standard deviation in similarity matching [27,42]. For co-occurrence information description, the image blur trick of quantizing gray values or each color spectral component into only a few dozen levels has the advantage of reducing the sparseness of the co-occurrence matrix, besides lowering the computational cost. As demonstrated in [9], the Gabor magnitudes in a filtered image are typically right-biased distributed, which is usually fitted by a Gamma distribution. So uniform quantization of the original magnitude value space tends to yield over-sparse co-occurrence matrices, leading to weakened matching performance [27]. We handle this issue by first transforming each (u,v)-channel magnitude image with the truncation function defined as

T_{u,v}(O_{u,v}(z)) = \begin{cases} O_{u,v}(z) & \text{if } O_{u,v}(z) \le M_{u,v} \\ M_{u,v} & \text{if } O_{u,v}(z) > M_{u,v} \end{cases} \qquad (7)

based on the channel threshold M_{u,v}, and then rescaling the truncated image into W_{u,v} levels with the transform:

\tilde{O}_{u,v}(z) = \mathrm{floor}\!\left( \frac{(W_{u,v} - 1)\, T_{u,v}(O_{u,v}(z))}{M_{u,v}} \right), \qquad (8)

where floor(·) is the function that rounds a value to the nearest integer less than or equal to itself. Intuitively, truncation transforms with appropriately set truncation parameters can effectively reduce the bias of the channel magnitude sets, while the channel magnitude distribution information is well preserved. As for the level parameters, using too small a level parameter causes more of the discriminative information of the magnitude textures to be smoothed away in the rescaled images. Conversely, using too large a level parameter results in over-sparse co-occurrence distributions as well as higher complexity. For a data or feature set, the skewness statistic measures the bias of the distribution, while the kurtosis statistic measures how peaked or flat the distribution is relative to a normal distribution [29]. Fig. 1 shows the feature distributions of the (0,0)-channel original magnitude image and five rescaled magnitude images of an object.
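Eqs. (7) and (8) amount to clipping at the channel threshold followed by a uniform requantization; a minimal NumPy sketch (the helper name rescale_magnitudes is ours):

import numpy as np

def rescale_magnitudes(O, M, W):
    """Eqs. (7)-(8): truncate the channel magnitudes at threshold M and
    requantize them to the integer levels 0..W-1."""
    T = np.minimum(O, M)                          # truncation T_{u,v}, Eq. (7)
    return np.floor((W - 1) * T / M).astype(int)  # rescaling, Eq. (8)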

Each magnitude quantization scheme used to obtain the rescaled images in Fig. 1 is parameterized by the channel truncation parameter M_{0,0} and the level parameter W_{0,0}. Table 1 shows the skewness and kurtosis statistics computed from the original magnitude set and the five quantized magnitude sets.

Fig. 1. Original and rescaled magnitude images and feature distributions under different quantization modes (horizontal axis: magnitude or quantized feature; vertical axis: frequency): (a) original magnitude image and feature distribution; (b) W_{0,0} = 32, M_{0,0} = 200; (c) W_{0,0} = 32, M_{0,0} = 60; (d) W_{0,0} = 16, M_{0,0} = 200; (e) W_{0,0} = 16, M_{0,0} = 40; (f) W_{0,0} = 8, M_{0,0} = 40.

Table 1. Skewness and kurtosis of the original magnitude set O_{0,0} and of the quantized magnitude sets under transforms parameterized by (W_{0,0}, M_{0,0}).

Statistic    O_{0,0}    (32, 200)    (32, 60)    (16, 200)    (16, 40)    (8, 40)
Skewness     1.90       2.00         1.81        2.17         1.41        1.50
Kurtosis     6.06       6.43         5.45        7.08         3.65        3.90

The illustrated and tabulated results show that using too large an M_{0,0} in the W_{0,0}-level transform makes the quantized magnitudes more biased and peaked, especially when a too small W_{0,0} is adopted. Among the five transforms, the one parameterized by W_{0,0} = 16 and M_{0,0} = 40 yields the smallest skewness and kurtosis for 16-level quantization, implying that the quantized magnitudes are then relatively evenly distributed. Motivated by these results, we use the optimization program:

(\alpha, W) = \arg\min_{\alpha \in \mathcal{A},\, W \in \mathcal{W}} \sum_{I \in \mathcal{I}} \left( \left| S(\tilde{O}_{u,v}(I; M(\alpha), W)) \right| + \left| K(\tilde{O}_{u,v}(I; M(\alpha), W)) \right| \right), \qquad (9)

to obtain the (u,v)-channel adaptive truncation parameter M_{u,v} = M(α) and level parameter W_{u,v} = W in an unsupervised manner. In Eq. (9), |S(·)| and |K(·)| are the absolute skewness and kurtosis operators for the quantized (u,v)-channel magnitude set Õ_{u,v} of an image I under the given level parameter W and the α-determined truncation parameter M(α), and 𝒜 and 𝒲 are the sets of considered values of α and W, respectively. We restrict attention to an exploratory set ℐ composed of a few images from each object class, and M(α) is the α-quantile of the set consisting of all magnitudes of the filtered images of ℐ. For a recognition task, we can build ℐ from the entire training set, or at least a reduced one. According to the complexity analysis given in this paper, extracting and matching MGMCMs based on the estimated parameters M_{u,v} and W_{u,v} involves high complexity; for cross-validation recognition experiments, however, the computational cost can be largely reduced because the acquisition of the channel-adaptive parameters is based on a fixed exploratory set. The rationale for using skewness and kurtosis to construct the objective function in Eq. (9) lies in their ability to characterize distribution shapes and their computational convenience.
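Since the candidate sets are small and finite, Eq. (9) can be solved by exhaustive search. The sketch below is one plausible reading under stated assumptions: channel_mags holds the (u,v)-channel magnitude images of the exploratory set, M(α) is taken as the α-quantile of their pooled values as described above, and SciPy's default (excess) kurtosis is used, since the paper does not pin down the kurtosis convention.

import numpy as np
from itertools import product
from scipy.stats import skew, kurtosis

def learn_channel_parameters(channel_mags, alphas=(0.92, 0.94, 0.96, 0.98),
                             levels=(12, 16, 20, 24)):
    """Eq. (9): choose (alpha, W) minimizing the cumulative absolute skewness
    and kurtosis of the quantized channel magnitude sets."""
    pooled = np.concatenate([m.ravel() for m in channel_mags])
    best, best_cost = None, np.inf
    for alpha, W in product(alphas, levels):
        M = np.quantile(pooled, alpha)              # M(alpha): alpha-quantile threshold
        cost = 0.0
        for mags in channel_mags:                   # sum over images I of the exploratory set
            q = np.floor((W - 1) * np.minimum(mags, M) / M).ravel()  # Eqs. (7)-(8)
            cost += abs(skew(q)) + abs(kurtosis(q))                  # |S(.)| + |K(.)|
        if cost < best_cost:
            best, best_cost = (M, W), cost
    return best                                     # channel-adaptive (M_{u,v}, W_{u,v})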


3.2. Grouping and neighborhood mode

Unlike a color image, which has only a few spectral channels, the filtered magnitude textures of an object image usually involve dozens of spectral bands. This imposes a large computational burden if MCM-like MGMCMs are extracted from the complete set of rescaled magnitude images without any restriction. Only 3 MCMs suffice to describe the multichannel color texture of an RGB image; but with a U × V-channel Gabor filter bank, C^2_{U×V} MGMCMs are needed per image to describe the Gabor magnitude textures by direct analogy with MCMs. The complexity can be evidently lowered by a proper grouping scheme that restricts MGMCM-based texture description to be conducted group by group over the rescaled magnitude images. Clearly, direction-based and scale-based grouping schemes can both be used to define multichannel co-occurrence probability models describing the grouped magnitude textures. However, when the most common 40- or 15-channel (i.e., U = 8, V = 5 or U = 5, V = 3) filter bank [11] is employed, scale-based grouping makes our approach bear a comparatively high computational cost. We therefore choose direction-based grouping in the proposed algorithm. For each object image I in a recognition task, we denote the set of rescaled magnitude images described by Eq. (8) as

\tilde{O}(I) = \{ \tilde{O}_{u,v}(I) : (u,v) \in \{0, 1, \ldots, U-1\} \times \{0, 1, \ldots, V-1\} \}. \qquad (10)

Then Õ(I) is divided into U groups, and the uth direction group of quantized magnitude images is expressed as

\tilde{O}_u(I) = \{ \tilde{O}_{u,v}(I) : v \in \{0, 1, \ldots, V-1\} \}. \qquad (11)

Texture can be characterized by textural primitives serving as unit elements, together with neighborhoods in which the organization and relationships between the properties of these primitives are defined. Pixels are thus treated as the unit elements, and co-occurrence information is locally extracted from pixel neighborhoods. To capture the spatial information around a center pixel, three neighborhood modes, i.e., the 4-neighborhood (N4), 8-neighborhood (N8) and 12-neighborhood (N12) (illustrated in Fig. 2), are proposed to serve as possible textural primitives in our method. Compared with traditional modes localized by points on a circle, like N4 [16,18], these local image structures have two traits. First, they are composed entirely of actual pixels, without interpolation points. Second, they offer structures for capturing the co-occurrence information of a center pixel and its nearest-neighbor pixels.

Fig. 2. Three neighborhood modes for local co-occurrence information description and quantization.

3.3. MGMCMs model and multichannel magnitude texture representation

After the magnitude quantization procedure, we can formally introduce the models of Gabor magnitude co-occurrence matrices (GMCMs) by using a specific neighborhood mode N(·). The cell entry of a GMCM defined for a group of requantized magnitude images of an object image I is

C^{I}_{v,v'|u}(w, \tilde{w}) = \Pr\{ \tilde{O}_{u,v}(z_1) = w \wedge \tilde{O}_{u,v'}(z_2) = \tilde{w} : z_2 \in N(z_1) \}, \quad (w, \tilde{w}) \in \mathcal{W}_{u,v} \times \mathcal{W}_{u,v'}. \qquad (12)

Here z_1, z_2 ∈ {1, …, A} × {1, …, B}, and 𝒲_{u,v} = {0, 1, …, W_{u,v}−1} and 𝒲_{u,v'} = {0, 1, …, W_{u,v'}−1} are the (u,v)- and (u,v')-channel adaptive level sets. The cell C^I_{v,v'|u}(w, w̃) indicates the co-occurrence probability that, in the rescaled magnitude image Õ_{u,v'}(I), a pixel z_2 whose level Õ_{u,v'}(z_2) equals w̃ is located in the neighborhood N(z_1) of a pixel z_1 whose level in the rescaled magnitude image Õ_{u,v}(I) equals w. If v ≠ v', C^I_{v,v'|u} represents an MGMCM; otherwise it denotes a single-channel GMCM (SGMCM). Compared with SGMCMs, MGMCMs have the attraction of using spatial information to capture the correlation between the magnitude textures of different channels. Empirical results have verified that multichannel co-occurrence matrices contain more discriminative spatial information than their single-channel counterparts [18,19]. Because of the channel-adaptive levels, an MGMCM need not be a square matrix, which links it to a more natural form of a co-occurrence probability distribution. So MGMCMs can be viewed as a generalization of traditional co-occurrence matrices [15–18]. By the above definition, the 8 groups of rescaled magnitude images of an original image determine 120 GMCMs, including 8 × C^2_5 = 80 MGMCMs and 8 × C^1_5 = 40 SGMCMs, when the 40-channel Gabor filter bank is used. This shows that proper grouping can significantly reduce the algorithmic complexity.

According to the manner in which co-occurrence information is used, texture representation strategies can be categorized into three classes: single-channel, multichannel and integrative [18–20]. In this paper, we highlight the multichannel strategy and use MGMCMs for Gabor texture representation and co-occurrence similarity matching. The choice of the multichannel strategy comes from a compromise between algorithmic performance and computational complexity. The novel Gabor texture representation characterizes the multichannel magnitude textures of an image I as the MGMCM set

\mathrm{MGMCM}(I) = \{ C^{I}_{0,1|0}, \ldots, C^{I}_{0,V-1|0}, \ldots, C^{I}_{0,1|U-1}, \ldots, C^{I}_{V-2,V-1|U-1} \}. \qquad (13)

Fig. 3 shows fragments of the GMCMs-based texture representations of a building image.

Fig. 3. Contrast of the same group of original and quantized magnitude images of an object image, and the corresponding fragments of GMCMs representing the magnitude textures: (a) original image; (b) corresponding gray image; (c) original magnitude images corresponding to channels u = 1 and v = 0, 1, 2, 3, 4; (d) rescaled magnitude images obtained by the proposed quantization technique, where the optimization algorithm is conducted with α ∈ {0.92, 0.94, 0.96, 0.98} and W ∈ {12, 16, 20, 24} to obtain the channel-adaptive truncation and level parameters; (e) the fragment of integrative GMCMs representing the magnitude textures, where M00, M11, M22, M33 and M44 form the fragment of SGMCMs, and M01, M02, M03, M04, M12, M13, M14, M23, M24 and M34 form the fragment of MGMCMs.

3.4. Co-occurrence information metric

3.4.1. Multinomial manifold and geodesic distance metric extension

According to the theory of differential geometry, the multinomial manifold or n-simplex

P_n = \left\{ \theta = (\theta_1, \ldots, \theta_{n+1}) \in \mathbb{R}^{n+1}_{+} : \sum_{j=1}^{n+1} \theta_j = 1 \right\} \qquad (14)


is diffeomorphic to the (n+1)-dimensional multinomial distribution family:

\left\{ p(x_1, \ldots, x_{n+1}; C, \theta_1, \ldots, \theta_{n+1}) : x_k \in \mathbb{Z}_{+},\ \sum_{k=1}^{n+1} x_k = C,\ \theta_k \in \mathbb{R}_{+},\ \sum_{k=1}^{n+1} \theta_k = 1 \right\}, \qquad (15)

where

p(x_1, \ldots, x_{n+1}; C, \theta_1, \ldots, \theta_{n+1}) = \begin{cases} \dfrac{C!}{\prod_{k=1}^{n+1} x_k!} \prod_{k=1}^{n+1} \theta_k^{x_k}, & \text{when } \sum_{k=1}^{n+1} x_k = C, \\ 0, & \text{otherwise.} \end{cases} \qquad (16)

The geodesic distance on a Riemannian-structured statistical manifold is regarded as the most natural information metric for evaluating the dissimilarity between models on the manifold. As P_n is equipped with the Fisher–Riemannian information structure, we can obtain the geodesic distance

\mathrm{dist}(\theta, \theta') = 2 \arccos\left( \sum_{j=1}^{n+1} \sqrt{\theta_j \theta'_j} \right), \qquad (17)

between any θ, θ' ∈ P_n, based on the fact that P_n is isometrically homeomorphic to the positive n-sphere. For more details, see [21].

The open simplex P_n is not complete, and its closure P̄_n = P_n ∪ ∂P_n is not a differentiable manifold [18], where

\partial P_n = \left\{ (\theta_1, \ldots, \theta_{n+1})^T \in \mathbb{R}^{n+1} : \theta_j \ge 0,\ \prod_{j=1}^{n+1} \theta_j = 0,\ \sum_{j=1}^{n+1} \theta_j = 1 \right\} \qquad (18)

is the boundary of P_n. Strictly speaking, the Fisher geometry on P_n is no longer suitable for quantifying the dissimilarity between probability models once boundary points are involved. Using the continuity of the function dist(·,·) on P̄_n, we work around this problem by extending the metric to the closure. The extended metric is denoted dĩst(·,·).
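Because Eq. (17) is continuous up to the boundary, the extended metric can be evaluated directly on (possibly sparse) co-occurrence frequency vectors; a minimal sketch (the function name is ours, and the clip guards against rounding the Bhattacharyya coefficient slightly above 1):

import numpy as np

def geodesic_dist(theta, theta2):
    """Eq. (17) extended to the closed simplex: 2 * arccos of the
    Bhattacharyya coefficient of two discrete distributions."""
    bc = np.sum(np.sqrt(theta * theta2))
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))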

3.4.2. The construction of co-occurrence information metric

An MGMCM is a discrete probability distribution that essentially reflects the probabilistic generative mechanism of co-occurring feature pairs on a pair of grouped quantized magnitude images. When considering all possible two-tuples of the associated channel level sets, we can model the numbers of co-occurring feature pairs on a pair of grouped rescaled magnitude images as a sampling realization of an underlying multinomial distribution. Therefore, we can further characterize the object representation as a point on a product multinomial manifold. Following the aforementioned procedures, we use the concatenated vectorized form

\mathrm{vec}(\mathrm{MGMCM}(I)) = \left( \mathrm{vec}(C^{I}_{0,1|0}), \ldots, \mathrm{vec}(C^{I}_{0,V-1|0}), \ldots, \mathrm{vec}(C^{I}_{0,1|U-1}), \ldots, \mathrm{vec}(C^{I}_{V-2,V-1|U-1}) \right) \qquad (19)

to represent an image I as a point on a product multinomial manifold. Here MGMCM(I) is the set of MGMCMs of image I, vec(C^I_{v,v'|u}) is the vectorized form of C^I_{v,v'|u}, and vec(·) is the vectorization operator that creates a column vector from a matrix.

In practice, the cell entries of the MGMCMs of an image are unknown. This is a trivial issue, because the co-occurrence frequencies are the maximum likelihood estimates (MLEs) of the corresponding co-occurrence probabilities. The consistency of the MLE [30] implies the feasibility of using co-occurrence frequency matrices for analysis and computation. Note that a co-occurrence frequency matrix generally contains null cell entries, so its vectorized form is a point of the closure of a simplex. The invariance of the MLE [30] justifies using co-occurrence frequency-based dissimilarity measurements in the matching step. So we use the p-order (p ≥ 1) Minkowski distance

d_p(\mathrm{MGMCM}(I), \mathrm{MGMCM}(I')) = \left[ \sum_{v,v'|u} \widetilde{\mathrm{dist}}^{\,p}\!\left( \mathrm{vec}(C^{I}_{v,v'|u}),\, \mathrm{vec}(C^{I'}_{v,v'|u}) \right) \right]^{1/p} \qquad (20)

in the proposed algorithm to quantify the dissimilarity between two images I and I'.
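Given the extended factor distances, Eq. (20) is a straightforward aggregation. The sketch below assumes the MGMCMs of the two images are supplied as equally ordered lists of co-occurrence frequency matrices; the names are ours.

import numpy as np

def minkowski_cooccurrence_distance(mgmcms1, mgmcms2, p=1.0):
    """Eq. (20): aggregate the extended geodesic distances between
    corresponding vectorized MGMCMs of two images."""
    total = 0.0
    for C1, C2 in zip(mgmcms1, mgmcms2):               # matched factor matrices
        bc = np.sum(np.sqrt(C1.ravel() * C2.ravel()))
        d = 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))     # Eq. (17) on the closure
        total += d ** p
    return total ** (1.0 / p)

With p = 1 this reduces to the direct sum metric d1 used in most of the experiments below; a 1NN classifier over these distances implements the matching scheme.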

The complexity of obtaining a frequency-based MGMCM is O(AB), which is unrelated to the channel-adaptive parameters and the local neighborhood mode, where A and B are the height and width of the original image I. The channel level parameters are, however, related to the complexity of information distance quantization between product co-occurrence frequency vectors. There is a heavy complexity of \sum_{v,v'|u} O((W_{u,v} \times W_{u,v'})^3) in finding the minimum matching between product co-occurrence frequency-based models [31,17].

4. Experiments

4.1. Experimental setup

In this section, we study the performance of the proposed method and compare it with other popular algorithms. Five publicly available image databases are employed: the Yale face database [32], the FERET face database [33], the PolyU palmprint database [34], the COIL-20 object database [35], and the Zurich buildings database (ZuBuD) [36]. For our experiments, the images of the first four databases are cropped to a size of 48×48, while the images of ZuBuD are cropped to 48×32. A series of validation experiments is first conducted on the Yale database to show the performance variations of our algorithm with respect to (w.r.t.) different parameter settings. The performance evaluation experiments are implemented on the FERET, PolyU, COIL-20 and ZuBuD databases, where we compare our algorithm with PCA (Principal Component Analysis) [37], LDA (Linear Discriminant Analysis) [38], KDA (Kernel Discriminant Analysis with a Gaussian kernel) [39], Gabor+PCA [11], Gabor+LDA [11], Gabor+KDA [11], MCPHM (Multichannel Product Histogram Match) [12], MGFR+(2D)2PCA [22], GMPTR+NLDA [9], and CCMCFM (CCMs Classic Features Match) [18]. Here PCA, LDA and KDA are classical subspace methods for feature extraction from the concatenated raw pixel gray values of the original image. Gabor+PCA, Gabor+LDA and Gabor+KDA are the algorithms applying PCA, LDA and KDA to globally concatenated magnitude features. MCPHM was designed for object recognition based on a product histogram multichannel texture representation in a nonparametric statistical manifold framework. Using 40-channel Gabor magnitude features, we execute MCPHM by matching product magnitude histograms, with the root-mean-squared (rms) geodesic distance as the information metric; corresponding to each object image, each product magnitude histogram is strung together from forty 30-bin factor histograms. The purpose of comparing MCPHM with our algorithm is to show the performance difference between the histogram approach and the proposed co-occurrence matrix approach, in that both are based on multinomial geometry. Using grouping schemes for Gabor filtered images similar to those adopted in our approach, two kinds of Gabor magnitude based face representations, i.e., EGFR and MGFR, were presented, and the algorithm (2D)2PCA was used for feature extraction under such representations [22]. We select the best performer, MGFR+(2D)2PCA, for comparative purposes. In [14], Gabor magnitude-based texture representation (GMTR) and Gabor phase-based texture representation (GPTR) were introduced to design the GMPTR pattern for characterizing face representation. GMPTR+NLDA is the algorithm that conducts NLDA on GMPTR-representing features for further dimension reduction. For implementing GMPTR+NLDA, a 4×4 regional partition scheme is adopted for every magnitude image, while a 2×2 regional partition scheme is used for every phase image. CCMCFM is a color texture classification approach using Haralick features of CCMs. The algorithm is specially executed on ZuBuD to be compared with our algorithm, which makes no use of any color information. To implement CCMCFM, eight Haralick features, i.e., contrast, correlation, entropy, dissimilarity, sum variance, homogeneity, and information measures of correlation I and II [15], are extracted from each CCM of each original image, and the normalized forms of these features are then concatenated to build a 48-dimensional feature for nearest neighbor (1NN) matching.


4.2. Face recognition on Yale and FERET databases

4.2.1. Results on the Yale database

The Yale dataset consists of 165 images of 15 subjects under various facial expressions and lighting conditions. Fig. 4 shows all sample images of one subject of the Yale database.

Fig. 4. Image variations of a subject from the Yale database.

We first show the recognition performance variations of our algorithm w.r.t. different settings of the Gabor filter bank. For this purpose, we adopt nine filter banks, of which the first three have the same parameters σ = 2π, k_max = π/2, f = √2 but different settings of U and V: U = 5, V = 3; U = 6, V = 4; U = 8, V = 5. The other six filter banks differ only in replacing the maximum frequency parameter k_max = π/2 with π/4 and π, where the setting of k_max is the main factor accounting for the sensitivity of some existing algorithms [9]. We use the first k (k = 1, 2, 3) image(s) of each subject as training samples, and the rest are used for testing. For learning the channel adaptive parameters, we use the setting α ∈ {0.92, 0.94, 0.96, 0.98} and W ∈ {12, 16, 20, 24} and keep it for the rest of the experiments. The N8 neighborhood mode is adopted for MGMCMs extraction, and the direct sum metric d1 is used to conduct our algorithm for the best face matching.

The experimental results show that the recognition rate varies only with the combination of U and V, while it is unchanged w.r.t. the considered settings of k_max. The rate variations are shown in Fig. 5.

Fig. 5. Recognition performance comparison w.r.t. the used filter banks.

According to the recognition accuracies, the 40-channel filter bank brings the best recognition results, which accords with what is reported in [11]. Compared with the other two filter banks, the recognition advantage of the 40-channel filter bank is evidently weakened as more training images become available. When a single training sample is used, the magnitude textures output from the 24-channel filter bank bring a higher recognition rate than those from the 15-channel filter bank in matching sets of MGMCMs, but the latter slightly outperforms the former when two or three training samples are used. Based on these results, the 40-channel filter bank parameterized by U = 8, V = 5 and k_max = π/2 is consistently adopted in the remaining experiments.

We also consider the performance variation of the algorithm w.r.t. the use of different neighborhood modes. In this respect, we use the N4, N8 and N12 neighborhoods in turn to count the co-occurring feature pairs on every pair of grouped quantized magnitude images for MGMCMs extraction. Following the above experimental setting, the resulting recognition rates are illustrated in Fig. 6.

Fig. 6. Recognition performance comparison w.r.t. the used neighborhood modes.

We can see that the recognition performances based on the three neighborhood modes are almost identical. When the first image of each subject is used as the training sample, the three neighborhood modes produce the same recognition rate of 62%. The same rate soars to 91.11% when the first two images of each person are offered for training. However, the recognition rates obtained by using three training samples show no significant increase; the rates based on N4, N8 and N12 are then 93.33%, 92.50% and 92.50%, respectively.

The channel adaptive parameters obtained by the proposed learning algorithm are the main factor determining the discrimination of the sets of MGMCMs of the images involved in a recognition task. We now compare the outputs of our algorithm with 40 empirical settings in a 5-fold cross-validation experiment. Two images of each subject are randomly selected in each fold as training samples, while the remaining images serve as testing samples. We take the training sample set formed in each fold as the exploratory set, while each setting of truncation and level parameters unified over all channels is a two-tuple from the given truncation parameter set T and level parameter set W. Here W = {12, 16, 20, 24} and T is the set from 10 to 100 in steps of 10. Under each learned or given setting, the N4 mode is used for MGMCMs extraction, while d1 is adopted for similarity matching between sets of MGMCMs. In Fig. 7, the average recognition rates are illustrated by the vertical coordinates of points on four level-determined fold lines and a horizontal straight line.

Fig. 7. MGMCMs' discrimination comparison w.r.t. the channel parameters obtained in different manners.

We can see that, in terms of the average recognition rates, the set of MGMCMs extracted with the learned channel-adaptive parameters carries more discriminative spatial information. The four intertwined fold lines show that if any pair of truncation and level parameters is identically set for all 40 channel magnitude transforms described by Eq. (8), the performance of our algorithm is very robust to the level parameter. The very similar trends of the four fold lines show that, for a given channel-unified level parameter, different settings of the channel-unified truncation parameter bring significant performance variations of the resulting algorithms. As illustrated in Fig. 7, a sharp variation occurs when the truncation parameter varies from 10 to 30, and this trend evidently slows down when the truncation parameter is larger than 30, especially beyond 60.

Using the N8 neighborhood mode, we now empirically evaluate the algorithmic performance w.r.t. different combinations of grouping scheme, match strategy and p-distance metric. Besides the direction-based grouping, we also consider scale-based and direction-scale-based groupings. For the first two groupings, we adopt both multichannel and integrative GMCMs-based match strategies. Note that the single-channel match strategy based on any grouping scheme is equivalent to the one made under the direction-scale-based grouping. To match the albums of 40-channel Gabor magnitude textures under the different strategies for face recognition, the numbers of GMCMs of each face image extracted under the different grouping schemes are listed in Table 2.

Table 2. The numbers of GMCMs of each face image needed for matching under different strategies.

Grouping scheme          Single-channel    Multichannel    Integrative
Direction-based          40                80              120
Scale-based              40                140             180
Direction-scale-based    40                0               40

In this experiment, nine p-information metrics parameterized by p from 1 to 5 in steps of 0.5 are used for the associated match trials. The first two images of each subject are used for training, and the rest are used for testing. The recognition rates obtained in all match trials are shown in Fig. 8.

Fig. 8. Performance comparison of algorithms formulated by different combinations of the associated technical elements.

As can be observed, the recognition performances obtained by the multichannel and integrative match strategies are far better than that of the single-channel match strategy. With the single-channel strategy, the direct sum metric only gains the highest recognition rate of 74.07% (p = 1). But when combining direction-based grouping with the multichannel and integrative match strategies in turn, we get top rates of 92.59% (p = 2, 3) and 94.07% (p = 1, 2), respectively. Higher recognition rates appear when combining scale-based grouping with the multichannel and integrative match strategies, where the multichannel match gains the highest rate of 95.56% at p = 3.5 and p = 4, and the integrative match gets 94.81% at p = 2, 2.5 and 3. One can see that different combinations cause different rate fluctuations w.r.t. the varying p-value. The multichannel strategy based on the direction-based grouping scheme gives the most robust recognition performance, while the single-channel and integrative match strategies bring larger rate fluctuations. In addition, the integrated co-occurrence information is more discriminative and brings higher recognition rates only for smaller p values. As tabulated above, however, conducting the integrative match strategy requires more GMCMs. This reflects that the proposed algorithm balances algorithmic performance and complexity well, and should be preferred in recognition tasks involving large-scale image databases. The widely used direct sum metric and mean-square-root metric (d2) obtain promising recognition results in every trial, so we continue with the former in the next experiment.

In our approach, each MGMCM is treated as a probability distribution. So common distribution-based distance metrics, such as the Lp-Euclidean distance [24], χ²-statistics [24], Jeffrey divergence [24], Bhattacharyya distance [25] and Kolmogorov–Smirnov distance [26], can be used to measure the information distance between factor MGMCMs under a specific magnitude texture representation. Differently, the proposed p-information metric is derived from multinomial Fisher geometry, the most natural geometry for the model space of MGMCMs. We now reveal the performance differences among seven direct sum distance metrics. Apart from the proposed direct sum metric, the remaining six metrics are those whose factors are the L1-Euclidean distance, L2-Euclidean distance, χ²-statistics, Jeffrey divergence, Bhattacharyya distance and Kolmogorov–Smirnov distance. The N12 neighborhood mode is still adopted. We use the first k (k = 1, 2, 3) images of each subject for training, while the remaining 11−k images are used for testing. We show the recognition rates in Fig. 9.

Fig. 9. Performance comparison of seven direct sum metrics.

We can see that the proposed direct sum metric is the best performer. The direct sum metrics built from the factor L1-Euclidean distance and χ²-statistics are comparable. The Kolmogorov–Smirnov distance-based direct sum metric obtains moderate recognition performance among the seven metrics. According to the recognition rates, the Jeffrey divergence-based direct sum metric is comparable to the remaining metrics whose factors are the L2-Euclidean distance and the Bhattacharyya distance. In addition, the L2-Euclidean distance-based direct sum metric slightly outperforms the opponent built from factor Bhattacharyya distances. From Fig. 7, we find that the geometry-adaptive co-occurrence information metrics d1 and d2 produce only a slight performance difference in the case where the first two images of each person are used as training samples. However, a significant performance difference exists when the L1- and L2-Euclidean distance-built direct sum metrics are used for similarity matching of the non-Euclidean structured vectorized forms of MGMCMs.

Based on the experimental results on the Yale database, we use the N4 neighborhood mode together with the direct sum metric d1 for further experimental evaluation of the algorithmic performance. Cross-validation is carried out to obtain the average recognition rate and standard deviation of each algorithm on each object database. For this purpose, k (k = 1, 2, 3 for FERET, PolyU and ZuBuD, while k = 1, 3, 5 for COIL-20) image(s) of each object are randomly selected for training and the remaining images for testing. Except for the unified 10-fold cross-validation on COIL-20, the multiplicity T of the cross-validation is set to the number of images per object when k = 1, and otherwise T = 10 (k = 2, 3) for the FERET, PolyU and ZuBuD databases. In learning the adaptive channel parameters M_{u,v} and W_{u,v} for each database, we use the exploratory sample set consisting only of the first image of each object.

4.2.2. Results on the FERET database

The FERET database is one of the standard databases used for testing and evaluating face recognition algorithms. For our experiments, we chose a subset of the FERET database that includes 1400 images of 200 subjects, each contributing seven images. The seven images of each subject consist of three frontal images, with varied illuminations and facial expressions, and four profile images ranging from ±15° to ±25° of pose. Seven samples with different factor variations of seven persons from the database are shown in Fig. 10.

Fig. 10. Image variations of samples from the FERET database.

To evaluate the performance of our algorithm on the FERET database, we compare it with PCA, LDA, KDA, Gabor+PCA, Gabor+LDA, Gabor+KDA, MCPHM, MGFR+(2D)2PCA and GMPTR+NLDA. For the Gabor magnitude based subspace learners, downsampling with a rate of 2×2 = 4 is applied to each magnitude image to solve the storage problem. With the detailed experimental settings, the recognition rates, standard deviations, and corresponding dimensions of the reduced feature spaces are shown in Table 3. Bold values indicate that the corresponding algorithms achieve the best performances under the specific condition, as in the later tables.

Table 3. Recognition accuracy (mean±std) % of ten methods conducted on the FERET face database.

Method            k = 1                k = 2                k = 3
PCA               32.19±13.16 (52)     42.79±7.02 (65)      55.11±12.46 (78)
LDA               –                    37.05±5.35 (14)      48.23±12.49 (12)
KDA               –                    42.30±7.12 (399)     54.75±12.97 (599)
Gabor+PCA         38.45±11.88 (79)     49.80±6.45 (90)      61.01±12.27 (88)
Gabor+LDA         –                    41.25±7.64 (62)      54.64±14.39 (70)
Gabor+KDA         –                    51.40±5.94 (399)     62.80±12.39 (599)
MGFR+(2D)2PCA     43.62±8.34 (6×12)    55.79±5.00 (6×7)     67.93±12.15 (6×9)
GMPTR+NLDA        –                    45.12±9.38 (199)     60.55±14.14 (199)
MCPHM             32.24±12.53 (1200)   43.77±7.84 (1200)    56.31±14.19 (1200)
Proposed          43.27±8.42           56.90±9.16           68.55±10.79


One can observe that our method gains the highest average recognition rates when two or three images of each person are randomly selected for training in 10-fold cross-validation. Its performance is also superior to that of all the other algorithms except MGFR+(2D)2PCA in the case where a single image of each person is randomly selected for training in 7-fold cross-validation; there, the average recognition rate of our algorithm is slightly lower than that of MGFR+(2D)2PCA. The use of raw gray value features makes the performances of PCA, LDA and KDA inferior to those of the counterparts based on magnitude features. Because GMPTR+NLDA ignores the geometries of the Gamma family and the generalized Gaussian family underlying the GMPTR patterns, the algorithm achieves only moderate performance when two or three images of each person are randomly used as training samples. An evident drawback of MCPHM is its neglect of the spatial information in the magnitude images. As a result, the algorithm only outperforms PCA, LDA, KDA and Gabor+LDA, while it is comparable to GMPTR+NLDA.

4.3. Palmprint recognition on the PolyU database

The PolyU palmprint database contains 600 gray-scale images of 100 different palms, with six samples for each palm. The six samples of each palm were collected in two sessions: the first three samples were captured in the first session, and the other three in the second. The average interval between the first and second collections was two months. When palmprints are collected in different sessions, the direction and amount of stretching of a palm may vary, so even palmprints from the same palm may exhibit a little rotation and translation. In addition, palms differ in size, and the lighting, translation and orientation conditions in the two sessions are very different. All samples of the first two palms are shown in Fig. 11.

According to the aforementioned experimental settings, the resulting average recognition rates, standard deviations and corresponding feature dimensions are shown in Table 4.

Fig. 11. The complete samples of two palms of the PolyU palmprint database.

Table 4. Recognition accuracy (mean±std) % of ten methods conducted on the PolyU database.

Method           Number of training images k
                 1                  2                   3
PCA              25.10±6.02 (78)    45.70±9.23 (78)     58.87±8.94 (80)
LDA              –                  44.85±9.65 (200)    54.10±7.78 (300)
KDA              –                  46.03±9.30 (199)    58.93±8.26 (299)
Gabor+PCA        25.70±6.52 (78)    47.45±9.93 (100)    61.10±8.71 (100)
Gabor+LDA        –                  46.05±9.15 (17)     58.77±8.68 (17)
Gabor+KDA        –                  47.63±9.80 (399)    61.23±8.81 (299)
MGFR+(2D)²PCA    26.10±6.27 (7×7)   47.83±9.99 (9×6)    61.53±8.93 (11×6)
GMPTR+NLDA       –                  49.97±10.11 (99)    63.53±6.35 (99)
MCPHM            26.93±6.74 (1200)  50.38±10.10 (1200)  64.87±10.10 (1200)
Proposed         27.57±6.69         51.10±10.11         65.30±10.21

In terms of recognition rate, our method achieves the best performance in all test cases. In fact, the subtle texture differences among palmprints account for their discriminability. Accordingly, the methods based on characterizing Gabor texture, including our method, MGFR+(2D)²PCA, GMPTR+NLDA and MCPHM, all obtain relatively good recognition results. Among the three raw gray-value feature-based methods, the kernel subspace method KDA is the best performer when two or three images per individual are used as training samples. Compared to KDA, PCA obtains a slightly worse result, but it is still superior to LDA. Similar circumstances appear in the recognition results of the three Gabor feature-based subspace methods. Clearly, the use of Gabor features makes subspace methods more competitive than the use of raw pixel features.

4.4. Object recognition on the COIL-20 database

COIL-20 is a database consisting of 1440 grayscale images of 20 objects. The objects were placed on a motorized turntable against a black background, and images were taken at pose intervals of 5 degrees, corresponding to 72 poses per object. Samples of the 20 objects of this database are shown in Fig. 12.

As mentioned above, we randomly choose k (k=1, 3, 5) images per object of the database for training, and the rest are used for testing. Table 5 records the recognition results obtained by 10-fold cross-validation.

We can see that the shape characteristics of the objects in COIL-20 are evident and discriminative in addition to their texture characteristics. Although our method highlights the use of multispectral magnitude texture rather than object shape, the excellent image descriptor MGMCMs makes it the best performer on this database in terms of recognition accuracy. The Gabor texture based algorithms MGFR+(2D)²PCA and GMPTR+NLDA are comparable and obtain relatively good results; however, their performances are merely comparable to those of the Gabor feature-based subspace methods Gabor+PCA and Gabor+KDA.



Fig. 12. Samples of 20 objects from the COIL-20 database.

Fig. 13. Samples of four buildings of ZuBuD.

Table 5. Recognition accuracy (mean±std) % of ten methods conducted on the COIL-20 database.

Method           Number of training images k
                 1                  3                  5
PCA              63.69±2.48 (19)    80.19±4.00 (12)    86.13±3.18 (17)
LDA              –                  77.75±4.95 (15)    83.63±3.53 (18)
KDA              –                  79.36±4.34 (59)    84.28±3.27 (99)
Gabor+PCA        66.69±2.23 (19)    83.36±4.20 (38)    87.40±2.81 (25)
Gabor+LDA        –                  79.27±3.84 (16)    84.43±3.50 (18)
Gabor+KDA        –                  83.47±4.17 (59)    87.55±3.02 (99)
MGFR+(2D)²PCA    64.66±3.56 (6×12)  83.75±4.82 (9×6)   86.62±2.88 (11×6)
GMPTR+NLDA       –                  82.07±3.81 (19)    86.85±2.89 (19)
MCPHM            66.12±3.05 (1200)  80.47±2.92 (1200)  83.93±2.93 (1200)
Proposed         70.78±2.91         87.25±3.61         89.13±3.59

Table 6. Recognition accuracy (mean±std) % of eleven methods conducted on the Zurich buildings database.

Method           Number of training images k
                 1                  2                  3
PCA              41.64±6.16 (19)    56.04±6.50 (25)    64.53±7.09 (26)
LDA              –                  48.09±5.73 (31)    55.82±6.54 (36)
KDA              –                  49.18±6.23 (401)   57.16±6.76 (602)
Gabor+PCA        49.10±7.17 (49)    62.80±6.74 (40)    70.85±6.52 (40)
Gabor+LDA        –                  58.35±8.45 (100)   67.06±7.86 (110)
Gabor+KDA        –                  66.35±7.29 (401)   74.20±6.98 (602)
MGFR+(2D)²PCA    58.03±8.56 (6×6)   72.99±7.42 (9×6)   81.02±7.85 (6×12)
GMPTR+NLDA       –                  50.25±6.90 (200)   60.22±8.33 (200)
MCPHM            54.45±8.33 (1200)  69.24±8.85 (1200)  77.24±7.27 (1200)
CCMCFM           42.09±4.43 (48)    47.63±5.10 (48)    54.98±6.13 (48)
Proposed         62.66±8.24         77.11±8.23         84.65±7.53


MCPHM fails to achieve satisfactory recognition results in the cases using two or three samples per subject for learning, regardless of its moderate performance when training on a single sample per subject. Still, the Gabor feature-based methods Gabor+PCA, Gabor+LDA and Gabor+KDA outperform their respective opponents PCA, LDA and KDA with higher recognition rates.

4.5. Building recognition on the ZuBuD database

The Zurich buildings database (ZuBuD) contains 1005 color images of 201 buildings (5 images per building). Each of the buildings has been photographed from five different viewpoints, and the 1005 images were captured in different seasons and under different weather. Representative samples of four buildings are shown in Fig. 13.

On this database, we compare our algorithm with PCA, LDA, KDA, Gabor+PCA, Gabor+LDA, Gabor+KDA, MGFR+(2D)²PCA, GMPTR+NLDA, MCPHM and CCMCFM for performance evaluation. Except for CCMCFM, which adopts the original color images to extract Haralick features from CCMs, the algorithms are implemented on the transformed gray image set. The recognition rates, standard deviations and corresponding feature dimensions are shown in Table 6.
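
As a rough illustration of the co-occurrence machinery that CCMCFM (and, channel-wise, our MGMCMs) builds on, the sketch below computes a normalized single-channel co-occurrence matrix for a quantized image and two classical Haralick statistics, contrast and energy. The offset, level count and feature choice here are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np

def cooccurrence(img, levels, offset=(0, 1)):
    """Normalized co-occurrence matrix of a quantized image for one
    spatial offset (dy, dx): entry (a, b) is the frequency of level a
    co-occurring with level b at that displacement."""
    dy, dx = offset
    h, w = img.shape
    C = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            C[img[y, x], img[y + dy, x + dx]] += 1
    return C / C.sum()

def haralick_contrast(C):
    i, j = np.indices(C.shape)
    return ((i - j) ** 2 * C).sum()

def haralick_energy(C):
    return (C ** 2).sum()

img = np.random.randint(0, 8, size=(64, 64))  # toy 8-level image
C = cooccurrence(img, levels=8)
print(haralick_contrast(C), haralick_energy(C))
```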


We find that our method achieves the best performance and holds a significant advantage over the other algorithms in terms of recognition rates. Among the gray value-based learning algorithms, LDA and KDA do not obtain satisfactory recognition results, and their performances are inferior to that of PCA. The situation is very different when these algorithms are applied to concatenated magnitude features: Gabor+KDA then outperforms Gabor+PCA with higher recognition rates, while Gabor+LDA narrows the performance difference relative to PCA. We note that MCPHM and MGFR+(2D)²PCA obtain promising performances on this database, reaching recognition rates of 77.24% and 81.02% when three sample images per building are used for training. GMPTR+NLDA, in contrast, suffers a clear performance recession. Evidently, CCMCFM underperforms here in terms of recognition rates. This result shows that the spatial co-occurrence information of the Gabor magnitude textures of buildings is very discriminative, even though the color information is entirely neglected. Combined with the previous results, the prominent performance of our method benefits from the discriminative descriptor MGMCMs and the geometry-adaptive information integration system.

5. Conclusions and future work

In this paper, we present a novel object recognition approach using the co-occurrence similarities of multichannel Gabor magnitude textures. The novel image descriptor MGMCMs is designed for magnitude texture description and similarity matching. In view of the biased characteristics of the Gabor magnitude distribution, we design an exploratory learning algorithm that obtains channel-adaptive transforms to rescale magnitude images for discriminative and robust MGMCMs extraction. The direction-based grouping scheme is adopted for computational efficiency, while three neighborhood modes are proposed to extract the local magnitude co-occurrence information of a center pixel and its nearest neighbor pixels. The p-order Minkowski co-occurrence information metric is designed for similarity matching between sets of co-occurrence frequency-based MGMCMs. The satisfactory recognition results on the tested object databases indicate the following. First, the descriptor MGMCMs is discriminative, and the proposed p-metric can serve as a competent information divergence integration system for sets of MGMCMs. Second, the formulation of our algorithm is appropriate, as performance and complexity are considered simultaneously. Third, our approach is competitive when applied to different types of object databases, because MGMCMs integrate the merits of both robust Gabor magnitude features and discriminative magnitude textures.
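
To make the matching step concrete: on the multinomial manifold with the Fisher information metric, the geodesic distance between two discrete distributions P and Q is the standard $2\arccos(\sum_i \sqrt{p_i q_i})$, and the p-order metric combines the per-channel factor distances in a Minkowski sum, in the spirit of Eq. (20). The following sketch captures this structure under simplifying assumptions (one distribution per channel; the rescaling and truncation details are omitted):

```python
import numpy as np

def geodesic(P, Q):
    # Fisher geodesic distance on the multinomial simplex:
    # 2 * arccos of the Bhattacharyya coefficient of P and Q.
    bc = np.clip(np.sqrt(P * Q).sum(), 0.0, 1.0)
    return 2.0 * np.arccos(bc)

def d_p(mgmcm_a, mgmcm_b, p=2.0):
    # p-order Minkowski combination of the factor geodesic distances;
    # each argument is a list of co-occurrence distributions, one per
    # (scale, direction-pair) channel.
    d = np.array([geodesic(Pa, Pb) for Pa, Pb in zip(mgmcm_a, mgmcm_b)])
    return (d ** p).sum() ** (1.0 / p)

# Toy usage: two descriptors with three channels of 16-bin frequencies.
rng = np.random.default_rng(0)
A = [rng.dirichlet(np.ones(16)) for _ in range(3)]
B = [rng.dirichlet(np.ones(16)) for _ in range(3)]
print(d_p(A, B, p=1.0), d_p(A, B, p=2.0))
```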

For further study, we plan to generalize the method to a supervised framework and to exploit other applications such as image retrieval.

Acknowledgments

The authors thank the editors and the anonymous reviewers for providing constructive suggestions and valuable comments, which considerably improved the quality of this paper. This work was supported by the National Natural Science Foundation of China (Grant Nos. 90820004, 60472061, 60632050, 71171003), the National 863 Project (Grant Nos. 2006AA04Z238, 2006AA01Z119), the Ministry of Industry and Information Technology of China (Grant No. E0310/1112/JC01), and the Anhui Natural Science Foundation (Grant No. KJ2011B022).

Appendix A. Minkowski's sum inequality [43]

If $p > 1$ and $x_i, y_i > 0$, then Minkowski's sum inequality states that

$$\left(\sum_{i=1}^{n} |x_i + y_i|^p\right)^{1/p} \le \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |y_i|^p\right)^{1/p}. \tag{21}$$

Equality holds if and only if the sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are proportional.
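
A quick numerical check of Eq. (21) on randomly drawn positive sequences (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, p = rng.random(10) + 0.1, rng.random(10) + 0.1, 3.0
lhs = (np.abs(x + y) ** p).sum() ** (1 / p)
rhs = (np.abs(x) ** p).sum() ** (1 / p) + (np.abs(y) ** p).sum() ** (1 / p)
assert lhs <= rhs  # Minkowski's sum inequality
print(lhs, rhs)
```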

Appendix B

Proposition 1. Let $\mathrm{MGMCM}_K = \{\mathrm{vec}(\mathrm{MGMCM}(I_1)), \mathrm{vec}(\mathrm{MGMCM}(I_2)), \ldots, \mathrm{vec}(\mathrm{MGMCM}(I_K))\}$ be the set of co-occurrence frequency-based vectorized forms of the MGMCMs of $K$ images $I_1, \ldots, I_K$, whose element $\mathrm{vec}(\mathrm{MGMCM}(I_i))$ is described in Eq. (19). Then the $p$-order Minkowski extended geodesic distance $d_p(\cdot,\cdot)$ defined in Eq. (20) is a metric of $\mathrm{MGMCM}_K \subset \prod\mathbb{P} = \prod_{u=0}^{U-1}\prod_{v \neq v'}\mathbb{P}_{n_{v,v'|u}}$, where $n_{v,v'|u} = W_{v,u}\,W_{v',u} - 1$ and $v, v' \in \{0, 1, \ldots, V-1\}$.

Proof. First, we compactify $\prod\mathbb{P}$ to an approximate product simplex closure by rounding each factor $\mathbb{P}_{n_{v,v'|u}}$ in turn, where the "rounding" procedure is detailed in [21]. For any $\varepsilon > 0$, let the $\varepsilon$-rounded $\mathbb{P}_{n_{v,v'|u}}$ be $\mathbb{P}^{\varepsilon}_{n_{v,v'|u}}$, which consists of points having at most one zero-probability component. In this way, $\mathbb{P}^{\varepsilon}_{n_{v,v'|u}}$ is a geodesically complete compact manifold under the same Fisher–Riemannian metric as on $\mathbb{P}_{n_{v,v'|u}}$ [21]. Thus we obtain the compactified product manifold $\big(\prod\mathbb{P}^{\varepsilon} = \prod_{u=0}^{U-1}\prod_{v \neq v'}\mathbb{P}^{\varepsilon}_{n_{v,v'|u}},\, d_p\big)$, where the $p$-order Minkowski geodesic distance metric $d_p$ is built from the factor geodesic distances $\mathrm{dist}_{v,v'|u}(\cdot,\cdot)$ on $\mathbb{P}^{\varepsilon}_{n_{v,v'|u}}$, with $u \in \{0, 1, \ldots, U-1\}$ and $v, v' \in \{0, 1, \ldots, V-1\}$.

Second, we approximate the product co-occurrence models of $\mathrm{MGMCM}_K$ with points on $\prod\mathbb{P}^{\varepsilon}$ by the injection $\rho_\varepsilon : \mathrm{MGMCM}_K \to \prod\mathbb{P}^{\varepsilon}$,

$$\mathrm{vec}(\mathrm{MGMCM}(I_i)) \mapsto
\begin{cases}
\mathrm{vec}(\mathrm{MGMCM}(I_i)), & \text{if } \mathrm{vec}(\mathrm{MGMCM}(I_i)) \in \prod\mathbb{P}^{\varepsilon},\\
\overline{\mathrm{vec}}(\mathrm{MGMCM}(I_i)) \in \prod_{v,v'|u}\Big(B_\varepsilon\big(\mathrm{vec}\big(C^{I_i}_{v,v'|u}\big)\big) \cap \mathbb{P}^{\varepsilon}_{n_{v,v'|u}}\Big), & \text{otherwise},
\end{cases} \tag{22}$$

where $B_\varepsilon(\mathrm{vec}(C^{I_i}_{v,v'|u}))$ is the $\varepsilon$-ball whose center is $\mathrm{vec}(C^{I_i}_{v,v'|u}) \in \mathbb{P}_{n_{v,v'|u}}$ under the Euclidean metric of $\mathbb{R}^{n_{v,v'|u}+1}$. For any $\mathrm{vec}(\mathrm{MGMCM}(I_i))$, $\mathrm{vec}(\mathrm{MGMCM}(I_j))$, $\mathrm{vec}(\mathrm{MGMCM}(I_k)) \in \mathrm{MGMCM}_K$, the definition and continuity of $\widetilde{\mathrm{dist}}(\cdot,\cdot)$ described in Section 3.4.1 and the definition of $\mathrm{dist}(\cdot,\cdot)$ imply

$$\begin{aligned}
d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_i)), \mathrm{vec}(\mathrm{MGMCM}(I_k))\big) &= d_p\big(\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_i))), \rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_k)))\big),\\
d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_k)), \mathrm{vec}(\mathrm{MGMCM}(I_j))\big) &= d_p\big(\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_k))), \rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_j)))\big),\\
d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_i)), \mathrm{vec}(\mathrm{MGMCM}(I_j))\big) &= d_p\big(\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_i))), \rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_j)))\big)
\end{aligned} \tag{23}$$

when $\varepsilon$ is small enough.

Finally, we prove that $d_p(\cdot,\cdot)$ is a metric of $\mathrm{MGMCM}_K$ by showing only that it satisfies the triangle inequality, since the non-negativity and symmetry of the bivariate vector function hold directly. To this end, the sum

$$d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_i)), \mathrm{vec}(\mathrm{MGMCM}(I_k))\big) + d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_k)), \mathrm{vec}(\mathrm{MGMCM}(I_j))\big) \tag{24}$$

is first expressed, via Eq. (23), as

$$\Big(\sum_{v,v'|u}\mathrm{dist}^p\big(\mathrm{vec}\big(\bar{C}^{I_i}_{v,v'|u}\big), \mathrm{vec}\big(\bar{C}^{I_k}_{v,v'|u}\big)\big)\Big)^{1/p} + \Big(\sum_{v,v'|u}\mathrm{dist}^p\big(\mathrm{vec}\big(\bar{C}^{I_k}_{v,v'|u}\big), \mathrm{vec}\big(\bar{C}^{I_j}_{v,v'|u}\big)\big)\Big)^{1/p}. \tag{25}$$

Note that $\mathrm{dist}(\cdot,\cdot)$ is a metric of $\mathbb{P}^{\varepsilon}_{n_{v,v'|u}}$ that satisfies the triangle inequality, so $d_1(\cdot,\cdot)$ satisfies the triangle inequality. For $p > 1$, Minkowski's sum inequality (Appendix A) and the factor triangle inequalities bound the sum in Eq. (25) from below:

$$\begin{aligned}
(25) &\geq \Big(\sum_{v,v'|u}\Big[\mathrm{dist}\big(\mathrm{vec}\big(\bar{C}^{I_i}_{v,v'|u}\big), \mathrm{vec}\big(\bar{C}^{I_k}_{v,v'|u}\big)\big) + \mathrm{dist}\big(\mathrm{vec}\big(\bar{C}^{I_k}_{v,v'|u}\big), \mathrm{vec}\big(\bar{C}^{I_j}_{v,v'|u}\big)\big)\Big]^p\Big)^{1/p}\\
&\geq \Big(\sum_{v,v'|u}\mathrm{dist}^p\big(\mathrm{vec}\big(\bar{C}^{I_i}_{v,v'|u}\big), \mathrm{vec}\big(\bar{C}^{I_j}_{v,v'|u}\big)\big)\Big)^{1/p}\\
&= d_p\big(\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_i))), \rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_j)))\big)\\
&= d_p\big(\mathrm{vec}(\mathrm{MGMCM}(I_i)), \mathrm{vec}(\mathrm{MGMCM}(I_j))\big),
\end{aligned} \tag{26}$$

where the last equality again uses Eq. (23). Thus $d_p(\cdot,\cdot)$ ($p \geq 1$) satisfies the triangle inequality. In Eqs. (25) and (26),

$$\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_i))) = \Big(\mathrm{vec}\big(\bar{C}^{I_i}_{v,v'|u}\big) : u \in \{0, 1, \ldots, U-1\};\ (v,v') \in \{0, 1, \ldots, V-1\} \times \{0, 1, \ldots, V-1\}\Big) \tag{27}$$

is the concatenated vectorized form whose component $\mathrm{vec}(\bar{C}^{I_i}_{v,v'|u}) \in \mathbb{P}^{\varepsilon}_{n_{v,v'|u}}$ is the counterpart of $\mathrm{vec}(C^{I_i}_{v,v'|u}) \in \mathbb{P}_{n_{v,v'|u}}$ under the mapping $\rho_\varepsilon$; $\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_j)))$ and $\rho_\varepsilon(\mathrm{vec}(\mathrm{MGMCM}(I_k)))$ give the analogous representations for images $I_j$ and $I_k$. So $d_p(\cdot,\cdot)$ is a metric on $\mathrm{MGMCM}_K$. □
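
The triangle inequality established above is also easy to probe numerically. The Monte Carlo sketch below checks it for the p-order combination of factor geodesic distances on a product of multinomial simplices, using the same factor distance as the earlier sketch; it samples interior points only, so the ε-rounding step of the proof is not exercised.

```python
import numpy as np

def geodesic(P, Q):
    # Fisher geodesic distance on the multinomial simplex.
    return 2.0 * np.arccos(np.clip(np.sqrt(P * Q).sum(), 0.0, 1.0))

def d_p(A, B, p):
    # Minkowski combination of factor distances over the product manifold.
    d = np.array([geodesic(Pa, Pb) for Pa, Pb in zip(A, B)])
    return (d ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(2)
for _ in range(1000):
    # Three random points on a product of four 10-bin simplices.
    Ai, Ak, Aj = ([rng.dirichlet(np.ones(10)) for _ in range(4)]
                  for _ in range(3))
    for p in (1.0, 2.0, 4.0):
        assert d_p(Ai, Aj, p) <= d_p(Ai, Ak, p) + d_p(Ak, Aj, p) + 1e-9
print("triangle inequality held in all trials")
```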

References

[1] D.B. Sharon, E. Shimon, Visual recognition and categorization on the basis of similarities to multiple class prototypes, International Journal of Computer Vision 33 (3) (1997) 201–228.
[2] P.M. Roth, M. Winter, Survey of Appearance-Based Methods for Object Recognition, Technical Report ICG-TR-01/08, Graz, January 15, 2008.
[3] R.W. Ehrich, J.P. Foith, A view of texture topology and texture description, Computer Graphics and Image Processing 8 (2) (1978) 174–202.
[4] M. Tsaneva, D. Petkov, Recognition of objects on the Earth's surface through texture analysis of satellite images, in: Proceedings of the Third Scientific Conference with International Participation Space, Ecology, Nanotechnology, Safety, Varna, Bulgaria, 27–29 June 2007.
[5] J. Daugman, Complete discrete 2D Gabor transform by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech, and Signal Processing 36 (7) (1988) 1169–1179.
[6] J.P. Jones, L.A. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, Journal of Neurophysiology 58 (6) (1987) 1233–1258.
[7] L.G. Zhang, D. Tjondronegoro, Selecting, optimizing and fusing 'salient' Gabor features for facial expression recognition, Lecture Notes in Computer Science 5863 (1) (2009) 724–732.
[8] C.J. Lee, S.D. Wang, Fingerprint feature extraction using Gabor filters, Electronics Letters 35 (4) (1999) 288–290.
[9] L. Yu, Z.S. He, Q. Cao, Gabor texture representation method for face recognition using the Gamma and generalized Gaussian models, Image and Vision Computing 28 (1) (2010) 177–187.
[10] C.J. Liu, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing 11 (4) (2002) 467–476.
[11] L.L. Shen, L. Bai, M. Fairhurst, Gabor wavelets and general discriminant analysis for face identification and verification, Image and Vision Computing 25 (5) (2007) 553–563.
[12] W. Mio, D. Badlyans, X.W. Liu, A computational approach to Fisher information geometry with applications to image analysis, in: Proceedings of EMMCVPR 2005, St. Augustine, FL, USA, Lecture Notes in Computer Science 3757 (2) (2005) 18–33.
[13] S. Amari, H. Nagaoka, Methods of Information Geometry, AMS, Oxford University Press, 2000.
[14] K. Arwini, L. Del Riego, C.T.J. Dodson, Universal connection and curvature for statistical manifold geometry, Houston Journal of Mathematics (2010) 1–13.
[15] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man, and Cybernetics 3 (6) (1973) 610–621.
[16] A. Rosenfeld, C.-Y. Wang, A.Y. Wu, Multispectral texture, IEEE Transactions on Systems, Man, and Cybernetics SMC-12 (1) (1982) 79–84.
[17] M. Werman, S. Peleg, A. Rosenfeld, A distance metric for multidimensional histograms, Computer Vision, Graphics, and Image Processing 32 (1985) 328–336.
[18] C. Palm, Color texture classification by integrative co-occurrence matrices, Pattern Recognition 37 (5) (2004) 965–976.
[19] V. Arvis, C. Debain, M. Berducat, A. Benassi, Generalization of the cooccurrence matrix for colour images: application to colour texture classification, Image Analysis and Stereology 23 (2004) 63–72.
[20] D. Muselet, L. Macaire, Combining color and spatial information for object recognition across illumination changes, Pattern Recognition Letters 28 (10) (2007) 1176–1185.
[21] G. Lebanon, Riemannian Geometry and Statistical Machine Learning, Ph.D. Thesis, Carnegie Mellon University, 2005.
[22] L. Wang, Y.P. Li, C.B. Wang, H.Z. Zhang, 2D Gaborface representation method for face recognition with ensemble and multichannel model, Image and Vision Computing 26 (2008) 820–828.
[23] G. Amayeh, A. Tavakkoli, G. Bebis, Accurate and efficient computation of Gabor features in real-time applications, in: Proceedings of the 5th International Symposium on Visual Computing, 2009, pp. 243–252.
[24] D. Malerba, F. Esposito, M. Monopoli, Comparing dissimilarity measures for probabilistic symbolic objects, in: Data Mining III, Series Management Information Systems, vol. 6, WIT Press, Southampton, UK, 2002, pp. 31–40.
[25] N.A. Thacker, F.J. Aherne, P.I. Rockett, The Bhattacharyya metric as an absolute similarity measure for frequency coded data, Kybernetika 34 (4) (1998) 363–368.
[26] Y. Tan, J. Nagamani, H. Lu, Modified Kolmogorov–Smirnov metric for shot boundary detection, Electronics Letters 39 (18) (2003) 1313–1315.
[27] Z. Zhang, Z. Li, M.S. Drew, Learning image similarities via probabilistic feature matching, in: Proceedings of ICIP, 2010, pp. 1857–1860.
[28] J.J. Collins, S.J. McKenna, S.G. Gong, Representation and invariance in face recognition, in: Proceedings of the European Workshop on Parallel Modelling of Neural Operators, Faro, Portugal, 2007, pp. 9–20.
[29] D.N. Joanes, C.A. Gill, Comparing measures of sample skewness and kurtosis, Journal of the Royal Statistical Society (Series D): The Statistician 47 (1) (1998) 183–189.
[30] L. Le Cam, Theory of Point Estimation, 2nd ed., Springer, 1998.
[31] D. Iakovidis, D. Maroulis, D. Bariamis, FPGA architecture for fast parallel computation of co-occurrence matrices, Microprocessors and Microsystems 31 (2) (2007) 160–165.
[32] Yale Face Database. Available from: <http://cvc.yale.edu/projects/yalefaces/yalefaces.html>.
[33] FERET Face Database. Available from: <http://www.nist.gov/humanid/feret>.
[34] PolyU Palmprint Database. Available from: <http://www4.comp.polyu.edu.hk/~icba/>.
[35] COIL-20 Database. Available from: <http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php>.
[36] Zurich Buildings Database. Available from: <http://www.vision.ee.ethz.ch/showroom/zubud/>.
[37] I.T. Jolliffe, Principal Component Analysis, Springer, New York, 1989.
[38] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley Interscience, 2004.
[39] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, K.R. Muller, Fisher discriminant analysis with kernels, in: Y.-H. Hu, J. Larsen, E. Wilson, S. Douglas (Eds.), Neural Networks for Signal Processing IX, IEEE, Piscataway, NJ, 1999, pp. 41–48.
[40] A. Materka, M. Strzelecki, Texture Analysis Methods – A Review, COST B11 Report, Institute of Electronics, Technical University of Lodz, Brussels, 1998.
[41] A. Vadivel, S. Sural, A.K. Majumdar, An integrated color and intensity co-occurrence matrix, Pattern Recognition Letters 28 (8) (2007) 974–983.
[42] J. Dillon, Y. Mao, G. Lebanon, J. Zhang, Statistical translation, heat kernel and expected distances, in: Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, vol. 2, 2007, pp. 93–100.
[43] G.H. Hardy, J.E. Littlewood, G. Polya, Inequalities, Cambridge Mathematical Library, 2nd ed., Cambridge University Press, Cambridge, 1952.

Jian Zou is a Ph.D. student in the Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. His current research interests include pattern recognition, computer vision and information statistics.

Chuan-Cai Liu is a professor in the Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. He serves on the robotics professional committee of the Chinese Association of Artificial Intelligence. His scientific interests include computer vision, pattern recognition and intelligent robots.

Yue Zhang is a Ph.D. student in the Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. His current research interests include pattern recognition, computer vision and stochastic processes.

Gui-Fu Lu is a Ph.D. student in the Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. His current research interests include computer vision, pattern recognition and image processing.