
Image clustering by hyper-graph regularized non-negative matrix factorization

Kun Zeng a, Jun Yu b,c,*, Cuihua Li a, Jane You c, Taisong Jin a

a Computer Science Department, School of Information Science and Engineering, Xiamen University, Xiamen, China
b School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
c Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

* Corresponding author. E-mail address: [email protected] (J. Yu).

ARTICLE INFO

Article history:
Received 14 September 2013
Received in revised form 30 November 2013
Accepted 26 January 2014
Communicated by Mingli Song
Available online 17 February 2014

Keywords:
Non-negative matrix factorization
Hyper-graph Laplacian
Image clustering
Dimension reduction
Manifold regularization

ABSTRACT

Image clustering is a critical step in applications such as content-based image retrieval, image annotation and other high-level image processing. To achieve these tasks, it is essential to obtain a proper representation of the images. Non-negative Matrix Factorization (NMF) learns a part-based representation of the data, which is in accordance with how the brain recognizes objects. Due to its psychological and physiological interpretation, NMF has been successfully applied in a wide range of applications, such as pattern recognition, image processing and computer vision. On the other hand, manifold learning methods discover the intrinsic geometrical structure of a high-dimensional data space. Incorporating a manifold regularizer into the standard NMF framework leads to improved performance. In this paper, we propose a novel algorithm, called Hyper-graph regularized Non-negative Matrix Factorization (HNMF), for this purpose. HNMF captures the intrinsic geometrical structure by constructing a hyper-graph instead of a simple graph. The hyper-graph model considers high-order relationships among samples and outperforms the simple graph model. Empirical experiments demonstrate the effectiveness of the proposed algorithm in comparison to state-of-the-art algorithms, especially related works based on NMF.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In order to improve the performance of applications such as image retrieval, image annotation and image indexing, image clustering is used to better represent and browse images. Finding a suitable data representation is a critical problem in many data analysis tasks [1,2,5–11]. Researchers have long sought data representations that make the intrinsic structure of the data explicit, so that further processing, such as clustering [5,6,10,11,40,43] and classification [11,39,41,42,44], can be applied. One of the most popular data representation techniques is matrix factorization. There are a number of matrix factorizations, including Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF) [5].

NMF learns a part-based representation of the data, which is in accordance with how the brain recognizes objects [12–14]. It has received much attention since being popularized by Lee and Seung [5], who contributed a simple and effective procedure. NMF seeks two non-negative matrices whose product best approximates the original matrix. The non-negativity constraint leads NMF to a part-based and sparse representation of the object, because it allows only additive combinations of the original data.

Manifold learning methods such as ISOMAP [15], Locally Linear Embedding (LLE) [16], Laplacian Eigenmap (LE) [17], Local Tangent Space Alignment (LTSA) [37] and Discriminative Locality Alignment (DLA) [36] are proposed to detect the underlying intrinsic structure of data according to the local invariance assumption that nearby points in the original space are likely to have similar embeddings. ISOMAP [15] preserves the global geodesic distances of all pairs of measurements. LLE [16] assumes that each sample can be represented by a linear combination of its neighbors, and that the relationship between the sample and its neighbors is preserved after dimensionality reduction. LE [17] assumes that the representations of two nearby data points in the intrinsic structure are also close to each other in the embedding space, and it preserves proximity relationships by constructing an undirected weighted graph which indicates the neighbor relations of pairwise measurements. LTSA [37] exploits local tangent information and aligns it to provide a global coordinate system. DLA [36] imposes discriminative information in the part optimization stage and preserves discriminative ability. High performance can be achieved if the geometrical structure is exploited and local invariance is considered. Laplacian regularization [6] and Hessian regularization [33–35] are usually used to discover the geometrical structure. In [34,35], Hessian regularization is presented for image annotation. A Hessian regularized support vector machine is presented in [34] for image annotation on the cloud and shows its effectiveness for large-scale image annotation. Multi-view Hessian regularization (mHR) [35] optimally combines multiple Hessian regularizations (HR) and steers the classification function to vary linearly along the data manifold.

In [6], a graph regularized NMF (GNMF) approach is proposed to encode the geometrical information of the data space. GNMF constructs a k-nearest-neighbor simple graph to enforce local invariance. However, the high-order relationship among samples is neglected in graph-based learning methods, which consider only the pairwise relationship between two samples. This problem has been addressed by hypergraph learning [24]. Unlike a graph, in which an edge connects two vertices, in a hypergraph a set of vertices is connected by a hyperedge. Modeling the high-order relationship among samples can significantly enhance classification performance. Yu et al. [26] proposed a novel hypergraph learning method that adaptively coordinates the weights of hyperedges. Hong et al. [3] incorporated multiple features into hypergraph learning. In [38], hypergraph learning was applied to web image reranking.

Inspired by the advantages of hyper-graph learning and NMF, we propose a novel algorithm named Hyper-graph Regularized Non-negative Matrix Factorization (HNMF), which incorporates a hyper-graph regularizer into the standard NMF framework. HNMF shows better performance because the hyper-graph regularizer exploits the intrinsic manifold structure of the sample data in the spirit of LE [17]. We encode the geometrical information of the data space by constructing a hyper-graph instead of a simple graph. HNMF finds data representations that are not only part-based but also sparse. In HNMF, we first construct a hypergraph by forming each hyperedge from the k nearest neighbors of the corresponding vertex. The weight of each hyperedge is calculated by accumulating the affinity measures between the corresponding vertex and its k nearest neighbors. Second, the hypergraph Laplacian regularizer is computed and added to the standard NMF framework. Last, an iterative multiplicative updating algorithm is used to solve the new problem, and a part-based representation is obtained.

The contributions of this paper are:

1) A novel algorithm named HNMF is proposed for image representation. To explore the intrinsic geometrical structure of the data space while considering high-order information, we incorporate a hyper-graph regularizer into the standard NMF framework. Hence, our algorithm is particularly applicable when the data is sampled from a submanifold embedded in a high-dimensional ambient space.

2) Our proposal is formulated as an optimization problem that is convex in each factor separately (though not in both jointly), and an iterative multiplicative updating algorithm is proposed to solve it. We demonstrate the convergence of the algorithm.

3) We conduct comprehensive experiments on three image databases to empirically analyze our algorithm. The experimental results demonstrate that our algorithm outperforms other methods, including K-means, PCA [18], NMF [5] and GNMF [6,7].

The rest of this paper is organized as follows. In Section 2, we introduce related work on NMF and hyper-graph learning. In Section 3, we present the novel HNMF algorithm in detail. Extensive experimental results on clustering are presented in Section 4. We conclude our work and discuss future work in Section 5.

2. Related work

2.1. Non-negative matrix factorization (NMF)

NMF is a matrix factorization method that seeks two non-negative matrices whose product best approximates the original matrix.

Given a matrix $X = [x_1, x_2, \cdots, x_N] \in \mathbb{R}^{M \times N}$, where each column $x_i$ is a sample vector, NMF aims to factorize the original matrix X into two non-negative matrices $B = [b_1, b_2, \cdots, b_K] \in \mathbb{R}^{M \times K}$ and $F = [f_1, f_2, \cdots, f_N] \in \mathbb{R}^{K \times N}$; in other words, each element of these matrices is greater than or equal to zero. The matrix factorization can be described as

$$X \approx BF \qquad (1)$$

The commonly used cost function that quantifies the quality of the approximation is the square of the Frobenius norm of the difference of the two matrices:

$$O = \|X - BF\|_F^2 \qquad (2)$$

From Eq. (1) we can obtain

$$x_i \approx \sum_j f_{ji} b_j, \qquad (3)$$

where $x_i$ is the i-th column of X, $b_j$ is the j-th column of B, and $f_{ji}$ is the (j, i)-th element of F. Thus, each sample vector can be approximated by a linear combination of the columns of B, weighted by the elements of F. Each column vector of F can be viewed as a new representation of the corresponding sample vector of X. Usually we have $K \ll M$ and $K \ll N$, so $f_i$ can be regarded as a reduced-dimension representation of $x_i$ with respect to the new basis B. If the basis can discover the intrinsic structure of the data, a good approximation can be achieved.
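To make the factorization concrete, the following is a minimal NumPy sketch of NMF under the standard multiplicative update rules (derived for our regularized setting in Section 3.2; with the regularization weight set to zero they reduce to exactly this form). The function name, the random initialization and the small eps guard against division by zero are illustrative choices, not part of the original formulation.

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF sketch: find non-negative B (M x K) and F (K x N) with X ~ BF."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    B = rng.random((M, K))
    F = rng.random((K, N))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of B and F non-negative.
        B *= (X @ F.T) / (B @ F @ F.T + eps)
        F *= (B.T @ X) / (B.T @ B @ F + eps)
    return B, F

# Usage: approximate a random non-negative matrix and report the residual.
X = np.abs(np.random.default_rng(1).standard_normal((64, 100)))
B, F = nmf(X, K=10)
print(np.linalg.norm(X - B @ F, "fro"))
```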

In the real world, many data types are non-negative, such as image and gene data; negative values are physically meaningless for such data. NMF incorporates the non-negativity constraint and learns a part-based representation. Because of its excellent performance, NMF is popular and widely used in many fields, such as machine learning, computer vision and image processing.

A number of NMF-based methods [6,7,10,11,19–22,30–32] have been proposed in recent years. NMF with Sparseness Constraints (NMFSC) [19] imposes a sparseness constraint to ensure the discovery of part-based representations. L1/2-NMF [20] incorporates an L1/2 sparseness constraint into the NMF framework and provides accurate results in hyperspectral unmixing. In [21], a method called Local NMF (LNMF) was proposed that imposes a localization constraint to learn spatially localized representations. Non-negative Local Coordinate Factorization (NLCF) [22] adds a local coordinate constraint, assuming that each data point can be represented by a linear combination of only a few nearby basis vectors. Constrained NMF (CNMF) [10] imposes label information on the objective function as hard constraints. GNMF [6,7] encodes the geometrical information of the data space by constructing a simple graph. It considers only the pairwise relationship between two samples, but ignores higher-order relationships, which may be critical to data analysis. Modeling the high-order relationship among samples can lead to performance improvements.

2.2. Hyper-graph

The concept of hyper-graph learning [3,4,23–26] is inspired by the theory of simple graphs. In a simple graph, an edge connects two vertices, and the weight of the edge only indicates the relationship between the two corresponding vertices. In reality, however, the information among larger groups of samples is critical. The edges of a hyper-graph can connect more than two nodes.

A hyper-graph G = (V, E, W) is composed of the vertex set V and the hyper-edge set E. Each hyper-edge e is a subset of V. W is a diagonal matrix which indicates the weights of the hyper-edges; the weight of hyper-edge e is denoted w(e). The incidence matrix H of G is defined as follows:

$$H(v, e) = \begin{cases} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e \end{cases} \qquad (4)$$


The degree of a vertex v is defined as

$$d(v) = \sum_{\{e \in E \mid v \in e\}} w(e) = \sum_{e \in E} w(e) H(v, e) \qquad (5)$$

The degree of a hyper-edge e is defined as

$$\delta(e) = |e| = \sum_{v \in V} H(v, e) \qquad (6)$$

We denote by $D_v$ the diagonal matrix whose elements are the vertex degrees and by $D_e$ the diagonal matrix whose elements are the hyper-edge degrees. According to [24], the unnormalized hyper-graph Laplacian matrix $L_{hyper} = [l_{ij}]$ can be calculated as

$$L_{hyper} = D_v - S, \qquad (7)$$

where $S = H W D_e^{-1} H^T$. For convenience, Table 1 lists the important notations used in this paper.
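As a small illustration, the hypergraph Laplacian of Eq. (7) can be assembled directly from the incidence matrix and the hyperedge weights; the sketch below follows the notation of Table 1 (the function name is our own).

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Unnormalized hypergraph Laplacian L = Dv - S, with S = H W De^{-1} H^T (Eq. (7)).

    H: |V| x |E| incidence matrix (Eq. (4)); w: vector of hyperedge weights.
    """
    Dv = np.diag(H @ w)                    # vertex degrees d(v), Eq. (5)
    De_inv = np.diag(1.0 / H.sum(axis=0))  # inverse hyperedge degrees, Eq. (6)
    S = H @ np.diag(w) @ De_inv @ H.T
    return Dv - S
```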

3. Hyper-graph regularized non-negative matrix factorization

In this section, we introduce the details of the Hyper-graph regularized Non-negative Matrix Factorization (HNMF) algorithm.

3.1. Objective function

Recall that NMF tries to seek a set of basis vectors that is used to approximate the data. Each data point $x_i$ can be represented as a new reduced-dimension coordinate vector $f_i$ with respect to the basis matrix B. If the basis matrix respects the intrinsic geometry, good approximation performance can be achieved. Manifold learning methods find the underlying manifold structure on the basis of the local invariance assumption that the representations of two nearby data points in the intrinsic structure are also close to each other in the embedding space.

Laplacian Eigenmap (LE) [17] is one of the classical manifold learning algorithms, but it is based on a simple graph, which only considers the relation between two vertices. A hyper-graph takes the relationship among three or more vertices into consideration. In our new algorithm HNMF, a hyper-graph Laplacian regularization term is added to the objective function.

By incorporating the hyper-graph Laplacian regularization term into the standard NMF objective function, we get the following minimization problem:

$$O = \|X - BF\|_F^2 + \frac{\alpha}{2} \sum_{e \in E} \sum_{(i,j) \in e} \frac{w(e)}{\delta(e)} \|f_i - f_j\|^2 \qquad (8)$$

which can be rewritten as follows:

$$O = \|X - BF\|_F^2 + \alpha\,\mathrm{Tr}(F L_{hyper} F^T) \qquad (9)$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. The objective function is not convex in B and F jointly. In the following, we describe an iterative updating algorithm that obtains a local optimum of the objective function.
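The equivalence of the pairwise form in Eq. (8) and the trace form in Eq. (9) can be checked numerically; the short sketch below is our own illustration, reading the sum over $(i,j) \in e$ as running over ordered pairs of vertices in e, and it confirms the identity on a random hypergraph.

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_e, k = 8, 5, 3
H = (rng.random((n_v, n_e)) < 0.5).astype(float)
H[:2, :] = 1.0                       # ensure every hyperedge is non-empty
w = rng.random(n_e) + 0.1
F = rng.random((k, n_v))

delta = H.sum(axis=0)                # hyperedge degrees, Eq. (6)
Dv = np.diag(H @ w)                  # vertex degrees, Eq. (5)
S = H @ np.diag(w) @ np.diag(1.0 / delta) @ H.T
L = Dv - S                           # Eq. (7)

# Pairwise form of the regularizer in Eq. (8), without the factor alpha.
pairwise = 0.0
for e in range(n_e):
    verts = np.flatnonzero(H[:, e])
    for i in verts:
        for j in verts:
            pairwise += 0.5 * w[e] / delta[e] * np.sum((F[:, i] - F[:, j]) ** 2)

trace_form = np.trace(F @ L @ F.T)   # trace form in Eq. (9)
assert np.isclose(pairwise, trace_form)
```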

It should be noticed that we construct the hyper-graph by initializing each hyper-edge's weight according to the rules used in [27]. First, a $|V| \times |V|$ affinity matrix A is calculated:

$$A_{ij} = \exp\left(-\frac{\|v_i - v_j\|^2}{\sigma^2}\right) \qquad (10)$$

where σ is the average distance among all vertices. Second, the hyper-edge $e_i$ is constructed as $knn(v_i, p) \cup \{v_i\}$, where $v_i \in V$ and $knn(v_i, p)$ is the set of the p nearest neighbors of $v_i$. Finally, the initial weight of each hyper-edge is calculated as

$$W_i = \sum_{v_j \in e_i} A_{ij} \qquad (11)$$
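A sketch of this construction is given below, assuming the reading that σ is the average pairwise distance between samples and that each hyperedge $e_i$ contains $v_i$ together with its p nearest neighbors; the helper name build_hypergraph is our own.

```python
import numpy as np

def build_hypergraph(X, p):
    """Build the incidence matrix H and hyperedge weights w (Eqs. (10)-(11)).

    X: M x N data matrix whose columns are samples; hyperedge e_i connects
    sample i with its p nearest neighbors.
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    D = np.sqrt(D2)
    sigma = D.sum() / (N * (N - 1))          # average pairwise distance
    A = np.exp(-D2 / sigma ** 2)             # affinity matrix, Eq. (10)
    H = np.zeros((N, N))
    w = np.zeros(N)
    for i in range(N):
        nn = np.argsort(D[i])[: p + 1]       # v_i itself (distance 0) plus p neighbors
        H[nn, i] = 1.0
        w[i] = A[i, nn].sum()                # initial hyperedge weight, Eq. (11)
    return H, w
```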

3.2. Updating rules

Noticing that $\|A\|_F^2 = \mathrm{Tr}(AA^T)$, we have

$$O = \mathrm{Tr}((X - BF)(X - BF)^T) + \alpha\,\mathrm{Tr}(F L_{hyper} F^T) = \mathrm{Tr}(XX^T - 2BFX^T + BFF^TB^T) + \alpha\,\mathrm{Tr}(F L_{hyper} F^T) \qquad (12)$$

Let $\psi_{ik}$ and $\phi_{kj}$ be the Lagrange multipliers for the constraints $b_{ik} \ge 0$ and $f_{kj} \ge 0$, respectively. Defining the matrices $\Psi = [\psi_{ik}]$ and $\Phi = [\phi_{kj}]$, the Lagrange function Q is

$$Q = \mathrm{Tr}(XX^T - 2BFX^T + BFF^TB^T) + \alpha\,\mathrm{Tr}(F L_{hyper} F^T) + \mathrm{Tr}(\Psi B^T) + \mathrm{Tr}(\Phi F^T) \qquad (13)$$

The partial derivatives of Q with respect to B and F are

$$\frac{\partial Q}{\partial B} = -2XF^T + 2BFF^T + \Psi \qquad (14)$$

$$\frac{\partial Q}{\partial F} = -2B^TX + 2B^TBF + 2\alpha F L_{hyper} + \Phi \qquad (15)$$

Using the KKT conditions $\psi_{ik} b_{ik} = 0$ and $\phi_{kj} f_{kj} = 0$, we get the following equations for $b_{ik}$ and $f_{kj}$:

$$-(XF^T)_{ik} b_{ik} + (BFF^T)_{ik} b_{ik} = 0 \qquad (16)$$

$$(B^TBF)_{kj} f_{kj} + (\alpha F D_v)_{kj} f_{kj} = (B^TX)_{kj} f_{kj} + (\alpha F S)_{kj} f_{kj} \qquad (17)$$

These equations lead to the following update rules:

$$b_{ik} \leftarrow b_{ik} \frac{(XF^T)_{ik}}{(BFF^T)_{ik}} \qquad (18)$$

$$f_{kj} \leftarrow f_{kj} \frac{(B^TX + \alpha FS)_{kj}}{(B^TBF + \alpha FD_v)_{kj}} \qquad (19)$$
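The update rules translate directly into code. The sketch below is a minimal NumPy rendering of Eqs. (18) and (19); the function name, the random initialization and the eps guard against division by zero are our own illustrative choices. $D_v$ and S are the matrices from Eq. (7), for example as produced from the build_hypergraph sketch in Section 3.1 via $D_v = \mathrm{diag}(Hw)$ and $S = H\,\mathrm{diag}(w)\,D_e^{-1}H^T$.

```python
import numpy as np

def hnmf(X, K, Dv, S, alpha=100.0, n_iter=200, eps=1e-9, seed=0):
    """HNMF sketch: multiplicative updates of Eqs. (18) and (19)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    B = rng.random((M, K))
    F = rng.random((K, N))
    for _ in range(n_iter):
        B *= (X @ F.T) / (B @ F @ F.T + eps)                                   # Eq. (18)
        F *= (B.T @ X + alpha * F @ S) / (B.T @ B @ F + alpha * F @ Dv + eps)  # Eq. (19)
    return B, F
```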

3.3. Connection with the gradient descent method

The update rules in Eqs. (18) and (19) can be seen as special cases of the gradient descent algorithm with automatic step-size selection. For our problem, gradient descent leads to the following update rules:

$$b_{ik} \leftarrow b_{ik} + \eta_{ik} \frac{\partial O}{\partial b_{ik}}, \qquad f_{kj} \leftarrow f_{kj} + \delta_{kj} \frac{\partial O}{\partial f_{kj}}, \qquad (20)$$

where $\eta_{ik}$ and $\delta_{kj}$ are the step-size parameters.

Table 1
Important notations used in this paper.

Notation                  Description
G = (V, E, W)             Representation of a hypergraph, where V and E indicate the sets of vertices and hyperedges, respectively
Dv                        The diagonal matrix of the vertex degrees
De                        The diagonal matrix of the hyperedge degrees
H                         The incidence matrix of the hypergraph
W                         The diagonal weight matrix; its (i,i)-th element is the weight of the i-th hyperedge
L_hyper                   The constructed hyper-graph Laplacian matrix
δ(e)                      The degree of the hyperedge e
w(e)                      The weight of hyperedge e
X = [x1, x2, ..., xN]     Sample data matrix; each column xi is a sample vector
B = [b1, b2, ..., bK]     The basis matrix of NMF; each column bi is a basis vector
F = [f1, f2, ..., fN]     The new representation matrix of NMF; each column fi is the new representation of xi
K                         The number of factors
M                         The number of features
N                         The number of sample points
p                         The number of nearest neighbors used to construct a hyper-edge


Let $\eta_{ik} = -b_{ik} / (2(BFF^T)_{ik})$; then we have

$$b_{ik} + \eta_{ik} \frac{\partial O}{\partial b_{ik}} = b_{ik} - \frac{b_{ik}}{2(BFF^T)_{ik}} \frac{\partial O}{\partial b_{ik}} = b_{ik} - \frac{b_{ik}}{2(BFF^T)_{ik}} (-2XF^T + 2BFF^T)_{ik} = b_{ik} \frac{(XF^T)_{ik}}{(BFF^T)_{ik}} \qquad (21)$$

Similarly, let $\delta_{kj} = -f_{kj} / (2(B^TBF + \alpha FD_v)_{kj})$; then we have

$$f_{kj} + \delta_{kj} \frac{\partial O}{\partial f_{kj}} = f_{kj} - \frac{f_{kj}}{2(B^TBF + \alpha FD_v)_{kj}} \frac{\partial O}{\partial f_{kj}} = f_{kj} - \frac{f_{kj}}{2(B^TBF + \alpha FD_v)_{kj}} (-2B^TX + 2B^TBF + 2\alpha FL_{hyper})_{kj} = f_{kj} \frac{(B^TX + \alpha FS)_{kj}}{(B^TBF + \alpha FD_v)_{kj}} \qquad (22)$$

It follows that the update rules in Eqs. (18) and (19) are essentially gradient descent steps. The advantage of these update rules is the guaranteed non-negativity of B and F.

The proof of convergence via the auxiliary function approach [28] is the same as the convergence proof presented in [7]. An experiment is designed to show the convergence property of the proposed algorithm. As shown in Fig. 1, all three algorithms, i.e., NMF, GNMF and HNMF, converge quickly, within 200 iterations.
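A convergence curve like those in Fig. 1 can be reproduced for the sketches above by recording the objective of Eq. (9) after every update; the following illustrative variant of the earlier hnmf sketch does exactly that.

```python
import numpy as np

def hnmf_trace(X, K, Dv, S, alpha=100.0, n_iter=200, eps=1e-9, seed=0):
    """Run the HNMF updates while recording the objective of Eq. (9)."""
    rng = np.random.default_rng(seed)
    B = rng.random((X.shape[0], K))
    F = rng.random((K, X.shape[1]))
    L = Dv - S                             # hypergraph Laplacian, Eq. (7)
    history = []
    for _ in range(n_iter):
        B *= (X @ F.T) / (B @ F @ F.T + eps)
        F *= (B.T @ X + alpha * F @ S) / (B.T @ B @ F + alpha * F @ Dv + eps)
        R = X - B @ F
        history.append(np.sum(R * R) + alpha * np.trace(F @ L @ F.T))
    return B, F, history
```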

3.4. Computational complexity analysis

We discuss the computational cost of our proposed algorithm in comparison to standard NMF in this subsection. Arithmetic operation counts are used to measure the computational cost instead of big-O notation, because that notation is not precise enough to differentiate among the complexities of NMF, GNMF and HNMF.

[Fig. 1. Convergence curves of NMF, GNMF and HNMF. (a) USPS, (b) ORL and (c) Yale. Each panel plots the objective function value against the iteration number.]

The four operation abbreviations used in this paper are listed in Table 2. Table 3 counts the computational operations of each matrix multiplication. Based on the update rules, we also count the arithmetic operations of each iteration in HNMF and summarize the result in Table 4. The computational complexity of all three algorithms is O(MNK).

4. Experimental results

In this section, our proposed algorithm HNMF is used to cluster data in order to show its effectiveness compared with K-means, PCA, NMF and GNMF.

4.1. Evaluation metrics

We adopt two metrics to measure the clustering performance [6,29]. The first metric is accuracy (AC), which evaluates the percentage of correctly labeled samples. The second is normalized mutual information (NMI), which measures how similar the true labels and the obtained labels are. These two metrics are standard measures widely used for clustering. The clustering accuracy (AC) is defined as follows:

$$AC = \frac{\sum_{i=1}^{n} \delta(gnd_i, map(r_i))}{n} \qquad (23)$$

where n is the number of sample images in the database, $gnd_i$ is the label provided by the database, $r_i$ is the cluster label of sample $x_i$ obtained by applying our algorithm, $\delta(x, y)$ is the delta function, and $map(r_i)$ is the permutation mapping function that maps each cluster label $r_i$ to the equivalent label in the database.

Mutual information (MI) is widely used in clustering applications. It measures how similar two sets of clusters are. Given two sets of clusters C and C', their mutual information is defined as

$$MI(C, C') = \sum_{c_i \in C,\, c'_j \in C'} p(c_i, c'_j) \log \frac{p(c_i, c'_j)}{p(c_i)\,p(c'_j)} \qquad (24)$$

where $p(c_i)$ and $p(c'_j)$ are the probabilities that a sample x selected arbitrarily from the data set belongs to the clusters $c_i$ and $c'_j$, respectively, and $p(c_i, c'_j)$ is the joint probability that x belongs to both clusters at the same time. Letting H(C) and H(C') denote the entropies of C and C', respectively, we get the NMI metric

$$NMI(C, C') = \frac{MI(C, C')}{\max(H(C), H(C'))} \qquad (25)$$
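Both metrics translate into short routines. The sketch below assumes integer labels starting at 0; the permutation mapping map(·) in Eq. (23) is realized, as is common practice, with the Hungarian algorithm from SciPy, and the NMI of Eqs. (24) and (25) is computed from the empirical joint distribution. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def accuracy(gnd, r):
    """AC of Eq. (23): best one-to-one mapping of cluster labels to class labels."""
    gnd, r = np.asarray(gnd), np.asarray(r)
    k = int(max(gnd.max(), r.max())) + 1
    count = np.zeros((k, k))
    for a, b in zip(r, gnd):
        count[a, b] += 1
    row, col = linear_sum_assignment(-count)   # maximize matched samples
    return count[row, col].sum() / len(gnd)

def nmi(c1, c2):
    """NMI of Eqs. (24)-(25), normalized by max(H(C), H(C'))."""
    c1, c2 = np.asarray(c1), np.asarray(c2)
    joint = np.zeros((c1.max() + 1, c2.max() + 1))
    for a, b in zip(c1, c2):
        joint[a, b] += 1
    joint /= len(c1)                           # empirical joint distribution
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(p1, p2)[nz]))
    h1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
    h2 = -np.sum(p2[p2 > 0] * np.log(p2[p2 > 0]))
    return mi / max(h1, h2)
```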

4.2. Data sets

In our experiments, we use three image databases whose important statistics are summarized below (see also Table 5). Example images from these databases are shown in Fig. 2.

1) USPS: it consists of 9298 gray-scale images of handwritten digits from "0" through "9". The size of each image is 16×16. We randomly select 100 images for each digit and run our algorithm on the resulting 1000 images.

2) ORL: it contains 10 different gray-scale face images for each of 40 human subjects. The images were taken at different times, with varying lighting, facial expressions and facial details. The images are resized to 32×32.

3) Yale: there are 11 gray-scale images for each of 15 individuals, one per facial expression or configuration. We also resized them to 32×32.

4.3. Performance evaluation and comparisons

To show the clustering performance, we compare our algorithm with four other clustering algorithms. The evaluated algorithms are listed below:

1) HNMF: our proposed Hyper-graph Regularized Non-negative Matrix Factorization, which encodes the intrinsic geometrical information by incorporating a hyper-graph into matrix factorization. In HNMF, the number of nearest neighbors used to construct a hyper-edge is set to 10 and the regularization parameter is set to 100. Parameter selection is discussed in Section 4.5.

Table 2
Abbreviations for reporting operation counts.

Abbreviation   Description
fladd          A floating-point addition
flmlt          A floating-point multiplication
fldiv          A floating-point division
flam           An addition and a multiplication

Table 3
Computational operation counts for each matrix multiplication.

Operation   fladd        flmlt
X F^T       MNK          MNK
B F F^T     (M+N)K^2     (M+N)K^2
B^T X       MNK          MNK
B^T B F     (M+N)K^2     (M+N)K^2
F S         NpK          NpK
F Dv        NK           NK

N: the number of sample points; M: the number of features; K: the number of factors.

Table 4
Computational operation counts for each iteration in NMF, GNMF and HNMF.

Method   fladd                            flmlt                                       fldiv      Overall
NMF      2MNK + 2(M+N)K^2                 2MNK + 2(M+N)K^2 + (M+N)K                   (M+N)K     O(MNK)
GNMF     2MNK + 2(M+N)K^2 + N(p+3)K       2MNK + 2(M+N)K^2 + (M+N)K + N(p+1)K         (M+N)K     O(MNK)
HNMF     2MNK + 2(M+N)K^2 + N(p+3)K       2MNK + 2(M+N)K^2 + (M+N)K + N(p+1)K         (M+N)K     O(MNK)

N: the number of sample points; M: the number of features; K: the number of factors; p: the number of nearest neighbors used to construct an edge or hyper-edge.

Table 5
Statistics of the three real-world data sets.

Data set   #Samples   #Features   #Classes
USPS       1000       256         10
ORL        400        1024        40
Yale       165        1024        15



2) GNMF: Graph regularized Non-negative Matrix Factorization, which encodes the geometrical information by constructing a simple graph instead. In GNMF, the number of nearest neighbors is set to 5 and the regularization parameter is set to 100, following [6,7].

3) NMF: the original Non-negative Matrix Factorization [5], which incorporates only the non-negativity constraint and tries to decompose the original matrix into two non-negative matrices whose product is as close to the original matrix as possible.

4) PCA: Principal Component Analysis, an unsupervised dimension reduction algorithm. It maximizes the mutual information between the original high-dimensional data and the projected low-dimensional data.

5) K-means: the canonical K-means clustering method. It clusters data samples according to which cluster centroid each sample lies closest to.

4.4. Clustering results

Tables 6–8 show the clustering results on the USPS, ORL and Yale data sets, respectively. For each given cluster number, we conduct 20 test runs on different randomly chosen clusters; the tables report the mean and standard error of the performance.

It is observed that HNMF outperforms the other clustering methods regardless of the data set. This demonstrates that HNMF learns a better compact representation by leveraging the power of the part-based representation and high-order hyper-graph Laplacian regularization.

Although both GNMF and HNMF consider the geometrical structure of the data, HNMF consistently achieves better performance. This shows that the hyper-graph Laplacian regularizer discovers the intrinsic geometrical structure better than the graph Laplacian regularizer.

4.5. Parameter selection

There are two essential parameters in our proposed HNMF algorithm: the number of nodes p that one hyper-edge connects, and the regularization parameter α.

[Fig. 2. Image samples from the data sets. From top to bottom, images in each row are from the USPS, ORL and Yale data sets, respectively.]

Table 6
Clustering performance on USPS.

Accuracy (%)
K    K-means       PCA           NMF           GNMF           HNMF
4    83.48±4.51    82.46±13.0    82.77±0.02    75.86±17.3     94.30±9.85
5    83.53±6.02    77.34±7.56    78.29±0.03    81.35±11.92    84.51±12.3
6    72.66±8.32    68.43±7.39    71.54±3.84    78.64±9.54     80.19±8.74
7    66.82±5.89    73.79±7.56    62.56±4.88    77.12±7.31     82.10±11.0
8    75.57±5.75    73.68±4.91    66.87±4.13    71.98±6.59     78.01±5.05
9    69.92±3.64    73.90±5.73    68.65±5.07    67.57±8.85     75.54±7.07
10   71.27±4.01    68.72±5.58    64.93±4.38    69.90±7.53     75.07±5.74

Normalized mutual information (%)
K    K-means       PCA           NMF           GNMF           HNMF
4    63.92±1.33    73.71±7.72    62.16±0.02    74.35±13.7     88.74±7.26
5    66.42±2.05    59.30±3.23    56.94±0.05    74.80±6.94     78.44±6.38
6    57.78±3.52    64.74±5.02    51.13±1.82    74.43±5.40     77.79±4.22
7    59.90±2.48    65.49±2.34    54.07±2.57    76.75±4.01     76.89±6.48
8    65.05±1.87    67.08±1.71    57.17±1.78    75.18±3.14     76.03±1.73
9    65.75±1.49    67.11±2.30    59.98±1.86    71.73±4.85     76.90±3.84
10   64.26±1.53    64.59±2.09    62.33±2.36    72.49±4.66     74.57±2.14

Table 7
Clustering performance on ORL.

Accuracy (%)
K    K-means       PCA           NMF           GNMF          HNMF
5    71.90±9.41    71.80±11.3    72.40±10.9    69.50±8.98    75.80±15.0
10   62.75±8.87    69.30±9.55    66.10±6.85    66.30±7.47    73.55±9.81
15   58.70±5.35    59.87±5.44    62.73±6.69    57.97±5.40    63.00±4.93
20   59.97±5.50    61.38±5.17    61.40±4.45    56.18±6.16    66.50±4.15
25   58.94±5.24    56.96±4.50    59.64±3.97    59.14±4.68    61.56±3.41
30   58.53±4.31    56.57±3.35    58.15±3.30    58.22±3.50    61.05±3.87
35   55.67±3.39    56.67±3.03    57.54±3.30    55.31±1.43    58.40±2.37
40   55.84±2.75    55.60±3.23    53.26±2.91    56.55±1.96    57.09±2.56

Normalized mutual information (%)
K    K-means       PCA           NMF           GNMF          HNMF
5    72.41±7.34    67.54±8.31    73.80±8.43    70.61±6.24    75.02±12.1
10   70.17±7.06    75.98±5.46    77.88±4.58    76.55±4.88    78.26±6.30
15   72.53±3.37    71.13±3.53    75.07±4.66    73.40±2.60    75.21±3.04
20   75.54±2.92    75.77±3.04    75.66±2.41    73.20±4.36    77.15±2.60
25   74.46±2.56    72.75±2.02    74.55±2.40    74.71±2.94    75.01±2.16
30   74.52±2.19    73.90±1.79    74.27±1.85    76.51±2.04    77.26±1.93
35   73.79±1.66    74.94±1.51    74.75±1.27    73.45±2.44    75.31±1.23
40   74.44±1.44    74.27±1.56    73.21±1.19    74.54±1.14    75.06±1.15


In our experiments, we use grid search to choose these two parameters. We obtained the appropriate parameter values where the algorithm achieved its best performance, with p varied from 2 to 20 and α varied over {1, 10, 100, 1000, 10000}.
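A sketch of this grid search is shown below; evaluate is a hypothetical stand-in for one full run (build the hypergraph with parameter p, run HNMF with the given α, cluster the columns of F and score the result), not a function from the paper.

```python
import itertools

def evaluate(p, alpha):
    # Hypothetical placeholder: plug in the HNMF pipeline here and return,
    # e.g., the mean clustering accuracy over several random restarts.
    return 0.0

grid_p = range(2, 21)                   # p varied from 2 to 20
grid_alpha = [1, 10, 100, 1000, 10000]  # candidate regularization weights
best_p, best_alpha = max(itertools.product(grid_p, grid_alpha),
                         key=lambda pa: evaluate(*pa))
```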

Figs. 3 and 4 demonstrate how the clustering performance of HNMF and GNMF changes as the parameters vary. These figures show the effectiveness of our proposed method.

From Fig. 3 we see that both HNMF and GNMF show stable clustering performance as α varies over {1, 10, 100, 1000, 10000}. GNMF achieves its best performance with α = 100 on all databases, while HNMF achieves its best performance with α = 100 on the ORL and Yale databases and α = 1000 on the USPS database. When α is too small, the hyper-graph regularization term makes up a negligible proportion of the objective function, so minimizing the objective does not exploit the intrinsic manifold structure. When α is too large, the reconstruction error becomes so large that the representation after dimensionality reduction is no longer appropriate for representing the original sample data. Therefore, the curves in Fig. 3 first rise to a peak and then fall.

We can see from Fig. 4 that HNMF outperforms the other algorithms, K-means, PCA, NMF and GNMF, on these databases. In this experiment, the parameter α is set to 100 on the ORL and Yale databases and to 1000 on USPS, while p varies from 2 to 20. HNMF shows more stable clustering performance with respect to p than GNMF. Fig. 4 suggests that HNMF performs best when p = 6, while GNMF performs better when p is small. When p is too small, each hyper-edge cannot group enough samples that are "close" to each other, so the intrinsic structure cannot be discovered. When p is too large, each hyper-edge includes not only inlier samples but also outliers, which leads to incorrect clustering decisions. Hence, the curves in Fig. 4 first ascend to a high point and then fall smoothly. It can also be concluded that GNMF fails to discover the intrinsic structure on these databases and its clustering performance is poor, while HNMF performs better than the others because it respects the intrinsic structure more faithfully.

Table 8
Clustering performance on Yale.

Accuracy (%)
K    K-means       PCA           NMF           GNMF          HNMF
5    54.73±7.86    54.91±3.26    47.55±9.08    55.82±7.58    62.91±9.16
10   42.73±4.99    43.73±4.04    41.14±3.17    47.36±2.48    50.64±5.50
11   45.50±4.81    43.55±3.78    44.42±4.81    40.66±4.68    48.43±4.43
12   39.43±4.02    42.12±4.42    39.28±4.77    43.48±2.48    46.21±3.30
13   39.55±3.65    39.09±4.13    39.58±3.00    37.55±3.21    44.02±3.76
14   42.08±3.21    41.20±3.71    41.23±2.96    41.66±2.24    46.40±4.19
15   39.67±2.70    38.91±4.13    34.00±2.93    39.36±2.71    42.79±3.94

Normalized mutual information (%)
K    K-means       PCA           NMF           GNMF          HNMF
5    45.31±6.11    43.34±5.02    37.89±7.04    43.45±5.04    54.19±7.26
10   41.49±4.68    46.38±3.74    40.66±2.52    47.58±2.60    49.36±4.96
11   47.80±3.83    45.01±3.91    45.01±3.29    41.59±3.45    49.17±3.98
12   44.41±3.12    46.35±3.41    41.23±3.55    47.47±1.82    46.36±2.92
13   43.57±3.06    43.79±3.67    44.07±3.00    41.64±2.58    46.12±2.68
14   48.39±2.67    47.55±1.98    47.23±1.78    46.27±1.63    50.28±3.70
15   46.38±2.74    46.38±3.63    41.94±2.82    44.36±2.35    49.21±3.08

[Fig. 3. The performance of HNMF and GNMF vs. parameter α. (a) USPS, (b) ORL and (c) Yale. Each panel plots accuracy (%) against log α.]

[Fig. 4. The performance of HNMF, GNMF, NMF, PCA and K-means vs. parameter p. (a) USPS, (b) ORL and (c) Yale. Each panel plots accuracy (%) against p.]



5. Conclusion and future work

In this paper, we presented a new matrix factorization method called Hyper-graph regularized Non-negative Matrix Factorization (HNMF). HNMF captures the intrinsic geometrical structure by adding a hyper-graph Laplacian regularizer to the NMF framework, and it produces more discriminating representations than ordinary NMF approaches. Experimental results on three image data sets demonstrate the effectiveness of HNMF in comparison to state-of-the-art methods.

In the future, we would like to extend our work by exploring further feasible constraints on the NMF framework to generate better part-based representations. An adaptive hyper-graph Laplacian regularizer will also be considered.

Acknowledgment

This work has been supported by the Grant of the National Natural Science Foundation of China (No. 61100104), the Hong Kong Polytechnic University research project (G-YK77), the Program for New Century Excellent Talents in University (No. NCET-12-0323), the Hong Kong Scholar Programme (No. XJ2013038 and G-YZ40), the Natural Science Foundation of Fujian Province of China (2012J01287), the National Defense Basic Scientific Research Program of China under Grant Bnnn0110155, the National Natural Science Foundation of China under Grant 61373077, the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20110121110020) and the National Defense Science and Technology Key Laboratory Foundation.

References

[1] M. Song, D. Tao, C. Chen, J. Bu, Y. Yang, Color-to-gray based on chance of happening preservation, Neurocomputing 119 (2013) 222–231.
[2] X. Deng, X. Liu, M. Song, J. Cheng, J. Bu, C. Chen, LF-EME: local features with elastic manifold embedding for human action recognition, Neurocomputing 99 (2013) 144–153.
[3] C. Hong, J. Yu, J. Li, X. Chen, Multi-view hypergraph learning by patch alignment framework, Neurocomputing 118 (2013) 79–86.
[4] C. Wang, J. Yu, D. Tao, High-level attributes modeling for indoor scenes classification, Neurocomputing 121 (2013) 337–343.
[5] D. Lee, H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[6] D. Cai, X. He, X. Wu, J. Han, Non-negative matrix factorization on manifold, in: Proceedings of the Eighth IEEE International Conference on Data Mining, 2008, pp. 63–72.
[7] D. Cai, X. He, J. Han, T. Huang, Graph regularized non-negative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1548–1560.
[8] X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2003.
[9] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[10] H. Liu, Z. Wu, X. Li, D. Cai, T. Huang, Constrained non-negative matrix factorization for image representation, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1299–1311.
[11] Z. Li, J. Liu, H. Lu, Structure preserving non-negative matrix factorization for dimensionality reduction, Comput. Vis. Image Underst. 117 (9) (2013) 1175–1189.
[12] S.E. Palmer, Hierarchical structure in perceptual representation, Cogn. Psychol. 9 (1977) 441–474.
[13] E. Wachsmuth, M. Oram, D. Perrett, Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque, Cereb. Cortex 4 (1994) 509–522.
[14] N. Logothetis, D. Sheinberg, Visual object recognition, Ann. Rev. Neurosci. 19 (1996) 577–621.
[15] J. Tenenbaum, V. de Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323.
[16] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[17] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Information Processing Systems, vol. 14, MIT Press, Cambridge, MA, 2001, pp. 585–591.
[18] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1989.
[19] P.O. Hoyer, P. Dayan, Non-negative matrix factorization with sparseness constraints, J. Mach. Learn. Res. 5 (2004) 1457–1469.
[20] Y. Qian, S. Jia, J. Zhou, A. Robles-Kelly, Hyperspectral unmixing via L1/2 sparsity-constrained nonnegative matrix factorization, IEEE Trans. Geosci. Remote Sens. 49 (11) (2011) 4282–4297.
[21] F. Tao, S. Li, H. Shum, Local non-negative matrix factorization as a visual representation, in: Proceedings of the Second International Conference on Development and Learning, 2002, pp. 178–183.
[22] Y. Chen, J. Zhang, D. Cai, W. Liu, X. He, Nonnegative local coordinate factorization for image representation, IEEE Trans. Image Process. 22 (3) (2013) 969–979.
[23] S. Agarwal, K. Branson, S. Belongie, Higher order learning with graphs, in: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006, pp. 17–24.
[24] D. Zhou, J. Huang, B. Scholkopf, Learning with hypergraphs: clustering, classification, and embedding, in: Advances in Neural Information Processing Systems (NIPS), MIT Press, Cambridge, MA, 2006, pp. 1601–1608.
[25] Y. Huang, Q. Liu, F. Lv, Y. Gong, D. Metaxas, Unsupervised image categorization by hypergraph partition, IEEE Trans. Pattern Anal. Mach. Intell. 33 (6) (2011) 1266–1273.
[26] J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Trans. Image Process. 21 (7) (2012) 3262–3272.
[27] Y. Huang, Q. Liu, S. Zhang, D. Metaxas, Image retrieval via probabilistic hypergraph ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, 2010, pp. 3376–3383.
[28] D. Lee, H. Seung, Algorithms for non-negative matrix factorization, Adv. Neural Inf. Process. Syst. 13 (2001) 556–562.
[29] W. Xu, X. Liu, Y. Gong, Document clustering based on non-negative matrix factorization, in: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003, pp. 267–273.
[30] N. Guan, D. Tao, Z. Luo, B. Yuan, Manifold regularized discriminative non-negative matrix factorization with fast gradient descent, IEEE Trans. Image Process. 20 (7) (2011) 2030–2048.
[31] N. Guan, D. Tao, Z. Luo, B. Yuan, Non-negative patch alignment framework, IEEE Trans. Neural Netw. 22 (8) (2011) 1218–1230.
[32] N. Guan, D. Tao, Z. Luo, B. Yuan, NeNMF: an optimal gradient method for non-negative matrix factorization, IEEE Trans. Signal Process. 60 (6) (2012) 2882–2898.
[33] W. Liu, D. Tao, L. Cheng, Y. Tang, Multiview Hessian discriminative sparse coding for image annotation, Comput. Vis. Image Underst. 118 (2014) 50–60.
[34] W. Liu, D. Tao, Hessian regularized support vector machines for mobile image annotation on the cloud, IEEE Trans. Multimed. 15 (4) (2013) 833–844.
[35] W. Liu, D. Tao, Multiview Hessian regularization for image annotation, IEEE Trans. Image Process. 22 (7) (2013) 2676–2687.
[36] T. Zhang, D. Tao, X. Li, J. Yang, Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1299–1313.
[37] Z. Zhang, H. Zha, Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM J. Sci. Comput. 26 (1) (2004) 313–338.
[38] J. Yu, Y. Rui, B. Chen, Exploiting click constraints and multiview features for image reranking, IEEE Trans. Multimed. 16 (1) (2014) 159–168.
[39] J. Yu, M. Wang, D. Tao, Semi-supervised multiview distance metric learning for cartoon synthesis, IEEE Trans. Image Process. 21 (11) (2012) 4636–4648.
[40] J. Yu, D. Liu, D. Tao, H. Seah, Complex object correspondence construction in 2D animation, IEEE Trans. Image Process. 20 (11) (2011) 3257–3269.
[41] J. Yu, D. Tao, D. Liu, H. Seah, On combining multiview features for cartoon character retrieval and clip synthesis, IEEE Trans. Syst., Man, Cybern., Part B 42 (5) (2012) 1413–1427.
[42] J. Yu, D. Tao, Y. Rui, J. Cheng, Pairwise constraints based multiview features fusion for scene classification, Pattern Recogn. 46 (2) (2013) 483–496.
[43] J. Yu, D. Tao, J. Li, J. Cheng, Semantic preserving distance metric learning and applications, Inf. Sci. (2014), http://dx.doi.org/10.1016/j.ins.2014.01.025.
[44] J. Yu, D. Tao, Modern Machine Learning Techniques and Their Applications in Cartoon Animation Research, Wiley-IEEE Press, March 2013.

Dr. Jun Yu received his B.Eng. and Ph.D. from Zhejiang University, Zhejiang, China. He is currently a Professor with the School of Computer Science and Technology, Hangzhou Dianzi University. He was an Associate Professor with the School of Information Science and Technology, Xiamen University. From 2009 to 2011, he worked at Nanyang Technological University, Singapore. From 2012 to 2013, he was a visiting researcher at Microsoft Research Asia (MSRA). His research interests include multimedia analysis, machine learning and image processing. He has authored and co-authored more than 50 scientific articles. He has (co-)chaired several special sessions, invited sessions, and workshops, and has served as a program committee member or reviewer for top conferences and prestigious journals. He is a Professional Member of the IEEE, ACM and CCF.


Taisong Jin received the Ph.D. degree in computer science from Beijing Institute of Technology, China, in 2007. He is currently an Assistant Professor in the Department of Computer Science, Xiamen University. His research interests include machine learning and computer vision.

Kun Zeng received the B.S. and M.S. degrees from the Computer Science Department, Xiamen University, in 2005 and 2008, respectively. He is currently pursuing the Ph.D. degree with the School of Information Science and Technology, Xiamen University. His current research interests include computer vision, image processing and machine learning.

Jane You is currently a full professor in the Department of Computing at the Hong Kong Polytechnic University. She also serves as the Chair of the Department Research Committee (DRC) and the Associate Head of the Department (Research). Prof. You obtained her B.Eng. in Electronic Engineering from Xi'an Jiaotong University in 1986 and her Ph.D. in Computer Science from La Trobe University, Australia, in 1992. She was a lecturer at the University of South Australia and a senior lecturer (tenured) at Griffith University from 1993 until 2002. Prof. You was awarded a French Foreign Ministry International Postdoctoral Fellowship in 1993 and worked on a project on real-time object recognition and tracking at Universite Paris XI. She also obtained an Academic Certificate issued by the French Education Ministry in 1994.

Prof. Jane You has worked extensively in the fields of image processing, medical imaging, computer-aided detection/diagnosis and pattern recognition. So far, she has published more than 200 research papers. Prof. You is also a key team member for three successful patents (one HK patent, two US patents). Her recent work on retinal imaging won a Special Prize and Gold Medal with Jury's Commendation at the 39th International Exhibition of Inventions of Geneva (April 2011) and second place in an international competition (SPIE Medical Imaging, 2009 Retinopathy Online Challenge (ROC, 2009)). Prof. You serves as an associate editor of Pattern Recognition and other journals.

Cuihua Li received the B.S. degree in 1983 from Shandong University, the M.S. degree in computational mathematics in 1989, and the Ph.D. in automatic control theory and engineering in 1999 from Xi'an Jiaotong University. He was an associate professor in the School of Science at Xi'an Jiaotong University before 1999. He is now with the Department of Computer Science at Xiamen University. He is a member of the editorial boards of both Chinese Science Bulletin and the Journal of Xiamen University (Natural Science). He is a Distinguished Member of the China Computer Federation (CCF). His research interests include computer vision, video and image processing, and super-resolution image reconstruction algorithms.
