A Semi-supervised Clustering via Orthogonal Projection
2009 ISECS International Colloquium on Computing, Communication, Control, and Management
978-1-4244-4246-1/09/$25.00 ©2009 IEEE CCCM 2009
Cui Peng
Harbin Engineering University, Harbin 150001, [email protected]
Zhang Ru-bo
Harbin Engineering University, Harbin 150001, [email protected]
Abstract — Because its dimensionality is very high, the image feature space is usually complex. Dimensionality reduction techniques are widely used to process this space effectively. Semi-supervised clustering incorporates limited supervision into unsupervised clustering in order to improve clustering performance. However, many existing semi-supervised clustering methods cannot handle high-dimensional sparse data. To solve this problem, we propose a semi-supervised fuzzy clustering method via constrained orthogonal projection. Experiments on several datasets show that the method achieves good clustering performance on high-dimensional data.
Keywords — dimension reduction; clustering; projection; semi-supervised learning
I. INTRODUCTION
In recent years, the rapid growth of feature information and of image data volume has made many tasks in multimedia processing increasingly challenging. Dimensionality reduction techniques have been proposed to uncover the underlying low-dimensional structures of the high-dimensional image space [1]. These efforts have proved very useful in image retrieval, classification, and clustering. A number of dimensionality reduction techniques exist in the literature. One classical method is Principal Component Analysis (PCA) [2], which minimizes the information loss in the reduction process. One disadvantage of PCA is that it is likely to distort the local structures of a dataset. Locality Preserving Projection (LPP) [3-4] encodes the local neighborhood structure into a similarity matrix and derives a linear manifold embedding as the optimal approximation to this matrix; LPP, on the other hand, may overlook global structures.
Recently, semi-supervised learning, which leverages domain knowledge represented in the form of pairwise constraints, has gained much attention [6-10]. Various reduction techniques have been developed to utilize this form of knowledge [11-12].
The constrained FLD defines the embedding based solely on must-link constraints. Semi-Supervised Dimensionality Reduction (SSDR) [13] preserves the intrinsic global covariance structure of the data while exploiting both types of constraints.
Because many semi-supervised clustering methods are based on density or distance, they have difficulty handling high-dimensional data. Thus, dimensionality reduction must be incorporated into the semi-supervised clustering process. We propose the COPFC (Constrained Orthogonal Projection Fuzzy Clustering) method to solve this problem.
II. COPFC METHOD FRAMEWORK
Figure 1. COPFC framework
Figure 1 shows the framework of the COPFC method. Given a set of instances and supervision in the form of must-link constraints C_ML = {(x_i, x_j)}, where x_i and x_j must reside in the same cluster, and cannot-link constraints C_CL = {(x_i, x_j)}, where x_i and x_j should be in different clusters, the COPFC method is composed of three steps. In the first step, a preprocessing method is exploited to reduce the unlabelled instances and pairwise constraints according to the transitivity property of must-link constraints. In the second step, a constraint-guided orthogonal projection method, called COPFC proj, is used to project the original data into a low-dimensional space. Finally, we apply a semi-supervised fuzzy clustering algorithm, called COPFC fuzzy, to produce the clustering results on the projected low-dimensional dataset.
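The first step can be sketched with a small helper (a hypothetical illustration; the paper gives no code, and the function name is ours) that exploits the transitivity of must-link constraints to merge constrained instances into groups:

```python
# Sketch (not the authors' code): collapse must-link constraints into
# groups via union-find, using their transitivity property.
def mustlink_groups(n_instances, must_links):
    """If (a, b) and (b, c) are must-linked, then a, b, c form one group."""
    parent = list(range(n_instances))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for i in range(n_instances):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# (0,1) and (1,2) imply that instances 0, 1, 2 belong to one cluster.
print(mustlink_groups(4, [(0, 1), (1, 2)]))
```

One natural use of these groups is to collapse each into a representative instance, which reduces both the number of unlabelled instances and the number of pairwise constraints handed to the projection step.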
III. COPFC PROJ - A CONSTRAINED ORTHOGONAL PROJECTION METHOD
In a typical image retrieval system, each image is represented by an m-dimensional feature vector x whose j-th value is denoted x_j. During the retrieval process, the user is allowed to mark several images with must-links which match his query interest, and also to indicate apparently irrelevant ones with cannot-links. COPFC proj is a linear method and depends on a set of l axes p_i. For a given image x, its embedding coordinates are the projections of x onto the l axes:

$$P_i^x = \sum_{j=1}^{m} x_j p_{ij}, \quad 1 \le i \le l.$$
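In matrix form, the l coordinates of an image are obtained with a single matrix product; a minimal NumPy sketch (the variable names are our own):

```python
import numpy as np

# Sketch: stacking the l axes p_i as columns of an (m, l) matrix, the
# embedding coordinates P_i^x = sum_j x_j * p_ij of an image x are
# simply the matrix-vector product axes^T x.
m, l = 5, 2
rng = np.random.default_rng(0)
axes = np.linalg.qr(rng.normal(size=(m, l)))[0]  # l orthonormal columns
x = rng.normal(size=m)
coords = axes.T @ x                              # shape (l,)
# term-by-term check against the summation form of the projection
assert np.allclose(coords, [sum(x[j] * axes[j, i] for j in range(m))
                            for i in range(l)])
```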
As the images in the set ML are considered mutually similar, they should be kept compact in the new space. In other words, the distances among them should be kept small, while the irrelevant images in CL are to be mapped as far apart from those in ML as possible. The above two criteria can be formally stated as follows:

$$\min \sum_{x \in ML} \sum_{y \in ML} \sum_{i=1}^{l} (P_i^x - P_i^y)^2 \quad (1)$$

$$\max \sum_{x \in ML} \sum_{y \in CL} \sum_{i=1}^{l} (P_i^x - P_i^y)^2 \quad (2)$$
Intuitively, equation (1) forces the embedding to have the image points in ML reside in a small local neighborhood in the new feature space, and equation (2) reflects our objective to prevent the points in ML and CL from lying close together after the embedding. To construct a salient embedding, COPFC proj combines these two criteria and finds the axes one by one, optimizing the following objective:

$$\min \sum_{x \in ML} \sum_{y \in ML} (P_i^x - P_i^y)^2 \quad (3)$$

subject to

$$\sum_{x \in ML} \sum_{y \in CL} (P_i^x - P_i^y)^2 = 1 \quad (4)$$

$$p_i^T p_1 = p_i^T p_2 = p_i^T p_3 = \cdots = p_i^T p_{i-1} = 0 \quad (5)$$
Here T denotes the transpose of a vector. The choice of the constant 1 on the right-hand side of equation (4) is rather arbitrary, as any other value (except 0) would not cause any substantial change in the embedding produced. The constraint in equation (5) forces all the axes to be mutually orthogonal. Equations (3) and (4) are implicit functions of the axes p_i and should be rewritten in explicit form. First, we introduce the necessary notation. For a given set X of image points, the mean of X is an m-dimensional column vector M(X), whose i-th component is

$$M(X)_i = \frac{1}{|X|} \sum_{x \in X} x_i \quad (6)$$

and its covariance matrix C(X) is an m×m matrix:

$$C(X)_{ij} = \left( \frac{1}{|X|} \sum_{x \in X} x_i x_j \right) - M(X)_i M(X)_j \quad (7)$$
For two sets X and Y, define an m×m matrix M(X, Y) by

$$M(X, Y) = (M(X) - M(Y))(M(X) - M(Y))^T.$$

Accordingly, we can rewrite equation (3) as follows:

$$\sum_{x \in ML} \sum_{y \in ML} (P_i^x - P_i^y)^2 = 2|ML|^2 \, p_i^T C(ML) p_i \quad (8)$$

Similarly, we can rewrite equation (4) as follows:

$$\sum_{x \in ML} \sum_{y \in CL} (P_i^x - P_i^y)^2 = |ML||CL| \, p_i^T \left( C(ML) + C(CL) + M(ML, CL) \right) p_i \quad (9)$$
Hence, the problem to be solved is $\min p_i^T A p_i$, subject to $p_i^T B p_i = 1$ and $p_i^T p_1 = \cdots = p_i^T p_{i-1} = 0$, where

$$A = 2|ML|^2 C(ML), \quad B = |ML||CL| \left( C(ML) + C(CL) + M(ML, CL) \right).$$

It is easy to see that both A and B are symmetric and positive semi-definite. The above problem can be solved using the method of Lagrange multipliers. Below we discuss the procedure to obtain the optimal axes.
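As an illustration, the matrices A and B and the first projection axis can be sketched in NumPy (a minimal sketch under the paper's definitions; the helper names and the small ridge added to keep B invertible are our own assumptions, not part of the method):

```python
import numpy as np

# Sketch (not the authors' code): build A and B from the must-link (ML)
# and cannot-link (CL) point sets, then solve A p = lambda B p for the
# smallest eigenvalue via a Cholesky reduction of B.
def mean_vec(X):
    return X.mean(axis=0)                          # M(X), eq. (6)

def cov_mat(X):
    mu = mean_vec(X)
    return (X.T @ X) / len(X) - np.outer(mu, mu)   # C(X), eq. (7)

def first_axis(ML, CL, ridge=1e-8):
    d = mean_vec(ML) - mean_vec(CL)
    MXY = np.outer(d, d)                           # M(ML, CL)
    A = 2 * len(ML) ** 2 * cov_mat(ML)
    B = len(ML) * len(CL) * (cov_mat(ML) + cov_mat(CL) + MXY)
    B = B + ridge * np.eye(B.shape[0])             # assumed regularizer
    # Reduce A p = lambda B p to a symmetric problem using B = L L^T.
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    w, Y = np.linalg.eigh(Linv @ A @ Linv.T)       # ascending eigenvalues
    p = Linv.T @ Y[:, 0]                           # smallest one -> p_1
    return p / np.linalg.norm(p)

rng = np.random.default_rng(1)
ML = rng.normal(size=(20, 5))
CL = rng.normal(size=(15, 5)) + 2.0
p1 = first_axis(ML, CL)
assert p1.shape == (5,)
```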
The first projection axis is the eigenvector of the generalized eigen-problem $A p_1 = \lambda B p_1$ corresponding to the smallest eigenvalue. After that, we compute the remaining axes one by one in the following fashion. Suppose we have already obtained the first (k-1) axes; define:

$$P^{(k-1)} = [p_1, p_2, \ldots, p_{k-1}], \quad Q^{(k-1)} = [P^{(k-1)}]^T B P^{(k-1)} \quad (10)$$

Then the k-th axis p_k is the eigenvector associated with the smallest eigenvalue of the eigen-problem:

$$\left( I - B^{-1} P^{(k-1)} [Q^{(k-1)}]^{-1} [P^{(k-1)}]^T \right) B^{-1} A \, p_k = \lambda p_k \quad (11)$$
We adopt the above procedure to determine the l optimal orthogonal projection axes, which preserve the metric structure of the image space for the given relevance feedback information. The new coordinates for the image data points can then be derived accordingly.
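A minimal NumPy sketch of this one-by-one procedure (illustrative only; it assumes symmetric matrices A and B, with B invertible, have already been formed as arrays, and uses a generic eigensolver rather than the authors' implementation):

```python
import numpy as np

# Sketch of eqs. (10)-(11): after the first axis, each new axis is the
# eigenvector with smallest eigenvalue of the deflated operator
# (I - B^{-1} P Q^{-1} P^T) B^{-1} A, where Q = P^T B P.
def copfc_axes(A, B, l):
    m = A.shape[0]
    Binv = np.linalg.inv(B)
    axes = []
    for _ in range(l):
        if not axes:
            M = Binv @ A                    # first axis: A p = lambda B p
        else:
            P = np.column_stack(axes)       # P^{(k-1)}
            Q = P.T @ B @ P                 # Q^{(k-1)}, eq. (10)
            M = (np.eye(m) - Binv @ P @ np.linalg.inv(Q) @ P.T) @ Binv @ A
        w, V = np.linalg.eig(M)             # M is not symmetric in general
        p = V[:, np.argmin(w.real)].real    # smallest eigenvalue, eq. (11)
        axes.append(p / np.linalg.norm(p))
    return np.column_stack(axes)            # (m, l) matrix of axes

# Stand-in symmetric positive (semi-)definite matrices for a quick check.
rng = np.random.default_rng(2)
R, S = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
A, B = R @ R.T, S @ S.T + np.eye(5)
P = copfc_axes(A, B, 3)
assert P.shape == (5, 3)
```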
IV. COPFC FUZZY SEMI-SUPERVISED CLUSTERING
COPFC fuzzy is a new search-based semi-supervised clustering algorithm that allows the constraints to steer the clustering process towards an appropriate partition. To this end, we define an objective function that takes into account both the feature-based similarity between data points and the pairwise constraints [14-16]. Let ML be the set of must-link constraints, i.e. (x_i, x_j) ∈ ML implies that x_i and x_j should be assigned to the same cluster, and CL the set of cannot-link constraints, i.e. (x_i, x_j) ∈ CL implies that x_i and x_j should be assigned to different clusters. We can write the objective function that COPFC fuzzy must minimize:

$$J(V, U) = \sum_{k=1}^{C} \sum_{i=1}^{N} u_{ik}^2 d^2(x_i, \mu_k) + \lambda \left( \sum_{(x_i, x_j) \in ML} \sum_{k=1}^{C} \sum_{l=1, l \neq k}^{C} u_{ik} u_{jl} + \sum_{(x_i, x_j) \in CL} \sum_{k=1}^{C} u_{ik} u_{jk} \right) - \gamma \sum_{k=1}^{C} \left( \sum_{i=1}^{N} u_{ik} \right)^2 \quad (12)$$
The first term in equation (12) is the sum of squared distances to the prototypes, weighted by the constrained memberships (the Fuzzy C-Means objective function). This term reinforces the compactness of the clusters.

The second component in equation (12) is composed of the cost of violating the pairwise must-link constraints and the cost of violating the pairwise cannot-link constraints. This term is weighted by λ, a constant factor that specifies the relative importance of the supervision.

The third component in equation (12) is the sum of the squares of the cardinalities of the clusters, which controls the competition between clusters. It is weighted by γ.

When the parameters are well chosen, the final partition will minimize the sum of intra-cluster distances while partitioning the data set into the smallest number of clusters such that the specified constraints are respected as well as possible.
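The value of this objective for a given partition can be sketched directly from eq. (12) (a hedged illustration; the function name and the dense membership matrix U are our own conventions, and the actual algorithm alternates updates of U and the prototypes rather than merely evaluating J):

```python
import numpy as np

# Sketch: evaluate J(V, U) of eq. (12) for memberships U (N x C),
# prototypes V (C x m), must-links ML, and cannot-links CL.
def copfc_objective(X, V, U, ML, CL, lam, gamma):
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # d^2(x_i, mu_k)
    fcm = float((U ** 2 * d2).sum())          # term 1: cluster compactness
    C = U.shape[1]
    viol = 0.0
    for i, j in ML:                           # must-link pair split apart
        viol += sum(U[i, k] * U[j, l] for k in range(C)
                    for l in range(C) if l != k)
    for i, j in CL:                           # cannot-link pair together
        viol += float((U[i] * U[j]).sum())
    card = float((U.sum(axis=0) ** 2).sum())  # term 3: cardinalities
    return fcm + lam * viol - gamma * card

# Toy check: hard memberships that satisfy both constraints, so only
# the -gamma * (2^2 + 1^2) cardinality term remains.
X = np.array([[0.0], [0.0], [1.0]])
V = np.array([[0.0], [1.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
J = copfc_objective(X, V, U, ML=[(0, 1)], CL=[(0, 2)], lam=1.0, gamma=0.1)
assert abs(J + 0.5) < 1e-12
```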
V. EXPERIMENTAL EVALUATION
A. Dataset selection and evaluation criterion
We performed experiments on the COREL image database and two datasets from UCI, as follows:

(1) We selected 1500 images from the COREL image database. They were divided into 15 sufficiently distinct classes of 100 images each. In our experiments, each image was represented by a 37-dimensional vector, which included 3 types of features extracted from the image. We compared the COPFC proj algorithm against PCA and SSDR. The performance of each technique was evaluated under various amounts of domain knowledge and different reduced dimensionalities. In each scenario, after the dimensionality reduction, Kmeans was applied to classify the test images.

(2) The Iris and Wine datasets from the UCI repository. The Iris dataset contains three classes of 50 instances each and 4 numerical attributes; the Wine dataset contains three classes, 178 instances, and 13 numerical attributes. The simplicity and low dimension of these datasets also allow us to display the constraints that are actually selected. To evaluate the clustering performance of COPFC fuzzy, we compared the COPFC fuzzy algorithm against the Kmeans and PCKmeans algorithms.

(3) Evaluation criterion. In this paper, we use the Corrected Rand Index (CRI) as the clustering validation measure:
$$\text{CRI} = \frac{A - C}{n(n-1)/2 - C} \quad (13)$$

where A is the number of instance pairs whose assigned clusters agree with the actual clusters; n is the number of instances in the dataset, so n(n-1)/2 is the number of all instance pairs in the dataset; and C is the number of constraints.

For each dataset, we ran each experiment 20 times. To study the effect of constraints, 100 constraints were generated randomly for the test set. Each point on the learning curves is an average of the results over 20 runs.
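The CRI of eq. (13) can be sketched as follows (a small illustration following the definitions in the text; the function name is ours):

```python
from itertools import combinations

# Sketch of eq. (13): A counts instance pairs whose predicted
# co-assignment agrees with the actual clustering, n(n-1)/2 is the
# total number of pairs, and C is the number of constraints.
def corrected_rand(pred, true, n_constraints):
    n = len(pred)
    agree = sum((pred[i] == pred[j]) == (true[i] == true[j])
                for i, j in combinations(range(n), 2))
    total_pairs = n * (n - 1) // 2
    return (agree - n_constraints) / (total_pairs - n_constraints)

# A perfect clustering (up to label renaming) with no constraints -> 1.0
assert corrected_rand([0, 0, 1], [1, 1, 0], 0) == 1.0
```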
B. The effectiveness of COPFC
In figure 2, we apply three different dimensionality reduction methods (COPFC proj, PCA, SSDR) to the original images, with the dimensionality reduced to 15 and 20, respectively. On the reduced-dimension data, we used Kmeans for clustering. The curves in figure 2 show that the clustering performance of PCA is independent of the number of constraints, while the performance of SSDR changes only slightly. For COPFC proj, clustering performance improves considerably as the number of constraints increases. When there is a small number of constraints, the clustering performance of COPFC proj is the worst of the three methods. In general, COPFC proj outperforms PCA and SSDR at reduced dimensionalities.
[Plot residue removed. Figure 2, panels (a) and (b): clustering performance (about 0.6 to 0.85) versus number of constraints (10 to 100) for COPFC proj, SSDR, and PCA; panel (b) labeled Dimension=20.]
Figure 2. Clustering performance with different number of constraints
Figure 3 shows the clustering performance of the three methods on the Iris and Wine datasets. On both datasets, COPFC fuzzy obtained the best performance, and the clustering performance of Kmeans was the worst of the three. Though the clustering performance of PCKmeans is an effective improvement, it is still worse than that of COPFC fuzzy.
[Plot residue removed. Figure 3, two panels: CRI (about 0.8 to 1.01) versus number of constraints (10 to 100) for COPFC, PCKmeans, and Kmeans.]
(a) Iris dataset (b) Wine dataset
Figure 3. Clustering performance on UCI datasets
VI. CONCLUSION AND FUTURE WORK

We propose a semi-supervised fuzzy clustering method via orthogonal projection to handle high-dimensional sparse data in the image feature space. The method reduces the dimensionality of images via orthogonal projection and clusters the reduced-dimension data with a constrained fuzzy clustering algorithm.

There are several potential directions for future research. First, we are interested in automatically identifying the right number for the reduced dimensionality based on the background knowledge, rather than providing a pre-specified value. Second, we plan to explore alternative methods to employ supervision in guiding the unsupervised clustering.
REFERENCES
[1] X. Yang, H. Fu, and H. Zha. "Semi-Supervised Nonlinear Dimensionality Reduction". In Proc. of the 23rd Intl. Conf. on Machine Learning, 2006.
[2] C. Ding and X. He. "K-Means Clustering via Principal Component Analysis". In Proc. of the 21st Intl. Conf. on Machine Learning, 2004.
[3] D. Cai and X. F. He. "Orthogonal Locality Preserving Projection". In Proc. of the 28th Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2005.
[4] X. F. He and P. Niyogi. "Locality Preserving Projections". Neural Information Processing Systems, NIPS '03, 2003.
[5] H. Cheng, K. Hua, and K. Vu. "Semi-Supervised Dimensionality Reduction in Image Feature Space". Technical Report, University of Central Florida, 2007.
[6] K. Wagstaff and C. Cardie. "Clustering with instance-level constraints". Proc. of the 17th Int'l Conf. on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 2000.
[7] S. Basu. "Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments". Austin: The University of Texas, 2005.
[8] S. Basu, A. Banerjee, and R. J. Mooney. "Semi-supervised clustering by seeding". Proc. of the 19th Int'l Conf. on Machine Learning (ICML 2002), pp. 19-26.
[9] K. Wagstaff, C. Cardie, and S. Rogers. "Constrained K-means clustering with background knowledge". Proc. of the 18th Int'l Conf. on Machine Learning. Williamstown: Williams College, Morgan Kaufmann Publishers, 2001, pp. 577-584.
[10] D. Klein, S. D. Kamvar, and C. D. Manning. "From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering". Proc. of the 19th Int'l Conf. on Machine Learning. University of New South Wales, Sydney: Morgan Kaufmann Publishers, 2002, pp. 307-314.
[11] T. Hertz, N. Shental, and A. Bar-Hillel. "Enhancing image and video retrieval: Learning via equivalence constraints". Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. Madison: IEEE Computer Society, 2003, pp. 668-674.
[12] T. Deselaers, D. Keysers, and H. Ney. "Features for Image Retrieval - a Quantitative Comparison". In Pattern Recognition, 26th DAGM Symposium, 2004.
[13] D. Zhang, Z. H. Zhou, and S. Chen. "Semi-Supervised Dimensionality Reduction". In Proc. of the 2007 SIAM Intl. Conf. on Data Mining, SDM '07, 2007.
[14] N. Grira, M. Crucianu, and N. Boujemaa. "Semi-supervised fuzzy clustering with pairwise-constrained competitive agglomeration". In IEEE International Conference on Fuzzy Systems, 2005.
[15] H. Frigui and R. Krishnapuram. "Clustering by competitive agglomeration". Pattern Recognition 30 (7), 1997, pp. 1109-1119.
[16] M. Bilenko and R. J. Mooney. "Adaptive duplicate detection using learnable string similarity measures". In International Conference on Knowledge Discovery and Data Mining, Washington, DC, 2003, pp. 39-48.