Transcript of "Nonlinear Adaptive Distance Metric Learning for Clustering"
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Nonlinear Adaptive Distance Metric Learning for Clustering
Jianhui Chen, Zheng Zhao, Jieping Ye, Huan Liu
KDD, 2007
Reported by Wen-Chung Liao, 2010/01/19
Outline
─ Motivation
─ Objective
─ Adaptive Distance Metric Learning: The Linear Case
─ Adaptive Distance Metric Learning: The Nonlinear Case
─ NAML
─ Experiments
─ Conclusions
─ Comments
In distance metric learning, the goal is to achieve better compactness (reduced dimensionality) and better separability (larger inter-cluster distance) in the data than standard distance metrics, such as the Euclidean distance, provide.
Motivation
Traditionally, dimensionality reduction and clustering are applied in two separate steps.
─ If distance metric learning (via dimensionality reduction) and clustering are performed together, the cluster separability of the data can be better maximized in the dimensionality-reduced space.
Many real-world applications involve data with nonlinear and complex patterns.
─ Kernel methods can handle such data.
Objectives
Propose NAML for simultaneous distance metric learning and clustering. NAML
─ first maps the data to a high-dimensional space through a kernel function;
─ next applies a linear projection to find a low-dimensional manifold;
─ and then performs clustering in the low-dimensional space. The key idea of NAML is to integrate
─ kernel learning, ─ dimensionality reduction, ─ clustering
in a joint framework so that the separability of the data is maximized in the low-dimensional space.
ADAPTIVE DISTANCE METRIC LEARNING: THE LINEAR CASE
Data set: $\{x_i\}_{i=1}^n \subset \mathbb{R}^d$; data matrix: $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$
Mahalanobis distance measure: $d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M (x_i - x_j)}$, with $M \succeq 0$
Linear transformation $W \in \mathbb{R}^{d \times m}$: with $M = W W^\top$, $d_M(x_i, x_j)$ equals the Euclidean distance between the projected points $W^\top x_i$ and $W^\top x_j$
Minimizing the within-cluster distances under $M$ over both $W$ and the clustering is equivalent to a trace maximization problem (Min ≡ Max)
Cluster indicator matrix $F \in \{0, 1\}^{n \times k}$; weighted cluster indicator matrix $L = F (F^\top F)^{-1/2}$
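A minimal numpy sketch (illustrative, not code from the paper) verifying the identity above: the Mahalanobis distance with $M = W W^\top$ equals the Euclidean distance between the linearly transformed points.

```python
import numpy as np

def mahalanobis(x, y, W):
    """Distance under M = W W^T; equals ||W^T x - W^T y||_2."""
    diff = x - y
    M = W @ W.T                                  # PSD metric matrix induced by W
    d_metric = float(np.sqrt(diff @ M @ diff))   # Mahalanobis distance under M
    d_projected = float(np.linalg.norm(W.T @ diff))  # Euclidean distance after projection
    return d_metric, d_projected
```

Learning the metric $M$ thus amounts to learning the linear projection $W$.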
ADAPTIVE DISTANCE METRIC LEARNING: THE NONLINEAR CASE
Kernel method: a nonlinear mapping $\psi$ into a Hilbert space $\mathcal{H}$ (the feature space)
Symmetric kernel function $K$: $K(x_i, x_j) = \langle \psi(x_i), \psi(x_j) \rangle$ (inner product); kernel Gram matrix $G$: $G_{ij} = K(x_i, x_j)$
$\psi_K(X)$: the data matrix in the feature space
For a given kernel function $K$, the nonlinear adaptive metric learning problem is formulated by applying the linear formulation to $\psi_K(X)$ in place of $X$
$G$ is restricted to a convex combination of $p$ kernel matrices: $G = \sum_{\ell=1}^{p} \theta_\ell G_\ell$, with $\theta_\ell \ge 0$ and $\sum_{\ell} \theta_\ell = 1$
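A small numpy sketch (an illustration, not the paper's code) of the kernel construction: several RBF Gram matrices are combined with convex weights, which preserves symmetry and positive semidefiniteness.

```python
import numpy as np

def rbf_gram(X, gamma):
    # Gram matrix G_ij = exp(-gamma * ||x_i - x_j||^2) over rows of X
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_gram(Gs, theta):
    # convex combination G = sum_l theta_l G_l (theta_l >= 0, sum = 1)
    theta = np.asarray(theta, dtype=float)
    assert np.all(theta >= 0) and np.isclose(theta.sum(), 1.0)
    return sum(t * G for t, G in zip(theta, Gs))
```

NAML learns the weights theta; here they would simply be supplied by the caller.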
3.1 The computation of L for given Q and G
For fixed Q and G, the problem reduces to a trace maximization over L, i.e., clustering (kernel K-means) of the data projected through Q.
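A sketch of the standard spectral relaxation assumed here (not the paper's own code): dropping the discreteness constraint on L, maximizing trace(Lᵀ G̃ L) subject to LᵀL = I is solved by the top-k eigenvectors of the symmetric projected matrix G̃ (Ky Fan theorem).

```python
import numpy as np

def relaxed_indicator(G_proj, k):
    # top-k eigenvectors of the symmetric matrix G_proj maximize
    # trace(L^T G_proj L) subject to L^T L = I; np.linalg.eigh
    # returns eigenvalues in ascending order, so take the last k columns
    w, V = np.linalg.eigh(G_proj)
    return V[:, -k:]
```

The relaxed L is then rounded back to a discrete clustering (e.g., by K-means on its rows).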
3.2 The computation of Q for given L and G
For fixed L and G, the optimal Q is obtained from a trace maximization solved by an eigen-decomposition.
3.3 The computation of G for given Q and L
For fixed Q and L, the optimal weights of the convex kernel combination are obtained by solving a convex optimization problem (using the MOSEK package).
NAML
Time complexity: $O(p k^3 n^3)$
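A heavily simplified, runnable sketch of the alternating (EM-style) structure of NAML. It is not the paper's algorithm: the kernel weights theta are held fixed and equal instead of being optimized by the convex program, and Q is taken as the top-k eigenvectors of the regularized combined Gram matrix; the function names and parameters are illustrative.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gram matrix for an RBF kernel on the rows of X
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kmeans_labels(Z, k, iters=50, seed=0):
    # plain Lloyd's k-means on the rows of Z
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def naml_sketch(X, k, gammas, lam=1e-3, outer=5):
    # alternating loop: combined kernel -> projection Q -> clustering L
    Gs = [rbf_kernel(X, g) for g in gammas]
    theta = np.ones(len(Gs)) / len(Gs)   # fixed equal weights; NAML learns these
    labels = None
    for _ in range(outer):
        G = sum(t * Gi for t, Gi in zip(theta, Gs))
        w, V = np.linalg.eigh(G + lam * np.eye(len(X)))
        Q = V[:, -k:]                    # simplified stand-in for the paper's Q step
        Z = G @ Q                        # data projected to k dimensions
        labels = kmeans_labels(Z, k)
    return labels
```

The cubic factors in the $O(p k^3 n^3)$ bound come from the repeated eigen-decompositions and the kernel-weight optimization over $n \times n$ matrices.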
Experiments
• K-means algorithm as the baseline for comparison
• Three representative unsupervised distance metric learning algorithms: Principal Component Analysis (PCA), Locally Linear Embedding (LLE), and Laplacian Eigenmap (Leigs)
Performance Measures
─ c_i: the obtained cluster indicator of sample i; y_i: the true class label of sample i
─ C: the set of cluster indicators; Y: the set of class labels
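The slide's formula image is lost; a common clustering accuracy measure consistent with these definitions maps each cluster indicator to the best-matching class label and counts the agreements. A brute-force sketch (an assumption about the intended measure, adequate for small numbers of clusters):

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    # fraction of samples whose mapped cluster indicator matches the
    # true class label, maximized over all cluster-to-class mappings
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        acc = sum(mapping[c] == y for c, y in zip(y_pred, y_true)) / len(y_true)
        best = max(best, acc)
    return best
```

For larger numbers of clusters the optimal mapping is usually found with the Hungarian algorithm instead of enumeration.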
Experimental Results (10 RBF kernels for NAML)
Sensitivity Study: the effect of the input kernels and the regularization parameter λ
NAML provides a way to learn from multiple input kernels and generate a metric with which an unsupervised learning algorithm, like K-means, is more likely to perform as well as with the best input kernel.
[Figure: K-means using each of the 10 input kernels vs. NAML, including cases where the quality of the initial kernel is low]
A series of different λ values ranging from 10^-8 to 10^5
• A λ value in the range [10^-4, 10^2] is helpful in most cases
Conclusions
The joint kernel learning, metric learning, and clustering in NAML can be formulated as a trace maximization problem, which can be solved iteratively in an EM framework.
NAML is effective in learning a good distance metric and improving the clustering performance.
The framework naturally handles multiple types of biological data, e.g., amino acid sequences, hydropathy profiles, and gene expression data.
Future work includes studying how to combine a set of pre-specified Laplacian matrices to achieve better performance in spectral clustering.
Comments
Advantage
─ A joint framework for kernel learning, distance metric learning, and clustering
Shortcomings
Applications
─ Clustering