Clustering Detecting margin regions


Max-margin Clustering: Detecting Margins from Projections of Points on Lines

Raghuraman Gopalan (1) and Jagan Sankaranarayanan (2)

(1) Center for Automation Research, University of Maryland, College Park, MD, USA; (2) NEC Labs, Cupertino, CA, USA

Problem Statement: Given an unlabelled set of points forming k clusters, find a grouping with the maximum separating margin among the clusters

Prior work: (Mostly) establishes feedback between different label proposals, and runs a supervised classifier on them

Goal: To understand the relation between data points and margin regions by analyzing projections of data on lines

Two-cluster Problem

Proposition 1: SI* exists ONLY on line segments in the margin region that are perpendicular to the separating hyperplane

Such line segments directly provide cluster groupings
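As a rough illustration of Proposition 1, the sketch below projects points onto the line through two chosen points, finds the widest empty stretch between consecutive projections, and splits the points on either side of it. Treating SI as that empty stretch is an assumption (the transcript does not define SI), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]

def project_locations(points: Sequence[Point], a: Point, b: Point) -> List[float]:
    """Location t of each point's projection on the line a + t * (b - a)."""
    d = [bk - ak for ak, bk in zip(a, b)]
    dd = sum(c * c for c in d)
    if dd == 0:
        raise ValueError("a and b must be distinct points")
    return [sum((pk - ak) * dk for pk, ak, dk in zip(p, a, d)) / dd for p in points]

def widest_gap(locations: Sequence[float]) -> Tuple[float, float]:
    """Endpoints of the widest empty stretch between consecutive locations."""
    order = sorted(locations)
    if len(order) < 2:
        raise ValueError("need at least two projected points")
    return max(zip(order, order[1:]), key=lambda pair: pair[1] - pair[0])

def split_by_gap(points: Sequence[Point], a: Point, b: Point) -> List[int]:
    """Label each point 0 or 1 by the side of the widest gap it projects to."""
    ts = project_locations(points, a, b)
    lo, hi = widest_gap(ts)
    cut = (lo + hi) / 2.0
    return [0 if t < cut else 1 for t in ts]

# Two well-separated groups; the line joins one point from each group.
pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
labels = split_by_gap(pts, pts[0], pts[3])
```

In this toy case the line is perpendicular to the separating hyperplane, so the widest gap falls in the margin region and the split matches the two groups.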

Assumptions

Linearly separable clusters (kernel trick for the non-linear case)

No outliers in data (max margin exists only between clusters)

Enforce global cluster balance

Multi-cluster Problem

Location information of projected points (SI) alone is insufficient to detect margins

The Role of Distance of Projection

No outlier assumption: $M_m < \min_i m_i$, where $M_m$ is the max margin between points within a cluster

Defn: $D_{\min}$ of a line interval is the minimum distance of projection of points in that interval.
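The definition of D_min can be made concrete with a short sketch. It assumes that "distance of projection" means the perpendicular distance from a point to the line, and that a point belongs to an interval when its projected location falls inside it. The helper names `project` and `d_min`, and the parameterisation of an interval by `t_lo` and `t_hi`, are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]

def project(p: Point, a: Point, b: Point) -> Tuple[float, float]:
    """Return (t, dist): location along a + t * (b - a) and perpendicular distance."""
    d = [bk - ak for ak, bk in zip(a, b)]
    dd = sum(c * c for c in d)
    if dd == 0:
        raise ValueError("a and b must be distinct points")
    t = sum((pk - ak) * dk for pk, ak, dk in zip(p, a, d)) / dd
    foot = [ak + t * dk for ak, dk in zip(a, d)]  # foot of the perpendicular
    return t, math.dist(p, foot)

def d_min(points: Sequence[Point], a: Point, b: Point,
          t_lo: float, t_hi: float) -> float:
    """Minimum distance of projection among points projecting into [t_lo, t_hi].

    Returns math.inf when no point projects into the interval.
    """
    projected = (project(p, a, b) for p in points)
    return min((dist for t, dist in projected if t_lo <= t <= t_hi),
               default=math.inf)

pts = [(0.5, 2.0), (0.25, -1.0), (3.0, 0.5)]
a, b = (0.0, 0.0), (1.0, 0.0)
near = d_min(pts, a, b, 0.0, 1.0)    # two points project here; closest is 1.0 away
far = d_min(pts, a, b, 2.0, 4.0)     # only the third point projects here
empty = d_min(pts, a, b, 5.0, 6.0)   # no point projects here
```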

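Proposition 2: For line intervals in the margin region, perpendicular to the separating hyperplane

Proposition 3: For line intervals inside a cluster of length more than $M_m$

Proposition 4: An interval with SI having no projected points with distance of projection less than $D_{\min}^*$ can lie only outside a cluster

[Equations for Propositions 2 to 4: garbled in the source transcript. Recoverable fragments: a maximum of $D_{\min}$ over intervals Int in CL, together with $M_m / 2$; $D_{\min}^*$ expressed through $\min_i$ over intervals $Int_i$; and $[SI]$ together with $D_{\min}$.]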

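A Pair-wise Similarity Measure for Clustering

$f(x_i, x_j) = \exp(-\max [SI])$, with the maximum taken on $Int_{ij}$ [the condition on $D$ under the max is garbled in the source transcript]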

$f(x_i, x_j) = 1$ iff $x_i = x_j$

$f(x_i, x_j) \ll 1$ iff $x_i$ and $x_j$ are from different clusters, and $Int_{ij}$ is perpendicular to their separating hyperplane
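Because the exact formula is not legible here, the following is only one possible reading, not the authors' definition: on the segment between two points, keep the projections of points whose distance of projection is below a cutoff, take the widest empty stretch between them, and pass its negated length through an exponential. The cutoff parameter `d_star` (standing in for D_min*), the use of raw length units, and the function name `similarity` are all assumptions.

```python
import math
from typing import Sequence

Point = Sequence[float]

def similarity(points: Sequence[Point], i: int, j: int, d_star: float) -> float:
    """Assumed similarity: exp(-widest empty stretch on the segment x_i to x_j).

    Only points projecting onto the segment with distance of projection
    below d_star are counted when looking for empty stretches.
    """
    if d_star <= 0:
        raise ValueError("d_star must be positive")
    a, b = points[i], points[j]
    d = [bk - ak for ak, bk in zip(a, b)]
    dd = sum(c * c for c in d)
    if dd == 0:
        return 1.0  # identical points: f = 1
    length = math.sqrt(dd)
    locs = []
    for p in points:
        t = sum((pk - ak) * dk for pk, ak, dk in zip(p, a, d)) / dd
        if not 0.0 <= t <= 1.0:
            continue  # projects outside the segment
        foot = [ak + t * dk for ak, dk in zip(a, d)]
        if math.dist(p, foot) < d_star:
            locs.append(t * length)  # location in length units
    locs.sort()
    widest = max((hi - lo for lo, hi in zip(locs, locs[1:])), default=0.0)
    return math.exp(-widest)

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
same_group = similarity(pts, 0, 2, d_star=1.0)    # widest stretch is 1
cross_group = similarity(pts, 0, 5, d_star=1.0)   # widest stretch is 8
```

Under this reading the two properties stated above hold on the toy data: identical points score 1, and a pair straddling the gap scores far below a pair inside one group.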

Max-margin Clustering Algorithm

1. Draw lines between all pairs of points.

2. Estimate the probability of presence of margins between a pair of points $x_i$ and $x_j$ by computing $f(x_i, x_j)$.

3. Perform global clustering using f between all point-pairs.
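The final clustering step can be sketched with a deliberately simple stand-in: link every pair whose similarity reaches a threshold and read off the connected components. The transcript does not say which global clustering method is used, so the thresholding rule, the union-find helper, and the name `cluster_by_threshold` are assumptions.

```python
from typing import Dict, List, Sequence

def cluster_by_threshold(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Group indices whose pairwise similarity is at least `threshold`.

    `sim` is a symmetric matrix of pairwise similarities f(x_i, x_j).
    Labels are assigned in order of first appearance: 0, 1, 2, ...
    """
    n = len(sim)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

# Toy similarity matrix: points 0 and 1 are similar, point 2 is far from both.
sim = [
    [1.00, 0.90, 0.01],
    [0.90, 1.00, 0.02],
    [0.01, 0.02, 1.00],
]
groups = cluster_by_threshold(sim, threshold=0.5)
```

A threshold is the crudest way to turn pairwise scores into groups; any clustering method that accepts a similarity matrix could take its place.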

Results

Summary

Obtaining statistics of location and distance of projection of points that are specific to line segments in margin regions (Prop. 1 to 4)

A pair-wise similarity measure to perform clustering, which avoids some optimization-related challenges prevalent in most existing methods

References

1. F. De la Torre and T. Kanade, “Discriminative cluster analysis”, ICML, pp. 241-248, 2006. ([8] in table)

2. K. Zhang, I.W. Tsang, and J.T. Kwok, “Maximum margin clustering made practical”, IEEE Trans. Neural Networks, 20(4), pp. 583-596, 2009. ([31] in table)
