
Support Vector Clustering Algorithm

Presentation by: Jialiang Wu

Reference paper and code website

• "Support Vector Clustering" by Asa Ben-Hur, David Horn, Hava T. Siegelmann, and Vladimir Vapnik.

• www.cs.tau.ac.il/~borens/course/ml/cluster.html by Elhanan Borenstein, Ofer, and Orit.

Clustering

A clustering algorithm groups data according to the distances between points.

• Points that are close to each other are allocated to the same cluster.

• Clustering is most effective when the data has some geometric structure.

• Outliers may cause an unjustified increase in cluster size or a faulty clustering.

Support Vector Machine (SVM)

• SVM maps the data from data space to a higher-dimensional feature space through a suitable nonlinear mapping.

• In a feature space of sufficiently high dimension, data from two categories can always be separated by a hyperplane.

Support Vector Machine (SVM): Main Idea

1. Much of the geometry of the data in the embedding space (the relative positions of the points) is contained in the pairwise inner products. We can work in that space by specifying an inner-product function between points in it; an explicit mapping is not necessary.

2. In many cases the inner product has a simple kernel representation and can therefore be evaluated easily.
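As a minimal illustration (a sketch of my own in Python/NumPy, not taken from the course code): the Gaussian kernel below returns feature-space inner products directly, so the geometry of the embedded data is available without ever constructing the mapping Φ.

```python
import numpy as np

def gaussian_kernel(X, Y, q=1.0):
    """Pairwise K(x, y) = exp(-q * ||x - y||^2).

    Each entry is the inner product <Phi(x), Phi(y)> in the implicit
    feature space, computed without forming Phi explicitly.
    """
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-q * d2)

X = np.random.randn(5, 2)
K = gaussian_kernel(X, X, q=2.0)  # 5x5 matrix of feature-space inner products
```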

Support Vector Clustering (SVC)

• SVC maps data from data space to a higher-dimensional feature space using a Gaussian kernel.

• In feature space we look for the smallest sphere that encloses the image of the data.

• When the sphere is mapped back to data space, it forms a set of contours which enclose the data points; points enclosed by the same contour are assigned to the same cluster.
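For reference, this is the optimization problem as formulated in the Ben-Hur et al. paper cited above (summarized here in the paper's notation):

```latex
% Smallest enclosing sphere in feature space, with soft margin:
\min_{R,\,a,\,\xi}\; R^2 + C\sum_j \xi_j
\quad\text{s.t.}\quad \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j,\;\; \xi_j \ge 0.

% Wolfe dual, with the Gaussian kernel K(x_i, x_j) = e^{-q\|x_i - x_j\|^2}:
\max_{\beta}\; \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \beta_j \le C,\;\; \sum_j \beta_j = 1.

% Feature-space distance of a point x from the sphere centre a:
R^2(x) = K(x,x) - 2\sum_j \beta_j K(x_j, x) + \sum_{i,j}\beta_i\beta_j K(x_i, x_j).
```

The contours in data space are the level set {x : R(x) = R}, where R is the radius of the sphere.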

Support Vector Clustering (SVC)

• The clustering level is controlled by two parameters: 1) q, the width parameter of the Gaussian kernel: as q increases, the number of disconnected contours increases, and with it the number of clusters. 2) C, the soft-margin constant, which allows the sphere in feature space not to enclose all points (see the sketch below).
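To make the roles of q and C concrete, here is a minimal numerical sketch (my own code, not the course implementation; it assumes SciPy's SLSQP solver is adequate for this small dual QP):

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, q):
    """Pairwise K(x, y) = exp(-q * ||x - y||^2) (same helper as in the earlier sketch)."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-q * d2)

def fit_svc(X, q=2.0, C=1.0):
    """Solve the SVC Wolfe dual for the coefficients beta.

    K(x, x) = 1 for the Gaussian kernel, so maximizing the dual reduces to
    minimizing beta^T K beta subject to 0 <= beta_j <= C, sum_j beta_j = 1.
    """
    n = len(X)
    K = gaussian_kernel(X, X, q)
    res = minimize(
        fun=lambda b: b @ K @ b,
        jac=lambda b: 2.0 * K @ b,
        x0=np.full(n, 1.0 / n),
        bounds=[(0.0, C)] * n,
        constraints={"type": "eq", "fun": lambda b: b.sum() - 1.0},
        method="SLSQP",
    )
    return res.x, K

def sphere_distance(points, X, beta, q):
    """R(x): feature-space distance from the sphere centre a = sum_j beta_j Phi(x_j)."""
    Kx = gaussian_kernel(points, X, q)
    r2 = 1.0 - 2.0 * Kx @ beta + beta @ gaussian_kernel(X, X, q) @ beta
    return np.sqrt(np.maximum(r2, 0.0))

# Example: noisy ring. Raising q shrinks the contours and fragments the ring
# into more clusters; raising C lets more outlier points stay outside the sphere.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 40)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((40, 2))

q, C = 2.0, 1.0
beta, K = fit_svc(X, q=q, C=C)
# The sphere radius R equals R(x_i) at any support vector with 0 < beta_i < C.
sv = np.flatnonzero((beta > 1e-6) & (beta < C - 1e-6))
R = sphere_distance(X[sv[:1]], X, beta, q)[0]
```

Two points are then assigned to the same cluster when every sampled point z on the line segment between them satisfies R(z) <= R, i.e. the whole segment stays inside the sphere; this adjacency test is how the paper turns the contours into cluster labels.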

Clustering controlled by q

[Figure: cross dataset, q = 0.5, C = 1]

[Figure: cross dataset; as q grows, the number of clusters increases]

[Figure: circle with 30 noise points, q = 2, C = 1]

[Figure: circle with 30 noise points, q = 10, C = 1]

[Figure: circle with 100 noise points, q = 2, C = 1]

Conclusions

• Points located close to one another tend to be allocated to the same cluster.

• The number of clusters increases as q grows.

• q depends considerably on the specific sample points (scaling, range, scatter, etc.); there is no single q that is always appropriate. A drill-down search over q for a given dataset is a solution, but it is very time-consuming.

• When the samples represent a relatively large number of classes, SVC is less efficient.

My work in progress

• Theoretical exploration:

To find out whether there are restrictions we can impose on the inner product such that the figure mapped back into data space is connected (i.e., has only one component).

• Importance

Q & A