Data Clustering: 50 years beyond K-means
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Presenter: Jiang-Shan Wang
Author: Anil K. Jain
Pattern Recognition Letters (PRL), 2010
National Yunlin University of Science and Technology (國立雲林科技大學)
Outline
Motivation
Objective
Data clustering
User’s dilemma
K-means
Extensions of K-means
Trends in data clustering
Summary
Comments
Motivation
To provide a brief overview of clustering and to point out some emerging and useful research directions.
Objective
To summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some emerging and useful research directions.
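Data clustering
Clustering serves three main purposes:
Underlying structure: gain insight into data, generate hypotheses, detect anomalies, and identify salient features.
Natural classification: identify the degree of similarity among forms or organisms.
Compression: organize the data and summarize it through cluster prototypes.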
K-means
K-means requires three user-specified parameters:
Number of clusters K
Cluster initialization
Distance metric
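The three parameters above map directly onto a minimal Lloyd-style K-means. This is a sketch, not the paper's code; it assumes points are equal-length tuples, squared Euclidean distance as the distance metric, and initialization by drawing K distinct data points with a seeded random generator. The names `kmeans` and `sq_dist` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance, the distance metric assumed here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd-style K-means on a list of equal-length tuples.

    k: number of clusters
    seed: drives the initialization (k distinct data points drawn at random)
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # cluster initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers
```

Changing `k`, `seed`, or `sq_dist` is a quick way to see how each of the three parameters affects the resulting partition.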
Extensions of K-means
Fuzzy C-means
Bisecting K-means
X-means
K-medoid
Kernel K-means
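As one example of these extensions, Fuzzy C-means replaces the hard assignment of K-means with graded memberships. The sketch below uses the standard FCM membership and center updates with fuzzifier m; the function names and the farthest-first initialization are assumptions chosen to keep the example deterministic.

```python
def _dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fuzzy_c_means(points, c, m=2.0, n_iter=100):
    """Fuzzy C-means: every point receives a membership degree in every cluster.

    m > 1 is the fuzzifier; values closer to 1 give harder memberships.
    Returns (memberships, centers); each row of memberships sums to 1.
    """
    # Farthest-first initialization (an assumption made here for determinism).
    centers = [tuple(points[0])]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(_dist(p, q) for q in centers)))
    u = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(_dist(p, q), 1e-12) for q in centers]  # avoid division by zero
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        # Center update: mean of all points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(sum(wi * p[t] for wi, p in zip(w, points)) / total
                                     for t in range(len(points[0]))))
        centers = new_centers
    return u, centers
```

Unlike K-means, a point lying between two groups keeps a substantial membership in both, which is the property that distinguishes the fuzzy variant.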
User’s dilemma: Representation
User’s dilemma: Purpose of grouping
User’s dilemma: Number of clusters
User’s dilemma: Cluster validity
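The slide does not name a specific validity index. As an illustration only, the silhouette coefficient is a widely used internal index that scores a partition without ground-truth labels; the hypothetical `silhouette` function below is a plain-Python sketch of its usual definition.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a partition (needs at least two clusters).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Values near 1 mean compact, well-separated clusters.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # a singleton cluster gets s = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Comparing the score of a sensible partition with that of a scrambled one shows how a validity index can rank candidate clusterings of the same data.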
User’s dilemma: Comparing clustering algorithms
User’s dilemma: Comparing clustering algorithms (continued)
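One standard way to quantify how similar the partitions produced by two algorithms are is the Rand index: the fraction of point pairs on which the two partitions agree. This is a minimal sketch of that definition; `rand_index` is a hypothetical name, and it is not necessarily the measure behind the comparison figures.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree: both put the
    pair in the same cluster, or both put it in different clusters."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because only co-membership of pairs is compared, the index ignores how the clusters are numbered, so two algorithms that find the same groups under different labels score 1.0.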
User’s dilemma: Admissibility analysis of clustering algorithms
Fisher and Van Ness’s criteria:
Convex
Cluster proportion
Cluster omission
Monotone
Kleinberg’s criteria:
Scale invariance
Richness
Consistency
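Kleinberg showed that no clustering function can satisfy scale invariance, richness, and consistency at the same time. The sketch below illustrates only the scale-invariance criterion, using single linkage with two stopping rules (a fixed distance threshold and a fixed number of clusters) on a symmetric distance matrix; all function names are hypothetical.

```python
from itertools import combinations


def _partition(n, edges, k=1):
    """Union-find over items 0..n-1: apply edges in order, stopping once only k
    components remain. Returns the partition as a frozenset of frozensets."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    components = n
    for i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def distance_r(d, r):
    """Single linkage with a fixed distance threshold r: join every pair closer than r."""
    n = len(d)
    return _partition(n, [(i, j) for i, j in combinations(range(n), 2) if d[i][j] < r])


def k_cluster(d, k):
    """Single linkage stopped at k clusters: join the closest pairs until k remain."""
    n = len(d)
    edges = sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]])
    return _partition(n, edges, k)


def scale(d, alpha):
    """Multiply every distance by alpha > 0."""
    return [[alpha * x for x in row] for row in d]


d = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 1], [10, 10, 1, 0]]
print(distance_r(d, 2) == distance_r(scale(d, 3), 2))  # False: not scale invariant
print(k_cluster(d, 2) == k_cluster(scale(d, 3), 2))    # True
```

Tripling every distance pushes all pairs past the fixed threshold, so the threshold rule returns singletons, while the k-cluster rule returns the same two groups; a fixed k, however, gives up richness, since only k-cluster partitions can ever be produced.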
Trends in data clustering: Clustering ensembles
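A well-known way to combine an ensemble of partitions is evidence accumulation (Fred and Jain), which summarizes the ensemble in a co-association matrix. The sketch below assumes a simple consensus step: link pairs that co-occur in more than half of the partitions and take connected components, which matches cutting a single-link tree over the co-association values at that level. Function names are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """C[i][j] = fraction of the input partitions that put points i and j together."""
    n = len(partitions[0])
    c = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    c[i][j] += 1.0 / len(partitions)
    return c


def consensus(partitions, threshold=0.5):
    """Link every pair whose co-association exceeds `threshold` and return the
    connected components, each labelled by its smallest point index."""
    c = co_association(partitions)
    n = len(c)
    labels = list(range(n))
    changed = True
    while changed:  # propagate the minimum label along linked pairs until stable
        changed = False
        for i, j in combinations(range(n), 2):
            if c[i][j] > threshold and labels[i] != labels[j]:
                labels[i] = labels[j] = min(labels[i], labels[j])
                changed = True
    return labels
```

Each individual partition may misplace a point, but a misplacement shared by only a minority of the partitions does not survive the vote.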
Trends in data clustering: Semi-supervised clustering
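Semi-supervised clustering commonly uses side information in the form of pairwise must-link and cannot-link constraints. The sketch below follows the idea of COP-KMeans (Wagstaff et al., 2001): a K-means whose assignment step skips any center that would violate a constraint. The names, the greedy point order, and the default initialization (the first K points) are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting point i in `cluster` breaks a constraint with a point that
    already has a label in this pass (labels[x] is None while unassigned)."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link=(), cannot_link=(), init=None, max_iter=50):
    """K-means whose assignment step respects must-link / cannot-link pairs."""
    centers = [tuple(p) for p in (init or points[:k])]
    labels = None
    for _ in range(max_iter):
        new = [None] * len(points)
        for i, p in enumerate(points):
            # Try centers from nearest to farthest; take the first feasible one.
            for j in sorted(range(k), key=lambda idx: sq_dist(p, centers[idx])):
                if not violates(i, j, new, must_link, cannot_link):
                    new[i] = j
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels
```

With no constraints this reduces to plain K-means; a single cannot-link or must-link pair is enough to change the resulting partition.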
Trends in data clustering: Large-scale clustering
Studies:
Efficient nearest-neighbor search
Data summarization
Distributed computing
Incremental clustering
Sampling-based methods
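Incremental clustering processes each point once instead of iterating over the whole data set. As one illustrative example, the hypothetical `OnlineKMeans` class below sketches a sequential (MacQueen-style) K-means update: the first K points seed the centers, and each later point moves its nearest center toward itself by 1/n, so every center stays the running mean of the points assigned to it.

```python
class OnlineKMeans:
    """Single-pass (incremental) K-means over a stream of equal-length tuples."""

    def __init__(self, k):
        self.k = k
        self.centers = []
        self.counts = []

    def partial_fit(self, p):
        """Absorb one point and return the index of the cluster it joined."""
        p = tuple(float(x) for x in p)
        if len(self.centers) < self.k:  # the first k points seed the centers
            self.centers.append(p)
            self.counts.append(1)
            return len(self.centers) - 1
        j = min(range(self.k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, self.centers[c])))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size 1/n keeps the center a running mean
        self.centers[j] = tuple(c + eta * (x - c) for c, x in zip(self.centers[j], p))
        return j
```

Memory use is O(K) regardless of stream length, which is the point of the incremental approach; the price is that the result depends on the order in which points arrive.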
Trends in data clustering
Multi-way clustering
Heterogeneous data:
Rank data
Dynamic data
Graph data
Relational data
Summary
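A suite of benchmark data (with ground truth) is needed to test and evaluate clustering methods.
Clustering algorithms need tighter integration with the needs of the application.
Most clustering methods are ultimately cast as optimization problems, so computational efficiency matters.
Stability or consistency: a good clustering principle should yield partitions that are stable under perturbations of the data.
Choose clustering principles according to how well they satisfy the stated axioms.
Develop semi-supervised clustering techniques.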
Comments
Advantage: Many figures aid understanding.
Drawback: …
Application: Clustering.