Robust Information-theoretic Clustering
![Page 1: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/1.jpg)
Robust Information-theoretic Clustering
By C. Bohm, C. Faloutsos, J-Y. Pan, and C. Plant
Presenter: Niyati Parikh
![Page 2: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/2.jpg)
Objective
- Find natural clustering in a dataset
- Two questions:
  - Goodness of a clustering
  - Efficient algorithm for good clustering
![Page 3: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/3.jpg)
Define “goodness”
- Ability to describe the clusters succinctly
- Adopt VAC (Volume after Compression):
  - Record #bytes for the number of clusters k
  - Record #bytes to record their type (Gaussian, uniform, ...)
  - Compressed location of each point
![Page 4: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/4.jpg)
VAC
- Tells which grouping is better
- Lower VAC => better grouping
- Formula using decorrelation matrix
- Decorrelation matrix = matrix with eigenvectors
![Page 5: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/5.jpg)
Computing VAC
Steps:
- Compute covariance matrix of cluster C
- Compute PCA and obtain eigenvector matrix
- Compute VAC from the matrix
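The three steps above can be sketched in NumPy. The transcript does not give the VAC formula itself, so the cost below is only a stand-in: it assumes each decorrelated coordinate is coded with a Gaussian at a fixed resolution. The names `vac_proxy` and `grid` are hypothetical and are not taken from the paper.

```python
import numpy as np

def vac_proxy(points, grid=0.01):
    """Approximate coding cost in bits for one cluster (needs >= 2 points).

    Assumption: Gaussian code per principal axis at resolution `grid`.
    This is a proxy for illustration, not the exact VAC of the paper.
    """
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    # Step 1: covariance matrix of the cluster.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    # Step 2: PCA; the columns of `eigvecs` form the decorrelation matrix.
    _, eigvecs = np.linalg.eigh(cov)
    Y = centered @ eigvecs  # decorrelated coordinates
    # Step 3: cost from the per-axis variances (floored at the grid size).
    var = np.maximum(Y.var(axis=0), grid ** 2)
    bits_per_point = 0.5 * np.log2(2 * np.pi * np.e * var) - np.log2(grid)
    return float(n * bits_per_point.sum())
```

Because the coordinates are decorrelated first, rotating a cluster leaves this cost unchanged, while a more spread-out cluster costs more bits.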
![Page 6: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/6.jpg)
Efficient algorithm
- Take an initial clustering given by any algorithm
- Refine that clustering to remove outliers/noise
- Output a better clustering by post-processing
![Page 7: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/7.jpg)
Refining Clusters
- Use VAC to refine existing clusters
- Removing outliers from the given cluster C:
  - Define Core and Out as the sets of core points and outliers in C
  - Initially Out contains all points in C
  - Arrange the points in ascending order of their distance from the center
  - Compute VAC
  - Pick the closest point from Out and move it to Core
  - Compute the new VAC
  - If the new VAC increases then stop, else pick the next closest point and repeat
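A minimal sketch of this greedy loop follows, under several assumptions that the slide does not state: the center is the coordinate-wise median, Core is coded with a Gaussian along its PCA axes, each point left in Out is coded uniformly over the bounding box of C, and Core is seeded with d + 1 points so that a covariance can be fitted. `GRID`, `gaussian_bits` and `refine_cluster` are hypothetical names.

```python
import numpy as np

GRID = 0.01  # assumed coordinate resolution of the code-length proxy

def gaussian_bits(X):
    """Proxy for the VAC of Core: Gaussian bits along the PCA axes."""
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    var = np.maximum(np.linalg.eigvalsh(cov), GRID ** 2)  # PCA variances
    per_point = 0.5 * np.log2(2 * np.pi * np.e * var) - np.log2(GRID)
    return len(X) * per_point.sum()

def refine_cluster(points):
    """Greedy Core/Out split; returns (core_idx, out_idx) into `points`."""
    X = np.asarray(points, dtype=float)
    n, d = X.shape
    center = np.median(X, axis=0)  # assumed robust center
    dist = np.linalg.norm(X - center, axis=1)
    order = np.argsort(dist, kind="stable")  # ascending distance
    # Assumed cost of one outlier: uniform code over the bounding box of C.
    span = np.maximum(X.max(axis=0) - X.min(axis=0), GRID)
    out_bits = np.log2(span / GRID).sum()

    def total(k):
        # Cost when the k closest points form Core and the rest stay in Out.
        return gaussian_bits(X[order[:k]]) + (n - k) * out_bits

    k = min(d + 1, n)  # seed Core
    cost = total(k)
    while k < n:
        new_cost = total(k + 1)  # move the closest remaining point to Core
        if new_cost > cost:      # the cost went up: stop
            break
        k, cost = k + 1, new_cost
    return order[:k], order[k:]
```

Recomputing the cost from scratch at every step keeps the sketch short; an incremental covariance update would be the natural optimization for large clusters.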
![Page 8: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/8.jpg)
VAC and Robust estimation
- Conventional estimation: covariance matrix uses Mean
- Robust estimation: covariance matrix uses Median
- Median is less affected by outliers than Mean
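A small illustration of the difference between the two centers, assuming "uses Median" means centering on the coordinate-wise median before forming the covariance matrix. The helper `centered_covariance` and the sample points are made up for this example.

```python
import numpy as np

def centered_covariance(points, center):
    """Covariance-style matrix of `points` around a chosen center."""
    X = np.asarray(points, dtype=float) - center
    return X.T @ X / len(X)

# Four nearby points plus one far outlier (made-up data).
pts = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0], [100.0, 100.0]])

mean_center = pts.mean(axis=0)          # dragged to [22., 22.] by the outlier
median_center = np.median(pts, axis=0)  # stays at [3., 3.], inside the bulk

conventional = centered_covariance(pts, mean_center)
robust = centered_covariance(pts, median_center)
```

The median center stays inside the bulk of the points, so distances measured from it rank the outlier last, which is what the refinement step relies on.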
![Page 9: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/9.jpg)
Sample result
- Imperfect clusters formed by K-Means affect the purifying process
- May result in redundant clusters that could be merged
![Page 10: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/10.jpg)
Cluster Merging
- Merge Ci and Cj only if the combined VAC decreases
- savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci ∪ Cj)
- If savedCost > 0, then merge Ci and Cj
- Greedy search to maximize savedCost, hence minimize VAC
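The greedy merge can be sketched as below. The cost function is again a stand-in for VAC, not the paper's formula: it assumes Gaussian bits along the PCA axes plus log2(N / n) bits per point to name the cluster, where N is the total number of points. Some per-cluster overhead of this kind is needed, because otherwise splitting a cluster would never look worse than keeping it whole. `GRID`, `vac_proxy`, `saved_cost` and `greedy_merge` are hypothetical names.

```python
import numpy as np

GRID = 0.01  # assumed coordinate resolution

def vac_proxy(X, n_total):
    """Stand-in for VAC(C): coordinate bits plus cluster-ID bits."""
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    var = np.maximum(np.linalg.eigvalsh(cov), GRID ** 2)  # PCA variances
    coord_bits = (0.5 * np.log2(2 * np.pi * np.e * var) - np.log2(GRID)).sum()
    id_bits = np.log2(n_total / len(X))  # assumed cost of naming the cluster
    return len(X) * (coord_bits + id_bits)

def saved_cost(ci, cj, n_total):
    """savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci u Cj)."""
    union = np.vstack([ci, cj])
    separate = vac_proxy(ci, n_total) + vac_proxy(cj, n_total)
    return separate - vac_proxy(union, n_total)

def greedy_merge(clusters):
    """Repeatedly merge the pair with the largest positive savedCost."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    n_total = sum(len(c) for c in clusters)
    while len(clusters) > 1:
        pairs = [(saved_cost(clusters[i], clusters[j], n_total), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best <= 0:  # no merge lowers the total cost
            break
        merged = np.vstack([clusters[i], clusters[j]])
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [merged]
    return clusters
```

Given two interleaved groups covering the same region and a third group far away, this sketch merges the first two and leaves the distant one alone.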
![Page 11: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/11.jpg)
Final Result
![Page 12: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/12.jpg)
Experiment results
![Page 13: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/13.jpg)
Example
![Page 14: Robust Information-theoretic Clustering](https://reader035.fdocuments.in/reader035/viewer/2022062300/56812a6c550346895d8df06c/html5/thumbnails/14.jpg)
Thank You
Questions?