Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for...
Transcript of Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for...
![Page 1: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/1.jpg)
Unsupervised Feature Selection for Multi-Cluster Data
Deng Cai, Chiyuan Zhang, Xiaofei HeZhejiang University
![Page 2: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/2.jpg)
Problem: High-dimension Data
Text document
Image
Video
Gene Expression
Financial
Sensor
…
![Page 3: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/3.jpg)
Problem: High-dimension Data
Text document
Image
Video
Gene Expression
Financial
Sensor
…
![Page 4: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/4.jpg)
Solution: Feature Selection
Reduce the dimensionality by finding a relevant feature subset
![Page 5: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/5.jpg)
Feature Selection Techniques
Supervised
Fisher score
Information gain
Unsupervised (discussed here)
Max variance
Laplacian Score, NIPS 2005
Q-alpha, JMLR 2005
MCFS, KDD 2010 (Our Algorithm)
…
![Page 6: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/6.jpg)
Outline
Problem setting
Multi-Cluster Feature Selection (MCFS) Algorithm
Experimental Validation
Conclusion
![Page 7: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/7.jpg)
Problem setting
Unsupervised Multi clusters/classes Feature Selection
How traditional score-ranking methods fail:
![Page 8: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/8.jpg)
Multi-Cluster Feature Selection (MCFS) Algorithm
Objective
Select those features such that the multi-cluster structure of the data can be well preserved
Implementation
Spectral analysis to explorer the intrinsic structure
L1-regularized least-square to select best features
![Page 9: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/9.jpg)
Spectral Embedding for Cluster Analysis
![Page 10: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/10.jpg)
Spectral Embedding for Cluster Analysis
Laplacian Eigenmaps
Can unfold the data manifold and provide the flat embedding for data points
Can reflect the data distribution on each of the data clusters
Thoroughly studied and well understood
![Page 11: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/11.jpg)
Learning Sparse Coefficient Vectors
![Page 12: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/12.jpg)
Feature Selection on Sparse Coefficient Vectors
![Page 13: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/13.jpg)
Algorithm Summary
1. Construct p-nearest neighbor graph W2. Solve generalized eigen-problem to get K eigenvectors
corresponding to the smallest eigenvalues3. Solve K L1-regulairzed regression to get K sparse
coefficient vectors4. Compute the MCFS score for each feature5. Select d features according to MCFS score
![Page 14: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/14.jpg)
Complexity Analysis
![Page 15: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/15.jpg)
Experiments
Unsupervised feature selection for
Clustering
Nearest neighbor classification
Compared algorithms
MCFS
Q-alpha
Laplacian score
Maximum variance
![Page 16: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/16.jpg)
Experiments (USPS Clustering)
USPS Hand Written Digits
9298 samples, 10 classes, 16x16 gray-scale image each
![Page 17: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/17.jpg)
Experiments (COIL20 Clustering)
COIL20 image dataset
1440 samples, 20 classes, 32x32 gray-scale image each
![Page 18: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/18.jpg)
Experiments (ORL Clustering)
ORL face dataset
400 images of 40 subjects
32x32 gray-scale images
10 Classes
20 Classes
30 Classes
40 Classes
![Page 19: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/19.jpg)
Experiments (Isolet Clustering)
Isolet spoken letter recognition data
1560 samples, 26 classes
617 features each sample
![Page 20: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/20.jpg)
Experiments (Nearest Neighbor Classification)
![Page 21: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/21.jpg)
Experiments (Parameter Selection)
Number of nearest neighbors p: stable
Number of eigenvectors: best equal to number of classes
![Page 22: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/22.jpg)
Conclusion
MCFS
Well handle multi-class data
Outperform state-of-art algorithms
Performs especially well when number of selected features is small (< 50)
![Page 23: Unsupervised Feature Selection for Multi-Cluster Data€¦ · Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University. Problem:](https://reader034.fdocuments.in/reader034/viewer/2022042710/5f6ba8a39c6df373c1411568/html5/thumbnails/23.jpg)
Questions