A Variance Reduction Framework for Stable Feature Selection
Yue Han and Lei Yu
Binghamton University
Outline
Introduction, Motivation and Related Work
Theoretical Framework
Empirical Framework: Margin Based Instance Weighting
Empirical Study
◦ Synthetic Data
◦ Real-world Data
Conclusion and Future Work
Introduction and Motivation: Feature Selection Applications
[Figure: three example data matrices. (1) Text categorization: a documents-by-terms matrix (documents D1 ... DM, terms T1 ... TN) with class labels such as Sports, Travel, Jobs. (2) Microarray analysis: a samples-by-features matrix where the features are genes or proteins. (3) Image analysis: samples by pixels as features.]
Motivation: Stability of Feature Selection
Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.
Stability of feature selection has been relatively neglected, and has recently attracted interest from researchers in data mining.
Stability Issue of Feature Selection
[Figure: analogous to the stability of a learning algorithm (training data -> learning models), a feature selection method maps training sets D1, D2, ..., Dn drawn from the same data space to feature subsets R1, R2, ..., Rn. Are the resulting subsets consistent or not?]
Motivation: Why is Stable Feature Selection Needed?
Under data variations, a stable feature selection method produces a stable feature subset that is closer to the characteristic features (biomarkers), and learning methods built on it achieve better performance.
An unstable feature selection method produces largely different feature subsets that nonetheless give similarly good learning performance.
Domain experts (in biomedicine and biology) are also interested in biomarkers that are stable and insensitive to data variations: an unstable feature selection method dampens confidence in validation and increases the cost of experiments.
Theoretical Framework: Variance, Bias and Error of Feature Selection
Challenge: increasing the training sample size could be very costly or impractical. How can we represent the underlying data distribution without increasing the sample size?
[Figure: training sets D1, D2, ..., Dn drawn from the data space produce feature weight vectors r(D_j) = (r_1^{D_j}, r_2^{D_j}, ..., r_d^{D_j}); the true feature weight vector is r* = (r_1*, r_2*, ..., r_d*).]
For each feature i, i = 1, ..., d:
Variance: fluctuation of the n weight values around their central tendency.
Bias: loss of the central tendency (average) from the true weight value.
Error: average loss of the n weight values from the true weight value.
Theoretical Framework: Bias-variance Decomposition of Feature Selection Error
Data space: D; training data: D; feature selection result: r(D); true feature selection result: r*.
For each individual feature i, using its weight value instead of 0/1 selection:
Error: Err_i = E_D[(r_i(D) - r_i*)^2]
Bias: Bias_i = (E_D[r_i(D)] - r_i*)^2
Variance: Var_i = E_D[(r_i(D) - E_D[r_i(D)])^2]
Bias-variance decomposition of feature selection error, averaged over all d features:
Err = (1/d) * sum_{i=1}^{d} (Bias_i + Var_i)
o Reveals the relationship between accuracy (opposite of error) and stability (opposite of variance);
o Suggests a better trade-off between the bias and variance of feature selection.
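The decomposition above can be checked numerically; a minimal sketch (the function name and simulated data are illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch of the decomposition above (names are illustrative):
# `weights` holds one feature weight vector r(D_j) per row, estimated
# from training sets D_1..D_n; `true_w` is the true weight vector r*.
def decompose(weights, true_w):
    mean_w = weights.mean(axis=0)                 # central tendency per feature
    var = ((weights - mean_w) ** 2).mean(axis=0)  # Var_i
    bias = (mean_w - true_w) ** 2                 # Bias_i
    err = ((weights - true_w) ** 2).mean(axis=0)  # Err_i
    return bias.mean(), var.mean(), err.mean()    # averaged over all d features

rng = np.random.default_rng(0)
true_w = np.array([1.0, 0.5, 0.0])
# simulate n = 50 noisy, slightly biased weight estimates
weights = true_w + 0.1 + 0.2 * rng.standard_normal((50, 3))
bias, var, err = decompose(weights, true_w)
print(abs(err - (bias + var)) < 1e-12)   # error = bias + variance
```

The identity holds exactly because the cross term E[(r_i(D) - mean)(mean - r_i*)] vanishes when the mean is the empirical average.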
Theoretical Framework: Variance Reduction via Importance Sampling
Feature selection (weighting) can be viewed as a Monte Carlo estimator. Since increasing the sample size is impractical and costly, we instead reduce the variance of the Monte Carlo estimator via importance sampling, realized here as instance weighting.
Intuition behind importance sampling: draw more instances from important regions and fewer instances from other regions.
Intuition behind instance weighting: increase the weights of instances from important regions and decrease the weights of instances from other regions.
Key questions: how to weight the instances, i.e., how important is each instance?
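As a generic illustration of the principle (not the slides' algorithm), importance sampling for a rare-event probability under a standard normal shows how sampling the important region and reweighting by p/q cuts estimator variance:

```python
import numpy as np

# Generic importance-sampling illustration (not the slides' algorithm):
# estimate p = P(X > 3) for X ~ N(0, 1), a rare event where plain
# Monte Carlo wastes almost all of its samples.
rng = np.random.default_rng(1)
n = 100_000
f = lambda x: (x > 3.0).astype(float)

# Plain Monte Carlo: draw from p = N(0, 1).
plain = f(rng.standard_normal(n))

# Importance sampling: draw from the proposal q = N(3, 1), centered on
# the important region, and reweight each draw by p(y) / q(y).
y = 3.0 + rng.standard_normal(n)
w = np.exp(-0.5 * y ** 2) / np.exp(-0.5 * (y - 3.0) ** 2)
weighted = f(y) * w

# Both estimators target P(X > 3) ~= 0.00135, but the importance-sampling
# estimator has far lower variance.
print(plain.mean(), weighted.mean(), plain.var(), weighted.var())
```

Instance weighting applies the same idea without redrawing samples: the existing instances are reweighted rather than resampled.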
Empirical Framework: Overall Framework
Challenges:
How to produce weights for instances from the point of view of feature selection stability;
How to present weighted instances to conventional feature selection algorithms.
Margin Based Instance Weighting for Stable Feature Selection
Empirical Framework: Margin Vector Feature Space
For each instance x in the original space, the hypothesis margin along each dimension j is
m_j(x) = |x_j - M(x)_j| - |x_j - H(x)_j|,
where H(x) is the nearest hit (the nearest neighbor of x with the same class) and M(x) is the nearest miss (the nearest neighbor of x with a different class).
The margin vector m(x) captures the local profile of feature relevance for all features at x, and maps x into the margin vector feature space.
Instances exhibit different profiles of feature relevance, and thus influence feature selection results differently.
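The per-dimension hypothesis margin can be computed directly from pairwise distances; a small sketch (the function name is ours):

```python
import numpy as np

# Sketch of the margin vector feature space transformation described
# above (function name is ours). Row i of the result is the margin
# vector of instance i: |x - nearest miss| - |x - nearest hit|,
# taken coordinate-wise.
def margin_vectors(X, y):
    n, d = X.shape
    # pairwise Euclidean distances, with self-distance masked out
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    M = np.empty((n, d))
    for i in range(n):
        same = y == y[i]
        hit = np.argmin(np.where(same, dist[i], np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dist[i], np.inf))  # nearest miss
        M[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return M

X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
M = margin_vectors(X, y)
print(M)   # instance 0: |0 - 3| - |0 - 1| = 2
```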
Empirical Framework: An Illustrative Example
Hypothesis-margin based feature space transformation: (a) original feature space; (b) margin vector feature space.
Empirical Framework: Margin Based Instance Weighting Algorithm
Because instances exhibit different profiles of feature relevance, they influence feature selection results differently.
Review: variance reduction via importance sampling draws more instances from important regions and fewer from other regions.
Instance weighting: compute the outlying degree of each instance (how far its margin vector lies from those of the other instances), then assign weights so that a higher outlying degree yields a lower weight, and a lower outlying degree yields a higher weight.
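The slide names the two steps but leaves their formulas to the paper; one plausible instantiation, assuming the outlying degree is an instance's average distance to the other margin vectors and weights that decay exponentially with it:

```python
import numpy as np

# Hedged sketch: the exact formulas are left to the paper. Assumed here:
# outlying degree = average distance from an instance's margin vector to
# all other margin vectors; weight = exp(-outlying degree), normalized.
def instance_weights(margins):
    n = len(margins)
    dist = np.linalg.norm(margins[:, None, :] - margins[None, :, :], axis=2)
    od = dist.sum(axis=1) / (n - 1)   # higher = more outlying
    w = np.exp(-od)                   # higher outlying degree -> lower weight
    return w / w.sum()

# a tight cluster of margin vectors plus one outlier
margins = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
w = instance_weights(margins)
print(w)   # the outlier (last instance) receives the smallest weight
```

The resulting weights can then be passed to any conventional feature selection algorithm that accepts weighted instances.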
Empirical Framework: Algorithm Illustration
Time complexity analysis:
o Dominated by instance weighting: O(n^2 d) for the pairwise distances over n instances in d dimensions;
o Efficient for high-dimensional data with small sample size (n << d).
Empirical Study: Subset Stability Measures
Average pairwise similarity across the n selected subsets: Sim = (2 / (n(n-1))) * sum_{i<j} S(R_i, R_j).
Kuncheva index as the similarity S between two size-k subsets A and B drawn from d features: KI(A, B) = (|A ∩ B| - k^2/d) / (k - k^2/d), which corrects the observed overlap for the overlap expected by chance.
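The measure above is straightforward to compute; a small sketch:

```python
from itertools import combinations

# Sketch: subset stability as the average pairwise Kuncheva index over
# the n selected subsets, KI(A, B) = (|A & B| - k^2/d) / (k - k^2/d),
# which corrects the raw overlap for the overlap expected by chance.
def kuncheva_stability(subsets, d):
    k = len(subsets[0])
    chance = k * k / d
    pairs = list(combinations(subsets, 2))
    return sum((len(set(a) & set(b)) - chance) / (k - chance)
               for a, b in pairs) / len(pairs)

print(kuncheva_stability([[0, 1, 2], [0, 1, 2]], d=10))   # identical subsets -> 1.0
print(kuncheva_stability([[0, 1, 2], [3, 4, 5]], d=10))   # disjoint subsets -> negative
```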
Empirical Study: Experiments on Synthetic Data
Synthetic data generation:
Feature values: drawn from two multivariate normal distributions (one per class); the features form 100 groups of 10 features each, and the 10x10 within-group covariance matrix has 1 on the diagonal and 0.8 off the diagonal.
Class label: a weighted sum of all feature values with the optimal feature weight vector.
Training data: 500 training sets, each with 100 instances (50 from each distribution); test data: 5000 instances.
Method in comparison: SVM-RFE, which recursively eliminates 10% of the remaining features per iteration until 10 features remain.
Measures: variance, bias, error; subset stability (Kuncheva index); accuracy (SVM).
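The generation recipe above can be sketched as follows (the number of relevant features and the thresholding of the weighted sum into a label are assumptions where the slide is terse):

```python
import numpy as np

# Sketch of the synthetic data recipe above. Assumptions where the slide
# is terse: features are independent across groups, the first 50 features
# carry the true weights, and the class label is the sign of the weighted
# sum of feature values.
rng = np.random.default_rng(2)
groups, per_group = 100, 10
d = groups * per_group

# 10x10 within-group covariance: 1 on the diagonal, 0.8 off-diagonal
cov = np.full((per_group, per_group), 0.8)
np.fill_diagonal(cov, 1.0)
L = np.linalg.cholesky(cov)

def sample(n):
    # each group of 10 features is multivariate normal with covariance cov
    return np.hstack([rng.standard_normal((n, per_group)) @ L.T
                      for _ in range(groups)])

true_w = np.zeros(d)
true_w[:50] = 1.0              # assumed optimal feature weight vector
X = sample(100)                # one training set of 100 instances
y = np.sign(X @ true_w)        # label from the weighted sum of features
```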
Empirical Study: Experiments on Synthetic Data
Observations:
Error is equal to the sum of bias and variance for both versions of SVM-RFE;
Error is dominated by bias during early iterations and by variance during later iterations;
IW SVM-RFE exhibits significantly lower bias, variance and error than SVM-RFE when the number of remaining features approaches 50.
Empirical Study: Experiments on Synthetic Data
Conclusion: variance reduction via margin based instance weighting yields a better bias-variance tradeoff, increased subset stability, and improved classification accuracy.
Empirical Study: Experiments on Real-world Data
Microarray data.
Methods in comparison: SVM-RFE; Ensemble SVM-RFE; Instance Weighting SVM-RFE.
Measures: variance; subset stability; accuracies (KNN, SVM).
[Figure: 20-Ensemble SVM-RFE. Each of 20 bootstrapped training data sets produces a feature subset; the 20 subsets are aggregated into one feature subset.]
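The ensemble baseline diagrammed above can be sketched generically; `base_select` here is a simple stand-in ranking features by class-mean difference, not the actual SVM-RFE:

```python
import numpy as np
from collections import Counter

# Sketch of the 20-ensemble scheme above. `base_select` is a simple
# stand-in for SVM-RFE (it ranks features by absolute class-mean
# difference); the ensemble part is what the diagram shows: 20
# bootstrapped training sets, one subset each, aggregated by frequency.
def base_select(X, y, k):
    score = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(score)[::-1][:k]

def ensemble_select(X, y, k, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(rounds):
        idx = rng.integers(0, len(X), len(X))          # bootstrap sample
        votes.update(base_select(X[idx], y[idx], k).tolist())
    return [f for f, _ in votes.most_common(k)]        # aggregated subset

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 5))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 0] += 3.0        # make feature 0 strongly discriminative
selected = ensemble_select(X, y, k=2)
print(selected)
```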
Empirical Study: Experiments on Real-world Data
Note: 40 iterations, starting from about 1000 features until 10 features remain.
Observations:
Variance is non-discriminative among the methods during early iterations;
The variance of SVM-RFE increases sharply as the number of features approaches 10;
IW SVM-RFE shows a significantly slower rate of increase.
Empirical Study: Experiments on Real-world Data
Observations:
Both the ensemble and instance weighting approaches improve stability consistently;
The improvement from the ensemble approach is not as significant as that from instance weighting;
As the number of selected features increases, the stability score decreases because of the larger chance-correction factor.
Empirical Study: Experiments on Real-world Data
Conclusions:
Instance weighting improves the stability of feature selection without sacrificing prediction accuracy;
It performs much better than the ensemble approach and is more efficient;
It leads to significantly increased stability at a slight extra cost in time;
Prediction accuracy (via both KNN and SVM) is non-discriminative among the three approaches for all data sets.
Conclusion and Future Work
Accomplishments:
Establish a bias-variance decomposition framework for feature selection;
Propose an empirical framework for stable feature selection;
Develop an efficient margin-based instance weighting algorithm;
Comprehensive study through synthetic and real-world data.
Future work:
Extend the current framework to other state-of-the-art feature selection algorithms;
Explore the relationship between stable feature selection and classification performance.
Related Work: Stable Feature Selection
Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007): quantify stability in terms of consistency of subsets or weights; algorithms vary in stability yet perform equally well for classification; choose the one that is best in both stability and accuracy.
Bagging-based ensemble feature selection (Saeys et al., ECML 2007): draw different bootstrapped samples of the same training set, apply a conventional feature selection algorithm to each, and aggregate the feature selection results.
Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009): exploit intrinsic feature correlations, identify groups of correlated features, and select relevant feature groups.