Variance Reduction for Stable Feature Selection
VARIANCE REDUCTION FOR STABLE FEATURE SELECTION
Presenter: Yue Han
Advisor: Lei Yu
Department of Computer Science
10/27/10
OUTLINE
• Introduction and Motivation
• Background and Related Work
• Preliminaries
• Publications
• Theoretical Framework
• Empirical Framework: Margin Based Instance Weighting
• Empirical Study
• Planned Tasks
INTRODUCTION AND MOTIVATION: FEATURE SELECTION APPLICATIONS
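[Figure: feature selection applications. Text categorization: documents D1, D2, …, DM described by terms T1, T2, …, TN, with classes such as Sports, Travel, and Jobs. Gene/protein expression: samples described by features (genes or proteins). Images: pixels vs. features.]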
INTRODUCTION AND MOTIVATION: FEATURE SELECTION FROM HIGH-DIMENSIONAL DATA
p: number of features; n: number of samples. High-dimensional data: p >> n.
Feature selection:
• Alleviates the effect of the curse of dimensionality.
• Enhances generalization capability.
• Speeds up the learning process.
• Improves model interpretability.
Curse of dimensionality:
• Effects on distance functions
• In optimization and learning
• In Bayesian statistics
High-Dimensional Data → Feature Selection Algorithm (MRMR, SVM-RFE, Relief-F, F-statistics, etc.) → Low-Dimensional Data → Learning Models (Classification, Clustering, etc.)
Knowledge Discovery on High-dimensional Data
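As a concrete instance of the filter step in this pipeline, the sketch below ranks features by the one-way ANOVA F-statistic (one of the algorithms listed above) and keeps the top k. It is a minimal illustration, assuming numeric features stored as lists of floats; the function names and the toy data are hypothetical, not taken from the original work.

```python
from statistics import mean

def f_statistic(values, labels):
    """One-way ANOVA F-statistic of a single feature across classes."""
    classes = sorted(set(labels))
    grand = mean(values)
    groups = {c: [v for v, lab in zip(values, labels) if lab == c] for c in classes}
    # Between-class sum of squares (df = #classes - 1).
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-class sum of squares (df = n - #classes).
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_b, df_w = len(classes) - 1, len(values) - len(classes)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_b) / (ss_within / df_w)

def select_top_k(X, y, k):
    """Score each of the p columns of X (n rows) and return the indices of the top k."""
    p = len(X[0])
    scores = [f_statistic([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

# Toy data (hypothetical): feature 0 separates the two classes.
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [3.0, 5.1, 0.2], [3.1, 4.2, 0.4]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 2))
```

With p >> n the same loop applies unchanged; the filter scores each feature independently, which is what makes it cheap on high-dimensional data.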
INTRODUCTION AND MOTIVATION: STABILITY OF FEATURE SELECTION
[Figure: several training data sets, each passed through the same feature selection method to produce a feature subset. Are the subsets consistent?]
Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.
[Figure: several training data sets, each passed through the same learning algorithm to produce a learning model.]
The stability of learning algorithms was first examined by Turney in 1995.
The stability of feature selection was relatively neglected until recently, when it began to attract interest from data mining researchers.
Stability Issue of Feature Selection
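Whether subsets selected from different training sets are consistent can be checked empirically: draw several training sets, run the same selector on each, and average the pairwise similarity of the resulting subsets. The sketch below is a minimal version that assumes the Jaccard index as the similarity, random half-size subsamples as the training-set variations, and a caller-supplied `selector(X, y, k)` function; all of these names and choices are illustrative assumptions.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def subset_stability(X, y, selector, k, runs=10, frac=0.5, seed=0):
    """Average pairwise Jaccard similarity of subsets selected on random subsamples."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(runs):
        # Each run sees a different random subsample of the training data.
        idx = rng.sample(range(n), max(2, int(frac * n)))
        subsets.append(selector([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A selector that always returns the same subset scores 1.0; an unstable selector on p >> n data typically scores far lower.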
INTRODUCTION AND MOTIVATION: MOTIVATION FOR STABLE FEATURE SELECTION
[Figure: two training sets D1 and D2 (samples × features) drawn from the same data D.]
Given unlimited sample size of D: feature selection results from D1 and D2 are the same.
Given limited sample size of D (n << p for high-dimensional data): feature selection results from D1 and D2 are different.
Challenge: increasing the number of samples could be very costly or impractical.
Experts in biology and biomedicine are interested in:
• not only prediction accuracy but also the consistency of feature subsets;
• validating stable genes or proteins that are less sensitive to variations in the training data;
• biomarkers that explain the observed phenomena.
BACKGROUND AND RELATED WORK: FEATURE SELECTION METHODS
[Figure: general feature selection process. Original set → Subset Generation → Subset → Subset Evaluation → Goodness of subset → Stopping Criterion; if the criterion is not met (no), return to Subset Generation; if it is met (yes), proceed to Result Validation.]
Evaluation criteria: Filter Model; Wrapper Model; Embedded Model
Search strategies: Complete Search; Sequential Search; Random Search
Representative algorithms: Relief, SFS, MDLM, etc.; FSBC, ELSA, LVW, etc.; BBHFS, Dash-Liu's, etc.
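The loop of subset generation, subset evaluation, and stopping criterion can be sketched as follows. This minimal version assumes sequential forward search as the search strategy and takes the evaluation criterion as a caller-supplied function `evaluate(subset)` (in practice a filter measure or a wrapper's cross-validated accuracy); the stopping rule "no candidate improves the score" is also an assumption, and result validation is left to the caller.

```python
def sequential_forward_search(n_features, evaluate, max_size=None):
    """Greedy forward search: repeatedly add the feature that most improves evaluate()."""
    max_size = n_features if max_size is None else max_size
    selected, best_score = [], float("-inf")
    while len(selected) < max_size:
        # Subset generation: all one-feature extensions of the current subset.
        candidates = [selected + [j] for j in range(n_features) if j not in selected]
        # Subset evaluation: goodness of each candidate subset.
        score, subset = max((evaluate(c), c) for c in candidates)
        # Stopping criterion: no candidate improves on the current subset.
        if score <= best_score:
            break
        selected, best_score = subset, score
    return selected, best_score

# Hypothetical criterion: +1 per relevant feature, -0.1 per selected feature.
relevant = {1, 3}
def toy_evaluate(subset):
    return sum(1 for j in subset if j in relevant) - 0.1 * len(subset)

print(sequential_forward_search(5, toy_evaluate))
```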
BACKGROUND AND RELATED WORK: STABLE FEATURE SELECTION
Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007):
• Quantify stability in terms of the consistency of selected subsets or feature weights;
• Algorithms vary in stability while performing equally well for classification;
• Choose the algorithm that is best in both stability and accuracy.
Bagging-based ensemble feature selection (Saeys et al., ECML 2007):
• Draw different bootstrapped samples of the same training set;
• Apply a conventional feature selection algorithm to each sample;
• Aggregate the feature selection results.
Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009):
• Explore intrinsic feature correlations;
• Identify groups of correlated features;
• Select relevant feature groups.
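The bagging-based ensemble scheme can be sketched as: draw bootstrap samples of the training set, rank features on each with a base algorithm, and aggregate the rankings. In this minimal sketch the base scorer (absolute difference of class means, assuming labels 0/1) is a hypothetical stand-in for a conventional algorithm such as SVM-RFE, and aggregation by mean rank is one common choice, not necessarily the one used by Saeys et al.

```python
import random
from statistics import mean

def mean_diff_scores(X, y):
    """Hypothetical base scorer: |mean(class 1) - mean(class 0)| for each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    return scores

def ensemble_select(X, y, k, n_bags=20, seed=0):
    """Bagging-based ensemble: aggregate per-bootstrap feature ranks by their sum."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rank_sums = [0.0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        scores = mean_diff_scores([X[i] for i in idx], [y[i] for i in idx])
        order = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            rank_sums[j] += rank  # rank 0 = most relevant in this bag
    return sorted(range(p), key=lambda j: rank_sums[j])[:k]

# Toy data (hypothetical): feature 2 separates the classes, features 0 and 1 do not.
X = [[float((i * 7) % 5), float((i * 3) % 4), 10.0 * (i % 2)] for i in range(20)]
y = [i % 2 for i in range(20)]
print(ensemble_select(X, y, 1))
```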
BACKGROUND AND RELATED WORK: MARGIN BASED FEATURE SELECTION
Sample margin: how far an instance can travel before it hits the decision boundary.
Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance).
Representative algorithms: Relief, Relief-F, G-flip, Simba, etc. In these algorithms the margin is used for feature weighting or feature selection; our study uses the margin in a totally different way.
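For contrast with the use of margins later in this work, the sketch below shows the conventional use: Relief-style feature weighting, where each instance's nearest hit (same class) and nearest miss (other class) update the feature weights with the per-feature components of its hypothesis margin. It is a simplified Relief, not Relief-F, G-flip, or Simba, and it assumes numeric features, Manhattan distance, and one nearest hit and miss per instance.

```python
def manhattan(a, b):
    """L1 distance between two instances."""
    return sum(abs(u - v) for u, v in zip(a, b))

def nearest(x, candidates):
    """Candidate closest to x under Manhattan distance."""
    return min(candidates, key=lambda c: manhattan(x, c))

def hypothesis_margin(x, hit, miss):
    """Hypothesis margin of x: half the gap between its miss and hit distances."""
    return 0.5 * (manhattan(x, miss) - manhattan(x, hit))

def relief_weights(X, y):
    """Simplified Relief: average |x_j - miss_j| - |x_j - hit_j| over all instances."""
    w = [0.0] * len(X[0])
    for i, (x, label) in enumerate(zip(X, y)):
        hits = [X[m] for m in range(len(X)) if m != i and y[m] == label]
        misses = [X[m] for m in range(len(X)) if y[m] != label]
        if not hits or not misses:
            continue
        hit, miss = nearest(x, hits), nearest(x, misses)
        for j in range(len(w)):
            w[j] += abs(x[j] - miss[j]) - abs(x[j] - hit[j])
    return [wj / len(X) for wj in w]

# Toy data (hypothetical): feature 0 is relevant, feature 1 is not.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]]
y = [0, 0, 1, 1]
print(relief_weights(X, y))
```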
PUBLICATIONS
• Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report, Data Mining Research Laboratory, Binghamton University, 2009.
• Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM 2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010.
• Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, December 14-17, 2010. To appear.
• Lei Yu, Yue Han, and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010. Under review (major revision).
THEORETICAL FRAMEWORK: BIAS-VARIANCE DECOMPOSITION OF FEATURE SELECTION ERROR
Notation: training data D, drawn from the data space; feature selection result r(D); true feature selection result r*.
Expected loss (error), bias, and variance are all taken with respect to the random draw of D.
Bias-variance decomposition of feature selection error:
o Reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance);
o Suggests seeking a better trade-off between the bias and variance of feature selection.
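The equations on this slide did not survive transcription. Assuming squared loss between the feature weight vector r(D) and the true result r* (the standard choice under which error equals exactly the sum of bias and variance, as the synthetic-data observations report), the decomposition can be written as follows; the cross term vanishes because the expectation of r(D) minus its mean is zero.

```latex
\begin{aligned}
\text{Error} &= \mathbb{E}_D\!\left[\lVert r(D) - r^* \rVert^2\right] \\
\text{Bias} &= \lVert \mathbb{E}_D[r(D)] - r^* \rVert^2 \\
\text{Variance} &= \mathbb{E}_D\!\left[\lVert r(D) - \mathbb{E}_D[r(D)] \rVert^2\right] \\
\text{Error} &= \mathbb{E}_D\!\left[\lVert (r(D) - \mathbb{E}_D[r(D)]) + (\mathbb{E}_D[r(D)] - r^*) \rVert^2\right] = \text{Variance} + \text{Bias}
\end{aligned}
```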
THEORETICAL FRAMEWORK: VARIANCE REDUCTION VIA IMPORTANCE SAMPLING
Feature selection (weighting) as a Monte Carlo estimator: the relevance score of each feature is estimated from the training instances.
Variance of the Monte Carlo estimator: influenced by the feature selection algorithm and the sample size. Increasing the sample size is impractical and costly.
Importance sampling: a good importance sampling function h(x) reduces this variance; for feature selection, it corresponds to instance weighting.
Intuition behind h(x): draw more instances from important regions and fewer instances from other regions.
Intuition behind the instance weight: increase weights for instances from important regions and decrease weights for instances from other regions.
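In generic form, if a relevance score is an expectation r = E_{x~p}[f(x)], the Monte Carlo estimator averages f over n instances drawn from p and has variance Var_p[f(x)]/n. Drawing instances from h instead and weighting each one by w(x) = p(x)/h(x) keeps the estimator unbiased while, for a good h, lowering its variance. The one-dimensional sketch below illustrates this with an indicator f, p = N(0, 1), and h = N(2, 1); these choices are purely illustrative and are not the feature relevance function of the original work.

```python
import math
import random
import statistics

def f(x):
    """Quantity whose mean under p is estimated: indicator of the region x > 2."""
    return 1.0 if x > 2.0 else 0.0

def plain_monte_carlo(n, rng):
    """Draw x_i ~ p = N(0, 1) and average f(x_i) with equal weights."""
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def importance_sampling(n, rng, shift=2.0):
    """Draw x_i ~ h = N(shift, 1) and average f(x_i) * w(x_i) with w = p / h."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        # Density ratio of N(0, 1) to N(shift, 1); normalizing constants cancel.
        w = math.exp(-0.5 * x * x + 0.5 * (x - shift) ** 2)
        total += f(x) * w
    return total / n

def spread(estimator, n=1000, repeats=200, seed=0):
    """Mean and variance of an estimator over independent repetitions."""
    rng = random.Random(seed)
    estimates = [estimator(n, rng) for _ in range(repeats)]
    return statistics.mean(estimates), statistics.variance(estimates)

true_value = 0.5 * math.erfc(2.0 / math.sqrt(2.0))  # P(X > 2) for X ~ N(0, 1)
mc_mean, mc_var = spread(plain_monte_carlo)
is_mean, is_var = spread(importance_sampling)
print(f"true={true_value:.5f}  MC: mean={mc_mean:.5f} var={mc_var:.2e}  IS: mean={is_mean:.5f} var={is_var:.2e}")
```

Here h places more draws in the important region (x > 2), and the weights p/h down-weight them accordingly, which mirrors the two intuitions on the slide.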
EMPIRICAL FRAMEWORK: OVERALL FRAMEWORK
Challenges:
• How to produce weights for instances from the point of view of feature selection stability;
• How to present weighted instances to conventional feature selection algorithms.
Margin Based Instance Weighting for Stable Feature Selection
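EMPIRICAL FRAMEWORK: MARGIN VECTOR FEATURE SPACE
For each instance in the original space, the hypothesis margin is determined by its nearest hit (the nearest instance of the same class) and its nearest miss (the nearest instance of a different class). Decomposing this margin feature by feature maps the instance into the margin vector feature space.
The margin vector of an instance captures the local profile of feature relevance for all features at that instance.
Instances exhibit different profiles of feature relevance and therefore influence feature selection results differently.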
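A minimal sketch of the transformation into the margin vector feature space, assuming one nearest hit and one nearest miss per instance (found with Manhattan distance) and per-feature margin components x'_j = |x_j - miss_j| - |x_j - hit_j|. These are assumptions for illustration; the original framework may use several hits and misses or a different distance.

```python
def manhattan(a, b):
    """L1 distance between two instances."""
    return sum(abs(u - v) for u, v in zip(a, b))

def margin_vectors(X, y):
    """Map each x to x' with x'_j = |x_j - miss_j| - |x_j - hit_j|.

    Assumes every class has at least two instances, so each x has a hit and a miss.
    """
    out = []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [m for m in range(len(X)) if m != i]
        hit = min((X[m] for m in others if y[m] == label), key=lambda c: manhattan(x, c))
        miss = min((X[m] for m in others if y[m] != label), key=lambda c: manhattan(x, c))
        out.append([abs(xj - mj) - abs(xj - hj) for xj, hj, mj in zip(x, hit, miss)])
    return out

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]]
y = [0, 0, 1, 1]
print(margin_vectors(X, y))
```

Summing the components of x' gives twice the instance's hypothesis margin under this distance, so x' records how each feature contributes to that margin locally.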
EMPIRICAL FRAMEWORK: AN ILLUSTRATIVE EXAMPLE
[Figure: hypothesis-margin based feature space transformation. (a) Original feature space; (b) margin vector feature space.]
EMPIRICAL FRAMEWORK: MARGIN BASED INSTANCE WEIGHTING ALGORITHM
Instances exhibit different profiles of feature relevance and therefore influence feature selection results differently.
Instance weighting:
• Higher outlying degree → lower weight
• Lower outlying degree → higher weight
Review: variance reduction via importance sampling:
• More instances are drawn from important regions;
• Fewer instances are drawn from other regions.
Weighting: each instance's weight is derived from its outlying degree in the margin vector feature space.
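One way to turn margin vectors into instance weights is sketched below. It assumes that the outlying degree of an instance is its average Euclidean distance to all other instances in the margin vector feature space, and that weights decay exponentially with outlying degree and are rescaled to average 1. Both formulas are assumptions for illustration; the slide fixes only the direction (higher outlying degree, lower weight).

```python
import math

def outlying_degrees(M):
    """Average Euclidean distance from each margin vector to all the others."""
    n = len(M)
    return [sum(math.dist(M[i], M[k]) for k in range(n) if k != i) / (n - 1) for i in range(n)]

def instance_weights(M):
    """Weights that decrease with outlying degree, rescaled so they average to 1."""
    od = outlying_degrees(M)
    base = min(od)  # shift before exp() to avoid underflow; the shift cancels after rescaling
    raw = [math.exp(-(d - base)) for d in od]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

# Hypothetical margin vectors: the last instance is an outlier.
M = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
print(instance_weights(M))
```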
EMPIRICAL FRAMEWORK: ALGORITHM ILLUSTRATION
Time complexity analysis:
o Dominated by the instance weighting step;
o Efficient for high-dimensional data with small sample size (n << d).
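To present weighted instances to a conventional algorithm (how to present weighted instances to conventional feature selection algorithms), the algorithm's statistics can be computed with instance weights. The sketch below does this for an RFE-style loop that drops about 10% of the remaining features per iteration. To keep it dependency-free, a weighted class-centroid difference stands in for the linear SVM weights of SVM-RFE; this substitution is an assumption, and with a real SVM the weights would instead be passed as per-sample weights to the solver.

```python
def weighted_class_mean_gap(X, y, w, features):
    """Score each remaining feature by |weighted mean(class 1) - weighted mean(class 0)|.

    Stand-in for |SVM weight| in SVM-RFE; assumes labels 0/1, each with positive total weight.
    """
    scores = {}
    for j in features:
        num, den = [0.0, 0.0], [0.0, 0.0]
        for row, label, wi in zip(X, y, w):
            num[label] += wi * row[j]
            den[label] += wi
        scores[j] = abs(num[1] / den[1] - num[0] / den[0])
    return scores

def weighted_rfe(X, y, w, n_keep, drop_frac=0.1):
    """Repeatedly drop the lowest-scoring ~drop_frac of remaining features until n_keep remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = weighted_class_mean_gap(X, y, w, features)
        # Remove at least one feature per iteration, but never go below n_keep.
        n_drop = min(max(1, int(drop_frac * len(features))), len(features) - n_keep)
        features = sorted(features, key=lambda j: scores[j])[n_drop:]
    return sorted(features)

# Toy data (hypothetical): feature 3 separates the classes.
X = [[float((i * 3) % 5), float((i * 7) % 4), float(i % 3), 5.0 * (i % 2)] for i in range(12)]
y = [i % 2 for i in range(12)]
w = [1.0] * 12  # uniform weights reduce to the unweighted algorithm
print(weighted_rfe(X, y, w, n_keep=1))
```

With this stand-in scorer each feature's score does not depend on the others, so re-scoring every iteration is redundant; it is kept to mirror the structure of SVM-RFE, where the model is retrained on the remaining features.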
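EMPIRICAL STUDY: SUBSET STABILITY MEASURES
Average pairwise similarity: the mean similarity over all pairs of feature selection results obtained from different training sets.
Kuncheva index: a subset similarity measure corrected for the overlap expected by chance.
Other measures:
• Feature subset: Jaccard index; nPOGR; SIMv
• Feature ranking: Spearman coefficient
• Feature weighting: Pearson correlation coefficient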
[Figure: stability of feature selection. Several training data sets are passed through the same feature selection method; are the resulting feature subsets consistent?]
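A minimal sketch of the two subset measures, using the standard definition of the Kuncheva index: for two subsets of size k drawn from p features with overlap r, KI = (r·p − k²) / (k·(p − k)), where k²/p is the overlap expected by chance (the correction factor). Average pairwise similarity averages a similarity over all q(q − 1)/2 pairs of q subsets. The function names are illustrative.

```python
from itertools import combinations

def kuncheva_index(a, b, p):
    """Kuncheva index of two equal-size subsets of p features: (r*p - k^2) / (k*(p - k))."""
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b) or not 0 < k < p:
        raise ValueError("subsets must have equal size k with 0 < k < p")
    r = len(a & b)  # observed overlap
    return (r * p - k * k) / (k * (p - k))

def average_pairwise_similarity(subsets, similarity):
    """Mean similarity over all q(q-1)/2 pairs of feature subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(similarity(s, t) for s, t in pairs) / len(pairs)

subsets = [[0, 1, 2], [0, 1, 3], [0, 1, 2]]
print(average_pairwise_similarity(subsets, lambda s, t: kuncheva_index(s, t, p=10)))
```

Identical subsets score 1, and subsets whose overlap equals the chance level k²/p score 0, which is why the index is comparable across different subset sizes.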
EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA
Synthetic data generation:
• Feature values: drawn from two multivariate normal distributions.
• Covariance matrix: block diagonal, with 100 groups of 10 features each; each block is a 10×10 matrix with 1 along the diagonal and 0.8 off the diagonal.
• Class label: a weighted sum of all feature values with the optimal feature weight vector.
• Training data: 500 training sets, each with 100 instances (50 from each of the two distributions).
• Leave-one-out test data: 5000 instances.
Method in comparison: SVM-RFE, which recursively eliminates 10% of the features remaining from the previous iteration until 10 features remain.
Measures: variance, bias, and error; subset stability (Kuncheva index); accuracy (SVM).
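The block covariance structure (1 on the diagonal, 0.8 within each group of 10 features, 0 across groups) can be generated without a matrix factorization by giving each group a shared latent factor: x_j = √0.8·z_g + √0.2·e_j has unit variance and correlation 0.8 within a group. This sketch generates features only; the class means (`mean` parameter) and the optimal weight vector used for labeling are not recoverable from the transcript, so the values below are hypothetical.

```python
import math
import random

def block_correlated_instance(mean, n_groups, group_size, rho, rng):
    """One draw from N(mean, Sigma); Sigma is block diagonal with 1 on the diagonal
    and rho between features of the same group (0 across groups)."""
    x = []
    for _ in range(n_groups):
        z = rng.gauss(0.0, 1.0)  # latent factor shared by every feature in the group
        for _ in range(group_size):
            e = rng.gauss(0.0, 1.0)  # independent noise for this feature
            x.append(math.sqrt(rho) * z + math.sqrt(1.0 - rho) * e)
    return [xi + mi for xi, mi in zip(x, mean)]

def make_instances(n, mean, n_groups=100, group_size=10, rho=0.8, seed=0):
    """n instances from one multivariate normal distribution with the given mean vector."""
    if len(mean) != n_groups * group_size:
        raise ValueError("mean must have n_groups * group_size entries")
    rng = random.Random(seed)
    return [block_correlated_instance(mean, n_groups, group_size, rho, rng) for _ in range(n)]

# Hypothetical means for the two distributions; 50 instances from each.
train_a = make_instances(50, [0.5] * 1000, seed=1)
train_b = make_instances(50, [-0.5] * 1000, seed=2)
print(len(train_a) + len(train_b), len(train_a[0]))
```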
EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA
Observations:
• Error is equal to the sum of bias and variance for both versions of SVM-RFE;
• Error is dominated by bias during early iterations and by variance during later iterations;
• IW (instance weighting) SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE when the number of remaining features approaches 50.
EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA
Conclusion: variance reduction via margin based instance weighting yields
• a better bias-variance trade-off;
• increased subset stability;
• improved classification accuracy.
EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA
Microarray data:
Experiment setup:
[Figure: 10-fold cross-validation splits the data into training data and test data. For 20-Ensemble SVM-RFE, 20 bootstrapped training sets are drawn from the training data, a feature subset is selected from each, and the subsets are aggregated into one feature subset.]
Methods in comparison: SVM-RFE; Ensemble SVM-RFE; Instance Weighting SVM-RFE
Measures: variance; subset stability; accuracies (KNN, SVM)
EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA
Note: 40 iterations, starting from about 1000 features until 10 features remain.
Observations:
• The methods are not distinguishable during early iterations;
• The variance of SVM-RFE increases sharply as the number of features approaches 10;
• IW SVM-RFE shows a significantly slower rate of increase.
EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA
Observations:
• Both the ensemble and the instance weighting approaches improve stability consistently;
• The improvement from the ensemble approach is not as significant as that from instance weighting;
• As the number of features increases, the stability score decreases because of the larger correction factor.
EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA
Conclusions: instance weighting
• improves the stability of feature selection without sacrificing prediction accuracy;
• performs much better than the ensemble approach and is more efficient;
• leads to significantly increased stability at a slight extra cost in time.
PLANNED TASKS: OVERALL FRAMEWORK
[Figure: overall framework of planned tasks.
• Theoretical Framework of Feature Selection Stability
• Empirical Instance Weighting Framework: Margin-based Instance Weighting; Iterative Approach; State-of-the-art Weighting Schemes
• Representative FS Algorithms: SVM-RFE, Relief-F, F-statistics, HHSVM
• Various Real-world Data Sets: Gene Data, Text Data
• Relationship Between Feature Selection Stability and Classification Accuracy]
PLANNED TASKS: LISTED TASKS
A. An Extensive Study on the Instance Weighting Framework
  A1. Extension to Various Feature Selection Algorithms
  A2. Study on Datasets from Different Domains
B. Development of Algorithms under the Instance Weighting Framework
  B1. Development of Instance Weighting Schemes
  B2. Iterative Approach for Margin Based Instance Weighting
C. Investigation of the Relationship between Stable Feature Selection and Classification Accuracy
  C1. How Bias-Variance Properties of Feature Selection Affect Classification Accuracy
  C2. Study on Various Factors for Stability of Feature Selection
[Timeline: tasks A1, A2, B1, B2, C1, and C2 scheduled across Oct-Dec 2010, Jan-Mar 2011, April-June 2011, and July-Aug 2011.]
Thank you and
Questions?