Post on 31-Dec-2015
Fuzzy Entropy based feature selection for classification of hyperspectral data
Mahesh Pal
Department of Civil Engineering
National Institute of Technology
Kurukshetra
Hyperspectral data
1. Measurement of radiation in the visible to the infrared spectral region in many finely spaced spectral wavebands.
2. Provides greater detail on the spectral variation of targets than conventional multispectral systems.
3. The availability of large amounts of data represents a challenge to classification analyses.
4. Each spectral waveband used in the classification process should add an independent set of information. However, features are highly correlated, suggesting a degree of redundancy in the available information, which can have a negative impact on classification accuracy.
An example:
MULTISPECTRAL DATA: discrete wavebands, for example Landsat 7
Band 1: 0.45–0.515 µm; Band 2: 0.525–0.605 µm
Between 0.45 and 2.235 µm: a total of six bands

HYPERSPECTRAL DATA: DAIS data, between 0.502 and 2.395 µm: a total of 72 bands
Continuous bands at 10–45 nm bandwidth
0.4–0.7 µm: visible; 0.7–1.3 µm: NIR; 1.0–3.0 µm: MIR; 3–100 µm: thermal
Various approaches could be adopted for the appropriate classification of high dimensional data:
1. Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995).
2. Use of methods that effectively increase the training set size, i.e. semi-supervised classification (Chi and Bruzzone, 2005) and use of unlabelled data (Shahshahani and Landgrebe, 1994).
3. Use of some form of dimensionality reduction procedure prior to the classification analysis.
Feature reduction
1. Two broad categories are: feature selection and feature extraction.
2. Feature reduction may speed up the classification process by reducing data set size.
3. May increase the predictive accuracy.
4. May increase the ability to understand the classification rules.
5. Feature selection selects a subset of the original features that retains the information useful for separating the classes, by removing redundant features.
Feature selection
Three approaches to feature selection are:
Filters: use a search algorithm to search through the space of possible features and evaluate each feature using a filter measure such as correlation or mutual information.
Wrappers: use a search algorithm to search through the space of possible feature subsets and evaluate each subset using a classification algorithm.
Embedded: some classification processes, such as random forest, produce a ranked list of features during classification.
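The wrapper idea above can be sketched in a few lines: score each candidate feature subset by the accuracy of a classifier trained on it. This is a minimal illustration, not the study's procedure; the toy data, the 1-nearest-neighbour classifier and the leave-one-out scoring are all invented for the example (real wrappers use cross-validation and a proper search strategy).

```python
# Minimal wrapper-style sketch: score feature subsets by the leave-one-out
# accuracy of a simple 1-nearest-neighbour classifier (toy data, invented).
import math
from itertools import combinations

def knn_accuracy(X, y, feats):
    """Leave-one-out 1-NN accuracy using only the features in `feats`."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: math.dist(
            [X[i][f] for f in feats], [X[j][f] for f in feats]))
        correct += y[nearest] == y[i]
    return correct / len(X)

# Feature 0 separates the two classes; feature 1 is pure noise.
X = [[0.0, 9.1], [0.1, 0.2], [0.2, 7.7], [1.0, 0.3], [1.1, 8.9], [0.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]

# Exhaustive search over single-feature subsets picks the informative one.
best = max(combinations(range(2), 1), key=lambda s: knn_accuracy(X, y, s))
```

A filter would instead score each feature with a measure such as correlation, without ever training a classifier, which is why filters are usually much cheaper than wrappers.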
This study aims to explore the usefulness of four filter-based feature selection approaches.
Feature selection approaches
Four filter-based feature selection approaches were used:
1.Entropy
2.Fuzzy entropy
3.Signal-to-noise ratio
4.RELIEF
Entropy and Fuzzy Entropy
For a finite set X = {x1, x2, ..., xn}, if P is the probability distribution on X, Yager's entropy is defined by:

H(X) = - Σ_{x ∈ X} p(x) log2 p(x)

Consider a fuzzy information system defined by (U, A, V, f) (Hu and Yu, 2005), where U is a finite set of objects and A = {a1, a2, ...} is the set of features.
If Q is a subset of the attribute set A, and F_Q is the fuzzy relation matrix induced by an indiscernibility relation, the significance of an attribute a ∈ Q is defined as:

Significance: H(a | Q - {a}) = H(Q) - H(Q - {a})

If H(a | Q - {a}) = 0, attribute a is considered redundant.
Further details of this algorithm can be found in Hu and Yu (2005).
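As a concrete illustration of entropy-based scoring, the sketch below estimates each feature's entropy from a histogram and ranks the features. This is a minimal example with invented toy data and an assumed equal-width discretisation, not the study's implementation.

```python
# Illustrative sketch (not the paper's code): rank features by Shannon
# entropy after discretising each band into equal-width bins.
import math
from collections import Counter

def entropy(values, bins=10):
    """Shannon entropy of one feature, estimated from a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    labels = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy data: feature 0 is constant (zero entropy), feature 1 varies.
X = [[0.1, 5.0], [0.1, 1.0], [0.1, 9.0], [0.1, 3.0], [0.1, 7.0]]
scores = [entropy([row[j] for row in X]) for j in range(2)]
ranked = sorted(range(2), key=lambda j: scores[j], reverse=True)
```

The fuzzy entropy variant replaces the crisp histogram counts with the fuzzy relation matrix F_Q, so that the significance measure above can be evaluated on continuous-valued bands without hard discretisation.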
Signal to noise ratio

S2N = |mean_class1 - mean_class2| / (stddev_class1 + stddev_class2)

This approach ranks all features according to how well each feature discriminates between two classes. To apply it to a multiclass classification problem, a one-against-one approach was used in this study.
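The ratio above is straightforward to compute per feature and per class pair. The sketch below is a minimal illustration with invented class samples; for the multiclass case the score would be computed for every class pair and the per-pair scores combined (e.g. averaged).

```python
# Minimal sketch of the signal-to-noise ratio score for one class pair
# and one feature; the class samples below are invented for illustration.
import statistics

def s2n(class1, class2):
    """|mean1 - mean2| / (std1 + std2) for a single feature."""
    m1, m2 = statistics.mean(class1), statistics.mean(class2)
    s1, s2 = statistics.pstdev(class1), statistics.pstdev(class2)
    return abs(m1 - m2) / (s1 + s2)

# Reflectance values of one band for two toy classes: well separated
# means and small spreads give a large score.
water = [0.10, 0.12, 0.11, 0.09]
wheat = [0.40, 0.42, 0.38, 0.44]
score = s2n(water, wheat)
```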
RELIEF
The general idea of RELIEF is to choose the features that best distinguish between classes.
At each step of an iterative process, an instance is chosen at random from the dataset and the weight for each feature is updated according to the distance of this instance to its near-miss and near-hit (Kira and Rendell, 1992).
An instance from the dataset is a near-hit to X if it belongs to the close neighbourhood of X and belongs to the same class as X.
An instance is a near-miss if it belongs to the neighbourhood of X but not to the same class as X.
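The iterative weight update can be sketched as below. This is a simplified single-nearest-hit/miss variant with an invented toy dataset, assuming Euclidean distance, not the exact formulation of Kira and Rendell (1992).

```python
# Simplified RELIEF sketch: at each iteration, pick a random instance,
# find its nearest same-class (near-hit) and different-class (near-miss)
# neighbours, and update per-feature weights.
import math
import random

def relief(X, y, n_iter=50, seed=0):
    rng = random.Random(seed)
    n_feat = len(X[0])
    w = [0.0] * n_feat
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        xi, yi = X[i], y[i]
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == yi]
        misses = [X[j] for j in range(len(X)) if y[j] != yi]
        near_hit = min(hits, key=lambda z: math.dist(xi, z))
        near_miss = min(misses, key=lambda z: math.dist(xi, z))
        for f in range(n_feat):
            # Reward separation from the other class, penalise spread
            # within the same class.
            w[f] += abs(xi[f] - near_miss[f]) - abs(xi[f] - near_hit[f])
    return w

# Feature 0 separates the classes; feature 1 is pure noise, so RELIEF
# should assign feature 0 the larger weight.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [1.1, 0.2], [0.9, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
```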
Data Set
1. DAIS 7915 sensor flown by the German Space Agency on 29 June 2000.
2. The sensor acquires information in 79 bands at a spatial resolution of 5 m in the wavelength range 0.502–12.278 µm.
3. Seven features located in the mid- and thermal-infrared region and seven features from the 0.502–2.395 µm spectral region were removed due to striping noise.
4. An area of 512 pixels by 512 pixels and 65 features covering the test site was used.
Training and test data
1. Random sampling with a ground reference image was used to collect training and test data.
2. Eight land cover classes: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.
3. A total of 800 training pixels and 3800 test pixels were used.
Classification Method
1. Support vector machines with a one-against-one approach for multiclass data were used.
2. A radial basis function kernel was used.
3. Regularisation parameter C = 5000 and gamma = 2 were used.
4. For each feature selection approach, classification accuracy was obtained with the test dataset.
5. A McNemar test for non-inferiority was used.
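A setup along these lines can be expressed with scikit-learn's SVC, whose default multiclass strategy is already one-against-one. This is a sketch, not the study's code: the toy training pixels are invented stand-ins for the DAIS data, while the kernel and the C and gamma values match the slide.

```python
# Sketch of the described classifier: RBF-kernel SVM, C=5000, gamma=2,
# one-against-one multiclass handling (SVC's default). Toy data invented.
from sklearn.svm import SVC

# Four tiny, well-separated "land cover" clusters in a 2-feature space.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9],
           [0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y_train = [0, 0, 1, 1, 2, 2, 3, 3]

clf = SVC(kernel="rbf", C=5000, gamma=2, decision_function_shape="ovo")
clf.fit(X_train, y_train)
accuracy = clf.score(X_train, y_train)
```

In the study itself the same model would be refit on each selected feature subset and scored on the 3800 held-out test pixels.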
Feature selection method | Selected features
Entropy | 32, 51, 63, 35, 8, 49, 42, 27, 48, 64, 6, 50, 65, 11, 53, 39, 22
Fuzzy entropy | 32, 41, 50, 6, 27, 63, 36, 49, 10, 22, 65, 51, 40, 48
Relief | 3, 4, 2, 11, 10, 5, 8, 6, 9, 7, 12, 1, 13, 23, 22, 25, 24, 20, 31, 30
Signal to noise ratio | 5, 7, 8, 9, 6, 10, 11, 4, 12, 3, 32, 31, 33, 30, 24, 23, 25, 29, 13, 26
Selected features with different feature selection approaches
Feature selection method | Number of features used in classification | Classification accuracy (%)
No feature selection | 65 | 91.76
Fuzzy entropy | 14 | 91.68
Entropy | 17 | 91.61
Signal to noise ratio | 20 | 91.68
Relief | 20 | 88.61
Classification accuracy with SVM classifier with different selected features
Number of features | Accuracy (%) | Difference in accuracy (%) | 95% confidence interval | Conclusion (at 0.05 level of significance)
65 | 91.76 | 0.00 | 0.000-0.000 | -
14 | 91.68 | 0.36 | 0.071-0.089 | Non-inferior
17 | 91.61 | 0.13 | 0.142-0.158 | Non-inferior
20 | 91.68 | 0.26 | 0.071-0.089 | Non-inferior
20 | 88.61 | 3.00 | 3.140-3.160 | Inferior
Difference and non-inferiority test results, based on a 95% confidence interval on the difference between the accuracy achieved with all 65 features and that achieved with the feature sets selected using the different approaches.
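The McNemar comparison underlying this table uses only the test pixels on which the two classifiers disagree. The sketch below shows the usual z form of the statistic; the disagreement counts are invented for illustration, not taken from the study.

```python
# McNemar statistic for comparing two classifiers on the same test set.
# n01 = pixels only classifier 1 got wrong; n10 = only classifier 2 wrong.
# The counts below are invented for illustration.
import math

def mcnemar_z(n01, n10):
    """z statistic from the discordant counts."""
    return (n01 - n10) / math.sqrt(n01 + n10)

z = mcnemar_z(40, 35)
significant = abs(z) > 1.96  # two-sided test at the 0.05 level
```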
Conclusions
The fuzzy entropy based feature selection approach works well with this dataset and provides comparable performance with a small number of selected features.
The accuracy achieved by the signal-to-noise ratio and entropy based approaches is also comparable to that achieved with the full dataset, but these approaches require more selected features than the fuzzy entropy based approach.
Results with the Relief based approach show a significant decline in classification accuracy in comparison to the full dataset.