Hyperspectral Data Classification
Transcript of a slide presentation on hyperspectral data classification.
![Page 1: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/1.jpg)
CLASSIFICATION AND FEATURE SELECTION USING REMOTE SENSING DATA

MAHESH PAL
NATIONAL INSTITUTE OF TECHNOLOGY, KURUKSHETRA, INDIA
![Page 2: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/2.jpg)
Remote Sensing Data

Panchromatic: one band.

Multispectral: many bands. These systems use sensors that detect radiation in a small number of broad wavelength bands.

Hyperspectral: a large number of contiguous bands. A hyperspectral sensor collects many very narrow, contiguous spectral bands throughout the visible, near-infrared, mid-infrared and thermal infrared portions of the electromagnetic spectrum.
![Page 3: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/3.jpg)
Landsat 7 ETM+ data (multispectral)

| Band number | Spectral range (µm) | Ground resolution (m) |
| --- | --- | --- |
| 1 | 0.450 - 0.515 | 30 |
| 2 | 0.525 - 0.605 | 30 |
| 3 | 0.630 - 0.690 | 30 |
| 4 | 0.750 - 0.900 | 30 |
| 5 | 1.550 - 1.750 | 30 |
| 6 | 10.40 - 12.50 | 60 |
| 7 | 2.090 - 2.350 | 30 |
| Panchromatic | 0.520 - 0.900 | 15 |

Between 0.45 and 2.35 µm: a total of six bands.
![Page 4: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/4.jpg)
Images of the La Mancha (Spain) area from the ETM+ sensor (30 m resolution)
![Page 5: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/5.jpg)
The DAIS (Digital Airborne Imaging Spectrometer) Hyperspectral Sensor

| Spectrometer | Bands (79 in total) | Wavelength range (µm) |
| --- | --- | --- |
| VIS/NIR | 32 | 0.50 - 1.05 |
| SWIR I | 8 | 1.50 - 1.80 |
| SWIR II | 32 | 1.90 - 2.50 |
| MIR | 1 | 3.00 - 5.00 |
| TIR | 6 | 8.70 - 12.50 |

Between 0.502 and 2.395 µm: a total of 72 bands; continuous bands at 10-45 nm bandwidth.
![Page 6: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/6.jpg)
Images of the La Mancha (Spain) area from the DAIS hyperspectral sensor (5 m resolution)
![Page 7: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/7.jpg)
Hyperspectral Imaging, Imaging Spectrometry, Imaging Spectroscopy

• Spectroscopy is the study of electromagnetic radiation.
• Imaging spectroscopy has been used in the laboratory by physicists and chemists for over 100 years.
• Imaging spectroscopy has many names in the remote sensing community, including imaging spectrometry and hyperspectral imaging.
• It acquires images in a large number of narrow, contiguous spectral bands, enabling the extraction of reflectance spectra at the pixel scale that can be compared directly with similar spectra measured in the field.
![Page 8: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/8.jpg)
Importance of a Hyperspectral Sensor

• Provides spectral reflectance data in hundreds of bands rather than only a few bands as with multispectral data:
– Allows far more specific analysis of land cover.
– The emissivity levels of each band can be combined to form a spectral reflectance curve.
• These sensors provide information in:
– Visible region: vegetation, chlorophyll, sediments.
– Near infrared: atmospheric properties, cloud cover, vegetation land cover transformation.
– Thermal infrared: sea surface temperature, forest fires, volcanoes, cloud height, total ozone.
![Page 9: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/9.jpg)
CLASSIFICATION
Land cover classification has been a major research area involving the use of remote sensing images.

The image classification process assigns pixels to classes according to the characteristics of the objects or materials they represent.

Classified land cover is a major input to GIS-based studies.

Several approaches are used for land cover classification.
![Page 10: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/10.jpg)
CLASSIFICATION ALGORITHMS
Desirable properties of a classification algorithm:

• Predictive accuracy
• Computational cost:
o time to construct the model
o time to use the model
• Robustness:
o handling noise and missing values
• Interpretability:
o understanding the insight provided by the model
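These criteria can be measured directly. Below is a minimal sketch (scikit-learn, on a synthetic stand-in for labelled pixels; the classifier choice and data sizes are illustrative) that records the two computational costs alongside predictive accuracy:

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 2000 "pixels", 65 bands, 8 classes
X, y = make_classification(n_samples=2000, n_features=65, n_informative=20,
                           n_classes=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)   # trees also score well on interpretability

t0 = time.perf_counter()
clf.fit(X_tr, y_tr)                            # time to construct the model
build = time.perf_counter() - t0

t0 = time.perf_counter()
y_pred = clf.predict(X_te)                     # time to use the model
use = time.perf_counter() - t0

print(f"accuracy {accuracy_score(y_te, y_pred):.3f}, "
      f"build {build:.3f} s, use {use:.4f} s")
```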
![Page 11: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/11.jpg)
Hyperspectral data classification

1. Provides greater detail on the spectral variation of targets than conventional multispectral systems.
2. The availability of large amounts of data presents a challenge to classification analyses.
3. Ideally, each spectral waveband used in the classification process should add an independent set of information.
4. In practice, however, the features are highly correlated, suggesting a degree of redundancy in the available information, which can have a negative impact on classification accuracy (see the sketch below).
5. Requires a large pool of training data, which is quite costly to collect.
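Point 4 can be checked directly by measuring the correlation between neighbouring bands. A minimal sketch (NumPy; the cube here is a random placeholder for a real rows × cols × bands array):

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.random((128, 128, 65))            # placeholder for a real hyperspectral cube
pixels = cube.reshape(-1, cube.shape[-1])    # one row per pixel

# Correlation between every pair of bands; with real hyperspectral data,
# adjacent bands typically correlate at r > 0.9, confirming the redundancy
corr = np.corrcoef(pixels, rowvar=False)
adjacent = np.diag(corr, k=1)                # r between band i and band i+1
print(f"mean adjacent-band correlation: {adjacent.mean():.2f}")
```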
![Page 12: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/12.jpg)
Various approaches for the appropriate classification of high-dimensional data

1. Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995).
2. Use of methods that effectively increase the training set size, i.e. semi-supervised classification (Chi and Bruzzone, 2005), active learning, and the use of unlabelled data (Shahshahani and Landgrebe, 1994).
3. Use of some form of dimensionality reduction procedure prior to the classification analysis.
![Page 13: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/13.jpg)
Training samples → Learning algorithm → Model/function (also called a hypothesis) → Output values for testing samples

The hypothesis can be considered as a machine that provides the prediction for test data.
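In code this pipeline is the familiar fit/predict pattern. A minimal sketch (scikit-learn, synthetic data) in which the fitted model plays the role of the hypothesis:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

hypothesis = GaussianNB().fit(X_train, y_train)   # learning algorithm + training samples
outputs = hypothesis.predict(X_test)              # output values for the testing samples
print("test accuracy:", hypothesis.score(X_test, y_test))
```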
![Page 14: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/14.jpg)
SUPPORT VECTOR MACHINES (SVM)
Basic theory: 1965. Margin-based classifier: 1992. Support vector network: 1995.

Since 1998 the support vector network has been called the Support Vector Machine (SVM), used as an alternative to neural networks.

First application in remote sensing: Gualtieri and Cromp (1998), for hyperspectral image classification.
![Page 15: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/15.jpg)
SVM is based on structural risk minimisation (SRM), from the statistical learning theory proposed in the 1960s by Vapnik and co-workers.

SRM: minimise the probability of misclassifying unknown data drawn randomly.

Neural networks use empirical risk minimisation: minimise the misclassification error on the training data.
![Page 16: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/16.jpg)
SVM

Maps the data from the original input feature space to a very high-dimensional feature space. The data become linearly separable there, but the problem becomes computationally difficult.

A kernel function allows the SVM to work in the feature space without knowing the mapping or the dimensionality of the feature space.
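A minimal sketch of kernel SVM classification of pixel spectra (scikit-learn; the data are a random stand-in, and the C and gamma settings are illustrative rather than tuned):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((800, 65))          # hypothetical training spectra (65 bands)
y_train = rng.integers(0, 8, size=800)   # hypothetical labels for 8 classes

# The RBF kernel works in the implicit high-dimensional feature space;
# scaling the bands first usually matters for kernel methods
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
svm.fit(X_train, y_train)
print("support vectors per class:", svm[-1].n_support_)
```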
![Page 17: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/17.jpg)
Advantages

• Margin theory suggests no effect of the dimensionality of the input space.
• Uses a smaller number of the training data (called support vectors).
• QP solution, so no local minima.
• Not many user-defined parameters.
![Page 18: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/18.jpg)
But with real data:

[Figure: classification accuracy (%) against the number of features (5 to 65), plotted for training set sizes of 8, 15, 25, 50, 75 and 100 pixels; the accuracy axis spans roughly 55-95%.]
Mahesh Pal and Giles M. Foody, 2010, Feature selection for classification of hyperspectral data by SVM. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2297-2306.
![Page 19: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/19.jpg)
Disadvantages

• Designed for two-class problems; different methods are needed to create a multi-class classifier.
• Choice of kernel function and kernel-specific parameters; the kernel function should satisfy Mercer's theorem.
• Choice of the regularisation parameter C.
• Output is not naturally probabilistic.
![Page 20: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/20.jpg)
Relevance Vector Machines (RVM)

• Based on a probabilistic Bayesian formulation of a linear model (Tipping, 2001).
• Produces a sparser solution than the SVM (i.e. fewer relevance vectors).
• Able to use non-Mercer kernels.
• Probabilistic output.
• No need for the parameter C.
![Page 21: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/21.jpg)
Major difference from SVM

• The selected points are anti-boundary (away from the decision boundary).
• Support vectors represent the least prototypical examples (closer to the boundary, difficult to classify).
• Relevance vectors are the most prototypical examples (more representative of the class).
![Page 22: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/22.jpg)
Location of the useful training cases with SVM & RVM

Mahesh Pal and G. M. Foody, 2012, Evaluation of SVM, RVM and SMLR for accurate image classification with limited ground data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(5), 1344-1355.
![Page 23: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/23.jpg)
MAJOR DIFFERENCE FROM SVM

The selected points are anti-boundary (away from the boundary): support vectors represent the least prototypical examples (closer to the boundary, difficult to classify), while relevance vectors are the most prototypical (more representative of the class).
![Page 24: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/24.jpg)
Disadvantages

• Requires a large computational cost in comparison to SVM.
• Designed for two-class problems, like SVM.
• Choice of kernel.
• May have a problem of local minima.
![Page 25: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/25.jpg)
Sparse Multinomial Logistic Regression (SMLR)

• The SMLR algorithm learns a multi-class classifier based on multinomial logistic regression.
• It uses a Laplacian prior on the weights of the linear combination of functions to enforce sparsity.
• SMLR performs feature selection and classification simultaneously.
• Somewhat closer to RVM.
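SMLR itself has no standard scikit-learn implementation, but an L1 penalty on multinomial logistic regression is the MAP equivalent of SMLR's Laplacian prior, so the following sketch (on hypothetical data) reproduces the key behaviour: many weights are driven exactly to zero during fitting:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((800, 65))            # hypothetical pixel spectra
y = rng.integers(0, 8, size=800)     # hypothetical class labels

# L1 penalty = Laplacian prior on the weights: sparsity is enforced while
# fitting, so feature selection and classification happen simultaneously
smlr_like = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
smlr_like.fit(X, y)

kept = np.any(smlr_like.coef_ != 0, axis=0)
print(f"features retained: {kept.sum()} of {X.shape[1]}")
```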
![Page 26: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/26.jpg)
Location of the useful training cases with SMLR

[Figure: scatter plot of Band 5 against Band 1 for the classes wheat, sugar beet and oilseed rape.]
![Page 27: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/27.jpg)
LOCATING USEFUL TRAINING SAMPLES

• The Mahalanobis distance between a sample and a class centroid is used.
• A small distance indicates that the sample lies close to the class centroid and so is typical of the class, while a large distance indicates that the sample is atypical.
• This can help to reduce the field work for ground truth collection, thus reducing project cost.
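A minimal sketch of this distance computation (NumPy/SciPy; the class samples are synthetic placeholders):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
class_samples = rng.random((200, 65))    # hypothetical spectra of one class

centroid = class_samples.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(class_samples, rowvar=False))

# Distance of each sample from its class centroid:
# small = typical of the class, large = atypical
d = np.array([mahalanobis(x, centroid, cov_inv) for x in class_samples])
print("most typical sample:", d.argmin(), " most atypical sample:", d.argmax())
```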
![Page 28: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/28.jpg)
PRESENT WORK

Working with COST Action (European Cooperation in Science and Technology) TD1202, "Mapping and the citizen sensor", as a non-EU member:

1. Classification with imperfect/noisy data.
2. How SVM, RVM and SMLR work with noisy data.
3. Will be working on other classifiers: RF, ELM.
![Page 29: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/29.jpg)
Two types of data noise: attribute noise and class noise.

We are dealing with class noise, which can arise from subjectivity, data-entry error, or inadequacy of the information used to label each class.

Possible solutions for dealing with class noise include data cleaning and the detection and elimination of mislabelled training cases.
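For experiments such as the one tabulated below, class noise can be simulated by randomly flipping a chosen fraction of the training labels. A minimal sketch (NumPy; the labels and the add_class_noise helper are hypothetical):

```python
import numpy as np

def add_class_noise(y, fraction, n_classes, seed=0):
    """Return a copy of y with `fraction` of the labels flipped at random."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    for i in idx:
        # replace with a wrong label drawn uniformly from the other classes
        wrong = [c for c in range(n_classes) if c != y_noisy[i]]
        y_noisy[i] = rng.choice(wrong)
    return y_noisy

y = np.random.default_rng(1).integers(0, 8, size=800)
y_10 = add_class_noise(y, 0.10, n_classes=8)      # 10% class noise
print("labels changed:", (y != y_10).sum())
```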
![Page 30: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/30.jpg)
Accuracy with increasing error in the training data (the values in parentheses are the number of vectors/cases retained by each classifier):

| Error in data | 0 | 5% | 10% | 15% | 20% | 25% | 30% | 35% | 40% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RVM | 88.00% (51) | 88.22% (45) | 87.11% (40) | 87.78% (46) | 87.33% (41) | 87.56% (37) | 86.44% (39) | 85.56% (32) | 84.00% (35) |
| SMLR | 88.67% (83) | 88.89% (91) | 88.67% (85) | 87.78% (82) | 88.00% (89) | 87.33% (80) | 87.77% (78) | 86.89% (86) | 86.67% (72) |
| SVM | 89.11% (203) | 88.00% (259) | 90.00% (310) | 89.77% (339) | 89.11% (369) | 86.67% (409) | 84.00% (432) | 84.22% (447) | 83.11% (490) |
![Page 31: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/31.jpg)
EXTREME LEARNING MACHINES (ELM)

• A neural network classifier.
• Uses one hidden layer only.
• No parameters except the number of hidden nodes.
• A kernel function can be used in place of the hidden layer by modifying the optimisation problem.
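A minimal sketch of the basic ELM (NumPy): the hidden-layer weights are random and never trained, and the output weights come from a single least-squares solve. The data sizes and the number of hidden nodes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((800, 65))              # hypothetical training spectra
y = rng.integers(0, 8, size=800)       # hypothetical labels
T = np.eye(8)[y]                       # one-hot targets

n_hidden = 500                         # the only parameter to choose
W = rng.normal(size=(65, n_hidden))    # random input weights, never trained
b = rng.normal(size=n_hidden)          # random biases

H = np.tanh(X @ W + b)                 # hidden-layer output matrix
beta = np.linalg.pinv(H) @ T           # output weights: one linear solve

pred = (H @ beta).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

Because only beta is solved for, training reduces to a single pseudoinverse, which is why ELM is very fast (see the advantages on the next slide).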
![Page 32: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/32.jpg)
• Global solution (no local optima, unlike neural networks).
• Performance comparable to SVM and better than a back-propagation neural network.
• Multiclass.
• Very fast.
![Page 33: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/33.jpg)
Classification accuracy

| Dataset | SVM (%) | KELM (%) |
| --- | --- | --- |
| ETM+ | 88.37 | 90.33 |
| ATM | 92.50 | 94.06 |
| DAIS | 91.97 | 92.16 |

Computational cost

| Dataset | SVM (sec) | KELM (sec) |
| --- | --- | --- |
| ETM+ | 76.74 | 5.78 |
| DAIS | 40.78 | 1.02 |
| ATM | 1.30 | 0.17 |
Mahesh Pal, A. E. Maxwell and T. A. Warner, 2014, Kernel-based extreme learning machine for remote sensing image classification. Remote Sensing Letters.
![Page 34: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/34.jpg)
PRESENT WORK

• Working on the sparse extreme learning machine (produces a sparse solution similar to the support vector machine).
• Ensembles of extreme learning machines.
• Also trying to understand the working of deep neural networks.
![Page 35: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/35.jpg)
FEATURE REDUCTION
![Page 36: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/36.jpg)
The two broad categories are feature selection and feature extraction.

• Feature reduction may speed up the classification process by reducing the data set size.
• It may increase the predictive accuracy.
• It may increase the ability to understand the classification rules.
• Feature selection selects a subset of the original features that maintains the information useful for separating the classes, by removing redundant features.
![Page 37: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/37.jpg)
FEATURE EXTRACTION

A number of techniques for feature extraction have been proposed, including principal components, the maximum noise fraction (MNF) transformation, and non-orthogonal techniques such as projection pursuit and independent component analysis.

• MNF requires estimates of the signal and noise covariance matrices.
• The different features provided by MNF are ranked according to signal-to-noise ratio (the first MNF has the smallest value of the S/N ratio).
• Results with the DAIS data suggest that MNF may not be used effectively for dimensionality reduction.
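Of the techniques listed, principal components is the simplest to demonstrate (MNF has no standard scikit-learn implementation). A minimal sketch on a hypothetical subsample of pixel spectra:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
pixels = rng.random((10000, 65))     # hypothetical subsample of 65-band spectra

# Keep enough components to explain 99% of the variance; with real
# hyperspectral data this is usually far fewer than the 65 input bands
pca = PCA(n_components=0.99)
reduced = pca.fit_transform(pixels)
print("components kept:", pca.n_components_)
```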
![Page 38: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/38.jpg)
Feature selection

The three approaches to feature selection are:

• Filters: use a search algorithm to search through the space of possible features, and evaluate each feature with a filter such as correlation or mutual information.
• Wrappers: use a search algorithm to search through the space of possible features, and evaluate each subset with a classification algorithm.
• Embedded: some classification processes, such as random forests or multinomial logistic regression, produce a ranked list of features during classification.
![Page 39: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/39.jpg)
Filters

A large number of filter-based approaches are available in the literature. Some used with hyperspectral data are:

1. Correlation-based feature selection (CFS)
2. Minimum-Redundancy-Maximum-Relevance (mRMR)
3. Entropy
4. Fuzzy entropy
5. Signal-to-noise ratio
6. RELIEF
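A minimal sketch of a filter in practice, using mutual information to rank the bands and keep the best ones (scikit-learn; the data and the choice of k=20 are illustrative):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((800, 65))            # hypothetical training spectra
y = rng.integers(0, 8, size=800)     # hypothetical labels

# Each band is scored independently against the class labels; note that a
# univariate filter ignores redundancy among the bands it keeps
selector = SelectKBest(mutual_info_classif, k=20)
X_reduced = selector.fit_transform(X, y)
print("selected band indices:", np.flatnonzero(selector.get_support()))
```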
![Page 40: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/40.jpg)
WRAPPER APPROACH

• The SVM-RFE approach uses an SVM as the base classifier.
• SVM-RFE uses the objective function ½||w||² as a feature-ranking criterion to produce a list of features ordered by their discriminatory ability.
• The feature with the smallest ranking score is eliminated.
• SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from subsets of features, in order to derive a list of all features in rank order of value.
• A major drawback of wrapper methods is their high computational requirements.
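A minimal sketch of SVM-RFE using scikit-learn's RFE with a linear SVM (the data are hypothetical; retaining 13 features echoes the results table further below but is otherwise an arbitrary choice):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((800, 65))            # hypothetical training spectra
y = rng.integers(0, 8, size=800)     # hypothetical labels

# A linear kernel is used so that the weight vector w exists; at each step
# the feature with the smallest ranking score is eliminated
rfe = RFE(SVC(kernel="linear", C=10.0), n_features_to_select=13, step=1)
rfe.fit(X, y)
print("selected band indices:", np.flatnonzero(rfe.support_))
```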
![Page 41: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/41.jpg)
EMBEDDED APPROACH

• During the classification process, some algorithms produce a ranked list of all features.
• For example, two approaches based on the random forest and multinomial logistic regression classifiers can be used.
• In contrast to the filter and wrapper approaches, the embedded approach's search for an optimal feature subset is built into the classification algorithm itself.
• The classification and feature selection processes cannot be separated.
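A minimal sketch of the random forest variant (scikit-learn): the trained forest exposes an importance score for every band as a by-product of classification, so selection and classification are inseparable. Data are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((800, 65))            # hypothetical training spectra
y = rng.integers(0, 8, size=800)     # hypothetical labels

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The ranked feature list falls out of the fitted classifier itself
ranking = np.argsort(rf.feature_importances_)[::-1]
print("ten most important bands:", ranking[:10])
```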
![Page 42: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/42.jpg)
Data Set

1. DAIS 7915 sensor, flown by the German Space Agency on 29 June 2000.
2. The sensor acquires information in 79 bands at a spatial resolution of 5 m, in the wavelength range 0.502-12.278 µm.
3. Seven features located in the mid- and thermal-infrared region and seven features from the spectral region 0.502-2.395 µm were removed due to striping noise.
4. An area of 512 pixels by 512 pixels with 65 features covering the test site was used.
![Page 43: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/43.jpg)
![Page 44: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/44.jpg)
Training and test data

1. Random sampling was used to collect training and test data using a ground reference image.
2. Eight land cover classes: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.
3. A total of 800 training pixels and 3800 test pixels were used.
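A minimal sketch of this kind of random sampling from a ground reference image (NumPy; the arrays are placeholders, and drawing 100 pixels per class to reach 800 in total is an assumption about the split):

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.integers(0, 8, size=(512, 512))   # placeholder class-label image
cube = rng.random((512, 512, 65))                 # placeholder hyperspectral cube

train_idx = []
for c in range(8):
    pixels_c = np.flatnonzero(reference.ravel() == c)
    train_idx.append(rng.choice(pixels_c, size=100, replace=False))
train_idx = np.concatenate(train_idx)             # 800 training pixels in total

X_train = cube.reshape(-1, 65)[train_idx]
y_train = reference.ravel()[train_idx]
print(X_train.shape, y_train.shape)               # (800, 65) (800,)
```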
![Page 45: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/45.jpg)
Feature selection results (DAIS data):

| Feature selection algorithm | Number of used features | Accuracy (%) |
| --- | --- | --- |
| None | 65 | 91.76 |
| Fuzzy entropy | 14 | 91.68 |
| Entropy | 17 | 91.61 |
| Signal-to-noise ratio | 20 | 91.68 |
| Relief | 20 | 88.61 |
| SVM-RFE | 13 | 91.89 |
| mRMR | 37 | 91.84 |
| CFS | 17 | 91.84 |
| Random forest | 21 | 92.08 |
| Multinomial logistic regression | 15 | 92.76 |
![Page 46: Hyper Spectral data classification](https://reader038.fdocuments.in/reader038/viewer/2022102905/563db864550346aa9a93460a/html5/thumbnails/46.jpg)
PRESENT WORK

• How noise affects feature selection.
• Ensembles of feature selection methods.
• Stability of feature selection algorithms for hyperspectral data.