Transcript of INVESTIGATION OF FEATURE REDUCTION METHODS FOR IMPROVING EMG AND EEG PATTERN RECOGNITION ROBUSTNESS
INVESTIGATION OF FEATURE REDUCTION METHODS FOR IMPROVING EMG AND EEG PATTERN RECOGNITION ROBUSTNESS
Article and worksheets

Master's thesis, Spring 2015
Astrid Clausen Nørgaard
Biomedical Engineering and Informatics
Aalborg University
Department of Health Science & Technology
Investigation of feature reduction methods for improving EMG and EEG pattern recognition robustness
Master's thesis, 10th semester, Biomedical Engineering and Informatics

Student: Astrid Clausen Nørgaard
Group: 15gr1083
Supervisor: Ernest Kamavuako
Number of pages: 54
Completion date: 3rd of June 2015
Preface
This project was carried out by Astrid Clausen Nørgaard as her master's thesis on the 10th semester of Biomedical Engineering and Informatics at Aalborg University. The project ran from the 1st of February 2015 to the 3rd of June 2015.
Acknowledgment
The author would like to thank the subjects who participated in the study, as well as Julie Gade and Rósa Hugosdóttir, with whom the experiments were carried out in cooperation. The author would also like to thank Rasmus Wiberg Nedergaard for kindly handing over the data from his master's thesis, and Ernest Nlandu Kamavuako, who supervised this project.
Reading guide
This project contains two parts, an article and worksheets. The article serves as the main documentation, while the worksheets can be read if the reader wishes supplementary information about the feature reduction methods or the extracted features. The references are specified by the Vancouver system in the article and the Harvard method [last name, year] in the worksheets.
The contents of this report are freely available, but publication (with references) requires an agreement with the authors.
Abstract

Feature dimensionality reduction is an essential step in pattern recognition. Clinical studies often suffer from a high number of features and a low number of observations, where feature reduction is often necessary to remove redundant features and to avoid overfitting of the classifier. Literature dealing with feature reduction focuses primarily on its ability to improve classification, whereas its ability to improve the robustness of pattern recognition is often overlooked, despite reports of large variation in biological signals recorded over different days/sessions. The aim of this project was to investigate which feature reduction methods contribute to the most robust classification across multiple days. This was investigated by analysing eight feature reduction methods on two different datasets: 1) electromyography (EMG) data recorded over 3 days, where eight subjects performed seven different hand movements, and 2) electroencephalography (EEG) data recorded over 7 days, where seven subjects performed two different dorsiflexions. After filtering and segmentation of the datasets, 90 and 72 features were extracted from the EMG and EEG datasets, respectively. The feature dimension was then reduced with the following eight methods, selected on the basis of a literature review:

- Principal component analysis (PCA)
- Fisher discriminant analysis (FDA)
- Kernel principal component analysis (KPCA)
- Nonparametric discriminant analysis (NDA)
- Independent component analysis (ICA)
- Nonparametric weighted feature extraction (NWFE)
- Neighbourhood components analysis (NCA)
- Maximally collapsing metric learning (MCML)

The feature reduction methods were evaluated with the classifiers linear discriminant analysis (LDA) and support vector machine (SVM). The robustness of the feature reduction methods was evaluated in two scenarios: one in which the feature reduction projection and the classifiers were trained every day, and one in which they were trained on the first day only.
The results for EMG showed that NDA had a high classification accuracy and was the most robust feature reduction method in both training scenarios. The results for EEG showed that KPCA had the highest classification accuracy and was among the most robust feature reduction methods in both training scenarios.
From this project it can be concluded that feature reduction methods can improve robustness. When implementing feature reduction methods in a classification system, it is recommended to test different methods in order to find the one that best fits the given signal.
Article
Investigation of feature reduction methods for improving EMG and EEG pattern recognition robustness
Author: Astrid Clausen Nørgaard
Abstract
Robustness of pattern recognition receives little attention in literature dealing with feature reduction, despite current literature reporting inconsistency and day-to-day / session-to-session variation in biomedical signals. This article aims to investigate the robustness of eight feature reduction methods for data recorded over multiple days. The feature reduction methods were tested on two datasets: 1) electromyography (EMG) data recorded during three days, where eight subjects performed seven different hand movements, and 2) electroencephalography (EEG) data recorded during seven days, where seven subjects performed two different dorsiflexions. The results show that feature reduction has a great impact on the performance and robustness of EMG and EEG classification. For EMG, nonparametric discriminant analysis (NDA) showed high classification accuracies and was the most robust feature reduction method. For EEG, kernel principal component analysis (KPCA) showed the highest classification accuracies and was among the most robust feature reduction methods. In conclusion, feature reduction must be included when designing a classification system that is robust over time, but it is recommended to test the different methods for feature reduction, to find the method that fits the given data best.
Keywords: Feature reduction · Dimension reduction · Robustness · EEG · EMG · LDA · SVM · Pattern recognition
1 Introduction

Feature reduction is an essential step in biomedical pattern recognition [19, 17]. Clinical studies are often hampered by a large number of features and a low number of observations, also known as the curse of dimensionality. Reduction of features is therefore often necessary to remove redundant features and to avoid overfitting. Furthermore, it has been shown that feature reduction can improve the classification accuracy compared with no feature reduction [19].
Some of the most commonly used feature reduction methods for biomedical signals include Principal Component Analysis (PCA) and Fisher's Discriminant Analysis (FDA). These methods are widely tested in the literature and are often used as a benchmark when testing a new method for feature reduction [24, 18, 7]. Current literature dealing with feature reduction mainly focuses on feature reduction's ability to improve the classification accuracy [24]. However, when dealing with EMG classification, the following three properties have been suggested to ensure a high-quality feature space [2, 22, 5, 4, 28, 16]:
1. Maximum class separability: A high-quality feature space should have maximum class separability, or minimum overlap, to ensure high classification accuracy.
2. Robustness: A high-quality feature space should be able to adapt to time-varying changes.
3. Complexity: The computational complexity of thefeature space should be kept low.
Maximum class separability and complexity are well studied in current literature [24, 3, 29]. However, robustness receives little attention in literature dealing with feature reduction, despite current literature reporting inconsistency and day-to-day / session-to-session variation in biological signals [27, 1]. An investigation of how well feature reduction methods can handle these inconsistencies, and make the classification more robust, is therefore needed [1].
1.1 Related Work
Literature dealing with EMG classification is often broken down into three signal processing components: feature extraction, dimensionality reduction and classification. Studies by Kaufmann et al. [12] and Phinyomark et al. [23] have investigated robustness for the classifier and feature extraction, respectively.
Kaufmann et al. recorded EMG data during 21 days. Five different classifiers (k-nearest-neighbor, linear discriminant analysis, decision trees, artificial neural networks and support vector machines) were compared when classifying ten different hand movements. The results show
that the classification accuracies gradually decrease during the 21 days if the classifier is not retrained with current data. However, LDA only dropped 3.6 % during the 21 days and was found to be the most robust classifier [12].
Phinyomark et al. [23] used the same EMG data as Kaufmann et al. to investigate the robustness of 50 time-domain and frequency-domain features. Sample entropy was the most robust feature and showed a classification accuracy of 93.37 % when the classifier (LDA) was not retrained with current data. This was only 2.45 % lower than when the classifier was retrained [23].
The robustness of dimensionality reduction methods, however, has not been investigated in any current literature.
1.2 Aim
The aim of this study is to investigate which of eight feature reduction methods produces the most robust performance. This will be studied through the following four objectives:
1. Investigate the robustness of eight feature reduction methods across multiple days, when the classifier and the feature projection are retrained.

2. Investigate the robustness of eight feature reduction methods across multiple days, when the classifier and the feature projection are not retrained.
3. Investigate the general performance and the robustness of eight feature reduction methods compared to the original feature space.
4. Investigate the number of features needed for the feature reduction methods to show the highest performance.
The aim will be investigated by use of two datasets:
• EMG data from three different days, where eight subjects performed seven different hand movements.
• EEG data from seven different days, where seven subjects performed two different dorsiflexions.
2 Methods

2.1 EMG experiment
The experiment was performed over three separate days, with two and four days in between.
Subjects
The EMG data were collected from eight healthy volunteers (three women and five men) with a mean age of 25 ± 1 years. All subjects were right-handed and none of the subjects had any known sensory-motor deficits. All subjects gave their written informed consent to participate in the study.
Recording
The EMG signals were recorded with an analog EMG amplifier (AnEMG12, OT Bioelettronica, Italy) at a frequency of 2 kHz. The signals were digitalized using a 16-bit ADC and recorded by the software Mr. Kick (Knud Larsen, SMI, Aalborg University).
Experimental procedure
After preparation with electrode gel, one pair of Ag/AgCl surface electrodes (Ambu Neuroline 720) was placed at each of the following five positions:

1. The pronator teres muscle
2. The flexor digitorum superficialis muscle and flexor carpi radialis muscle
3. The flexor carpi ulnaris muscle
4. The extensor digitorum muscle
5. The extensor carpi ulnaris muscle and extensor carpi radialis muscle
The positions of the electrodes are shown in Figure 1. Furthermore, a wristband was placed around the subject's wrist as reference electrode.

Figure 1: The positions of the electrodes.

Data was recorded during a steady-state medium contraction with the right hand for each of the following seven hand movements: hand closing (HC), hand opening (HO), wrist flexion (WF), wrist extension (WE), wrist supination (WS), wrist pronation (WP) and pinch grip (PG). Data was further recorded during rest, bringing the total number of classes to eight. The hand movements are shown in Figure 2. Each movement was performed four times.
The positions of the electrodes were marked after each session, to ensure identical placement of the electrodes on each day.
Figure 2: Hand movements: hand closing (HC), hand opening (HO), wrist flexion (WF), wrist extension (WE), wrist supination (WS), wrist pronation (WP) and pinch grip (PG). The selection of hand movements is inspired by [23].
2.2 EEG experiment
In order to draw a more general conclusion about the robustness of the feature reduction methods, an EEG dataset was also analysed in this study. The experiment was performed two times per week for four weeks, plus one session in week eight, making it possible to analyse robustness over an extended period of time. The EEG experiment was conducted by Rasmus Wiberg Nedergaard [20], and the data was used with his permission. A further description of the experiment can be found in his master's thesis [20].
Subjects
The EEG data were collected from seven healthy volunteers (one woman and six men), with a mean age of 26 ± 1 years. None of the subjects had any known neurological disorders or disorders of their right foot or ankle. All subjects gave their written informed consent before participation [20].
Recording
The EEG signals were recorded with an EEG amplifier (Nuamps Express, Neuroscan) and a 32-channel Quick-Cap (Neuroscan) at a frequency of 500 Hz. The signals were digitally converted with 32-bit accuracy. Furthermore, force was sampled at 2000 Hz from a force transducer mounted on a foot pedal and displayed by the software Mr. Kick (Knud Larsen, SMI, Aalborg University) [20].
Experimental procedure
The electrodes were placed at F3, F4, C3, C4, Cz, P3, P4 and Pz according to the 10-20 system [20]. A reference was placed on the right mastoid bone and the ground electrode was placed at the nasion. The impedance of the electrodes was kept below 5 kΩ. Three MVC forces of a dorsiflexion of the right ankle were initially recorded at each session. The subjects performed two kinds of dorsiflexions at a force of 20 % of the highest MVC: 1) a fast movement, reaching the target force after 0.5 s, and 2) a slow movement, reaching the target force after 3 s. This study used the part of the experiment where the subjects performed 2 x 30 movements of fast and slow dorsiflexions in randomised order. A trigger was sent at the beginning of each movement, making it possible to split the continuous recording into epochs.
Data from week 2 was excluded from this study due to technical errors in the recordings. Data from the remaining seven days was included in the study [20].
2.3 Data analysis
Preprocessing of EMG data
The data was bandpass filtered using a fourth-order Butterworth bandpass filter with cut-off frequencies of 20 and 400 Hz. Furthermore, the data was filtered with a narrow notch bandstop filter to remove 50 Hz noise. This was followed by windowing with a segment length of 250 ms and an overlap of 150 ms.
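The preprocessing chain can be sketched as follows. The sampling rate (2 kHz), filter order, cut-offs and window parameters come from the text; the notch quality factor Q = 30 is an assumed value, as the article does not state the notch width.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 2000  # EMG sampling frequency in Hz, as stated above

def preprocess_emg(raw, fs=FS):
    """Bandpass 20-400 Hz (4th-order Butterworth) plus a 50 Hz notch."""
    b, a = butter(4, [20, 400], btype="bandpass", fs=fs)
    x = filtfilt(b, a, raw)
    bn, an = iirnotch(50, Q=30, fs=fs)  # Q = 30 is an assumption
    return filtfilt(bn, an, x)

def segment(signal, fs=FS, win_ms=250, overlap_ms=150):
    """Slide a 250 ms window with 150 ms overlap (i.e. a 100 ms step)."""
    win = int(win_ms * fs / 1000)                  # 500 samples
    step = int((win_ms - overlap_ms) * fs / 1000)  # 200 samples
    return np.array([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

rng = np.random.default_rng(0)
raw = rng.standard_normal(2 * FS)   # two seconds of synthetic data
windows = segment(preprocess_emg(raw))
print(windows.shape)                # (18, 500)
```

Zero-phase filtering (`filtfilt`) is used here so the windowed segments are not time-shifted by the filters.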
Feature extraction of EMG data
The following features were extracted from the filtered EMG data from all five channels:

1. Mean Absolute Value
2. Willison Amplitude, threshold = 10 mV
3. Zero Crossing, threshold = 10 mV
4. Slope Sign Changes, threshold = 16 mV
5. Variance of EMG
6. Waveform Length
7. Root Mean Square
8. Mean Frequency
9. Mean Power
10. Median Frequency
11. 6 Autoregressive coefficients, order = 6
12. Sample Entropy, m = 2, r = 0.2 ¹
13. Approximate Entropy, m = 2, r = 0.2 ²
The dimension of the original feature space was thereby 90 (18 features × 5 channels). Features 1-11 are all commonly used features in EMG classification [21, 23]. Sample entropy and approximate entropy were extracted due to the robustness found in the study by Phinyomark et al. [23]. The values of the parameters are based on suggestions in the literature [23].
More information about the extracted EMG features can be found in the worksheets, chapter 4.

¹ m = embedded dimension, r = tolerance. See worksheets, chapter 4.
² m = embedded dimension, r = tolerance. See worksheets, chapter 4.
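To make the feature list concrete, a short sketch of four of the time-domain features is given below. Amplitudes are assumed to be in volts (so the 10 mV threshold is 0.01), and the thresholded zero-crossing rule used here is one common convention, not necessarily the exact one used in the thesis.

```python
import numpy as np

def mav(x):
    """Mean Absolute Value."""
    return np.mean(np.abs(x))

def wl(x):
    """Waveform Length: summed absolute sample-to-sample differences."""
    return np.sum(np.abs(np.diff(x)))

def zc(x, thresh=0.01):
    """Zero Crossings whose amplitude step exceeds the 10 mV threshold."""
    sign = np.signbit(x).astype(int)
    crossings = np.diff(sign) != 0
    big_enough = np.abs(np.diff(x)) >= thresh
    return int(np.sum(crossings & big_enough))

def rms(x):
    """Root Mean Square."""
    return np.sqrt(np.mean(x ** 2))

x = np.array([0.02, -0.03, 0.04, -0.01, 0.05])  # one tiny toy window
print(round(mav(x), 6), zc(x))                  # 0.03 4
```

In the study each such scalar is computed per window and per channel, and the per-channel values are concatenated into the 90-dimensional feature vector.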
Preprocessing of EEG data
The data was bandpass filtered using a fourth-order Butterworth bandpass filter with cut-off frequencies of 0.5 and 5 Hz. The signals were split into epochs with a segment length of 5 s via the trigger.
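The trigger-based epoching can be sketched as below; the trigger positions used here are hypothetical sample indices, not values from the experiment.

```python
import numpy as np

FS = 500  # EEG sampling frequency in Hz

def epochs_from_triggers(signal, triggers, fs=FS, length_s=5):
    """Cut a continuous recording into fixed-length epochs, one per trigger."""
    n = int(length_s * fs)  # 2500 samples for a 5 s epoch
    return np.array([signal[t:t + n] for t in triggers if t + n <= len(signal)])

sig = np.arange(20 * FS, dtype=float)             # 20 s of synthetic EEG
eps = epochs_from_triggers(sig, [0, 3000, 7000])  # hypothetical triggers
print(eps.shape)                                  # (3, 2500)
```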
Feature extraction of EEG data
Features were extracted from the movement-related cortical potential (MRCP), which is a low-frequency negative shift associated with the planning and execution of a voluntary movement [10]. The mean MRCP for the two movements, across channels and subjects, can be seen in Figure 3. Time-domain features like mean, max, slope and intersection are often extracted from the MRCP [10]. The chosen features are based on visual inspection of Figure 3.
[Figure 3 plot: mean MRCP traces; x-axis: time (s) from -3 to 2, y-axis: amplitude (µV) from -12 to 6; legend: Slow, Fast.]

Figure 3: Mean MRCP from two movements, slow and fast. 0 s is the time of the movement onset.
The following features were extracted from the filtered EEG data from all nine channels:

1. Mean amplitude from -0.5 s to 0.5 s
2. Mean amplitude from -1 s to 0 s
3. Point of maximum negativity
4. Maximum negativity
5. Slope of a linear regression from -1 s to 0 s
6. Intersection of a linear regression from -1 s to 0 s
7. Slope of a linear regression from 0 s to 1 s
8. Intersection of a linear regression from 0 s to 1 s

The dimension of the original feature space was thereby 72 (8 features × 9 channels).
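Per channel, the listed features reduce to window means, a minimum search and two least-squares line fits. The sketch below assumes an epoch running from -3 s to +2 s around movement onset (matching the time axis of Figure 3); the `onset_idx` convention and the toy input are illustrative.

```python
import numpy as np

FS = 500  # Hz; t = 0 s is the movement onset

def mrcp_features(epoch, fs=FS, onset_idx=1500):
    """Some of the listed MRCP features for a single channel (amplitudes in uV)."""
    t = (np.arange(len(epoch)) - onset_idx) / fs
    pre = (t >= -1) & (t < 0)     # -1 s to 0 s
    post = (t >= 0) & (t < 1)     # 0 s to 1 s
    slope_pre, inter_pre = np.polyfit(t[pre], epoch[pre], 1)
    slope_post, inter_post = np.polyfit(t[post], epoch[post], 1)
    return {
        "mean_-1_0": epoch[pre].mean(),
        "max_negativity": epoch.min(),
        "t_max_negativity": t[np.argmin(epoch)],
        "slope_-1_0": slope_pre, "inter_-1_0": inter_pre,
        "slope_0_1": slope_post, "inter_0_1": inter_post,
    }

# a straight line of slope -2 uV/s is recovered exactly by the fits
t = np.arange(2500) / FS - 3.0
feats = mrcp_features(-2.0 * t)
print(round(feats["slope_-1_0"], 6))  # -2.0
```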
2.4 Feature reduction
The following eight methods were used to reduce the features. The selection of methods is based on the literature review in the worksheets (chapter 2). The worksheets further contain a mathematical treatment of the used methods (chapter 3).
PCA
Principal Component Analysis (PCA) is one of the most popular methods for dimensionality reduction [18]. PCA seeks to maximise the variance in the data by mapping the data into a linear subspace containing the principal components. The first principal component describes the most variance in the data, and so forth, and is found on the basis of the eigenvectors and eigenvalues [6].
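As a minimal sketch (not the thesis implementation), PCA can be written in a few lines via the singular value decomposition of the centred data; the 90-dimensional input mirrors the EMG feature space above, with random numbers standing in for real features.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return a projection onto the top principal components, plus the mean."""
    mu = X.mean(axis=0)
    # right singular vectors of the centred data = eigenvectors of the covariance
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt[:n_components].T, mu

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 90))   # 64 windows x 90 EMG features
W, mu = pca_fit(X, 6)
Z = (X - mu) @ W                    # reduced feature space
print(Z.shape)                      # (64, 6)
```

The test data must be centred with the *training* mean and projected with the training `W`, which is exactly what the "no retrain" scenario below exercises.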
FDA
Fisher discriminant analysis (FDA) is another popular method for feature reduction. FDA is a supervised method that seeks to maximise the between-class scatter matrix and minimise the within-class scatter matrix [6].
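A sketch of the scatter-matrix formulation, solving the generalized eigenproblem Sb·w = λ·Sw·w. The small ridge on Sw is a numerical-stability assumption, not part of FDA itself, and the two-class toy data is illustrative.

```python
import numpy as np

def fda_fit(X, y, n_components):
    """Directions maximizing between-class over within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * diff @ diff.T          # between-class scatter
    ridge = 1e-6 * np.eye(d)  # keeps Sw invertible (an assumption, not FDA)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + ridge, Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_components]]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.repeat([0, 1], 20)
W = fda_fit(X, y, 1)
print(W.shape)  # (5, 1)
```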
KPCA
Kernel principal component analysis (KPCA) is a variant of PCA that applies a nonlinear kernel function, rather than the original linear function, before finding the eigenvectors and eigenvalues of the kernel matrix [8]. Different kernels were tested on both datasets; a Gaussian radial basis function (σ = 30) showed the best results on average and was therefore implemented.
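A self-contained sketch of KPCA with the Gaussian radial basis kernel (σ = 30, per the text); the kernel-matrix centring and eigenvector scaling follow the standard derivation, and the input is random stand-in data.

```python
import numpy as np

def kpca_fit_transform(X, n_components, sigma=30.0):
    """Project training data onto kernel principal components (Gaussian RBF)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    K = np.exp(-d2 / (2 * sigma ** 2))
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one     # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    # scale so each column is the projection onto a unit-norm feature-space axis
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 1e-12))

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 90))
Z = kpca_fit_transform(X, 9)
print(Z.shape)  # (40, 9)
```

Projecting *new* days onto a day-1 KPCA model additionally requires storing the training data and centring statistics, which is why the no-retrain scenario is a stronger test for kernel methods.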
NDA
Nonparametric discriminant analysis (NDA) is similar to FDA, as it also relies on the scatter matrices. NDA, however, defines a nonparametric between-class scatter matrix [11].
ICA
Independent Component Analysis (ICA) is a blind source separation technique that separates a dataset into independent, non-Gaussian subcomponents [19, 3]. The goal of ICA is to find the features that are most independent from each other [8]; in this study it is implemented by the FastICA method.
NWFE
Nonparametric Weighted Feature Extraction (NWFE) is a feature reduction method whose idea is to compute weighted means by weighting every sample differently. On the basis of the weighted means, the nonparametric between-class and within-class scatter matrices are defined [13].
NCA
Neighborhood Components Analysis (NCA) is a supervised method that seeks to find a Mahalanobis distance metric for k-nearest-neighbours (kNN) that optimizes the leave-one-out error on the training set [17]. The optimization problem is non-convex and relies on a gradient-based iterative algorithm.
MCML
Maximally Collapsing Metric Learning (MCML) is a supervised method similar to NCA, which also relies on a Mahalanobis distance metric for k-nearest neighbours. MCML differs from NCA in that its optimization problem is convex [7].
2.5 Evaluation of feature space
The feature space was evaluated by calculating the classification accuracies using Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM).
LDA
LDA was chosen because of its robustness found by Kaufmann et al. [12]. Additionally, LDA is a simple classifier that is computationally efficient and does not require any adjustment of parameters [23]. However, LDA is limited when the number of features is high compared to the number of samples, often referred to as "large p, small n" [25]. Hence, it is not possible to evaluate the full original feature space with LDA, and LDA will only be tested with the projected features.
SVM
SVM is well known for being able to deal with a high-dimensional feature space [24]. Furthermore, it was found to be the second most robust classifier in the study by Kaufmann et al. [12]. SVM, one-against-all with a linear kernel, will be used to evaluate the full original feature space as well as the projected feature spaces.
2.6 Evaluation of robustness
The measure of robustness was chosen to be the standard deviation between days, as suggested in [9]. Robustness across the multiple days was tested both with and without retraining of the classifier and the feature reduction projection.
Retraining:
Four-fold cross-validation was used when the classifier and the feature reduction projection were retrained. The data was partitioned into four equal folds; e.g. the EMG data was split into four folds containing eight samples each, with all eight classes represented in every fold. Three of the four folds were used as training data, and the cross-validation process tested all four different combinations of training and test data. The classification and the feature reduction projection were therefore performed four times, and the mean of the four classification accuracies is presented.
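The fold construction described above (every class present in every fold) can be sketched as follows. The nearest-centroid classifier is a toy stand-in for the study's LDA/SVM, and the eight well-separated synthetic classes mimic the 8-class, 4-repetition EMG layout.

```python
import numpy as np

def nearest_centroid(Xtr, ytr, Xte):
    """Toy classifier: assign each test sample to the closest class centroid."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def four_fold_accuracy(X, y, n_folds=4):
    """Spread each class evenly over the folds, train on three, test on one."""
    folds = np.zeros(len(y), dtype=int)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        folds[idx] = np.arange(len(idx)) % n_folds  # every class in every fold
    accs = []
    for k in range(n_folds):
        te = folds == k
        pred = nearest_centroid(X[~te], y[~te], X[te])
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(4 * c, 1, (4, 6)) for c in range(8)])  # 8 classes x 4 reps
y = np.repeat(np.arange(8), 4)
print(four_fold_accuracy(X, y))  # 1.0 for these well-separated classes
```

In the study, the feature reduction projection would also be refit on each training split before classification, so that no test-day information leaks into the projection.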
No retraining:
When testing the robustness without retraining of the classifier and the feature reduction projection, day 1 acted as training data, and day 2, day 3, etc. acted as test data.
2.7 Evaluation of number of features
The number of features was optimised, so that the number of features giving the highest classification accuracy was chosen. This was preferred over a fixed number of features, e.g. based on a certain percentage of the explained variability for PCA, as a fixed number may not necessarily yield the highest classification accuracy.
Statistics:
To test whether the results were robust across days, the non-parametric Friedman test was used, as the assumptions for ANOVA (equal variance and sphericity) were not met for all data. For p-values below 0.05, a Bonferroni post hoc test was used. Three Friedman tests were performed, with the following three aims:

• Test for differences between the days.
• Test for differences between the feature reduction methods.
• Test for differences between retrain and no retrain.
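For illustration only, the day-wise Friedman test can be run with scipy on the per-day LDA accuracies of three methods taken from Table 1, with methods acting as blocks; this demonstrates the mechanics, not the study's subject-level tests.

```python
from scipy.stats import friedmanchisquare

# per-day LDA classification accuracies for PCA, FDA and KPCA (Table 1)
day1 = [89.8, 68.0, 87.5]
day2 = [95.7, 75.4, 85.5]
day3 = [94.5, 73.0, 83.6]

stat, p = friedmanchisquare(day1, day2, day3)
print(round(stat, 2), round(p, 3))  # 2.0 0.368 -> no significant day effect here
```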
3 Results

3.1 EMG
Retrain
The average EMG classification accuracies for LDA and SVM across subjects are presented in Table 1. These results were obtained using the retrained data and 4-fold cross-validation. On average, NDA shows the highest classification accuracy when classifying with LDA. NDA is also the most robust method, with the lowest standard deviation. PCA also shows a high classification accuracy when classifying with both LDA and SVM, but the robustness of PCA is lower than for most of the other methods.
It should also be noted that SVM shows lower classification accuracies for all methods, along with lower robustness. The original feature space was only outperformed by three of the feature reduction methods (PCA, NWFE and NDA) when classifying with SVM.
A significant change was found between the days for ICA classified with SVM. The post hoc test showed a statistically significant difference between day 2 and day 3 (p = 0.03). Furthermore, a significant difference between the methods was found, where p was < 0.01 for both LDA and SVM. The post hoc test showed the following statistically significant differences:
• ICA was significantly different from NCA when classifying with LDA (p = 0.02)
• ICA was significantly different from NCA when classifying with SVM (p = 0.03)
• ICA was significantly different from PCA when classifying with SVM (p = 0.02)
Table 1: Mean classification accuracies across subjects, with retrained EMG data.

LDA
Method       Day 1   Day 2   Day 3   Mean   std   p
PCA           89.8    95.7    94.5   93.4   3.1   0.09
FDA           68.0    75.4    73.0   72.1   3.8   0.35
KPCA          87.5    85.5    83.6   85.5   2.0   0.27
NDA           93.4    95.3    94.9   94.5   1.0   0.84
ICA           44.1    45.3    42.6   44.0   1.4   0.37
NWFE          79.3    84.8    81.3   81.8   2.8   0.38
NCA           87.9    90.2    89.5   89.2   1.2   0.76
MCML          89.1    92.2    89.1   90.1   1.8   0.26

SVM
Method       Day 1   Day 2   Day 3   Mean   std   p
PCA           83.2    78.5    75.8   79.2   3.8   0.34
FDA           58.2    59.8    62.9   60.3   2.4   0.61
KPCA          55.5    56.3    49.2   53.6   3.9   0.64
NDA           79.3    79.7    73.8   77.6   3.3   0.24
ICA           26.2    22.7    28.5   25.8   2.9   *0.03
NWFE          77.0    72.7    70.7   73.4   3.2   0.30
NCA           65.6    64.8    60.5   63.7   2.7   0.52
MCML          68.8    64.5    61.7   65.0   3.5   0.42
Orig. feat.   69.9    68.8    64.5   67.7   2.9   0.39
No retrain
Table 2 shows the average EMG classification accuracies across subjects when the classifiers (LDA and SVM) and the feature projection were not retrained. NWFE shows an average classification accuracy of 94.3 % when classifying with LDA. The classification accuracies obtained by SVM are again lower than for LDA, and the highest classification accuracy for SVM is obtained by NDA (77.7 %). NDA also shows a very robust performance, both when classifying with LDA and with SVM.
Besides ICA, all feature reduction methods show higher classification accuracies than the original feature space. The feature reduction methods also improved the robustness.
In the statistical tests, no significant changes were found between the days. Furthermore, no significant difference between the methods was found when classifying with LDA (p = 0.06). However, a significant difference between the methods was found when classifying with SVM (p = 0.04). The post hoc test showed a statistically significant difference between ICA and NDA (p = 0.05).
Table 2: Mean classification accuracies across subjects, where day 1 acted as training data, and days 2 and 3 as test data.

LDA
Method       Day 2   Day 3   Mean   std   p
PCA           87.9    85.2   86.5   1.9   0.41
FDA           78.5    72.7   75.6   4.1   0.16
KPCA          88.7    81.6   85.2   5.0   0.06
NDA           89.1    89.1   89.1   0.0   0.56
ICA           45.3    39.8   42.6   3.9   0.48
NWFE          96.5    92.2   94.3   3.0   0.32
NCA           87.5    81.6   84.6   4.1   0.41
MCML          85.9    83.6   84.8   1.7   0.65

SVM
Method       Day 2   Day 3   Mean   std   p
PCA           75.8    74.2   75.0   1.1   0.41
FDA           54.7    58.6   56.6   2.8   0.10
KPCA          73.4    69.5   71.5   2.8   0.48
NDA           78.1    77.3   77.7   0.6   0.32
ICA           41.0    35.5   38.3   3.9   0.10
NWFE          64.8    57.0   60.9   5.5   0.18
NCA           77.3    73.4   75.4   2.8   0.48
MCML          77.3    75.4   76.4   1.4   0.48
Orig. feat.   60.9    50.8   55.9   7.2   0.16
Retrain vs. no retrain
The mean differences between retrain and no retrain are seen in Table 3. A negative value indicates lower performance when not retrained, and vice versa. The statistical tests show that most methods have a significant difference between retrain and no retrain. Some of the methods showed a higher performance when not retrained; e.g. NWFE with LDA as classifier showed an 11.3 % higher classification accuracy than when retrained.
Table 3: Mean difference between retrain and no retrain, EMG.

             LDA                     SVM
Method       Mean diff.   p         Mean diff.   p
PCA            -8.6      *0.01        -2.1      0.26
FDA             1.4       0.71        -4.7      0.06
KPCA            0.6       1.00        18.8      *0.03
NDA            -6.1       0.10         1.0      0.48
ICA            -1.4       1.00        12.7      *0.00
NWFE           11.3      *0.00       -10.7      *0.01
NCA            -5.3       0.06        12.7      *0.03
MCML           -5.9      *0.03        13.3      *0.01
Orig. feat.     -          -         -10.7      0.16
Number of features
Table 4 shows the median number of features required to obtain the highest EMG classification accuracies across subjects. NWFE is the method that requires the lowest number of features, both for LDA and SVM, and both with and without retraining. With NWFE, the number of features is reduced from 90 to 6 and 7.
Table 4: Median of the required features across subjects to obtain the presented EMG classification accuracies.

             LDA                       SVM
Method       Retrain   No retrain     Retrain   No retrain
PCA             6          9             6          8
FDA             6          6            10         17
KPCA            9         12             6         20
NDA             6          9             6         13
ICA            10         19             6         27
NWFE            6          6             6          7
NCA             7          8             7         17
MCML            7         13             7         20
3.2 EEG
Retrain
The average EEG classification accuracies across subjects for LDA and SVM are presented in Table 5. On average, KPCA shows the highest classification accuracy for both LDA and SVM. KPCA also shows a robust performance, as its standard deviations are among the lowest. The original feature space is only outperformed by KPCA and NWFE.
In the statistical tests, significant changes between the days were found in three of the tests. The post hoc test showed the following statistically significant differences:
• NCA using SVM: week 1 - day 2 and week 4 - day 1 (p = 0.05)
• MCML using SVM: week 3 - day 2 and week 4 - day 2 (p = 0.05)
• NDA using SVM: week 1 - day 2 and week 3 - day 2 (p = 0.04)
Furthermore, a significant difference between the methods was found, where p was < 0.01 for both LDA and SVM. The post hoc test showed the following statistically significant differences:
• KPCA was significantly different from FDA (p = 0.02), PCA (p = 0.04) and ICA (p = 0.02) when classifying with LDA.
• KPCA was significantly different from FDA (p = 0.01), ICA (p = 0.01) and MCML (p = 0.05) when classifying with SVM.
• FDA was significantly different from KPCA (p = 0.01), NWFE (p = 0.05) and orig. feat. (p = 0.05) when classifying with SVM.
No retrain
Table 6 shows the average EEG classification accuracies when the classifiers (LDA and SVM) and the feature projection were not retrained. KPCA shows the best average classification accuracy and the most robust performance for both LDA and SVM. All methods, except ICA, show a higher average classification accuracy than the original feature space.
In the statistical tests, significant changes between the days were found in three of the tests. The post hoc test showed the following statistically significant differences:
• NCA using LDA: week 4 - day 1 and week 8 (p = 0.04)
• NWFE using SVM: week 1 - day 2 and week 8 (p = 0.03)
• NCA using SVM: week 4 - day 1 and week 8 (p = 0.05)
• MCML using SVM: week 4 - day 1 and week 8 (p = 0.02)
Furthermore, a significant difference between the methods was found, where p was < 0.01 for both LDA and SVM. The post hoc test showed the following statistically significant differences:

• KPCA was significantly different from FDA (p = 0.01), ICA (p = 0.02) and NWFE (p = 0.03) when classifying with LDA.
• PCA was significantly different from FDA (p = 0.03) and ICA (p = 0.04) when classifying with LDA.
• KPCA was significantly different from ICA (p = 0.02) and orig. feat. (p = 0.03) when classifying with SVM.
• PCA was significantly different from ICA (p = 0.05) and orig. feat. (p = 0.05) when classifying with SVM.
Retrain vs. no retrain
The mean differences between retrain and no retrain are seen in Table 7. A negative value indicates lower performance when not retrained, and vice versa. The statistical tests show that none of the methods has a significant difference between retrain and no retrain.
Table 7: Mean difference between retrain and no retrain, EEG.

             LDA                     SVM
Method       Mean diff.   p         Mean diff.   p
PCA             1.2       0.71         1.2      0.71
FDA             2.1       0.71         1.4      0.71
KPCA           -4.7       0.06        -3.3      0.26
NCA            -2.4       0.26        -1.3      0.06
ICA            -1.1       0.71         0.4      0.71
NWFE           -5.0       0.06        -4.4      0.26
MCML           -2.0       0.26        -0.6      0.71
NDA            -0.6       0.71        -0.1      0.26
Orig. feat.     -          -          -6.8      0.26
Number of features
Table 8 shows the median number of features required to obtain the highest EEG classification accuracies across subjects. Similar to the results found for EMG, NWFE is the method that on average requires the lowest number of features.
Table 5: Mean classification accuracies across subjects, with retrained EEG data. (WxDy = week x, day y.)

LDA
Method       W1D1   W1D2   W3D1   W3D2   W4D1   W4D2   W8D1   Mean   std   p
PCA           64.9   72.3   65.8   62.9   65.2   71.2   72.4   67.8   4.0   0.11
FDA           63.7   64.8   67.1   62.7   62.9   62.9   65.3   64.2   1.6   0.42
KPCA          74.4   83.1   75.4   76.3   74.8   77.2   77.3   76.9   2.9   0.62
NDA           65.7   74.5   64.6   65.6   64.8   73.5   71.2   68.6   4.3   0.06
ICA           64.9   68.0   62.9   66.3   63.5   64.0   65.8   65.1   1.8   0.52
NWFE          65.9   75.8   66.6   66.8   71.2   69.3   68.4   69.2   3.5   0.51
NCA           65.4   72.7   67.2   62.9   67.1   71.5   72.0   68.4   3.7   0.06
MCML          65.2   73.5   66.3   66.1   63.3   71.8   71.6   68.2   4.0   0.07

SVM
Method       W1D1   W1D2   W3D1   W3D2   W4D1   W4D2   W8D1   Mean   std   p
PCA           64.5   71.8   67.1   62.0   64.2   71.8   70.9   67.5   4.1   0.19
FDA           59.1   64.7   64.5   63.3   62.8   64.0   67.0   63.5   2.4   0.42
KPCA          73.2   82.0   73.7   75.5   73.5   75.9   74.6   75.5   3.1   0.31
NDA           65.7   74.3   65.0   63.8   64.4   72.2   69.6   67.8   4.2   *0.02
ICA           64.2   63.1   61.4   64.1   64.5   64.0   63.4   63.5   1.0   0.81
NWFE          67.8   75.9   67.5   68.8   73.0   71.5   69.0   70.5   3.1   0.63
NCA           64.4   73.3   65.3   63.9   63.5   69.1   69.3   67.0   3.7   *0.02
MCML          63.6   71.4   63.7   62.0   63.2   73.0   71.4   66.9   4.8   *0.01
Orig. feat.   66.8   78.4   68.5   68.7   69.3   70.1   70.8   70.4   3.8   0.11
Table 6: Mean classification accuracies across subjects, where day 1 acted as training data, and day 2, day 3 etc. as test data. (W#D#: week, day)

LDA
Method | W1D2 | W3D1 | W3D2 | W4D1 | W4D2 | W8D1 | Mean | std | p
PCA | 74.3 | 68.2 | 71.5 | 65.4 | 68.5 | 69.0 | 69.5 | 3.1 | 0.07
FDA | 67.7 | 61.1 | 62.4 | 64.1 | 62.9 | 62.8 | 63.5 | 2.3 | 0.67
KPCA | 72.3 | 72.4 | 74.4 | 71.1 | 73.5 | 72.5 | 72.7 | 1.1 | 0.86
NDA | 72.4 | 68.0 | 69.4 | 65.0 | 67.2 | 68.5 | 68.4 | 2.4 | 0.74
ICA | 68.5 | 62.2 | 63.8 | 62.7 | 61.2 | 65.5 | 64.0 | 2.7 | 0.30
NWFE | 70.2 | 62.5 | 64.6 | 63.4 | 65.1 | 62.6 | 64.7 | 2.9 | 0.13
NCA | 68.9 | 65.7 | 63.1 | 63.9 | 66.7 | 70.7 | 66.5 | 2.9 | *0.02
MCML | 68.8 | 65.3 | 64.6 | 66.1 | 65.6 | 70.3 | 66.8 | 2.3 | 0.19

SVM
Method | W1D2 | W3D1 | W3D2 | W4D1 | W4D2 | W8D1 | Mean | std | p
PCA | 72.9 | 69.0 | 67.6 | 65.9 | 68.6 | 71.1 | 69.2 | 2.5 | 0.14
FDA | 72.1 | 64.5 | 63.8 | 63.8 | 63.8 | 66.9 | 65.8 | 3.3 | 0.58
KPCA | 74.4 | 72.8 | 73.2 | 69.2 | 74.2 | 71.6 | 72.5 | 1.9 | 0.39
NDA | 71.5 | 67.6 | 66.9 | 65.7 | 68.2 | 68.9 | 68.1 | 2.0 | 0.30
ICA | 66.7 | 61.5 | 64.3 | 62.9 | 61.5 | 65.9 | 63.8 | 2.2 | 0.21
NWFE | 70.9 | 64.6 | 67.0 | 64.8 | 67.5 | 64.6 | 66.6 | 2.5 | *0.03
NCA | 66.7 | 65.4 | 63.1 | 63.6 | 68.0 | 69.9 | 66.1 | 2.6 | *0.01
MCML | 69.8 | 67.2 | 62.9 | 63.8 | 65.9 | 71.3 | 66.8 | 3.3 | *0.03
Orig. feat. | 69.3 | 63.5 | 62.8 | 60.9 | 65.1 | 63.3 | 64.1 | 2.8 | 0.13
Table 8: Median of the required features across subjects to obtain the presented EEG classification accuracies.

Method | LDA: retrain | LDA: no retrain | SVM: retrain | SVM: no retrain
PCA | 12 | 22 | 12 | 23
FDA | 13 | 20 | 10 | 20
KPCA | 16 | 17 | 12 | 18
ICA | 13 | 18 | 11 | 15
NWFE | 2 | 7 | 9 | 16
NCA | 14 | 13 | 11 | 18
MCML | 16 | 18 | 13 | 16
NDA | 15 | 22 | 10 | 21
4 Discussion

The four aforementioned objectives are discussed below, followed by a discussion of the methodology.
1. Robustness of feature reduction methods, retrain

For EMG, NDA showed the highest and most robust classification accuracy obtained by LDA. This is similar to results found in previous literature, where NDA was shown to outperform e.g. FDA and PCA [15, 11]. The high robustness of NDA has, however, not been reported before. PCA also showed high classification accuracy for both LDA and SVM. This was unexpected, as several studies report that PCA shows lower classification accuracy when compared to e.g. FDA, NWFE and ICA [5, 28, 24]. For EEG, the results differ from EMG, as KPCA showed high and robust classification accuracy, and KPCA was significantly different from many of the other methods. It is seen from the results that the choice of feature reduction method can be a trade-off between robustness and classification accuracy. For instance, PCA showed one of the highest average performances for both EMG and EEG, but also showed poor robustness. A general conclusion as to which feature reduction method gives the most robust classification accuracy when retraining cannot be drawn.
2. Robustness of feature reduction methods, no retrain

For EMG, NWFE showed a high classification accuracy of 94.3 % when classifying with LDA. This is 11.3 % higher compared to the retrain-test. Compared to studies that investigated robustness of EMG classification, drops of 3.6 % for the most robust classifier and 2.45 % for the most robust feature are reported [12, 23]. These studies were, however, recorded over 21 days. NWFE was not the most robust feature reduction method within the no retrain-test, but is still within an acceptable range. NDA was the most robust feature reduction method, as was also the case in the retrain-test. For EEG, KPCA showed the highest classification accuracy for both LDA and SVM (76.9 % and 75.5 %). This is 4.7 % and 3.3 % lower than in the retrain-test, which is quite similar to the results found in [12, 23].
KPCA was also found to be one of the most robust feature reduction methods within the no retrain-test. When testing over multiple days, studies report that retraining sessions can be necessary each day to overcome time variations in the signals [14, 26]. In this study, many of the methods showed no significant difference between the retrain-test and the no retrain-test. The results found in this study thereby indicate that such a retraining session might not be necessary, if an appropriate feature reduction method is used.
3. Robustness and performance of feature reduction methods compared to the original feature space

For the EMG retrain-test, PCA, NWFE and NDA showed better performance than the original features, but did not tend to improve the robustness. For the EMG no retrain-test, all feature reduction methods except ICA showed higher performance than the original feature space. The robustness in the no retrain-test was also improved by all feature reduction methods. For EEG, most feature reduction methods did not show the same positive impact on the results when compared to the original feature space. A significant difference between the original features and KPCA for the no retrain-test was, however, found.
4. Dimension

It was found that NWFE needed the fewest features for both EMG and EEG. The number of features for NWFE was reduced from 90 to 6-7 for EMG, and from 72 to 2-16 for EEG. These results are similar to previous literature [16], where NWFE was found to need the lowest number of features compared to PCA and FDA. NWFE might thereby be able to overcome the curse-of-dimensionality phenomenon.
5. Methodology

It should be considered whether an experiment recorded over three days is enough to evaluate robustness. Ideally, the EMG experiment should have been recorded over an extended period of time to draw a more certain conclusion about the robustness. The methodology of the no retrain-test should also be considered. Four-fold cross-validation was not applied in the no retrain-test, so the number of training samples for e.g. EMG was increased from 24 to 32. This might explain the large differences between the retrain-test and the no retrain-test, e.g. for NWFE, which showed an increase of 11.3 % in the no retrain-test for EMG.
6. Conclusion

This study was the first of its kind to investigate robustness of feature reduction methods. The aim of this study was to investigate eight feature reduction methods and their ability to produce robust performance. Feature reduction is shown to have a great impact on the performance and robustness of EMG and EEG classification. For EMG, NDA showed high classification accuracies and was the most robust feature reduction method. For EEG, KPCA showed the highest classification accuracies and was among the most robust feature reduction methods. In order to make a classification system that is robust over time and can adapt to time-varying changes, feature reduction must be included. However, it is recommended to test the different methods for feature reduction, to find the method that fits the given data best, as the results were highly dependent on the signal and the classifier.
References

[1] AlZoubi, O., Fossati, D., D'Mello, S., Calvo, R.A. Affect detection from non-stationary physiological data using ensemble classifiers. Evolving Systems (2014), 1-14.

[2] Boostani, R., Moradi, M.H. Evaluation of the forearm EMG signal features for the control of a prosthetic hand. Physiological Measurement 24, 2 (2003), 309.

[3] Cao, L.J., Chua, K.S., Chong, W.K., Lee, H.P., Gu, Q.M. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55, 1 (2003), 321-336.

[4] Chan, A.D., Green, G.C. Myoelectric control development toolbox. In Proceedings of the 30th Conference of the Canadian Medical & Biological Engineering Society (2007), vol. 1, pp. M0100.

[5] Chu, J.U., Moon, I., Lee, Y.J., Kim, S.K., Mun, M.S. A supervised feature-projection-based real-time EMG pattern recognition for multifunction myoelectric hand control. IEEE/ASME Transactions on Mechatronics 12, 3 (2007), 282-290.

[6] Giri, D., Acharya, U.R., Martis, R.J., Sree, S.V., Lim, T.C., Ahamed, T., Suri, J.S. Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform. Knowledge-Based Systems 37 (2013), 274-282.

[7] Globerson, A., Roweis, S.T. Metric learning by collapsing classes. In Advances in Neural Information Processing Systems (2005), pp. 451-458.

[8] Huang, K., Velliste, M., Murphy, R.F. Feature reduction for improved recognition of subcellular location patterns in fluorescence microscope images. In Biomedical Optics 2003 (2003), International Society for Optics and Photonics, pp. 307-318.

[9] Jin, Y., Sendhoff, B. Trade-off between performance and robustness: an evolutionary multiobjective approach. In Evolutionary Multi-Criterion Optimization (2003), Springer, pp. 237-251.

[10] Jochumsen, M., Niazi, I.K., Mrachacz-Kersting, N., Farina, D., Dremstrup, K. Detection and classification of movement-related cortical potentials associated with task force and speed. Journal of Neural Engineering 10, 5 (2013), 056015.

[11] Kamavuako, E.N., Scheme, E.J., Englehart, K.B. Nonparametric discriminant projections for improved myoelectric classification. (2014), pp. 68-69.

[12] Kaufmann, P., Englehart, K., Platzner, M. Fluctuating EMG signals: Investigating long-term effects of pattern matching algorithms. In Proceedings of the 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2010), pp. 6357-6360.

[13] Kuo, B.C., Landgrebe, D.A. Nonparametric weighted feature extraction for classification. IEEE Transactions on Geoscience and Remote Sensing 42, 5 (2004), 1096-1105.

[14] L., X., B., R., G., P., T., D.E. Ultra-low-power biomedical circuit design and optimization: Catching the don't cares. In Integrated Circuits (ISIC), 2014 14th International Symposium on (2014), IEEE, pp. 115-118.

[15] Li, Z., Lin, D., Tang, X. Nonparametric discriminant analysis for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 4 (2009), 755-761.

[16] Lin, C.T., Lin, K.L., Ko, L.W., Liang, S.F., Kuo, B.C., Chung, I.F., et al. Nonparametric single-trial EEG feature extraction and classification of driver's cognitive responses. EURASIP Journal on Advances in Signal Processing 2008, 1 (2008), 849040.

[17] Manit, J., Youngkong, P. Neighborhood components analysis in sEMG signal dimensionality reduction for gait phase pattern recognition. In Broadband and Biomedical Communications (IB2Com), 2011 6th International Conference on (2011), IEEE, pp. 86-90.

[18] Martis, R.J., Acharya, U.R., Min, L.C. ECG beat classification using PCA, LDA, ICA and discrete wavelet transform. Biomedical Signal Processing and Control 8, 5 (2013), 437-448.

[19] Mwangi, B., Tian, T.S., Soares, J.C. A review of feature reduction techniques in neuroimaging. Neuroinformatics 12, 2 (2014), 229-244.

[20] Nedergaard, R.W. Online detection and classification of movement-related cortical potentials on healthy volunteers and patients with a stroke affecting their motor cortex. Master's thesis, Aalborg University, 2014. http://projekter.aau.dk/projekter/files/198493376/14gr1072bed_mmelse2014.pdf

[21] Phinyomark, A., Limsakul, C., Phukpattaranont, P. A novel feature extraction for robust EMG pattern recognition. arXiv preprint arXiv:0912.3973 (2009).

[22] Phinyomark, A., Phukpattaranont, P., Limsakul, C. Feature reduction and selection for EMG signal classification. Expert Systems with Applications 39, 8 (2012), 7420-7431.

[23] Phinyomark, A., Quaine, F., Charbonnier, S., Serviere, C., Tarpin-Bernard, F., Laurillau, Y. EMG feature evaluation for improving myoelectric pattern recognition robustness. Expert Systems with Applications 40, 12 (2013), 4832-4840.

[24] Subasi, A., Gursoy, M.I. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications 37, 12 (2010), 8659-8666.

[25] Tai, F., Pan, W. Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data. Bioinformatics 23, 23 (2007), 3170-3177.

[26] Wang, W., Collinger, J.L., Degenhart, A.D., Tyler-Kabara, E.C., Schwartz, A.B., Moran, D.W., Weber, D.J., Wodlinger, B., Vinjamuri, R.K., Ashmore, R.C., et al. An electrocorticographic brain interface in an individual with tetraplegia. PLoS ONE 8, 2 (2013), e55344.

[27] Wheeler, K.R. Device control using gestures sensed from EMG. In Soft Computing in Industrial Applications, 2003. SMCia/03. Proceedings of the 2003 IEEE International Workshop on (2003), IEEE, pp. 21-26.

[28] Yang, P., et al. A novel feature reduction method for real-time EMG pattern recognition system. In Control and Decision Conference (CCDC), 2013 25th Chinese (2013), IEEE, pp. 1500-1505.

[29] Zhao, J., Yu, P.L., Kwok, J.T. Bilinear probabilistic principal component analysis. IEEE Transactions on Neural Networks and Learning Systems 23, 3 (2012), 492-503.
Worksheets
Contents

Chapter 1  List of abbreviations
Chapter 2  Literature Review
  2.1 Methods for the literature review
  2.2 PCA
  2.3 FDA
  2.4 KPCA
  2.5 NDA
  2.6 ICA
  2.7 NWFE
  2.8 NCA
  2.9 MCML
Chapter 3  Mathematical approach for the feature reduction methods
  3.1 PCA
  3.2 FDA
  3.3 KPCA
  3.4 NDA
  3.5 ICA
  3.6 NWFE
  3.7 NCA
  3.8 MCML
Chapter 4  Feature extraction - EMG
  4.1 Mean Absolute Value
  4.2 Zero Crossing
  4.3 Wilson Amplitude
  4.4 Slope Sign Changes
  4.5 Variance Of EMG
  4.6 Wave Length
  4.7 Root Mean Square
  4.8 Mean Frequency
  4.9 Median Frequency
  4.10 Mean Power
  4.11 Autoregressive coefficients
  4.12 Sample Entropy
  4.13 Approximate Entropy
Bibliography
1 List of abbreviations

Table 1.1: List of abbreviations, feature reduction methods

PCA - Principal component analysis
FDA - Fisher discriminant analysis
KPCA - Kernel principal component analysis
NDA - Nonparametric discriminant analysis
ICA - Independent component analysis
NWFE - Nonparametric weighted feature extraction
NCA - Neighbourhood components analysis
MCML - Maximally collapsing metric learning
NLDA - Nonlinear discriminant analysis
SOFM - Self-organizing feature maps
NFA - Nonparametric feature analysis
NLPCA - Nonlinear principal component analysis
LPP - Locality preserving projection
LLE - Locally linear embedding
CCA - Canonical correlation analysis
DBFE - Decision boundary feature extraction
BDFS - Bhattacharyya distance feature selection
CMM - Correlative matrix mapping
NMCML - Non-convex MCML
KLDA - Kernel LDA
BM - Bayesian method
Table 1.2: List of abbreviations, classifiers

SVM - Support vector machine
kNN - k-nearest neighbours
NBC - Naive Bayes classifier
MLP - Multilayer perceptron
GMM - Gaussian mixture model
NN - Neural network
PNN - Probabilistic neural network
LMkNN - Local mean k-nearest neighbours
RF - Random forest
DT - Decision trees
MCS - Multiple classifier systems
2 Literature Review

The purpose of this chapter is to give an overview of the current research dealing with feature reduction. The feature reduction methods selected in the article are based on this literature review.
2.1 Methods for the literature review
The following keywords were used during the literature search:
- Feature reduction
- Dimension reduction
- Feature extraction
- Feature projection
Furthermore, chain search was used, where the references in the already found literature were investigated. Only studies that tested two or more methods were included in this review. The literature will be presented in tables containing:

- The reference
- The data used in the article
- The feature reduction methods used in the article
- The applied classifiers in the article
- A short conclusion of the article

Please note that some of the articles recur in the tables; e.g. an article dealing with both PCA and FDA will occur in both Table 2.1 and Table 2.2.
2.2 PCA
Principal Component Analysis (PCA) is one of the most popular unsupervised linear methods for dimensionality reduction [Martis et al., 2013]. PCA seeks to maximise the variance in the data by mapping the data into a linear subspace spanned by the principal components. PCA is often used as a benchmark in the literature, which is why there are many studies dealing with it. Only literature dealing with biological signals will be presented in this section. Literature dealing with PCA can be seen in Table 2.1. It is seen that PCA, despite being a popular method, is outperformed by most other methods.
Table 2.1: Literature dealing with PCA

Article | Data | Methods | Classifier | Conclusion
[Subasi & Gursoy, 2010] | EEG | PCA, FDA, ICA | SVM | PCA was outperformed by FDA and ICA.
[Lin et al., 2008] | EEG | PCA, FDA, NWFE | kNN, NBC | PCA was outperformed by NWFE, but showed better performance than FDA.
[Yang et al., 2013] | EMG | PCA, NWFE | SVM | PCA was outperformed by NWFE.
[Chu et al., 2007] | EMG | PCA, FDA, NLDA, SOFM | MLP | PCA showed lower performance than NLDA and FDA, but better than SOFM.
[Giri et al., 2013] | ECG | PCA, FDA, ICA | SVM, GMM, PNN, kNN | PCA showed the highest average performance across four classifiers, but ICA with GMM as classifier showed the highest performance.
[Martis et al., 2013] | ECG | PCA, LDA, ICA | SVM, NN, PNN | PCA with PNN as classifier showed a higher performance than FDA, but lower performance than ICA.
2.3 FDA
FDA is another popular method for feature reduction. FDA is a supervised method that seeks to maximise the between-class scatter while minimising the within-class scatter [Giri et al., 2013]. Only literature dealing with biological signals will be presented for FDA, see Table 2.2. It is seen that FDA is outperformed by most other methods.
Table 2.2: Literature dealing with FDA

Article | Data | Methods | Classifier | Conclusion
[Subasi & Gursoy, 2010] | EEG | FDA, PCA, ICA | SVM | FDA showed a better performance than PCA, but was outperformed by ICA.
[Lin et al., 2008] | EEG | PCA, FDA, NWFE | kNN, NBC | FDA was outperformed by both NWFE and PCA.
[Chu et al., 2007] | EMG | FDA, PCA, NLDA, SOFM | MLP | FDA showed a higher performance than PCA and SOFM, but was outperformed by NLDA.
[Kamavuako et al., 2014] | EMG | FDA, NDA, NFA | kNN, LMkNN | FDA was outperformed by the other methods.
[Giri et al., 2013] | ECG | FDA, PCA, ICA | SVM, GMM, PNN, kNN | FDA was outperformed by PCA and ICA.
[Martis et al., 2013] | ECG | FDA, PCA, ICA | SVM, NN, PNN | FDA was outperformed by PCA and ICA.
2.4 KPCA
Kernel principal component analysis (KPCA) is a variant of PCA that uses a nonlinear kernel function rather than the original linear one [K. Huang et al., 2003]. Literature dealing with KPCA can be seen in Table 2.3. In general, KPCA shows good performance in the literature, but no literature was found that investigates KPCA on biological signals.
Table 2.3: Literature dealing with KPCA

Article | Data | Methods | Classifier | Conclusion
[Castaings et al., 2010] | Image | KPCA, PCA, NWFE, BDFS, DBFE | SVM, RF | The study tested two different datasets; KPCA was in general outperformed by NWFE and BDFS, but showed better results than PCA and DBFE.
[K. Huang et al., 2003] | Image | KPCA, PCA, ICA, NLPCA | SVM | KPCA outperformed the other tested feature reduction methods. Four feature selection methods were also tested, and they all showed higher performance than KPCA.
[W. Huang & Yin, 2012] | Image | KPCA, PCA, LPP, LLE, ISOMAP, CCA | kNN, soft kNN, LDA, SVM | On average, LPP and LLE outperformed the other methods, and KPCA showed similar results to the remaining methods.
[Cao et al., 2003] | Seven various datasets | KPCA, PCA, ICA | SVM | KPCA showed the highest performance for all tested datasets.
2.5 NDA
NDA is a nonparametric method that is similar to FDA, as it also relies on the scatter matrices. Literature dealing with NDA can be seen in Table 2.4. NDA shows good performance when compared to traditional methods, but is outperformed by NFA.
Table 2.4: Literature dealing with NDA

Article | Data | Methods | Classifier | Conclusion
[Kamavuako et al., 2014] | EMG | NDA, FDA, NFA | kNN, LMkNN | NDA showed better performance than FDA, but was outperformed by NFA.
[Li et al., 2009] | Image | NDA, PCA, FDA, BM, KLDA | MCS | NDA outperformed the other methods for both of the tested datasets.
2.6 ICA
ICA is a blind source separation technique that separates a dataset into independent, non-Gaussian subcomponents [Cao et al., 2003; Mwangi et al., 2014]. Literature dealing with ICA can be seen in Table 2.5. ICA outperforms PCA and FDA in the literature dealing with EEG and ECG, but was outperformed by KPCA in the literature dealing with images and other various datasets.
Table 2.5: Literature dealing with ICA

Article | Data | Methods | Classifier | Conclusion
[Subasi & Gursoy, 2010] | EEG | ICA, PCA, FDA | SVM | ICA showed higher performance than PCA and FDA.
[Martis et al., 2013] | ECG | ICA, PCA, FDA | SVM, NN, PNN | ICA with PNN as classifier showed higher performance than any of the other combinations of classifier and feature reduction method.
[Giri et al., 2013] | ECG | ICA, PCA, FDA | SVM, GMM, PNN, kNN | ICA with GMM as classifier showed higher performance than any of the other combinations of classifier and feature reduction method.
[K. Huang et al., 2003] | Image | ICA, PCA, KPCA, NLPCA | SVM | ICA showed higher performance than NLPCA, but was outperformed by KPCA and PCA.
[Cao et al., 2003] | Seven various datasets | ICA, PCA, KPCA | SVM | ICA showed higher performance than PCA, but was outperformed by KPCA.
2.7 NWFE
NWFE is a newer nonparametric feature reduction method. Literature dealing with NWFE can be seen in Table 2.6. NWFE outperforms many of the other methods.
Table 2.6: Literature dealing with NWFE

Article | Data | Methods | Classifier | Conclusion
[Yang et al., 2013] | EMG | NWFE, PCA, KPCA | SVM | NWFE outperformed PCA.
[Lin et al., 2008] | EEG | NWFE, PCA, FDA | kNN, NBC | NWFE outperformed all the other methods for both of the tested classifiers.
[Castaings et al., 2010] | Image | NWFE, PCA, KPCA, BDFS, DBFE | SVM, RF | On average, NWFE outperformed the other methods for the two tested datasets.
2.8 NCA
NCA is a supervised method that seeks to find a Mahalanobis distance metric for kNN that optimises the leave-one-out error on the training set [Manit & Youngkong, 2011]. Literature dealing with NCA can be seen in Table 2.7. NCA shows high performance and outperforms most methods, except NMCML and MCML in [Globerson & Roweis, 2005].
Table 2.7: Literature dealing with NCA

Article | Data | Methods | Classifier | Conclusion
[Manit & Youngkong, 2011] | EMG | NCA, PCA, FDA, LPP | SVM | NCA outperformed the other methods.
[Soto et al., 2011] | Image | NCA, FDA, MCML, CMM, CCA | kNN, DT, SVM | NCA with kNN classification was the combination that showed the highest performance.
[Goldberger et al., 2004] | Six various datasets | NCA, PCA, FDA | kNN | NCA outperformed the other methods for all datasets.
[Globerson & Roweis, 2005] | Six various datasets | NCA, NMCML, MCML | kNN | NCA was outperformed by NMCML and MCML on average.
2.9 MCML
MCML is a supervised method which is similar to NCA, as it also relies on a Mahalanobis distance metric for k-nearest neighbours [Globerson & Roweis, 2005]. Literature dealing with MCML can be seen in Table 2.8. MCML shows mixed results: it is outperformed by NCA in one study [Soto et al., 2011], but is better than NCA in another [Globerson & Roweis, 2005].
Table 2.8: Literature dealing with MCML

Article | Data | Methods | Classifier | Conclusion
[Soto et al., 2011] | Image | MCML, FDA, NCA, CMM, CCA | kNN, DT, SVM | In general, MCML was outperformed by NCA and CMM.
[Globerson & Roweis, 2005] | Six various datasets | MCML, NMCML, NCA | kNN | MCML and NMCML showed similar results, but both methods showed better results than NCA on average.
3 Mathematical approach for the feature reduction methods

This chapter gives an overview of the mathematical approaches used in this study. All methods were implemented in Matlab 2015a.
3.1 PCA
The step-by-step procedure for PCA is as follows [Giri et al., 2013]:

1. Center the feature dataset by subtracting the mean of the dataset:

   \tilde{x} = x - \frac{1}{N}\sum_{i=1}^{N} x_i   (3.1)

2. Calculate the covariance matrix \Sigma of the centered dataset, where m denotes the mean vector and N the number of observations:

   \Sigma = \frac{1}{N}\{(x - m)(x - m)^T\}   (3.2)

3. Calculate the eigenvectors V and the eigenvalues D of the covariance matrix:

   \Sigma V = V D   (3.3)

4. Sort the eigenvectors according to decreasing eigenvalues.
5. Choose the number of desired principal components.
6. Project the training data by multiplying the centered training data and the eigenvectors.
7. Project the test data by multiplying the centered test data and the eigenvectors.
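These steps can be sketched in Python with NumPy (the thesis implementation was in Matlab; the function names, toy data and `n_components` value below are illustrative only):

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on training data X (samples x features).
    Returns the mean and the projection matrix (top eigenvectors)."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # step 1: center
    cov = (Xc.T @ Xc) / X.shape[0]           # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1]        # step 4: decreasing eigenvalues
    W = eigvecs[:, order[:n_components]]     # step 5: keep desired components
    return mean, W

def pca_transform(X, mean, W):
    """Steps 6-7: project centered data onto the eigenvectors."""
    return (X - mean) @ W

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
mean, W = pca_fit(X_train, n_components=2)
Z = pca_transform(X_train, mean, W)
print(Z.shape)  # (40, 2)
```

The first projected component then carries at least as much variance as the second, mirroring the sorting in step 4.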
3.2 FDA
The step-by-step procedure for FDA is as follows [Giri et al., 2013; Kamavuako et al., 2014]:

1. Calculate the within-class scatter matrix:

   S_w = \sum_{i=1}^{L} \sum_{x_j \in C_i} (x_j - m_i)(x_j - m_i)^T   (3.4)

2. Calculate the between-class scatter matrix:

   S_b = \sum_{i=1}^{L} n_i (m_i - m)(m_i - m)^T   (3.5)

   where L denotes the number of classes, C_i the set of samples in class i, m_i the mean of class i, n_i the number of samples in class i, and m the overall mean.

3. Calculate the eigenvectors and eigenvalues of (S_w)^{-1} S_b.
4. Sort the eigenvectors according to decreasing eigenvalues.
5. Project the training data by multiplying the training data and the eigenvectors.
6. Project the test data by multiplying the test data and the eigenvectors.
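The FDA steps can likewise be sketched in Python/NumPy; the small ridge added to S_w for numerical invertibility is an implementation choice, not part of the thesis:

```python
import numpy as np

def fda_fit(X, y, n_components):
    """Fisher discriminant analysis: eigenvectors of Sw^-1 Sb."""
    classes = np.unique(y)
    m = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter, eq. (3.4)
        diff = (mc - m)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter, eq. (3.5)
    # small ridge keeps Sw invertible (implementation choice)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw + 1e-8 * np.eye(d)) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
W = fda_fit(X, y, n_components=1)
Z = X @ W   # projected training data
```

On the separable toy data, the projection pushes the two class means well apart relative to their spreads.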
3.3 KPCA
The step-by-step procedure for KPCA is as follows [K. Huang et al., 2003; Kuzmin & Warmuth, 2007; Kwok & Tsang, 2004]:

1. Construct the kernel matrix, where x denotes the dataset. The value of \sigma was chosen to be 30 in this study:

   K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)   (3.6)

2. Center the kernel matrix, where 1_N denotes an N \times N matrix in which each element is 1/N:

   K' = K - 1_N K - K 1_N + 1_N K 1_N   (3.7)

3. Calculate the eigenvectors V and the eigenvalues D of the centered kernel matrix K':

   K' V = V D   (3.8)

4. Sort the eigenvectors according to decreasing eigenvalues.
5. Choose the number of desired principal components.
6. Project the training data by multiplying the centered kernel matrix with the eigenvectors.
7. Construct a centered kernel matrix of the test data and project the test data by multiplying it with the eigenvectors.
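A Python/NumPy sketch of the training-side KPCA steps; note that \sigma is a free parameter (30 in the thesis, 1.0 for the toy data below):

```python
import numpy as np

def kpca_fit_transform(X, n_components, sigma=30.0):
    """Kernel PCA with an RBF kernel: steps 1-6 for the training data."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))                    # step 1: kernel matrix
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n    # step 2: centering
    eigvals, eigvecs = np.linalg.eigh(Kc)                 # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]      # steps 4-5
    return Kc @ eigvecs[:, order]                         # step 6: project

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 3))
Z = kpca_fit_transform(X, n_components=2, sigma=1.0)
print(Z.shape)  # (25, 2)
```

Because the kernel matrix is double-centered in step 2, each projected component has zero mean over the training samples.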
3.4 NDA
The step-by-step procedure for NDA is as follows [Kamavuako et al., 2014]:

1. Calculate the within-class scatter matrix:

   S_w = \sum_{i=1}^{L} \sum_{x_l \in C_i} (x_l - m_i)(x_l - m_i)^T   (3.9)

2. Calculate the weighting function \omega(i, j, l):

   \omega(i, j, l) = \frac{\min\{d^{\alpha}(x_l^{(i)}, NN_k(x_l^{(i)}, i)),\; d^{\alpha}(x_l^{(i)}, NN_k(x_l^{(i)}, j))\}}{d^{\alpha}(x_l^{(i)}, NN_k(x_l^{(i)}, i)) + d^{\alpha}(x_l^{(i)}, NN_k(x_l^{(i)}, j))}   (3.10)

   where d denotes the Euclidean distance, NN_k(x, i) denotes the k-th nearest neighbour of x in class i, \alpha controls how fast the weight changes with the distance ratio, and x_l^{(i)} denotes feature vector l in class i.

3. Calculate the between-class scatter matrix:

   S_b = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \neq i}}^{c} \sum_{l=1}^{N_i} \omega(i, j, l)\, (x_l^{(i)} - m_j(x_l^{(i)}))(x_l^{(i)} - m_j(x_l^{(i)}))^T   (3.11)

   where m_j(x_l^{(i)}) denotes the local mean of the k nearest neighbours of x_l^{(i)} in class j.

4. Calculate the eigenvectors and eigenvalues of (S_w)^{-1} S_b.
5. Sort the eigenvectors according to decreasing eigenvalues.
6. Project the training data by multiplying the training data and the eigenvectors.
7. Project the test data by multiplying the test data and the eigenvectors.
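A Python/NumPy sketch of these NDA steps under the reconstruction above; the helper names and parameter defaults (k = 3, alpha = 1) are illustrative:

```python
import numpy as np

def kth_nn_dist(x, Xc, k):
    """Distance from x to its k-th nearest neighbour in Xc (index k skips
    x itself when x belongs to Xc, since its self-distance is 0)."""
    d = np.sort(np.linalg.norm(Xc - x, axis=1))
    return d[min(k, len(d) - 1)]

def local_mean(x, Xc, k):
    """Mean of the k nearest neighbours of x within Xc."""
    d = np.linalg.norm(Xc - x, axis=1)
    return Xc[np.argsort(d)[:k]].mean(axis=0)

def nda_fit(X, y, n_components, k=3, alpha=1.0):
    classes = np.unique(y)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for ci in classes:
        Xi = X[y == ci]
        mi = Xi.mean(axis=0)
        Sw += (Xi - mi).T @ (Xi - mi)                   # eq. (3.9)
        for cj in classes:
            if cj == ci:
                continue
            Xj = X[y == cj]
            for x in Xi:
                di = kth_nn_dist(x, Xi, k) ** alpha
                dj = kth_nn_dist(x, Xj, k) ** alpha
                w = min(di, dj) / (di + dj + 1e-12)     # eq. (3.10)
                diff = (x - local_mean(x, Xj, k))[:, None]
                Sb += w * (diff @ diff.T)               # eq. (3.11)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw + 1e-8 * np.eye(d)) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
W = nda_fit(X, y, n_components=1)
Z = X @ W
```

Points near the class boundary receive weights close to 0.5, so they dominate S_b, which is the nonparametric difference from FDA.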
3.5 ICA
ICA is a blind source separation technique that separates a dataset into independent, non-Gaussian subcomponents [Cao et al., 2003; Mwangi et al., 2014]. ICA assumes that the dataset x is a linear mixture of the source signals s, and seeks to recover these signals:

   x = A s   (3.12)

The step-by-step procedure for ICA is as follows [Cao et al., 2003; Martis et al., 2013]:

1. Center the feature dataset by subtracting the mean of the dataset:

   \tilde{x} = x - \frac{1}{N}\sum_{i=1}^{N} x_i   (3.13)

2. Whiten the dataset, so that its covariance matrix becomes the identity matrix:

   \tilde{x} = V D^{-1/2} V^T x   (3.14)

   where V and D are obtained from the eigendecomposition of the covariance matrix:

   \Sigma = V D V^T   (3.15)

3. Select the independence criterion. FastICA was implemented in this study:

   a) Set a random initial weight vector W.
   b) Calculate W^+:

      W^+ = E\{x\, g(W^T x)\} - E\{g'(W^T x)\}\, W   (3.16)

      where the non-quadratic function for this study was chosen to be g(u) = u^3, and E denotes the expected value.
   c) Normalise W^+:

      W^+ = W^+ / \|W^+\|   (3.17)

   d) Repeat until W^+ has converged.
4. When W has converged, its inverse A is calculated.
5. Project the training data by multiplying the whitened training data and the output from the independence criterion.
6. Project the test data by multiplying the whitened test data and the output from the independence criterion.

ICA was implemented by using fastICA.m developed by Hugo Gävert.
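The whitening step and a single FastICA unit with g(u) = u^3 can be sketched in Python/NumPy (a one-component sketch, not the full fastICA.m; the toy sources and iteration limits are illustrative):

```python
import numpy as np

def whiten(X):
    """Steps 1-2: center, then whiten so the covariance becomes the identity."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    D, V = np.linalg.eigh(cov)
    return Xc @ V @ np.diag(D ** -0.5) @ V.T

def fastica_one_unit(Z, n_iter=200, seed=0):
    """Step 3: one FastICA weight vector, g(u) = u^3, g'(u) = 3u^2."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        u = Z @ w
        w_new = (Z * (u ** 3)[:, None]).mean(axis=0) - 3 * (u ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)                   # eq. (3.17)
        converged = np.abs(np.abs(w_new @ w) - 1) < 1e-10
        w = w_new
        if converged:                                    # step 3d
            break
    return w

rng = np.random.default_rng(3)
S = np.column_stack([np.sign(rng.normal(size=2000)), rng.uniform(-1, 1, 2000)])
X = S @ np.array([[1.0, 0.5], [0.3, 1.0]])   # mixed non-Gaussian sources
Z = whiten(X)
w = fastica_one_unit(Z)
```

After whitening, each remaining FastICA unit only has to find a rotation, which is why the weight vector is kept at unit norm.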
3.6 NWFE
The step-by-step procedure for NWFE is as follows [Kuo & Landgrebe, 2004]:

1. Calculate the distance-based weight matrix:

   w_{lk}^{(i,j)} = \frac{\mathrm{dist}(x_l^{(i)}, x_k^{(j)})^{-1}}{\sum_{t=1}^{N_j} \mathrm{dist}(x_l^{(i)}, x_t^{(j)})^{-1}}   (3.18)

2. Calculate the weighted means M_j(x_l^{(i)}) by using the weight matrix w_{lk}^{(i,j)}:

   M_j(x_l^{(i)}) = \sum_{k=1}^{N_j} w_{lk}^{(i,j)} x_k^{(j)}   (3.19)

3. Calculate the weights of the scatter matrices:

   \lambda_l^{(i,j)} = \frac{\mathrm{dist}(x_l^{(i)}, M_j(x_l^{(i)}))^{-1}}{\sum_{t=1}^{N_i} \mathrm{dist}(x_t^{(i)}, M_j(x_t^{(i)}))^{-1}}   (3.20)

4. Calculate the nonparametric between-class scatter matrix, where P_i denotes the prior probability of class i:

   S_b = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \neq i}}^{L} \sum_{k=1}^{N_i} \frac{\lambda_k^{(i,j)}}{N_i} (x_k^{(i)} - M_j(x_k^{(i)}))(x_k^{(i)} - M_j(x_k^{(i)}))^T   (3.21)

5. Calculate the nonparametric within-class scatter matrix (the same expression with j = i) and regularise it:

   S_w = \sum_{i=1}^{L} P_i \sum_{k=1}^{N_i} \frac{\lambda_k^{(i,i)}}{N_i} (x_k^{(i)} - M_i(x_k^{(i)}))(x_k^{(i)} - M_i(x_k^{(i)}))^T   (3.22)

   S_w = 0.5\, S_w + 0.5\, \mathrm{diag}(S_w)   (3.23)

6. Calculate the eigenvectors and eigenvalues of (S_w)^{-1} S_b.
7. Sort the eigenvectors according to decreasing eigenvalues.
8. Project the training data by multiplying the training data and the eigenvectors.
9. Project the test data by multiplying the test data and the eigenvectors.
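A Python/NumPy sketch of the NWFE scatter matrices (eqs. 3.18-3.23); excluding each point's zero self-distance within its own class, and the small eps guarding the divisions, are implementation choices:

```python
import numpy as np

def nwfe_scatter(X, y):
    """Nonparametric weighted between/within-class scatter matrices."""
    classes = np.unique(y)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    eps = 1e-12
    for ci in classes:
        Xi = X[y == ci]
        Ni = len(Xi)
        Pi = Ni / len(X)                          # class prior
        for cj in classes:
            Xj = X[y == cj]
            dist = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=-1)
            if ci == cj:
                np.fill_diagonal(dist, np.inf)    # exclude point from its own mean
            w = 1.0 / (dist + eps)
            w /= w.sum(axis=1, keepdims=True)     # eq. (3.18)
            M = w @ Xj                            # eq. (3.19): weighted means
            dM = np.linalg.norm(Xi - M, axis=1) + eps
            lam = (1.0 / dM) / (1.0 / dM).sum()   # eq. (3.20)
            diff = Xi - M
            S = Pi * (lam[:, None] * diff).T @ diff / Ni   # eqs. (3.21)-(3.22)
            if ci == cj:
                Sw += S
            else:
                Sb += S
    Sw = 0.5 * Sw + 0.5 * np.diag(np.diag(Sw))    # eq. (3.23): regularisation
    return Sb, Sw

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(2, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
Sb, Sw = nwfe_scatter(X, y)
```

The projection then follows from the eigenvectors of S_w^{-1} S_b, exactly as in steps 6-9.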
3.7 NCA
The step-by-step procedure for NCA is as follows [Goldberger et al., 2004]:

1. Center the feature dataset by subtracting the mean of the dataset:

   \tilde{x} = x - \frac{1}{N}\sum_{i=1}^{N} x_i   (3.24)

2. Calculate the Mahalanobis metric of the samples \{x_1, x_2, ..., x_N\} with the belonging labels \{y_1, y_2, ..., y_N\}:

   d(x_i, x_j) = (A x_i - A x_j)^T (A x_i - A x_j)   (3.25)

3. NCA aims to find the A that maximises the nearest-neighbour classification. The optimisation criterion is implemented by use of a soft-neighbour approach, where p_{ij} must be calculated:

   p_{ij} = \frac{\exp(-\|A x_i - A x_j\|^2)}{\sum_{k \neq i} \exp(-\|A x_i - A x_k\|^2)}, \quad p_{ii} = 0   (3.26)

4. Calculate p_i, the probability that a point i will be classified correctly:

   p_i = \sum_{j \in C_i} p_{ij}   (3.27)

   C_i = \{ j \mid y_j = y_i \}   (3.28)

5. The optimisation criterion f(A) is calculated as the sum of all the probabilities of a correct classification:

   f(A) = \sum_i p_i   (3.29)

6. A is finally optimised by the gradient rule:

   \frac{\partial f}{\partial A} = 2A \sum_i \left( p_i \sum_k p_{ik}\, x_{ik} x_{ik}^T - \sum_{j \in C_i} p_{ij}\, x_{ij} x_{ij}^T \right)   (3.30)

   where x_{ij} = x_i - x_j.

7. Project the training data by multiplying the centered training data and A.
8. Project the test data by multiplying the centered test data and A.

NCA was implemented by using the Matlab Toolbox for Dimensionality Reduction developed by Laurens van der Maaten.
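A Python/NumPy sketch of the NCA objective and gradient (eqs. 3.26-3.30), followed by one tiny gradient-ascent step on toy data; the step size, initial A and data are illustrative:

```python
import numpy as np

def nca_objective_grad(A, X, y):
    """f(A) = sum_i p_i and its gradient, eq. (3.30)."""
    AX = X @ A.T
    D = ((AX[:, None, :] - AX[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)                  # enforces p_ii = 0
    P = np.exp(-D)
    P /= P.sum(axis=1, keepdims=True)            # eq. (3.26)
    same = (y[:, None] == y[None, :]).astype(float)
    p_i = (P * same).sum(axis=1)                 # eq. (3.27)
    n, d = X.shape
    grad = np.zeros((d, d))
    for i in range(n):
        diff = X[i] - X                          # all x_ij for this i
        outer = np.einsum('k,kd,ke->de', P[i], diff, diff)
        outer_same = np.einsum('k,kd,ke->de', P[i] * same[i], diff, diff)
        grad += p_i[i] * outer - outer_same
    return p_i.sum(), 2 * A @ grad

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (15, 3)), rng.normal(4, 1, (15, 3))])
y = np.array([0] * 15 + [1] * 15)
A = np.eye(2, 3) * 0.1                           # project 3D -> 2D
f0, g = nca_objective_grad(A, X, y)
A2 = A + 1e-4 * g                                # one gradient-ascent step
f1, _ = nca_objective_grad(A2, X, y)
```

A small step along the gradient increases f(A), i.e. the expected number of correctly classified points under the soft-neighbour rule.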
3.8 MCML
The step by step procedure for MCML is as follows [Globerson & Roweis,2005]:
1. Center the feature dataset by substracting the mean of the dataset,x.
x = x 1N
NX
o=1x
i
(3.31)
2. Calculate the mahalanobis metrix of the samples {x1, x2, .., xN } withthe belonging labels {y1, y2, .., yN }:
d(xi
, xj
|A) = d Ai j
= (xi
xj
)T A(xi
xj
) (3.32)
where A denotes a positive semidefinite (PSD) matrix.

3. Calculate the conditional probabilities p^A(j|i) and the ideal conditional distribution p_0(j|i):

\[ p^A(j \mid i) = \frac{\exp(-d^A_{ij})}{\sum_{k \neq i} \exp(-d^A_{ik})}, \qquad i \neq j \tag{3.33} \]

\[ p_0(j \mid i) \propto \begin{cases} 1, & y_i = y_j \\ 0, & y_i \neq y_j \end{cases} \tag{3.34} \]
4. Minimise, over A, the Kullback-Leibler (KL) divergence between p_0 and p^A:

\[ \min_A \sum_i KL\left[ p_0(j \mid i) \,\|\, p^A(j \mid i) \right] \tag{3.35} \]
5. Project the training data by multiplying the centered training data and A.
6. Project the test data by multiplying the centered test data and A.
MCML was implemented by using the Matlab Toolbox for Dimensionality Reduction developed by Laurens van der Maaten.
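Equations (3.32)-(3.35) can likewise be sketched in NumPy (illustrative; `mcml_objective` is a hypothetical helper name, and the actual minimisation over PSD matrices A, handled by the toolbox, is not shown):

```python
import numpy as np

def mcml_objective(A, X, y):
    """KL divergence between the ideal distribution p0, eq. (3.34)
    (normalised over same-class points), and p^A, eq. (3.33),
    summed over points i as in eq. (3.35). A must be PSD."""
    diff = X[:, None, :] - X[None, :, :]
    dA = np.einsum('ijk,kl,ijl->ij', diff, A, diff)   # d^A_ij, eq. (3.32)
    np.fill_diagonal(dA, np.inf)                      # exclude j == i
    expd = np.exp(-dA)
    pA = expd / expd.sum(axis=1, keepdims=True)       # eq. (3.33)
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    p0 = same / same.sum(axis=1, keepdims=True)       # eq. (3.34)
    mask = p0 > 0
    return float(np.sum(p0[mask] * np.log(p0[mask] / pA[mask])))

# well-separated toy classes: the identity metric already nearly
# collapses each class, so its KL objective beats a degenerate A = 0
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 3)), rng.normal(3, 0.1, (5, 3))])
y = np.array([0] * 5 + [1] * 5)
print(mcml_objective(np.eye(3), X, y), mcml_objective(np.zeros((3, 3)), X, y))
```

A metric that pulls same-class points together and pushes other classes away makes p^A resemble p_0, driving the KL objective toward zero.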
4 Feature extraction - EMG

This chapter describes the features extracted from the EMG. The features extracted from the EEG will not be described, due to their simplicity. Throughout this chapter x_i denotes the signal in segment i, and N denotes the length of x_i.
4.1 Mean Absolute Value
Mean Absolute Value (MAV) is a frequently used feature within EMG pattern recognition. It is calculated by taking the mean of the absolute amplitude of the signal [Phinyomark et al., 2012]:

\[ MAV = \frac{1}{N}\sum_{i=1}^{N} |x_i| \tag{4.1} \]
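A minimal NumPy sketch of eq. (4.1) (illustrative; the thesis analysis was done in Matlab):

```python
import numpy as np

def mav(x):
    """Mean Absolute Value, eq. (4.1): mean of the absolute amplitudes."""
    return np.mean(np.abs(x))

print(mav(np.array([1.0, -2.0, 3.0, -4.0])))  # 2.5
```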
4.2 Zero Crossing
Zero Crossing (ZC) contains information about the frequencies, but is defined in the time domain. It is defined as the number of times the value of the signal crosses zero, with a threshold imposed to reduce noise-induced counts. The threshold for this study is 10 mV. It is calculated as follows [Phinyomark et al., 2012]:

\[ ZC = \sum_{i=1}^{N-1} \left[ \operatorname{sgn}(x_i \times x_{i+1}) \,\cap\, |x_i - x_{i+1}| \geq \text{threshold} \right] \tag{4.2} \]
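A NumPy sketch of eq. (4.2) (illustrative; the sign-change test is written directly as a product test, and the 10 mV threshold is expressed in volts):

```python
import numpy as np

def zero_crossings(x, threshold=0.010):
    """Zero Crossing, eq. (4.2): count sign changes between consecutive
    samples whose amplitude step also reaches the threshold (10 mV)."""
    sign_change = x[:-1] * x[1:] < 0
    big_step = np.abs(x[:-1] - x[1:]) >= threshold
    return int(np.sum(sign_change & big_step))

x = np.array([0.5, -0.5, 0.5, 0.501, -0.5])
print(zero_crossings(x))  # 3: the 0.5 -> 0.501 step crosses nothing
```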
4.3 Wilson Amplitude
Wilson Amplitude (WAMP) also contains information about the frequencies, but is defined in the time domain. It reflects the contraction force and the firing of motor units. It is defined as the number of times the difference between two consecutive amplitudes exceeds a certain threshold. The threshold for this study is 10 mV. It is calculated as follows [Phinyomark et al., 2012]:

\[ WAMP = \sum_{i=1}^{N-1} f(|x_i - x_{i+1}|) \tag{4.3} \]

\[ f(x) = \begin{cases} 1, & x \geq \text{threshold} \\ 0, & \text{otherwise} \end{cases} \tag{4.4} \]
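A NumPy sketch of eqs. (4.3)-(4.4) (illustrative; the 10 mV threshold is expressed in volts):

```python
import numpy as np

def wamp(x, threshold=0.010):
    """Wilson Amplitude, eqs. (4.3)-(4.4): count of consecutive-sample
    amplitude differences that reach the threshold."""
    return int(np.sum(np.abs(np.diff(x)) >= threshold))

print(wamp(np.array([0.0, 0.02, 0.025, 0.2])))  # 2: diffs 0.02, 0.005, 0.175
```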
4.4 Slope Sign Changes
Slope Sign Changes (SSC) also contains information about the frequencies, but is defined in the time domain. It is defined as the number of times the slope of the signal changes sign, subject to a threshold. The threshold for this study is 10 mV. It is calculated as follows [Phinyomark et al., 2012]:

\[ SSC = \sum_{i=2}^{N-1} f\left[ (x_i - x_{i-1}) \times (x_i - x_{i+1}) \right] \tag{4.5} \]

\[ f(x) = \begin{cases} 1, & x \geq \text{threshold} \\ 0, & \text{otherwise} \end{cases} \tag{4.6} \]
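A NumPy sketch of eqs. (4.5)-(4.6) (illustrative; the product of the two slope differences is positive at a slope reversal, and the threshold gate suppresses small fluctuations):

```python
import numpy as np

def ssc(x, threshold=0.010):
    """Slope Sign Changes, eqs. (4.5)-(4.6): interior samples where the
    slope reverses strongly enough to reach the threshold."""
    prod = (x[1:-1] - x[:-2]) * (x[1:-1] - x[2:])
    return int(np.sum(prod >= threshold))

print(ssc(np.array([0.0, 0.3, 0.0, 0.3, 0.0])))  # 3 clear reversals
```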
4.5 Variance Of EMG
Variance Of EMG (VAR) is defined as [Phinyomark et al., 2012]:
\[ VAR = \frac{1}{N-1}\sum_{i=1}^{N} x_i^2 \tag{4.7} \]
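A NumPy sketch of eq. (4.7) (illustrative; note that, following the formula, the EMG is treated as zero-mean and no mean subtraction is applied):

```python
import numpy as np

def var_emg(x):
    """Variance of EMG, eq. (4.7): sum of squares divided by N - 1."""
    return np.sum(x ** 2) / (len(x) - 1)

print(var_emg(np.array([1.0, -1.0, 2.0])))  # (1 + 1 + 4) / 2 = 3.0
```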
4.6 Wave Length
Wave Length (WL) is the cumulative length of the signal, and is calculated as follows [Phinyomark et al., 2012]:

\[ WL = \sum_{i=1}^{N-1} |x_{i+1} - x_i| \tag{4.8} \]
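A NumPy sketch of eq. (4.8) (illustrative):

```python
import numpy as np

def waveform_length(x):
    """Wave Length, eq. (4.8): cumulative length of the waveform."""
    return np.sum(np.abs(np.diff(x)))

print(waveform_length(np.array([0.0, 1.0, -1.0, 0.5])))  # 1 + 2 + 1.5 = 4.5
```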
4.7 Root Mean Square
Root Mean Square (RMS) is another frequently used feature within EMG pattern recognition, and is calculated as follows [Phinyomark et al., 2012]:

\[ RMS = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} \tag{4.9} \]
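A NumPy sketch of eq. (4.9) (illustrative):

```python
import numpy as np

def rms(x):
    """Root Mean Square, eq. (4.9)."""
    return np.sqrt(np.mean(x ** 2))

print(rms(np.array([3.0, -4.0])))  # sqrt(12.5), about 3.536
```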
4.8 Mean Frequency
Mean Frequency (MNF) is a commonly used frequency domain feature. The mean frequency is defined by [Phinyomark et al., 2012]:

\[ MNF = \sum_{j=1}^{M} f_j P_j \bigg/ \sum_{j=1}^{M} P_j \tag{4.10} \]

where f_j denotes the frequency of frequency bin j, P_j denotes the power in frequency bin j, and M is the total number of bins.
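A NumPy sketch of eq. (4.10) (illustrative; the power spectrum is taken here as the squared rFFT magnitude, one common choice):

```python
import numpy as np

def mean_frequency(signal, fs):
    """Mean Frequency, eq. (4.10): power-weighted mean of the
    spectrum's frequency bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.sum(freqs * power) / np.sum(power)

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)   # pure 50 Hz tone
print(mean_frequency(x, fs))     # close to 50 Hz
```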
4.9 Median Frequency
Median Frequency (MDF) is another popular feature from the frequency domain. It is the frequency that divides the power spectrum into two regions of equal power [Phinyomark et al., 2012]:

\[ \sum_{j=1}^{MDF} P_j = \sum_{j=MDF}^{M} P_j = \frac{1}{2}\sum_{j=1}^{M} P_j \tag{4.11} \]
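A NumPy sketch of eq. (4.11) (illustrative; the median bin is found from the cumulative power, and the spectrum is again the squared rFFT magnitude):

```python
import numpy as np

def median_frequency(signal, fs):
    """Median Frequency, eq. (4.11): the frequency at which the
    cumulative spectral power reaches half the total power."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    cumulative = np.cumsum(power)
    return freqs[np.searchsorted(cumulative, cumulative[-1] / 2)]

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 150 * t)
print(median_frequency(x, fs))  # falls between the two equal-power tones
```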
4.10 Mean Power
Mean Power (MNP) of the power spectrum is defined as [Phinyomark et al., 2012]:

\[ MNP = \sum_{j=1}^{M} P_j \bigg/ M \tag{4.12} \]
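A NumPy sketch of eq. (4.12) (illustrative; the power spectrum is again the squared rFFT magnitude):

```python
import numpy as np

def mean_power(signal):
    """Mean Power, eq. (4.12): average of the spectral power bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    return np.mean(power)

print(mean_power(np.ones(4)))  # spectrum [16, 0, 0] -> 16/3
```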
4.11 Autoregressive coefficients

The autoregressive (AR) model is defined as follows [Phinyomark et al., 2012]:

\[ x_i = \sum_{p=1}^{P} a_p x_{i-p} + w_i \tag{4.13} \]

where P denotes the order of the model, which was chosen to be 6 in this study, a_p denotes the AR coefficients, and w_i denotes the white noise error.
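A NumPy sketch of fitting eq. (4.13) (illustrative; the thesis does not state which estimator was used, so a plain least-squares fit is shown as one common choice):

```python
import numpy as np

def ar_coefficients(x, order=6):
    """Least-squares estimate of the AR coefficients a_p in eq. (4.13):
    predict x_i from the `order` preceding samples."""
    # each row of the design matrix holds [x_{i-1}, ..., x_{i-order}]
    X = np.column_stack([x[order - p:len(x) - p] for p in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

# simulate an AR(2) process and recover its coefficients
rng = np.random.default_rng(1)
x = np.zeros(5000)
for i in range(2, len(x)):
    x[i] = 0.5 * x[i - 1] - 0.3 * x[i - 2] + rng.normal(0, 0.1)
print(ar_coefficients(x, order=2))  # close to [0.5, -0.3]
```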
4.12 Sample Entropy
Sample Entropy (SampEn) can be found as follows [Kumar & Dewal, 2011]:

1. Form vectors of length m from the original EMG data (x_1, x_2, ..., x_N):

\[ X_m(i) = [x(i), x(i+1), ..., x(i+m-1)], \quad 1 \leq i \leq N-m+1 \tag{4.14} \]

where m is defined as 2 in this study.

2. Calculate the distance between X_m(i) and X_m(j) as follows:

\[ d[X_m(i), X_m(j)] = \max_{k=0,...,m-1} |x(i+k) - x(j+k)| \tag{4.15} \]

3. Calculate the sample entropy:

\[ SampEn = \lim_{N \to \infty} \left\{ -\ln\left[ \frac{A_r^m}{B_r^m} \right] \right\} \tag{4.16} \]

where A_r^m denotes the number of vector pairs of length m+1 having a distance < r, and B_r^m denotes the number of vector pairs of length m having a distance < r. r is set to r = 0.2 in this study.
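The steps above can be sketched in NumPy (illustrative; in practice r is often scaled by the signal's standard deviation, as done in the toy comparison below):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample Entropy, eqs. (4.14)-(4.16): -ln(A/B), where B counts
    template pairs of length m within tolerance r and A counts the
    corresponding pairs of length m+1. Self-matches are excluded."""
    def count_pairs(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d < r))
        return count
    B = count_pairs(m)
    A = count_pairs(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 400))
noisy = rng.normal(0, 1.0, 400)
# a regular signal has lower sample entropy than white noise
print(sample_entropy(regular, r=0.2 * regular.std()),
      sample_entropy(noisy, r=0.2 * noisy.std()))
```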
4.13 Approximate Entropy
Approximate Entropy (ApEn) can be found as follows [Kumar & Dewal, 2011]:

1. Form vectors of subsequences of X = [x(1), x(2), ..., x(N)]:

\[ x(i) = [x(i), x(i+1), x(i+2), ..., x(i+m-1)], \quad 1 \leq i \leq N-m+1 \tag{4.17} \]

where m is defined as 2 in this study.

2. Calculate the distance between x(i) and x(j) as follows:

\[ d[x(i), x(j)] = \max_{k=0,...,m-1} |x(i+k) - x(j+k)| \tag{4.18} \]

3. Find M^m(i), the number of vectors x(j) whose distance to x(i) is within r. r is set to r = 0.2 in this study. Calculate:

\[ C_r^m(i) = \frac{M^m(i)}{N-m+1}, \quad \text{for } i = 1, ..., N-m+1 \tag{4.19} \]

4. Then find the mean logarithm of C_r^m(i):

\[ \Phi_r^m = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} \ln C_r^m(i) \tag{4.20} \]

5. Repeat the calculations for m+1.

6. Calculate the ApEn:

\[ ApEn = \lim_{N \to \infty} \left( \Phi_r^m - \Phi_r^{m+1} \right) \tag{4.21} \]
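The procedure above can be sketched in NumPy (illustrative; self-matches are counted, which keeps every C_r^m(i) positive so the logarithm in eq. (4.20) is defined, and r is scaled by the signal's standard deviation in the toy comparison):

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """Approximate Entropy, eqs. (4.17)-(4.21): Phi_r^m - Phi_r^{m+1}."""
    def phi(length):
        n = len(x) - length + 1
        templates = np.array([x[i:i + length] for i in range(n)])
        counts = np.empty(n)
        for i in range(n):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            counts[i] = np.sum(d <= r)   # self-match included
        return np.mean(np.log(counts / n))   # eq. (4.20)
    return phi(m) - phi(m + 1)               # eq. (4.21)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 300))
noisy = rng.normal(0, 1.0, 300)
# a regular signal has lower approximate entropy than white noise
print(approximate_entropy(regular, r=0.2 * regular.std()),
      approximate_entropy(noisy, r=0.2 * noisy.std()))
```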
Bibliography
Cao, L., Chua, K., Chong, W., Lee, H., & Gu, Q. (2003). A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing, 55(1), 321-336.

Castaings, T., Waske, B., Atli Benediktsson, J., & Chanussot, J. (2010). On the influence of feature reduction for the classification of hyperspectral images based on the extended morphological profile. International Journal of Remote Sensing, 31(22), 5921-5939.

Chu, J.-U., Moon, I., Lee, Y.-J., Kim, S.-K., & Mun, M.-S. (2007). A supervised feature-projection-based real-time EMG pattern recognition for multifunction myoelectric hand control. IEEE/ASME Transactions on Mechatronics, 12(3), 282-290.

Giri, D., Acharya, U. R., Martis, R. J., Sree, S. V., Lim, T.-C., Ahamed, T., & Suri, J. S. (2013). Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform. Knowledge-Based Systems, 37, 274-282.

Globerson, A., & Roweis, S. T. (2005). Metric learning by collapsing classes. In Advances in Neural Information Processing Systems (pp. 451-458).

Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2004). Neighbourhood components analysis. In Advances in Neural Information Processing Systems.

Huang, K., Velliste, M., & Murphy, R. F. (2003). Feature reduction for improved recognition of subcellular location patterns in fluorescence microscope images. In Biomedical Optics 2003 (pp. 307-318).

Huang, W., & Yin, H. (2012). On nonlinear dimensionality reduction for face recognition. Image and Vision Computing, 30(4), 355-366.

Kamavuako, E. N., Scheme, E. J., & Englehart, K. B. (2014). Nonparametric discriminant projections for improved myoelectric classification, 68-69.

Kumar, Y., & Dewal, M. (2011). Complexity measures for normal and epileptic EEG signals using ApEn, SampEn and SEN. Int J Comput Commun Technol, 2, 6-12.

Kuo, B.-C., & Landgrebe, D. A. (2004). Nonparametric weighted feature extraction for classification. IEEE Transactions on Geoscience and Remote Sensing, 42(5), 1096-1105.

Kuzmin, D., & Warmuth, M. K. (2007). Online kernel PCA with entropic matrix updates. In Proceedings of the 24th International Conference on Machine Learning (pp. 465-472).

Kwok, J.-Y., & Tsang, I. W. (2004). The pre-image problem in kernel methods. IEEE Transactions on Neural Networks, 15(6), 1517-1525.

Li, Z., Lin, D., & Tang, X. (2009). Nonparametric discriminant analysis for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 755-761.

Lin, C.-T., Lin, K.-L., Ko, L.-W., Liang, S.-F., Kuo, B.-C., Chung, I.-F., et al. (2008). Nonparametric single-trial EEG feature extraction and classification of driver's cognitive responses. EURASIP Journal on Advances in Signal Processing, 2008(1), 849040.

Manit, J., & Youngkong, P. (2011). Neighborhood components analysis in sEMG signal dimensionality reduction for gait phase pattern recognition. In Broadband and Biomedical Communications (IB2Com), 2011 6th International Conference on (pp. 86-90).

Martis, R. J., Acharya, U. R., & Min, L. C. (2013). ECG beat classification using PCA, LDA, ICA and discrete wavelet transform. Biomedical Signal Processing and Control, 8(5), 437-448.

Mwangi, B., Tian, T. S., & Soares, J. C. (2014). A review of feature reduction techniques in neuroimaging. Neuroinformatics, 12(2), 229-244.

Phinyomark, A., Phukpattaranont, P., & Limsakul, C. (2012). Feature reduction and selection for EMG signal classification. Expert Systems with Applications, 39(8), 7420-7431.

Soto, A. J., Strickert, M., Vazquez, G. E., & Milios, E. (2011). Subspace mapping of noisy text documents. In Advances in Artificial Intelligence (pp. 377-383). Springer.

Subasi, A., & Gursoy, M. I. (2010). EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12), 8659-8666.

Yang, P., Xing, K., Huang, J., & Wang, Y. (2013). A novel feature reduction method for real-time EMG pattern recognition system. In Control and Decision Conference (CCDC), 2013 25th Chinese (pp. 1500-1505).