EEG Data Analysis, Feature Extraction and Classifiers

Clemson UniversityTigerPrints

All Theses Theses

5-2011

EEG Data Analysis, Feature Extraction andClassifiersJing ZhouClemson University, [email protected]

Follow this and additional works at: https://tigerprints.clemson.edu/all_theses

Part of the Electrical and Computer Engineering Commons

This Thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorizedadministrator of TigerPrints. For more information, please contact [email protected].

Recommended CitationZhou, Jing, "EEG Data Analysis, Feature Extraction and Classifiers" (2011). All Theses. 1075.https://tigerprints.clemson.edu/all_theses/1075

https://tigerprints.clemson.edu?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

https://tigerprints.clemson.edu/all_theses?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

https://tigerprints.clemson.edu/theses?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

https://tigerprints.clemson.edu/all_theses?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

http://network.bepress.com/hgg/discipline/266?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

https://tigerprints.clemson.edu/all_theses/1075?utm_source=tigerprints.clemson.edu%2Fall_theses%2F1075&utm_medium=PDF&utm_campaign=PDFCoverPages

mailto:[email protected]

EEG Data Analysis, Feature Extraction and Classifiers

A Thesis

Presented to

the Graduate School of

Clemson University

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

Electrical Engineering

by

Jing Zhou

May 2011

Accepted by:

Dr. Robert Schalkoff, Committee Chair

Dr. John Gowdy

Dr. Brian Dean

Abstract

Epileptiform transients (ETs) are an important kind of EEG signal. They have various

morphologies and can be difficult to detect. This thesis describes several approaches to detecting

and classifying epileptiform transients (ETs), including Bayesian classification (with Gaussian As-

sumption), artificial neural networks (Backpropagation FeedForward Network) and k-NNR. Various

features were extracted, including the shape, frequency domain and wavelet transform coefficients.

The long term goal of this research is to determine the required size of a dataset to obtain clinically

significant machine classification results. The immediate goal is to identify a reasonable feature set

which can achieve acceptable classification performance with reasonable computational complexity.

We have explored the effect on the results by changing window size, by filtering and by adding spa-

tial information. Unsupervised methods, e.g. clustering, have also been explored. Presently, ANN

provides the best classification method using a wavelet set for the best feature set. Future directions

are indicated.

ii

Acknowledgments

Many people contributed time, knowledge, skill, and support to this research. I am pleased

to acknowledge their contributions. First, I would like to thank Dr. Jonathan Halford for supporting

us with all the materials needed in this project. And I would like to thank Dr. Schalkoff, Dr. Gowdy

and Dr. Dean for being great advisors and always being there when I had a question. I would also

like to thank Dr. Dean and his students for preparing data for us.

iii

Table of Contents

Title Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Available Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.4 Overview of This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Research Design and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.1 Feature Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2 Spatial Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3 Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4 Window Size and Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.5 Classifier and Feature Set Performance Evaluation Methodology . . . . . . . . . . . 21

3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.1 Definitions Used in the Assessment of Classifier Performance . . . . . . . . . . . . . 233.2 Cross-Validation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.3 k-Fold Results of k-NNR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.4 Effect of Spatial Information in Feature Set . . . . . . . . . . . . . . . . . . . . . . . 483.5 Comparison of 2-Class Model and 3-Class Model . . . . . . . . . . . . . . . . . . . . 503.6 Comparison of CV Results between ’Original’ Annotation and ’Best 7’ Annotation . 583.7 Clustering Analysis and RBF Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . 623.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663.9 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68A Initial Dataset Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69B Distribution of Feature Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80C Bayesian Parameter Estimation Approach . . . . . . . . . . . . . . . . . . . . . . . . 90D Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93E Risk Factors and Effects in the Bayesian Classifier . . . . . . . . . . . . . . . . . . . 96F FastICA and the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

iv

G Matlab Simulation Code Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

v

List of Tables

2.1 Coordinate Information of Electrode Channels . . . . . . . . . . . . . . . . . . . . . 142.2 Distribution of AEP Segment Length in the Annotation of D20 . . . . . . . . . . . . 192.3 Distribution of AEP Segment Length in the ’Original’ Annotation of D100 . . . . . . 202.4 Distribution of AEP Segment Length in the ’Best 7’ Annotation of D100 . . . . . . . 20

3.1 Confusion Matrix of Bayesian Classify and ANN of class = 3 Problem . . . . . . . . 523.2 Confusion Matrix of Bayesian Classify and ANN of class = 2 Problem . . . . . . . . 543.3 Confusion Matrix of k-fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573.4 Error Rate of c-means for Selected D20 Feature Sets (c = 2) . . . . . . . . . . . . . . 633.5 Parameters of SOFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633.6 Final Weights of SOFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643.7 Parameters of NG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643.8 Final Weights of NG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 AEP Sample Number and nonAEP Sample Number in Each Dataset . . . . . . . . . 6910 AEP Signal Segments in the Annotation of D20 . . . . . . . . . . . . . . . . . . . . . 7011 AEP Signal Segments in the Original Annotation of D100 . . . . . . . . . . . . . . . 7212 AEP Signal Segments in the ’Best 7’ Annotation of D100 . . . . . . . . . . . . . . . . 7413 Total Number of AEP Samples by Channel Pair in the Annotation of D20 . . . . . . 7514 Total Number of AEP Samples by Channel Pair in the Original Annotation of D100 7517 Segment Length of AEP Signal in the Original Annotation of D100 . . . . . . . . . . 7718 Segment Length of AEP Signal in the ’Best 7’ Annotation of D100 . . . . . . . . . . 7815 Total Number of AEP Samples by Channel Pair in the ’Best 7’ Annotation of D100 . 7916 Segment Length of AEP Signal in the Annotation of D20 . . . . . . . . . . . . . . . 7919 Matlab Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

vi

List of Figures

1.1 Lobes in Hemisphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Electrode Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Electrode Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Abnormal Epileptiform PED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.5 Artifactual PED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.6 Normal Electrocortical PED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.7 Interface of ’eegNet’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.1 Shape Features of an ET Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2 Comparison between Wavelet Functions and ETs . . . . . . . . . . . . . . . . . . . . 122.3 Multilayer Feedforward Network using Back-Propagation Algorithm . . . . . . . . . 16

3.1 Mean of D100 CV Sensitivity by ’Best 7’, c=2 . . . . . . . . . . . . . . . . . . . . . . 253.2 Mean of D100 CV Specificity by ’Best 7’, c=2 . . . . . . . . . . . . . . . . . . . . . . 263.3 Variance of D100 CV Sensitivity by ’Best 7’, c=2 . . . . . . . . . . . . . . . . . . . . 273.4 Variance of D100 CV Specificity by ’Best 7’, c=2 . . . . . . . . . . . . . . . . . . . . 283.5 CV Sensitivity by Bayesian Classifier, Frequency Feature, ’Best 7’ & c=2 . . . . . . 293.6 CV Specificity by Bayesian Classifier, Frequency Feature, ’Best 7’ & c=2 . . . . . . . 293.7 CV Sensitivity by ANN, Frequency Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . 293.8 CV Specificity by ANN, Frequency Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . 293.9 CV Sensitivity by Bayesian Classifier, Waveform Feature, ’Best 7’ & c=2 . . . . . . 303.10 CV Specificity by Bayesian Classifier, Waveform Feature, ’Best 7’ & c=2 . . . . . . . 303.11 CV Sensitivity by ANN, Waveform Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . 303.12 CV Specificity by ANN, Waveform Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . 303.13 CV Sensitivity by Bayesian Classifier, Waveform+Frequency Feature, ’Best 7’ & c=2 313.14 CV Specificity by Bayesian Classifier, Waveform+Frequency Feature, ’Best 7’ & c=2 313.15 CV Sensitivity by ANN, Waveform+Frequency Feature, ’Best 7’ & c=2 . . . . . . . 313.16 CV Specificity by ANN, Waveform+Frequency Feature, ’Best 7’ & c=2 . . . . . . . 313.17 CV Sensitivity by Bayesian Classifier, Wavelet Feature, ’best 7’ & c=2 . . . . . . . . 323.18 CV Specificity by Bayesian Classifier, Wavelet Feature, ’best 7’ & c=2 . . . . . . . . 323.19 CV Sensitivity by ANN, Wavelet Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . . 323.20 CV Specificity by ANN, Wavelet Feature, ’Best 7’ & c=2 . . . . . . . . . . . . . . . 323.21 CV Sensitivity by Bayesian Classifier, Wavelet+XY Feature, ’Best 7’ & c=2 . . . . . 333.22 CV Specificity by Bayesian Classifier, Wavelet+XY Feature, ’Best 7’ & c=2 . . . . . 333.23 CV Sensitivity by ANN, Wavelet+XY Feature, ’Best 7’ & c=2 . . . . . . . . . . . . 333.24 CV Specificity by ANN, Wavelet+XY Feature, ’Best 7’ & c=2 . . . . . . . . . . . . 333.25 Mean of D100 CV Unbalanced Sensitivity by ’Best 7’, c=3 . . . . . . . . . . . . . . . 353.26 Mean of D100 CV Unbalanced Specificity by ’Best 7’, c=3 . . . . . . . . . . . . . . . 353.27 Mean of D100 CV Balanced Sensitivity by ’Best 7’, c=3 . . . . . . . . . . . . . . . . 363.28 Mean of D100 CV Balanced Specificity by ’Best 7’, c=3 . . . . . . . . . . . . . . . . 363.29 k-NNR Sensitivity and Specificity of Frequency D2000, c=2 . . . . . . . . . . . . . . 37

vii

3.30 k-NNR Sensitivity and Specificity of Frequency D5000, c=2 . . . . . . . . . . . . . . 373.31 k-NNR Sensitivity and Specificity of Waveform D2000, c=2 . . . . . . . . . . . . . . 383.32 k-NNR Sensitivity and Specificity of Waveform D5000, c=2 . . . . . . . . . . . . . . 383.33 k-NNR Sensitivity and Specificity of Waveform+Frequency D2000, c=2 . . . . . . . . 383.34 k-NNR Sensitivity and Specificity of Waveform+Frequency D5000, c=2 . . . . . . . . 383.35 k-NNR Sensitivity and Specificity of Wavelet+XY D2000, c=2 . . . . . . . . . . . . . 393.36 k-NNR Sensitivity and Specificity of Wavelet+XY D5000, c=2 . . . . . . . . . . . . . 393.37 k-NNR Sensitivity of Frequency D2000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 403.38 k-NNR AP Sensitivity of Frequency D2000, c=3 . . . . . . . . . . . . . . . . . . . . . 403.39 k-NNR NEP Sensitivity of Frequency D2000, c=3 . . . . . . . . . . . . . . . . . . . . 403.40 k-NNR Specificity of Frequency D2000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 403.41 k-NNR Sensitivity of Frequency D5000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 413.42 k-NNR AP Sensitivity of Frequency D5000, c=3 . . . . . . . . . . . . . . . . . . . . . 413.43 k-NNR NEP Sensitivity of Frequency D5000, c=3 . . . . . . . . . . . . . . . . . . . . 413.44 k-NNR Specificity of Frequency D5000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 413.45 k-NNR Sensitivity of Waveform D2000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 423.46 k-NNR AP Sensitivity of Waveform D2000, c=3 . . . . . . . . . . . . . . . . . . . . . 423.47 k-NNR NEP Sensitivity of Waveform D2000, c=3 . . . . . . . . . . . . . . . . . . . . 423.48 k-NNR Specificity of Waveform D2000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 423.49 k-NNR Sensitivity of Waveform D5000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 433.50 k-NNR AP Sensitivity of Waveform D5000, c=3 . . . . . . . . . . . . . . . . . . . . . 433.51 k-NNR NEP Sensitivity of Waveform D5000, c=3 . . . . . . . . . . . . . . . . . . . . 433.52 k-NNR Specificity of Waveform D5000, c=3 . . . . . . . . . . . . . . . . . . . . . . . 433.53 k-NNR Sensitivity of Waveform+Frequency D2000, c=3 . . . . . . . . . . . . . . . . 443.54 k-NNR AP Sensitivity of Waveform+Frequency D2000, c=3 . . . . . . . . . . . . . . 443.55 k-NNR NEP Sensitivity of Waveform+Frequency D2000, c=3 . . . . . . . . . . . . . 443.56 k-NNR Specificity of Waveform+Frequency D2000, c=3 . . . . . . . . . . . . . . . . . 443.57 k-NNR Sensitivity of Waveform+Frequency D5000, c=3 . . . . . . . . . . . . . . . . 453.58 k-NNR AP Sensitivity of Waveform+Frequency D5000, c=3 . . . . . . . . . . . . . . 453.59 k-NNR NEP Sensitivity of Waveform+Frequency D5000, c=3 . . . . . . . . . . . . . 453.60 k-NNR Specificity of Waveform+Frequency D5000, c=3 . . . . . . . . . . . . . . . . . 453.61 k-NNR Sensitivity of Wavelet+XY D2000, c=3 . . . . . . . . . . . . . . . . . . . . . 463.62 k-NNR AP Sensitivity of Wavelet+XY D2000, c=3 . . . . . . . . . . . . . . . . . . . 463.63 k-NNR NEP Sensitivity of Wavelet+XY D2000, c=3 . . . . . . . . . . . . . . . . . . 463.64 k-NNR Specificity of Wavelet+XY D2000, c=3 . . . . . . . . . . . . . . . . . . . . . . 463.65 k-NNR Sensitivity of Wavelet+XY D5000, c=3 . . . . . . . . . . . . . . . . . . . . . 473.66 k-NNR AP Sensitivity of Wavelet+XY D5000, c=3 . . . . . . . . . . . . . . . . . . . 473.67 k-NNR NEP Sensitivity of Wavelet+XY D5000, c=3 . . . . . . . . . . . . . . . . . . 473.68 k-NNR Specificity of Wavelet+XY D5000, c=3 . . . . . . . . . . . . . . . . . . . . . . 473.69 D20 Sensitivity with/without Spatial Info for Risk Ratio Range from 0 to 50 . . . . 483.70 D20 Sensitivity with/without Spatial Info for Risk Ratio Range from 0 to 1 . . . . . 483.71 D20 Specificity with/without Spatial Info for Risk Ratio Range from 0 to 50 . . . . . 493.72 D20 Specificity with/without Spatial Info for Risk Ratio Range from 0 to 1 . . . . . 493.73 D20 Selectivity with/without Spatial Info for Risk Ratio Range from 0 to 50 . . . . . 493.74 D20 Selectivity with/without Spatial Info for Risk Ratio Range from 0 to 1 . . . . . 493.75 Confusion Matrix Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.76 Mean of D100 CV Sensitivity by Original 7, c=2 . . . . . . . . . . . . . . . . . . . . 583.77 Mean of D100 CV Specificity by Original 7, c=2 . . . . . . . . . . . . . . . . . . . . . 583.78 Compare of Original 7 & ’Best 7’ Sensitivity by ANN Frequency . . . . . . . . . . . 593.79 Compare of Original 7 & ’Best 7’ Specificity by ANN Frequency . . . . . . . . . . . 593.80 Compare of Original 7 & ’Best 7’ Sensitivity by ANN Waveform . . . . . . . . . . . 59

viii

3.81 Compare of Original 7 & ’Best 7’ Specificity by ANN Waveform . . . . . . . . . . . . 593.82 Compare of Original 7 & ’Best 7’ Sensitivity by ANN Waveform+Frequency . . . . . 603.83 Compare of Original 7 & ’Best 7’ Specificity by ANN Waveform+Frequency . . . . . 603.84 Compare of Original 7 & ’Best 7’ Sensitivity by Bayesian Classifier Frequency . . . . 603.85 Compare of Original 7 & ’Best 7’ Specificity by Bayesian Classifier Frequency . . . . 603.86 Compare of Original 7 & ’Best 7’ Sensitivity by Bayesian Classifier Waveform . . . . 613.87 Compare of Original 7 & ’Best 7’ Specificity by Bayesian Classifier Waveform . . . . 613.88 Compare of Original 7 & ’Best 7’ Sensitivity by Bayesian Classifier Waveform+Freq. 613.89 Compare of Original 7 & ’Best 7’ Specificity by Bayesian Classifier Waveform+Freq. 6190 Prob. Plot of ’Best 7’ AEP FHWA VS. ’Normal Distribution’ . . . . . . . . . . . . . 8091 Prob. Plot of ’Best 7’ AEP SHWA VS. ’Normal Distribution’ . . . . . . . . . . . . . 8092 Prob. Plot of ’Best 7’ AEP FHWD VS. ’Normal Distribution’ . . . . . . . . . . . . . 8093 Prob. Plot of ’Best 7’ AEP SHWD VS. ’Normal Distribution’ . . . . . . . . . . . . . 8094 Prob. Plot of ’Best 7’ AEP FHWS VS. ’Normal Distribution’ . . . . . . . . . . . . . 8095 Prob. Plot of ’Best 7’ AEP SHWS VS. ’Normal Distribution’ . . . . . . . . . . . . . 8096 Prob. Plot of ’Best 7’ AEP 1st Freq. Peak VS. ’Normal Distribution’ . . . . . . . . . 8197 Prob. Plot of ’Best 7’ AEP 1st Freq. VS. ’Normal Distribution’ . . . . . . . . . . . . 8198 Prob. Plot of ’Best 7’ AEP 2nd Freq. Peak VS. ’Normal Distribution’ . . . . . . . . 8199 Prob. Plot of ’Best 7’ AEP 2nd Freq. VS. ’Normal Distribution’ . . . . . . . . . . . 81100 Prob. Plot of ’Best 7’ AEP X Value VS. ’Normal Distribution’ . . . . . . . . . . . . 81101 Prob. Plot of ’Best 7’ AEP Y Value VS. ’Normal Distribution’ . . . . . . . . . . . . 81102 Prob. Plot of ’Best 7’ nonAEP FHWA VS. ’Normal Distribution’ . . . . . . . . . . . 81103 Prob. Plot of ’Best 7’ nonAEP SHWA VS. ’Normal Distribution’ . . . . . . . . . . . 81104 Prob. Plot of ’Best 7’ nonAEP FHWD VS. ’Normal Distribution’ . . . . . . . . . . . 81105 Prob. Plot of ’Best 7’ nonAEP SHWD VS. ’Normal Distribution’ . . . . . . . . . . . 82106 Prob. Plot of ’Best 7’ nonAEP FHWS VS. ’Normal Distribution’ . . . . . . . . . . . 82107 Prob. Plot of ’Best 7’ nonAEP SHWS VS. ’Normal Distribution’ . . . . . . . . . . . 82108 Prob. Plot of ’Best 7’ nonAEP 1st Freq. Peak VS. ’Normal Distribution’ . . . . . . 82109 Prob. Plot of ’Best 7’ nonAEP 1st Freq. VS. ’Normal Distribution’ . . . . . . . . . . 82110 Prob. Plot of ’Best 7’ nonAEP 2nd Freq. Peak VS. ’Normal Distribution’ . . . . . . 82111 Prob. Plot of ’Best 7’ nonAEP 2nd Freq. VS. ’Normal Distribution’ . . . . . . . . . 82112 Prob. Plot of ’Best 7’ nonAEP X Value Peak VS. ’Normal Distribution’ . . . . . . . 82113 Prob. Plot of ’Best 7’ nonAEP Y Value Peak VS. ’Normal Distribution’ . . . . . . . 82114 Prob. Plot of ’Best 7’ AEP FHWA VS. ’Exponential Distribution’ . . . . . . . . . . 83115 Prob. Plot of ’Best 7’ AEP SHWA VS. ’Exponential Distribution’ . . . . . . . . . . 83116 Prob. Plot of ’Best 7’ AEP FHWD VS. ’Exponential Distribution’ . . . . . . . . . . 83117 Prob. Plot of ’Best 7’ AEP SHWD VS. ’Exponential Distribution’ . . . . . . . . . . 83118 Prob. Plot of ’Best 7’ AEP FHWS VS. ’Exponential Distribution’ . . . . . . . . . . 83119 Prob. Plot of ’Best 7’ AEP SHWS VS. ’Exponential Distribution’ . . . . . . . . . . 83120 Prob. Plot of ’Best 7’ AEP 1st Freq. Peak VS. ’Exponential Distribution’ . . . . . . 84121 Prob. Plot of ’Best 7’ AEP 1st Freq. VS. ’Exponential Distribution’ . . . . . . . . . 84122 Prob. Plot of ’Best 7’ AEP 2nd Freq. Peak VS. ’Exponential Distribution’ . . . . . . 84123 Prob. Plot of ’Best 7’ AEP 2nd Freq. VS. ’Exponential Distribution’ . . . . . . . . . 84124 Prob. Plot of ’Best 7’ nonAEP FHWA VS. ’Exponential Distribution’ . . . . . . . . 84125 Prob. Plot of ’Best 7’ nonAEP SHWA VS. ’Exponential Distribution’ . . . . . . . . 84126 Prob. Plot of ’Best 7’ nonAEP FHWD VS. ’Exponential Distribution’ . . . . . . . . 84127 Prob. Plot of ’Best 7’ nonAEP SHWD VS. ’Exponential Distribution’ . . . . . . . . 85128 Prob. Plot of ’Best 7’ nonAEP FHWS VS. ’Exponential Distribution’ . . . . . . . . 85129 Prob. Plot of ’Best 7’ nonAEP SHWS VS. ’Exponential Distribution’ . . . . . . . . 85130 Prob. Plot of ’Best 7’ nonAEP 1st Freq. Peak VS. ’Exponential Distribution’ . . . . 85131 Prob. Plot of ’Best 7’ nonAEP 1st Freq. VS. ’Exponential Distribution’ . . . . . . . 85

ix

132 Prob. Plot of ’Best 7’ nonAEP 2nd Freq. Peak VS. ’Exponential Distribution’ . . . 85133 Prob. Plot of ’Best 7’ nonAEP 2nd Freq. VS. ’Exponential Distribution’ . . . . . . . 85134 Distribution of 1st Freq. Peak of D20 AEP Data . . . . . . . . . . . . . . . . . . . . 86135 Distribution of 1st Freq. of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . . 86136 Distribution of 2nd Freq. Peak of D20 AEP Data . . . . . . . . . . . . . . . . . . . . 86137 Distribution of 2nd Freq. of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . . 86138 Distribution of 1st PSD Peak of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . 86139 Distribution of 1st PSD Freq. of D20 AEP Data . . . . . . . . . . . . . . . . . . . . 86140 Distribution of 2nd PSD Peak of D20 AEP Data . . . . . . . . . . . . . . . . . . . . 87141 Distribution of 2nd PSD Freq. of D20 AEP Data . . . . . . . . . . . . . . . . . . . . 87142 Distribution of the FHWA of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . 87143 Distribution of the SHWA of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . 87144 Distribution of the FHWD of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . 87145 Distribution of the SHWD of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . 87146 Distribution of the FHWS of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . 87147 Distribution of the SHWS of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . . 87148 Distribution of X Value of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . . . 87149 Distribution of Y Value of D20 AEP Data . . . . . . . . . . . . . . . . . . . . . . . . 87150 Distribution of 1st Freq. Peak of D20 nonAEP Data . . . . . . . . . . . . . . . . . . 87151 Distribution of 1st Freq. of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . . 87152 Distribution of 2nd Freq. Peak of D20 nonAEP Data . . . . . . . . . . . . . . . . . . 88153 Distribution of 2nd Freq. of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . . 88154 Distribution of 1st PSD Peak of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . 88155 Distribution of 1st PSD Freq. of D20 nonAEP Data . . . . . . . . . . . . . . . . . . 88156 Distribution of 2nd PSD Peak of D20 nonAEP Data . . . . . . . . . . . . . . . . . . 88157 Distribution of 2nd PSD Freq. of D20 nonAEP Data . . . . . . . . . . . . . . . . . . 88158 Distribution of the FHWA of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88159 Distribution of the SHWA of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88160 Distribution of the FHWD of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88161 Distribution of the SHWD of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88162 Distribution of the FHWS of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88163 Distribution of the SHWS of D20 nonAEP Data . . . . . . . . . . . . . . . . . . . . 88164 Distribution of X Value of D20 nonAEP Data. Note MultiModal Characteristic. . . . 89165 Distribution of Y Value of D20 nonAEP Data. Note MultiModal Characteristic. . . . 89166 Frequency Response of the Bandpass Filter . . . . . . . . . . . . . . . . . . . . . . . 93167 Frequency Response of the Notch Filter . . . . . . . . . . . . . . . . . . . . . . . . . 93168 Frequency Response of the 2nd Patient’s Raw Data in 13th Channel in D20 . . . . . 94169 Sensitivity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 50 . 94170 Sensitivity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 1 . 94171 Specificity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 50 . 95172 Specificity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 1 . 95173 Selectivity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 50 . 95174 Selectivity of D20 Feature Sets with Spatial Info for Risk Ratio Range from 0 to 1 . 95175 Error Rate of D20’s AEP Class under Different Risk Ratio . . . . . . . . . . . . . . . 96176 Error Rate of D20’s nonAEP Class under Different Risk Ratio . . . . . . . . . . . . . 96177 Error Rate of the Whole D20 under Different Risk Ratio . . . . . . . . . . . . . . . . 97178 FastICA Results for Patient#3 of D20: 2 Components . . . . . . . . . . . . . . . . . 99179 FastICA Results for Patient#3 of D20: 3 Components . . . . . . . . . . . . . . . . . 99180 FastICA Results for Patient#3 of D20: 4 Components . . . . . . . . . . . . . . . . . 100181 FastICA Results for Patient#3 of D20: 5 Components . . . . . . . . . . . . . . . . . 100182 FastICA Results for Patient#3 of D20: 1st—3rd of 6 Components . . . . . . . . . . 101

x

183 FastICA Results for Patient#3 of D20: 4th—6th of 6 Components . . . . . . . . . . 101184 FastICA Results for Patient#3 of D20: 1st—4th of 7 Components . . . . . . . . . . 102185 FastICA Results for Patient#3 of D20: 5th—7th of 7 Components . . . . . . . . . . 102186 FastICA Results for Patient#3 of D20: 1st—4th of 8 Components . . . . . . . . . . 103187 FastICA Results for Patient#3 of D20: 5th—8th of 8 Components . . . . . . . . . . 103

xi

Chapter 1

Introduction

1.1 Background

The electroencephalogram (EEG) measures the electrical potential from the asynchronous

firing of billions of neurons within the nervous system on scalp. EEG is recorded on the scalp from

a number of electrodes, usually 30-45 minutes [2].

1.1.1 The International 10-20 System

Figure 1.1: Lobes in Hemisphere

The International 10-20 System of electrode placement is the most widely used method to

1

Figure 1.2: Electrode Location

describe the location of scalp electrodes. Figure 1.2 1 shows a typical 10-20 system. The 10-20

system is based on the relationship between the location of an electrode and the underlying area

of cerebral cortex. Each site has a letter (to identify the lobe) and a number or another letter to

identify the hemisphere location: ’F’ - Frontal lobe, ’T’ - Temporal lobe , ’P’ - Parietal lobe, ’O’ -

Occipital lobe. The location of each lobe on the brain hemisphere is shown in Figure 1.1 2. There

is no central lobe in the cerebral cortex. ’C’ is just used for identification purposes only. ’z’ (zero)

refers to an electrode placed on the midline. Even numbers (2,4,6,8) refer to electrode positions on

the right hemisphere, while odd numbers (1,3,5,7) refer to those on the left hemisphere. The ’10’

and ’20’ refer to the fact that the actual distances between adjacent electrodes are either 10% or 20%

of the total front-back or right-left distance of the skull. Figure 1.3 illustrates how the commonly

used electrodes be arranged using ’10-20’ rules. While recording a more detailed EEG with more

electrodes, extra electrodes are added in between the existing 10-20 system 3.

1From http://aboutneurofeedback.com/headchart.htm2From http://www.epilepsyfoundation.org/about/types/syndromes/temporallobe.cfm3http://www.brainmaster.com/generalinfo/electrodeuse/eegbands/1020/1020.html

2

Figure 1.3: Electrode Distance

1.1.2 Detection of ETs

Abnormal EEG activity can be separated into epileptiform and non-epileptiform activity.

An epileptic seizure is defined as a transient symptom of ’abnormal excessive or synchronous neuronal

activity in the brain’. Between seizures, the EEG of a patient with epilepsy may be characterized

by occasional epileptiform transients (ETs) which consist of spikes or sharp waves which can last for

20-70ms or 70-200ms respectively. Occurrence of ETs usually indicates recurrent seizure in patients

after a first seizure. Thus it becomes important to detect ETs from a clinical standpoint. However,

ETs have wide various morphologies and they are similar to some normal background activities or

artifacts (Shown in Figure 1.4 to Figure 1.6), which increases the difficulty to detect ETs [4]. This

is true whether the detection is due by skilled human or machine.

3

Figure 1.4: Abnormal Epilepti-form PED

Figure 1.5: Artifactual PED Figure 1.6: Normal Electrocor-tical PED

The short-term aim of our work is to develop a classifier with proper feature sets to detect

ETs in EEG signals.

1.2 Available Datasets

Our datasets are obtained from ’eegNet’ 4. Figure 1.7 is the eegNet interface.

Figure 1.7: Interface of ’eegNet’

Two datasets from the ’eegNet’ system were provided: The 20-patient dataset (D20) and

the 100-patient dataset (D100). The signals in both of the datasets were sampled at 256Hz. Each

patient was provided a 30-second length signal segment. There were 3 files to indicate the details

4http://eegnet.clemson.edu/

4

of the dataset D100: a text file records all the raw data; a ’annot.csv’ file records the annotated

segments; a ’class.csv’ file records the classes. There were 2 files to indicate the details of the dataset

D20: one text file records all the raw data; The other text file records both the annotated segments

and the classes.

The raw data appear in text files. Below is a sample of the dataset text file:

’ ......

550119,4,2.63956237 -1.09683490 -2.29316831 5.99740553 6.56948233 -0.37172681 -1.14136672 3.72959113 8.00398254 -

0.17825392 -2.76054430 7.10619974 1.35884488 1.01790762 0.63446009 -1.38020098 1.42206383 -1.65502286 -0.39664027

-5.83198118 0.40588593 -21.09597015,9

550120,5,6.86397362 -0.64254433 -2.65083361 6.62032461 7.27144480 0.54865205 -0.87649751 4.72090673 2.78531981

1.76304972 -0.83939958 10.73378086 4.94329071 3.88529015 3.08858657 0.38335836 4.89542246 1.99085104 1.32382345

-5.87150192 1.26432145 -16.64295387,9

550121,6,7.97851658 -1.91149771 -1.83141732 8.01316357 9.21953392 0.58892983 -0.01626577 5.60230017 0.40754431

2.59789228 0.90154105 10.32989979 6.37197971 6.80893517 4.79823065 2.59815145 6.34335184 4.77180672 2.49411249

-3.18666124 2.36158156 14.37728214,9

...... ’

In one text file, there are 153600 rows (256 samples by 30 seconds by 20 patients). Each row

represents the information of one sample measurement. And there are 25 columns. Each column

represents the following information:

DataPointID, SeqNum, Fp1 F7 T3 T5 O1 F3 C3 P3 A1 Fz Cz Fp2 F8 T4 T6 O2 F4 C4 P4 A2 Pz

ECG, DatasetID

The definitions of the above columns are as following:

• DataPointID, 1 column.

• SeqNum, 1 column, the sample number.

• ChannelValues, 22 column, the temporal data recorded in corresponding channel, the column-

major signal values, separated by spaces.

• DatasetID, 1 column, the dataset they came from.

5

The data we are interested in are from Column 3 to Column 23 (21 columns total), each

column being EEG signals recorded from one of the 21 electrode channels. All the training sets (H)

and test sets (ST ) are derived from these text files.

Corresponding annotations were provided to annotate the class of the samples. They also

appear in text files. There are two text files to demonstrate the signal segment and the corresponding

class respectively. They are ’annot.csv’ and ’class.csv’. Here is a sample of ’annot.csv’:

’ ......

16648,213,333,1,F7-T3,,-5,\N,Unclassified,10,12,9,1273077900

......

20886,555,624,2,T3-T5,,1,\N,Unclassified,10,6,9,1275401859

......

20888,895,1033,2,T3-T5,,1,\N,Unclassified,10,6,9,1275401864

......

’

Each row represents information of one signal segment. Each column represents:

AnnotationID, StartSeqNum, EndSeqNum, ChannelNum, OrigChannelName, Notes, ScaleZoom,

Thumbnail, PEDType, MontageID, UserID, DatasetID, timestamp

The most crucial column definitions are as following:

• AnnotationID

• StartSeqNum, the sample number at which the annotated segment starts.

• EndSeqNum, the sample number at which the annotated segment stops.

• OrigChannelName, bipolar information of one segments. The differences of bipolar channel

pairs are used to extract all the features.

• PEDType, the default value is ’Unclassified’. The vote results from ’class.csv’ will be used

later.

• DatasetID, the the dataset they came from.

’annot.csv’ provides information about segments length and location in the dataset while

’class.csv’ provides class information. Here is a sample of ’class.csv’:

6

’ ......

38742,1278360034,4,10,16648,Normal Electrocortical PED




38746,1280852968,9,10,16648,Artifactual PED


38748,1283112365,12,10,16648,Artifactual PED

......

’

Each row represents information of one reviewer’s vote for class. Each column represents:

ClassificationID, Timestamp, UserID, TrialID, AnnotationID, PEDType

The most crucial column definitions are as following:

• UserID, the identification of the reviewer.

• AnnotationID, which matches the counterpart ’annot.csv’.

• PEDType, the class that the corresponding reviewer votes for.

The definitions of the classes of each signal segment are based on the annotation of each

dataset. D100 has two possible annotations 5 and allows 3 classes:

1. ’Artifactual PED’ (’AP’ class)

2. ’Abnormal Epileptiform PED’ (’AEP’ class)

3. ’Normal Electrocortical PED’ (’NEP’ class)

The two annotations of D100 are the ’original’ annotation and the ’best 7’ annotation. The detail

of the differences in data segments we are interested in is described in Appendix A.

D20 has one annotation and allows 5 classes:

1. ’Artifactual PED’5Depending upon the set of reviewers used to generate the annotations.

7

2. ’Abnormal Epileptiform PED’

3. ’Normal Electrocortical PED’

4. ’Unclassified’ (’U’ class)

5. ’Abnormal Electrocortical Non-Epileptiform’ (’AENE’ class)

The immediate focus of our research is to distinguish class ’Abnormal Epileptiform PED’

(AEP) from the other classes by machine. The number of samples in each class is also used to

estimate the apriori probability of each class, which is an important parameter of Bayesian Classifier.

In addition, other training methods relied on a ’balanced’ training set. Appendix A shows that the

approximation proportion of AEP samples in each overall dataset is less than 2% and only 31

patients provided AEP samples while all provided nonAPE samples in ’best 7’ annotation. Since

the proportion of AEP class is small in the whole datasets, we balanced the presentation of the

two classes in training by balancing the training dataset classes or by adding risk parameters in the

Bayesian Classifier case.

1.3 Related Work

Many approaches to machine classification have been proposed this research area. They

included template matching (e.g. Stevens et al., 1972), parametric methods (e.g. Wilson and

Emerson, 2002), mimetic analysis (e.g. Exarchos et al., 2006), power spectral analysis (e.g. Exarchos

et al., 2006), Wavelet analysis (e.g. Indiradevi et al., 2008) and Artificial neural networks (e.g.

Argoud et al., 2006). Wavelet analysis has seen increased popularity. ICA (De Lucia et al., 2008)

has also been applied. It is still unclear that which algorithm performed the best since the datasets

implemented are different and thus the machine classifiers cannot be directly compared [4].

1.4 Overview of This Thesis

Chapter 2 shows all the feature sets and classifiers involved in this thesis. It also shows

the methodology of evaluation and selection of smoothing windows. Chapter 3 shows the results

under different situations, i.e. classification using different classifiers and feature sets, and also

interprets the trends and results of the experiments. Appendix A analyses the distribution of

8

length of the AEP segments in order to provide an apriori information. Appendix B demonstrates

the distribution of each feature. Appendix C demonstrates the estimate of Bayesian parameters.

Appendix D demonstrates the effects of filters on data and final results. Appendix E demonstrates

the effects of risk factors in the Bayesian Classifier on performance. Appendix F shows and analyses

the FastICA results. Finally, Appendix G shows the code summary.

9

Chapter 2

Research Design and Methods

2.1 Feature Sets

Due to the variety of ETs, we have explored several feature sets:

• Waveform Morphology Feature Set

• Frequency Domain Feature Set

• Wavelet Feature Set

Each is described below.

2.1.1 Waveform Morphology Feature Set

Figure 2.1: Shape Features of an ET Used

An epileptiform transient (ET) is a peak having the following distinguishing features [1].

1) There is a relatively large and smooth slope followed by a relatively large and smooth slope of

10

opposite polarity. 2) The apex of the ET is sharp. 3) Although the two sides may be of unequal

length, the duration of an ET is always between 20 and 70 ms. This duration can be defined as the

sum of the first and second half wave duration.

A model of an ET that possesses the properties described above is a triangular waveform,

shown in Figure 2.1 [1].

There are six waveform features extracted from the differences of the raw data using the triangle

model:

1. First Half Wave Amplitude (FHWA),

2. First Half Wave Duration (FHWD),

3. First Half Wave Slope (FHWS),

4. Second Half Wave Amplitude (SHWA),

5. Second Half Wave Duration (SHWD),

6. Second Half Wave Slope (SHWS),

Figure 2.1 shows FHWA, SHWA, FHWD and SHWD. FHWS and SHWS are not shown in the

figure. They are computed as follows:

First Half Wave Slope (FHWS) = First Half Wave Amplitude (FHWA)/First Half Wave Duration

(FHWD)

and

Second Half Wave Slope (SHWS) = Second Half Wave Amplitude (SHWA)/Second Half Wave

Duration (SHWD)

2.1.2 Frequency Domain Feature Set

An ET has two parts: A sharp apex followed by a slow wave. The sharp apex and the

slow wave are comprised of different frequencies. Theoretically, the energy of the sharp apex is the

largest and the energy of the slow wave is the second largest in the frequency domain. By applying

the Fast Fourier Transform on the raw data, we get the frequency spectrum. Theoretically, the

largest peak in frequency domain corresponds to the sharp apex of ET and the second largest peak

11

in frequency domain corresponds to the slow wave of ET. A hamming window was formed by 64 1

temporal samples to obtain EEG signal segments. For each sample, we extracted four features from

the frequency domain by applying FFT on each EEG signal segment:

1. Largest amplitude value in frequency domain.

2. Frequency of the largest peak.

3. Second largest amplitude value in frequency domain.

4. Frequency of the second largest peak.

Power spectral density (PSD) is a positive real function of a frequency variable associated

with a stationary stochastic process, or a deterministic function of time. It describes how the power

of a signal or time series is distributed within frequency domain. We assume the largest peak in

PSD corresponds to the sharp apex of ET and the second largest peak in PSD corresponds to the

slow wave of ET. We also extracted four features using PSD:

1. Largest amplitude value in PSD.

2. Frequency of the largest peak.

3. Second largest amplitude value in PSD.

4. Frequency of the second largest peak.

2.1.3 Wavelet Feature Set

Figure 2.2: Comparison between wavelet functions and epileptic wave form: (A)scaling function ofDaubechies 4 (DB4), (B)wavelet function of DB4, and (C)the shape of an epileptic spike [6]

The short time Fourier transform has the drawback of fixed time-frequency resolution.

To analyze signal structures of very different sizes, we need to perform a multiresolution analysis

1Window size is discussed in Section 2.4

12

on the recorded EEG signal. The wavelet transform decomposes signals over dilated and translated

wavelets and measures the time evolution of frequency transients [9].

In this case, we chose Daubechies wavelet of order 4 (DB4), since DB4 obtain the highest correlation

coefficients with the epileptic spike among the wavelet bases available in the Matlab toolbox [6].

Figure 2.2 illustrates the similarities between the scaling and wavelet function of DB4 and the shape

of an epileptic spike. A rectangular window was formed by 64 temporal samples to obtain EEG

signal segments. We decomposed these signal segments into 4 levels. Then there are 5 subbands:

4 detail subbands (D1-D4) and one approximation subband (A4). For each subband, we have 35,

21, 14 and 10 detail wavelet coefficients at the first, second, third and fourth levels respectively, and

10 approximation wavelet coefficients at the fourth level. If we use all the coefficients as input, it

will be a high dimension vector with the size of 90. To reduce the dimension of the feature set, we

then used the following statistical features to classify the EEG signals in stead of using the original

wavelet coefficients:

1. Maximum of the wavelet coefficients in each of the 5 subbands (D1, D2, D3, D4 and A4).

2. Minimum of the wavelet coefficients in each of the 5 subbands.

3. Mean of the wavelet coefficients in each of the 5 subbands.

4. Standard deviation of the wavelet coefficients in each of the 5 subbands.

Thereupon, in total we have 20 features for wavelet-based feature set [3].

2.2 Spatial Information

The data we extracted to form features are not the raw data but the difference between

to annotated channels. Recall in Section 1.2, in the annotation of class, each segment has bipolar

information. The bipolar information indicates which two channels out of the 21 channels are used

to compute the difference. The electrode of each channel respectively has a location on the scalp.

The coordinate of this location provides the spatial information we use as additional features.

The X, Y coordinates of channels are computed using distribution of electrodes indicated in Figure

1.2. The coordinate values are shown in Table 2.1.

13

Channel Number Channel Name X Coordinate Value Y Coordinate ValueChannel 1 Fp1 -12.3607 38.0423Channel 2 F7 -32.3607 23.5114Channel 3 T3 -40 0Channel 4 T5 -32.3607 -23.5114Channel 5 O1 -12.3607 -38.0423Channel 6 F3 -15.6429 21.6974Channel 7 C3 -20 0Channel 8 P3 -15.6429 -21.6974Channel 9 A1 -50 0Channel 10 Fz 0 20Channel 11 Cz 0 0Channel 12 Fp2 12.3607 38.0423Channel 13 F8 32.3607 23.5114Channel 14 T4 40 0Channel 15 T6 32.3607 -23.5114Channel 16 O2 12.3607 -38.0423Channel 17 F4 15.6429 21.6974Channel 18 C4 20 0Channel 19 P4 15.6429 -21.6974Channel 20 A2 50 0Channel 21 Pz 0 -20

Table 2.1: Coordinate Information of Electrode Channels

2.3 Classifier

Once the features have been chosen, a classifier must be designed and trained. The classifier’s

performance depends greatly on the characteristics of the training data. The 3 major classifiers we

have considered are:

1. Bayesian classifier

2. Artificial Neural Network

3. k-NNR

We also explored other classification methods:

1. Clustering (including c-means, SOFM, NG)

2. RBF

3. FastICA

14

2.3.1 Bayesian Classifier

Bayesian classifier is a probabilistic classifier based on Bayesian risk minimization. Ad-

equately characterized, sometimes Bayesian classifier can achieve better performance than more

sophisticated classifiers.

For simplicity 2, we assume the feature vectors are characterized by a parametrically adjustable

Gaussian distribution [10].

p(x) = (2π)−d

2 |Σ|−1

2 exp[−1

2(x− µ)TΣ−1(x− µ)] (2.1)

For a 2-class problem, classification decision is made for the input data based on the following

discriminant function:

−1

2(x− µ

1)TΣ−1

1 (x− µ1)−

1

2log |Σ1|+ log{P (ω1)}

ω1

>

<ω2

−1

2(x− µ

2)TΣ−1

2 (x− µ2)−

1

2log |Σ2|+ log{P (ω2)} (2.2)

The derivation of the discriminant function is in Appendix C.

2.3.1.1 Incorporation of Classification Risk Measures

In the real world, the apriori probability is sometimes not the crucial factor we care about

in classification. Risk of mistaking one important class into another will also be concerned. We

have known from Appendix A that AEP, the class we are interested in, has a much smaller apriori

probability (less than 2%) than other classes. In order to balance the training data, we added risk

into the Bayesian classifier. Define parameter λij as the risk of choosing class i when class j is the

true class. Assume there is no cost if choose the right class. The discriminant function turns into:

p(x|ω1)

p(x|ω2)

α1

>

<α2

λ12

λ21

P (ω2)

P (ω1)(2.3)

The derivation of the discriminant function is in Appendix C.

The Bayesian Classifier boundaries occur in regions where the number of training samples

belonging to the two most locally dominant classes are comparable [12]. However, it cannot give a

2Note in retrospect this assumption was only partially true. See Appendix A.

15

solution for complicate boundary cases (e.g. XOR problem). We need a more sophisticated classifier.

2.3.2 Artificial Neural Network

Artificial Neural Networks are known to facilitated classifier design and implementation.

ANN can learn from experience and implement complex decision surfaces although the training

period can be long [11].

Figure 2.3: Multilayer Feedforward Network using Back-Propagation Algorithm

The feedforward network structure is a popular and useful ANN structure. The most popular

training algorithm, namely backpropagation method, is used. Figure 2.3 3 illustrates a simple MLFF

net using BP algorithm. A typical back-propagation training architecture has one input layer, one

output layer, and at least one hidden layer. Figure 2.3 shows the typical structure of a MLFF

network. In this paper, we designed the FF network with one hidden layer. The size of the input

layer is the same as the size of the input vector, d. We set the size of hidden layer as 2d+1 [8].

For a 2-class model, the size of the output layer is 2 units, which indicate the output format is a

2 dimension vector, [0 1]T or [1 0]T . For a 3-class model, the size of the output layer is 3 units.,

which indicate the output format is a 3 dimension vector, [0 0 1]T , [0 1 0]T or [1 0 0]T .

3From http://www.geoneurale.com/MultilayerPerceptrons.htm

16

2.3.3 k-NNR

The k-nearest neighbor rule (k-NNR) is a straightforward, non-parametric method for clas-

sifying objects based on determining k closest training vectors in the feature space. It is the simplest

of all machine learning algorithms: a sample is classified by a majority vote of its k nearest neighbors

(k is a positive integer, typically a small odd number) [10]. Notice that k-NNR has a high compu-

tational complexity and is extremely time-consuming for large datasets. In our research, we used

smaller datasets by randomly choosing samples from original dataset (without repeat) for K-NNR

experiments. They are:

1. 2000-dataset: ’AEP 1000.txt’ & ’nonAEP 1000.txt’, with 1000 samples in each class.

2. 5000-dataset: ’AEP 2500.txt’ & ’nonAEP 2500.txt’, with 2500 samples in each class.

These data are extracted from dataset D100 with ’best 7’ annotation without repeat.

2.3.4 Unsupervised Learning: Clustering

In our problem, unsupervised learning methods (clustering strategies in our case) were used

to see if, for a given feature choice, there are any ’natural’ clusters.

Unsupervised learning is a way to detect ’natural groupings’ or clusters of patterns, without

labeled classes, optimization criterion, feedback, or any other information beyond the raw data and

a grouping principle. Clustering assigns a set of features into subsets so that features in the same

cluster are similar in some sense (e.g. the Euclidean distance between the vectors inside one subsets

are close). Notice that features are not labeled with class [10].

2.3.4.1 c-means

The essence of c-means is to achieve a self-consistent partitioning of the data.

The c-means algorithm is [10]: Given training set H = xk, k = 1, 2, ...

1. Choose the number of classes, c

2. Choose µ̂i, µ̂1, . . . µ̂c

17

3. Classify each xk in H

4. Recompute the estimates for µ̂i, using the results of 3

5. If the µ̂i are consistent, STOP; otherwise go to step 1, 2, or 3

2.3.4.2 Self-Organizing Feature Maps (SOFM)

SOFM reduces the input vector’s dimension by mapping feature space into a one or two

dimensional space. While training the network, only correct the weights topologically near the best-

matching unit (The unit that has the largest inner-product with input vector) [11].

The SOFM algorithm is [11]:

1. Define topological structure of the network, including the dimension of the network, the number

of units in each dimension.

2. Initialize the weights of the units randomly. Notice that the dimension of the units should be

the same with that of the input vector.

3. Compute and choose the best-matching unit.

4. Correct the weights surrounding the winner unit and within the correction radius, using neigh-

borhood function.

5. Iterate Step 3 to Step 4 for all the input vectors.

2.3.4.3 Neural Gas (NG)

The NG approach is an ordering-based approach to self-organization. For implicit sorting,

each unit’s weight correction is inversely proportional with the norm of its difference with the training

vector. 4

The NG algorithm is:

1. Define the number of units in the network.

2. Initialize the weights of the units randomly. Notice that the dimension of the units should be

the same with that of the input vector.

4Cite from teaching document ’Neural Gas Algorithms and Architectures for PR, RJ Schalkoff, March 27,2007’

18

3. Use implicit sorting (compute the relative distance between input vector and unit’s vector) to

correct the weights.

4. Iterate Step 3 for all the input vectors.

2.3.5 Radial Basis Function (RBF) Network

The RBF net is a network which has a similar structure as feedforward network but with

a layer of radial basis units. The hidden-layer of RBF unit employs a Gaussian activation function,

using known centers as Gaussian function’s mean, while the output units are simple linear units [11].

The two steps involved in training the RBF net are:

1. Use other clustering algorithm (e.g. c-means) to determine the RBF unit centers and thus

complement the hidden-layer.

2. Determine the hidden layer to output layer weights, in our case, a pseudoinverse formulation.

2.4 Window Size and Type

Appendix A shows that the length of AEP segments of D100 are varying from 31 to 185.

Thus choosing a reasonable window to fit the large variety of segments becomes crucial.

2.4.1 Window Size

To get the corresponding frequencies, peak values and wavelet coefficients of signal segments,

we need window the data. We do not want our AEP segments to be contaminated by nonAEP data

segments while we want the window long enough to cover the whole AEP event. The size of the

window is expected to be a nth power of 2 to benefit the FFT and WT. In order to determine a

proper window size, we examined the length of AEP segments. The detail of the segments’ length

is presented in Appendix A. Based on the information in Appendix A, we summarize the statistical

distribution of annotated AEP events’ lengths.

segment len<128 segment len>=128 segment len<64 segment len>=6410 11 0 21

Table 2.2: Distribution of AEP Segment Length in the Annotation of D20

19

segment len<128 segment len>=128 segment len<64 segment len>=6487 (91.58%) 8 19 76 (80.00%)

Table 2.3: Distribution of AEP Segment Length in the ’Original’ Annotation of D100

segment len<128 segment len>=128 segment len<64 segment len>=6476 (91.56%) 7 14 69 (83.13%)

Table 2.4: Distribution of AEP Segment Length in the ’Best 7’ Annotation of D100

Table 2.2 shows that no AEP segment is shorter than 64 temporal samples, in D20’s an-

notation. Table 2.3 shows that there are 20% AEP segments shorter than 64 temporal samples in

D100’s original annotation. Table 2.4 shows that there are 16.87% AEP segments shorter than 64

temporal samples in D100’s best 7 annotation. According to the statistical results above, we chose

a window length equal to 64 temporal samples. This window is 250ms in duration since the signal

was sampled at 256Hz. Theoretically this window length is short enough not to be contaminated by

non-annotated information, and is long enough since the lasting time of spikes is 20-70ms and that

of following waves is typically 70-200ms.

In our early research with D20, we also choose another window length as 129 temporal samples to see

whether the change of window size has a distinct or significant effect on final result. The 129-window

result did better than 64-window result when the risk rate (λ12/λ21) is from 0 to 20.

2.4.2 Window Type

The Hamming window, or the raised cosine window, is a smooth time function which can

reduce the spectral leakage in the frequency spectrum. It is a frequently used window in FFT.

For frequency domain analysis, we chose a Hamming window. For wavelet analysis, we chose a

rectangular window.

20

2.5 Classifier and Feature Set Performance Evaluation Method-

ology

2.5.1 Cross-Validation: Training by Patient

Due to the various morphologies of ETs, it is difficult to decide a reasonable size for a

dataset which is large enough for training and classification. We designed a ’leave half out’ method,

called cross-validation training by patient method, to show the trend of sensitivity, specificity and

selectivity while the size of training dataset varies.

The definition of training by patient strategy is shown below. It has the following steps:

Step 1: For each trial, randomly split D100 into 2 mutually exclusive 50-patient subsets. One is

for testing (ST ), the other is for training (H). For H, randomly choose 10 patients’ data and

build a 10-patient training set (H1). Use the same method to build a 20-patient training set

(H2), a 30-patient training set (H3), a 40-patient training set (H4) and a 50-patient training

set (H5).

Step 2: Build 5 ’classifiers’ respectively using the 5 training sets, Hi, i=1, 2, ... 5. For Bayesian

Classifier method, we have 5 groups of classifier parameters. For ANN method, we have 5

trained nets.

Step 3: Apply the classifiers on ST .

Step 4: Iterate Step 1 to Step 3 to appointed trials (In our case, 100 trails). When all the trials are

finished, compute the means and variance of the training set with same number of patients.

The reason of implement ’Step 4’ is to avoid the bias caused by randomly combination of

patients in training set.

2.5.2 K-Fold Method for Estimation

In k-fold cross-validation, the dataset D is randomly split into k mutually exclusive subsets,

D1, D2 ... Dk of approximately equal size. Then we train and test k times, where each time we

train on D \Dt and test on Dt, t=1, 2 ... k [7].

21

In the stratified cross-validation, the folds are stratified so that they contain approximately

the same proportions of labels as the original dataset [7]. This method is mainly used to evaluate

the performance of k-NNR classifier on the two smaller datasets. It is also applied on several feature

sets with Bayesian Classifier and ANN. k=10 was used.

22

Chapter 3

Results and Discussion

The majority of the results discussed in this chapter are derived from dataset D100 with the

’best 7’ annotation. Elements considered include:

1. Class number in model

2. Feature set used

3. Classifier model

4. preprocessing, e.g. filtering

3.1 Definitions Used in the Assessment of Classifier Perfor-

mance

There exist many measures to assess machine classification performance. Here we used:

1. True positive (TP): number of events marked as AEP signal by both the annotation and our

classifier

2. False positive (FP): number of events marked as AEP signal by our classifier but not by the

annotation

3. True negative (TN): number of events that were neither marked as AEP signal by our classifier

nor by the annotation

23

4. False negative (FN): number of AEP signal events marked by the annotation, which were not

marked by our classifier

5. Sensitivity is a system’s capacity to recognize positive events, given by:

Sensitivity = TP /(TP + FN)

6. Specificity is a system’s capacity to recognize negative activity, given by:

Specificity = TN /(TN + FP)

7. Selectivity, also called positive predictive value, is percentage of successful judgments by system

in positive detections:

Selectivity = TP /(TP + FP)

3.2 Cross-Validation Results

3.2.1 2-Class Model

The 2-class (c=2) case refers to ’AEP’ class and ’nonAEP’ class.

Here are the means of the sensitivity and the specificity of cross-validation results of different feature

sets with different classification methods in Figure 3.1 and Figure 3.2 (k-NNR results are in Section

3.3). In Figure 3.1, the trend of all the curves shows significant similarity. We have observed from

this plot that the means of the sensitivity go up approximate 20% while H holds more patients,

except the ’Bayesian Waveform+XY with 100 trails’. Meanwhile, in Figure 3.2, most of the means

of the specificity descend slightly (less than 10%), and even go up when the number of datasets

approach to 50 patients, except the ’Bayesian FFT+XY with 100 trails’, ’Bayesian Wavelet with

100 trails’ and ’Bayesian Wavelet+XY with 100 trails’ results. ’Bayesian FFT+XY with 100 trails’

and ’Bayesian Wavelet with 100 trails’ results decrease nearly 20% while ’Bayesian Wavelet+XY

with 100 trails’ decrease nearly 30%.

From Figure 3.1 to Figure 3.24, we can get following conclusions:

Conclusion 1: Only considering sensitivity in Figure 3.1, ’Bayesian Wavelet(+XY) with 100 trials’

yields the best result. The corresponding specificity, as shown in Figure 3.2, decreases and

levels off.

24

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Different Feature Sets with Risk Adjusted, CV

number of randomly selected EEG datasets(n) for training

Bayesian FFT+XY with 100 trialsBayesian Waveform+XY with 100 trialsANN FFT+XY with 100 trialsANN Waveform+XY with 100 trialsBayesian Waveform+FFT+XY with 100 trialsANN Waveform+FFT+XY with 100 trialsBayesian Wavelet with 100 trialsANN Wavelet with 100 trialsBayesian Wavelet+XY with 100 trialsANN Wavelet+XY with 100 trials

Figure 3.1: Mean of Cross-Validation Sensitivity of D100 annotated by ’best 7’ vs. Training Datasets(class = 2, by patient)

25

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Different Feature Sets with Risk Adjusted, CV



Figure 3.2: Mean of Cross-Validation Specificity of D100 annotated by ’best 7’ vs. Training Datasets(class = 2, by patient)

26

0 10 20 30 40 500

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1Variance of Sensitivity of Different Feature Sets with Risk Adjusted, CV



Figure 3.3: Variance of Cross-Validation Sensitivity of D100 annotated by ’best 7’ vs. TrainingDatasets (class = 2, by patient)

27

0 10 20 30 40 500

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1Variance of Specificity of Different Feature Sets with Risk Adjusted, CV


Figure 3.4: Variance of Cross-Validation Specificity of D100 annotated by ’best 7’ vs. TrainingDatasets (class = 2, by patient)

28

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.5: Bayesian Classifier, Sensitivity ofD100

Frequency Feature Set annotated by ’best 7’ with100 Trial, Risk Adjusted, Cross-Validation Results(class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.6: Bayesian Classifier, Specificity ofD100

Frequency Feature Set annotated by ’best 7’ with100 Trial, Risk Adjusted, Cross-Validation Results(class = 2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of FFT Feature Set with 100 Trials, Risk Adjusted, CV

number of randomly selected EEG data sets(n) for training

Figure 3.7: ANN, Sensitivity of D100 FrequencyFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.8: ANN, Specificity of D100 FrequencyFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

29

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of Waveform Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.9: Bayesian Classifier, Sensitivity ofD100

Waveform Feature Set annotated by ’best 7’ with100 Trial, Risk Adjusted, Cross-Validation Results(class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of Waveform Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.10: Bayesian Classifier, Specificity ofD100 Waveform Feature Set annotated by ’best7’ with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of Waveform Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.11: ANN, Sensitivity of D100 WaveformFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of Waveform Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.12: ANN, Specificity of D100 WaveformFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

30

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of Waveform−FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.13: Bayesian Classifier, Sensitivity ofD100 Waveform+Frequency Feature Set annotatedby ’best 7’ with 100 Trial, Risk Adjusted, Cross-Validation Results (class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of Waveform−FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.14: Bayesian Classifier, Specificity ofD100 Waveform+Frequency Feature Set annotatedby ’best 7’ with 100 Trial, Risk Adjusted, Cross-Validation Results (class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of Waveform−FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.15: ANN, Sensitivity of D100 Wave-form+Frequency Feature Set annotated by ’best7’ with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of Waveform−FFT Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.16: ANN, Specificity of D100 Wave-form+Frequency Feature Set annotated by ’best7’ with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

31

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of Wavelet Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.17: Bayesian Classifier, Sensitivity ofD100 Wavelet Feature Set annotated by ’Best 7’with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of Wavelet Feature Set with 100 Trials, Risk Adjusted, CV


Figure 3.18: Bayesian Classifier, Specificity ofD100 Wavelet Feature Set annotated by ’Best 7’with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.19: ANN, Sensitivity of D100 WaveletFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.20: ANN, Specificity of D100 WaveletFeature Set annotated by ’best 7’ with 100 Trial,Risk Adjusted, Cross-Validation Results (class =2)

32

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.21: Bayesian Classifier, Sensitivity ofD100 Wavelet+XY Feature Set annotated by ’best7’ with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.22: Bayesian Classifier, Specificity ofD100 Wavelet+XY Feature Set annotated by ’best7’ with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.23: ANN, Sensitivity of D100

Wavelet+XY Feature Set annotated by ’best 7’with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



Figure 3.24: ANN, Specificity of D100

Wavelet+XY Feature Set annotated by ’best 7’with 100 Trial, Risk Adjusted, Cross-ValidationResults (class = 2)

33

Conclusion 2: Increasing the number of features does not always guarantee a better sensitivity. In

Figure 3.1, the sensitivity of ’ANNWaveform+FFT+XY with 100 trials’ does not change much

from ’ANN Waveform+XY with 100 trials’ and is even lower than that of ’ANN FFT+XY with

100 trials’ when the H size is larger than 20 patients. Although the sensitivity of ’Bayesian

Waveform+FFT+XY with 100 trials’ is better than that of ’Bayesian Waveform+XY with 100

trials’, it is lower than that of ’Bayesian FFT+XY with 100 trials’. It only achieves the mean

of the other 2 results, rather than surpassing them.

Conclusion 3: Adding spatial information helps all feature sets’ sensitivity increase. An exception

is the wavelet feature set. Further research on spatial information will be described in Section

3.4.

Conclusion 4: Although some Bayesian Classifier sensitivity results are superior than others, by

examining the variances in Figure 3.3 and in Figure 3.4, we can conclude that the variances of

Bayesian Classifier results are always larger than corresponding results of ANN. So does the

span of results by trial, which we can conclude from Figure 3.5 to Figure 3.24. For example, in

Figure 3.17, the span of Bayesian Classifier sensitivity at 50 patients is from 0.2 to 1, while the

span of the corresponding ANN sensitivity (use the same feature set) shown in Figure 3.19 is

from 0.45 to 0.85. The causation of large variance and span in Bayesian Classification is that

there are overlaps of AEP and nonAEP datasets and part of the features are not following

a Gaussian Distribution as we assumed for the Bayesian Classifier. The distribution issue is

addressed in Appendix B.

Conclusion 5: Considering both the sensitivity performance in Figure 3.1 and the specificity per-

formance in Figure 3.2, the wavelet feature sets using ANN achieved relatively better results

than other feature sets. Future research is probably going to focus on wavelet feature set and

ANN classification methods.

3.2.2 3-Class Model

To see if classifier performance would have an improvement compare to the 2-class model, we

implemented a 3-class model. The 3-class (c=3) case refer to ’AEP’ class, ’AP’ class and ’NEP’ class.

34

Figure 3.25 and Figure 3.26 show the results of the unbalanced training sets, which means

the apriori probabilities are equal (prob. = 1/3) for the 3 classes.

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Mean of Sensitivity of Different Feature Sets with P(ωi)=1/3 (class = 3)


Bayesian FFT+XY with 100 trialsBayesian Waveform+XY with 100 trialsANN FFT+XY with 100 trialsANN Waveform+XY with 100 trialsBayesian Waveform+FFT+XY with 100 trialsANN Waveform+FFT+XY with 100 trialsBayesian Wavelet+XY with 100 trialsANN Wavelet+XY with 100 trials

Figure 3.25: Mean of Cross-Validation Sensitiv-ity of D100 annotated by ’best 7’, Unbalanced vs.Training Datasets (class = 3, by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Mean of Specificity of Different Feature Sets with P(ωi)=1/3 (class = 3)



Figure 3.26: Mean of Cross-Validation Specificityof D100 annotated by ’best 7’, Unbalanced vs.Training Datasets (class = 3, by patient)

Figure 3.27 and Figure 3.28 show the results for balanced training sets, which means the

apriori probabilities are equal (AEP prob. = 1/2, AP prob.= NEP prob. = 1/4) for AEP and

nonAEP.

If one class dominates the training set (unbalanced), i.e. the preponderance of data comes

from one class in the training set in ANN (or the apriori probability is greater than that of the

other classes in Bayesian Classifier), the classifier tends to favor this class. Hence this class, in our

case, the nonAEP class, will achieve a low error rate as shown in the two specificity result figures.

Although all the curves in both figures vary above 0.65, as a whole the curves in Figure 3.26, which

illustrates the unbalanced case results, are concentrated between 0.8—0.9 while half of the curves

in Figure 3.28, which illustrates the balanced case results, are lower and varying between 0.7—0.8.

However, other measures (e.g. sensitivity) may suffer. The consequence is that the sensitivities

of the unbalanced cases are worse than that of the balanced cases while the specificities go down

(as shown). Comparing the two sensitivity result figures, the points on the curve in Figure 3.25

(unbalanced) always have a lower value than its counterpart in Figure 3.27 (balanced). For this

reason, we typically balanced H for training.

For ANN, there are other factors that will affect the final results since the ANN is memorizing

features by not only the presences of features but also the structure of the network. Adding one

35

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Different Feature Sets, Balanced CV (class = 3)



Figure 3.27: Mean of Cross-Validation Sensitiv-ity of D100 annotated by ’best 7’, Balanced vs.Training Datasets (class = 3, by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Different Feature Sets, Balanced CV (class = 3)



Figure 3.28: Mean of Cross-Validation Specificityof D100 annotated by ’best 7’, Balanced vs. Train-ing Datasets (class = 3, by patient)

36

output unit involves 2d+ 1 new factors in weight correction, which can explain the reason that the

sensitivities of the 3-class model are different, in our case lower, than that of the 2-class model.

3.3 k-Fold Results of k-NNR

k-NNR classification experiments were undertaken to assess feature performance and some-

what divorced from a parametric classifier. Experiments were done with fixed cardinality of H

(2000-dataset/5000-dataset). In the k-NNR experiments, we chose k = 1, 3, 5 and compared the

results for different k.

3.3.1 2-Class Model for k-NNR

Figure 3.29 shows an example of one small feature set (2000-dataset)’s 1 k-fold results by

K-NNR. From this plot, we have observed that while the number of nearest neighbor increases in

K-NNR, the sensitivity increases slightly. Meanwhile the specificity descends.

Figure 3.29 to Figure 3.36 show all the results of the c=2 problem.

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of FFT 2000 Feature Set, k−fold

serial number of trial

k=1 Sensitivityk=3 Sensitivityk=5 Sensitivityk=1 Specificityk=3 Specificityk=5 Specificity

Figure 3.29: Sensitivity and Specificity of Fre-quency ’AEP 1000’ & ’nonAEP 1000’ Datasetwith ’k’ Neighbors, k-fold (class = 2)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of FFT 5000 Feature Set, k−fold



Figure 3.30: Sensitivity and Specificity of Fre-quency ’AEP 2500’ & ’nonAEP 2500’ Datasetwith ’k’ Neighbors, k-fold (class = 2)

The k-NNR results are significantly better than that of the ANN and Bayesian Classifier.

One interpretation is: The Frequency and Wavelet feature vectors were extracted by applying a

window. Two temporal samples may share data information if their temporal distance is less than

1Described in Section 2.3.3

37

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Waveform 2000 Feature Set, k−fold



Figure 3.31: Sensitivity and Specificity of Wave-form ’AEP 1000’ & ’nonAEP 1000’ Dataset with’k’ Neighbors, k-fold (class = 2)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Waveform 5000 Feature Set, k−fold



Figure 3.32: Sensitivity and Specificity of Wave-form ’AEP 2500’ & ’nonAEP 2500’ Dataset with’k’ Neighbors, k-fold (class = 2)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Waveform+FFT 2000 Feature Set, k−fold



Figure 3.33: Sensitivity and Specificity of Wave-form+Frequency ’AEP 1000’ & ’nonAEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 2)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Waveform+FFT 5000 Feature Set, k−fold



Figure 3.34: Sensitivity and Specificity of Wave-form+Frequency ’AEP 2500’ & ’nonAEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 2)

38

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Wavelet+XY 2000 Feature Set, k−fold



Figure 3.35: Sensitivity and Specificity ofWavelet+XY ’AEP 1000’ & ’nonAEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 2)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1KNNR Cross Validation of Wavelet+XY 5000 Feature Set, k−fold



Figure 3.36: Sensitivity and Specificity ofWavelet+XY ’AEP 2500’ & ’nonAEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 2)

half of the window size which causes similarity between their features. The more temporally closer

the samples are, the higher similarity they are going to show. If two close samples (apparently in the

same class) were selected to H and ST respectively, the one in H will be discovered as the nearest

neighbor of the other in ST , thereby k-NNR yields significant high sensitivity and specificity.

3.3.2 3-Class Model for k-NNR

Change the class number of the model and leave the rest of the code alone. We yield 3-class

(c=3) unbalanced model results, from Figure 3.37 to Figure 3.68. There are two more new types of

results in 3-class model other than sensitivity and specificity. They are AP sensitivity, the accuracy

of AP class:

AP sensitivity = Number of AP samples classified into the right class (AP class) / Total number of

AP samples;

and NEP sensitivity, the accuracy of NEP class:

NEP sensitivity = Number of NEP samples classified into the right class (NEP class) / Total number

of NEP samples.

Our aim is to compare the sensitivity and specificity of the 3-class model with that of the 2-class

model. The purpose that we introduce AP sensitivity and NEP sensitivity is to observe how the

3-class model effect each class.

With reference to the 2-class model, the corresponding sensitivity and specificity of 3-class

39

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.37: Sensitivity of Frequency ’AEP 1000’,’wf AP 1000’ & ’wf NEP 1000’ Dataset with ’k’Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.38: AP Sensitivity of Frequency’AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.39: NEP Sensitivity of Frequency’AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.40: Specificity of Frequency ’AEP 1000’,’wf AP 1000’ & ’wf NEP 1000’ Dataset with ’k’Neighbors, k-fold (class = 3)

40

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.41: Sensitivity of Frequency ’AEP 2500’,’wf AP 2500’ & ’wf NEP 2500’ Dataset with ’k’Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.42: AP Sensitivity of Frequency’AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.43: NEP Sensitivity of Frequency’AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of FFT Feature Set


k=1k=3k=5

Figure 3.44: Specificity of Frequency ’AEP 2500’,’wf AP 2500’ & ’wf NEP 2500’ Dataset with ’k’Neighbors, k-fold (class = 3)

41

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.45: Sensitivity of Waveform’wf AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.46: AP Sensitivity of Waveform’wf AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.47: NEP Sensitivity of Waveform’wf AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.48: Specificity of Waveform’wf AEP 1000’, ’wf AP 1000’ & ’wf NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

42

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.49: Sensitivity of Waveform’wf AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.50: AP Sensitivity of Waveform’wf AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.51: NEP Sensitivity of Waveform’wf AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Waveform Feature Set


k=1k=3k=5

Figure 3.52: Specificity of Waveform’wf AEP 2500’, ’wf AP 2500’ & ’wf NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

43

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.53: Sensitivity of Wave-form+Frequency ’wfft AEP 1000’, ’wfft AP 1000’& ’wfft NEP 1000’ Dataset with ’k’ Neighbors,k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.54: AP Sensitivity of Wave-form+Frequency ’wfft AEP 1000’, ’wfft AP 1000’& ’wfft NEP 1000’ Dataset with ’k’ Neighbors,k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.55: NEP Sensitivity of Wave-form+Frequency ’wfft AEP 1000’, ’wfft AP 1000’& ’wfft NEP 1000’ Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.56: Specificity of Wave-form+Frequency ’wfft AEP 1000’, ’wfft AP 1000’& ’wfft NEP 1000’ Dataset with ’k’ Neighbors,k-fold (class = 3)

44

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.57: Sensitivity of Wave-form+Frequency ’wfft AEP 2500’, ’wfft AP 2500’& ’wfft NEP 2500’ Dataset with ’k’ Neighbors,k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.58: AP Sensitivity of Wave-form+Frequency ’wfft AEP 2500’, ’wfft AP 2500’& ’wfft NEP 2500’ Dataset with ’k’ Neighbors,k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.59: NEP Sensitivity of Wave-form+Frequency ’wfft AEP 2500’, ’wfft AP 2500’& ’wfft NEP 2500’ Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Waveform+FFT Feature Set


k=1k=3k=5

Figure 3.60: Specificity of Wave-form+Frequency ’wfft AEP 2500’, ’wfft AP 2500’& ’wfft NEP 2500’ Dataset with ’k’ Neighbors,k-fold (class = 3)

45

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.61: Sensitivity of Wavelet+XY’wl AEP 1000’, ’wl AP 1000’ & ’wl NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.62: AP Sensitivity of Wavelet+XY’wl AEP 1000’, ’wl AP 1000’ & ’wl NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.63: NEP Sensitivity of Wavelet+XY’wl AEP 1000’, ’wl AP 1000’ & ’wl NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.64: Specificity of Wavelet+XY’wl AEP 1000’, ’wl AP 1000’ & ’wl NEP 1000’Dataset with ’k’ Neighbors, k-fold (class = 3)

46

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AEP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.65: Sensitivity of Wavelet+XY’wl AEP 2500’, ’wl AP 2500’ & ’wl NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of AP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.66: AP Sensitivity of Wavelet+XY’wl AEP 2500’, ’wl AP 2500’ & ’wl NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Sensitivity of NEP of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.67: NEP Sensitivity of Wavelet+XY’wl AEP 2500’, ’wl AP 2500’ & ’wl NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Specificity of KNN Cross Validation of Wavelet+XY Feature Set


k=1k=3k=5

Figure 3.68: Specificity of Wavelet+XY’wl AEP 2500’, ’wl AP 2500’ & ’wl NEP 2500’Dataset with ’k’ Neighbors, k-fold (class = 3)

47

model remain almost the same level. Referring to Figure 3.36, Figure 3.65 and Figure 3.68, both

sensitivity and specificity did not vary over 10%. Although the error rate of AP and NEP each is

relatively high, refering to Figure 3.66, Figure 3.67.

3.4 Effect of Spatial Information in Feature Set

Figure 3.69 to Figure 3.74 show the sensitivity, specificity and selectivity results of feature

set with/without spatial features based on D20.

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sens

itivi

ty

sensitivity of all data

6wave+2−feature4freq+2−feature4psd+2−feature4freq+2−feature64w4psd+2−feature64w6wave+2−feature−flt6wave−feature4freq−feature4psd−feature4freq−feature64w4psd−feature64w6wave−feature−flt

Figure 3.69: Sensitivity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sens

itivi

ty



Figure 3.70: Sensitivity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 1

Figure 3.70 shows the sensitivity results with the risk ratio ranging from 0 to 1. The dash

lines are corresponding to the solid lines with the same color. The difference is the dash lines

are results yielded without spatial information while the solid line are results yielded with spatial

information. In the plot, we observe that the sensitivities using spatial information are always higher

than their counterparts without spatial information. This is also true in Figure 3.69, which zooms

out the scale of the risk ratio range to 0—50.

Figure 3.72 shows the specificity results with the risk ratio ranging from 0 to 1. we observe

that the difference between feature sets with or without spatial information are not distinct. The

only exception are the 2 PSD related feature sets: The specificity without spatial information is

lower than that with spatial information when the risk ratio are less than 0.4 (0.6 for 129-window

result); then it exceeds. However, from Figure 3.71, we see that all the specificity results tend to be

48

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

spec

ifici

ty

specificity of all data


Figure 3.71: Specificity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

spec

ifici

ty



Figure 3.72: Specificity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 1

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sele

ctiv

ity

selectivity of all data


Figure 3.73: Selectivity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sele

ctiv

ity



Figure 3.74: Selectivity of D20 Feature Sets with-/without Spatial Info for Risk Ratio Range from0 to 1

49

the same (approach to 1).

From the above observation, we concluded that spatial information can increase sensitivity

without damaging the performance of specificity much while combining with waveform or frequency

features.

3.5 Comparison of 2-Class Model and 3-Class Model

To illustrate the performance of different feature sets, classifiers and models, use confusion

matrix to show the sensitivity, specificity and correction rate. In a confusion matrix, each row

represents the samples in a predicted class, while each column represents the proportion of samples

in an actual class. The sum of each column is 1. Figure 3.75 shows a simple example of confusion

matrix:

Figure 3.75: Confusion Matrix Sample

Tables 3.1, 3.2 and 3.3 show the confusion matrix of the class = 3 problem and class =

2 problem under the condition of respectively applying Bayesian Classifier, ANN and K-NNR on

Frequency, Waveform or Waveform+Frequency Feature Sets.

Actual Class

Bayesian classifier ann

Predicted Class (balanced) (epochs = 50)

AEP AP NEP AEP AP NEP nonAEP

FFT 10 AEP 28.64% 15.23% 19.25% 25.75% 13.03% 15.76%

Patients AP 20.34% 53.15% 13.59% 25.91% 60.73% 18.78% 85.36%

50

NEP 51.02% 31.62% 67.16% 48.34% 26.24% 65.46%

20 AEP 30.48% 16.55% 19.63% 36.94% 14.28% 20.50%

Patients AP 17.82% 50.65% 9.82% 22.79% 65.55% 16.93% 82.07%

NEP 51.70% 32.80% 70.55% 40.27% 20.17% 62.57%

30 AEP 30.72% 16.08% 18.17% 43.36% 14.22% 22.03%

Patients AP 15.56% 48.97% 8.29% 21.50% 67.65% 15.34% 81.21%

NEP 53.72% 34.95% 73.54% 35.14% 18.13% 62.63%

40 AEP 30.86% 15.32% 16.00% 48.08% 14.29% 22.88%

Patients AP 13.86% 47.75% 7.25% 19.18% 68.66% 14.51% 80.68%

NEP 55.28% 36.93% 76.75% 32.74% 17.06% 62.62%

50 AEP 28.90% 14.18% 13.64% 49.06% 14.04% 22.83%

Patients AP 12.57% 46.80% 6.80% 18.69% 69.60% 14.58% 80.80%

NEP 58.53% 39.02% 79.56% 32.25% 16.37% 62.59%

Waveform 10 AEP 22.32% 14.51% 16.10% 28.69% 13.99% 17.56%

Patients AP 21.22% 55.95% 10.77% 23.39% 66.01% 16.11% 83.94%

NEP 56.46% 29.55% 73.13% 47.93% 20.00% 66.33%

20 AEP 19.13% 9.46% 10.80% 33.93% 12.93% 20.89%

Patients AP 18.21% 56.19% 7.80% 20.18% 72.21% 13.90% 82.40%

NEP 62.66% 34.35% 81.40% 45.88% 14.86% 65.21%

30 AEP 17.51% 8.26% 8.21% 38.14% 12.80% 22.42%

Patients AP 15.48% 53.17% 5.51% 19.09% 73.59% 12.93% 81.59%

NEP 67.01% 38.56% 86.28% 42.77% 13.62% 64.65%

40 AEP 17.28% 7.10% 7.29% 42.58% 12.52% 23.52%

Patients AP 13.79% 52.60% 4.45% 17.96% 74.98% 12.53% 81.03%

NEP 68.93% 40.30% 88.26% 39.46% 12.50% 63.94%

50 AEP 16.29% 6.13% 6.31% 44.14% 12.13% 24.40%

Patients AP 12.71% 51.09% 3.72% 17.07% 75.68% 12.23% 80.68%

NEP 71.00% 42.78% 89.97% 38.79% 12.19% 63.37%

Waveform 10 AEP 24.13% 12.67% 16.04% 25.49% 10.00% 14.29%

+FFT Patients AP 18.96% 62.22% 11.41% 21.06% 66.63% 15.46% 87.49%

NEP 56.92% 25.11 72.54% 53.45% 23.37% 70.25%

20 AEP 22.25% 9.99% 11.89% 31.62% 9.29% 17.32%

Patients AP 17.19% 60.95% 8.02% 19.38% 72.89% 13.57% 86.01%

NEP 60.57% 29.05% 80.09% 49.00% 17.82% 69.11%

30 AEP 23.18% 9.51% 9.93% 36.99% 9.23% 19.20%

Patients AP 14.01% 58.39% 6.54% 17.03% 76.13% 12.43% 84.91%

NEP 62.82% 32.10% 83.53% 45.97% 14.64% 68.36%

40 AEP 23.34% 8.83% 8.87% 40.66% 9.26% 20.27%

51

Patients AP 12.26% 57.60% 5.53% 16.26% 77.18% 12.12% 84.28%

NEP 64.41% 33.57% 85.61% 43.07% 13.56% 67.61%

50 AEP 22.84% 8.49% 7.96% 45.37% 9.25% 21.39%

Patients AP 10.79% 56.16% 4.89% 15.22% 78.08% 11.80% 83.64%

NEP 66.37% 35.35% 87.15% 39.40% 12.67% 66.80%

Wavelet 10 AEP 22.69% 8.21% 11.95% 28.26% 8.10% 13.19%

+XY Patients AP 20.26% 68.64% 6.26% 14.61% 73.13% 11.97% 88.92%

NEP 57.05% 23.15 81.80% 57.12% 18.76% 74.84%

20 AEP 24.86% 8.17% 9.10% 38.42% 7.24% 16.37%

Patients AP 17.45% 66.38% 4.24% 11.28% 78.10% 9.66% 87.43%

NEP 57.69% 25.44% 86.66% 50.30% 14.66% 73.98%

30 AEP 28.40% 9.66% 8.63% 44.46% 7.00% 17.75%

Patients AP 13.38% 63.64% 3.25% 9.06% 80.71% 9.14% 86.70%

NEP 58.22% 26.70% 88.12% 46.47% 12.29% 73.11%

40 AEP 29.50% 10.52% 9.00% 48.84% 6.89% 18.31%

Patients AP 10.99% 62.24% 2.72% 8.96% 81.95% 8.83% 86.43%

NEP 59.51% 27.24% 88.28% 42.20% 11.16% 72.85%

50 AEP 29.51% 11.23% 8.60% 53.70% 6.88% 19.26%

Patients AP 9.74% 60.73% 2.39% 8.15% 82.59% 8.53% 85.90%

NEP 60.75% 28.04% 89.01% 38.16% 10.53% 72.22%

Table 3.1: Confusion Matrix of Bayesian Classify and ANN of class = 3 Problem

Actual Class

Bayesian classifier ann

Predicted Class (balanced) (epochs = 50)

AEP nonAEP AEP nonAEP

FFT 10 AEP 39.75% 29.46%

Patients nonAEP 72.33% 82.24%

20 AEP 51.06% 38.05%


30 AEP 61.45% 45.53%


40 AEP 64.39% 51.76%


50 AEP 68.46% 55.10%


52

Waveform 10 AEP 36.08% 30.35%


20 AEP 41.39% 38.68%


30 AEP 44.69% 43.69%


40 AEP 44.16% 45.51%


50 AEP 44.11% 47.96%


Waveform 10 AEP 35.78% 32.94%

+FFT Patients nonAEP 74.02% 82.45%

20 AEP 46.07% 37.49%


30 AEP 53.31% 41.71%


40 AEP 54.22% 46.83%


50 AEP 56.66% 50.07%


Wavelet 10 AEP 53.98% 48.24%


20 AEP 64.12% 56.28%


30 AEP 69.01% 61.43%


40 AEP 71.10% 63.79%


50 AEP 78.01% 65.93%


Wavelet 10 AEP 40.15% 31.09%

+XY Patients nonAEP 69.71% 85.60%

20 AEP 60.61% 42.08%


30 AEP 68.98% 48.64%


40 AEP 74.76% 53.38%


53

50 AEP 78.33% 56.74%


Table 3.2: Confusion Matrix of Bayesian Classify and ANN of class = 2 Problem

Actual Class

AEP AP NEP AEP nonAEP

FFT K-NNR 2000 AEP 83.56% 15.02% 13.95% 89.20%

with Data non- AP 7.63% 69.57% 13.99% 79.10%

k=1 Set AEP NEP 8.81% 15.41% 72.06%

5000 AEP 71.85% 16.49% 17.67% 81.84%

Data non- AP 14.39% 38.42% 47.1% 71.16%

Set AEP NEP 13.76% 45.08% 35.22%

K-NNR 2000 AEP 88.29% 18.15% 18.47% 92.80%

with Data non- AP 4.87% 67.60% 9.28% 75.70%

k=3 Set AEP NEP 6.84% 14.25% 72.25%

5000 AEP 83.55% 26.08% 27.72% 85.68%

Data non- AP 8.18% 37.12% 38.64% 66.04%

Set AEP NEP 8.27% 36.79% 33.64%

K-NNR 2000 AEP 87.07% 18.35% 20.09% 90.40%

with Data non- AP 6.04% 67.98% 10.34% 74.40%

k=5 Set AEP NEP 6.89% 13.67% 69.57%

5000 AEP 84.00% 26.34% 27.61% 85%

Data non- AP 9.62% 41.47% 42.31% 65.44%

Set AEP NEP 6.38% 32.19% 30.08%

Waveform K-NNR 2000 AEP 84.30% 12.00% 17.30% 87.90%

with Data non- AP 6.30% 68.20% 18.40% 75.20%

k=1 Set AEP NEP 9.40% 19.80% 64.30%

5000 AEP 96.20% 7.64% 8.88% 96.28%

Data non- AP 1.80% 41.28% 58.20% 83.08%

Set AEP NEP 2.00% 51.08% 32.92%

K-NNR 2000 AEP 82.20% 19.70% 26.10% 85.00%

with Data non- AP 8.00% 66.70% 15.40% 73.10%

k=3 Set AEP NEP 9.80% 13.60% 58.50%

5000 AEP 93.72% 17.64% 18.72% 92.12%

Data non- AP 2.88% 38.88% 47.12% 74.88%

Set AEP NEP 3.40% 43.48% 34.16%

54

K-NNR 2000 AEP 77.80% 20.20% 27.60% 82.30%

with Data non- AP 9.80% 67.90% 16.90% 71.30%

k=5 Set AEP NEP 12.40% 11.90% 55.50%

5000 AEP 92.12% 21.72% 23.32% 89.12%

Data non- AP 4.60% 40.04% 46.04% 72.04%

Set AEP NEP 3.28% 38.24% 30.64%

Bayesian 2000 AEP 21.20% 2.90% 4.70% 33.40%

Classifier Data non- AP 11.80% 54.60% 3.60% 89.30%

Set AEP NEP 67.00% 42.50% 91.70%

5000 AEP 26.24% 5.40% 6.12% 57.32%

Data non- AP 60.96% 86.96% 85.72% 75.52%

Set AEP NEP 12.80% 7.64% 8.16%

ANN 2000 AEP 72.30% 13.80% 24.70% 75.10%

Data non- AP 11.60% 76.10% 12.80% 72.00%

Set AEP NEP 16.10% 10.10% 62.50%

5000 AEP 77.92% 23.48% 24.60% 79.40%

Data non- AP 13.52% 51.60% 50.16% 75.00%

Set AEP NEP 8.56% 24.92% 25.24%

Waveform K-NNR 2000 AEP 73.40% 13.60% 23.70% 78.80%

+FFT with Data non- AP 9.40% 64.60% 19.80% 66.30%

k=1 Set AEP NEP 17.20% 21.80% 56.50%

5000 AEP 79.28% 14.64% 16.36% 85.68%

Data non- AP 9.92% 39.48% 46.84% 73.32%

Set AEP NEP 10.80% 45.88% 36.80%

K-NNR 2000 AEP 74.90% 20.70% 33.00% 76.80%

with Data non- AP 8.40% 63.10% 15.20% 66.70%

k=3 Set AEP NEP 16.70% 16.20% 51.80%

5000 AEP 85.72% 25.04% 27.40% 86.36%

Data non- AP 6.96% 35.84% 38.88% 70.64%

Set AEP NEP 7.32% 39.12% 33.72%

K-NNR 2000 AEP 74.40% 18.20% 31.50% 77.40%

with Data non- AP 10.70% 67.40% 18.90% 66.90%

k=5 Set AEP NEP 14.90% 14.40% 49.60%

5000 AEP 83.44% 23.80% 27.00% 85.84%

Data non- AP 10.40% 42.68% 42.04% 69.52%

Set AEP NEP 6.16% 33.52% 30.96%

Bayesian 2000 AEP 26.50% 6.30% 6.50% 55.80%


Set AEP NEP 68.50% 39.50% 89.60%

55

5000 AEP 30.88% 4.64% 5.88% 72.44%

Data non- AP 56.40% 86.08% 84.16% 70.72%

Set AEP NEP 12.72% 9.28% 9.96%

ANN 2000 AEP 76.20% 9.80% 22.60% 81.00%

Data non- AP 8.60% 79.50% 11.90% 77.70%

Set AEP NEP 15.20% 10.70% 65.50%

5000 AEP 83.32% 20.52% 20.96% 85.16%

Data non- AP 7.08% 44.84% 43.80% 77.84%

Set AEP NEP 9.60% 34.64% 35.24%

Wavelet K-NNR 2000 AEP 79.70% 8.30% 20.20% 84.60%

+XY with Data non- AP 3.80% 80.80% 9.90% 76.50%

k=1 Set AEP NEP 16.50% 10.90% 69.90%

5000 AEP 82.92% 11.56% 13.44% 88.88%

Data non- AP 8.32% 41.92% 47.40% 76.52%

Set AEP NEP 8.76% 46.52% 39.16%

K-NNR 2000 AEP 80.90% 10.20% 23.30% 84.30%

with Data non- AP 2.70% 79.10% 7.30% 75.20%

k=3 Set AEP NEP 16.40% 10.70% 69.40%

5000 AEP 87.96% 19.84% 21.68% 88.96%

Data non- AP 5.80% 40.28% 43.08% 75.64%

Set AEP NEP 6.24% 39.88% 35.24%

K-NNR 2000 AEP 81.40% 9.60% 22.90% 85.20%

with Data non- AP 2.70% 79.80% 7.70% 75.50%

k=5 Set AEP NEP 15.90% 10.60% 69.40%

5000 AEP 87.72% 19.40% 20.32% 89.52%

Data non- AP 7.88% 43.52% 45.68% 74.60%

Set AEP NEP 4.40% 37.08% 34.00%

Bayesian 2000 AEP 38.70% 8.00% 10.20% 96.30%


Set AEP NEP 59.80% 22.10% 89.10%

5000 AEP 40.12% 7.24% 9.16% 95.76%

Data non- AP 34.52% 70.24% 72.64% 34.96%

Set AEP NEP 25.36% 22.52% 18.20%

ANN 2000 AEP 86.40% 3.90% 18.00% 84.20%

Data non- AP 2.60% 90.60% 6.10% 80.70%

Set AEP NEP 11.00% 5.50% 75.90%

5000 AEP 85.32% 15.36% 17.16% 87.28%

Data non- AP 6.72% 49.24% 43.92% 83.44%

56

Set AEP NEP 7.96% 35.40% 38.92%

Table 3.3: Confusion Matrix of k-fold

3.5.1 Assessment of the Results

Comparing the sensitivity and specificity of Table 3.1 and Table 3.2, We can conclude that

the sensitivity and specificity of ANN are more stable and the variances are much smaller than that

of Bayesian Classifier although the latter achieves a remarkable low error rate in the nonAEP classes,

especially in NEP class. Using Wavelet+XY results as example, the sensitivity of 50-patient H for

the 3-class model based Bayesian Classifier is 29.51% while that of 2-class model is 78.33%. The

difference is 48.82%. Let all the variables remain the same but use ANN as classification method.

The sensitivity of 50-patient H for 3-class model is 53.70% while that of 2-class model is 56.74%. The

difference is only 3.04%. Although the sensitivity value of 2-class model based Bayesian Classifier

is the highest, recalling the results in Section 3.2 that Bayesian Classifier has higher variance and

larger span, we cannot conclude that Bayesian Classifier achieves best results.

All the sensitivities of 3-class model, shown in Table 3.1, are below 31% no matter how large

the H is. The mean of the correction rate of NEP in 3-class model are over 80% and that of AP are

over 50%, the cost of which is to give up the quality of sensitivity.

By reviewing Table 3.1 and Table 3.2, we see that the specificity is always better than the

corresponding sensitivity. An important factor contributing to the relatively better specificity than

that of sensitivity is the unbalanced patient information. In the D100 dataset, there are only 31

patients providing AEP data segments2 while all the 100 patients providing nonAEP data segments.

Another factor is the percentage of data used for training. In Table 3.3, in contradiction

to the above results, almost all the sensitivities are better than the corresponding specificities since

these results are from a different evaluation method and different datasets. Smaller datasets reduced

the effect of unbalanced apriori H influence.

2’best 7’ condition. For original annotation the number is 33 patient.

57

3.6 Comparison of CV Results between ’Original’ Annota-

tion and ’Best 7’ Annotation

We have know that ’original’ annotation and ’best 7’ annotation are voted by different re-

viewers. It is believed that ’best 7’ annotation are more reliable. There are 28 conflicting cases in

’original’ annotation but only 6 conflicting cases in ’best 7’ annotation. In order to reduce distur-

bance, we removed these conflicting cases when we built datasets for H and ST .

Figure 3.1 shows the cross-validation sensitivity of different D100 feature sets yielding by the

’best 7’ annotation. Figure 3.2 shows the cross-validation specificity of different D100 feature sets

yielding by the ’best 7’ annotation. Figure 3.76 shows the cross-validation sensitivity of different

D100 feature sets yielding by the original 7 annotation. Figure 3.77 shows the cross-validation

specificity of different D100 feature sets yielding by the original 7 annotation.

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Different Feature Sets


Bayesian FFT+XY with 100 trialsBayesian Waveform+XY with 100 trialsANN FFT+XY with 100 trialsANN Waveform+XY with 100 trialsBayesian Waveform+FFT+XY with 100 trialsANN Waveform+FFT+XY with 100 trials

Figure 3.76: Mean of Cross-Validation Sensitivityof D100 annotated by original 7 annot. vs. Train-ing Datasets(class = 2, by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Different Feature Sets


Bayesian FFT+XY with 100 trialsBayesian Waveform+XY with 100 trialsANN FFT+XY with 100 trialsANN Waveform+XY with 100 trialsBayesian Waveform+FFT+XY with 100 trialsANN Waveform+FFT+XY with 100 trials

Figure 3.77: Mean of Cross-Validation Specificityof D100 annotated by original 7 annot. vs. Train-ing Datasets(class = 2, by patient)

To compare the details of the results of two annotations while they are under same condition,

consider Figure 3.78 to Figure 3.89. From these figures, we can conclude that the difference between

the results from the original 7 annotation and those from the ’best 7’ annotation are not significant.

This is hardly surprising since we removed conflicting annotated data while building both training

and test sets.

58

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of ANN FFT Feature Sets with Risk Adjusted, CV


7 ScorersBest 7 Scorers

Figure 3.78: Comparison of Mean of ANN Cross-Validation Sensitivity of D100 Frequency FeatureSet annotated by original 7 annot. and by ’best 7’annot. vs. Training Datasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of ANN FFT Feature Sets with Risk Adjusted, CV



Figure 3.79: Comparison of Mean of ANN Cross-Validation Specificity of D100 Frequency FeatureSet annotated by original 7 annot. and by ’best 7’annot. vs. Training Datasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of ANN Waveform Feature Sets with Risk Adjusted, CV



Figure 3.80: Comparison of Mean of ANN Cross-Validation Sensitivity of D100 Waveform FeatureSet annotated by original 7 annot. and by ’best 7’annot. vs. Training Datasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of ANN Waveform Feature Sets with Risk Adjusted, CV



Figure 3.81: Comparison of Mean of ANN Cross-Validation Specificity of D100 Waveform FeatureSet annotated by original 7 annot. and by ’best 7’annot. vs. Training Datasets(by patient)

59

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of ANN Waveform−FFT Feature Sets with Risk Adjusted, CV



Figure 3.82: Comparison of Mean of ANNCross-Validation Sensitivity of D100 Waveform-Frequency Feature Set annotated by original 7annot. and by ’best 7’ annot. vs. TrainingDatasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of ANN Waveform−FFT Feature Sets with Risk Adjusted, CV



Figure 3.83: Comparison of Mean of ANNCross-Validation Specificity of D100 Waveform-Frequency Feature Set annotated by original 7annot. and by ’best 7’ annot. vs. TrainingDatasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Bayesian FFT Feature Sets with Risk Adjusted, CV



Figure 3.84: Comparison of Mean of BayesianClassifier Cross-Validation Sensitivity of D100 Fre-quency Feature Set annotated by original 7 annot.and by ’best 7’ annot. vs. Training Datasets(bypatient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Bayesian FFT Feature Sets with Risk Adjusted, CV



Figure 3.85: Comparison of Mean of BayesianClassifier Cross-Validation Specificity of D100 Fre-quency Feature Set annotated by original 7 annot.and by ’best 7’ annot. vs. Training Datasets(bypatient)

60

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Bayesian Waveform Feature Sets with Risk Adjusted, CV



Figure 3.86: Comparison of Mean of BayesianClassifier Cross-Validation Sensitivity of D100

Waveform Feature Set annotated by original 7annot. and by ’best 7’ annot. vs. TrainingDatasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Bayesian Waveform Feature Sets with Risk Adjusted, CV



Figure 3.87: Comparison of Mean of BayesianClassifier Cross-Validation Specificity of D100

Waveform Feature Set annotated by original 7annot. and by ’best 7’ annot. vs. TrainingDatasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Sensitivity of Bayesian Waveform−FFT Feature Sets with Risk Adjusted, CV



Figure 3.88: Comparison of Mean of BayesianClassifier Cross-Validation Sensitivity of D100

Waveform-Frequency Feature Set annotated byoriginal 7 annot. and by ’best 7’ annot. vs. Train-ing Datasets(by patient)

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Mean of Specificity of Bayesian Waveform−FFT Feature Sets with Risk Adjusted, CV



Figure 3.89: Comparison of Mean of BayesianClassifier Cross-Validation Specificity of D100

Waveform-Frequency Feature Set annotated byoriginal 7 annot. and by ’best 7’ annot. vs. Train-ing Datasets(by patient)

61

3.7 Clustering Analysis and RBF Classifier

To see if there are any natural clusters in the dataset, we implemented 4 clustering or related

approaches. They are:

1. c-means

2. SOFM (Self-Organizing Feature Maps)

3. Neural Gas

4. RBF (Radial Basis Function)

The visualization of the distribution of various feature vectors is challenging due to the

dimensionality.

3.7.1 Results of c-means

The results in this section are based on D20. They are shown in Table 3.4. Below is the

explanation of the terms in Table 3.4:

dataset name The dataset where features used in certain trial come from

error number in class i The number of samples of class i misclassified into other classes

error rate of class i Error number in class i over total number in class i

total error number The sum of error numbers of all the classes

total error rate The total error number over total number in the dataset

The error rates in Table 3.4 indicate that the proportion of AEP data and nonAEP data assigned

into two clusters are nearly the same (e.g. For case ’6wave+2-feature’, the error rate of class 1

is 0.7133, indicating 71.33% of class 1 data is assigned into cluster 2 and 28.67% of class 1 data is

assigned into cluster 1; the error rate of class 2 is 0.1954, indicating 19.54% of class 2 data is assigned

into cluster 1 and 80.46% of class 2 data is assigned into cluster 2). In other words they don’t form

any mutually independent clusters since AEP has an unacceptable high error rate.

62

datasetname

errornumber

inclass 1

errorrateof

class 1

errornumber

inclass 2

errorrateof

class 2

totalerror

number

totalerrorrate

6wave+2-feature

2724 0.7133 100071 0.1954 102795 0.1993

4freq+2-feature

3819 1 50932 0.0995 54751 0.1061

4freq+2-feature64w

3817 0.9995 57738 0.1128 61555 0.1193

4psd+2-feature64w

3819 1 58584 0.1144 62403 0.1210

4psd+2-feature64w-filtered

3710 0.9715 121483 0.2373 125193 0.2427

Table 3.4: Error Rate of c-means for Selected D20 Feature Sets (c = 2)

3.7.2 Results of SOFM

2000-dataset (frequency domain feature) was used. A 1-D SOFM with 20 units was imple-

mented for this data. The parameters of the algorithm are shown in Table 3.5.

Parameter ValueTopology neighborhood radius 3Topology neighborhood function Gaussian functionMean of neighborhood function 0

Initial square deviation of neighborhood function The topology radiusFinal square deviation of neighborhood function 1

Decay method of neighborhood function Exponential decayInitial learning rate 0.1

Decay method of learning rate Exponential decayAlpha Product of learning rate and neighborhood function

Initial weights Random decimal between 0 to 0.1

Table 3.5: Parameters of SOFM

Table 3.6 shows the final weights after all training.

As Table 3.6 shows, there are not any obvious clusters.

63

Parameter Weightunit 1 97.048 13.759 63.929 19.83 -5.7205 12.416unit 2 121.06 12.274 80.205 20.298 -5.3931 10.034unit 3 157.42 9.6858 91.126 21.164 -4.2286 6.4564unit 4 203.53 7.1777 79.625 22.2 -4.5286 3.7116unit 5 257.46 5.7691 66.838 22.694 -3.5263 2.5976unit 6 294.34 7.248 87.842 20.712 -3.062 3.2817unit 7 326.82 11.824 155.69 17.831 -3.4815 4.4376unit 8 383.09 14.024 210.14 17.454 -3.0605 4.794unit 9 426.38 10.203 157.9 19.456 -1.8664 5.5764unit 10 454.25 5.9873 89.762 22.161 -1.5552 7.2464unit 11 513.33 4.0623 68.929 24.399 -3.9274 8.3119unit 12 600.92 3.7118 79.235 24.712 -5.0618 8.7674unit 13 710.8 3.6748 95.634 25.264 -3.3576 10.741unit 14 840.12 3.6973 117.42 26.002 -2.7574 11.358unit 15 1007.4 3.8248 155.11 25.314 -4.6366 10.33unit 16 1302.2 4.1413 185.65 24.563 -5.6356 10.308unit 17 1685.5 3.8129 202.25 23.467 -5.2183 8.6291unit 18 2159.6 2.9321 224.97 21.937 -8.3318 5.1305unit 19 2812.2 2.3404 250.76 20.46 -14.182 2.8188unit 20 3358.2 1.9998 279.69 19.485 -20.47 0.23745

Table 3.6: Final Weights of SOFM

3.7.3 Results of NG

In this section, 2000-dataset (frequency domain feature) was used and a NG net with 20

units was implemented for the data. Implicit sorting was implemented. The parameters of the

algorithm are shown in Table 3.7.

Parameter Value’Rewarding’ function Exponential function(1/λ)Decay method of λ Exponential decay

Initial λ 10Final λ 0.01

Final square deviation of neighborhood function 1Initial learning rate 0.5Final Learning rate 0.005

Decay method of learning rate Exponential decayInitial weights random decimal between 0 to 0.1

Table 3.7: Parameters of NG

Table 3.8 shows the final weights after all training.

64

Parameter Weightunit 1 315.42 8.848 92.89 21.193 -4.1396 7.141unit 2 315.42 8.848 92.89 21.193 -4.1396 7.141unit 3 315.42 8.848 92.89 21.193 -4.1396 7.141unit 4 315.42 8.848 92.89 21.193 -4.1396 7.141unit 5 315.42 8.848 92.89 21.193 -4.1396 7.141unit 6 315.42 8.848 92.89 21.193 -4.1396 7.141unit 7 315.42 8.848 92.89 21.193 -4.1396 7.141unit 8 315.42 8.848 92.89 21.193 -4.1396 7.141unit 9 315.42 8.848 92.89 21.193 -4.1396 7.141unit 10 315.42 8.848 92.89 21.193 -4.1396 7.141unit 11 315.42 8.848 92.89 21.193 -4.1396 7.141unit 12 2382.6 3.0974 306 20.515 -12.939 3.2141unit 13 315.42 8.848 92.89 21.193 -4.1396 7.141unit 14 315.42 8.848 92.89 21.193 -4.1396 7.141unit 15 315.42 8.848 92.89 21.193 -4.1396 7.141unit 16 315.42 8.848 92.89 21.193 -4.1396 7.141unit 17 315.42 8.848 92.89 21.193 -4.1396 7.141unit 18 315.42 8.848 92.89 21.193 -4.1396 7.141unit 19 315.42 8.848 92.89 21.193 -4.1396 7.141unit 20 315.42 8.848 92.89 21.193 -4.1396 7.141

Table 3.8: Final Weights of NG

From Table 3.8, we see that there are 2 clusters:

cluster1 [ 315.42 8.848 92.89 21.193 -4.1396 7.141 ];

cluster2 [ 2382.6 3.0974 306 20.515 -12.939 3.2141 ];

However, the clusters do not reflect class if we look into the proportion of the assignment: There

are 887 AEP samples assigned into cluster1 and 113 AEP samples assigned into cluster2; Meanwhile

958 nonAEP samples are assigned into cluster1 and 42 nonAEP samples are assigned into cluster2.

The two clusters are not distinct by the rule of AEP/nonAEP. This classification results of NG are

the same with c-means(c=2) using the same dataset.

3.7.4 Results of RBF

2000-dataset was used in this section. A network with 20 hidden layer units was formed. We

use the results of c-means (c = 20) as the RBF hidden units’ centers. Gaussian function was used as

the radial basis function for hidden units, while the square deviation of the Gaussian function is the

largest distance from training vector to this unit’s center. Pseudoinverse formulation was used to

train the weights from hidden layer to output layer. To evaluate the results, we used k-fold (k=10)

65

method.

By implement RBF, we have the final result of 70.8% sensitivity and 63.5% specificity (the

percentages here are the means nevertheless the deviation is small enough to neglect).

3.8 Conclusion

We have implemented various feature sets and classification methods. We also evaluated the

results using different methods. So far the performance of the machine classifiers we have explored

is probably insufficient for clinical application. The major problem is that there is similarity among

parts of the feature data between the 3 classes (overlap) and there is no natural clustering for

each class. The feature sets we have built cannot be separated by simple hyperplanes. Although

the current ANN can create complex boundaries and achieve relatively stable and higher quality

performance, it is still not good enough. So far, Wavelet feature sets yield the best results among

various feature sets we have derived. 3-class model did not performance as well as 2-class model to

detect AEP samples. We have verified in Section 3.2 that when the size of training set grows, the

sensitivity of the classifier typically increases while the specificity of the classifier typically decreases.

It is possible that there is a lower limit of the dataset’s size that will provide enough information

for classification within admissible error.

3.9 Future Research

Based upon the knowledge from the previous conclusion, our further research is going to focus in

these areas:

1. New neural networks, e.g. multi-hierarchy networks, or networks with preprocessing of other

classification methods.

2. New feature sets. Our focus will be using various wavelet bases to derive features other than

DB4 and compare the performance of different feature sets. We can also combine current

features into new feature set under the condition that the performance is improved.

One of the most important issues in exploring new feature sets is the reduction or conversion

of the raw wavelet coefficients into a smaller fixed-size feature vectors. The methodology used herein

66

(derived from Guler’s paper) [3], while useful, is somewhat ad-hoc. A more comprehensive and

systematic mapping of coefficient sets into feature vectors should be pursued.

Besides, the issue of the classes in our classification model needs further discussion: We

believe the 2-class model or 3-class model might not be the optimal model for the classifiers since

AEP set is a collection of two dissimilar kinds of waves (spikes and slow waves) and the nonAEP set

has two classes (AP and NEP). To solve this issue, our strategy is to apply unsupervised learning

algorithm (e.g. c-means, SOFM, NE) to the training sets individually. It is probably going to

provide guidance on the within-class modes or clusters of feature vectors.

Another disadvantage is that for all the distance-measure-based results in our initial simu-

lation, a Euclidean norm has been adopted. Non-euclidean norms warrant investigation, provided

suitable and applicable norms may be identified. We plan to start with the computation of the

covariance matrices of each training set.

To facilitate analysis of the results, an evaluation method to combine the measure of sen-

sitivity and specificity need to be developed. One possibility is an overall measure of risk, derived

from classification results.

67

Appendices

68

Appendix A Initial Dataset Analysis

A.1 Datasets and Annotations

Initially, an analysis of the raw EEG data and annotation was undertaken. The aim of

the analysis was to determine apriori class probabilities and other factors such as average annota-

tion length. The classes of D20, stored in ’DataPointsDS9.txt’ file, are defined by annotation set

’AnnotationsDS9.txt’ file. There are signal portions which are not addressed or annotated, which

represents about 19% of the sample data.

The classes of D100, stored in ’Datasets1 20.txt’ & ’Datasets21 40.txt’ & ’Datasets41 60.txt’

& ’Datasets61 80.txt’ & ’Datasets81 100.txt’ files, are defined by annotation set ’annot.csv’ file with

two ’class.csv’ files. One ’class.csv’ file is evaluated by 7 experts:

Original 7 experts UserID: 4, 6, 7, 8, 9, 11, 12

The other ’class.csv’ file is evaluated by 11 experts. There are 7 best experts among them:

Current 11 experts UserID: 4, 6, 7, 8, 9, 11, 12, 21, 22, 24, 25

Current 7 Best experts UserID: 4, 6, 9, 11, 21, 24, 25

The research described previously focuses on the evaluation of the best 7 experts on

D100. Table 9 shows the information of Abnormal Epileptiform PED and non Abnormal Epileptiform

PED sample numbers in different datasets.

AEP sample nonAEP sampleD20 3819 512027

D100 Original Annot. 8566 305604D100 Best 7 Annot. 7573 308815

Table 9: AEP Sample Number and nonAEP Sample Number in Each Dataset

Table 10 shows the information of the Abnormal Epileptiform PED signal segment of D20.

Table 11 and Table 12 show the same information of D100 while Table 11 is yielded by the original

7 experts annotation and Table 12 is yielded by the best 7 experts annotation.

Here are the explanation of the terms in Table 10, Table 11 and Table 12:

start No. : The serial number of the sample at which time Abnormal Epileptiform PED start

end No. : The serial number of the sample at which time Abnormal Epileptiform PED end

69

channel1 No. : The number of the minuend channel in annotation

channel2 No. : The number of the subtrahend channel in annotation

patient No. : The serial number of the patient whom this data segment belongs to

length : The number of samples in this data segment

votes : The number that the experts vote for AEP class

start No. end No. channel1 No. channel2 No. patient No. length16089 16250 13 14 3 16217525 17927 2 3 3 40317547 17969 1 2 3 42320803 21156 2 3 3 35420806 21154 6 7 3 34920835 21111 1 2 3 27735110 35285 2 3 5 17655318 55436 12 13 8 11966123 66319 2 3 9 19767888 68064 2 3 9 17769380 69485 2 3 10 10670074 70192 3 4 10 11971597 71722 2 3 10 12672341 72429 1 2 10 8972365 72454 3 4 10 9072494 72557 2 3 10 6476128 76192 2 3 10 6579206 79273 2 3 11 6882703 82848 2 3 11 14688066 88190 14 15 12 125100799 100982 17 18 14 184

Table 10: AEP Signal Segments in the Annotation of D20

start No. end No. channel1 No. channel2 No. patient No. length votes

78349 78383 3 4 11 35 5

78413 78484 3 4 11 72 5

131782 131881 3 4 18 100 3

131784 131825 1 6 18 42 4

134811 134899 1 2 18 89 3

135153 135217 1 2 18 65 3

16089 16177 13 14 23 89 6

17597 17690 12 20 23 94 7

17675 17748 17 18 23 74 7

17766 17828 1 2 23 63 7

17827 17905 13 14 23 79 7

20833 20889 1 2 23 57 7

20865 20929 17 18 23 65 7

20939 20983 17 18 23 45 7

20996 21026 1 2 23 31 6

21050 21110 2 3 23 61 7

70

35158 35233 1 6 25 76 7

35173 35223 12 17 25 51 7

43613 43715 17 18 26 103 4

55343 55436 12 13 28 94 4

67898 68008 2 3 29 111 5

67917 68029 3 4 29 113 5

72362 72442 2 3 30 81 5

74582 74647 3 4 30 66 3

76146 76213 3 4 30 68 3

79204 79271 2 3 31 68 6

79998 80114 2 3 31 117 5

82716 82819 2 3 31 104 6

82720 82815 3 4 31 96 6

86487 86558 13 14 32 72 4

88064 88171 14 15 32 108 5

90189 90310 14 15 32 122 3

91704 91754 3 4 32 51 3

100787 100931 11 21 34 145 7

125591 125694 3 4 37 104 4

10995 11055 1 2 42 61 6

11933 12056 3 4 42 124 7

13054 13157 1 2 42 104 7

41459 41515 17 18 46 57 4

41500 41605 2 3 46 106 7

41627 41750 1 6 46 124 6

41634 41705 10 11 46 72 6

71992 72044 14 15 50 53 4

108131 108234 3 4 55 104 5

110489 110583 2 3 55 95 7

134430 134548 12 17 58 119 6

135045 135196 11 21 58 152 7

8950 9134 6 7 62 185 6

69322 69438 8 5 70 117 4

69832 69937 4 5 70 106 4

70617 70727 3 4 70 111 4

71123 71219 2 3 70 97 3

73562 73640 1 2 70 79 4

73679 73752 8 5 70 74 4

74417 74510 3 4 70 94 4

74621 74739 2 3 70 119 4

6328 6402 2 3 81 75 3

11410 11460 18 19 82 51 4

12936 13056 18 19 82 121 4

13620 13701 18 19 82 82 4

14338 14420 18 19 82 83 4

14955 15052 18 19 82 98 4

71

16043 16176 3 4 83 134 4

16055 16154 4 5 83 100 5

17197 17266 3 4 83 70 4

17778 17886 3 4 83 109 6

17780 17871 1 6 83 92 5

17795 17883 4 5 83 89 6

17935 18045 3 4 83 111 6

26618 26747 18 19 84 130 5

30627 30711 18 19 84 85 3

34455 34560 12 13 85 106 5

41777 41866 7 8 86 90 3

55592 55666 7 8 88 75 3

56666 56767 1 2 88 102 3

78773 78917 1 2 91 145 3

82183 82307 3 4 91 125 3

82426 82531 1 2 91 106 6

82815 82977 1 2 91 163 3

97708 97762 2 22 93 55 3

112117 112274 3 4 95 158 7

112155 112256 2 3 95 102 7

119059 119148 12 17 96 90 5

119192 119289 13 14 96 98 3

122486 122562 14 15 96 77 4

133130 133203 8 5 98 74 3

133135 133197 7 8 98 63 3

141263 141351 12 17 99 89 3

141407 141512 12 17 99 106 3

142659 142753 12 17 99 95 3

149769 149825 3 4 100 57 3

149850 149885 3 4 100 36 4

149857 149887 4 5 100 31 4

149917 149979 3 4 100 63 6

149921 149986 4 5 100 66 6

Table 11: AEP Signal Segments in the Original Annotation of D100

start No. end No. channel1 No. channel2 No. patient No. length votes

13830 13895 1 6 2 66 3

78349 78383 3 4 11 35 4

78413 78484 3 4 11 72 4

122643 122717 8 5 16 75 5

16089 16177 13 14 23 89 4

17597 17690 12 20 23 94 7

17675 17748 17 18 23 74 7

72

17766 17828 1 2 23 63 7

17827 17905 13 14 23 79 7

20833 20889 1 2 23 57 7

20865 20929 17 18 23 65 7

20939 20983 17 18 23 45 7

20996 21026 1 2 23 31 7

21050 21110 2 3 23 61 7

35158 35233 1 6 25 76 7

35173 35223 12 17 25 51 7

43613 43715 17 18 26 103 6

55343 55436 12 13 28 94 6

67898 68008 2 3 29 111 6

67917 68029 3 4 29 113 6

72362 72442 2 3 30 81 7

76146 76213 3 4 30 68 4

79204 79271 2 3 31 68 7

79998 80114 2 3 31 117 5

82716 82819 2 3 31 104 7

82720 82815 3 4 31 96 7

85945 86018 3 4 32 74 4

86487 86558 13 14 32 72 6

88064 88171 14 15 32 108 6

90189 90310 14 15 32 122 4

91704 91754 3 4 32 51 4

100787 100931 11 21 34 145 6

125591 125694 3 4 37 104 4

10995 11055 1 2 42 61 7

11933 12056 3 4 42 124 7

13054 13157 1 2 42 104 7

41459 41515 17 18 46 57 4

41500 41605 2 3 46 106 7

41627 41750 1 6 46 124 6

41634 41705 10 11 46 72 5

72023 72073 14 15 50 51 4

108131 108234 3 4 55 104 5

110489 110583 2 3 55 95 7

134430 134548 12 17 58 119 5

135045 135196 11 21 58 152 7

8950 9134 6 7 62 185 7

69322 69438 8 5 70 117 6

69832 69937 4 5 70 106 6

70617 70727 3 4 70 111 6

71123 71219 2 3 70 97 4

73562 73640 1 2 70 79 4

73679 73752 8 5 70 74 6

74417 74510 3 4 70 94 6

73

74621 74739 2 3 70 119 6

86311 86410 12 17 72 100 4

11410 11460 18 19 82 51 4

12936 13056 18 19 82 121 5

13620 13701 18 19 82 82 5

14338 14420 18 19 82 83 5

14955 15052 18 19 82 98 5

16043 16176 3 4 83 134 5

16055 16154 4 5 83 100 5

17197 17266 3 4 83 70 7

17778 17886 3 4 83 109 7

17780 17871 1 6 83 92 5

17795 17883 4 5 83 89 7

17935 18045 3 4 83 111 7

26618 26747 18 19 84 130 6

34455 34560 12 13 85 106 6

56666 56767 1 2 88 102 5

57067 57218 1 2 88 152 4

82023 82107 3 4 91 85 5

82426 82531 1 2 91 106 7

112117 112274 3 4 95 158 6

112155 112256 2 3 95 102 7

119059 119148 12 17 96 90 5

122486 122562 14 15 96 77 4

140922 141020 12 17 99 99 5

141101 141178 12 17 99 78 5

141934 142006 17 18 99 73 5

149857 149887 4 5 100 31 4

149917 149979 3 4 100 63 4

149921 149986 4 5 100 66 6

Table 12: AEP Signal Segments in the ’Best 7’ Annotation of D100

From Table 10 we can work out that there are 2747 AEP point-in-time, which is 1.79% in

time domain. From Table 11 we can work out that there are 7620 AEP point-in-time, which is 0.99%

in time domain. From Table 12 we can work out that there are 6761 AEP point-in-time, which is

0.88% in time domain.

Table 13 shows the distribution of total number of the Abnormal Epileptiform PED of

different channel pairs of D20. Table 14 and Table 15 show the same information of D100 while

Table 14 is yielded by the original 7 experts annotation and Table 15 is yielded by the best 7 experts

annotation. Here are the explanation of the terms in Table 13, Table 14 and Table 15:

74

channel1 No. : The number of the minuend channel in annotation

channel2 No. : The number of the subtrahend channel in annotation

total samples : Total number of samples of Abnormal Epileptiform PED for the two-channel pair

channel1 No. channel2 No. total samples13 14 1622 3 18821 2 7896 7 34912 13 1193 4 20914 15 12517 18 184

Table 13: Total Number of AEP Samples by Channel Pair in the Annotation of D20

channel1 No. channel2 No. total samples3 4 20011 6 3341 2 106513 14 33812 20 9417 18 3442 3 113612 17 55012 13 20014 15 36011 21 29710 11 726 7 1858 5 2654 5 39218 19 6507 8 2282 22 55

Table 14: Total Number of AEP Samples by Channel Pair in the Original Annotation of D100

A.2 Distribution of the Length of AEP Segments

Tables 16, 17 and 18 show the lengths of all the annotated AEP segments.

No. segment len number

1 31 2

2 35 1

3 36 1

4 42 1

5 45 1

6 51 3

7 53 1

8 55 1

75

9 57 3

10 61 2

11 63 3

12 65 2

13 66 2

14 68 2

15 70 1

16 72 3

17 74 3

18 75 2

19 76 1

20 77 1

21 79 2

22 81 1

23 82 1

24 83 1

25 85 1

26 89 4

27 90 2

28 92 1

29 94 3

30 95 2

31 96 1

32 97 1

33 98 2

34 100 2

35 102 2

36 103 1

37 104 4

38 106 5

39 108 1

40 109 1

41 111 3

42 113 1

43 117 2

44 119 2

45 121 1

46 122 1

47 124 2

48 125 1

49 130 1

50 134 1

51 145 2

52 152 1

53 158 1

54 163 1

76

55 185 1

Table 17: Segment Length of AEP Signal in the Original Annotation of D100

No. segment len number

1 31 2

2 35 1

3 45 1

4 51 4

5 57 2

6 61 2

7 63 2

8 65 1

9 66 2

10 68 2

11 70 1

12 72 3

13 73 1

14 74 3

15 75 1

16 76 1

17 77 1

18 78 1

19 79 2

20 81 1

21 82 1

22 83 1

23 85 1

24 89 2

25 90 1

26 92 1

27 94 3

28 95 1

29 96 1

30 97 1

31 98 1

32 99 1

33 100 2

34 102 2

35 103 1

36 104 4

37 106 4

38 108 1

39 109 1

77

40 111 3

41 113 1

42 117 2

43 119 2

44 121 1

45 122 1

46 124 2

47 130 1

48 134 1

49 145 1

50 152 2

51 158 1

52 185 1

Table 18: Segment Length of AEP Signal in the ’Best 7’ Annotation of D100

78

channel1 No. channel2 No. total samples1 6 3583 4 17768 5 26613 14 24012 20 9417 18 4171 2 7552 3 106112 17 53712 13 20014 15 35811 21 29710 11 726 7 1854 5 39218 19 565

Table 15: Total Number of AEP Samples by Channel Pair in the ’Best 7’ Annotation of D100

No. segment len number1 64 12 65 13 68 14 89 15 90 16 106 17 119 28 125 19 126 110 146 111 162 112 176 113 177 114 184 115 197 116 277 117 349 118 354 119 403 120 423 1

Table 16: Segment Length of AEP Signal in the Annotation of D20

79

Appendix B Distribution of Feature Values

B.1 Probability Plot of D100 Feature Values

Probability plot is a graphical technique for comparing two datasets and assessing how

closely two datasets agree. The Matlab probability plots function, probplot (’distribution’, Y),

produces a probability plot comparing the distribution of the data Y to ’distribution’. Y can be a

single vector, or a matrix with a separate sample in each column. The plot includes a reference line

useful for judging whether the data follow the specified by ’distribution’ 3.

In this section, we respectively compare the distribution of D100 data annotated by the ’best 7’

annotation, with normal distribution and exponential distribution. The results are following.

Compare the distribution of AEP feature set with normal distribution from Figure 90 to Figure 101.

0 100 200 300 400 500

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, First Half Wave Amplitude

Figure 90: Prob. Plot of theFHWA of AEP Data in D100

annotated by ’best 7’ Com-paring ’Normal Distribution’.

−300 −250 −200 −150 −100 −50 0

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Second Half Wave Amplitude

Figure 91: Prob. Plot of theSHWA of AEP Data in D100


0 10 20 30 40 50 60

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, First Half Wave Duration

Figure 92: Prob. Plot of theFHWD of AEP Data in D100


5 10 15 20 25 30 35

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Second Half Wave Duration

Figure 93: Prob. Plot of theSHWD of AEP Data in D100


0 10 20 30 40 50

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, First Half Wave Slope

Figure 94: Prob. Plot of theFHWS of AEP Data in D100


−35 −30 −25 −20 −15 −10 −5 0

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Second Half Wave Slope

Figure 95: Prob. Plot of theSHWS of AEP Data in D100


Compare the distribution of nonAEP feature set with normal distribution from Figure 102 to Figure

113.3From ’Matlab Help’

80

0 500 1000 1500 2000 2500 3000 3500 4000 4500

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Max FFT Amplitude

Figure 96: Prob. Plot of Am-plitude Value at the LargestPeak of Frequency of AEPData in D100 annotated by’best 7’ Comparing ’NormalDistribution’.

5 10 15 20 25

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Freqency at Max FFT Amplitude

Figure 97: Prob. Plot of Fre-quency Value at the LargestPeak of Frequency of AEPData in D100 annotated by’best 7’ Comparing ’NormalDistribution’.

0 200 400 600 800 1000 1200 1400 1600 1800

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, 2nd Max FFT Amplitude

Figure 98: Prob. Plot of Am-plitude Value at the SecondLargest Peak of Frequency ofAEP Data in D100 annotatedby ’best 7’ Comparing ’Nor-mal Distribution’.

0 10 20 30 40 50 60 70

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Freqency at 2nd Max FFT Amplitude

Figure 99: Prob. Plot of Fre-quency Value at the SecondLargest Peak of Frequency ofAEP Data in D100 annotatedby ’best 7’ Comparing ’Nor-mal Distribution’.

−30 −20 −10 0 10 20 30

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, X Coordinate Value

Figure 100: Prob. Plot of XValue of AEP Data in D100


−30 −20 −10 0 10 20 30

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

AEP, Y Coordinate Value

Figure 101: Prob. Plot of YValue of AEP Data in D100


0 100 200 300 400 500 600

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, First Half Wave Amplitude

Figure 102: Prob. Plot ofthe FHWA of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

−500 −400 −300 −200 −100 0

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Second Half Wave Amplitude

Figure 103: Prob. Plot ofthe SHWA of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

0 10 20 30 40 50 60 70 80 90

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, First Half Wave Duration

Figure 104: Prob. Plot ofthe FHWD of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

81

0 10 20 30 40 50 60

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Second Half Wave Duration

Figure 105: Prob. Plot ofthe SHWD of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

0 50 100 150 200

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

DataP

roba

bilit

y

nonAEP, First Half Wave Slope

Figure 106: Prob. Plot ofthe FHWS of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

−200 −150 −100 −50 0

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Second Half Wave Slope

Figure 107: Prob. Plot ofthe SHWS of nonAEP Datain D100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

0 1000 2000 3000 4000 5000 6000 7000 8000 9000

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Max FFT Amplitude

Figure 108: Prob. Plotof Amplitude Value at theLargest Peak of Frequency ofnonAEP Data in D100 anno-tated by ’best 7’ Comparing’Normal Distribution’.

0 10 20 30 40 50 60 70

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Freqency at Max FFT Amplitude

Figure 109: Prob. Plotof Frequency Value at theLargest Peak of Frequency ofnonAEP Data in D100 anno-tated by ’best 7’ Comparing’Normal Distribution’.

0 500 1000 1500 2000

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, 2nd Max FFT Amplitude

Figure 110: Prob. Plotof Amplitude Value at theSecond Largest Peak of Fre-quency of nonAEP Data inD100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

0 10 20 30 40 50 60 70

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Freqency at 2nd Max FFT Amplitude

Figure 111: Prob. Plotof Frequency Value at theSecond Largest Peak of Fre-quency of nonAEP Data inD100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

−50 −40 −30 −20 −10 0 10 20 30

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, X Coordinate Value

Figure 112: Prob. Plot ofX Value of nonAEP Data inD100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

−30 −20 −10 0 10 20 30

0.0010.0030.01 0.02 0.05 0.10

0.25

0.50

0.75

0.90 0.95 0.98 0.99

0.9970.999

Data

Pro

babi

lity

nonAEP, Y Coordinate Value

Figure 113: Prob. Plot ofY Value of nonAEP Data inD100 annotated by ’best 7’Comparing ’Normal Distribu-tion’.

82

Compare the distribution of AEP feature set with exponential distribution from Figure 114 to Figure

123.

0 100 200 300 400 500 6000.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, First Half Wave Amplitude

Figure 114: Prob. Plot of theFHWA of AEP Data in D100

annotated by ’best 7’ Com-paring ’Exponential Distribu-tion’.

0 50 100 150 200 250 300 3500.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Second Half Wave Amplitude

Figure 115: Prob. Plot of theSHWA of AEP Data in D100


0 10 20 30 40 50 60 700.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, First Half Wave Duration

Figure 116: Prob. Plot of theFHWD of AEP Data in D100


0 5 10 15 20 25 30 35 400.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Second Half Wave Duration

Figure 117: Prob. Plot of theSHWD of AEP Data in D100


0 10 20 30 40 50 600.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, First Half Wave Slope

Figure 118: Prob. Plot of theFHWS of AEP Data in D100


0 5 10 15 20 25 30 35 400.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Second Half Wave Slope

Figure 119: Prob. Plot of theSHWS of AEP Data in D100


Compare the distribution of nonAEP feature set with exponential distribution from Figure 124 to

Figure 133.

From the probability plots, we conclude that there are about 70% Data in D100 with the normal

distribution while there are about 90% Data in D100 with the exponential distribution.

83

0 500 1000 1500 2000 2500 3000 3500 4000 45000.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Max FFT Amplitude

Figure 120: Prob. Plotof Amplitude Value at theLargest Peak of Frequency ofAEP Data in D100 annotatedby ’best 7’ Comparing ’Expo-nential Distribution’.

0 5 10 15 20 25 300.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Freqency at Max FFT Amplitude

Figure 121: Prob. Plotof Frequency Value at theLargest Peak of Frequency ofAEP Data in D100 annotatedby ’best 7’ Comparing ’Expo-nential Distribution’.

0 500 1000 1500 20000.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, 2nd Max FFT Amplitude

Figure 122: Prob. Plotof Amplitude Value at theSecond Largest Peak of Fre-quency of AEP Data in D100


0 10 20 30 40 50 60 700.010.25

0.5

0.75

0.9

0.95

0.99

0.995

0.999

0.9995

0.9999

Data

Pro

babi

lity

AEP, Freqency at 2nd Max FFT Amplitude

Figure 123: Prob. Plotof Frequency Value at theSecond Largest Peak of Fre-quency of AEP Data in D100


0 100 200 300 400 500 600 7000.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, First Half Wave Amplitude

Figure 124: Prob. Plot ofthe FHWA of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 100 200 300 400 500 6000.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Second Half Wave Amplitude

Figure 125: Prob. Plot ofthe SHWA of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 10 20 30 40 50 60 70 80 900.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, First Half Wave Duration

Figure 126: Prob. Plot ofthe FHWD of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

84

0 10 20 30 40 50 600.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Second Half Wave Duration

Figure 127: Prob. Plot ofthe SHWD of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 50 100 150 200 2500.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, First Half Wave Slope

Figure 128: Prob. Plot ofthe FHWS of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 50 100 150 200 2500.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Second Half Wave Slope

Figure 129: Prob. Plot ofthe SHWS of nonAEP Datain D100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 1000 2000 3000 4000 5000 6000 7000 8000 90000.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Max FFT Amplitude

Figure 130: Prob. Plotof Amplitude Value at theLargest Peak of Frequency ofnonAEP Data in D100 anno-tated by ’best 7’ Comparing’Exponential Distribution’.

0 10 20 30 40 50 60 70 800.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Freqency at Max FFT Amplitude

Figure 131: Prob. Plotof Frequency Value at theLargest Peak of Frequency ofnonAEP Data in D100 anno-tated by ’best 7’ Comparing’Exponential Distribution’.

0 500 1000 1500 2000 25000.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, 2nd Max FFT Amplitude

Figure 132: Prob. Plotof Amplitude Value at theSecond Largest Peak of Fre-quency of nonAEP Data inD100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

0 10 20 30 40 50 60 70 800.25

0.50.75

0.90.95

0.990.995

0.9990.9995

0.9999

Data

Pro

babi

lity

nonAEP, Freqency at 2nd Max FFT Amplitude

Figure 133: Prob. Plotof Frequency Value at theSecond Largest Peak of Fre-quency of nonAEP Data inD100 annotated by ’best 7’Comparing ’Exponential Dis-tribution’.

85

B.2 Histogram of D20 Feature Values

In order to assess whether the assumption of Gaussian distribution for the data (used in the Bayesian

Classifier) is appropriate, histograms of the value of the features individually were generated. The

first check is to assess both distribution and number of peaks.

There are large spikes at frequency = 0Hz in frequency domain feature set and PSD feature set

for both AEP and nonAEP data. The distribution of waveform features, amplitudes of frequency

domain and PSD features of nonAEP and the distribution of amplitudes of PSD features of AEP

suggest an exponential distribution.

From the histograms, we conclude that the Gaussian assumption is acceptable for the AEP class

features since there are not obvious multiple peaks in the histograms. However, the Figures indicate

that there are some multimodal spikes in certain non-AEP features.

0 500 1000 1500 2000 25000

10

20

30

40

50

60

70

80

Figure 134: Distribution ofthe Amplitude Value at theLargest Peak of Frequency ofD20 AEP Data

0 5 10 15 20 250

200

400

600

800

1000

1200

Figure 135: Distribution ofthe Frequency Value at theLargest Peak of Frequency ofD20 AEP Data

0 100 200 300 400 500 600 700 800 9000

5

10

15

20

25

30

35

40

45

50

Figure 136: Distribution ofthe Amplitude Value at theSecond Largest Peak of Fre-quency of D20 AEP Data

0 10 20 30 40 50 60 70 800

50

100

150

200

250

300

Figure 137: Distribution ofthe Frequency Value at theSecond Largest Peak of Fre-quency of D20 AEP Data

−2 0 2 4 6 8 10 12

x 104

0

50

100

150

200

250

300

350

400

450

500

Figure 138: Distribution ofthe Amplitude Value at theLargest Peak of PSD of D20

AEP Data

0 5 10 15 20 250

100

200

300

400

500

600

700

800

900

1000

Figure 139: Distribution ofthe Frequency Value at theLargest Peak of PSD of D20

AEP Data

86

−2000 0 2000 4000 6000 8000 10000 12000 14000 16000 180000

50

100

150

200

250

300

350

400

450

Figure 140: Distribution ofthe Amplitude Value at theSecond Largest Peak of PSDof D20 AEP Data

0 10 20 30 40 50 60 700

100

200

300

400

500

600

700

800

Figure 141: Distribution ofthe Frequency Value at theSecond Largest Peak of PSDof D20 AEP Data

−50 0 50 100 150 200 250 3000

50

100

150

200

250

300

Figure 142: Distribution ofthe FHWA of D20 AEP Data

−300 −250 −200 −150 −100 −50 0 500

50

100

150

200

250

Figure 143: Distribution ofthe SHWA of D20 AEP Data

0 5 10 15 20 25 30 35 400

50

100

150

200

250

300

350

400

450

Figure 144: Distribution ofthe FHWD of D20 AEP Data

0 5 10 15 20 25 30 350

50

100

150

200

250

300

350

400

450

Figure 145: Distribution ofthe SHWD of D20 AEP Data

−10 0 10 20 30 40 500

20

40

60

80

100

120

Figure 146: Distribution ofthe FHWS of D20 AEP Data

−35 −30 −25 −20 −15 −10 −5 0 50

20

40

60

80

100

120

140

160

180

200

Figure 147: Distribution ofthe SHWS of D20 AEP Data

−40 −30 −20 −10 0 10 20 30 400

500

1000

1500

2000

2500

Figure 148: Distribution of XValue of D20 AEP Data

−15 −10 −5 0 5 10 15 20 25 30 350

500

1000

1500

2000

2500

Figure 149: Distribution of YValue of D20 AEP Data

0 1000 2000 3000 4000 5000 6000 7000 8000 90000

5000

10000

15000

Figure 150: Distribution ofthe Amplitude Value at theLargest Peak of Frequency ofD20 nonAEP Data

0 10 20 30 40 50 60 70 800

0.5

1

1.5

2

2.5x 10

5

Figure 151: Distribution ofthe Frequency Value at theLargest Peak of Frequency ofD20 nonAEP Data

87

−500 0 500 1000 1500 2000 25000

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

4

Figure 152: Distribution ofthe Amplitude Value at theSecond Largest Peak of Fre-quency of D20 nonAEP Data

0 10 20 30 40 50 60 70 800

0.5

1

1.5

2

2.5

3

3.5

4x 10

4

Figure 153: Distribution ofthe Frequency Value at theSecond Largest Peak of Fre-quency of D20 nonAEP Data

−2 0 2 4 6 8 10 12 14 16

x 105

0

0.5

1

1.5

2

2.5

3

3.5

4x 10

5

Figure 154: Distribution ofthe Amplitude Value at theLargest Peak of PSD of D20

nonAEP Data

0 10 20 30 40 50 60 70 800

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

5

Figure 155: Distribution ofthe Frequency Value at theLargest Peak of PSD of D20

nonAEP Data

−5 0 5 10 15 20

x 104

0

0.5

1

1.5

2

2.5

3

3.5

4x 10

5

Figure 156: Distribution ofthe Amplitude Value at theSecond Largest Peak of PSDof D20 nonAEP Data

0 10 20 30 40 50 60 70 800

1

2

3

4

5

6

7x 10

4

Figure 157: Distribution ofthe Frequency Value at theSecond Largest Peak of PSDof D20 nonAEP Data

−100 0 100 200 300 400 500 600 7000

0.5

1

1.5

2

2.5

3

3.5x 10

4

Figure 158: Distribution ofthe FHWA of D20 nonAEPData

−600 −500 −400 −300 −200 −100 0 1000

0.5

1

1.5

2

2.5

3

3.5

4x 10

4

Figure 159: Distribution ofthe SHWA of D20 nonAEPData

−10 0 10 20 30 40 50 60 70 800

2

4

6

8

10

12

14x 10

4

Figure 160: Distribution ofthe FHWD of D20 nonAEPData

−10 0 10 20 30 40 50 600

2

4

6

8

10

12

14x 10

4

Figure 161: Distribution ofthe SHWD of D20 nonAEPData

−50 0 50 100 150 200 2500

2

4

6

8

10

12

14x 10

4

Figure 162: Distribution ofthe FHWS of D20 nonAEPData

−300 −250 −200 −150 −100 −50 0 500

2

4

6

8

10

12

14x 10

4

Figure 163: Distribution ofthe SHWS of D20 nonAEPData

88

−40 −30 −20 −10 0 10 20 30 400

1

2

3

4

5

6

7

8

9

10x 10

4

Figure 164: Distribution of XValue of D20 nonAEP Data.Note MultiModal Character-istic.

−40 −30 −20 −10 0 10 20 30 400

1

2

3

4

5

6

7

8x 10

4

Figure 165: Distribution of YValue of D20 nonAEP Data.Note MultiModal Character-istic.

89

Appendix C Bayesian Parameter Estimation Approach

In Bayesian classifier design, Gaussian model is always the first choice because of its math-

ematical tractability [10]. Below is a multidimensional Gaussian distribution:

p(x) = (2π)−d

2 |Σ|−1

2 exp[−1

2(x− µ)TΣ−1(x− µ)] (1)

The aposteriori probability of a class is

P (ωi|x) =[p(x|ωi)P (ωi)]

p(x)(2)

where

p(x) =∑i

p(x|ωi) (3)

One discriminant function for the ith class is therefore:

gi(x) = P (ωi|x) (4)

The log of gi(x) is

gi(x) = log{P (ωi|x)p(x)} = log{p(x|ωi)}+ log{P (ωi)} (5)

which becomes, in the Gaussian case:

gi(x) = −1

2(x− µ

i)TΣ−1

i (x− µi)− (

d

2) log(2π)−

1

2log |Σi|+ log{P (ωi)} (6)

neglecting (−d/2) log(2π) yields

gi(x) = −1

2(x− µ

i)TΣ−1

i (x− µi)−

1

2log |Σi|+ log{P (ωi)} (7)

For a two class problem, in this minimum error formulation 4 if P (ω1|x) > P (ω2|x), which equal

to g1(ω1|x) > g2(ω2|x), we choose class 1 as the result for input feature vector x, based upon the

4Parametrically adjustable risk measures come later.

90

corresponding discriminant functions,

−1

2(x− µ

1)TΣ−1

1 (x− µ1)−

1

2log |Σ1|+ log{P (ω1)}

ω1

>

<ω2

−1

2(x− µ

2)TΣ−1

2 (x− µ2)−

1

2log |Σ2|+ log{P (ω2)} (8)

C.0.1 Risk Measures

Suppose we choose αi, when ωj was the true class. A more general and parametrically

adjustable risk measurement for the two classes case is

R = λ11P (α1|ω1)P (ω1) + λ21P (α2|ω1)P (ω1) + λ12P (α1|ω2)P (ω2) + λ22P (α2|ω2)P (ω2) (9)

Here the parameter λij is the risk of choosing class i when class j is the true class. The P(αi|ωi)

terms depend on the chosen mapping α(x) → αi (the classifier). Thus, a measure of conditional risk

associated with a c=2 class decision rule is

R[α(x) → α1] = R(α1|x) = λ11P (ω1|x) + λ12P (ω2|x) (10)

for α1 and

R[α(x) → α2] = R(α2|x) = λ21P (ω1|x) + λ22P (ω2|x) (11)

for α2.

Thus, the expected risk is given by the total probability:

R[α(x)] =

∫R[α(x|p(x))]dx (12)

This suggests we minimize the conditional risk R[α(x)|x]. In order to minimize R[α(x)] for the

two-class problem, the decision rule is formulated as:

R[α1|(x)]α2

>

<α1

R[α2|(x)] (13)

91

λ11P (ω1|x) + λ12P (ω2|x)α2

>

<α1

λ21P (ω1|x) + λ22P (ω2|x) (14)

(λ11 − λ21)p(x|ω1)P (ω1)α2

>

<α1

(λ22 − λ12)p(x|ω2)P (ω2) (15)

Assuming(λ11 − λ21) < 0,

p(x|ω1)

p(x|ω2)

α1

>

<α2

(λ22 − λ12)

(λ11 − λ21)

P (ω2)

P (ω1)(16)

While it is not necessary, we often choose λ11 = λ22 = 0 since there is no ’cost’ or ’risk’ in a correct

classification,

p(x|ω1)

p(x|ω2)

α1

>

<α2

λ12

λ21

P (ω2)

P (ω1)(17)

The formulation holds for arbitrary feature vectors [10].

92

Appendix D Filtering

We are not informed whether there are any preprocessing of the given datasets. To eliminate

potential noise, we apply a notch filter at 60Hz and a bandpass filter with passband 0.5∼100Hz.

Filter response is shown in Figures 166 and 167. The corresponding effect upon the raw signal is

shown in Figure 168. The filtered frequency spectrum does not change much in Figure 168.

0 0.2 0.4 0.6 0.8 1−1000

−500

0

500

Normalized Frequency (×π rad/sample)

Pha

se (

degr

ees)

0 0.2 0.4 0.6 0.8 1−400

−300

−200

−100

0


Mag

nitu

de (

dB)

Figure 166: Frequency Response of the Bandpass Filter

0 0.2 0.4 0.6 0.8 1−100

−50

0

50

100


Pha

se (

degr

ees)

0 0.2 0.4 0.6 0.8 1−300

−200

−100

0


Mag

nitu

de (

dB)

Figure 167: Frequency Response of the Notch Filter

Of course, other filtering strategies and corresponding classification effects could be considered.

Filtering causes result deviation in frequency feature set while it has almost no effect on

other datasets, which is shown from Figure 169 to Figure 174. However, by examining Figure 168,

we believed that the data is not contaminated by power frequency or low frequency noise (e.g. bias).

93

0 20 40 60 80 100 1200

500

1000

1500

fft of raw data of 13th channel of 2nd patientfft of data after notch filterfft of data after notch and bandpass filter

Figure 168: Frequency Response of the 2nd Patient’s Raw Data in 13th Channel in D20

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sens

itivi

ty


6wave+2−feature4freq+2−feature4psd+2−feature4freq+2−feature64w4psd+2−feature64w6wave+2−feature−flt4freq+2−feature64w−flt4psd+2−feature64w−flt

Figure 169: Sensitivity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sens

itivi

ty



Figure 170: Sensitivity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 1

It is not necessary to filter the data.

94

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

spec

ifici

ty



Figure 171: Specificity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

spec

ifici

ty



Figure 172: Specificity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 1

0 10 20 30 40 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

sele

ctiv

ity


6wave+2−feature4freq+2−feature4psd+2−feature4freq+2−feature64w4psd+2−feature64w6wave+2−feature−fltflt 4freq+2−feature64w−fltflt 4psd+2−feature64w−flt

Figure 173: Selectivity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 50

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

sele

ctiv

ity



Figure 174: Selectivity of D20 Feature Sets withSpatial Info for Risk Ratio Range from 0 to 1

95

Appendix E Risk Factors and Effects in the Bayesian Clas-

sifier

An alternative visualization involves consideration of (class specific) risk. A different window

size is also used as reference. Some results are shown here(Figure 175, Figure 176 and Figure 177).

λ12: λ12 is the risk of choosing class 1 when class 2 is the true class.

λ21: λ21 is the risk of choosing class 2 when class 1 is the true class.

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

erro

r ra

te

error rate/risk ratio of AEP data (class1)

6wave+2−feature6wave−feature4freq+2−feature4freq−feature4psd+2−feature4psd−feature2−feature4freq+2−feature64w4freq−feature64w4psd+2−feature64w4psd−feature64w

Figure 175: Error Rate of D20’s AEP Class under Different Risk Ratio

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

erro

r ra

te

error rate/risk ratio of nonAEP data (class2)


Figure 176: Error Rate of D20’s nonAEP Class under Different Risk Ratio

96

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

λ12/λ21

erro

r ra

te

error rate/risk ratio of all data


Figure 177: Error Rate of the Whole D20 under Different Risk Ratio

Adding risk of one class is equal to increasing the apriori probability. With AEP’s risk

increasing, the sensitivity approaches to 1. On the other hand the specificity descends to 0.

In sensitivity results, PSD feature set applied with 129 hamming window with spatial in-

formation presents the highest sensitivity result at most of the risk ratio while the risk ratio ranges

from 0 to 1. The sensitivity of 129 hamming window filtered segments are higher than corresponding

64 hamming window filtered segments’ results.

97

Appendix F FastICA and the Results

To see if there are any components only occurring in AEP class, independent component

analysis (FastICA in our case) was implemented. Independent component analysis (ICA), is a sta-

tistical and computational technique that represents a multidimensional random vector as a linear

combination of non-Gaussian random variables (’independent components’) that are as independent

as possible. ICA defines a generative model for the observed multivariate data, which is typically

given as a large database of samples. In the model, the data variables are assumed to be linear

mixtures of some unknown latent variables, and the mixing system is also unknown. The latent

variables are assumed non-Gaussian and mutually independent, and they are called the independent

components of the observed data. These independent components, also called sources or factors,

can be found by ICA [5].

The ICA problem is solved on the basis of minimizing or maximizing certain contrast func-

tions. This transforms the ICA problem to a numerical optimization problem. One of the most

efficient practical learning rules for solving the problem is FastICA algorithm. The FastICA is based

on a fixed-point iteration scheme for finding a maximum of the non-Gaussianity measured by the

approximation of negentropy. It can also be derived as an approximate Newton iteration [5].

To determine if ICA could play a role in the EEG data analysis, FastICA was applied to

selected data 5. Below are the FastICA results for the third patient 6. The number of components

being implemented is from 2 to 8. The red part is the annotated AEP signal segments. The blue

part is other signal segments.

From this initial FastICA results, we presently conclude that all the components have obvious changes

when an AEP event occurs. In other words, there is not a separate component that is affected by

AEP only.

5Here we use raw data rather than bipolar differences of data6This patient data contains the most samples of AEP signal in D20.

98

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

time

mag

nitu

de o

f sig

nal

fastica result of #3 patient: 2 components

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

time

mag

nitu

de o

f sig

nal

Figure 178: FastICA Results for Patient#3 ofD20: 2 Components

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10fastica result of #3 patient: 3 components

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

time


99

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0


0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

time


0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0


0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

time


100

0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

15fastica result of #3 patient part 1: 6 components

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−15

−10

−5

0

5

10

time

Figure 182: FastICA Results for Patient#3 ofD20: 1st—3rd of 6 Components

0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10


0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

15

time

Figure 183: FastICA Results for Patient#3 ofD20: 4th—6th of 6 Components

101

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10


0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

time

Figure 184: FastICA Results for Patient#3 ofD20: 1st—4th of 7 Components

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10


0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

time


102

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10


0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

0 1000 2000 3000 4000 5000 6000 7000 8000−10

0

10

20

time

Figure 186: FastICA Results for Patient#3 ofD20: 1st—4th of 8 Components

0 1000 2000 3000 4000 5000 6000 7000 8000−20

−10

0


0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

mag

nitu

de o

f sig

nal

0 1000 2000 3000 4000 5000 6000 7000 8000−5

0

5

10

0 1000 2000 3000 4000 5000 6000 7000 8000−10

−5

0

5

10

time


103

Appendix G Matlab Simulation Code Summary

A significant amount of Matlab code was generated to facilitate the results shown. Table

19 is a summarization of the code.

FILE NAME CONTENT TYPE

cmeans1.m c-means algorithm with data related covariance function

kNearestNeighbors.m K-Nearest Neighbor Rule algorithm function

ica.mcompute the FastICA by patient;

plot each ica componentsscript

nbflt.mfiltering raw data with notch

and chebyshev bandpass filterscript

waveform.m

find the channel pairs in annotation; compute the

difference between each pairs; build the 8

feature set (6 waveform + 2 spatial features)

script

freqfeature.mcompute the 4 frequency feature

for all samples in every channel pairsscript

psdfeature.mcompute the 4 psd feature

for all samples in every channel pairsscript

6feature.mbuild the 6 feature set

(4 fft/psd + 2 spatial feature)script

cov&mean.m compute the covariance matrix and means for feature set script

BayesianClassifier.m

build Bayesian Classifier with Gaussian assumption

for AEP and nonAEP; compute the error rate

under different risk ratio; plot the result

script

plotsss.mcompute sensitivity, specificity and selectivity;

plot sensitivity, specificity and selectivityscript

BayesianClassifier cv.m

build Bayesian Classifier, training by validation method,

with Gaussian assumption for AEP and nonAEP; compute sensitivity,

specificity and their means and variances; plot the result

script

ann cv.m

build Artificial Neural Network, training by cross-

validation method; compute sensitivity, specificity

and their means and variances; plot the result

script

knn kfold.m

apply K-Nearest Neighbor Rule, training by k-fold method;

compute sensitivity, specificity and their means

and variances; plot the result

script

Table 19: Matlab Code

104

Bibliography

[1] Nurettin Acir, Ibrahim Oztura, Mehmet Kuntalp, Baris Baklan, and Cuneyt Guzelis. Automaticdetection of epileptiform events in EEG by a three-stage procedure based on artificial neuralnetworks. IEEE Transactions on Biomedical Engineering, 52(1), January 2005.

[2] Rezaul Begg, Joarder Kamruzzaman, and Ruhul Sarker. Neural Networks in Healthcare: Po-tential and Challenges. Idea Group Publishing, 2006.

[3] Inan Guler and Elif Derya Ubeyli. Adaptive neuro-fuzzy inference system for classification ofeeg signals using wavelet coefficients. Journal of Neuroscience Methods, 148:113–121, April2005.

[4] Jonathan J. Halford. Computerized epileptiform transient detection in the scalp electroen-cephalogram: Obstacles to progress and the example of computerized eeg interpretation. Clin-ical Neurophysiology, 2009.

[5] Aapo Hyvarinen and Erkki Oja. Independent component analysis: Algorithms and applications.Neural Networks, 13:411–430, 2000.

[6] K. P. Indiradevi, Elizabeth Elias, P. S. Sathidevi, S. Dinesh Nayak, and K. Radhakrishnan. Amulti-level wavelet approach for automatic detection of epileptic spikes in the electroencephalo-gram. Computers in Biology and Medicine, 38:805–816, April 2008.

[7] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and modelselection. In International Joint Conference on Artificial Intelligence(IJCAI), 1995.

[8] Steve Lawrence, C.Lee Giles, and Ah Chung Tsoi. What size neural gives optimal generaliza-tion? convergence properties of backpropagation. Master’s thesis, University of Maryland.

[9] Stephane Mallat. a Wavelet Tour of Signal Processing. Academic Press, 1998.

[10] Robert J. Schalkoff. Pattern Recognition, statistical, structural and neural approaches. JohnWiley & Sons, Inc, 1992.

[11] Robert J. Schalkoff. Artificial Neural Networks. The McGraw-Hill Companies, Inc, 1997.

[12] Kagan Tumer and Joydeep Ghosh. Analysis of decision boundaries in linearly combined neuralclassifiers. Master’s thesis, University of Texas, Austin.

105

EEG Data Analysis, Feature Extraction and Classifiers

Documents

Transcript of EEG Data Analysis, Feature Extraction and Classifiers