arXiv:1903.03232v4 [cs.LG] 2 Apr 2020

SeizureNet: Multi-Spectral Deep Feature Learning for Seizure Type Classification

Umar Asif, Subhrajit Roy, Jianbin Tang, and Stefan Harrer

IBM Research Australia
[email protected]

Abstract. Automatic classification of epileptic seizure types in electroencephalogram (EEG) data can enable more precise diagnosis and efficient management of the disease. This task is challenging due to factors such as low signal-to-noise ratios, signal artefacts, high variance in seizure semiology among epileptic patients, and limited availability of clinical data. To overcome these challenges, in this paper, we present SeizureNet, a deep learning framework which learns multi-spectral feature embeddings using an ensemble architecture for accurate cross-patient seizure type classification. Experiments on the recently released TUH EEG Seizure Corpus show that SeizureNet produces state-of-the-art weighted F1 scores of 0.98 for seizure type classification, setting new benchmarks on the dataset. We also show that the high-level feature embeddings learnt by SeizureNet considerably improve the classification accuracy of smaller networks through knowledge distillation for applications with low-memory and fast inference speed requirements.

1 Introduction

Epilepsy is a neurological disorder which affects 1% of the world’s population. It causes sudden and unforeseen seizures which can result in critical injury, or even death of the patient. One third of epileptic patients do not have appropriate medical treatments available. For the remaining two thirds of the patients, treatment options and quality vary because seizure semiology is different for every epileptic patient. An important technique to diagnose epilepsy is visual inspection of electroencephalography (EEG) recordings by physicians to analyse abnormalities in brain activity. This task is time-consuming, inefficient, and subject to inter-observer variability. With the advancements in IoT-based data collection, machine learning based automated systems have been developed which use Deep Convolutional Neural Networks (DCNN) and recurrent neural networks to capture abnormal patterns in the EEG data during seizures [30,13,26,7,6]. In this context, current systems have mostly focused on tasks such as seizure detection and seizure prediction [29,19,28,5,25,14], and the task of seizure type classification remains largely undeveloped due to factors such as the complex nature of the task and the unavailability of clinical datasets with seizure type annotations. Nevertheless, the capability to discriminate between different seizure types (e.g., focal or generalized seizures) as they are detected has the potential to improve long-term patient care, enabling timely drug adjustments and remote monitoring in clinical trials [9]. Recently, Temple University released the TUH EEG Seizure Corpus [22] (v1.4.0) for epilepsy research, which contains 2012 seizures, making it the world’s largest publicly available dataset for seizure type classification. The work of [20] presented baseline results on the TUH EEG Seizure Corpus [22] for seizure type classification by conducting a search space exploration of various standard machine learning algorithms and pre-processing techniques. Other methods such as [24,21] used subsamples of data from selected seizure types for seizure analysis. Recently, the work of [2] presented a deep learning based framework consisting of a neural memory network with neural plasticity for seizure type classification. The authors demonstrated state-of-the-art performance (weighted F1 scores of 0.945) on the TUH EEG Seizure Corpus [22]. In this paper, we improve on the existing methods by presenting an ensemble learning approach for seizure type classification. The main contributions of this paper are as follows:

1. We present SeizureNet, a deep learning framework focused on diversifying individual classifiers of an ensemble by learning feature embeddings at different spatial and frequency resolutions of an EEG data spectrum. Experiments show that our multi-spectral feature learning encourages diversity in the ensemble and reduces variance in the final predictions for seizure type classification.

2. We present Saliency-encoded Spectrograms, a visual representation which captures salient information contained in the frequency transform of the time-series EEG data. Experiments show that our saliency-encoded spectrograms produce higher seizure classification accuracy compared to those without the saliency information.

3. We evaluate the capability of our framework for transferring knowledge to smaller networks through knowledge distillation on the TUH EEG Seizure Corpus [22] and present benchmark results for seizure type classification. Experiments show that the proposed multi-spectral feature learning coupled with knowledge distillation improves the classification accuracy of small networks.

2 The Proposed Framework (SeizureNet)

Fig. 1-A shows the overall architecture of our framework, which transforms raw time-series EEG signals into the proposed saliency-encoded spectrograms and uses an ensemble of deep CNN models to produce predictions for seizure type classification. In the following, we describe in detail the individual components of our framework.

[Figure 1. Panels: A: SeizureNet; B: Dense Block 1; C: Dense Layer; D: Saliency-encoded Spectrogram generation; E: Multi-Spectral Feature Sampling. Panel A: EEG data, saliency-encoded spectrogram generation, multi-spectral feature sampling (96 Hz, 64 Hz, 48 Hz, 24 Hz) into training subsets of size N×224×224×3, then three sub-networks, each built from Conv, Pool, four dense blocks separated by Conv, BN-ReLU, and Pool stages, and an fc layer. The dense blocks have 6, 12, 24, 16 layers; 6, 12, 32, 32 layers; and 6, 12, 36, 24 layers, with feature sizes 1×1024, 1×1664, and 1×2208, followed by softmax. Panel B: Dense Layers 1 to 6 joined by concatenation (C). Panel C: 1×1 Conv, BN-ReLU, 3×3 Conv, BN-ReLU, Dropout. Panel D: frequency transform (FT), Saliency Map 1 (S1), Saliency Map 2 (S2), Saliency-encoded Spectrogram (D). Panel E: a sampling window with window length (w) and overlap (o), with axes frequency spectrum and temporal resolution.]

Fig. 1: The overall architecture of our framework. Input EEG data is first transformed into the proposed saliency-encoded spectrograms (D), which are then sampled at different frequency and spatial resolutions (E), and finally fed into an ensemble of deep CNN models. The outputs of the sub-networks are combined through summation and fed into a Softmax operation for producing probabilistic distributions with respect to the target classes (A).

Saliency-encoded Spectrograms: Our saliency-encoded spectrograms are inspired by visual saliency detection [10], where we transform time-series EEG signals into a visual representation which captures multi-scale saliency information from the EEG data. Specifically, saliency-encoded spectrograms consist of three feature maps, as shown in Fig. 1-D: i) a Fourier Transform map ($FT$) which encodes the log amplitude Fourier Transform of the EEG signals, ii) a spectral saliency map ($S_1$), which extracts saliency by computing the spectral residual of the $FT$ feature map, and iii) a multi-scale saliency map ($S_2$), which captures spectral saliency at multiple scales using center-surround differences of the features of the $FT$ feature map [16,12]. Mathematically, given a time-series EEG sequence $X(c, t)$ from a channel $c$ parameterized by time $t$, we compute the Fast Fourier Transform ($\mathcal{F}$) of the sequence as: $\mathcal{F}(X) = \int_{-\infty}^{\infty} X(c, t)\, e^{-2\pi i t}\, dt$. We compute $\mathcal{F}$ on data from 20 selected channels¹ and take the log of the amplitude of the Fourier Transform. The output is reshaped into an $\mathbb{R}^{p \times 20}$-dimensional feature map ($FT$), where $p$ denotes the number of data points of the EEG sequence. Mathematically, $FT$ can be written as: $FT = \log(\mathrm{Amplitude}(\mathcal{F}(X)))$. To compute the feature map $S_1$, we take the log-amplitude feature map $FT$ and subtract from it the average log amplitude of $FT$. Mathematically, $S_1$ can be written as: $S_1 = G * \mathcal{F}^{-1}\big(\exp(FT - H * FT) + P\big)^2$, where $\mathcal{F}^{-1}$ denotes the Inverse Fourier Transform. The term $H$ represents the average spectrum of $FT$, approximated by convolving the feature map $FT$ with a $3 \times 3$ local averaging filter. The term $G$ is a Gaussian kernel to smooth the feature values. The term $P$ denotes the phase spectrum of the feature map $FT$. The saliency map $S_2$ captures saliency in the feature map $FT$ with respect to its surrounding data points by computing center-surround differences at multiple scales. Let $FT_i$ represent a feature value at location $i$, and let $\Omega$ denote a circular neighborhood of scale $\rho$ surrounding the location $i$. Mathematically, the saliency calculation at location $i$ can be written as: $S_2(i) = \sum_{\rho \in [2,3,4]} \big(FT_i - \min([FT_{k,\rho}])\big),\ \forall k \in \Omega$, where $[FT_{k,\rho}]$ represents the feature values in the local neighborhood $\Omega$. Finally, we concatenate the three feature maps $FT$, $S_1$, and $S_2$ into an RGB-like data structure ($D$) which is normalized to the range 0 to 255, as shown in Fig. 1-D. It can be written as: $D = [|FT|, |S_1|, |S_2|]$, where $|\cdot|$ denotes normalization of the feature map.

¹ The scalp EEG data was collected using the 10-20 system [23], and the TCP montage [15] was used to select 20 channels of the input. We used the following 20 channels: FP1-F7; F7-T3; T3-T5; T5-O1; FP2-F8; F8-T4; T4-T6; T6-O2; T3-C3; C3-CZ; CZ-C4; C4-T4; FP1-F3; F3-C3; C3-P3; P3-O1; FP2-F4; F4-C4; C4-P4; P4-O2.
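The construction of the three maps can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes the spectral residual follows the form of Hou and Zhang [10], i.e. the inverse transform of exp(residual + i·phase), that the FFT is taken along the time axis, that a 3×3 box filter stands in for the Gaussian kernel G, and that a square neighbourhood stands in for the circular neighbourhood Ω. All function names are hypothetical.

```python
import numpy as np

def box_filter(a, k):
    """k x k local mean with edge padding (plays the role of H; also used here in place of G)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def local_min(a, r):
    """Minimum over a (2r+1) x (2r+1) square neighbourhood (assumption: square, not circular)."""
    padded = np.pad(a, r, mode="edge")
    out = np.full(a.shape, np.inf)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out = np.minimum(out, padded[dy:dy + a.shape[0], dx:dx + a.shape[1]])
    return out

def normalize255(a):
    """Rescale a feature map to the range [0, 255]."""
    a = a - a.min()
    span = a.max()
    return np.zeros(a.shape) if span == 0 else 255.0 * a / span

def saliency_encoded_spectrogram(x, eps=1e-8):
    """x has shape (p, 20): p time points for 20 channels. Returns D with shape (p, 20, 3)."""
    spectrum = np.fft.fft(x, axis=0)
    ft = np.log(np.abs(spectrum) + eps)          # FT: log amplitude
    phase = np.angle(spectrum)                   # P: phase spectrum
    residual = ft - box_filter(ft, 3)            # FT - H * FT
    s1 = np.abs(np.fft.ifft(np.exp(residual + 1j * phase), axis=0)) ** 2
    s1 = box_filter(s1, 3)                       # smoothing step
    s2 = sum(ft - local_min(ft, rho) for rho in (2, 3, 4))
    return np.stack([normalize255(ft), normalize255(s1), normalize255(s2)], axis=-1)
```

Each channel of the result is normalized independently, matching the per-map normalization $|\cdot|$ in the definition of $D$.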

Multi-Spectral Feature Learning: Deep neural networks are often over-parameterized and require a sufficient amount of training data to effectively learn features that can generalize to the test data. When confronted with limited training data, which is a common issue in health informatics [4], deep architectures suffer from poor convergence or over-fitting. To overcome these challenges, we present Multi-Spectral Feature Sampling (MSFS), a novel method to encourage diversity in ensemble learning by training the sub-networks using data sampled from different frequency and temporal resolutions. Fig. 1-E shows an overview of our MSFS method. Consider an $M$-dimensional training dataset $\mathcal{D} = \{(D_i, y_i) \mid 0 \le i \le N_d\}$, which is composed of $N_d$ samples, where $D_i$ is a training sample with the corresponding class label $y_i \in \mathcal{Y}$. During training, MSFS generates a feature subspace $\mathcal{D}^m = \{(D^m_i, y_i) \mid 0 \le i \le N_d\}$ which contains spectrograms generated by a random selection of the sampling frequency $f \in F$ (Hz), a window length parameter $w \in W$ (seconds), and a window step size parameter $o \in O$.² This process is repeated $N_e = 3$ times to obtain the combination of random subspaces $\{\mathcal{D}^m_1, \dots, \mathcal{D}^m_{N_e}\}$, where $N_e$ is the size of the ensemble.
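The random selection step of MSFS can be sketched as follows. The value sets for F, W, and O are those reported in the paper. The function name, the dictionary layout, and the use of independent uniform draws are assumptions made for illustration only.

```python
import random

F = [24, 48, 64, 96]                    # sampling frequencies (Hz)
W = [1, 2, 4, 8, 16]                    # window lengths (seconds)
O = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]     # window step sizes

def sample_msfs_configs(n_e=3, seed=None):
    """Draw one (f, w, o) setting per sub-network (hypothetical helper).

    Each draw defines one random subspace; spectrograms for that
    sub-network would then be generated with the drawn parameters.
    """
    rng = random.Random(seed)
    return [
        {"f": rng.choice(F), "w": rng.choice(W), "o": rng.choice(O)}
        for _ in range(n_e)
    ]

configs = sample_msfs_configs(n_e=3, seed=0)
```

With $N_e = 3$, `configs` holds three parameter settings, one per sub-network of the ensemble.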

The Proposed Ensemble Architecture: Our ensemble consists of three deep Convolutional Neural Networks (DCNs) as shown in Fig. 1-A. The basic building block of a DCN is a Dense Block which is composed of multiple bottleneck convolutions interconnected through dense connections [11]. Specifically, each DCN model starts with a $7 \times 7$ convolution followed by Batch Normalization (BN), a Rectified Linear Unit (ReLU), and a $3 \times 3$ average pooling operation. Next, there are four dense blocks, where each dense block consists of $N_l$ layers termed Dense Layers which share information from all the preceding layers connected to the current layer through fuse connections. Fig. 1-B shows the structure of a dense block with $N_l = 6$ dense layers. Each dense layer consists of $1 \times 1$ and $3 \times 3$ convolutions followed by BN, a ReLU, and a dropout block as shown in Fig. 1-C. Mathematically, the output of the $l$th dense layer in a dense block can be written as: $X_l = [X_0, \dots, X_{l-1}]$, where $[\cdots]$ represents concatenation of the features produced by the layers $0, \dots, l-1$. The final dense block produces $Y_{dense} \in \mathbb{R}^{k \times R \times 7 \times 7}$-dimensional features which are squeezed to $k \times R$ dimensions through an averaging operation, and then fed to a linear layer $fc \in \mathbb{R}^{K}$ which learns probabilistic distributions of the input data with respect to $K$ target classes. Mathematically, the output of a linear layer ($fc$) can be written as: $Y_{fc} = Y_{dense} * W_{fc} + B_{fc}$, where $W_{fc}$ and $B_{fc}$ represent the weight and bias matrices, respectively. To increase diversity among sub-networks of the ensemble, we vary the numbers of dense layers of Dense block 3 and Dense block 4 of the sub-networks.

² In this work, we used $F = [24, 48, 64, 96]$ Hz, $W = [1, 2, 4, 8, 16]$ seconds, and $O = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]$.
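The feature sizes shown in Fig. 1-A (1×1024, 1×1664, and 1×2208) can be checked against the dense-block layer counts with a small calculation. The growth rates, initial channel counts, and the halving of channels between blocks are assumptions taken from the standard DenseNet configurations of [11]; the paper itself states only the layer counts and the output sizes.

```python
def densenet_feature_width(block_layers, growth_rate, init_channels, compression=0.5):
    """Channel count after the last dense block (hypothetical helper).

    Every dense layer appends `growth_rate` channels through concatenation;
    each transition between blocks scales the channel count by `compression`.
    """
    channels = init_channels
    for index, n_layers in enumerate(block_layers):
        channels += n_layers * growth_rate
        if index < len(block_layers) - 1:
            channels = int(channels * compression)
    return channels

# Layer counts from Fig. 1-A; growth rate and initial channels are assumed.
widths = [
    densenet_feature_width((6, 12, 24, 16), growth_rate=32, init_channels=64),
    densenet_feature_width((6, 12, 32, 32), growth_rate=32, init_channels=64),
    densenet_feature_width((6, 12, 36, 24), growth_rate=48, init_channels=96),
]
```

Under these assumptions the three widths come out as 1024, 1664, and 2208, which agrees with the figure and illustrates how varying Dense block 3 and Dense block 4 changes the size of each sub-network's embedding.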

Training and Implementation: Consider a training dataset of spectrograms and labels $(D, y) \in (\mathcal{D}, \mathcal{Y})$, where each sample belongs to one of the $K$ classes ($\mathcal{Y} = \{1, 2, \dots, K\}$). The goal is to determine a function $f_s(D) : \mathcal{D} \to \mathcal{Y}$. To learn this mapping, we train our SeizureNet parameterized by $f(D, \theta^*)$, where $\theta^*$ are the learned parameters obtained by minimizing a training objective function: $\theta^* = \arg\min_{\theta} \mathcal{L}_{CE}(y, f(D, \theta))$, where $\mathcal{L}_{CE}$ denotes a Cross-Entropy loss which is applied to the outputs of the ensemble with respect to the ground truth labels. Mathematically, $\mathcal{L}_{CE}$ can be written as: $\mathcal{L}_{CE} = \sum_{k=1}^{K} I(k = y_i) \log \sigma(O_e, y_i)$, where $O_e = \frac{1}{N_e} \sum_{e=1}^{N_e} O_k$ denotes the combined logits produced by the ensemble, $O_k$ denotes the logits produced by an individual sub-network, $I$ is the indicator function, and $\sigma$ is the SoftMax operation. It is given by: $\sigma(z_i) = \exp z_i / \sum_{k=1}^{K} \exp z_k$. For training the networks, we initialized the weights of the networks from zero-mean Gaussian distributions. The standard deviations were set to 0.01, and biases were set to 0. We trained the networks for 200 epochs with a starting learning rate of 0.01 (which was divided by 10 at 50% and 75% of the total number of epochs), and a parameter decay of 0.0005 (on the weights and biases). Our implementation is based on the auto-gradient computation framework of the Torch library [18]. Training was performed with the ADAM optimizer with a batch size of 50.
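As an illustration of the training objective and schedule, the following NumPy sketch averages the sub-network logits, applies SoftMax, and evaluates a cross-entropy loss, together with the step learning-rate schedule. It uses the conventional negative log-likelihood averaged over the batch, which is an assumption about sign and reduction that the formula above does not state. The function names and the choice to apply each drop from the first epoch at or past the threshold are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable SoftMax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_cross_entropy(logits_per_net, labels):
    """logits_per_net: (N_e, batch, K); labels: (batch,) integers in [0, K)."""
    combined = np.mean(logits_per_net, axis=0)       # O_e: averaged logits
    probs = softmax(combined)
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    return float(-np.mean(np.log(picked)))

def learning_rate(epoch, base_lr=0.01, total_epochs=200):
    """Divide the rate by 10 at 50% and again at 75% of training."""
    lr = base_lr
    if epoch >= 0.5 * total_epochs:
        lr /= 10
    if epoch >= 0.75 * total_epochs:
        lr /= 10
    return lr
```

For example, three sub-networks that all emit zero logits over seven classes give a loss of log 7, the value for a uniform prediction.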

3 Experiments and Results

We used the world’s largest publicly available dataset of seizure recordings, the TUH EEG Seizure Corpus [22] (v1.4.0), which contains 2012 seizures. Table 1 shows the distribution of seizures in terms of different seizure types. For experiments, we excluded only myoclonic seizures because of the small number of seizures recorded (only three seizure events). We adopted 5-fold cross-validation, where for each fold, the seizures for each type were proportionally divided into training and test sets. For comparison with baselines and other methods, we used the preprocessed version of the TUH EEG Seizure Corpus [22] known as the IBM TUSZ pre-processed data (v1.0.0) [20]. We evaluated SeizureNet on the test data for all combinations of the hyper-parameters: sampling frequency $F = [24, 48, 64, 96]$ Hz, window length $W = [1, 2, 4, 8, 16]$ seconds, and window step size $O = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]$, as shown in Fig. 2. The results show that the model produces higher F1 scores for smaller window lengths and shorter window step sizes. This is because smaller window sizes increase the number of training samples and reduce over-fitting during training. Fig. 2 also shows that spectrograms generated at higher frequency levels capture more discriminative information for distinguishing different seizure types, thereby resulting in higher F1 scores compared to the spectrograms generated at lower frequency bands. The best performance was achieved when the test data was processed with a window length of 2 seconds, a step size of 1, and a sampling frequency of 96 Hz, as shown by the shaded regions in Fig. 2. Table 3 shows weighted F1 scores of our SeizureNet and a comparison with other methods on the TUH EEG Seizure Corpus. The results show that our SeizureNet produced considerable improvements in the F1 scores compared to the other methods.
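The effect of window length and step size on the number of samples can be made concrete with a small count. The sketch assumes the step size is expressed in seconds and that only complete windows are kept; neither detail is stated in the paper, and the function name is hypothetical.

```python
import math

def num_windows(duration_s, w, o):
    """Number of complete length-w windows taken every o seconds (assumed units)."""
    if duration_s < w:
        return 0
    return math.floor((duration_s - w) / o) + 1

# A 60-second recording under two settings from the hyper-parameter grid.
short_windows = num_windows(60, w=2, o=1)     # 59 windows
long_windows = num_windows(60, w=16, o=8)     # 6 windows
```

Under these assumptions a 60-second recording yields 59 samples with a 2-second window and a step of 1, but only 6 samples with a 16-second window and a step of 8, which is the sample-count effect described above.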


Table 1: A short description and total count of different types of seizures in the TUH EEG Seizure Corpus.

Seizure type                          Count
1. Focal Non-Specific Seizure (FN)    992
2. Generalized Non-Specific (GN)      415
3. Simple Partial Seizure (SP)        44
4. Complex Partial Seizure (CP)       342
5. Absence Seizure (AB)               99
6. Tonic Seizure (TN)                 67
7. Tonic Clonic Seizure (TC)          50

Table 2: Ablation study of SeizureNet in terms of architecture, model parameters (million), FLOPS (million), inference speed, and Knowledge Distillation (KD).

Architecture    F1 scores    Param. (M)    Flops (M)    Time
DenseNet        0.9845       45.94         14241.37     90 ms
ResNet          0.8474       0.08          12.75        2 ms
ResNet-KD       0.8759       0.08          12.75        2 ms

Table 3: Average weighted F1 scores of SeizureNet and other methods on the TUH EEG Seizure Corpus for seizure type classification.

Methods                        Weighted F1 scores
[20] Adaboost                  0.593
SAE (based on [14])            0.675
LSTM (based on [3])            0.692
LSTM (based on [27])           0.701
CNN (based on [1])             0.716
[20] SGD                       0.724
CNN-LSTM (based on [26])       0.795
CNN (based on [24])            0.802
[2] CNN-LSTM                   0.824
CNN (based on [17])            0.826
CNN-LSTM (based on [7])        0.831
[20] XGBoost                   0.851
[20] kNN                       0.898
CNN (based on [8])             0.901
[2] Plastic NMN                0.945
SeizureNet (this work)         0.984

Fig. 2: Average weighted F1 scores produced by our SeizureNet for different values of the hyper-parameters: window length (w), window step size (o), and maximum sampling frequency (f). The shaded regions represent the minimum and maximum bounds of the F1 scores.

For instance, SeizureNet improved F1 scores by around 8 points (0.984 vs. 0.901) and 4 points (0.984 vs. 0.945) compared to the CNN-based method of [8] and the neural plasticity-based method of [2], respectively. These improvements are mainly attributed to the proposed multi-spectral feature learning, which captures information from different frequency and spatial resolutions and enables SeizureNet to learn more discriminative features compared to the other methods, which learn frequency-specific features.
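For readers unfamiliar with the metric, a weighted F1 score can be computed as the support-weighted mean of the per-class F1 scores. The sketch below uses that common definition, which is an assumption because the paper does not spell out its exact formula, and the function name is hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0   # F1 = 2TP / (2TP + FP + FN)
        score += (n_cls / total) * f1
    return score

# Toy example with two classes: per-class F1 is 0.8 and 2/3.
example = weighted_f1([0, 0, 0, 1], [0, 0, 1, 1])
```

Weighting by support matters for this corpus because the class counts in Table 1 are highly imbalanced.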

Table 4: Ablation study of SeizureNet in terms of the proposed saliency-encoded spectrograms and the proposed ensembling architecture for different data sizes.

Saliency    Ensemble    Data size (70%)    Data size (30%)    Data size (10%)
No          No          0.7428             0.6511             0.5029
No          Yes         0.7544             0.6536             0.5123
Yes         No          0.7573             0.6600             0.5215
Yes         Yes         0.7674             0.6754             0.5310

Table 5: Ablation study of SeizureNet in terms of the proposed Multi-Spectral Feature Sampling (MSFS) using different sizes of training data (weighted F1 scores).

Data size    no MSFS    with MSFS
10%          0.3947     0.4825
30%          0.6197     0.6712
50%          0.6312     0.7446
70%          0.6739     0.7675
85%          0.7360     0.8052

SeizureNet for Knowledge Distillation: Here, we evaluated the capability of SeizureNet in transferring knowledge to smaller networks for seizure classification. For this, we trained a student ResNet model with 3 residual layers in conjunction with SeizureNet acting as a teacher network using a knowledge distillation based training function. Our training function $\mathcal{L}_{KD}$ is a weighted combination of a Cross-Entropy loss term $\mathcal{L}_{CE}$ and a distillation loss term $\mathcal{L}_{KL}$. Mathematically, $\mathcal{L}_{KD}$ can be written as: $\mathcal{L}_{KD} = \alpha \cdot \mathcal{L}_{CE}(P_t, y) + \beta \cdot \mathcal{L}_{CE}(P_s, y) + \gamma \cdot \mathcal{L}_{KL}$, where $P_t$ and $P_s$ represent the logits (the inputs to the SoftMax) of SeizureNet and the student model, respectively. The terms $\alpha \in [0, 0.5, 1]$, $\beta \in [0, 0.5, 1]$, and $\gamma \in [0, 0.5, 1]$ are the hyper-parameters which balance the individual loss terms. The distillation loss term $\mathcal{L}_{KL}$ is composed of a Kullback-Leibler (KL) divergence function defined between log-probabilities computed from the outputs of SeizureNet and the student model. Mathematically, $\mathcal{L}_{KL}$ can be written as: $\mathcal{L}_{KL}(P_s, P_t) = \sigma(P_s) \cdot \big(\log(\sigma(P_s)) - \sigma(P_t)/T\big)$, where $\sigma$ represents the SoftMax operation and $T$ is a temperature hyper-parameter which controls the softening of the outputs of SeizureNet. A higher value of $T$ produces a softer probability distribution over the target classes. Table 2 shows that ResNet-KD produced an improvement of around 3 points (0.8759 vs. 0.8474) in the mean F1 scores compared to the ResNet model which was trained without knowledge distillation. These results show that the high-dimensional feature embeddings learnt by SeizureNet can successfully be used to improve the generalization performance of smaller networks (e.g., a 3-layer ResNet having 45× fewer training parameters, 1100× fewer FLOPs, and 45× faster inference as shown in Table 2), for deployment in low-power and memory-constrained systems.
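A weighted distillation objective of this kind can be sketched in NumPy as follows. The sketch is an illustration, not the authors' training function: it uses the widely used temperature-softened form KL(teacher || student), which differs in form from the expression for the distillation term given in the text, and the default values of alpha, beta, gamma, and T are placeholders chosen for the example.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-SoftMax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true classes."""
    return float(-np.mean(log_softmax(logits)[np.arange(len(labels)), labels]))

def kd_loss(student_logits, teacher_logits, labels,
            alpha=0.5, beta=0.5, gamma=0.5, T=2.0):
    """alpha * CE(teacher) + beta * CE(student) + gamma * KL (assumed form)."""
    log_ps = log_softmax(student_logits / T)     # softened student distribution
    log_pt = log_softmax(teacher_logits / T)     # softened teacher distribution
    pt = np.exp(log_pt)
    kl = float(np.mean(np.sum(pt * (log_pt - log_ps), axis=-1)))
    return (alpha * cross_entropy(teacher_logits, labels)
            + beta * cross_entropy(student_logits, labels)
            + gamma * kl)
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the two cross-entropy terms remain, which is a convenient sanity check for an implementation.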

Significance of Saliency-encoded Spectrograms and Ensembling: Table 4 shows that the combination of the spectral residual of the Fourier Transform and multi-scale center-surround difference information in the proposed saliency-encoded spectrograms turned out to be more discriminative for seizure classification, especially for small training data. For instance, when only 10% of the data was used for training, the model trained using saliency-encoded spectrograms produced around 2 points improvement in the F1 scores compared to the model that was trained using only the Fourier Transform. Table 4 also shows that combining multiple DCN models with different layer configurations encourages diversity in ensemble feature learning, thereby producing higher F1 scores compared to the models trained independently.

Significance of Multi-Spectral Feature Learning: Table 5 shows that the models trained using MSFS produced higher F1 scores compared to the frequency-specific models. For instance, when only 10% of the training data was used, SeizureNet with MSFS produced improvements of around 9 points, 8 points, 8 points, and 17 points in the weighted F1 scores compared to frequency-specific models at 24 Hz, 48 Hz, 64 Hz, and 96 Hz, respectively. These improvements show that information from different frequency resolutions complements each other in discriminating seizure classes, and that it increases variation in the training data, resulting in features which generalize better to the unseen test data, especially for small data sizes, compared to the features learnt at specific frequency bands. Fig. 3 shows a comparison of TSNE mappings produced by models trained with and without MSFS. The results show that the seizure manifolds produced with MSFS are better separated in the high-dimensional feature space (as shown in Fig. 3-A) compared to the seizure manifolds produced without using MSFS (as shown in Fig. 3-B). This shows that increasing variation in the training information by combining data from different spatial and frequency bands is beneficial for learning discriminative features for seizure classification. Fig. 3 also shows that models trained with MSFS produced considerably fewer confusions (as shown in Fig. 3-C) for almost all the seizure classes, exhibiting the importance of combining data from different frequency and spatial resolutions.

Fig. 3: TSNE visualizations and confusion matrices of the seizure type manifolds produced by SeizureNet using the proposed multi-spectral feature learning (A, C) and without using the proposed multi-spectral feature learning (B, D), respectively.

4 Conclusion and Future Work

This paper presents a deep learning framework for EEG-based seizure type classification in cross-patient scenarios. The greatest challenge in a cross-patient approach is to learn robust features from limited training data which can effectively generalize to unseen test patient data. This is achieved through two novel contributions: i) saliency-encoded visual spectrograms which encode multi-scale saliency information contained in the frequency transform of the EEG signals, and ii) multi-spectral feature learning within an ensemble architecture, where spectrograms generated at different frequency and spatial resolutions encourage diversity in the information flow through the networks, and ensembling reduces variance in the final predictions. Experiments show that the proposed framework produces a state-of-the-art F1 score of 0.98 for seizure type classification on the world’s largest publicly available epilepsy dataset. Experiments also show that the high-dimensional feature embeddings learnt by our framework considerably improve the accuracy of smaller networks through knowledge distillation. In future, we plan to investigate multi-modal seizure type classification by fusing data from different modalities such as EEG, wearable sensors, and videos for more reliable automated seizure type logging in real-world epilepsy monitoring units.

References

1. U Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Hojjat Adeli, ‘Deepconvolutional neural network for the automated detection and diagnosis of seizure using eegsignals’, Computers in biology and medicine, 100, 270–278, (2018).

2. David Ahmedt-Aristizabal, Tharindu Fernando, Simon Denman, Lars Petersson, Matthew JAburn, and Clinton Fookes, ‘Neural memory networks for robust classification of seizuretype’, arXiv preprint arXiv:1912.04968, (2019).

3. David Ahmedt-Aristizabal, Clinton Fookes, Kien Nguyen, and Sridha Sridharan, ‘Deep clas-sification of epileptic signals’, in 2018 40th Annual International Conference of the IEEEEngineering in Medicine and Biology Society (EMBC), pp. 332–335. IEEE, (2018).

4. Turkey N Alotaiby, Saleh A Alshebeili, Tariq Alshawi, Ishtiaq Ahmad, and Fathi E Abd El-Samie, ‘Eeg seizure detection and prediction algorithms: a survey’, EURASIP Journal onAdvances in Signal Processing, 2014(1), 183, (2014).

5. Andreas Antoniades, Loukianos Spyrou, Clive Cheong Took, and Saeid Sanei, ‘Deep learn-ing for epileptic intracranial eeg data’, in Machine Learning for Signal Processing (MLSP),2016 IEEE 26th International Workshop on, pp. 1–6. IEEE, (2016).

6. Larbi Boubchir, Somaya Al-Maadeed, and Ahmed Bouridane, ‘On the use of time-frequencyfeatures for detecting and classifying epileptic seizure activities in non-stationary eeg sig-nals’, in ICASSP, pp. 5889–5893. IEEE, (2014).

7. Meysam Golmohammadi, Saeedeh Ziyabari, Vinit Shah, Eva Von Weltin, Christopher Camp-bell, Iyad Obeid, and Joseph Picone, ‘Gated recurrent networks for seizure detection’, in2017 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp. 1–5. IEEE,(2017).

8. Yongfu Hao, Hui Ming Khoo, Nicolas von Ellenrieder, Natalja Zazubovits, and Jean Gotman,‘Deepied: An epileptic discharge detector for eeg-fmri based on deep learning’, NeuroImage:Clinical, 17, 962–975, (2018).

9. Stefan Harrer, Pratik Shah, Bhavna Antony, and Jianying Hu, ‘Artificial intelligence for clin-ical trial design’, Trends in pharmacological sciences, (2019).

10. Xiaodi Hou and Liqing Zhang, ‘Saliency detection: A spectral residual approach’, in CVPR,pp. 1–8, (2007).

11. Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten, ‘Densely con-nected convolutional networks’, CVPR, (2017).

12. Laurent Itti, Christof Koch, and Ernst Niebur, ‘A model of saliency-based visual attention for rapid scene analysis’, PAMI, (11), 1254–1259, (1998).

13. Martin Langkvist, Lars Karlsson, and Amy Loutfi, ‘A review of unsupervised feature learning and deep learning for time-series modeling’, Pattern Recognition Letters, 42, 11–24, (2014).

14. Qin Lin, Shu-qun Ye, Xiu-mei Huang, Si-you Li, Mei-zhen Zhang, Yun Xue, and Wen-Sheng Chen, ‘Classification of epileptic eeg signals with stacked sparse autoencoder based on deep learning’, in International Conference on Intelligent Computing, pp. 802–810. Springer, (2016).

15. Silvia Lopez, Aaron Gross, Scott Yang, Meysam Golmohammadi, Iyad Obeid, and Joseph Picone, ‘An analysis of two common reference points for eegs’, in 2016 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp. 1–5. IEEE, (2016).


16. Sebastian Montabone and Alvaro Soto, ‘Human detection using a mobile platform and novel features derived from a visual saliency mechanism’, Image and Vision Computing, 28(3), 391–402, (2010).

17. Alison O’Shea, Gordon Lightbody, Geraldine Boylan, and Andriy Temko, ‘Investigating the impact of cnn depth on neonatal seizure detection performance’, in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5862–5865. IEEE, (2018).

18. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, ‘Automatic differentiation in pytorch’, (2017).

19. Siddharth Pramod, Adam Page, Tinoosh Mohsenin, and Tim Oates, ‘Detecting epileptic seizures from eeg data using neural networks’, arXiv preprint arXiv:1412.6502, (2014).

20. Subhrajit Roy, Umar Asif, Jianbin Tang, and Stefan Harrer, ‘Machine learning for seizure type classification: Setting the benchmark’, arXiv preprint arXiv:1902.01012, (2019).

21. Inggi Ramadhani Dwi Saputro, Nita Dwi Maryati, Siti Rizqia Solihati, Inung Wijayanto, Sugondo Hadiyoso, and Raditiana Patmasari, ‘Seizure type classification on eeg signal using support vector machine’, in Journal of Physics: Conference Series, volume 1201, p. 012065. IOP Publishing, (2019).

22. Vinit Shah, Eva Von Weltin, Silvia Lopez de Diego, James Riley McHugh, Lillian Veloso, Meysam Golmohammadi, Iyad Obeid, and Joseph Picone, ‘The temple university hospital seizure detection corpus’, Frontiers in Neuroinformatics, 12, 83, (2018).

23. Daniel Silverman, ‘The rationale and history of the 10-20 system of the international federation’, American Journal of EEG Technology, 3(1), 17–22, (1963).

24. Natarajan Sriraam, Yasin Temel, Shyam Vasudeva Rao, Pieter L Kubben, et al., ‘A convolutional neural network based framework for classification of seizure types’, in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2547–2550. IEEE, (2019).

25. Akara Supratak, Ling Li, and Yike Guo, ‘Feature extraction with stacked autoencoders for epileptic seizure detection’, in EMBC, pp. 4184–4187. IEEE, (2014).

26. Pierre Thodoroff, Joelle Pineau, and Andrew Lim, ‘Learning robust features using deep learning for automatic seizure detection’, in Machine learning for healthcare conference, pp. 178–190, (2016).

27. Kostas M Tsiouris, Vasileios C Pezoulas, Michalis Zervakis, Spiros Konitsiotis, Dimitrios D Koutsouris, and Dimitrios I Fotiadis, ‘A long short-term memory deep learning network for the prediction of epileptic seizures using eeg signals’, Computers in biology and medicine, 99, 24–37, (2018).

28. JT Turner, Adam Page, Tinoosh Mohsenin, and Tim Oates, ‘Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection’, in 2014 AAAI Spring Symposium Series, (2014).

29. L Vidyaratne, A Glandon, Mahbubul Alam, and Khan M Iftekharuddin, ‘Deep recurrent neural network for seizure detection’, in 2016 International Joint Conference on Neural Networks (IJCNN), pp. 1202–1207, (2016).

30. Ali Shahidi Zandi, Manouchehr Javidan, Guy A Dumont, and Reza Tafreshi, ‘Automated real-time epileptic seizure detection in scalp eeg recordings using an algorithm based on wavelet packet transform’, IEEE Transactions on Biomedical Engineering, 57(7), 1639–1651, (2010).