
    Mental Task Classification for Brain Computer Interface Applications

Kouhyar Tavakolian, Engineering Science, Simon Fraser University

    Faratash Vasefi, Engineering Science, Simon Fraser University

    Kaveh Naziripour, Engineering Science, Simon Fraser University

    Siamak Rezaei, Computer Science, University of Northern British Columbia

Abstract: In this work, the application of different machine learning techniques to the classification of mental tasks from electroencephalographic (EEG) signals is investigated. The main application of this research is the improvement of brain computer interface (BCI) systems. For this purpose, Bayesian graphical network, neural network, Bayesian quadratic, Fisher linear and Hidden Markov Model classifiers are applied to two well-known EEG datasets in the BCI field. The Bayesian network classifier is used for the first time in this work for the classification of EEG signals and achieved high classification accuracy. In addition to the classical correct-classification accuracy criterion, mutual information is used to compare the classification results with those of other BCI groups.

    Keywords: EEG, brain computer interface, Bayesian network classifier, neural networks, mutual information.

    1. INTRODUCTION

Mental task classification by recognizing electroencephalographic (EEG) patterns is an important and challenging biomedical signal processing problem. Such classification can be utilized to enable a patient to communicate without any overt physical movement; this is done solely by computer processing of the patient's brain waves, as shown in the block diagram of Figure 1. The development of faster digital computers and better EEG devices has motivated many researchers to work on BCI systems [1][2].

So far, classification accuracy has been one of the main shortcomings of developed BCI systems, and it directly affects the decisions made as the BCI output. This accuracy depends on the quality of the EEG signal and on the processing algorithms, which include preprocessing, feature extraction and feature classification. In our previous research, the effect of different feature extraction algorithms [3] and of the number of EEG channels [4] on classification accuracy was investigated. In the current work, the effects of different types of classifiers on classification accuracy are investigated and compared.

In the present research, the classification of mental tasks is investigated using the Purdue University EEG dataset [5] and the EEG dataset from the Department of Medical Informatics, University of Technology Graz [6]. Both are well-established datasets in the BCI field and are accessible on the internet. Autoregressive (AR) and adaptive autoregressive (AAR) coefficients were extracted from the EEG windows for all classifiers. These extracted features were fed to the next stage of the BCI, the classifier. Using the same extracted features for all classifiers facilitated the comparison of classifier efficiency.
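As a rough illustration of this feature extraction step, the sketch below estimates order-six AR coefficients for a single EEG window via the Yule-Walker equations. The estimator, window length and sampling rate are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def ar_features(window, order=6):
    """Estimate AR coefficients of one EEG window via the Yule-Walker equations.

    Sketch only: the paper uses order-six AR features, but does not fix the
    estimator, so Yule-Walker is an illustrative choice here.
    """
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                       # remove DC offset
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:] yields the AR coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Example: six AR coefficients from a 1-second window sampled at 250 Hz
rng = np.random.default_rng(0)
fake_eeg_window = rng.standard_normal(250)
print(ar_features(fake_eeg_window))        # length-6 feature vector
```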

The main focus was on the investigation and comparison of a feed-forward neural network, Bayesian quadratic, Bayesian network, Fisher linear classifier and Hidden Markov Models (HMM) for mental task classification. These classifiers are well-known methods in the machine learning literature and were intentionally chosen to cover both linear and nonlinear methods. The Gaussian mixture model is represented as a Bayesian network, and this is the first time that such a classifier has been used for EEG signal classification [7].

We trained the Bayesian network and the Hidden Markov Model using the expectation maximization (EM) algorithm. Mixture models are a type of density model comprised of a number of component functions, which in our case were Gaussians. These component functions are combined to provide a multimodal density.


Figure 1. Flow of the methodology: mental-task EEG signals are preprocessed, AR or AAR features are extracted, and the features are classified (HMM, neural network, Bayesian network, Bayes quadratic or Fisher linear classifier) to produce the recognized mental task as the BCI output.

In the remainder of the paper, the methods section first introduces the datasets and preprocessing and then briefly explains the five classifiers. The EEG data was classified by these classifiers, and the results are presented in the tables and figure of the results section. A discussion of the results and a conclusion close the paper.

    2. METHODS

In this section, the EEG datasets are first introduced, and then the applied machine learning algorithms are briefly explained, as also shown in Figure 1.

    2.1. EEG recordings and Preprocessing

The main differences between the two EEG datasets are that the Purdue dataset was recorded during the performance of five mental tasks, while the Graz dataset involves just two mental activities, left and right hand movement. On the other hand, the Graz dataset has many more sessions than the Purdue dataset.

2.1.1 Purdue dataset: The Purdue dataset was acquired by Aunon and Keirn [5] at Purdue University and was recorded from seven subjects during the performance of five different mental tasks. An elastic electrode cap was used to record from positions C3, C4, P3, P4, O1 and O2 on the scalp, as shown in Figure 2. Data was recorded at a sampling rate of 250 Hz with a 12-bit A/D converter. Eye blinks were detected by means of a separate channel of data recorded from two electrodes placed above and below the subject's left eye. The subjects were asked to perform five mental tasks:

(a) Baseline task. The subjects were asked to relax as much as possible.

(b) Letter task. The subjects were instructed to mentally compose a letter to a friend or relative without vocalizing.

(c) Math task. The subjects were given nontrivial multiplication problems, such as 49 times 78.

(d) Visual counting task. The subjects were asked to imagine a blackboard and to visualize numbers being written on the board sequentially.

(e) Geometric figure rotation. The subjects were asked to visualize a particular three-dimensional block figure being rotated about an axis.

Data was recorded for 10 seconds during each task, and each task was repeated five times per session.

In this work, the algorithms were applied to the subjects having more than one session of EEG data, namely subjects 1, 3, 5 and 6.

The eye blinks were removed with two different methods. In the first method, a simple time filter was used to exclude the sudden jumps in the EEG caused by eye movement. To do this, the EOG channel of the dataset was used: the average of the signal was calculated on windows of length 20 milliseconds, and windows whose average was greater than two times the average of the signal in a 500-millisecond window were removed. In the other method, independent component analysis (ICA) was used, which resulted in much better classification accuracy [7]. In this paper the results obtained with the time filter are reported, because the differences between the classifiers were more distinct with this method.
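As a rough illustration of this time-filter idea (not the authors' exact implementation), the sketch below flags 20 ms EOG windows whose mean amplitude exceeds twice the mean over a surrounding 500 ms window and drops the corresponding EEG samples. The window lengths and the factor of two follow the description above; the sampling rate, absolute-value averaging and everything else are assumptions.

```python
import numpy as np

def remove_blink_windows(eeg, eog, fs=250, short_ms=20, long_ms=500, factor=2.0):
    """Sketch of the simple time filter described above.

    eeg, eog: 1-D arrays of equal length; fs: sampling rate in Hz.
    A 20 ms window is rejected when the mean |EOG| inside it exceeds
    `factor` times the mean |EOG| of the enclosing 500 ms window.
    """
    short = max(1, int(fs * short_ms / 1000))    # 20 ms -> 5 samples at 250 Hz
    long_ = max(short, int(fs * long_ms / 1000))
    keep = np.ones(len(eeg), dtype=bool)
    for start in range(0, len(eog) - short + 1, short):
        w = np.abs(eog[start:start + short]).mean()
        c0 = max(0, start + short // 2 - long_ // 2)
        ref = np.abs(eog[c0:c0 + long_]).mean()
        if w > factor * ref:
            keep[start:start + short] = False    # drop samples hit by the blink
    return eeg[keep]

# Usage with synthetic signals
rng = np.random.default_rng(1)
eog = rng.standard_normal(2500); eog[1000:1010] += 50.0   # artificial blink
eeg = rng.standard_normal(2500)
print(len(remove_blink_windows(eeg, eog)))   # fewer samples than 2500
```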

Figure 2. The electrode placement for the Purdue dataset.

2.1.2 Graz dataset: The Graz dataset [6] was recorded from a normal female subject during a feedback session. The task was to control a feedback bar by means of imagery of left or right hand movements. The experiment consisted of 7 runs with 40 trials each. All runs were conducted on the same day with several


minutes break in between them. Three bipolar EEG channels were measured over C3, Cz and C4. The EEG was sampled at 128 Hz and filtered between 0.5 and 30 Hz. The trials for training and testing were chosen randomly, which prevents any systematic effect due to the feedback. In this dataset the eye artifact had already been removed.

    2.2. Classifiers

In this section the five classifiers are briefly introduced; for more detailed information one can refer to references [8][9][10][11][12][13] or other texts on machine learning and pattern recognition.

2.2.1 Bayesian graphical network classifier

A Bayesian network is a modeling tool that combines directed acyclic graphs with Bayesian probability. Figure 3 shows an example of a Bayesian network, which consists of a causal graph combined with an underlying probability distribution. Each node of the network corresponds to a variable, and the edges represent causality between these variables. The other elements of a Bayesian network are the probability distributions associated with each node. With this information the network can model the probabilities of complex causal relationships [13].

The graphical model corresponding to the Bayesian network used in this work is shown in Figure 3. Note that the square box in the figure corresponds to the input extracted features and the rectangular box to the Gaussian mixture components; the square and rectangular nodes represent discrete values, while the round node represents continuous values. The graph structure of this model can be represented by the adjacency matrix with rows 011, 001 and 000. The Bayes Net Toolbox (BNT) was used to implement the classifier [10]. The model was trained using the EM algorithm, which starts with a randomly initialized model (here, the means and covariances) and then iteratively refines the model parameters to produce a locally optimal maximum-likelihood fit. The EM algorithm is composed of two steps: in the first step, each data point is softly assigned to each mixture component; in the second step, the parameters of the model are adjusted to fit the data based on the soft assignments of the previous step [8].

    2.2.2 Neural Networks

For this classifier, a two-layer feed-forward neural network was implemented with 20 neurons in the hidden layer and one neuron in the output layer; this structure (20 hidden neurons) was found to be optimal according to [3]. A threshold of 0.5 was set on the output neuron: values between 0.5 and 1 were assigned to one of the tasks, and values between 0 and 0.5 to the other task. The network was trained using the error back-propagation algorithm. The Neural Network toolbox of Matlab was used for this part of the research.
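A comparable network can be sketched with scikit-learn's MLPClassifier (an assumption; the paper used Matlab's Neural Network toolbox): one hidden layer of 20 neurons, a single logistic output, and gradient-based back-propagation training. For two classes, predict() thresholds the sigmoid output at 0.5, mirroring the decision rule described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch only: a two-layer feed-forward network (20 hidden neurons, one
# logistic output) trained by back-propagation, standing in for the Matlab
# network described above.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))])
y = np.array([0] * 60 + [1] * 60)           # two mental tasks

net = MLPClassifier(hidden_layer_sizes=(20,),   # 20 neurons in the hidden layer
                    activation="logistic",
                    solver="sgd",               # plain gradient-based training
                    learning_rate_init=0.05,
                    max_iter=2000,
                    random_state=0)
net.fit(X, y)

# predict() assigns the task whose sigmoid output exceeds 0.5
print("training accuracy:", net.score(X, y))
```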

    2.2.3 Hidden Markov Model

A Hidden Markov Model is a finite set of states, each associated with a probability distribution that is generally multidimensional, as in this case. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution [9]. In this research the observations were the extracted EEG features explained previously, generated by a Gaussian mixture model characterized by three matrices for the means, variances and mixture weights; a transition matrix governed the moves between our two states. These parameters were all updated by the EM algorithm explained earlier, and at the end there were two trained HMMs corresponding to the two mental tasks. To classify the test vectors given by the 5-fold cross-validation scheme, the likelihood of each vector under each of these HMMs was calculated, and the vector was assigned to the mental task whose HMM gave the higher likelihood.
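A minimal version of this two-HMM scheme can be sketched with the hmmlearn package (an assumption; the paper used its own EM/Baum-Welch implementation): one Gaussian-mixture HMM per task, trained on that task's feature sequences, with test sequences assigned to the model giving the higher log-likelihood.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM   # assumption: hmmlearn is available

def train_task_hmm(sequences, n_states=2, n_mix=2):
    """Fit one Gaussian-mixture HMM (via EM/Baum-Welch) on a task's sequences."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

def classify_sequence(seq, hmm_a, hmm_b):
    """Assign the sequence to the task whose HMM gives the higher log-likelihood."""
    return 0 if hmm_a.score(seq) >= hmm_b.score(seq) else 1

# Usage with random stand-ins for per-window AR feature sequences
rng = np.random.default_rng(0)
seqs_a = [rng.normal(0.0, 1.0, (20, 6)) for _ in range(10)]
seqs_b = [rng.normal(1.5, 1.0, (20, 6)) for _ in range(10)]
hmm_a, hmm_b = train_task_hmm(seqs_a), train_task_hmm(seqs_b)
print(classify_sequence(rng.normal(1.5, 1.0, (20, 6)), hmm_a, hmm_b))  # expected: 1
```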

Figure 3. Gaussian mixture model represented as a simple graphical model, with a discrete class node (B/M), a discrete mixture-component node (1 or 2) and a continuous Gaussian node parameterized by a mean and covariance. B stands for the baseline task and M for the multiplication task in the Purdue dataset.


Table 1. Bayesian graphical network (BNT), neural network, Bayes quadratic, Fisher linear and Hidden Markov Model classifiers compared for the classification of binary combinations of the five mental tasks. Each entry (accuracy ± standard deviation, in percent) is averaged over the 10 possible binary combinations of mental tasks.

    2.2.4 Bayesian Quadratic classifier

Given a set of classes (mental tasks here) M_k, each characterized by a set of known model parameters, an extracted EEG feature vector X is assigned to the class (mental task) with the highest posterior probability. This is shown in (1) and is known as the Bayes decision rule:

$X \in M_k \quad \text{if} \quad P(M_k \mid X) > P(M_l \mid X) \quad \forall\, l \neq k$    (1)

To calculate the a posteriori probabilities above, Bayes' law was used, which, under the assumption that the features are normally distributed, leads to a quadratic discriminant known as the Bayes quadratic classifier [12]. The parameters are the means and covariances of the training vectors, and the likelihoods are calculated as stated above.
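With Gaussian class-conditional densities, rule (1) reduces to a quadratic discriminant. A minimal sketch, assuming scikit-learn is acceptable as a stand-in for the authors' own implementation:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Sketch: Bayes decision rule (1) with normally distributed features per class,
# i.e. a quadratic discriminant built from per-class means and covariances.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))])
y = np.array([0] * 60 + [1] * 60)

qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X, y)                       # estimates class means and covariances
print(qda.predict(X[:5]))           # argmax_k P(M_k | X) under the Gaussian model
```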

    2.2.5 Linear Fisher Classifier

A linear classifier computes the discriminant $W^T X + w_0$; in the case of two classes (mental tasks), it assigns feature vectors X with a negative discriminant value to one mental task and those with a positive value to the other class. The aim is to find a W that reduces the number of misclassifications, and to do so there are several criteria that can be optimized [11]. The approach taken by Fisher was to find a linear combination of the variables that separates the two classes as much as possible; that is, a direction is sought along which the two classes are best separated in some sense. The criterion proposed by Fisher is the ratio of the between-class to the within-class variance. Formally, a direction w is sought such that (2) is maximized:

$J_F(w) = \dfrac{|w^T (m_1 - m_2)|^2}{w^T S_W w}$    (2)

where $m_1$ and $m_2$ are the group means and $S_W$ is the within-class sample covariance matrix [12].
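A minimal sketch of this criterion in code, assuming two feature matrices of the same dimensionality and a small ridge term for numerical stability (both assumptions, not from the paper): the maximizer of (2) is $w \propto S_W^{-1}(m_1 - m_2)$, and the sign of $w^T x + w_0$ gives the class.

```python
import numpy as np

def fisher_direction(X1, X2, ridge=1e-6):
    """Direction w maximizing J_F(w) = |w^T (m1 - m2)|^2 / (w^T S_W w)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter, pooled over both classes
    S_w = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    S_w += ridge * np.eye(S_w.shape[0])        # keep the solve well conditioned
    w = np.linalg.solve(S_w, m1 - m2)          # closed-form maximizer of (2)
    w0 = -0.5 * w @ (m1 + m2)                  # threshold halfway between projected means
    return w, w0

def fisher_predict(X, w, w0):
    # Positive discriminant -> class 1, negative -> class 2, as described above
    return np.where(X @ w + w0 > 0, 1, 2)

rng = np.random.default_rng(0)
X1, X2 = rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))
w, w0 = fisher_direction(X1, X2)
print((fisher_predict(X1, w, w0) == 1).mean(), (fisher_predict(X2, w, w0) == 2).mean())
```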

3. RESULTS

In this section the results of the classifications are presented: first the Purdue dataset results, followed by the results from the Graz dataset.

    3.1. Purdue dataset results:

Binary combinations of the five mental tasks were classified for subjects one, three, five and six, and the classifications were performed on a total of two or three sessions of EEG data, depending on the subject. With five mental tasks, this leads to 10 pairs of binary mental tasks. For this dataset, AR coefficients of order six were used as feature vectors.

For each individual pair, using the 5-fold cross-validation scheme, the classification was performed 44 times and averaged over all combinations to compute the final classification accuracy. The results for the five classifiers can be seen in Table 1; each entry was calculated by averaging over all ten pairs of mental tasks, and the last row of the table is the average of each classification method over all subjects.


As Table 1 shows, for subject one the Bayesian network performs better than the Bayesian quadratic classifier. Averaged over all subjects, the Bayesian network is only about two percent below the Bayes quadratic classifier and is better than the neural network, the Fisher linear classifier and the HMM. On the other hand, the standard deviation is lower for the Bayesian network, which indicates a more consistent classification. Considering execution time, the Bayesian network was the most time-consuming classifier, while the Fisher linear

Sub.   BNT           Neural Network   Bayes         Fisher Linear   HMM
1      94.07 ± 2.2   92.48 ± 2.9      93.78 ± 2.8   91.15 ± 2.7     70.18 ± 8.8
3      87.43 ± 3.9   85.04 ± 4.3      89.22 ± 3.5   82.77 ± 4.1     64.10 ± 9.1
5      82.48 ± 2.8   82.61 ± 3.0      86.58 ± 3.4   81.79 ± 3.1     62.43 ± 7.8
6      90.31 ± 2.7   89.39 ± 3.1      92.49 ± 3.2   90.38 ± 3.1     64.61 ± 8.3
Means  88.57 ± 3.0   87.38 ± 3.4      90.51 ± 3.2   86.63 ± 3.3     65.33 ± 8.5


Table 2. Summary of the results of the different groups on the Graz dataset. The last three rows are the results obtained in this research. Considering the value of the mutual information (MI) obtained with the Bayesian network, the result of this work ranks second compared to the others.

Ranking   Group              Minimum Error (%)   Maximum SNR   Maximum MI (bits)
1         C                  10.71               1.34          0.61
2         F                  15.71               0.90          0.46
3         B                  17.14               0.86          0.45
4         A                  13.57               0.85          0.44
5         G                  17.14               0.50          0.29
6         I                  23.57               0.44          0.26
7         E                  17.14               0.34          0.21
8         D                  32.14               0.14          0.09
9         H                  49.29               0.00          0.00
-         Bayesian network   16.43               1.00          0.50
-         Neural network     15.71               1.04          0.51
-         Bayes classifier   17.14               0.71          0.38

classifier and the Bayesian quadratic classifier required the least time to train and to classify the extracted EEG feature vectors.

    3.2. Graz dataset results:

It is quite common to use the error rate to compare different methods. However, the error rate takes into account only the sign of the classifier output, not its magnitude. For this reason, mutual information is used to compare the different results. Moreover, the other groups working on the same dataset have expressed their Graz dataset results in the form of mutual information, so the present results can be compared with theirs on that basis. For the Graz dataset, AAR coefficients of order six were used as feature vectors [6].
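The mutual information values in Table 2 are consistent with the convention MI = 0.5 · log2(1 + SNR), where the SNR is computed from the time course of the continuous classifier output (the exact SNR estimate follows the competition's convention and is not re-derived here). The short sketch below is only a cross-check of that mapping, not code from the paper.

```python
import numpy as np

def mi_from_snr(snr):
    """Mutual information (bits) from the SNR of the classifier output,
    using the MI = 0.5 * log2(1 + SNR) convention behind Table 2."""
    return 0.5 * np.log2(1.0 + np.asarray(snr, dtype=float))

# Cross-check against Table 2: the reported (SNR, MI) pairs match this mapping.
for snr in (1.34, 1.00, 1.04, 0.71):
    print(snr, "->", round(float(mi_from_snr(snr)), 2))   # 0.61, 0.50, 0.51, 0.38
```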

Nine other groups from different universities applied their algorithms to the same dataset, and their results can be found in Table 2 together with the results of this research (the last three rows). The details of the other groups' algorithms can be found on the BCI 2003 website. The time course of the mutual information can be seen in Figure 4.

    4. DISCUSSION AND CONCLUSION

In this research, the EEG signal was classified using different machine learning techniques. The algorithms were applied to two well-known datasets in the BCI field, and the results were presented in the previous section.

Aside from the Bayesian quadratic classifier, and except for subject six, the Bayesian network performs better than the other classifiers on the Purdue dataset; for subject one it is even better than the Bayesian quadratic classifier. This comparison is made according to classification accuracy. The Bayesian network also maintained comparable results on the Graz dataset, whereas the Bayesian quadratic classifier did not perform as well there.

In terms of the standard deviation of the error, the Bayesian network was consistently better than the other classifiers, which again indicates a more consistent classification. This can also be seen in Figure 4, which shows an almost smooth curve of mutual information and classification error over time. The Bayesian network classifier was used for the first time for such a purpose and gave good classification accuracy, as can be seen in the results; nevertheless, its execution time was too long, making this classifier unsuitable for online BCI systems, at least at our current processing speed.

The results with the Bayesian quadratic classifier were higher for the Purdue dataset than for the other dataset. It should also be considered that the EEG signal is non-stationary, meaning that its statistics vary over time, so the model (see the methods section) may not represent the signal well when it is recorded over many more sessions. This can be clearly seen in the reduced results for subject five, whose data were taken over more sessions (three) than the others (two), and also in the results from the Graz dataset, which contains many more sessions of EEG.

The Fisher linear classifier gave accuracy comparable to the nonlinear methods but with considerably less computation time. In several BCIs developed so far, linear discriminant analysis has been implemented as the classifier [1]. The HMM classifier had the lowest results compared to the other classifiers. In the present study a simple HMM structure was implemented, which might be the reason for the HMM's significantly poorer results. Merging these classifiers into a hybrid classifier will be the topic of future work.


Figure 4. Time course of the mutual information (bits) and the error rate (bottom plot) for the Bayesian network classifier.

    5. REFERENCES

[1] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, Vol. 113, pp. 767-791, 2002.

[2] Dennis J. McFarland, William A. Sarnacki, Theresa M. Vaughan, and Jonathan R. Wolpaw, "Brain-computer interface (BCI) operation: signal and noise during early training sessions," Clinical Neurophysiology, Vol. 116, pp. 56-62, Jan 2005.

[3] Kouhyar Tavakolian, "Investigation and Comparison of Different Mental Task Classification by Linear and Nonlinear Techniques Applied to EEG Signal," Master of Science thesis, Department of Electrical and Computer Engineering, University of Tehran, July 2003.

[4] Kouhyar Tavakolian, A. M. Nasrabadi, and Siamak Rezaei, "Selecting Better EEG Channels for Classification of Mental Tasks," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2004), pp. 537-540, Vancouver, Canada, May 2004.

[5] Zachary A. Keirn and Jorge I. Aunon, "A New Mode of Communication between Man and His Surroundings," IEEE Transactions on Biomedical Engineering, Vol. 37, No. 12, Dec 1990.

[6] Graz dataset: http://ida.first.fraunhofer.de/projects/bci/competition

[7] Kouhyar Tavakolian and Siamak Rezaei, "Classification of Mental Tasks Using Gaussian Mixture Bayesian Network Classifiers," IEEE International Workshop on Biomedical Circuits and Systems, Singapore, Dec 2004.

[8] Todd K. Moon and Wynn C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.

[9] L. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

[10] Bayes Net Toolbox for Matlab, written by Kevin Murphy: www.ai.mit.edu/~murphyk/Software/BNT/bnt.html

[11] Andrew Webb, Statistical Pattern Recognition, Arnold, London, 1999.

[12] Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Academic Press, 1999.

[13] Finn V. Jensen, Bayesian Networks and Decision Graphs, Springer, 2001.

[14] Nai-Jen Huan and Ramaswamy Palaniappan, "Neural network classification of autoregressive features from electroencephalogram signals for brain-computer interface design," Journal of Neural Engineering, Vol. 1, pp. 142-150, 2004.

[15] Matlab software website: www.mathworks.com