Kouhyar Tavakolian et al- Mental Task Classification for Brain Computer Interface Applications
8/3/2019 Kouhyar Tavakolian et al- Mental Task Classification for Brain Computer Interface Applications
Mental Task Classification for Brain Computer Interface Applications
Kouhyar Tavakolian, Engineering Science, Simon Fraser University
Faratash Vasefi, Engineering Science, Simon Fraser University
Kaveh Naziripour, Engineering Science, Simon Fraser University
Siamak Rezaei, Computer Science, University of Northern British Columbia
Abstract: In this work the application of different machine learning techniques for classification of mental tasks from
Electroencephalograph (EEG) signals is investigated. The main application for this research is the improvement of brain
computer interface (BCI) systems. For this purpose, Bayesian graphical network, Neural Network, Bayesian quadratic,
Fisher linear and Hidden Markov Model classifiers are applied to two known EEG datasets in the BCI field. The Bayesian
network classifier is used here for the first time to classify EEG signals. On the Purdue dataset it reached a mean
accuracy of 88.57%, second only to the Bayes quadratic classifier (90.51%). In addition to the classical classification
accuracy criterion, mutual information is used to compare the classification results with those of other BCI groups.
Keywords: EEG, brain computer interface, Bayesian network classifier, neural networks, mutual information.
1. INTRODUCTION
Mental task classification by recognizing electroencephalographic (EEG) patterns is an important and challenging biomedical signal processing problem.
Such classification can enable a patient to communicate without any overt physical movement, solely through
computer processing of the patient's brain waves, as shown in the block diagram of Figure 1.
The development of faster digital computers and better EEG devices has motivated many researchers to work on
BCI systems [1] [2].
So far, classification accuracy has been one of the main weaknesses of BCI systems, and it directly
affects the decisions produced as the BCI output. This accuracy depends on the quality of the EEG signal and on the
processing algorithms, which comprise preprocessing, feature extraction and feature classification.
In our previous research, the effects of different feature extraction algorithms [3] and of different
numbers of EEG channels [4] on classification accuracy were investigated. In the current work, the effects of
different types of classifiers on classification accuracy are investigated and compared.
In the present research, the classification of mental tasks is investigated
using the Purdue University EEG dataset [5] and the EEG
dataset from the Department of Medical Informatics, University of Technology Graz [6]. Both
datasets are well established in the
BCI field and are available on the internet. Autoregressive (AR) and adaptive autoregressive (AAR) coefficients
were extracted from the EEG windows for all classifiers.
These extracted features were the input to the next stage of the BCI, the classifier. Using the same features
for all classifiers made it possible to compare the classifiers'
performance directly.
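The paper does not state how the AR coefficients were estimated. As a minimal sketch, assuming the Yule-Walker (autocorrelation) method solved with the Levinson-Durbin recursion, one EEG window could be turned into an order-six feature vector as follows. The function names and the sign convention are illustrative choices, not the authors' implementation.

```python
def autocorr(x, max_lag):
    """Biased sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def ar_coefficients(x, order=6):
    """Yule-Walker AR estimate with the convention
    x[t] ~ a[0]*x[t-1] + a[1]*x[t-2] + ... + a[order-1]*x[t-order].
    Assumes the window is not constant (non-zero variance)."""
    r = autocorr(x, order)
    a = []        # coefficients of the current model order
    err = r[0]    # prediction error power of the current model
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        # Levinson-Durbin update of the lower-order coefficients.
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        err *= (1.0 - k_m * k_m)
    return a
```

Calling `ar_coefficients(window, order=6)` on each channel and concatenating the results would give one feature vector per EEG window.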
The main focus was on the investigation and comparison of the feed forward neural network, Bayesian quadratic,
Bayesian network, Fisher linear and Hidden
Markov Model (HMM) classifiers in mental task classification. These classifiers are known methods in the machine
learning literature and were intentionally
chosen to cover both linear and nonlinear methods. The
Gaussian mixture model is represented as a Bayesian network, and this is the first time that such a classifier is
used for EEG signal classification [7].
We trained the Bayesian network and the Hidden Markov
Model using the expectation maximization (EM) algorithm. Mixture models are a type of density model. They comprise a number of component functions, which in our
case were Gaussians. These component functions are
combined to provide a multimodal density.
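Figure 1. Flow of the methodology: mental task EEG signals, preprocessing, feature extraction (AR or AAR), classification (HMM, neural network, Bayesian network, Bayes quadratic or Fisher linear classifier), and the recognized mental task as the BCI output.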
The rest of the paper is organized as follows. The method section introduces the datasets and preprocessing and then
briefly explains the five classifiers. The EEG
data was classified by these classifiers, and the results are
presented in the tables and the figure of the results section. The paper ends with a discussion of the results and a
conclusion.
2. METHODS
In this section the EEG datasets are introduced first, and then the applied machine learning algorithms are briefly explained, following the flow shown in Figure 1.
2.1. EEG recordings and Preprocessing
The main difference between the two EEG datasets is
that the Purdue dataset was recorded during the performance of five mental tasks, while the Graz dataset covers just
two mental activities, left and right hand movement. On
the other hand, the Graz dataset has many more sessions than the Purdue dataset.
2.1.1 Purdue dataset: The Purdue dataset was acquired by Keirn and Aunon [5] at Purdue University and was
recorded from seven subjects during the performance of five
different mental tasks. An elastic electrode cap was used
to record from positions C3, C4, P3, P4, O1 and O2 on the scalp, as shown in Figure 2. Data was recorded at
a sampling rate of 250 Hz with a 12-bit A/D converter.
Eye blinks were detected by means of a separate channel of data recorded from two electrodes placed above and
below the subject's left eye. The subjects were asked to
perform five mental tasks:
(a) Baseline task. The subjects were asked to relax as much as possible.
(b) Letter task. The subjects were instructed to mentally compose a letter to a friend or relative without vocalizing.
(c) Math task. The subjects were given nontrivial multiplication problems, such as 49 times 78.
(d) Visual counting task. The subjects were asked to imagine a blackboard and to visualize numbers being written on the board sequentially.
(e) Geometric figure rotation. The subjects were asked to visualize a particular three dimensional block figure being rotated about an axis.
Data was recorded for 10 seconds during each task, and each task was repeated five times per session.
In this work the algorithms were applied to the subjects having more than one session of EEG signal, which were
subjects 1, 3, 5 and 6.
The eye blinks were removed with two different methods. In the first method a simple time filter was used to
exclude the sudden jumps in the EEG caused by eye
movement, based on the EOG channel of the
dataset. The average of the signal was calculated over windows of length 20
milliseconds, and those windows whose average was greater than two times the average of the signal over a
window of length 500 milliseconds were removed. In the second method
independent component analysis (ICA) was used, which
resulted in much better classification accuracy [7]. In this paper the results with the time filter are reported because
the differences between the classifiers were more distinct with this
method.
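The time filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the "average" is taken over absolute amplitude, the 500 ms windows are non-overlapping, and the function name `reject_blink_windows` is hypothetical. At 250 Hz a 20 ms window holds 5 samples and a 500 ms window holds 125 samples.

```python
def reject_blink_windows(eog, fs=250, short_ms=20, long_ms=500, factor=2.0):
    """Return the (start, stop) sample ranges of the short windows to keep.

    A short window is dropped when its mean absolute amplitude exceeds
    `factor` times the mean absolute amplitude of the long window that
    contains it.
    """
    short_n = max(1, int(fs * short_ms / 1000))
    long_n = max(short_n, int(fs * long_ms / 1000))
    kept = []
    for long_start in range(0, len(eog), long_n):
        long_win = eog[long_start:long_start + long_n]
        long_stop = long_start + len(long_win)
        # Assumption: "average" means the mean of absolute values.
        long_avg = sum(abs(v) for v in long_win) / len(long_win)
        for start in range(long_start, long_stop, short_n):
            stop = min(start + short_n, long_stop)
            short_avg = sum(abs(v) for v in eog[start:stop]) / (stop - start)
            if short_avg <= factor * long_avg:
                kept.append((start, stop))
    return kept
```

The returned ranges can then be used to cut the same samples out of the six EEG channels before feature extraction.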
Figure 2. The electrode placement for the Purdue dataset
2.1.2 Graz dataset: The Graz dataset [6] was recorded
from a normal female subject during a feedback session.
The task was to control a feedback bar by means of imagined left or right hand movements. The
experiment consisted of 7 runs with 40 trials each. All
runs were conducted on the same day with breaks of several
minutes between them. Three bipolar EEG
channels were measured over C3, Cz and C4. The EEG was sampled at 128 Hz and filtered
between 0.5 and 30 Hz. The trials for training and testing
were randomly chosen, which prevents any systematic effect due to the feedback. In this dataset the eye artifacts
had already been removed.
2.2. Classifiers
In this section the five classifiers are briefly introduced.
For more detailed information, refer to
references [8] [9] [10] [11] [12] and [13] or to other texts on machine learning or pattern recognition.
2.2.1 Bayesian graphical network classifier
A Bayesian network is a modeling tool that combines
directed acyclic graphs with Bayesian probability. Figure
3 shows an example of a Bayesian network, which consists
of a causal graph combined with an underlying
probability distribution. Each node of the network in the
figure corresponds to a variable, and the edges represent causal relationships between these variables. The other elements of a
Bayesian network are the probability distributions associated
with each node. With this information the network can model the probabilities of complex causal relationships [13].
The graphical model corresponding to the Bayesian
network used in this work is shown in Figure 3. Note that the square box in the figure corresponds to the input
extracted features. The rectangular box corresponds to the
Gaussian mixture components. The square and
rectangular nodes represent discrete values while the
round node in the figure represents continuous values.
The graph structure of this model can be represented by
the following adjacency matrix, written row by row: [0 1 1; 0 0 1; 0 0 0]. The Bayes Net Toolbox (BNT) was used for implementing the classifier [10]. The model was trained
using the EM algorithm. EM starts with a
randomly initialized model (mean and covariance here)
and then iteratively refines the model
parameters to produce a locally optimal maximum-likelihood
fit. Each iteration of the EM algorithm is composed of two steps. In the first step, each data point undergoes a soft
assignment to each mixture component. In the second
step, the parameters of the model are adjusted to fit the data based on the soft assignment of the previous step [8].
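The two EM steps can be illustrated with a deliberately small sketch: a one-dimensional Gaussian mixture fitted in plain Python. It is not the BNT implementation used in the paper. The function name `em_gmm_1d`, the optional `init_means` argument and the variance floor are assumptions made for the example, and real EEG feature vectors would need the multivariate form with full covariance matrices.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_components=2, n_iter=50, init_means=None, seed=0):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(data)
    rng = random.Random(seed)
    # Random initialization unless starting means are supplied.
    means = list(init_means) if init_means else rng.sample(list(data), n_components)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: soft assignment of every point to every component.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances
```

For classification, one such mixture would be fitted per mental task, and a test vector assigned to the task whose mixture gives it the higher likelihood.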
2.2.2 Neural Networks
For this classifier a two layer feed forward neural network
was implemented with 20 neurons in the hidden layer and
one neuron in the output layer. This structure (20
neurons in the hidden layer) was found to be the optimum one according to [3]. A threshold of 0.5 was set for the output
neuron: values between 0.5 and 1 were assigned to one of the tasks and values between 0 and 0.5
to the other task. The network was trained using the error
back propagation algorithm. The Neural Network toolbox
of Matlab was used for this part of the research.
2.2.3 Hidden Markov Model
A Hidden Markov Model is a finite set of states, each of
which is associated with a probability distribution that
is generally multidimensional, as in this case. Transitions among the states are governed by a set of probabilities
called transition probabilities. In a particular state an
outcome or observation can be generated according to the associated probability distribution [9]. In this research the
observations were the extracted EEG features explained
previously, modeled by a Gaussian mixture that is
characterized by three matrices for the means, variances and
mixture weights. A transition matrix governed the moves between the two states. These parameters were all
updated by the EM algorithm explained earlier, and at the
end there were two trained HMMs corresponding to the two mental tasks. To classify the test vectors given by the
5-fold cross validation scheme, their likelihood under each of these HMMs was calculated, and each test vector was assigned to the mental task whose HMM gave the higher likelihood.
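The likelihood comparison can be sketched with the scaled forward algorithm. To keep the example short it assumes one-dimensional observations and a single Gaussian per state instead of the Gaussian mixture used in the paper. The model parameters in the usage note are made up for illustration, and training with EM is not shown.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hmm_log_likelihood(obs, start, trans, means, variances):
    """Log-likelihood of an observation sequence (scaled forward algorithm)."""
    n = len(start)
    alpha = [start[i] * gauss_pdf(obs[0], means[i], variances[i])
             for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n))
                     * gauss_pdf(obs[t], means[j], variances[j])
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_like += math.log(scale)
    return log_like


def classify(obs, models):
    """Pick the task whose HMM gives the observation sequence the higher likelihood."""
    return max(models, key=lambda name: hmm_log_likelihood(obs, *models[name]))
```

With `models = {"baseline": (start, trans, means_b, vars_b), "math": (start, trans, means_m, vars_m)}`, a call to `classify(sequence, models)` returns the name of the more likely task.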
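Figure 3. Gaussian mixture model represented as a simple graphical model. B stands for the baseline task and M for the
multiplication task in the Purdue dataset. The node labels in the figure are Class (B/M), Component (1 or 2) and Gaussian (mean, covariance).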
Table 1. Bayesian graphical network (BNT), neural network, Bayes quadratic, Fisher linear and Hidden
Markov Model classifiers compared on binary combinations of five mental tasks. The results in the table are
averaged over the 10 possible binary combinations of mental tasks.
2.2.4 Bayesian Quadratic classifier
Given a set of classes (mental tasks here) $M_k$, each characterized
by a set of known model parameters, an EEG
feature vector $X$ is assigned to the class (mental
task) that has the highest a posteriori probability. This is shown in (1) and is known as the Bayes decision rule:
$$X \in M_k \quad \text{if} \quad P(M_k \mid X) > P(M_l \mid X), \quad \forall\, l \neq k \qquad (1)$$
To calculate the a posteriori probability, Bayes' law was used. Under the assumption that the features are
normally distributed, this leads to a quadratic classifier
known as the Bayes quadratic classifier [12]. The parameters
are the mean and covariance of the training vectors of each class, and the
likelihoods are calculated as stated above.
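For reference, the quadratic form mentioned above can be written out. This is the standard textbook result for Gaussian class-conditional densities, not an equation taken from the paper, and the symbols $\mu_k$, $\Sigma_k$ and $g_k$ are introduced here for illustration: $\mu_k$ and $\Sigma_k$ are the mean and covariance estimated from the training vectors of class $M_k$.

```latex
% Log of P(X | M_k) P(M_k) for a Gaussian class-conditional density,
% with the class-independent constant dropped.
g_k(X) = -\tfrac{1}{2}\,(X - \mu_k)^{T}\,\Sigma_k^{-1}\,(X - \mu_k)
         \;-\; \tfrac{1}{2}\,\ln\lvert\Sigma_k\rvert
         \;+\; \ln P(M_k)
% Decision rule equivalent to (1): assign X to the class with the largest g_k(X).
```

Because each class has its own $\Sigma_k$, the boundary $g_k(X) = g_l(X)$ is quadratic in $X$, which is where the name of the classifier comes from.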
2.2.5 Linear Fisher Classifier
In the case of two classes (mental tasks), a linear classifier of the form $g(X) = w^{T}X + w_0$
assigns a negative value to a feature vector $X$ belonging to one mental task and a positive value to one belonging to the other
class. The aim was to find the $w$ that reduces the number of
misclassifications, and several criteria can be
optimized to do so [11]. The approach taken by Fisher was to
find a linear combination of the variables that separates
the two classes as much as possible. That is, a direction is sought along which the two classes are best separated
in some sense. The criterion proposed by Fisher is the
ratio of between-class to within-class variance. Formally, a direction $w$ is wanted such that (2) is maximized:
$$J_F = \frac{\lvert w^{T}(m_1 - m_2)\rvert^{2}}{w^{T} S_W\, w} \qquad (2)$$
where $m_1$ and $m_2$ are the group means and $S_W$ is the within-class
sample covariance matrix [12].
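The maximizer of (2) is known in closed form: it is proportional to $S_W^{-1}(m_1 - m_2)$. A minimal sketch for two-dimensional feature vectors follows. It uses the pooled scatter matrix, which differs from the sample covariance only by a scale factor and gives the same direction. Placing the threshold $w_0$ midway between the projected class means is an assumption, since the paper does not say how the threshold was chosen, and all function names are illustrative.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix of the points around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class1, class2):
    """Return (w, w0) with w proportional to S_W^{-1} (m1 - m2).
    Assumes the pooled scatter matrix is invertible."""
    m1, m2 = mean_vec(class1), mean_vec(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    # Assumption: threshold halfway between the two class means.
    mid = [(m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0]
    w0 = -(w[0] * mid[0] + w[1] * mid[1])
    return w, w0


def decide(x, w, w0):
    """Class 1 for a positive value of w^T x + w0, class 2 otherwise."""
    return 1 if w[0] * x[0] + w[1] * x[1] + w0 > 0 else 2
```

For the order-six AR features of the paper the same formula applies, with a general matrix inverse in place of the explicit 2x2 one.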
3. RESULTS
In this section the classification results are presented,
first for the Purdue dataset and then
for the Graz dataset.
3.1. Purdue dataset results:
Binary combinations of the five mental tasks were classified
for subjects one, three, five and six, and the classifications
were done on a total of two or three sessions of the EEG dataset, depending on the subject. With five
mental tasks, this leads to 10 pairs of
binary mental tasks. For this dataset AR coefficients of order six were used as feature vectors.
For each individual pair, under the 5-fold cross validation scheme, the classification was performed 44
times and averaged over all combinations to compute the
final classification accuracy. Table 1 shows the results for the five
classifiers. Each entry in Table 1 was
calculated by averaging over all ten pairs of mental tasks. The last row of the table is the average of each
classification method over all subjects.
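The ten task pairs and a 5-fold split can be enumerated with a few lines of Python. The short task names and the contiguous, unshuffled fold layout are illustrative assumptions; the paper does not describe how the folds were drawn.

```python
from itertools import combinations

# Short, made-up labels for the five Purdue mental tasks.
TASKS = ["baseline", "letter", "math", "counting", "rotation"]

# All unordered pairs of five tasks: C(5, 2) = 10 binary problems.
pairs = list(combinations(TASKS, 2))


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index lists."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

Each pair in `pairs` defines one two-class problem; the accuracy of a classifier on that pair is the average over the five test folds.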
According to Table 1, for subject one the Bayesian network is better
than the Bayesian quadratic classifier. Averaged over all
subjects, the Bayesian network is just two percentage points lower
than the Bayes quadratic classifier and better than the neural network, Fisher linear classifier and HMM. In
addition, the Bayesian network has the lowest average standard deviation, which indicates more consistent
classification. In terms of execution time, the Bayesian
network was the most time-consuming classifier, while the
Fisher linear classifier and the Bayesian quadratic classifier needed the
least time to be trained and to classify the EEG
feature vectors.

Table 1 values: classification accuracy (%) ± standard deviation.
Sub.    BNT           Neural Network   Bayes         Fisher Linear   HMM
1       94.07 ± 2.2   92.48 ± 2.9      93.78 ± 2.8   91.15 ± 2.7     70.18 ± 8.8
3       87.43 ± 3.9   85.04 ± 4.3      89.22 ± 3.5   82.77 ± 4.1     64.10 ± 9.1
5       82.48 ± 2.8   82.61 ± 3.0      86.58 ± 3.4   81.79 ± 3.1     62.43 ± 7.8
6       90.31 ± 2.7   89.39 ± 3.1      92.49 ± 3.2   90.38 ± 3.1     64.61 ± 8.3
Mean    88.57 ± 3.0   87.38 ± 3.4      90.51 ± 3.2   86.63 ± 3.3     65.33 ± 8.5

Table 2. Summary of the results of different groups on the Graz dataset. The last three rows are the results obtained
in this research. Considering the value of MI for the Bayesian network, the result of this work ranks second compared to
the others.
Ranking            Group   Minimum Error (%)   Maximum SNR   Maximum MI (bits)
1                  C       10.71               1.34          0.61
2                  F       15.71               0.90          0.46
3                  B       17.14               0.86          0.45
4                  A       13.57               0.85          0.44
5                  G       17.14               0.50          0.29
6                  I       23.57               0.44          0.26
7                  E       17.14               0.34          0.21
8                  D       32.14               0.14          0.09
9                  H       49.29               0.00          0.00
Bayesian network   -       16.43               1.00          0.50
Neural network     -       15.71               1.04          0.51
Bayes classifier   -       17.14               0.71          0.38
3.2. Graz dataset results:
It is quite common to use the error rate for comparing different methods. However, the error rate takes into
account just the sign of the classifier output and not its
magnitude. For this reason, mutual information is used to compare the different results. In addition, other
groups working on the same dataset have reported their
results on the Graz dataset in the form of mutual information,
so the present results can be compared with theirs on the
same measure. For the Graz dataset AAR coefficients of order six were used as feature
vectors [6].
Nine other groups from different universities have applied their algorithms to the same dataset, and their results can
be found in Table 2 together with the results from this
research (the last three rows). The details of the other groups' algorithms can also be found on the BCI 2003 website.
The time course of the mutual information can be
seen in Figure 4.
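The paper does not give the formula behind the MI column of Table 2. The tabulated values are consistent with the relation commonly used for this competition dataset, MI = 0.5 log2(1 + SNR), where SNR is the signal-to-noise ratio of the continuous classifier output. The sketch below assumes that relation; the function name is illustrative.

```python
import math


def mutual_information_bits(snr):
    """Mutual information in bits for a given signal-to-noise ratio,
    assuming MI = 0.5 * log2(1 + SNR)."""
    return 0.5 * math.log2(1.0 + snr)


# Maximum SNR values reported in Table 2 for the three classifiers of this work.
for name, snr in [("Bayesian network", 1.00),
                  ("Neural network", 1.04),
                  ("Bayes classifier", 0.71)]:
    print(name, round(mutual_information_bits(snr), 2))
```

Small differences from the table in the last digit are possible because the SNR values in Table 2 are themselves rounded to two decimals.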
4. DISCUSSION AND CONCLUSION
In this research, the EEG signal was classified using
different machine learning techniques. The algorithms were applied to two known datasets in the BCI field and
the results were presented in the previous section.
Aside from the Bayesian quadratic classifier, the Bayesian network is better than the other
classifiers on the Purdue dataset, except for subject five, where the neural network is marginally better, and subject six, where the Fisher linear classifier is marginally better. For subject one it is even
better than the Bayesian quadratic classifier. This
comparison is based on classification accuracy. The Bayesian network also gave comparable results on
the Graz dataset, whereas the Bayesian quadratic classifier did
not perform as well there.
In terms of the standard deviation of the error, the
Bayesian network has the lowest value on average and for three of the four subjects, which
indicates more consistent classification. This consistency can also be seen in
Figure 4, which shows an almost smooth curve of mutual
information and classification error over time. The Bayesian network classifier was used for the first time for
such a purpose and gave good classification accuracy, as
can be seen in the results. Nevertheless, its execution time
was too long, which makes this classifier unsuitable for online
BCI systems, at least at our current processing speed.
The Bayesian quadratic classifier gave higher accuracy than the other classifiers
on the Purdue dataset. It should also be considered that the EEG signal is non-stationary, meaning
that its statistics vary over time, so the model
(see the method section) may not represent the signal well if it is recorded over many sessions. This can be clearly
seen in the reduced results for subject five, whose data was recorded
over more sessions (three) than the others (two), and
also in the results from the Graz dataset, which has many
more sessions of EEG.
The Fisher linear classifier gave accuracy comparable
to the nonlinear methods in considerably less time. In several
BCIs developed so far, linear discriminant analysis has been
implemented as the classifier [1]. The HMM classifier had the lowest results of all the classifiers. In the
present study a simple HMM structure was
implemented, which might be the reason for the
significantly poorer results of the HMM compared to the
other classifiers. Merging these classifiers into a hybrid classifier will be the topic of future work.
Figure 4. Time course of mutual information (bits) and error rate (the bottom figure) for the Bayesian network classifier
5. REFERENCES
[1] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan, "Brain-Computer Interfaces for Communication and Control," Clinical Neurophysiology, Vol. 113, pp. 767-791, 2002.
[2] Dennis J. McFarland, William A. Sarnacki, Theresa M. Vaughan, Jonathan R. Wolpaw, "Brain-computer interface (BCI) operation: signal and noise during early training sessions," Clinical Neurophysiology, Vol. 116, pp. 56-62, Jan 2005.
[3] Kouhyar Tavakolian, "Investigation and Comparison of Different Mental Task Classification by Linear and Nonlinear Techniques Applied to EEG Signal," Master of Science thesis, Department of Electrical and Computer Engineering, University of Tehran, July 2003.
[4] Kouhyar Tavakolian, A. M. Nasrabadi, Siamak Rezaei, "Selecting Better EEG channels for classification of Mental Tasks," in Proceedings of the IEEE International Symposium on Circuits and Systems ISCAS2004, pp. 537-540, Vancouver, Canada, May 2004.
[5] Zachary A. Keirn, Jorge I. Aunon, "A New Mode of Communication between Man and His Surroundings," IEEE Trans. on BME, Vol. 37, No. 12, Dec 1990.
[6] Graz dataset: http://ida.first.fraunhofer.de/projects/bci/competition
[7] Kouhyar Tavakolian, Siamak Rezaei, "Classification of Mental Tasks Using Gaussian Mixture Bayesian Network Classifiers," IEEE International Workshop on Biomedical Circuits and Systems, Singapore, Dec 2004.
[8] Todd K. Moon, Wynn C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.
[9] L. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proc. IEEE, 77(2):257-286, 1989.
[10] Bayes Net Toolbox for Matlab, written by Kevin Murphy. www.ai.mit.edu/~murphyk/Software/BNT/bnt.html
[11] Andrew Webb, Statistical Pattern Recognition, Arnold, London, 1999.
[12] Sergios Theodoridis, Konstantinos Koutroumbas, Pattern Recognition, Academic Press, 1999.
[13] Finn V. Jensen, Bayesian Networks and Decision Graphs, Springer, 2001.
[14] Nai-Jen Huan and Ramaswamy Palaniappan, "Neural network classification of autoregressive features from electroencephalogram signals for brain-computer interface design," Institute of Physics Publishing, Journal of Neural Engineering, Vol. 1, pp. 142-150, 2004.
[15] Matlab software website: www.mathwork.com