
    Mental Task Classification for Brain Computer Interface Applications

Kouhyar Tavakolian, Engineering Science, Simon Fraser University

    Faratash Vasefi, Engineering Science, Simon Fraser University

    Kaveh Naziripour, Engineering Science, Simon Fraser University

    Siamak Rezaei, Computer Science, University of Northern British Columbia

Abstract: In this work, the application of different machine learning techniques to the classification of mental tasks from electroencephalographic (EEG) signals is investigated. The main application of this research is the improvement of brain computer interface (BCI) systems. For this purpose, Bayesian graphical network, neural network, Bayesian quadratic, Fisher linear and Hidden Markov Model classifiers are applied to two well-known EEG datasets in the BCI field. The Bayesian network classifier is used for the first time in this work for the classification of EEG signals and achieved high classification accuracy. In addition to the classical correct-classification accuracy criterion, mutual information is used to compare the classification results with those of other BCI groups.

    Keywords: EEG, brain computer interface, Bayesian network classifier, neural networks, mutual information.

    1. INTRODUCTION

Mental task classification by recognizing electroencephalographic (EEG) patterns is an important and challenging biomedical signal processing problem. Such classification can be utilized to enable a patient to communicate without any overt physical movement; this is done solely by computer processing of the patient's brain waves, as shown in the block diagram of Figure 1. The development of faster digital computers and better EEG devices has motivated many researchers to work on BCI systems [1][2].

So far, classification accuracy has been one of the main shortcomings of developed BCI systems, and it directly affects the decisions made as the BCI output. This accuracy depends on the quality of the EEG signal and on the processing algorithms, which include preprocessing, feature extraction and feature classification. In our previous research, the effect of different feature extraction algorithms [3] and of the number of EEG channels [4] on classification accuracy was investigated. In the current work, the effects of different types of classifiers on classification accuracy are investigated and compared.

In the present research, the classification of mental tasks is investigated using the Purdue University EEG dataset [5] and the EEG dataset from the Department of Medical Informatics, University of Technology Graz [6]. Both are well-established datasets in the BCI field and are accessible on the internet. Autoregressive (AR) and adaptive autoregressive (AAR) coefficients were extracted from the EEG windows for all classifiers. These extracted features were fed to the next stage of the BCI, the classifier. Using the same extracted features for all classifiers facilitated the comparison of classifier efficiency.
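As a rough illustration of this feature extraction step, the sketch below estimates order-six AR coefficients for a single EEG window via the Yule-Walker equations. The estimator, window length and sampling rate are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np

def ar_features(window, order=6):
    """Estimate AR coefficients of one EEG window via the Yule-Walker equations.

    Sketch only: the paper uses order-six AR features, but does not fix the
    estimator, so Yule-Walker is an illustrative choice here.
    """
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                       # remove DC offset
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:] yields the AR coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Example: six AR coefficients from a 1-second window sampled at 250 Hz
rng = np.random.default_rng(0)
fake_eeg_window = rng.standard_normal(250)
print(ar_features(fake_eeg_window))        # length-6 feature vector
```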

The main focus was on the investigation and comparison of a feed-forward neural network, Bayesian quadratic, Bayesian network, Fisher linear classifier and Hidden Markov Models (HMM) for mental task classification. These classifiers are well-known methods in the machine learning literature and were intentionally chosen to cover both linear and nonlinear methods. The Gaussian mixture model is represented as a Bayesian network, and this is the first time that such a classifier has been used for EEG signal classification [7].

We trained the Bayesian network and the Hidden Markov Model using the expectation maximization (EM) algorithm. Mixture models are a type of density model comprised of a number of component functions, which in our case were Gaussians. These component functions are combined to provide a multimodal density.


Figure 1. Flow of the methodology: mental-task EEG signals are preprocessed, AR or AAR features are extracted, and the features are classified (HMM, neural network, Bayesian network, Bayes quadratic or Fisher linear classifier) to produce the recognized mental task as the BCI output.

In the remainder of the paper, the methods section first introduces the datasets and preprocessing and then briefly explains the five classifiers. The EEG data was classified by these classifiers, and the results are presented in the tables and figure of the results section. A discussion of the results and a conclusion close the paper.

    2. METHODS

In this section, the EEG datasets are first introduced, and then the applied machine learning algorithms are briefly explained, as also shown in Figure 1.

    2.1. EEG recordings and Preprocessing

The main differences between the two EEG datasets are that the Purdue dataset was recorded during the performance of five mental tasks, while the Graz dataset involves just two mental activities, left and right hand movement. On the other hand, the Graz dataset has many more sessions than the Purdue dataset.

2.1.1 Purdue dataset: The Purdue dataset was acquired by Aunon and Keirn [5] at Purdue University and was recorded from seven subjects during the performance of five different mental tasks. An elastic electrode cap was used to record from positions C3, C4, P3, P4, O1 and O2 on the scalp, as shown in Figure 2. Data was recorded at a sampling rate of 250 Hz with a 12-bit A/D converter. Eye blinks were detected by means of a separate channel of data recorded from two electrodes placed above and below the subject's left eye. The subjects were asked to perform five mental tasks:

(a) Baseline task. The subjects were asked to relax as much as possible.

(b) Letter task. The subjects were instructed to mentally compose a letter to a friend or relative without vocalizing.

(c) Math task. The subjects were given nontrivial multiplication problems, such as 49 times 78.

(d) Visual counting task. The subjects were asked to imagine a blackboard and to visualize numbers being written on the board sequentially.

(e) Geometric figure rotation. The subjects were asked to visualize a particular three-dimensional block figure being rotated about an axis.

Data was recorded for 10 seconds during each task, and each task was repeated five times per session.

In this work, the algorithms were applied to the subjects having more than one session of EEG data, namely subjects 1, 3, 5 and 6.

The eye blinks were removed with two different methods. In the first method, a simple time filter was used to exclude the sudden jumps in the EEG caused by eye movement. To do this, the EOG channel of the dataset was used: the average of the signal was calculated on windows of length 20 milliseconds, and windows whose average was greater than two times the average of the signal in a 500-millisecond window were removed. In the other method, independent component analysis (ICA) was used, which resulted in much better classification accuracy [7]. In this paper the results obtained with the time filter are reported, because the differences between the classifiers were more distinct with this method.
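As a rough illustration of this time-filter idea (not the authors' exact implementation), the sketch below flags 20 ms EOG windows whose mean amplitude exceeds twice the mean over a surrounding 500 ms window and drops the corresponding EEG samples. The window lengths and the factor of two follow the description above; the sampling rate, absolute-value averaging and everything else are assumptions.

```python
import numpy as np

def remove_blink_windows(eeg, eog, fs=250, short_ms=20, long_ms=500, factor=2.0):
    """Sketch of the simple time filter described above.

    eeg, eog: 1-D arrays of equal length; fs: sampling rate in Hz.
    A 20 ms window is rejected when the mean |EOG| inside it exceeds
    `factor` times the mean |EOG| of the enclosing 500 ms window.
    """
    short = max(1, int(fs * short_ms / 1000))    # 20 ms -> 5 samples at 250 Hz
    long_ = max(short, int(fs * long_ms / 1000))
    keep = np.ones(len(eeg), dtype=bool)
    for start in range(0, len(eog) - short + 1, short):
        w = np.abs(eog[start:start + short]).mean()
        c0 = max(0, start + short // 2 - long_ // 2)
        ref = np.abs(eog[c0:c0 + long_]).mean()
        if w > factor * ref:
            keep[start:start + short] = False    # drop samples hit by the blink
    return eeg[keep]

# Usage with synthetic signals
rng = np.random.default_rng(1)
eog = rng.standard_normal(2500); eog[1000:1010] += 50.0   # artificial blink
eeg = rng.standard_normal(2500)
print(len(remove_blink_windows(eeg, eog)))   # fewer samples than 2500
```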

Figure 2. The electrode placement for the Purdue dataset.

2.1.2 Graz dataset: The Graz dataset [6] was recorded from a normal female subject during a feedback session. The task was to control a feedback bar by means of imagery of left or right hand movements. The experiment consisted of 7 runs with 40 trials each. All runs were conducted on the same day with several


minutes break in between them. Three bipolar EEG channels were measured over C3, Cz and C4. The EEG was sampled at 128 Hz and filtered between 0.5 and 30 Hz. The trials for training and testing were chosen randomly, which prevents any systematic effect due to the feedback. In this dataset the eye artifact had already been removed.

    2.2. Classifiers

In this section the five classifiers are briefly introduced; for more detailed information one can refer to references [8][9][10][11][12][13] or other texts on machine learning and pattern recognition.

2.2.1 Bayesian graphical network classifier

A Bayesian network is a modeling tool that combines directed acyclic graphs with Bayesian probability. Figure 3 shows an example of a Bayesian network, which consists of a causal graph combined with an underlying probability distribution. Each node of the network corresponds to a variable, and the edges represent causality between these variables. The other elements of a Bayesian network are the probability distributions associated with each node. With this information the network can model the probabilities of complex causal relationships [13].

The graphical model corresponding to the Bayesian network used in this work is shown in Figure 3. Note that the square box in the figure corresponds to the input extracted features and the rectangular box to the Gaussian mixture components; the square and rectangular nodes represent discrete values, while the round node represents continuous values. The graph structure of this model can be represented by the adjacency matrix with rows 011, 001 and 000. The Bayes Net Toolbox (BNT) was used to implement the classifier [10]. The model was trained using the EM algorithm, which starts with a randomly initialized model (here, the means and covariances) and then iteratively refines the model parameters to produce a locally optimal maximum-likelihood fit. The EM algorithm is composed of two steps: in the first step, each data point is softly assigned to each mixture component; in the second step, the parameters of the model are adjusted to fit the data based on the soft assignments of the previous step [8].

    2.2.2 Neural Networks

For this classifier, a two-layer feed-forward neural network was implemented with 20 neurons in the hidden layer and one neuron in the output layer; this structure (20 hidden neurons) was found to be optimal according to [3]. A threshold of 0.5 was set on the output neuron: values between 0.5 and 1 were assigned to one of the tasks, and values between 0 and 0.5 to the other task. The network was trained using the error back-propagation algorithm. The Neural Network toolbox of Matlab was used for this part of the research.
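A comparable network can be sketched with scikit-learn's MLPClassifier (an assumption; the paper used Matlab's Neural Network toolbox): one hidden layer of 20 neurons, a single logistic output, and gradient-based back-propagation training. For two classes, predict() thresholds the sigmoid output at 0.5, mirroring the decision rule described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch only: a two-layer feed-forward network (20 hidden neurons, one
# logistic output) trained by back-propagation, standing in for the Matlab
# network described above.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))])
y = np.array([0] * 60 + [1] * 60)           # two mental tasks

net = MLPClassifier(hidden_layer_sizes=(20,),   # 20 neurons in the hidden layer
                    activation="logistic",
                    solver="sgd",               # plain gradient-based training
                    learning_rate_init=0.05,
                    max_iter=2000,
                    random_state=0)
net.fit(X, y)

# predict() assigns the task whose sigmoid output exceeds 0.5
print("training accuracy:", net.score(X, y))
```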

    2.2.3 Hidden Markov Model

A Hidden Markov Model is a finite set of states, each associated with a probability distribution that is generally multidimensional, as in this case. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution [9]. In this research the observations were the extracted EEG features explained previously, generated by a Gaussian mixture model characterized by three matrices for the means, variances and mixture weights; a transition matrix governed the moves between our two states. These parameters were all updated by the EM algorithm explained earlier, and at the end there were two trained HMMs corresponding to the two mental tasks. To classify the test vectors given by the 5-fold cross-validation scheme, the likelihood of each vector under each of these HMMs was calculated, and the vector was assigned to the mental task whose HMM gave the higher likelihood.
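A minimal version of this two-HMM scheme can be sketched with the hmmlearn package (an assumption; the paper used its own EM/Baum-Welch implementation): one Gaussian-mixture HMM per task, trained on that task's feature sequences, with test sequences assigned to the model giving the higher log-likelihood.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM   # assumption: hmmlearn is available

def train_task_hmm(sequences, n_states=2, n_mix=2):
    """Fit one Gaussian-mixture HMM (via EM/Baum-Welch) on a task's sequences."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

def classify_sequence(seq, hmm_a, hmm_b):
    """Assign the sequence to the task whose HMM gives the higher log-likelihood."""
    return 0 if hmm_a.score(seq) >= hmm_b.score(seq) else 1

# Usage with random stand-ins for per-window AR feature sequences
rng = np.random.default_rng(0)
seqs_a = [rng.normal(0.0, 1.0, (20, 6)) for _ in range(10)]
seqs_b = [rng.normal(1.5, 1.0, (20, 6)) for _ in range(10)]
hmm_a, hmm_b = train_task_hmm(seqs_a), train_task_hmm(seqs_b)
print(classify_sequence(rng.normal(1.5, 1.0, (20, 6)), hmm_a, hmm_b))  # expected: 1
```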

Figure 3. Gaussian mixture model represented as a simple graphical model, with a discrete class node (B/M), a discrete mixture-component node (1 or 2) and a continuous Gaussian node parameterized by a mean and covariance. B stands for the baseline task and M for the multiplication task in the Purdue dataset.


Table 1. Bayesian graphical network (BNT), neural network, Bayes quadratic, Fisher linear and Hidden Markov Model classifiers compared for the classification of binary combinations of the five mental tasks. Each entry (accuracy ± standard deviation, in percent) is averaged over the 10 possible binary combinations of mental tasks.

    2.2.4 Bayesian Quadratic classifier

Given a set of classes (mental tasks here) M_k, each characterized by a set of known model parameters, an extracted EEG feature vector X is assigned to the class (mental task) with the highest posterior probability. This is shown in (1) and is known as the Bayes decision rule:

$X \in M_k \quad \text{if} \quad P(M_k \mid X) > P(M_l \mid X) \quad \forall\, l \neq k$    (1)

To calculate the a posteriori probabilities above, Bayes' law was used, which, under the assumption that the features are normally distributed, leads to a quadratic discriminant known as the Bayes quadratic classifier [12]. The parameters are the means and covariances of the training vectors, and the likelihoods are calculated as stated above.
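With Gaussian class-conditional densities, rule (1) reduces to a quadratic discriminant. A minimal sketch, assuming scikit-learn is acceptable as a stand-in for the authors' own implementation:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Sketch: Bayes decision rule (1) with normally distributed features per class,
# i.e. a quadratic discriminant built from per-class means and covariances.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))])
y = np.array([0] * 60 + [1] * 60)

qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X, y)                       # estimates class means and covariances
print(qda.predict(X[:5]))           # argmax_k P(M_k | X) under the Gaussian model
```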

    2.2.5 Linear Fisher Classifier

A linear classifier computes the discriminant $W^T X + w_0$; in the case of two classes (mental tasks), it assigns feature vectors X with a negative discriminant value to one mental task and those with a positive value to the other class. The aim is to find a W that reduces the number of misclassifications, and to do so there are several criteria that can be optimized [11]. The approach taken by Fisher was to find a linear combination of the variables that separates the two classes as much as possible; that is, a direction is sought along which the two classes are best separated in some sense. The criterion proposed by Fisher is the ratio of the between-class to the within-class variance. Formally, a direction w is sought such that (2) is maximized:

$J_F(w) = \dfrac{|w^T (m_1 - m_2)|^2}{w^T S_W w}$    (2)

where $m_1$ and $m_2$ are the group means and $S_W$ is the within-class sample covariance matrix [12].
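A minimal sketch of this criterion in code, assuming two feature matrices of the same dimensionality and a small ridge term for numerical stability (both assumptions, not from the paper): the maximizer of (2) is $w \propto S_W^{-1}(m_1 - m_2)$, and the sign of $w^T x + w_0$ gives the class.

```python
import numpy as np

def fisher_direction(X1, X2, ridge=1e-6):
    """Direction w maximizing J_F(w) = |w^T (m1 - m2)|^2 / (w^T S_W w)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter, pooled over both classes
    S_w = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    S_w += ridge * np.eye(S_w.shape[0])        # keep the solve well conditioned
    w = np.linalg.solve(S_w, m1 - m2)          # closed-form maximizer of (2)
    w0 = -0.5 * w @ (m1 + m2)                  # threshold halfway between projected means
    return w, w0

def fisher_predict(X, w, w0):
    # Positive discriminant -> class 1, negative -> class 2, as described above
    return np.where(X @ w + w0 > 0, 1, 2)

rng = np.random.default_rng(0)
X1, X2 = rng.normal(0, 1, (60, 6)), rng.normal(1.5, 1, (60, 6))
w, w0 = fisher_direction(X1, X2)
print((fisher_predict(X1, w, w0) == 1).mean(), (fisher_predict(X2, w, w0) == 2).mean())
```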

3. RESULTS

In this section the results of the classifications are presented: first the Purdue dataset results, followed by the results from the Graz dataset.

    3.1. Purdue dataset results:

Binary combinations of the five mental tasks were classified for subjects one, three, five and six, and the classifications were performed on a total of two or three sessions of EEG data, depending on the subject. With five mental tasks, this leads to 10 pairs of binary mental tasks. For this dataset, AR coefficients of order six were used as feature vectors.

For each individual pair, using the 5-fold cross-validation scheme, the classification was performed 44 times and averaged over all combinations to compute the final classification accuracy. The results for the five classifiers can be seen in Table 1; each entry was calculated by averaging over all ten pairs of mental tasks, and the last row of the table is the average of each classification method over all subjects.


As Table 1 shows, for subject one the Bayesian network performs better than the Bayesian quadratic classifier. Averaged over all subjects, the Bayesian network is only about two percent below the Bayes quadratic classifier and is better than the neural network, the Fisher linear classifier and the HMM. On the other hand, the standard deviation is lower for the Bayesian network, which indicates a more consistent classification. Considering execution time, the Bayesian network was the most time-consuming classifier, while the Fisher linear

Sub.   BNT           Neural Network   Bayes         Fisher Linear   HMM
1      94.07 ± 2.2   92.48 ± 2.9      93.78 ± 2.8   91.15 ± 2.7     70.18 ± 8.8
3      87.43 ± 3.9   85.04 ± 4.3      89.22 ± 3.5   82.77 ± 4.1     64.10 ± 9.1
5      82.48 ± 2.8   82.61 ± 3.0      86.58 ± 3.4   81.79 ± 3.1     62.43 ± 7.8
6      90.31 ± 2.7   89.39 ± 3.1      92.49 ± 3.2   90.38 ± 3.1     64.61 ± 8.3
Means  88.57 ± 3.0   87.38 ± 3.4      90.51 ± 3.2   86.63 ± 3.3     65.33 ± 8.5


Table 2. Summary of the results of the different groups on the Graz dataset. The last three rows are the results obtained in this research. Considering the value of the mutual information (MI) obtained with the Bayesian network, the result of this work ranks second compared to the others.

Ranking   Group              Minimum Error (%)   Maximum SNR   Maximum MI (bits)
1         C                  10.71               1.34          0.61
2         F                  15.71               0.90          0.46
3         B                  17.14               0.86          0.45
4         A                  13.57               0.85          0.44
5         G                  17.14               0.50          0.29
6         I                  23.57               0.44          0.26
7         E                  17.14               0.34          0.21
8         D                  32.14               0.14          0.09
9         H                  49.29               0.00          0.00
-         Bayesian network   16.43               1.00          0.50
-         Neural network     15.71               1.04          0.51
-         Bayes classifier   17.14               0.71          0.38

classifier and the Bayesian quadratic classifier required the least time to train and to classify the extracted EEG feature vectors.

    3.2. Graz dataset results:

It is quite common to use the error rate to compare different methods. However, the error rate takes into account only the sign of the classifier output, not its magnitude. For this reason, mutual information is used to compare the different results. Moreover, the other groups working on the same dataset have expressed their Graz dataset results in the form of mutual information, so the present results can be compared with theirs on that basis. For the Graz dataset, AAR coefficients of order six were used as feature vectors [6].
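The mutual information values in Table 2 are consistent with the convention MI = 0.5 · log2(1 + SNR), where the SNR is computed from the time course of the continuous classifier output (the exact SNR estimate follows the competition's convention and is not re-derived here). The short sketch below is only a cross-check of that mapping, not code from the paper.

```python
import numpy as np

def mi_from_snr(snr):
    """Mutual information (bits) from the SNR of the classifier output,
    using the MI = 0.5 * log2(1 + SNR) convention behind Table 2."""
    return 0.5 * np.log2(1.0 + np.asarray(snr, dtype=float))

# Cross-check against Table 2: the reported (SNR, MI) pairs match this mapping.
for snr in (1.34, 1.00, 1.04, 0.71):
    print(snr, "->", round(float(mi_from_snr(snr)), 2))   # 0.61, 0.50, 0.51, 0.38
```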

Nine other groups from different universities applied their algorithms to the same dataset, and their results can be found in Table 2 together with the results of this research (the last three rows). The details of the other groups' algorithms can be found on the BCI 2003 website. The time course of the mutual information can be seen in Figure 4.

    4. DISCUSSION AND CONCLUSION

In this research, the EEG signal was classified using different machine learning techniques. The algorithms were applied to two well-known datasets in the BCI field, and the results were presented in the previous section.

Aside from the Bayesian quadratic classifier, and except for subject six, the Bayesian network performs better than the other classifiers on the Purdue dataset; for subject one it is even better than the Bayesian quadratic classifier. This comparison is made according to classification accuracy. The Bayesian network also maintained comparable results on the Graz dataset, whereas the Bayesian quadratic classifier did not perform as well there.

In terms of the standard deviation of the error, the Bayesian network was consistently better than the other classifiers, which again indicates a more consistent classification. This can also be seen in Figure 4, which shows an almost smooth curve of mutual information and classification error over time. The Bayesian network classifier was used for the first time for such a purpose and gave good classification accuracy, as can be seen in the results; nevertheless, its execution time was too long, making this classifier unsuitable for online BCI systems, at least at our current processing speed.

The results with the Bayesian quadratic classifier were higher for the Purdue dataset than for the other dataset. It should also be considered that the EEG signal is non-stationary, meaning that its statistics vary over time, so the model (see the methods section) may not represent the signal well when it is recorded over many more sessions. This can be clearly seen in the reduced results for subject five, whose data were taken over more sessions (three) than the others (two), and also in the results from the Graz dataset, which contains many more sessions of EEG.

The Fisher linear classifier gave accuracy comparable to the nonlinear methods but with considerably less computation time. In several BCIs developed so far, linear discriminant analysis has been implemented as the classifier [1]. The HMM classifier had the lowest results compared to the other classifiers. In the present study a simple HMM structure was implemented, which might be the reason for the HMM's significantly poorer results. Merging these classifiers into a hybrid classifier will be the topic of future work.


Figure 4. Time course of the mutual information (bits) and the error rate (bottom plot) for the Bayesian network classifier.

    5. REFERENCES

[1] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, Vol. 113, pp. 767-791, 2002.

[2] Dennis J. McFarland, William A. Sarnacki, Theresa M. Vaughan, and Jonathan R. Wolpaw, "Brain-computer interface (BCI) operation: signal and noise during early training sessions," Clinical Neurophysiology, Vol. 116, pp. 56-62, Jan 2005.

[3] Kouhyar Tavakolian, "Investigation and Comparison of Different Mental Task Classification by Linear and Nonlinear Techniques Applied to EEG Signal," Master of Science thesis, Department of Electrical and Computer Engineering, University of Tehran, July 2003.

[4] Kouhyar Tavakolian, A. M. Nasrabadi, and Siamak Rezaei, "Selecting Better EEG Channels for Classification of Mental Tasks," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2004), pp. 537-540, Vancouver, Canada, May 2004.

[5] Zachary A. Keirn and Jorge I. Aunon, "A New Mode of Communication between Man and His Surroundings," IEEE Transactions on Biomedical Engineering, Vol. 37, No. 12, Dec 1990.

[6] Graz dataset: http://ida.first.fraunhofer.de/projects/bci/competition

[7] Kouhyar Tavakolian and Siamak Rezaei, "Classification of Mental Tasks Using Gaussian Mixture Bayesian Network Classifiers," IEEE International Workshop on Biomedical Circuits and Systems, Singapore, Dec 2004.

[8] Todd K. Moon and Wynn C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.

[9] L. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

[10] Bayes Net Toolbox for Matlab, written by Kevin Murphy: www.ai.mit.edu/~murphyk/Software/BNT/bnt.html

[11] Andrew Webb, Statistical Pattern Recognition, Arnold, London, 1999.

[12] Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Academic Press, 1999.

[13] Finn V. Jensen, Bayesian Networks and Decision Graphs, Springer, 2001.

[14] Nai-Jen Huan and Ramaswamy Palaniappan, "Neural network classification of autoregressive features from electroencephalogram signals for brain-computer interface design," Journal of Neural Engineering, Vol. 1, pp. 142-150, 2004.

[15] Matlab software website: www.mathworks.com