Chapter 15: Classification of Time-Embedded EEG Using Short-Time Principal Component Analysis

by Nguyen Duc Thang

5/2009

Outline

Part one: Introduction; Principal Component Analysis (PCA); Signal Fraction Analysis (SFA); EEG signal representation; Short-time PCA

Part two: Classifier; Experimental setups, results, and analysis

Outline

Overview of the previous presentations; Common Spatial Patterns (CSP)

Classifiers; Experimental setups and results; Analysis, discussion, and conclusions

Architecture of an EEG-based BCI system

Feature extraction: PCA, SFA, short-time PCA

Classification: LDA, SVM

The shortcomings of conventional PCA

[Figure: data points and the PCA projection line.]

Not good for a large number of samples

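Short-time PCA approach

Apply PCA on short durations

Extract short-time PCA features

[Figure: a window (dimensions labeled D and h) slides over the time-embedded features; PCA on each window yields n basis vectors (D x n), which are stacked into one short-time PCA feature vector of size 1 x Dn.]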
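The windowing and stacking steps can be sketched in NumPy as follows. This is a minimal illustration, not the presenter's implementation: the function name `short_time_pca_features`, the parameters `window` and `first`, the use of non-overlapping windows, and the per-window mean subtraction are all assumptions made for the example.

```python
import numpy as np

def short_time_pca_features(X, window, n, first=0):
    """X: (D, T) time-embedded data. Returns (num_windows, D*n) features."""
    D, T = X.shape
    feats = []
    # Assumption: windows do not overlap
    for start in range(0, T - window + 1, window):
        seg = X[:, start:start + window]
        seg = seg - seg.mean(axis=1, keepdims=True)   # assumption: remove window mean
        # Columns of U are the PCA basis vectors of this window
        U, _, _ = np.linalg.svd(seg, full_matrices=False)
        basis = U[:, first:first + n]                 # D x n
        feats.append(basis.T.reshape(-1))             # stack into a 1 x D*n vector
    return np.array(feats)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 500))      # toy data: D = 6, T = 500
F = short_time_pca_features(X, window=125, n=3)
print(F.shape)
```

Each row of `F` is one short-time PCA feature vector. Note that the sign of each basis vector returned by an SVD is arbitrary, so a real pipeline would need some sign convention before classification.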

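The role of Singular Value Decomposition (SVD) in PCA

Using SVD, we can compute the eigenvector w of the covariance matrix C_x that maximizes the variance of the projected data:

$$\max_{w} \; w^{T} C_x w$$

[Figure: data cloud with the principal directions w1 and w2.]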
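As a small numerical check of this relationship, the sketch below (toy data, hypothetical variable names) compares the first right singular vector of the centered data with the largest eigenvalue of the covariance matrix. It assumes samples are stored in rows and that w has unit length.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 2-D data with correlated coordinates, one sample per row
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)
Cx = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix

# Right singular vectors of the centered data are eigenvectors of Cx
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
w1 = Vt[0]                               # direction of maximum variance
var_w1 = w1 @ Cx @ w1                    # variance of the data projected on w1

eigvals = np.linalg.eigvalsh(Cx)         # ascending order
print(np.isclose(var_w1, eigvals[-1]))
```

The projected variance along `w1` equals the largest eigenvalue of `Cx`, which is also `s[0]**2 / (N - 1)`.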

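Generalized SVD

Using GSVD, we can find the generalized eigenvector w that maximizes the variance when projecting data A onto w and minimizes the variance when projecting data B onto w:

$$\max_{w} \; w^{T} C_A w, \qquad \min_{w} \; w^{T} C_B w$$

where C_A and C_B are the covariance matrices of A and B.

[Figure: data sets A and B with the direction w, along which the variance of A is maximized and the variance of B is minimized.]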

Common Spatial Pattern

For 2 classes:

Choose m eigenvectors that maximize the variance of class A and minimize the variance of class B

Choose m eigenvectors that maximize the variance of class B and minimize the variance of class A

The basis vectors W = a total of 2m eigenvectors

Example: distinguishing left-hand movement from right-hand movement
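A two-class CSP basis can be sketched with NumPy alone by whitening the summed covariance and then diagonalizing one class covariance. This is one common way to solve the generalized eigenproblem, used here as an assumption; the function name `csp_basis` and the toy data are hypothetical.

```python
import numpy as np

def csp_basis(A, B, m):
    """A, B: (channels, samples) data of two classes. Returns (channels, 2m) basis W."""
    CA, CB = np.cov(A), np.cov(B)
    # Whitening transform P such that P.T @ (CA + CB) @ P = I
    d, E = np.linalg.eigh(CA + CB)
    P = E / np.sqrt(d)
    # In the whitened space the eigenvalues for A and B sum to 1,
    # so a large variance for A implies a small variance for B
    lam, V = np.linalg.eigh(P.T @ CA @ P)   # ascending eigenvalues
    W = P @ V
    # Last m columns favor class A, first m columns favor class B
    return np.hstack([W[:, -m:], W[:, :m]])

rng = np.random.default_rng(2)
# Toy data: class A is strong on channel 0, class B on channel 1
A = rng.standard_normal((4, 1000)) * np.array([[3.0], [1.0], [1.0], [1.0]])
B = rng.standard_normal((4, 1000)) * np.array([[1.0], [3.0], [1.0], [1.0]])
W = csp_basis(A, B, m=1)
var_A = np.var(W.T @ A, axis=1)
var_B = np.var(W.T @ B, axis=1)
print(var_A > var_B)
```

The first column of `W` gives a projection with high variance for class A and low variance for class B; the second column does the opposite.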

Common Spatial Pattern

For n classes (combined with a classifier): the n-class problem is converted into n(n-1)/2 two-class problems, each with its own CSP

New trials are assigned to the class for which most two-class classifiers vote

Example: four classes A, B, C, D give six pairs: (AB), (BC), (CD), (DA), (AC), (BD)
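The pairwise voting scheme can be illustrated with a short sketch. The pair classifiers here are toy nearest-center rules on a one-dimensional feature, standing in for the CSP-plus-classifier pairs; `one_vs_one_predict`, `centers`, and `make_pair_classifier` are hypothetical names.

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(x, classes, pair_classifiers):
    """Each two-class classifier votes for one class; the majority wins."""
    votes = Counter(pair_classifiers[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy stand-in for trained two-class classifiers: pick the nearer class center
centers = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
classes = list(centers)

def make_pair_classifier(a, b):
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b

pair_classifiers = {p: make_pair_classifier(*p) for p in combinations(classes, 2)}

print(len(pair_classifiers))                               # 6 pairs for 4 classes
print(one_vs_one_predict(1.9, classes, pair_classifiers))  # C
```

For x = 1.9, class C collects three votes, B two, and D one, so the trial is assigned to C.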

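Linear Discriminant Analysis (LDA)

LDA is a simple classification approach in which the samples from each class are modeled by a Gaussian distribution with a class mean μ_k and a covariance matrix C shared by all classes:

$$P(x \mid \text{Class}=k) \propto e^{-\frac{1}{2}(x-\mu_k)^{T} C^{-1} (x-\mu_k)}$$

$$\mu_k = \frac{1}{N_k} \sum_{x \in \text{Class } k} x, \qquad C = \frac{1}{N-K} \sum_{k=1}^{K} \sum_{x \in \text{Class } k} (x-\mu_k)(x-\mu_k)^{T}$$

where N_k is the number of samples in class k, N is the total number of samples, and K is the number of classes.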

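Linear discriminant boundary

The boundary between classes i and j is where the two class likelihoods are equal:

$$P(x \mid \text{Class}=i) = P(x \mid \text{Class}=j)$$

$$e^{-\frac{1}{2}(x-\mu_i)^{T} C^{-1} (x-\mu_i)} = e^{-\frac{1}{2}(x-\mu_j)^{T} C^{-1} (x-\mu_j)}$$

$$x^{T} C^{-1} \mu_i - \frac{1}{2}\mu_i^{T} C^{-1} \mu_i = x^{T} C^{-1} \mu_j - \frac{1}{2}\mu_j^{T} C^{-1} \mu_j$$

$$x^{T} C^{-1} (\mu_i - \mu_j) = \frac{1}{2}(\mu_i + \mu_j)^{T} C^{-1} (\mu_i - \mu_j)$$

Here μ_i and μ_j are the class means and C is the shared covariance matrix. The quadratic term in x cancels, so the boundary is linear in x.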

Linear discriminant boundary

[Figure: three classes separated by the pairwise linear boundaries 12, 13, and 23.]
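A compact NumPy sketch of LDA with a pooled covariance matrix is given below. It assumes equal class priors, and the names `lda_fit` and `lda_predict` are hypothetical; the toy data are three well-separated Gaussian classes.

```python
import numpy as np

def lda_fit(X, y):
    """X: (N, d) samples, y: (N,) labels. Returns classes, class means, inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    N, K = len(X), len(classes)
    # Pooled covariance: within-class scatter divided by N - K
    C = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
            for i, k in enumerate(classes)) / (N - K)
    return classes, means, np.linalg.inv(C)

def lda_predict(X, classes, means, Cinv):
    # Linear discriminant: x^T C^-1 mu_k - 0.5 mu_k^T C^-1 mu_k (equal priors assumed)
    scores = X @ Cinv @ means.T - 0.5 * np.sum(means @ Cinv * means, axis=1)
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(3)
mus = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([rng.standard_normal((50, 2)) + mu for mu in mus])
y = np.repeat([0, 1, 2], 50)

classes, means, Cinv = lda_fit(X, y)
acc = np.mean(lda_predict(X, classes, means, Cinv) == y)
print(acc)
```

Two classes have equal discriminant scores exactly at the midpoint of their means, which is where the pairwise boundary passes.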

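The parameters of EEG representations

r EEG channels, each embedded in l+1 dimensions (the current sample and l lagged samples), giving r(l+1) dimensions in total:

$$x(t) \mapsto \left[ x_1(t), \ldots, x_r(t), \; \ldots, \; x_1(t+l), \ldots, x_r(t+l) \right]^{T}$$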

l: the number of lags

W = [w1, w2, …, wf, …, wf+m, …] → choose m basis vectors

f: the index of the first chosen basis vector

s: the window size

[Figure: a window of size s over the time-embedded features.]
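Time embedding itself is a small array operation. The sketch below assumes forward lags (samples t, t+1, ..., t+l) and orders the rows lag by lag; both choices, and the function name `time_embed`, are assumptions for illustration.

```python
import numpy as np

def time_embed(X, l):
    """X: (r, T) EEG. Returns (r*(l+1), T-l); column t stacks x(t), x(t+1), ..., x(t+l)."""
    r, T = X.shape
    # Block k holds all r channels shifted by k samples
    return np.vstack([X[:, k:T - l + k] for k in range(l + 1)])

X = np.arange(12).reshape(2, 6)   # toy data: r = 2 channels, T = 6 samples
Y = time_embed(X, l=2)
print(Y.shape)    # (6, 4)
print(Y[:, 0])    # [0 6 1 7 2 8]
```

With r = 2 and l = 2, each embedded sample has r(l+1) = 6 dimensions, and the number of usable time points drops by l.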

Cross-Validation Training Procedure

The training process

The training trials are randomly partitioned into 80% for constructing the classifier and 20% for evaluating it

This partition and evaluation process is repeated five times

The set of parameters with the best validation performance is chosen

The testing process

Apply the learned parameters to the test trials
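The parameter search can be sketched as repeated random 80/20 splits. Everything named here is hypothetical: `select_parameters`, the `evaluate` callback, and the toy problem in which the "parameter" is simply which feature column a nearest-mean classifier uses.

```python
import numpy as np

def select_parameters(trials, labels, candidates, evaluate,
                      repeats=5, train_frac=0.8, seed=0):
    """Return the candidate with the best mean validation accuracy."""
    rng = np.random.default_rng(seed)
    n = len(trials)
    n_train = int(round(train_frac * n))
    scores = {c: [] for c in candidates}
    for _ in range(repeats):
        perm = rng.permutation(n)                 # new random partition each repeat
        tr, va = perm[:n_train], perm[n_train:]
        for c in candidates:
            scores[c].append(evaluate(c, trials[tr], labels[tr],
                                      trials[va], labels[va]))
    return max(candidates, key=lambda c: np.mean(scores[c]))

def evaluate(feature, Xtr, ytr, Xva, yva):
    # Toy classifier: nearest class mean on a single feature column
    m0 = Xtr[ytr == 0, feature].mean()
    m1 = Xtr[ytr == 1, feature].mean()
    pred = (np.abs(Xva[:, feature] - m1) < np.abs(Xva[:, feature] - m0)).astype(int)
    return np.mean(pred == yva)

rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 50)
trials = rng.standard_normal((100, 2))
trials[:, 1] += 5 * labels            # only column 1 carries class information
best = select_parameters(trials, labels, candidates=[0, 1], evaluate=evaluate)
print(best)
```

In the actual setting, each candidate would be a tuple such as (number of lags, first vector, number of vectors, window size), and `evaluate` would build the features and train the classifier.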

Experiment 1: Five-task dataset

The subjects perform five mental tasks: (1) resting task, (2) mental letter writing, (3) mental multiplication of two multi-digit numbers, (4) visual counting, and (5) visual rotation

Each task is repeated for five trials

6 electrodes are used: C3, C4, P3, P4, O1, O2

Each trial is recorded for 10 s at 250 Hz

Learning parameters

Confusion matrix for short-time PCA representation averaged over test trials

Visualize the classification results

Given a set of samples X = {x1, x2, …, xn} that belong to K classes and have dimension D > 3, how can X be visualized?

For each class, apply K-means clustering to find N cluster centers. This gives a total of K x N points

Use multidimensional scaling (MDS) to map the points from D dimensions to d ≤ 3 dimensions (preserving the distances between points)
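The two steps can be sketched with NumPy only. The MDS variant used here is classical (eigenvalue-based) MDS, chosen for brevity; the slides do not say which variant was used. The function names and toy data are hypothetical.

```python
import numpy as np

def kmeans(X, n_clusters, iters=50, seed=0):
    """Plain Lloyd iterations; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(assign == j):               # keep old center if cluster is empty
                centers[j] = X[assign == j].mean(axis=0)
    return centers

def classical_mds(P, d=2):
    """Map points P (n, D) to (n, d) while approximately preserving distances."""
    D2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=2)   # squared distances
    n = len(P)
    J = np.eye(n) - np.ones((n, n)) / n                          # centering matrix
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]                             # d largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

rng = np.random.default_rng(5)
K, N, D = 3, 4, 10                     # classes, centers per class, dimension
points = np.vstack([kmeans(rng.standard_normal((100, D)) + 5 * k, N)
                    for k in range(K)])
Z = classical_mds(points, d=2)
print(points.shape, Z.shape)           # (12, 10) (12, 2)
```

The K x N cluster centers in `Z` can then be drawn as a 2-D scatter plot, colored by class.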

Visualize the classification results

Three-task dataset

Three subjects perform 3 tasks: imagined left-hand movement, imagined right-hand movement, and word generation

The subjects perform a given task for 15 s, then switch to another task at the operator’s request

There are three training datasets and one test dataset

EEG signals are recorded at 512 Hz using 32 electrodes

Short-time PCA procedure for three-task dataset

Data are bandpass-filtered to 8-30 Hz

Down-sampled to 128 Hz

The best parameters from the learning process are given below

Subject   Number of lags   First vector   Number of vectors
1         2                1              5
2         2                1              4
3         3                1              5

The other methods

S. Sun et al. removed 7 electrodes and bandpass-filtered the data to 8-13 Hz (subjects 1-2) or 11-15 Hz (subject 3). Multi-class CSP was used to extract features and an SVM for classification

Schlögl et al. downsampled to 128 Hz and used all bipolar channels (496) plus 32 monopolar channels. From each channel they extracted AR coefficients (order 3) and band power in the α and β bands. LDA was used as the classifier

The other methods (cont.)

Arbabi et al. downsampled to 128 Hz and filtered to 0.5-45 Hz, using some statistical features and a Bayesian classifier

Salehi used all raw data, with PSD and some statistical time-domain features (not specified), and a Bayesian classifier

Comparison results

Visualize the classification results

Improve the classifier performance by smoothing

Many incorrect classifications appear as single, isolated samples

Since n consecutive samples should belong to the same class, each sample can be assigned the majority class of its n neighbors

With smoothing, the accuracy improves from 78.7% to 82.7% (five-task dataset)
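One simple way to realize this smoothing is a sliding-window majority vote over the predicted labels. The sketch below assumes a centered window of n samples (shorter at the edges); the slides do not specify the exact windowing, and `smooth_predictions` is a hypothetical name.

```python
import numpy as np

def smooth_predictions(pred, n):
    """Replace each predicted label by the majority label in a centered window of n samples."""
    pred = np.asarray(pred)
    out = pred.copy()
    half = n // 2
    for t in range(len(pred)):
        window = pred[max(0, t - half): t + half + 1]   # truncated at the edges
        vals, counts = np.unique(window, return_counts=True)
        out[t] = vals[np.argmax(counts)]
    return out

pred = [1, 1, 3, 1, 1, 2, 2, 2, 1, 2, 2]
smoothed = smooth_predictions(pred, n=3)
print(smoothed)   # [1 1 1 1 1 2 2 2 2 2 2]
```

The isolated labels 3 (index 2) and 1 (index 8) are replaced by the label of their neighbors, while the genuine switch from class 1 to class 2 is kept.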

Analyze the parameters of EEG representation

Number of lags = 2-3; window size = 125; the first basis vector should be early in the order; number of basis vectors = 20; subtracting the mean has a minor effect

Analyze the importance of electrodes

The weights of the discriminant functions are grouped by electrode

The variances of the weights grouped in this way are plotted

The parietal electrodes are the most important for mental task discrimination

Conclusion

This chapter describes a new approach to extracting features from EEG signals using short-time PCA

For the five-task dataset, combining short-time PCA with the simple LDA classifier achieves 80% accuracy

On the three-task dataset, this approach places second among five compared methods

Some analysis of the system parameters and the roles of the electrodes is also given
