Towards Automated Recognition of Human Emotions using
EEG
by
Haiyan Xu
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2012 by Haiyan Xu
Abstract
Towards Automated Recognition of Human Emotions using EEG
Haiyan Xu
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2012
Emotional states greatly influence many areas of our daily lives, such as learning, deci-
sion making, and interaction with others. Therefore, the ability to detect and recognize
one’s emotional states is essential in intelligent Human Machine Interaction (HMI). In
this thesis, a pattern classification framework was developed to sense and communicate
emotion changes expressed by the Central Nervous System (CNS) through the use of
EEG signals. More specifically, an EEG-based subject-dependent affect recognition sys-
tem was developed to quantitatively measure and categorize three affect states: positively
excited, neutral, and negatively excited. Several existing feature extraction algorithms
and classifiers were researched, analyzed and evaluated through a series of classification
simulations using a publicly available emotion-based EEG database. Simulation results
were presented, followed by an interpretive discussion.
The findings in this thesis can be useful for the design of affect-sensitive applications,
such as augmented means of communication for severely disabled people who cannot
directly express their emotions. Furthermore, we have shown that with a significantly
reduced number of channels, classification rates remained at a level feasible for
emotion recognition. Current HMI paradigms that integrate consumer electronics,
such as smart hand-held devices, with commercially available EEG headsets are thus
promising and will significantly broaden the range of applications.
Contents
1 Introduction 1
1.1 Emotional Intelligence and Human Machine Interaction (HMI) . . . . . . 2
1.2 Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Affect-sensitive Applications . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Human-Machine Interaction (HMI) . . . . . . . . . . . . . . . . . 6
1.3.2 Social Signal Processing . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 Health and Rehabilitation Applications . . . . . . . . . . . . . . . 7
1.4 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 State of the Art in Affect Recognition 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Modeling Affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Discrete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Dimensional Model . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Hybrid Discrete-Dimensional Model of Affect . . . . . . . . . . . . 17
2.3 Affect Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Facial Expressions and Affect . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Audio Analysis and Affect . . . . . . . . . . . . . . . . . . . . . . 20
2.3.3 Physiological Expressions of Affect . . . . . . . . . . . . . . . . . 20
2.3.4 Affect Expression Through Peripheral Nervous System (PNS) . . 22
2.3.5 Affective Expression Through Central Nervous System (CNS) . . 23
2.4 Emotion Elicitation Protocols . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1 International Affect Pictures System (IAPS) . . . . . . . . . . . . 24
2.5 Multimodality Affect Detection . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Fusion at the Feature Level . . . . . . . . . . . . . . . . . . . . . 27
2.5.2 Fusion at the Decision Level . . . . . . . . . . . . . . . . . . . . . 28
2.6 State of the Art Emotion Recognition Performances . . . . . . . . . . . . 29
2.7 Ethical and Privacy Concerns on Physiological Signal Collection . . . . . 33
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3 EEG Signal Characteristics and Preprocessing Methods 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 EEG Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.1 Electrode Placement for EEG Recording Devices . . . . . . . . . 38
3.2.2 Measuring Emotion Using EEG . . . . . . . . . . . . . . . . . . . 39
3.2.3 Spectral Characteristics of EEG . . . . . . . . . . . . . . . . . . . 41
3.3 Preprocessing of the EEG Recordings . . . . . . . . . . . . . . . . . . . . 42
3.3.1 EEG Referencing . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Rejecting Artifacts Based on Channel Statistics . . . . . . . . . . 43
3.3.3 Filter Data Using Fast Fourier Transform (FFT) . . . . . . . . . . 44
3.3.4 Independent Component Analysis . . . . . . . . . . . . . . . . . . 45
3.3.5 Wavelet Decomposition for Denoising . . . . . . . . . . . . . . . . 46
3.4 Ground Truth Definition and Validation . . . . . . . . . . . . . . . . . . 48
3.4.1 Pearson Correlation Coefficients for Ground Truth Validation . . 49
3.4.2 Confusion Matrix for Ground Truth Validation . . . . . . . . . . 50
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Methods for Emotion Assessments using EEG 53
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Previous Feature Extraction Methods . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Time Domain Analysis . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.1.1 Statistical-based Features . . . . . . . . . . . . . . . . . 55
4.2.1.2 Higher Order Crossings . . . . . . . . . . . . . . . . . . 56
4.2.2 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2.1 Event Related Potential and Spectrogram . . . . . . . . 57
4.2.3 Time-Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.3.1 Wavelet-based Features . . . . . . . . . . . . . . . . . . 59
4.2.4 Channel selection: single channel vs. multiple channels . . . . . . 61
4.3 Classification Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . 64
4.3.2 K Nearest Neighbours . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5 Empirical Mode Decomposition for Emotion Classification 68
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Empirical Mode Decomposition (EMD) . . . . . . . . . . . . . . . . . . . 69
5.2.1 Decomposition Procedure . . . . . . . . . . . . . . . . . . . . . . 71
5.2.2 Hilbert-Huang Spectrum . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.3 Multivariate EMD . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3 Signal Reconstruction using MEMD as a Filter Bank . . . . . . . . . . . 77
5.3.1 MEMD for Feature Extraction . . . . . . . . . . . . . . . . . . . . 80
5.4 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4.1 Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6 Experimental Setup and Simulation Results 86
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2 Data Collection Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.2.1 Recording Device: Biosemi Active 2 . . . . . . . . . . . . . . . . . 91
6.2.2 Ground Truth Definition . . . . . . . . . . . . . . . . . . . . . . . 92
6.2.3 Ground Truth Validation Using Pearson Correlation Coefficients . 93
6.2.4 Ground Truth Validation Using Confusion Matrix . . . . . . . . . 94
6.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.4 Data Splitting and K Cross Validation . . . . . . . . . . . . . . . . . . . 96
6.4.1 k-Fold Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . 98
6.5 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.5.1 Simulation Results Using All Channels . . . . . . . . . . . . . . . 99
6.5.1.1 Subject-Specific Emotion Recognition . . . . . . . . . . 100
6.5.1.2 Cross-Subject Emotion Recognition . . . . . . . . . . . . 101
6.5.2 Simulation Results with Channel Reduction . . . . . . . . . . . . 101
6.5.2.1 Channel Reduction in Reference to Commercial Devices 101
6.5.2.2 Channel Reduction Using Genetic Algorithm . . . . . . 104
6.6 Sensitivity Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6.1 The Effect Of Sampling Rate On System Performance . . . . . . . 108
6.6.2 Parameters For Setting Window Size (epoch) . . . . . . . . . . . 110
6.6.3 Parameters for Wavelet Feature Evaluation . . . . . . . . . . . . . 111
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7 Conclusions and Future Works 114
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.1 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.2.1 Directions for Future Study On Utilizing EEG Signals For Affect
Detection Applications . . . . . . . . . . . . . . . . . . . . . . . . 116
A Neighbouring Electrodes for Local Laplacian Filter 118
B List of IAPS Images Used for the Experiment 120
C Confusion Matrix Ground Truth Validation of the Database 124
D Empirical Mode Decomposition (EMD) Algorithm 127
Bibliography 129
List of Tables
2.1 State of the Art Emotion Recognition Performances . . . . . . . . . . . . 30
2.2 State of the Art Emotion Recognition Performances using EEG . . . . . 31
3.1 Subbands of the EEG Signals . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Wavelet Decomposition of EEG signals into various frequency bands (fs =
1024) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Sample Self-Assessment Values vs. Values provided with the chosen stimuli
(e.g., images from IAPS) . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Confusion Matrix Components . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1 EEG channels selected for analysis. . . . . . . . . . . . . . . . . . . . . . 63
5.1 Method Comparison between Fourier, Wavelet and Hilbert-Huang Trans-
form in Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Pearson correlation coefficient between IAPS scores and self assessments
per participant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 Averaged Self-Assessment Classification Accuracy (in Percentage) for the
Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3 The features extracted from the EEG signals . . . . . . . . . . . . . . . . 96
6.4 Overview of the dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 Emotion Recognition rates using ALL 54 electrodes and 5NN . . . . . . . 100
6.6 Cross-Subject Emotion Recognition rates using ALL 54 electrodes . . . . 101
6.7 Device Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.8 Subject Specific Recognition rates using 8 electrodes and 5NN . . . . . . 103
6.9 Cross-subject Recognition rate using only 8 electrodes . . . . . . . . . . . 103
6.10 Channels selected using GA algorithm . . . . . . . . . . . . . . . . . . . 107
6.11 Emotion Recognition rates using electrodes selected using GA . . . . . . 107
6.12 Recognition rates of emotion using channels selected by GA . . . . . . . 108
6.13 Cross subject emotion recognition rates using different wavelets for DWT 111
A.1 Associated neighbour electrodes for Local Laplacian filters [5] . . . . . . 119
B.1 List of IAPS Images Used for Session 1 . . . . . . . . . . . . . . . . . . . 121
B.2 List of IAPS Images Used for the Session 2 . . . . . . . . . . . . . . . . . 122
B.3 List of IAPS Images Used for the Session 3 . . . . . . . . . . . . . . . . . 123
C.1 Participant 1: Self-Assessment Classification Accuracy (in Percentage) for
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 124
C.2 Participant 2: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.3 Participant 3: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.4 Participant 4: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.5 Participant 5: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 126
List of Figures
1.1 Emotion and Core Affects. Figure adapted from [79] . . . . . . . . . . . 3
1.2 Mayer and Salovey’s (1997) Four-Branch Model of Emotional Intelligence 3
1.3 Affect Intelligent Human and Machine Interface (HMI) . . . . . . . . . . 5
1.4 Diagram of a typical machine learning problem . . . . . . . . . . . . . . 9
2.1 Example of basic emotions using facial expressions . . . . . . . . . . . . . 16
2.2 Circumplex Models of Emotion (Image adapted from [48]) . . . . . . . . 16
2.3 Hybrid Discrete-Dimensional Model of Affect . . . . . . . . . . . . . . . . 18
2.4 Affect Expression Modalities. Figure adapted from [79] . . . . . . . . . . 19
2.5 Image Samples for The International Affective Picture System (IAPS) . . 25
2.6 SAM-Scales for valence (top), arousal (bottom) . . . . . . . . . . . . . . 26
2.7 Multiple Modalities Model with Decision Level Fusion [45] . . . . . . . . 28
3.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Standardized EEG Recording (F3) . . . . . . . . . . . . . . . . . . . . . 37
3.3 EEG Signal Contaminated with Eye-blinking . . . . . . . . . . . . . . . 37
3.4 Spectrum Characteristics of EEG . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Electrodes placements of different EEG recording systems . . . . . . . . 39
3.6 Cross section of the human brain . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Sample EEG Amplitude with Normal Fitting . . . . . . . . . . . . . . . 44
3.8 Four sample mother wavelet functions used for DWT decomposition of
EEG signals: (a) Mexican hat wavelet, (b) Daubechies order 8 wavelet
(db8), (c) biorthogonal wavelet order 1.3 (bior1.3), and (d) biorthogonal
wavelet order 1.5 (bior1.5). . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.9 EEG denoising using Discrete Wavelet Transform (DWT) . . . . . . . . . 48
4.1 System components in the training stage . . . . . . . . . . . . . . . . . . 54
4.2 Power within sub-bands for F3, F4 for Negative, Positive, and Calm states 59
4.3 Discrete Wavelet Decomposition using db4 Wavelet . . . . . . . . . . . . 60
4.4 Emotiv EPOC Neuroheadset . . . . . . . . . . . . . . . . . . . . . . . 62
4.5 Examples of LDA Classifier . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Examples of KNN Classifier . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1 Sample EMD Decomposition for Participant 2 Session 3 Negatively Excited 73
5.2 Sample EMD Decomposition for Participant 2 Session 3 Positively Excited 73
5.3 Hilbert-Huang Spectrum for Instantaneous Frequency between 0–70 Hz . 75
5.4 The filter bank property of Regular EMD . . . . . . . . . . . . . . . . . . 78
5.5 The filter bank property of MEMD . . . . . . . . . . . . . . . . . . . . . 78
5.6 Instantaneous Amplitude and Averaged Frequency of the IMFs . . . . . . 79
5.7 MEMD for Signal Reconstruction and Feature Analysis . . . . . . . . . . 81
5.8 The block diagram for Genetic Algorithm . . . . . . . . . . . . . . . . . . 82
6.1 Experimental Components used for Simulation . . . . . . . . . . . . . . 87
6.2 The Three Emotion Classes Studied in This Project . . . . . . . . . . . . 89
6.3 Protocol description for eNTERFACE06-EMOBRAIN database . . . . . 90
6.4 Selected IAPS images for the 3 classes emotion elicitation experiment . . 91
6.5 Biosemi Active Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.6 HOC Order vs. Correct Recognition Rate (54 Channels) . . . . . . . . . 95
6.7 HOC Order vs. Correct Recognition Rate (6 Channels) . . . . . . . . . . 96
6.8 Recognition Rate using HOC features for K = 1,3,5,7,9 . . . . . . . . . . 101
6.9 6 Channels referenced to the Emotiv EPOC . . . . . . . . . . . . . . . . 103
6.10 Averaged and Maximum Fitness (correct recognition rate) in each gener-
ation using LDA and GA . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.11 Channels selected through Genetic Algorithm . . . . . . . . . . . . . . . 106
6.12 Channel of significance obtained through Genetic Algorithm . . . . . . . 106
6.13 Sampling Rate vs. Correct Recognition Rate using All Electrodes and LDA 109
6.14 Sampling Rate vs. Correct Recognition Rate using All Electrodes and kNN 109
List of Acronyms
Acronym Description
AC Affective Computing
ANOVA ANalysis Of Variance
ANS Autonomic Nervous System
BVP Blood Volume Pulse
CNS Central Nervous System
ECG Electrocardiogram
EEG Electroencephalogram
EI Emotional Intelligence
EMD Empirical Mode Decomposition
EMG Electromyogram
ERP Event Related Potential
FFT Fast Fourier Transform
GSR Galvanic Skin Response
HMI Human Machine Interaction
HOC Higher Order Crossings
IAPS International Affective Picture Systems
ICA Independent Component Analysis
IF Instantaneous Frequency
IMF Intrinsic Mode Function
KNN K-Nearest Neighbors
LDA Linear Discriminant Analysis
MEMD Multivariate Empirical Mode Decomposition
RSP Respiration
SFFS Sequential Floating Forward Selection
STFT Short Term Fourier Transform
SVM Support Vector Machine
fMRI functional Magnetic Resonance Imaging
QDA Quadratic Discriminant Analysis
List of Symbols
Symbol Description
Xi, i = 1, · · · , N Raw N-sample EEG signal
T Duration of a trial
µx Mean of signal x
σx Standard deviation of signal x
δx The mean of the absolute values of the first differences of signal x
Ne Number of Electrodes
f^j_{s,i} = [µx, σx, δx, δ̄x, γx, γ̄x] Statistical feature vector for sample i and electrode j
ENTl Wavelet entropy at lth scale
ENGl Wavelet energy at lth scale
f^j_{w,i} = [ENG^j_l, ENT^j_l] Wavelet-based feature vector for sample i and electrode j
CX(l, n) Wavelet coefficients at lth scale for signal x
Li, i = 1, · · · , N Self-assessment scores on the projected stimuli
Ki, i = 1, · · · , N Scores provided with the projected stimuli
Chapter 1
Introduction
Emotional states greatly influence many areas of our daily lives, such as learning, deci-
sion making, and interaction with others. Our decisions and courses of action adapt to
the emotional cues we receive throughout this process, which makes the exchange of
information much more effective and smooth. Emotion is an important part of human-
to-human communication: it provides important cues for disambiguating the messages
we send.
With technological advancements, daily life is entangled with interactions both between
humans and between humans and machines. Humans are very efficient at decoding the
emotional cues presented during human-to-human interaction and at providing adjusted,
socially appropriate responses, which makes such interaction efficient and smooth.
However, in the case of human-to-machine interactions, because the machine’s re-
sponse or interface is often predefined, rigid, and unconditional, users are often left un-
satisfied and frustrated. This work aims to improve a computer’s ability to correctly
recognize human emotional states.
1.1 Emotional Intelligence and Human Machine Interaction (HMI)
The term emotion, when used in non-scientific contexts, typically refers to the proto-
typical emotions, the clearest cases of emotion, such as anger, disgust, fear, happiness,
sadness, and surprise. However, strictly speaking, as stated in cognitive theory [79] and
shown in Figure 1.1, emotion is a unique, personal expression that differs with social
context, cultural background, and personal experience. Therefore, for empirical emotion
recognition applications, the concept of emotion lacks necessary and sufficient features.
On the other hand, as an alternative and more practical interpretation of emotion, the
concept of core affect is introduced as the neurophysiological state measured as the simple
raw feelings evident in moods and emotions [80, 81]. According to this representation, at
any given time, one’s emotional state is represented as a point on the core affect model
(more details are provided in Section 2.2). In other words, affect or core affect is a
measure of the intensity of emotion from a physiological perspective.
Emotional intelligence [19] is a widely accepted concept, and incorporating this in-
formation into human-machine interactions will make such interactions more intuitive,
flexible, and efficient. As stated in [19], emotional intelligence can be encapsulated in
four branches, as shown in Figure 1.2. The model provides a summarized guideline on
how emotions are perceived and how they act on our cognitive and thinking processes.
Without emotional cues, our interaction with machines can be frustrating and counter-
productive. Consider a simple use case: after purchasing a product, a user needs to
contact customer service by phone and is connected to an Interactive Voice Response
(IVR) system, the so-called virtual agent. In many cases, there is no way to get through
the calling queue other than via the virtual agent, yet after a few tries the ’agent’ still
cannot figure out what the user is saying. The rest of the scenario has been shown in
numerous examples online: smashing the
[Figure: diagram relating Core Affect, Perception of Affective Quality, Attribution to Object, Appraisal, Action, Emotional Meta-Experience, Emotion Regulation, and the Prototype of a Specific Emotion]
Figure 1.1: Emotion and Core Affects. Figure adapted from [79]
[Figure: the four branches of Emotional Intelligence — 1. Emotional Perception: emotions are perceived and expressed; 2. Emotional Integration: emotions are sensed and begin automatic influences on cognition, entering the cognitive system as noticed signals and as influences on cognition; 3. Emotional Understanding: emotional signals about relationships are understood, along with their interactive and temporal implications, and the implications of emotions, from their feelings to their meanings, are considered; 4. Emotional Management: management encourages openness to feelings, and thoughts promote emotional, intellectual, and personal growth]
Figure 1.2: Mayer and Salovey’s (1997) Four-Branch Model of Emotional Intelligence
phone or yelling at the PC being typical responses. As we can see, with the increasing
use of machines, it is desirable to narrow the communication gap between humans and
machines so that it resembles human-to-human communication. Emotionally intelligent
Human-Machine Interface (HMI) refers to the scenario in which machines, such as a per-
sonal computer, can detect, recognize, and respond to the user’s emotional states. Such
”human-centered computing” is also referred to as Affective Computing (AC).
1.2 Affective Computing
Affective computing has been an active research topic for the past two decades and has
shown strong growth in the past few years. It aims to narrow the communication
gap between humans and machines. With the advancement of human-computer
interfaces, there is an inevitable need for machines to understand and react to the
affective state of the user. Even though the definition of affect itself is a topic of debate
within the psychological literature, it is widely accepted that affects such as moods and
emotional states significantly influence the outcomes of people’s daily activities in learning
and decision making. In the affective computing literature, there are mainly three types
of applications: affect detection, affect mimicking, and actual affect recognition as
humans perform it [48]. With enhanced emotion recognition capability, human-
computer interaction will become much more efficient and enjoyable.
Affect detection is a critical step towards affect recognition, and is the first step
towards affect-sensitive applications. A machine will not be able to ’respond’ to the
user’s affects without accurately detecting them first. In the field of affect detection, many
approaches have been established in the past twenty years utilizing audio, visual (facial
expression), and body-movement analysis, as well as peripheral physiological signals and,
more recently, autonomic signals such as ECG. However, compared to other pattern recogni-
tion and machine learning problems, the results are disappointing. While AC recognition
[Figure: block diagram of an affect-intelligent interface, with components User, Sensors, Emotional Cues, Emotion Assessment, Emotion Adaptation, Priorities, User Interface, Cloud, Remote Monitoring, and Service]
Figure 1.3: Affect Intelligent Human and Machine Interface (HMI)
rates of 60–80% [14] are common, most other recognition problems report recognition
rates above 90%, and often greater than 95%. The reason for the lower performance
relative to other pattern classification tasks is the complex, multiple mapping between
the acquired signals and the actual affect state present. Fully understanding and
classifying affect is next to impossible without all the key factors, such as the social
context and the user’s cultural background and education. Affect, like emotion, is also
very spontaneous and not a well-defined psychophysiological process. Therefore, in the
affect detection literature, only an approximate solution is possible.
There are still more challenging problems to be solved besides improving affect
detection accuracy. Currently, most research results are produced under controlled
environments. In real life, such data collection might be prohibitive, for example
requiring the availability of frontal face images or high-quality audio signals in a noisy
environment. Recently, many research results have been presented showing the specificity
between physiological signals and prototyped (or discrete) emotions. Physiolog-
ical signals, such as the Electrocardiogram (ECG), Galvanic Skin Response (GSR),
breathing, and the Electroencephalogram (EEG), can be captured continuously by
non-intrusive means and are usually less affected by external noise sources. Therefore,
the goal of this thesis is to investigate the feasibility of physiological signals, the
Electroencephalogram (EEG) in particular, for improving affective computing methods,
with the long-term goal of developing affect-sensitive HMI.
1.3 Affect-sensitive Applications
Affect-sensitive applications are being developed in fields such as gaming, health-care
and learning technology.
1.3.1 Human-Machine Interaction (HMI)
During the learning process, learners experience various emotions such as satisfaction,
happiness, frustration, or sadness. The emotional state of a learner can significantly
affect the outcome of the learning process [44, 52]. If a computer interface can recognize
and adapt to such emotional changes, as a classroom teacher does by changing the
material presented or the way it is presented, it will positively impact learning gains and
improve the overall learning experience.
1.3.2 Social Signal Processing
Affect-sensitive applications are being developed in the field of social signal processing
for indexing and retrieving information according to its associated affect state. With
the use of surveillance cameras or voice recorders, emotional cues are collected and used
for storing or retrieving multimedia content such as songs, videos, and images. This
(often) real-time feedback system can increase the dynamics of the interface and enhance the
user-centered computing experience.
1.3.3 Health and Rehabilitation Applications
Affect-sensitive applications are also very useful in behavior prediction and monitoring
[74, 22]. Autism spectrum disorders are neuro-developmental disorders typically
characterized by a combination of a lack of social interaction or communication and
repetitive patterns of violent or self-destructive behavior. Autistic people usually
experience frequent mood swings and a high level of anxiety in social interactions
due to their inability to cooperate and express their feelings or to be understood. Studies
have shown that even though autistic people may seem calm before they erupt into
problematic behaviors or self-injury, there have been dramatic changes in their
physiological signals. Most of these changes are a result of frustration and anger caused
by misunderstandings between the communicators. Affect-sensitive applications can
provide critical information by sensing such mood and physiological changes, and can
essentially decrease the occurrence of such misunderstandings and narrow the emotional
communication gap.
From the health care perspective, an appropriate assessment of a patient’s emotional
state can be a key indicator of mental and physical health status; the power of emotions
themselves over the recovery process has also been documented [10]. In a tele-health
scenario, clinicians could greatly increase the quality of their services if a system could
accurately assess a patient’s emotional states, which are not directly accessible due to
the non-physical presence of the patient. Collecting physiological signals and mapping
them to emotional states can synthesize the patient’s affective information for the
health-care provider.
1.4 Technical Challenges
To design a system for emotion assessment, the first and most critical step is to define
the emotions that the system will detect. However, emotion is currently not a well-
defined concept and remains an active research topic in the psychology literature.
Defining a specific emotion is not an easy task; for example, in the English language
there are over 3,000 words describing emotions or affects [95]. Besides the large variety
of emotion descriptions, the perception and expression of each emotion also vary greatly
for an individual under different social settings. For example, a smile at a social
gathering could be due to a truly happy feeling, a polite gesture, or an attempt to mask
the subject’s true feelings. Detecting true feelings in such natural environments is a
very challenging research problem. Several models have been proposed towards the
understanding of emotion and towards generalizing the definition of emotions across
different cultural and educational backgrounds. Although their feasibility is still a topic
of debate, a growing number of empirical studies have shown the effectiveness of models
such as the circumplex model and the basic emotions model (see Section 2.2 for details).
Ground truth definition for such an emotion detection system is another challenging
problem. Ground truth refers to the known labels for the input signals (samples), such
as the emotional state to which each sample belongs, that will be used in designing the
machine learning system, as shown in Figure 1.4. For a supervised machine learning
problem (labels are known to the system), the correct labeling of input samples is a
critical step [24]. A machine learning algorithm will try to find a pattern that is most
consistent amongst all observed samples of each class and use this pattern and its
decision rules to predict the class to which a new sample belongs. If the data are
mislabeled, the class-specific patterns amongst the observations become vague and,
as a consequence, a larger number of learning mistakes will occur, and the learning
system might fail altogether. Most current research on affect detection has been
designed on databases with emotion elicited using methods such as professional actors
Figure 1.4: Diagram of a typical machine learning problem
posing (facial), self-induced elicitation, or observing images or audio clips believed to evoke
the desired emotion. However, little work has shown how well these data truly
represent the targeted affect states. As a result, there is no way to detect 'bad' samples
other than a basic signal-quality check on the amount of noise or artefact contamination;
in other words, there is no way to identify or reject mislabeled data. This poses a
significant challenge in developing a (often statistical) data model for detection problems
[24].
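The supervised setup just described can be sketched in a few lines. The data below is synthetic, and the k-nearest-neighbour prediction rule is only one of the classifiers evaluated later in this thesis; it stands in for any learning algorithm trained on labeled samples.

```python
import numpy as np

# Hypothetical illustration: labeled training samples (feature vectors)
# for three affect classes, plus a new unlabeled sample.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(c, 0.5, size=(20, 4)) for c in (0.0, 2.0, 4.0)])
y_train = np.repeat(["negative", "neutral", "positive"], 20)

def knn_predict(x_new, X, y, k=5):
    """Simple k-nearest-neighbour prediction rule learned from labeled samples."""
    dists = np.linalg.norm(X - x_new, axis=1)
    nearest = y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

print(knn_predict(np.full(4, 4.1), X_train, y_train))  # → "positive"
```

Mislabeling part of `y_train` would blur the class-specific clusters and degrade exactly this prediction step, which is why ground truth validation matters.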
Several physiological signals, such as ECG, GSR and BVP, have been actively studied
as suitable means for recognizing affect through appropriate pattern recognition
techniques, and promising results have been reported [14]. However, these
physiological signals can also show similar changes due to factors unrelated to
emotion. For example, increased physical movement can raise the instantaneous
heart rate, and higher ambient temperature can increase skin conductance without the
presence of excitement or fear. Moreover, collecting the large variety of signals required
by such sensor networks can make a system infeasible for practical, long-term
continuous monitoring (i.e., days). For example, a subject is required to remain stationary
and keep one finger still in order to collect peripheral signals such as Galvanic
Skin Response (GSR); this would interfere with the use of the hands and is highly impractical.
Therefore, less intrusive methods are needed for a practical
system to be used in natural settings for continuous capture of sensorial signals. The
response time, i.e., the time required for an emotional cue to appear in the selected
physiological signal, is another application concern. For example, a signal originating
from the Central Nervous System varies on the order of milliseconds (ms) when the
affect state changes, but the change takes much longer to show through a peripheral signal
such as GSR.
Recent studies [58, 72, 66] have shown that signals generated by the Central Nervous
System (CNS) (i.e., EEG) can be an alternative and potentially more powerful means for
affect detection, since they are less affected by changes in physical activity.
In particular, studies using fMRI [83] have shown that affect states are
correlated with frontal cortex synchronization and with asymmetry between the left and right
frontal lobes, which can be used as a basis for automatic affect state detection. Therefore,
the objective of this study was to examine the feasibility and characterization of EEG
signals for emotion detection.
1.5 Contributions
For this project, my objective was to explore the universal characteristics of human
emotions using EEG signals; seeking a new method that can sense and communicate
autonomic nervous system (ANS) arousal in daily life. The focus of this research was on
the development of a signal processing framework for affect detection using brainwaves.
Specifically, this research initiative was to develop an EEG-based non-subject-specific
affect detection system that can quantify the emotional states of a person. Several
existing feature extraction algorithms were researched and analyzed. Their effectiveness
was evaluated through a series of classification simulations with the use of a publicly
available emotion-specific EEG database. Simulation results on the accuracy of correctly
predicting new samples were presented and discussed. Several classification methods were
selected based on the set of features they were to be used with. These classifiers were
discussed in detail and evaluated through classification simulations.
The contributions of this thesis are:
1. Identification of key points in computerized emotion assessment: emotion elicitation,
ground truth validation and experimental protocol setup. Many emotion
elicitation methods have been used to generate emotion-specific
samples; however, in many cases the choice of ground truth definition is uncertain
and often debatable. In this thesis, two methods that can be used to validate
the ground truth labeling of collected samples are researched, discussed
and analyzed.
2. Investigation of the use of a novel time-frequency analysis method, Empirical Mode
Decomposition, for EEG analysis and emotion classification. Empirical Mode Decomposition
(EMD) has been used in emotion recognition studies using EEG
and has shown promising results. However, there are many unsolved challenges
in the application of this method. Since the decomposition is entirely data-driven
and its outputs depend on the time-domain local characteristics
of the signal, the number of decomposition levels varies between channels and
recording scenarios (trials and sessions). As a result, defining a common feature
space is very challenging or impossible. To resolve this problem, an expanded
version of EMD, Multivariate EMD (MEMD), is researched, analyzed, and evaluated
through simulations using EEG signals.
3. Introduction and presentation of a framework for EEG emotion classification; various
state-of-the-art feature extraction (statistical, spectral, HOC, wavelet) and classification
algorithms (LDA [24, 9], KNN [17]) for emotion analysis using EEG signals
were investigated and implemented. Classification performance and application
limitations were presented through simulation results on a publicly available EEG-emotion
(3-class) dataset. Applying the MEMD algorithm on all
channels produces a large number of Intrinsic Mode Functions (IMFs), the time-varying
frequency components of an input EEG recording. Concatenating features
extracted directly from each IMF significantly increases the dimension of the
feature space with much redundancy in the information presented. As a result of
this high dimensionality, a very large number of samples would be required
to produce a meaningful statistical model [23]. A Genetic Algorithm (GA) is therefore applied
to reduce the number of channels and the number of IMFs used for feature extraction.
The final results provide two key pieces of information for understanding
emotion and brain waves: the location of the most emotion-specific
channels (discriminating power) and the frequency range of the emotion-specific
brain waves (instantaneous frequency analysis of the IMFs). This provides a
means to compare and validate research results from the psychophysiological
literature and to aid the study and understanding of human emotion. Practical
constraints such as the minimum required time-domain window length and the edge
effects of the windowing operation were also discussed.
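As a rough illustration of how a Genetic Algorithm can search over binary channel masks, the sketch below evolves a population of masks. The fitness function is a stand-in: in the actual framework it would be cross-validated classification accuracy on the selected channels, and the "informative" channel indices here are invented purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CHANNELS = 32  # hypothetical montage size

def fitness(mask):
    """Stand-in fitness: in the real framework this would be the
    cross-validated classification accuracy using only the selected
    channels, minus a penalty for using many channels."""
    informative = {2, 3, 18, 19}          # pretend these channels carry affect info
    hits = sum(mask[i] for i in informative)
    return hits - 0.05 * mask.sum()       # reward hits, penalize channel count

def evolve(pop_size=30, generations=40, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, N_CHANNELS))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # tournament selection: the fitter of two random individuals survives
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # single-point crossover between consecutive parent pairs
        cuts = rng.integers(1, N_CHANNELS, size=pop_size // 2)
        children = parents.copy()
        for j, c in enumerate(cuts):
            a, b = 2 * j, 2 * j + 1
            children[a, c:], children[b, c:] = parents[b, c:], parents[a, c:]
        # bit-flip mutation
        flips = rng.random(children.shape) < p_mut
        children[flips] ^= 1
        pop = children
    return max(pop, key=fitness)

best_mask = evolve()
print(np.flatnonzero(best_mask))  # indices of the selected channels
```

The surviving mask identifies which channels (and, analogously, which IMFs) the search deems most discriminative.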
1.6 Thesis Overview
The remainder of this thesis is organized as follows.
• Chapter 2 introduces the notion of affect as a means for emotion assessment
from the psychophysiological perspective. Three models for affect representation
are presented. Various affect assessment modalities and the fusion concept for
multiple-modality analysis are discussed as well. The chapter ends with a review
of state-of-the-art classification performance.
• Chapter 3 describes the use of physiological signals on affect detection. It begins
with a brief review on the correlation between various physiological signals and
affect states. Then topics on EEG signal acquisition and preprocessing approaches
are listed and discussed in detail.
• Chapter 4 starts with a system-level overview of the proposed automated affect
detection process, followed by an in-depth review of related feature extraction
methods and classifiers. State-of-the-art feature extraction algorithms were analyzed
in terms of theoretical aspects and practical implementation constraints.
• Chapter 5 first presents a novel feature extraction algorithm utilizing Multivariate
Empirical Mode Decomposition (MEMD). Secondly, the genetic algorithm, as a feature
dimension reduction method, is introduced and implemented for this research
work. Regarding the application of MEMD, several implementation factors such
as computational complexity and the edge effect due to the time-domain windowing operation
are also evaluated.
• Chapter 6 provides the experimental protocol for a publicly available database.
This dataset is used to evaluate and compare the classification performance of
various selected feature extraction algorithms. Simulation results using the new
and existing feature extraction algorithms are presented and analyzed.
• Chapter 7 concludes this thesis by summarizing the contributions of this thesis and
provides possible future research directions.
Chapter 2
State of the Art in Affect
Recognition
2.1 Introduction
Emotion expression is a result of complex interactions of our biological nature with the
surrounding environment, based on observation, personal experience, and self-regulation.
Some of these factors are spontaneous (self-regulation) and vary greatly from
person to person (personal experience). Due to this complexity, modeling such a multi-faceted
process is a very challenging research problem and has long been believed to be
impossible. In this chapter, we discuss a few basic but very important concepts
in the realm of affective computing: the emotional
models that can be used to define human emotions, the emotion elicitation methods
involved in collecting experimental data, and a brief review of the modalities that have
been used for affect recognition. Towards the end of the chapter, state-of-the-art
approaches and their recognition performances are listed and discussed.
2.2 Modeling Affect
In the psychology literature, there are mainly three models that have been used to
represent the actual emotional state of a person. The main difference between them
is the number of emotions represented within each model. For example, the
discrete model (Section 2.2.1) is associated mainly with the six basic or prototypical emotions
believed to be universal among human populations, a much smaller set
than the emotions represented in the circumplex model (Section 2.2.2). The choice of emotion model
also depends on the intended application. The discrete model of emotion has been widely
used for emotion recognition applications utilizing facial images, as it is believed
to be universal across a large population spanning genders and ethnicities [25, 27]. The
circumplex model is adopted in most emotion analysis methods involving physiological
signals [14, 76].
2.2.1 Discrete Model
The discrete model of emotion was first introduced by Ekman [25] in 1971, who analyzed
the correlation between human emotions and facial expressions among subjects from
different cultural backgrounds and declared the existence of a universal set of 'primary'
emotions (e.g., fear, anger or disgust). The six basic or universal emotions, also referred
to as the prototypical set of emotions, are joy, sadness, surprise, anger, disgust and fear.
The term basic, used interchangeably with 'universal', indicates that these emotions
are expressed universally across different cultural backgrounds, genders, etc. According
to the basic emotion model, at any time instance, a person experiencing an emotion
should be able to choose the one of the six basic emotions
that best approximates his or her true feelings. In other words, the
discrete model acts as a complete set for the description of emotions.
Figure 2.1: Example of basic emotions using facial expressions
2.2.2 Dimensional Model
The circumplex model of emotion was proposed by Russell [79] in 1999 and was devel-
oped based on Cognitive Theory, where emotions are represented on a 2D plane (see
Figure 2.2), with one dimension of judged valence (i.e., pleasure/positive or displea-
sure/negative) and the other of arousal. Valence stands for one’s judgment about a
situation as positive or negative and arousal spans from calmness to excitement, express-
ing the degree of one’s excitation.
Figure 2.2: Circumplex Model of Emotion (Image adapted from [48])
In interpreting this 2-D structure, proponents of the circumplex model of affect sug-
gest that all affective states arise from two independent neurophysiological systems, the
valence and arousal systems. Each and every affective experience is the consequence
of a linear combination of these two independent systems, which is then interpreted as
representing a particular emotion (see Figure 2.2). Fear, for example, is conceptualized
by circumplex theorists as a neurophysiological state typically involving the combination
of negative valence and heightened arousal in the CNS. The subjective experience of fear
arises out of cognitive interpretations of these patterns of physiological activity that oc-
cur in the context of eliciting stimuli. As emotions are experienced and communicated,
cognitive interpretations are employed to identify the neurophysiological changes in the
valence and arousal systems and conceptually organize these physiological changes in
relation to the eliciting stimuli, memories of prior experiences, behavioral responses, and
semantic knowledge [79].
Despite the differing descriptive labels applied to these dimensions (activation/deactivation,
valence/arousal), the 2-D structure is found consistently across a large number of studies.
For instance, Osgood (1952) [69] validated this model by factor analysis. In comparison,
this model is more suitable for emotion analysis using physiological signals than the
widely accepted Ekman model, which was based on facial expression.
2.2.3 Hybrid Discrete-Dimensional Model of Affect
Although the two models are typically presented as mutually exclusive, for emotion
analysis using physiological signals a hybrid model, which places discrete emotions in
dimensional affective space with respect to both ANS and self-report variables, has
proved to be most effective and is consistent with self-assessment reports [87, 15].
This model not only provides a way to bridge many previous findings on prototypical
emotions and cognitive processes (e.g., learning and decision making), but also ensures the
consistency between physiological expressions of emotion and the emotion modeling that the
circumplex model provides.
In this study, we have adopted the use of the hybrid model and focused our work
Figure 2.3: Hybrid Discrete-Dimensional Model of Affect
on the recognition of three key affective states. Based on the hybrid model of affect,
we infer that high arousal can be interpreted as high motivation; high valence means
the current situation is pleasant and approachable, whereas low valence means it is unpleasant
and to be avoided. Hence, in the domains of learning, decision making and behavior monitoring,
we would like to know when a person is happy ('positively excited'), frustrated
('negatively excited'), or bored ('calm').
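The three target states can thus be read off the hybrid model as regions of the valence-arousal plane. A minimal mapping follows, with illustrative mid-scale thresholds on SAM-style 1-9 ratings; the exact decision boundaries are an assumption, not taken from the thesis:

```python
def affect_state(valence, arousal, v_mid=5.0, a_mid=5.0):
    """Map SAM-style 1-9 valence/arousal ratings to the three target
    affect classes used in this thesis. Thresholds are illustrative."""
    if arousal <= a_mid:
        return "calm"
    return "positively excited" if valence > v_mid else "negatively excited"

print(affect_state(valence=8, arousal=8))  # → "positively excited"
print(affect_state(valence=2, arousal=8))  # → "negatively excited"
print(affect_state(valence=5, arousal=3))  # → "calm"
```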
2.3 Affect Expressions
Emotion can be expressed in several ways, as shown in Figure 2.4. Many emotion recognition
approaches have been established using facial images, speech, body movements,
peripheral physiological signals and, more recently, autonomic signals. Each approach
has its own advantages and application limitations; a brief review is given in the
following sections.
Figure 2.4: Affect Expression Modalities (Figure adapted from [79])
2.3.1 Facial Expressions and Affect
The face plays a significant role in human emotion perception and expression. The
association between face and affective arousal was confirmed by a series of systematic
studies in the field of psychology [25, 26]. Ekman et al. [27] proposed the use of
the Facial Action Coding System (FACS), a comprehensive and anatomically
based system that is used to measure all visually discernible facial movements in terms
of atomic facial actions called Action Units (AUs). These AUs can be used for any
higher order decision making process including recognition of basic emotions according
to Emotional FACS (EMFACS) rules and a variety of affective states according to FACS
Affect Interpretation Database (FACSAID), as well as for recognition of other complex
psychological states such as depression or pain. AUs of the FACS are very suitable to be
used in studies on human naturalistic facial behavior as the thousands of anatomically
possible facial expressions (independently of their higher-level interpretation) can be
described as combinations of 27 basic AUs and a number of AU descriptors. It is not
surprising, therefore, that an increasing number of studies on human spontaneous facial
behavior are aimed at automatic AU recognition (e.g., [8], [97], [53]).
2.3.2 Audio Analysis and Affect
Speech is another important channel of human communication. It delivers
affective information through explicit messages (what we say) and implicit messages
(how we say it) that reflect the way the words are spoken. Although cognitive scientists
have not identified the optimal set of vocal cues that reliably discriminate among affec-
tive and attitudinal states, listeners seem to be rather accurate in decoding some basic
emotions from prosody [28] and some non-basic affective states such as distress, anxiety,
boredom, and sexual interest from nonlinguistic vocalizations like laughs, cries, sighs, and
yawns. The basic emotion-related prosodic features extracted from audio signal include
pitch, energy, and speech rate. Cowie et al. [18] provided a comprehensive summary of
qualitative acoustic correlations for prototypical emotions.
However, in real life, such data collection may be prohibitive; for example, frontal
face images or high-quality audio may be unavailable in a noisy environment. Physiological
signals such as EEG, on the other hand, can be captured continuously by non-intrusive
means and are usually less affected by external noise sources.
2.3.3 Physiological Expressions of Affect
Changes in biological signals are related to many psychological constructs, and it is of
great importance to distinguish such differences in the affective computing literature.
There is a many-to-many mapping between psychological states and physiological changes, which
makes affect detection a very challenging problem. Physiological signals can be divided
into two categories: those originating from the peripheral nervous system (e.g., Electrocardiogram
(ECG), Galvanic Skin Response (GSR) and Blood Volume Pressure (BVP))
and those originating from the Central Nervous System (CNS) (e.g., EEG). In recent years,
more research studies have been carried out with the first category of signals and have
produced interesting results [3, 87, 15]. However, few studies have been done with the second
category, even though cognitive theory states that the brain is heavily
involved in emotions [16].
There are two types of affective change of interest in affect detection using
physiological signals: emotion and mood. Emotion changes are
believed to be of short duration, while moods are longer lasting. Therefore, if one
is interested in detecting emotions, short-term biological changes such as skin conductance
changes are more relevant than heart rate variability, which is better suited to
mood changes. The advantages of affect detection using physiological signals are
manifold: continuous recording, good time resolution (EEG), and robustness to social masking
and self-regulation, since it is much harder to fake emotions through physiological
changes than through facial or verbal expressions. Through the use of a sensor network, a multiple-modality
(e.g., ECG, EEG, GSR) emotion detection system using physiological signals
could outperform facial- or audio-based approaches.
The amount of information that the physiological signals can provide is increasing,
mainly due to major improvements in the accuracy of psychophysiology equipment and
associated data analysis techniques. Still, physiological signals are currently recorded
using equipment and techniques that are more intrusive than those recording facial and
vocal expression. Fortunately, some of the challenges associated with deploying intru-
sive physiological sensing devices in real-world contexts are being mitigated by recent
advances in the design of wearable sensors (e.g., [1]). Even more promising is the inte-
gration of physiological sensors with equipment that records the widely varying pattern
of actions associated with emotional experience. When statistical learning methods are
applied to these vast repositories of affective data, they can be used to find patterns
associated with different action tendencies in different contexts or individuals, linking
them to corresponding differences in autonomic changes that are associated with these
actions.
2.3.4 Affect Expression Through Peripheral Nervous System
(PNS)
The James-Lange theory [76] emphasizes that emotional experience is embodied in
peripheral physiology (e.g., heart rate, Electromyogram (EMG), Galvanic Skin Response
(GSR)). The Autonomic Nervous System (ANS) is the part of the peripheral nervous system that
acts as the control system for regulating breathing, heart rate and other bodily functions.
The ANS has two widely recognized subdivisions, sympathetic
and parasympathetic, which work together to regulate physiological arousal. When
the body faces external challenges, the sympathetic system increases metabolic
output to generate the 'fight or flight' response; the parasympathetic
system, on the other hand, works complementarily to bring the body back to
its equilibrium 'rest' state [20]. Increased sympathetic activity (sympathetic arousal)
elevates heart rate, blood pressure and sweating, and redirects blood from the intestinal
reservoir towards the skeletal muscles, lungs, heart and brain in preparation for motor
action [3].
The connections between the ANS and emotions have long been an active and often debated
topic. In [41, 58], emotion was proposed as being "changes" in the Sympathetic
Nervous System (SNS). More recent studies [3, 87, 15] have shown that the basic
emotions (anger, disgust, fear, happiness, sadness and surprise) are correlated with
ANS signals such as ECG, Galvanic Skin Response (GSR) and Blood Volume Pressure
(BVP). Even though results were often controversial between these studies, some physiological
correlates of emotion could be identified more frequently than others: increases
in heart rate, skin conductance level and systolic blood pressure have been associated with
fear (e.g., [3]), while increases in heart rate and in systolic and diastolic blood pressure have
been associated with anger (e.g., [20]). Sadness has been found to sometimes lead to an
increased and other times a decreased heart rate [87].
2.3.5 Affective Expression Through Central Nervous System
(CNS)
Brain signals are part of the Central Nervous System (CNS) and could potentially be more
advantageous than signals generated by the Autonomic Nervous System (ANS),
since physical factors unrelated to emotion can cause similar physiological
changes in the latter. For example, increased physical activity can raise the instantaneous
heart rate, and an increase in ambient temperature can increase skin conductance without
any presence of excitement or fear.
Previous EEG studies [16, 33] generally suggest that EEG correlates with affect
states through variations in oscillation patterns and through lateral (frontal) asymmetry:
greater activation of the right frontal lobe accompanies the experience of more negatively
valenced emotions, whereas greater left frontal activation accompanies more positively
valenced experiences [6].
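One common operationalization of this frontal asymmetry, not necessarily the exact formulation used in the cited studies, is the difference of log alpha-band (8-13 Hz) power between homologous right and left frontal electrodes (e.g., F4 vs. F3). A sketch using only NumPy, with synthetic channel data:

```python
import numpy as np

def alpha_power(x, fs):
    """Alpha-band (8-13 Hz) power from the FFT magnitude spectrum."""
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    pxx = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= 8) & (freqs <= 13)
    return pxx[band].sum()

def frontal_asymmetry(left, right, fs=128):
    """ln(right alpha) - ln(left alpha). Alpha power is inversely related
    to cortical activation, so a positive index suggests relatively greater
    left-frontal activation (commonly linked to positive valence)."""
    return np.log(alpha_power(right, fs)) - np.log(alpha_power(left, fs))

# Synthetic example: a stronger 10 Hz alpha rhythm on the right channel
# than on the left. Channel names and amplitudes are illustrative.
fs = 128
t = np.arange(0, 8, 1 / fs)
rng = np.random.default_rng(2)
f3 = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
f4 = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
print(frontal_asymmetry(f3, f4, fs) > 0)  # → True
```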
2.4 Emotion Elicitation Protocols
The ability to obtain affect-specific physiological signals is one of the most critical
components of any affect detection or recognition system. Facial images or audio recordings
with emotional expressions are typically collected from professional actors, who know
in advance the target expression for each emotion and mimic it. An alternative
approach to inducing emotions is to present particular stimuli to
an ordinary participant. Various stimuli can be used: images, sounds (words or
phonemes), videos or video games. This approach is advantageous because there is no need
for a professional actor and the responses are closer to those observed in real life.
Each emotion elicitation method has its own advantages. Pictures, such as those
from the International Affective Picture System (IAPS), have been chosen and proved
effective in evoking emotions [15]. However, picture stimuli can build up tolerance,
as the response will be less pronounced if the training and testing sessions are closely
spaced in time. While pictures are static, videos or films contain dynamic content; such
stimuli may evoke varying emotions, which can complicate the emotion classification
process [12]. Overall, large variations exist between subjects of different cultural backgrounds,
education and past personal experience [20]. Therefore, stimulus scores (labels)
evaluated by professional psychologists should be considered along with participants'
self-assessments for overall classification performance analysis.
2.4.1 International Affective Picture System (IAPS)
The International Affective Picture System (IAPS) was developed by Lang et al. [56]
and has been adopted for many psychophysiological studies involving emotion induction. Several
studies have shown the usefulness of images for eliciting emotional responses that trigger
discriminative patterns in both the central and peripheral nervous systems [15, 63]. The
IAPS contains 900 emotionally evocative images evaluated by several American participants
on two nine-point dimensions (1-9) (see Figure 2.6): valence (ranging
from negative/unpleasant to positive/pleasant) and arousal (ranging from calm to
exciting). The mean and variance of participant judgments for both arousal and valence
are computed from these evaluations.
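The normative scores are thus simple per-image statistics over participants' SAM ratings. With hypothetical ratings from ten participants for one image:

```python
import numpy as np

# Hypothetical SAM ratings (1-9) from ten participants for one IAPS image.
valence_ratings = np.array([7, 8, 6, 7, 9, 8, 7, 6, 8, 7])
arousal_ratings = np.array([5, 6, 4, 5, 7, 6, 5, 4, 6, 5])

# Sample mean and (unbiased) variance, as reported in the IAPS norms.
print(f"valence: mean={valence_ratings.mean():.2f}, var={valence_ratings.var(ddof=1):.2f}")
print(f"arousal: mean={arousal_ratings.mean():.2f}, var={arousal_ratings.var(ddof=1):.2f}")
```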
The labeling of the images follows the circumplex model of affect proposed by Russell
[79]. The dimensional approach implies a very different rating method, based on
the assumption that emotions can be described by their degrees of valence, arousal and
dominance. Valence refers to the quality of an emotion (from unpleasant to pleasant),
while arousal describes the activation level (from calm to excited). Dominance is an indicator
of the control a person feels to have over a situation (from weak to strong). Lang
[56] introduced rating scales for these dimensions consisting of pictures of manikins,
called SAM for Self-Assessment Manikin (see Figure 2.6). The pictures of the IAPS (Figure 2.5)
are standardized on the basis of ratings of their valence, arousal, and dominance. During
Figure 2.5: Image Samples for The International Affective Picture System (IAPS)
an emotion induction phase, subjects were prompted with evaluated pictures of people
making angry, disgusted, fearful, happy, sad or surprised faces. The corresponding emotion
word was written underneath each picture and subjects were asked to pick one. If
subjects had trouble choosing one of the pictures/words simply because they did not
feel like any of them, they were asked to say so and in addition state whether they felt a
different emotion (and name it) or did not experience an emotion at all.
Figure 2.6: SAM scales for valence (top) and arousal (bottom)
2.5 Multimodality Affect Detection
Modality fusion is required when inputs come from various physiological signals. Although
the benefit of fusion (e.g., audio-visual fusion, linguistic (vocal) and paralinguistic
(non-vocal) fusion, multi-visual-cue fusion from face, head and body gestures) for affect
recognition is expected from engineering and psychological perspectives, our knowledge
of how humans achieve this fusion is extremely limited. Neurological studies on the fusion
of sensory neurons [3] seem to support early fusion (i.e., feature-level fusion) rather
than late fusion (i.e., decision-level fusion). Feature-level fusion refers to the approach
that a decision is made on a joint feature vector composed of features from each modal-
ity. Decision-level fusion refers to the approach that a decision is made after evaluating
features from each modality independently. However, it is an open issue how to construct
suitable joint feature vectors from different modalities with different time scales, different
metric levels and different dynamic structures based on existing methods. Due to these
difficulties, most researchers choose decision-level fusion, which simplifies the fusion
problem by introducing a conditional independence assumption. Based on current knowledge
in the literature, optimization methods for multimodal fusion are largely unexplored. A
number of issues relevant to fusion require further investigation, such as the optimal level
at which to integrate the different streams, the optimal function for the integration, and
the inclusion of suitable estimates of the reliability of each stream.
2.5.1 Fusion at the Feature Level
Fusion at the feature level is to construct an augmented feature set that contains features
from each individual modality. Firstly, features are extracted from each input physiolog-
ical signal, such ECG, GSR, breathing and EEG. Typically these signals have different
sampling rate, range of amplitude and most importantly, different time and frequency
resolution. Secondly, the individual feature sets were combined linearly or non-linearly
into an augmented feature set. In the linear case, concatenation of the features was done
to combine the features extracted from the different peripheral sensors (GSR, Respiration
belt, etc.) in a unique feature set. It can also be used to fuse peripheral features with
features computed from the EEG signals.
Concatenating N_f feature sets f^1, ..., f^{N_f} into a new feature set f, which consists of the concatenation of the feature vectors f_i^j for each sample i over all feature sets j:

    f_i = [ f_i^1  · · ·  f_i^j  · · ·  f_i^{N_f} ]    (2.1)
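The concatenation of Eq. 2.1 can be sketched in a few lines. The per-modality z-scoring step below is our own assumption (added because the modalities have different amplitude ranges), not part of the equation itself:

```python
import numpy as np

def concat_features(feature_sets):
    """Feature-level fusion: z-score each modality's feature matrix
    (rows = samples, columns = features), then concatenate along the
    feature axis as in Eq. 2.1."""
    scaled = []
    for F in feature_sets:
        F = np.asarray(F, dtype=float)
        sd = F.std(axis=0)
        sd[sd == 0] = 1.0                  # guard against constant features
        scaled.append((F - F.mean(axis=0)) / sd)
    return np.hstack(scaled)               # shape: (n_samples, sum of d_j)
```

Each input matrix must be aligned row-by-row on the same samples; the result is the joint feature vector per sample.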
Test samples or newly acquired samples will be projected into this feature space and with
a selected classifier, a class label will be assigned.
2.5.2 Fusion at the Decision Level
A multimodal model with fusion at the feature level breaks down if one or more sensor inputs are missing or severely degraded by artefact interference. To overcome this problem, fusion at the decision level is proposed, which is often more practical and desirable. Many methods have been proposed in the literature for this approach, such as majority voting, in which the class label receiving the most votes is assigned as the output. More sophisticated methods, such as mixture of Gaussians models [45, 43], also take prior information into account and thus provide a statistically more reliable model.
Figure 2.7: Multiple Modalities Model with Decision Level Fusion [45]
Usually, performing late integration is chosen over performing early integration for
two primary reasons. First, the feature concatenation used in early integration would
result in a high-dimensional data space, making a large multimodal dataset necessary for robust training of the classifier [23]. Second, late integration provides greater flexibility in modeling. For instance, with late integration it is possible to train the face and posture classifiers on different data sources and with different classifiers, providing the best accuracy for each modality separately. However, designing optimal strategies for decision-level fusion is still an open research problem. Various approaches have been proposed in the literature, such as the product rule, sum rule, weighted combinations, maximum/minimum/median rules and majority vote [49, 55].
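As an illustration (not the thesis implementation), the majority-vote and sum rules mentioned above can be sketched as:

```python
import numpy as np

def majority_vote(labels):
    """Assign the class label that receives the most votes across modalities."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return values[np.argmax(counts)]

def sum_rule(posteriors):
    """Sum per-modality posterior estimates (rows = modalities,
    columns = classes) and pick the class with the largest total."""
    return int(np.argmax(np.asarray(posteriors, dtype=float).sum(axis=0)))
```

The sum rule requires each modality's classifier to output class-posterior estimates, while majority voting only needs hard labels, which is why voting is often the fallback when classifier outputs are not calibrated.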
2.6 State of the Art Emotion Recognition Performances
Affective computing has been a very active research field over the past few years, especially research involving the use of physiological signals. The goal is to define a statistical model that associates an observed physiological change (pattern) with an emotional state.

Tables 2.1 and 2.2 provide a non-exhaustive list of relevant studies from recent years concerning emotion detection applications using physiological signals. Due to the lack of a common testing platform or common testing criteria between the various approaches, it is difficult to draw a fair comparison between these studies.
29
2.6
State
ofth
eArt
Emotio
nRecognitio
nPerfo
rmances
Table 2.1: State of the Art Emotion Recognition Performances

| Study | Data Set | Emotions | Stimulus | Signals | Methods (Feature Extraction & Classifier) | Best Results (Correct Classification Rate) |
|---|---|---|---|---|---|---|
| Picard et al. [74] | 1 sub, 20 days | Neutral, anger, hate, grief | Self-induction | ECG, RSP, GSR, EMG | SFFS-Fisher projection | 81% |
| Katsis et al. [46] | 10 sub | High/low stress, disappointment and euphoria | Driving simulation | EMG, ECG, RSP, GSR | SVM | 79% |
| Wagner et al. [94] | 1 sub, 25 recordings over 25 days | Anger, sadness, joy and pleasure | Music chosen by the participant | EMG, ECG, GSR, RSP | LDA, KNN, MLP, SFFS, Fisher, ANOVA | 92% |
| Haag et al. [32] | 1 sub, several days | Arousal, valence | Images from IAPS | EMG, GSR, skin temp., BVP, ECG, RSP | Neural networks for regression | 97% |
| Takahashi et al. [90] | 12 sub | Joy, anger, sadness, fear, relaxation | Film clips | EEG, BVP, GSR | Linear SVM, one vs. all | 42% |
| Leon et al. [60] | 9 sub | Neutral, neg. excited, pos. excited | Images from IAPS | Heart rate, GSR, BVP | Sequential analysis and autoassociative networks | 71% |

*RSP: respiration; ECG: electrocardiogram; EMG: electromyography; EEG: electroencephalogram; GSR: galvanic skin response; BVP: blood volume pulse. Classification acronyms: Sequential Floating Forward Search (SFFS), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Mean Square Error (MSE), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), ANalysis Of VAriance (ANOVA).
Table 2.2: State of the Art Emotion Recognition Performances using EEG

| Study | Data Set | Emotions | Stimulus | Feature Extraction Methods | Classifier | Best Results (Correct Classification Rate) |
|---|---|---|---|---|---|---|
| AlZoubi et al. [7] | 3 sub | 10 categories | Self-induced | Narrow band power (1 Hz) | KNN, SVM | subject-specific, 39.97%-66.74% |
| Panagiotis et al. [72] | 16 sub, 9-32 years old | Six basic emotions | Images | Higher Order Crossings | QDA | Happiness (71.43%), surprise (71.34%), anger (77.33%), fear (83.67%), disgust (78.67%), sadness (61.83%) |
| Horlings et al. [34] | 10 sub, 19-29 years old | Valence, arousal | Images from IAPS | Frequency band-power, cross-correlation between band-powers, peak alpha frequency | SVM | valence (32%), arousal (37%) |
A few key points concerning the studies shown in Tables 2.1 and 2.2 are listed below; details of each feature-analysis approach are discussed in Section 4.2.
• Number of subjects: in general, findings obtained from a larger number of subjects are more generalizable and more significant. However, if the results are subject-specific (training and probing samples are collected from the same subject), then the number of subjects involved is less important.
• Temporal distance between sample collections: if time elapsed between the collected samples, the findings are more significant and better represent the actual application scenario. Habituation occurs when samples are collected close together in time from the same subjects.
• Methods used for emotion elicitation can add extra dynamics to the collected data. Therefore, the performances of different affect detection systems cannot be compared or generalized without further understanding the impact of each stimulus on the emotion-specific dataset.
• Emotion models/classes: as shown in Section 2.2, the model used to represent
the evoked emotional states is important in determining the association between
the observed physiological pattern and the true emotional states. The ground
truth definition relies especially heavily on the choice of a representative model and
system recognition performance can differ significantly between different choices of
representative models.
• Single/multiple modality: in general, a larger number of sensors provides more discriminating information between emotional classes; however, the selected set of sensors should comply with both the feasibility and the practicality requirements of an end-user system.
2.7 Ethical and Privacy Concerns on Physiological Signal Collection
With the increasing use of physiological signals in consumer electronics, there is an urgent
need to address the security and privacy implications for such commercial applications.
Currently there is no readily available model to systematically identify risk issues for commercial applications using physiological signals such as ECG or EEG.

The major concerns faced by commercial applications utilizing physiological signals can be summarized as follows. First, current sensing equipment is increasingly powerful, allowing it to be used both for simple readings, such as recording body temperature, and for complex applications such as emotion recognition. Second, these ubiquitous devices are not always visible and can obtain potentially private information without the subject's awareness.
In the case of affect detection, the goal is not to invade one's privacy or control one's actions, but to support the user by providing a better interaction environment according to his or her emotional state. EEG data collected by EEG headsets can be considered private health information, since physiological signals have conventionally been used in professional health and medical applications. This holds even when the physiological signals (EEG) are collected using commercially available headsets and are used for leisure (non-medical) purposes. Because this application of consumer electronics to physiological data collection is so new, there is no readily available regulatory framework or practical guidance on the collection, storage, and transmission of such personal information using commercial devices. If not dealt with carefully, such frontier applications can have significant security and privacy implications.
In terms of collecting such potentially sensitive physiological data, the privacy by design [57] approach has been widely endorsed and adopted in practice. 'Privacy
by design’ refers to a proactive approach at the system design level to minimize the
data collected and released for the protection of privacy and security of such data. The
main recommendation is summarized below and should be considered for any application
design that involves potentially sensitive physiological signals.
• only collect data for a well-defined purpose (no in-advance storage)
• only collect data relevant for the purpose (not more)
• only keep data as long as it is necessary for the purpose
2.8 Summary
This chapter has provided the background material for affect detection. First, three models for representing affect states were introduced and discussed. A hybrid model for affect representation was chosen for this study; it combines the universality of basic emotions with the effectiveness of the circumplex model in representing affects using physiological signals. Second, common emotion elicitation methods were presented and analyzed for their applicability, and an image-based elicitation method (e.g., IAPS images) was chosen for our analysis. Third, various affect analysis modalities were discussed, with emphasis on modalities based on physiological signals. Various fusion techniques were also discussed to illustrate the benefits and challenges of feature-level and decision-level fusion. Finally, state-of-the-art affect classification performance using physiological signals was reviewed and discussed.
Chapter 3
EEG Signal Characteristics and Preprocessing Methods
3.1 Introduction
Automated recognition of human emotions using EEG signals is a relatively new research topic that is being actively studied in the affective computing community. EEG signals are generated by the Central Nervous System (CNS) and directly reflect brain activity, which can potentially overcome the challenges faced by other physiological signals (such as galvanic skin response) that suffer undesired interference from non-emotional, physical or environmental changes. EEG signals collected from multiple channels with correlated information can potentially produce a more reliable and robust emotion recognition system. As shown in Figure 3.1, the objective of an automated EEG affective signal processing system is to develop a statistical model, based on the input training samples (supervised machine learning), that can predict the label of a test sample with the highest accuracy under given constraints. Recently, with the development of electronics such as wearable sensors, cheap, high-fidelity and unobtrusive EEG headsets have become easily accessible, which has the potential to revolutionize the current generation of
affect computing applications.
Figure 3.1: System Overview
This chapter provides background information on the characteristics of the EEG signal, the typical recording setup and application constraints. Noise and artefact interference will also be introduced, and solutions for eliminating such interference in actual signal processing applications will be presented.
3.2 EEG Signal Characteristics
Electroencephalography (EEG) is a measurement of the electrical activity within the brain [68]; it is produced by synchronous neuronal activity and captured using multiple electrodes resting on the scalp, spatially located according to a specific scheme known as the 10-20 system [91]. More details on the spatial locations of the electrodes can be found in Section 3.2.1.

EEG signals are time series reflecting the oscillatory nature of brainwaves. A sample EEG recording from an electrode located over the left frontal lobe (F3) is shown in
Figure 3.2: Standardized EEG Recording (F3)
Figure 3.2. EEG signals are non-linear, non-stationary, and usually contaminated with a substantial amount of noise caused by thermal fluctuations, artefacts (muscle movements, electrode movements) and technical interference (power line). Figure 3.3 shows a sample EEG recording contaminated by eye-blinking. For these reasons it is difficult to interpret EEG signals in their raw time-domain format. EEG analysis is also a challenging signal processing problem, since the EEG signal violates the stationarity assumption made in many conventional time-series analysis methods.
Figure 3.3: EEG Signal Contaminated with Eye-blinking
Figure 3.4: Spectrum Characteristics of EEG (single-sided amplitude spectrum)
Using the Fast Fourier Transform (FFT) with a Hamming window, we can estimate the spectral characteristics of the recorded EEG, as shown in Figure 3.4. We observe that most of the energy of the EEG signal resides below 65 Hz, with a large spike below 4 Hz (due to artefact).
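A single-sided amplitude spectrum like the one in Figure 3.4 can be estimated as follows. This is a sketch; the exact windowing and scaling used in the thesis are not specified:

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of x using a Hamming window."""
    n = len(x)
    windowed = x * np.hamming(n)
    spectrum = np.abs(np.fft.rfft(windowed)) / n
    spectrum[1:] *= 2.0                       # fold in the negative frequencies
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum
```

The Hamming window trades a slightly wider main lobe for much lower side lobes, reducing spectral leakage between neighbouring frequency bins.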
3.2.1 Electrode Placement for EEG Recording Devices
EEG recording devices typically have a sampling frequency of 512 Hz, 1024 Hz or higher, with the number of electrodes ranging from 32 to 256. Compared to the enormous number of neuronal activity sources this is a low spatial resolution, while the high sampling frequency provides high temporal resolution. Electrode placements for 32 and 64 channels are shown in Figure 3.5. A detailed comparison of electrode placements can be found in [42], where three placement systems are compared: the international 10-20 system and, with more electrodes, the 10-10 and 10-5 systems. The "10" and "20" refer to the fact that the actual
distances between adjacent electrodes are either 10% or 20% of the total front-back or
right-left distance of the skull. Smaller distance between adjacent electrodes will improve
spatial resolution of the EEG recordings but also increase the total number of electrodes.
(a) The 32-channel system (a.k.a. 10-20 system); (b) the 64-channel system

Figure 3.5: Electrode placements of different EEG recording systems
3.2.2 Measuring Emotion Using EEG
The theory of the limbic system [77] has been the dominant theory linking brain structure and emotional expression. The limbic system is often described as a positive feedback circuit, the Papez circuit [59]. The basic elements of the limbic system are the amygdala, prefrontal cortex, anterior cingulate cortex, hypothalamus and insular cortex. The physical locations of these elements are shown in Figure 3.6.
• The amygdala [20] consists of two groups of neurons deep inside the human brain. Together, they form one of the most important brain regions for emotion. The amygdala interconnects with the sensory cortical areas and acts as a translator between the (sensory) stimuli and the autonomic nervous response system. The amygdala is also responsible for helping the brain learn the association between emotional events and specific stimuli, which becomes a long-term emotional memory with repetitive occurrences.

Figure 3.6: Cross section of the human brain
• The hypothalamus is the part of the brain that coordinates emotional behaviors and expressions. It controls many processes in the body, such as body temperature, hunger and thirst, and it handles the release of some hormones. As such, the hypothalamus is involved in processing emotions and sexual arousal.

• The prefrontal cortex, located at the front of the frontal lobe, is involved in decision making and in approach/withdrawal reactions [20]. It is involved in planning, making decisions based on earlier experiences and working towards a goal.

• The insular cortex is said to be associated with emotional experience and the production of conscious feelings [3].
3.2.3 Spectral Characteristics of EEG
EEG signals can be divided into five rhythms, as shown in Table 3.1, according to the origin of the rhythm and the underlying brain networks [59]. Beta waves are connected to an alert state of mind, whereas alpha waves are more dominant in a relaxed person [3]. Research has also shown a link between alpha activity and brain inactivation, which leads to the same conclusion. Therefore, characteristics of the Beta and Alpha waves are key indicators of the subject's state of arousal. The Delta band contains mostly noise and artefacts, such as pulses, neck movement, and eye blinking.
Table 3.1: Subbands of the EEG Signals

| Band | Frequency (Hz) | Location of Origin | Reason for Activity |
|---|---|---|---|
| Delta | 0-4 | Thalamus | Slow-wave sleep, waking |
| Theta | 4-7 | Hippocampus and cortex | Idle state in cortex; emotional stress, frustration, disappointment |
| Alpha | 8-13 | Posterior regions, occipital lobe, cortex | Closed eyes, idle state in cortex, relaxed |
| Beta | 13-30 | Cortex (e.g., motor and sensory) | Active/busy, concentrating/alert, attention |
| Gamma | 30-100 | Cortex | Sensory processing and cognitive tasks |
3.3 Preprocessing of the EEG Recordings
EEG signals collected from the scalp are typically contaminated with various noises (e.g., thermal noise) and artefacts (electrode movement, and electrophysiological potentials generated by muscle activities such as eye movements, biting and chewing), which are continuous in time and very large in amplitude. This results in poor signal quality and makes the signals challenging, and often insufficient, for direct interpretation. The objective of the preprocessing step in such a signal processing system is to eliminate noise and artefact interference and to reduce the interference from adjacent neural networks due to volume conduction.
Artefact removal is essential for any robust EEG emotion recognition system, and several methods have been used in the literature for removing or reducing such interference. Simple methods such as digital band-pass filtering eliminate all frequency components beyond the cut-off frequencies, which could discard crucial information and lead to incorrect interpretation of EEG signals. More recently, Independent Component Analysis (ICA) has been applied to artefact reduction in EEG; it has proven effective in removing muscle artefacts caused by eye movements, but it also suppresses genuine brain activity (cross-talk between brain and muscle activity) [39, 13]. Wavelet-based artefact rejection methods have also been shown to be very effective.
3.3.1 EEG Referencing
The measured EEG potential is a sum of contributions from all the active sources in the brain. Deep sources tend to contribute more uniformly to all electrodes, as the distances to each electrode are of the same order of magnitude. Cortical sources, on the other hand, tend to influence only the closest electrodes. Raw EEG signals collected directly from the scalp provide a poor signal-to-noise ratio, and it is necessary to re-reference them to eliminate the interference propagated from distant sources. The CMS electrode is the typical referencing electrode in many EEG headsets (e.g., the Biosemi Active II, with which our testing data set was collected); we will therefore refer to this setting in the following description. To obtain a Laplacian reference [5], the following Laplacian operator was applied to each electrode i:
    x̄_i(n) = x_i(n) − (1/N_i) Σ_{j ∈ Neig(i)} x_j(n)    (3.1)

where x_i is the CMS-referenced signal of electrode i, x̄_i the Laplacian-referenced signal, n the sample number, Neig(i) the set of neighbor electrodes of electrode i (see Appendix A), and N_i the size of this neighborhood.
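Eq. 3.1 can be applied channel-by-channel. The dictionary-based data layout below is our own illustrative choice, not the thesis code:

```python
import numpy as np

def laplacian_reference(eeg, neighbors):
    """Apply the Laplacian operator of Eq. 3.1.
    eeg: dict mapping channel name -> 1-D sample array
    neighbors: dict mapping channel name -> list of neighbor channel names"""
    referenced = {}
    for ch, x in eeg.items():
        neigh = neighbors.get(ch, [])
        if neigh:
            # subtract the mean of the neighboring electrodes, sample by sample
            referenced[ch] = x - np.mean([eeg[j] for j in neigh], axis=0)
        else:
            referenced[ch] = x.copy()   # no neighbors defined: leave unchanged
    return referenced
```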
3.3.2 Rejecting Artifacts Based on Channel Statistics
This method is based on the observation that the amplitude of 'noise-free' EEG is much smaller than that of artefacts caused by electrode or muscle movements. A threshold-based outlier detection algorithm makes use of the mean and standard deviation (std) of the EEG amplitude. Outliers are identified by comparing the EEG amplitude to a threshold value calculated from all the samples in a trial. To increase the accuracy and robustness of this process, the mean and standard deviation should be recalculated iteratively after eliminating a predefined number of outliers (usually much smaller than the number of outliers present in the segment of recording, e.g., 5 vs. 500). To simplify the calculation, we assume that the amplitude of the recorded EEG signal has a Gaussian distribution; as shown in Figure 3.7, this assumption is supported by empirical observation. In the next step, a threshold value is defined according to the mean and standard deviation of the observed segment. Artefact or outlier regions are found as the components whose amplitudes exceed this threshold value. One way to define such a threshold is to use the inverse Q-function at p = 0.05. Let x_1, x_2, ..., x_N be a zero-mean
Figure 3.7: Sample EEG Amplitude with Normal Fitting
time series,
    A_thresh = Q^{-1}(0.05) · σ_x    (3.2)

    σ_x = √( (1/N) Σ_{t=1}^{N} (x(t) − μ_x)² )    (3.3)
Statistically speaking, a signal with amplitude larger than this threshold value has less than a 5% chance of appearing in the collected sample and can in turn be considered an outlier or 'abnormal'. In our case, we consider such recorded segments to originate from artefacts.
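Eqs. 3.2-3.3 can be sketched as follows; the constant 1.6449 is the standard-normal inverse Q-function value Q^{-1}(0.05), and the single-pass (non-iterative) form is a simplification of the procedure described above:

```python
import numpy as np

Q_INV_005 = 1.6449  # Q^{-1}(0.05) for the standard normal distribution

def artefact_outliers(x, q_inv=Q_INV_005):
    """Flag samples whose amplitude exceeds A_thresh = Q^{-1}(0.05) * sigma_x
    (Eqs. 3.2-3.3), assuming a Gaussian amplitude distribution."""
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(np.mean((x - x.mean()) ** 2))  # Eq. 3.3
    thresh = q_inv * sigma                         # Eq. 3.2
    return np.flatnonzero(np.abs(x) > thresh), thresh
```

In the iterative variant described above, the flagged samples would be removed and sigma recomputed before re-thresholding.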
3.3.3 Filter Data Using Fast Fourier Transform (FFT)
In [50], it was shown that event-related EEG oscillation pattern variations are expressed mostly in the spectral components that reside within the Alpha (8-13 Hz) and Beta (13-30 Hz) bands. Other bands, such as the Delta band (up to 4 Hz), contain mostly noise from pulses, neck movement, and eye blinking. Beta waves are connected to an alert state of mind, whereas alpha waves are more dominant in a relaxed person.
Therefore, to preprocess the input EEG signals, a 10th-order Butterworth band-pass filter (8-30 Hz) was applied to the collected EEG signals to extract the Alpha and Beta waves. The filter order was chosen as the minimum order required to meet the design constraints: no more than 3 dB loss within the passband and at least 60 dB attenuation in the stopband.
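Such a filter can be sketched with scipy (an assumption about tooling; second-order sections and zero-phase filtering are our choices for numerical stability, not necessarily the thesis setup):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_alpha_beta(x, fs, order=10):
    """10th-order Butterworth band-pass (8-30 Hz) isolating the Alpha
    and Beta bands of an EEG trace sampled at fs Hz."""
    sos = butter(order, [8.0, 30.0], btype='bandpass', fs=fs, output='sos')
    return sosfiltfilt(sos, x)   # forward-backward filtering: zero phase shift
```

A 10th-order band-pass expands to a 20th-order transfer function, which is numerically fragile in (b, a) form; the SOS representation avoids that instability.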
3.3.4 Independent Component Analysis
The EEG data consists of recordings of electrical potentials at many different locations on the scalp. These potentials are presumably generated by the mixing of some underlying components of brain activity. We would like to find the original components, but we can only observe their mixtures. ICA [39] can reveal the underlying dynamics of brain activity by giving access to its independent components [13].

A statistical model of independent component analysis is shown in Eqs. 3.4-3.5, adopted from the "latent variables" model [39]. Assume we observe n linear mixtures x_1, ..., x_n of n independent components:
    x_j = a_{j1} s_1 + a_{j2} s_2 + ... + a_{jn} s_n,  ∀j    (3.4)
It is convenient to use vector-matrix notation instead of the sums in the previous equation. Let us denote by x the random vector whose elements are the mixtures x_1, ..., x_n, likewise by s the random vector with elements s_1, ..., s_n, and by A the matrix with elements a_{ij}. Then
    x = Σ_{i=1}^{n} a_i s_i = As    (3.5)
The ICA model is a generative model, which means that it describes how the observed
data is generated by a process of mixing the components si. The independent components
are latent variables, meaning that they cannot be directly observed. Also the mixing
45
3.3 Preprocessing of the EEG Recordings
matrix is assumed to be unknown. All we observe is the random vector x, and we must
estimate both A and s using it.
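A minimal symmetric FastICA with a tanh contrast, sketched below for illustration; real EEG pipelines typically rely on a library implementation (e.g., Infomax or extended FastICA) rather than this toy version:

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Estimate s in x = As (Eq. 3.5) by maximizing non-Gaussianity.
    X: (n_components, n_samples) mixed observations."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten via eigendecomposition of the sample covariance
    d, E = np.linalg.eigh(X @ X.T / m)
    d = np.clip(d, 1e-12, None)
    Xw = (E @ np.diag(d ** -0.5) @ E.T) @ X
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Xw)                            # contrast nonlinearity
        W = G @ Xw.T / m - np.diag((1 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt                                     # symmetric decorrelation
    return W @ Xw                                      # estimated sources
```

Note that ICA recovers the sources only up to permutation, sign and scale, which is why recovered components must be matched to artefact templates (e.g., eye blinks) before removal.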
3.3.5 Wavelet Decomposition for Denoising
The Discrete Wavelet Transform (DWT) has been used for denoising and feature extraction in many biomedical signal analyses [54, 98]. Typically, biomedical signals (e.g., EEG) are non-linear and non-stationary, which degrades the performance of wavelet methods. However, it has been shown that non-linear wavelets (e.g., bior1.3) are effective in analyzing biomedical signals. Therefore, for our analysis, the biorthogonal wavelet (bior1.3) was used for DWT-based denoising [98].
To define the scales of interest in the DWT process, we use the scale-frequency relationship introduced in [70], where at each decomposition scale the upper frequency of the current band is halved. A signal with N data points can be fully decomposed into n levels when N = k × 2^n. Each wavelet level v corresponds, approximately, to the frequency band

    f ∈ [ fs / 2^{v+1}, fs / 2^v ]    (3.6)

where fs is the sampling frequency. As shown in Table 3.2, this association between sampling rate and decomposition level is not exact, but it is a good enough approximation for our study. The sampling frequency of the data used in Chapter 6 is 1024 Hz.
In summary, the proposed method involves the following steps:

1. Apply the Discrete Wavelet Transform (DWT) to the contaminated EEG signals, decomposing them into seven levels with the biorthogonal wavelet (bior1.3) as basis function.

2. Define the threshold value using a statistical thresholding function such as the rigorous Stein's unbiased risk estimate (rigorous SURE) or heuristic SURE.

3. Identify the ocular artefact and apply the desired de-noising technique.

4. Apply the wavelet reconstruction procedure to reconstruct the denoised EEG signal.
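The steps above can be sketched with a simple multi-level DWT. For brevity, the Haar wavelet and a fixed soft threshold stand in for bior1.3 and the SURE rules:

```python
import numpy as np

def haar_dwt(x):
    """One analysis level: approximation and detail coefficients."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(a, d):
    """One synthesis level (perfect reconstruction of the analysis step)."""
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def dwt_denoise(x, levels, thresh):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    details, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):                        # step 1: decomposition
        a, d = haar_dwt(a)
        details.append(d)
    details = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
               for d in details]                   # steps 2-3: thresholding
    for d in reversed(details):                    # step 4: reconstruction
        a = haar_idwt(a, d)
    return a
```

With the threshold set to zero the chain reconstructs the input exactly, which is a useful sanity check before tuning the threshold.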
Table 3.2: Wavelet decomposition of EEG signals into frequency bands (fs = 1024 Hz)

| Frequency Range (Hz) | Decomposition Level | Frequency Band | Bandwidth (Hz) |
|---|---|---|---|
| 0-4 | A7 | Delta | 4 |
| 4-8 | D7 | Theta | 4 |
| 8-16 | D6 | Alpha | 8 |
| 16-32 | D5 | Beta | 16 |
| 32-64 | D4 | Gamma | 32 |
| 64-128 | D3 | mostly noise | 64 |
Figure 3.8: Four sample mother wavelet functions used for DWT decomposition of EEG
signals: (a) Mexican hat wavelet, (b) Daubechies order 8 wavelet (db8), (c) biorthogonal
wavelet order 1.3 (bior1.3), and (d) biorthogonal wavelet order 1.5 (bior1.5).
Figure 3.9: EEG denoising using Discrete Wavelet Transform (DWT)
The final reconstructed EEG signal is shown in Figure 3.9. Compared to Fourier-transform denoising, DWT is an efficient method for noise removal and is better at preserving the shape of the peaks [54]. Since most emotion analysis algorithms [65, 66] operate on the Alpha (8-12 Hz) and Beta (13-30 Hz) bands, the wavelet coefficients obtained at decomposition levels D5 and D6 (cf. Table 3.2) can alternatively be used exclusively to reconstruct the denoised EEG signal, which serves as both a filtering and a denoising technique.
3.4 Ground Truth Definition and Validation
It is very important that the labels for the collected data accurately represent the actual affect state present in the data. However, affect expression varies with age, cultural background, and current health state [58, 80], and the transition level is influenced by the current state signal value [12]. Therefore, there are very few cross-referenced results in the literature on affect detection using EEG signals, and the claimed emotion states are also debatable.
In general, there are three ways to obtain the ground truth label:

• participants self-assess their emotions in response to the projected stimuli;

• the labels of the specific stimuli used are adopted;

• labels are obtained from a third source, such as streamed facial images recorded while collecting the desired verbal or physiological signals (e.g., ECG, EEG, GSR).
However, from an engineering design point of view, user inputs are often inaccessible or unreliable due to physical or other constraints. For example, for people with autism, such self-reporting of emotional states is prohibitive. In the process of developing a statistical model, self-assessment values are often obtained to validate the chosen ground-truth labels for the collected emotional data.
Currently, data labels are obtained either through self-assessment reports or from the labels of the projected stimuli. However, emotions are known to be very subjective and dependent on past experience, so one can never be sure whether a block elicits the expected emotion. To reduce such discrepancies, we present here two methods that can be used to validate the labels obtained through self-assessment and through known stimulus labels.
3.4.1 Pearson Correlation Coefficients for Ground Truth Validation
Correlation is a measure of the relation between two or more variables; it determines the extent to which the values of two variables are "proportional" to each other. A commonly used measure is the Pearson product-moment correlation. Let L_i, i = 1, ..., N denote the self-assessment scores and K_i, i = 1, ..., N the scores provided with the projection stimuli:
    r = (1/(N−1)) Σ_{i=1}^{N} ( (L_i − L̄)/s_L ) ( (K_i − K̄)/s_K )    (3.7)
Table 3.3: Sample self-assessment values vs. values provided with the chosen stimuli (e.g., images from IAPS)

| Predicted Valence | Predicted Arousal | Self-Assessed Valence | Self-Assessed Arousal |
|---|---|---|---|
| 7.242 | 5.516 | 7.20 | 7.20 |
| 2.430 | 6.018 | 5.40 | 3.60 |
| 4.998 | 3.020 | 7.20 | 3.60 |
| 7.272 | 5.620 | 5.40 | 5.40 |
| 7.766 | 5.880 | 5.40 | 7.20 |
| 2.184 | 6.526 | 1.80 | 9.00 |
| 4.830 | 3.226 | 3.60 | 7.20 |
| 2.110 | 5.914 | 1.80 | 7.20 |
| ... | ... | ... | ... |
where \bar{L} and \bar{K} are the sample means; s_L and s_K are the sample standard deviations; N
is the sample size. The correlation coefficient (r) represents the linear relationship
between the two variables. r takes values from −1 to +1, where −1 represents a perfect
negative correlation, in which the two variables are inversely proportional, and +1 represents
a perfect positive correlation, in which the two variables are directly proportional.
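As a concrete sketch, (3.7) translates into a few lines of NumPy; the function name pearson_r is our own choice, not part of the thesis:

```python
import numpy as np

def pearson_r(L, K):
    """Pearson product-moment correlation of Eq. (3.7) between
    self-assessment scores L and stimulus-provided scores K."""
    L = np.asarray(L, dtype=float)
    K = np.asarray(K, dtype=float)
    N = len(L)
    sL = L.std(ddof=1)   # sample standard deviations (N-1 normalization)
    sK = K.std(ddof=1)
    return np.sum((L - L.mean()) / sL * (K - K.mean()) / sK) / (N - 1)
```

Scores that move together give r near +1; inversely related scores give r near −1.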
3.4.2 Confusion Matrix for Ground Truth Validation
A confusion matrix [51] contains information about the actual and predicted classifications
produced by a classification system. The performance of such systems is commonly evaluated
using the data in the matrix. The following table shows the confusion matrix for a two-class
classifier.
The entries in the confusion matrix have the following meaning in the context of our
study:
• a is the number of correctly matched instances for which the predicted label (class 1)
matches the given label (class 1) for the training data,
• b is the number of mismatched instances for which the samples are from class 1 but were
predicted as class 2,
• c is the number of mismatched instances for which the samples are from class 2 but were
predicted as class 1,
• d is the number of correctly matched instances for which the predicted label (class 2)
matches the given label (class 2) for the training data.
Table 3.4: Confusion Matrix Components

                     Predicted
                 Class 1   Class 2
Actual  Class 1     a         b
        Class 2     c         d
The accuracy (K) is the proportion of the total number of predictions that were
correct. It is determined using the equation:

K = \frac{a + d}{a + b + c + d}    (3.8)
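The table entries and (3.8) translate directly into code; this minimal sketch assumes labels coded as 1 and 2, and the helper names are ours:

```python
def confusion_counts(actual, predicted):
    """Tally the four entries of Table 3.4 for class labels 1 and 2."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # class 1 correct
    b = sum(1 for y, p in pairs if y == 1 and p == 2)  # class 1 mismatched
    c = sum(1 for y, p in pairs if y == 2 and p == 1)  # class 2 mismatched
    d = sum(1 for y, p in pairs if y == 2 and p == 2)  # class 2 correct
    return a, b, c, d

def accuracy(a, b, c, d):
    """Proportion of correct predictions, Eq. (3.8)."""
    return (a + d) / (a + b + c + d)
```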
3.5 Summary
In this chapter, affect detection analysis using EEG signals was introduced and discussed.
It started with an introduction of the limbic system as the link between brain structure
and emotion expression. Next, the spectral characteristics of EEG signals were discussed.
It was shown that the Alpha (8−12 Hz) and Beta (13−30 Hz) rhythms originate
from the limbic system and are useful in EEG emotion analysis. Time-domain characteristics
of the EEG signal were also discussed, with a focus on dealing with artefact
interference and on preprocessing methods for signal analysis applications using EEG
signals.
Chapter 4
Methods for Emotion Assessments
using EEG
4.1 Introduction
In a supervised machine learning system, as shown in Figure 4.1, there are two stages in
the pattern classification process: the training stage develops a statistical model based
on the collected, labeled samples and optimizes the model-specific parameters for best
performance; the testing stage tests and validates how well the developed model
represents the desired phenomenon.
Since raw EEG signals are high-dimensional and contain redundant information, a typical
EEG signal processing system needs to first reduce the effects of noise (preprocessing
stage), then extract features to reduce the dimensionality of the signals and, if possible,
increase the separability of the classes by choosing the optimal projection direction using
statistical methods. For comparison purposes, four feature extraction techniques
for EEG-based affect detection will be examined, implemented and applied to the
acquired database in Chapter 6. Simulation results using these techniques will provide
insight into the effectiveness of each constructed feature vector (FV) in representing
the affect states. Details on each feature analysis algorithm are further explained in the
following subsections.

Figure 4.1: System components in the training stage
4.2 Previous Feature Extraction Methods
The feature extraction step has become an important and often essential step in a machine
learning process. In essence, the process tries to determine the most relevant set of
features for differentiating affective states through the use of an optimization criterion.
The benefits of applying a feature extraction algorithm are multifold. First of all, because
of the high dimensionality of the acquired EEG signal and the typically much lower number
of samples for each pattern class, an exponentially larger sample size would be required
for a meaningful statistical analysis [23]; this is often referred to as the curse of
dimensionality. Secondly, by projecting the training samples into a lower-dimensional
feature space, irrelevant or redundant information is removed, which leads to better
separation between the sample classes and, in the end, better classification performance.
Lastly, through the use of feature dimension reduction, the computational cost of the
subsequent classification process is lowered.
It is well known that the frequency spectrum of the EEG changes with age. In the
present study, several steps were taken to control for maturational EEG spectral changes,
including the selection of subjects with similar ages (except for the toddlers) and the use
of analysis of covariance adjusted for age. However, a study of this topic is beyond
the scope of this project, even though it would be essential for generalizing the research
findings in this study to a much larger testing population.
4.2.1 Time Domain Analysis
4.2.1.1 Statistical-based Features
EEG signals are a reflection of the oscillatory pattern of the action potentials of the
central nervous system. Therefore, analyzing the oscillatory pattern in the time domain
is a natural and most direct way to understand EEG patterns. However, as previously
stated, the raw time-domain recordings are contaminated with noise and artefacts; to
reduce some of this interference, the signal components of interest (8−30 Hz) are extracted first.
Next, six time-domain parameters proposed by Picard [74] were calculated on the N
values (5 seconds at 256 samples per second gives N = 1280). The statistical features used
to form the proposed FVs are defined as follows (X(t), t = 1 · · ·N is the raw N-sample EEG signal).
1. The mean of the raw signal

\mu_X = \frac{1}{N} \sum_{t=1}^{N} X(t) = \bar{X}    (4.1)

2. The standard deviation of the raw signal

\sigma_X = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( X(t) - \mu_X \right)^2}    (4.2)

3. The mean of the absolute values of the first differences of the raw signal

\delta_X = \frac{1}{N-1} \sum_{t=1}^{N-1} |X(t+1) - X(t)|    (4.3)

4. The mean of the absolute values of the first differences of the standardized signal

\tilde{\delta}_X = \frac{1}{N-1} \sum_{t=1}^{N-1} |\tilde{X}(t+1) - \tilde{X}(t)| = \frac{\delta_X}{\sigma_X}    (4.4)

5. The mean of the absolute values of the second differences of the raw signal

\gamma_X = \frac{1}{N-2} \sum_{t=1}^{N-2} |X(t+2) - X(t)|    (4.5)

6. The mean of the absolute values of the second differences of the standardized signal

\tilde{\gamma}_X = \frac{1}{N-2} \sum_{t=1}^{N-2} |\tilde{X}(t+2) - \tilde{X}(t)| = \frac{\gamma_X}{\sigma_X}    (4.6)

where \tilde{X}(t) = (X(t) - \mu_X)/\sigma_X is the standardized signal.

Concatenating the N_e per-electrode feature sets f^1, \cdots, f^{N_e} into a new feature set f
consists of concatenating the feature vectors f^j_{s,i} for each sample i over all electrodes j:

f^j_{s,i} = [\mu_X, \sigma_X, \delta_X, \tilde{\delta}_X, \gamma_X, \tilde{\gamma}_X], \quad f_{s,i} = [f^1_{s,i}, \cdots, f^j_{s,i}, \cdots, f^{N_e}_{s,i}]    (4.7)
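The six features above reduce to a short NumPy routine; a minimal sketch, with the standardized differences obtained through the closed forms of (4.4) and (4.6):

```python
import numpy as np

def statistical_features(X):
    """Picard's six time-domain features for one EEG channel:
    mean, standard deviation, mean absolute first/second differences
    of the raw signal, and their standardized counterparts."""
    X = np.asarray(X, dtype=float)
    mu = X.mean()                              # Eq. (4.1)
    sigma = X.std()                            # Eq. (4.2)
    delta = np.abs(np.diff(X)).mean()          # Eq. (4.3)
    gamma = np.abs(X[2:] - X[:-2]).mean()      # Eq. (4.5)
    # Standardized versions via the closed forms of Eqs. (4.4), (4.6)
    return np.array([mu, sigma, delta, delta / sigma, gamma, gamma / sigma])
```

Concatenating this output per electrode yields the feature vector of (4.7).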
4.2.1.2 Higher Order Crossings
Observed time series of physiological signals such as EEG display both local and global
up-and-down movements. Characteristics of this oscillatory mode possess discrimination
power and can be extracted as features for classification purposes. The oscillation
behaviour seen in a finite zero-mean time series X_t, t = 1 · · ·N can be expressed through
the zero-crossing count. In general, when a filter is applied to a time series it changes
the oscillation, and hence the zero-crossing count too. Under this perspective, the following
iterative procedure can be assumed: apply a filter to the time series and count the
number of zero-crossings in the filtered time series; apply yet another filter to the original
time series, and again observe the resulting zero-crossings; and so on, filter and count.
The resulting zero-crossing counts are referred to as HOC [47]. When a specific sequence
of filters is applied to a time series, the corresponding sequence of zero-crossing counts
is obtained, resulting in the so-called HOC sequence. Many different types of HOC se-
quences can be constructed by appropriate filter design, according to the desired spectral
and discrimination analysis.
Let X_1, X_2, \ldots, X_N be a zero-mean stationary time series; the zero-crossing count in
discrete time is defined as the number of symbol changes in the corresponding clipped
binary time series [47]

Z_t = \begin{cases} 1, & \text{if } X_t \ge 0 \\ 0, & \text{if } X_t < 0 \end{cases}    (4.8)

The number of zero-crossings, denoted by D, is defined in terms of Z_t:

D = \sum_{t=2}^{N} \left[ Z_t - Z_{t-1} \right]^2, \quad 0 \le D \le N - 1    (4.9)
HOC combines zero-crossing counts and linear operations: the difference operator is a linear
high-pass filter,

\nabla Z_t \equiv Z_t - Z_{t-1}    (4.10)

and the second difference \nabla^2, whose squared gain is larger, is a more pronounced high-pass filter.
To extract the HOC features, the signal components in the 8−30 Hz range, covering the
conventional Alpha and Beta waves, are typically extracted first. The signals are also
passed through a zero-mean process. Recently, Petrantonakis [71] used a combination
of EMD-based adaptive filtering and Higher Order Crossings analysis, which shows
promising results for the classification of six basic emotions.
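The filter-and-count iteration can be sketched as follows, using the difference operator of (4.10) as the filter at every step; the loop structure and function name are our own simplification:

```python
import numpy as np

def hoc_sequence(x, order=5):
    """Higher Order Crossings: zero-crossing counts D_1..D_order from
    repeated application of the difference filter (Eq. 4.10) to a
    zero-mean series, counting symbol changes as in Eqs. (4.8)-(4.9)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # enforce the zero-mean assumption
    counts = []
    for _ in range(order):
        z = (x >= 0).astype(int)           # clipped binary series, Eq. (4.8)
        counts.append(int(np.abs(np.diff(z)).sum()))  # D, Eq. (4.9)
        x = np.diff(x)                     # apply the high-pass filter
    return counts
```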
4.2.2 Spectral Analysis
4.2.2.1 Event Related Potential and Spectrogram
Event-related potential analysis is another commonly used method for EEG
pattern classification. The energy variation of the EEG signals between frequency ranges
is a direct indicator of the activation or deactivation of certain underlying neuro-networks,
or of an oscillation frequency shift of the EEG signals. In [50], it was shown
that event-related EEG oscillation pattern changes are expressed mostly in the spectral
components that reside within the Alpha (8−13 Hz) and Beta (13−30 Hz) bands.
Other bands, such as the Delta band (up to 4 Hz), contain mostly noise such as pulses,
neck movement and eye blinking. Beta waves are connected to an alert state of mind,
whereas Alpha waves are more dominant in a relaxed person.
Therefore, to construct the narrow-band energy feature set, the Alpha and Beta waves
(8−30 Hz) are extracted from the raw EEG time series first. Following this step, narrow-band
energy features were computed from the EEG signals by applying the FFT algorithm
to the whole duration of a sample (2.5 seconds). Frequency components within the
Alpha and Beta waves are further divided into 1 or 2 Hz sub-bands, and the energy within
each frequency band is calculated and used as a feature. The underlying assumption
is that EEG signals are stationary for the duration of each sample (2.5 seconds); this
assumption is also made by any other power spectrum calculation method, such as the
Short Term Fourier Transform (STFT). The use of narrow frequency bands reduces
the danger that frequency-specific effects go undetected.
The feature vector f_e for a given sample is then constructed by concatenating all the
power values of the 11 frequency bands for each electrode. The energy-based
feature vector f^j_{e,i} for a sample i and electrode j is thus

f^j_{e,i} = [E_1, E_2, \cdots, E_{11}],    (4.11)

f_{e,i} = [f^1_{e,i}, \cdots, f^j_{e,i}, \cdots, f^{N_e}_{e,i}] \quad \text{for } j = 1 \cdots N_e    (4.12)

where 11 is the number of frequency bands and N_e is the number of electrodes.
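The band-energy computation can be sketched as below; the 2 Hz sub-band grid is one plausible reading of the 11-band layout and is our assumption, not stated in the thesis:

```python
import numpy as np

def band_energies(x, fs, bands):
    """Energy per frequency band from the FFT power spectrum of one
    EEG epoch (assumed stationary over its duration)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin frequencies in Hz
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in bands])

# An assumed 2 Hz sub-band grid covering the 8-30 Hz range: 11 bands
sub_bands = [(f, f + 2) for f in range(8, 30, 2)]
```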
Figure 4.2: Power within sub-bands for F3 and F4 for the Negative, Positive and Calm states
4.2.3 Time-Spectral Analysis
4.2.3.1 Wavelet-based Features
Wavelet-based methods are a subset of time-frequency analysis methods. They overcome
the lack of event time-stamps associated with Fourier-based methods while providing an
in-depth view of the spectral domain. Murugappan et al. [66, 67] proposed a
new approach for the feature extraction process. In this approach, the EEG signals first go
through a zero-mean, unit-variance (standardization) process, and the resulting preprocessed
data are subjected to the discrete wavelet transform [67].
The Discrete Wavelet Transform (DWT) was applied individually to each channel.
Scale is related to frequency as shown in Table 3.2. Figure 4.3 shows sample
wavelet coefficients obtained during this process. Wavelet coefficients are calculated for
the frequencies of interest, 8−30 Hz, with the spectral resolution set to 1 Hz. Daubechies
fourth-order orthonormal bases (db4) were employed to calculate the wavelet coefficients
at the lth scale, C_X(l, n); the coefficients corresponding to the Alpha band (8−12 Hz)
and Beta band (13−30 Hz) were used to estimate the wavelet energy and
Figure 4.3: Discrete Wavelet Decomposition using db4 Wavelet
wavelet entropy, given by

ENG_l = \sum_{n=1}^{2^{S-l}-1} |C_X(l,n)|^2, \quad N = 2^S, \; 1 < l < S    (4.13)

ENT_l = -\sum_{n=1}^{2^{S-l}-1} |C_X(l,n)|^2 \log\left( |C_X(l,n)|^2 \right), \quad N = 2^S, \; 1 < l < S    (4.14)

The parameters of Equations (4.13) and (4.14) were used to form a feature vector f_w (i.e., f^j_{w,i} = [ENG_l, ENT_l]).
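A dependency-free sketch of the energy and entropy features follows; note that a hand-rolled Haar DWT stands in for the thesis's db4 basis purely to keep the example short (in practice pywt.wavedec with the 'db4' wavelet would be used), and the signal length is assumed divisible by 2^levels:

```python
import numpy as np

def dwt_energy_entropy(x, levels=3):
    """Per-scale wavelet energy (Eq. 4.13) and entropy (Eq. 4.14).
    Uses a Haar DWT for brevity; substitute a db4 decomposition
    (e.g. pywt.wavedec) to match the thesis exactly."""
    a = np.asarray(x, dtype=float)
    features = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # low-pass branch
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # high-pass branch
        p = detail ** 2
        energy = p.sum()                              # Eq. (4.13)
        entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))  # Eq. (4.14), log(0) guarded
        features.append((energy, entropy))
        a = approx                                    # recurse on the approximation
    return features
```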
4.2.4 Channel selection: single channel vs. multiple channels
Multi-channel recordings would intuitively produce more robust recognition rates, as
correlated inputs provide more, or higher-dimensional, information in the feature space.
However, a large number of electrodes (greater than 32 channels) is not always feasible for
many applications, especially those geared towards non-medical, commercial
uses such as augmented emotion communication, health services or behavioural
monitoring. It would be very beneficial for researchers and consumer electronics developers
to examine the feasibility of using a subset of the channels for affect detection and recognition
applications, for example with commercially available EEG headsets (e.g., the Emotiv
headset [1]).
One way to reduce the number of channels is based on previous results on the association
of affect with different regions of the brain. Numerous EEG studies [4], [21] generally
suggest that greater activation of the right frontal lobe accompanies the experience
of more negatively valenced emotions, whereas greater left frontal activation accompanies
more positively valenced experiences. Due to this asymmetrical neural
activity within the brain when an emotional state is present, our intuitive choice of
channels was those collected from the left and right frontal hemispheres. To test the
feasibility of commercially available headsets, the Emotiv EPOC Neuroheadset (Figure 4.4)
was chosen for our study. The Emotiv Software Development Kit for research [1] includes
a 14-channel (plus CMS/DRL references at the P3/P4 locations) high-resolution, wireless
neuro-signal acquisition and processing headset. The channel names, based on the International
10−20 locations, are: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4,
F8, AF4.
(a) Emotiv sensor layout (b) Emotiv EPOC Neuroheadset
Figure 4.4: Emotiv EPOC Neuroheadset
Due to differences between the recording devices and recording conditions for the
database used in Chapter 6, the number of channels is further reduced to the subset
shown in Table 4.1. Simulation results will be discussed in detail in Chapter 6.
4.3 Classification Methods
The choice of classifier is mostly determined by how well the classes separate in the
feature space. If the projected class samples are linearly separable, then linear classifiers
(e.g., SVM [84], LDA [24, 9]) are preferred; otherwise, non-linear classifiers such as K
Nearest Neighbours (KNN) are preferred. There is no 'best' classifier that fits all types of
features. Since we are going to evaluate the effectiveness of four rather different feature
extraction algorithms and the class separability is uncertain, the following three classifiers
were implemented and used to generate the simulation results in Chapter 6.

Table 4.1: EEG channels selected for analysis.

Device              Database        Sampling Rate   Channels
Biosemi Active 2    eNTERFACE06     1024 Hz         54
Emotiv EPOC         Self-collected  128 Hz          14
Channels selected for analysis: AF3, F7, F3, FC5, FC6, F4, F8, AF4
Given training data (y_i, \mathbf{x}_i) \in \{-1,+1\} \times \mathbb{R}^n, i = 1, \cdots, l, where y_i is the two-class
label and \mathbf{x}_i is the feature vector, some classification methods construct the following
decision function:

d(\mathbf{x}) \equiv \mathbf{w}^T \phi(\mathbf{x}) + b    (4.15)

where \mathbf{w} is the weight vector and b is an intercept, also called the bias. A non-linear classifier
maps each instance \mathbf{x} to a higher-dimensional vector \phi(\mathbf{x}) if the data are not linearly
separable. If \phi(\mathbf{x}) = \mathbf{x}, the data points are not mapped, and we say (4.15) is a linear classifier.
Because non-linear classifiers use more features, they generally outperform linear classifiers in
terms of prediction accuracy.
The projection of a test sample \mathbf{z} in the feature space is obtained as:

d(\mathbf{z}) = \mathbf{w}^T \phi(\mathbf{z}) + b    (4.16)

In this two-class estimation problem, \mathbf{z} is estimated to be from class C_1 if d(\mathbf{z}) < 0 and
from C_2 if d(\mathbf{z}) > 0.
4.3.1 Linear Discriminant Analysis
Since the EEG feature sets can be of very high dimensionality (thousands of features)
compared to the number of samples in the sets, there is always a linear boundary that
can completely separate training samples of the different classes [40]. Another advantage
of linear classifiers is that they give better-generalized solutions. LDA can also
provide probabilistic output if a decision fusion approach is implemented.
Training of LDA is carried out through scatter matrix analysis of the training samples.
LDA aims to derive the most discriminating features in the produced feature space based
on the maximization of the so-called Fisher's discriminant criterion [24]:
W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}    (4.17)

where S_B and S_W are the between-class and within-class scatter matrices:

S_B = \sum_{c=1}^{C} N_c (\bar{X}_c - \bar{X})(\bar{X}_c - \bar{X})^T

S_W = \sum_{c=1}^{C} \sum_{n=1}^{N_c} (X_{cn} - \bar{X}_c)(X_{cn} - \bar{X}_c)^T

In the definitions above, C is the number of classes, N_c is the number of training samples in
class c, X_{cn} is the nth sample of class c, \bar{X}_c = \frac{1}{N_c} \sum_{n=1}^{N_c} X_{cn} is the mean of class c,
and \bar{X} is the overall sample mean.
The projection of a test sample z into the feature space is obtained as:

L = W_{LDA}^T z    (4.18)
In the case of a small sample size, where the number of samples is less than twice the
dimension of the features, the LDA algorithm should not be applied directly [24]. Since LDA
is based on the sample covariance, the within-class scatter matrix S_W may become
singular and sparse. To resolve the singularity problem, a regularized version of the
LDA algorithm should be considered, in which a scaled identity matrix of the same size
is added to the sample covariance. This approach introduces a small bias into the final
results, but this is often tolerable [40]. A three-class classification example using LDA is shown
in Figure 4.5, where the lines are the decision boundaries.
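For the two-class case, the regularized criterion has a closed-form solution; the sketch below (function name and regularization constant are our own) adds the scaled identity to S_W before solving:

```python
import numpy as np

def regularized_lda_direction(X1, X2, reg=1e-3):
    """Fisher discriminant direction maximizing Eq. (4.17) for two
    classes. reg * I regularizes S_W against singularity when the
    feature dimension exceeds the number of samples."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter, summed over both classes
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    Sw += reg * np.eye(Sw.shape[0])        # regularization term
    w = np.linalg.solve(Sw, m1 - m2)       # closed form: w proportional to Sw^-1 (m1 - m2)
    return w / np.linalg.norm(w)
```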
Figure 4.5: Example of an LDA classifier: samples from three classes plotted against two feature variables, with lines marking the decision boundaries between the class regions
4.3.2 K Nearest Neighbours
K-Nearest Neighbours (kNN) [17] is an instance-based classifier in which the label for a
probe or new sample is determined by the labels of adjacent instances using a voting criterion
(typically a distance measure). For example, given a set of training samples with known
labels, to find the class membership of a new probe point we first
find the K neighbours that are closest to the new point using a chosen distance measure,
e.g., the Euclidean distance. When a tie between multiple points of the same distance occurs,
a tie-breaking method such as the majority rule is used. In
general, an odd number of neighbours (1, 3, 5, 7, 9) is picked to avoid ties. The output label
for the probe sample is the label of the majority of the K neighbours.
The number of neighbours, the parameter K, is very important in determining the final
performance of this classifier. By increasing the number of neighbours, the effect of
artefacts within classes is reduced, but the class boundaries are also enlarged, which can
potentially degrade the classification performance. Therefore, the final recognition
performance is most dependent on the class separation in the feature space. For any given
problem, a small value of K will lead to a large variance in predictions; alternatively,
setting K to a large value may lead to a large model bias. Thus, K should be set to
a value large enough to minimize the probability of misclassification and small enough
(with respect to the number of cases in the sample) that the K nearest points
are close enough to the query point. Like any smoothing parameter, there is an
optimal value of K that achieves the right trade-off between the bias and the variance of
the model.
The advantage of KNN is that it is easy to implement, and when the training set
is not too large it has low computational complexity. However, since KNN builds an
instance-based model that depends largely on the training samples in the set, it does
not generalize well and needs to be trained with samples that closely resemble the testing
samples.
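The voting procedure amounts to only a few lines; a minimal sketch with Euclidean distance and simple majority voting:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, z, k=3):
    """Label a probe point z by majority vote among its k nearest
    training samples under Euclidean distance (odd k avoids ties
    in the two-class case)."""
    X = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X - np.asarray(z, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```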
Figure 4.6: Examples of KNN Classifier
4.4 Summary
In this chapter, various feature analysis algorithms for EEG signal analysis in the affect
detection application were introduced and discussed in depth. These algorithms
were grouped into two main categories: time-domain oscillatory-pattern-based methods
and spectral-domain energy-based methods. Two commonly used classifiers, k Nearest
Neighbours (kNN) and Linear Discriminant Analysis (LDA), were presented and their
theoretical foundations were discussed.
Chapter 5
Empirical Mode Decomposition for
Emotion Classification
5.1 Introduction
With the previously stated feature extraction methods, several problems remain unsolved.
First of all, the EEG signal is non-linear (due to contamination by noise, both deterministic
and Gaussian white, and by stochastic artefacts) and non-stationary. As shown in
Table 5.1, to apply the previously discussed signal analysis methods, such as Fourier and
wavelet analysis, we usually assume the signal is stationary, or piecewise stationary (STFT).
Features extracted using such conventional methods do not preserve the non-linear
or non-stationary characteristics of the original signal. Also, for transient detection
applications, conventional (classical) signal processing methods like Fourier analysis do
not provide the correlation between the occurrence of an event and its time stamp.
In the research on Brain-Computer Interfaces and affective computing, we are still at
the very beginning of fully understanding the structure of EEG signals and the underlying
neuro-networks associated with emotion. In the application of understanding
human emotional expression through the Central Nervous System (CNS) (i.e., EEG), there is
no concrete evidence as to which channels and which frequency ranges are the most
representative for this machine learning, pattern classification problem.
The approaches presented in this chapter aim to answer the following two questions.
1. Through the use of a novel signal processing method, we decompose the EEG
recordings from different channels into instantaneous frequencies and, through the selection
of IMFs, draw conclusions as to which frequency range is the most affect-specific.
The challenges of frequency alignment and common scales faced by the IMFs
extracted using standard EMD are solved through the use of an extended version of
EMD, Multivariate EMD.
2. We rank the channels by significance for affect detection through the use of a Genetic
Algorithm. After this study, we will be able to define the most information-rich channels
for affect detection, which will provide key information for the design of consumer
electronics for affect-related applications. A detailed discussion of the GA is presented
towards the end of the chapter.
5.2 Empirical Mode Decomposition (EMD)
In recent years, the Hilbert-Huang Transform (HHT) [37], a time-frequency analysis
method resorting to the use of Instantaneous Frequency (IF), has been proven effective
in analyzing non-linear and non-stationary signals. To compute a meaningful IF from a
multi-component signal through the construction of an Analytic Signal (AS) or Hilbert
transform, we must first reduce this multi-component signal to a collection of mono-component
functions [38]. Multi-component refers to the case in which there are multiple
extrema between two consecutive zero-crossings of an oscillating signal, which indicates
the coexistence of multiple frequency components at a given time instant. In Section
5.2.2, we will see that this is the case with EEG signals (see Figure 5.3) and that EEG
signals are multi-component in nature. However, the elements of these coexisting
frequency components also vary with time, as is typical for non-stationary signals. This
is fundamentally different from Fourier-based analysis methods, where the amplitude
and frequency of the harmonics are fixed for the chosen time frame.
Table 5.1: Method comparison between Fourier, Wavelet and Hilbert-Huang Transform in data analysis

Property          Fourier                    Wavelet                    Hilbert-Huang
Basis             Harmonics with constant    Predefined basis, with     Adaptive basis based on
                  amplitude and frequency    scaling and translation    local properties; time-varying
                  for each component         applied                    amplitude and frequency
Linearity         Yes, superposition of      No                         No
                  harmonics
Time-spectral     Convolution; occurrence    Convolution; approximate   Phase differentiation;
correspondence    time stamp unavailable     time stamp                 precise time stamp
Presentation      Energy-frequency           Energy-time-frequency      Energy-time-frequency
In this new approach [37], Empirical Mode Decomposition (EMD) was presented and
used to decompose a multi-component signal into a set of mono-component signals. EMD
[36] is data-driven, and is therefore able to preserve the non-linear and non-stationary
properties of the signal. Compared to other time-frequency analysis algorithms, another
main advantage of the EMD algorithm is that the decomposition does not require
a predefined or a priori basis, unlike wavelet analysis or the FFT. Like wavelet methods, the EMD
method decomposes the original time series into a set of oscillatory modes, termed
Intrinsic Mode Functions (IMFs); but unlike wavelets, we are not restricted to a fixed,
predefined set of basis functions, for which it can be challenging to determine the most
appropriate choice for a particular analysis.
It is my objective to examine and evaluate the use of the HHT and EMD methods on a
non-linear and non-stationary signal, such as the EEG, in emotion (affect) detection
applications. However, EMD is still an evolving, empirically based signal processing
method that lacks a theoretical base, and many problems exist in practical
use, especially with signals such as EEG, which consist of multiple channels and are correlated
with neurological events. This chapter will first provide a brief explanation of the EMD
algorithm. Some of these application issues and constraints will then be discussed in detail,
and possible solutions presented as well.
5.2.1 Decomposition Procedure
EMD is an adaptive signal decomposition method with which any complicated signal
can be decomposed into a series of Intrinsic Mode Functions (IMFs). There are two main
criteria to meet during this decomposition procedure, as stated in the original paper
[29]:
1. For each extracted signal mode (IMF), the number of extrema and the number of
zero crossings must differ by at most one; in other words, each IMF should have no
riding waves.
2. The mean value of the envelopes defined by the local maxima and local minima
should be zero at any point; this indicates that the maxima and minima are located
symmetrically about the local mean (zero).
The first condition ensures the applicability of the Hilbert transform for calculating a
physically meaningful Instantaneous Frequency [38]; the application of EMD followed by
the Hilbert transform is often referred to as the Hilbert-Huang Transform (HHT). The detailed
procedure, also called sifting, of the EMD algorithm is shown below for a given time series
x(k), initializing x'(k) = x(k).

Algorithm 1. The standard EMD algorithm
1. Find the locations of all the extrema of x'(k).
2. Interpolate (using cubic spline interpolation) between all the minima (respectively
maxima) to obtain the lower signal envelope e_min(k) (respectively the upper envelope e_max(k)).
3. Compute the local mean m(k) = [e_min(k) + e_max(k)]/2.
4. Subtract the local mean from the signal to obtain the 'oscillatory mode' s(k) =
x'(k) − m(k).
5. If s(k) obeys the stopping criteria, define d(k) = s(k) as an IMF; otherwise
set x'(k) = s(k) and repeat the process from step 1.

The sifting process stops when the residue r(k) becomes a constant, a monotonic
function, or a function containing only a single extremum, from which no further IMF can be
extracted. If the data has a trend, the final residue is that trend.
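Algorithm 1 can be sketched as follows; for brevity, linear interpolation between extrema stands in for the cubic splines of step 2, and a crude bound on the local mean replaces the usual sifting stopping criteria:

```python
import numpy as np

def local_extrema(x):
    """Indices of the interior local maxima and minima of a series (step 1)."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] > x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] < x[i + 1])]
    return maxima, minima

def sift_imf(x, max_iter=50, tol=1e-3):
    """Extract one IMF via the sifting loop of Algorithm 1."""
    s = np.asarray(x, dtype=float).copy()
    k = np.arange(len(s))
    for _ in range(max_iter):
        maxima, minima = local_extrema(s)
        if len(maxima) < 2 or len(minima) < 2:
            break                                # too few extrema to build envelopes
        e_max = np.interp(k, maxima, s[maxima])  # upper envelope (step 2, linearized)
        e_min = np.interp(k, minima, s[minima])  # lower envelope
        m = (e_max + e_min) / 2.0                # local mean (step 3)
        if np.max(np.abs(m)) < tol:
            break                                # crude stopping rule
        s = s - m                                # step 4
    return s
```

A pure sinusoid is already an IMF, so sifting leaves it essentially unchanged; applying the same loop to the residue x − d(k) repeatedly would yield the full IMF set.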
A sample of the EMD decomposition outputs is shown in Figures 5.1 and 5.2; the
total number of IMFs is determined by the oscillatory complexity of the original signal
and the stopping criterion.
5.2.2 Hilbert-Huang Spectrum
In [38], it is shown that in order to obtain a physically meaningful Instantaneous Frequency
(IF) through the use of the analytic signal, the crucial and necessary condition is that the
signal be mono-component, locally zero-mean, and symmetric with respect
to the zero mean. In other words, the construction of the analytic signal for the
calculation of IF works only with mono-component signals, not with complex waves with
Figure 5.1: Sample EMD decomposition for Participant 2, Session 3, negatively excited
Figure 5.2: Sample EMD decomposition for Participant 2, Session 3, positively excited
riding waves. By applying EMD before the construction of the analytic signal and the
application of the Hilbert transform, we are able to extract meaningful instantaneous
frequencies for multi-component signals. More information on instantaneous frequency
can be found in [96].
First, let us define instantaneous frequency. Gabor [30] was the first to introduce the
complex analytic signal, which was later employed by Ville [93] to define the instantaneous
frequency as the time derivative of the phase of a signal. For a mathematical definition,
consider the analytic signal, whose imaginary part is the Hilbert transform \hat{x}(t) of the
signal x(t). Then we can write

z(t) = x(t) + j\hat{x}(t), \quad \hat{x}(t) = x(t) * \frac{1}{\pi t}    (5.1)

or, in exponential form,

z(t) = a(t) e^{j\Phi(t)}    (5.2)

where the amplitude a(t) and the phase \Phi(t) are defined as

a(t) = \sqrt{x^2(t) + \hat{x}^2(t)}, \quad \Phi(t) = \arctan\frac{\hat{x}(t)}{x(t)}    (5.3)

Therefore, the instantaneous frequency of the signal x(t) is

IF(t) = \frac{d\Phi(t)}{dt} = \frac{x(t)\dot{\hat{x}}(t) - \hat{x}(t)\dot{x}(t)}{x^2(t) + \hat{x}^2(t)}    (5.4)
However, there are also challenges in the interpretation of instantaneous frequency,
as discussed in [11].
From Figure 5.3 we see that there are multiple frequency components at any given time
instant. From this plot we can also see the 50 Hz power-line interference (the sample data
was recorded at the University of Zagreb, Department of Telecommunications, Croatia).
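Equation (5.4) is equivalent to differentiating the unwrapped phase of the analytic signal, which SciPy's hilbert makes direct; a minimal sketch (edge samples, where the Hilbert transform is least accurate, are simply retained):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """IF(t) of Eq. (5.4), computed as the derivative of the unwrapped
    phase of the analytic signal z(t) = x(t) + j*x_hat(t)."""
    z = hilbert(np.asarray(x, dtype=float))      # analytic signal, Eq. (5.1)
    phase = np.unwrap(np.angle(z))               # Phi(t), Eq. (5.2)
    return np.diff(phase) * fs / (2.0 * np.pi)   # dPhi/dt converted to Hz
```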
Figure 5.3: Hilbert-Huang spectrum of the instantaneous frequency between 0−70 Hz, showing the multi-component structure of an EEG signal (instantaneous frequency vs. number of samples)
5.2.3 Multivariate EMD
EEG signals typically comprise multiple channels, and although we can apply EMD
independently to each channel to extract the IMFs, and could potentially obtain the same
number of IMFs for each channel, there is one fundamental problem: since the
signals come from different sources (channels), the frequency content of a particular IMF
is unmatched across channels. This makes the interpretation of a particular
IMF (or subset of IMFs) across different channels or trials very difficult, and also
challenges the interpretation of the significance of the results. More details on the
challenges with multivariate inputs and the misalignment of instantaneous frequencies from
corresponding IMFs can be found in [78]. This prompts the adoption of the multivariate
version of EMD, which ensures the same number of IMFs across all variables
and resolves the IMF indexing problem.
Mode alignment in multivariate data corresponds to finding a set of common scales
or modes across different components (variables) of a multivariate signal, thus ensuring
that the IMFs are matched both in number and in scale properties. This is essential, since the same number of features must be produced across channels, trials, sessions and subjects; a feature extraction algorithm cannot operate on feature vectors of varying length. The final dimension of the feature vector is determined by the total number of IMFs and the specific feature extraction algorithm applied to each IMF. Therefore, if the number of IMFs varies from channel to channel or from trial to trial, it is extremely hard to define a common space for all samples. New samples are projected onto the feature space spanned by the features extracted from the IMFs.
To solve this problem, an augmented matrix of size M × N is used as input to the EMD feature extraction module, where M equals the number of electrodes per sample (54) multiplied by the total number of samples available, and N equals the signal length per channel (2.5 × fs). A sliding window of length 30 with an overlap of 10 is used to calculate the IMF components for each electrode, and the IMF extraction method is applied across all desired channels (electrodes).
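The augmented-matrix construction described above can be sketched as follows. The epoch count (20) and the random placeholder data are assumptions for illustration, and the window step of 20 samples reflects one reading of "size 30 with 10 overlapping":

```python
import numpy as np

# Hypothetical dimensions following the text: 54 electrodes per epoch and a
# 2.5 s epoch at fs = 1024 Hz; the epoch count (20) is an arbitrary stand-in.
fs = 1024
n_epochs, n_channels, n_time = 20, 54, int(2.5 * fs)
epochs = np.random.randn(n_epochs, n_channels, n_time)   # placeholder EEG

# Augmented matrix: M = (number of electrodes) x (number of epochs) rows,
# N = signal length per channel columns.
augmented = epochs.reshape(n_epochs * n_channels, n_time)

# Sliding windows of length 30; an overlap of 10 is read here as a step of 20.
windows = np.lib.stride_tricks.sliding_window_view(augmented, 30, axis=1)[:, ::20, :]
```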
Algorithm 2. Multivariate extension of EMD (MEMD)

1. Choose a suitable point set for sampling on an (n − 1)-sphere.

2. Calculate the projections {p^θk(t)}, t = 1..T, of the input signal {v(t)}, t = 1..T, along each direction vector x^θk, for all k (the whole set of direction vectors), giving {p^θk(t)}, k = 1..K, as the set of projections.

3. Find the time instants {t_i^θk} corresponding to the maxima of the set of projected signals.

4. Interpolate [t_i^θk, v(t_i^θk)] to obtain the multivariate envelope curves {e^θk(t)}, k = 1..K.

5. For a set of K direction vectors, the mean m(t) of the envelope curves is calculated as

m(t) = (1/K) Σ_{k=1}^{K} e^θk(t)    (5.5)
6. Extract the 'detail' d(t) using d(t) = x(t) − m(t). If d(t) fulfills the stoppage criterion for a multivariate IMF, apply the above procedure to x(t) − d(t); otherwise, apply it to d(t).
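Steps 1-2 of Algorithm 2 can be sketched as below. As a simplification, normalized Gaussian sampling stands in for the low-discrepancy (e.g., Hammersley) point sets usually used for MEMD direction vectors, and the 4-channel sinusoidal signal is purely illustrative:

```python
import numpy as np

def direction_vectors(n_dim, n_dirs, seed=0):
    """Step 1: quasi-uniform directions on the (n_dim - 1)-sphere.
    Normalized Gaussian sampling is a simple stand-in for the
    low-discrepancy point sets used in the MEMD literature."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_dirs, n_dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def project(signal, dirs):
    """Step 2: projections p_k(t) of the T x n signal along each direction."""
    return signal @ dirs.T   # shape (T, K)

T, n, K = 512, 4, 8                      # 4-channel toy signal, 8 directions
t = np.linspace(0, 1, T)
v = np.stack([np.sin(2 * np.pi * (5 + c) * t) for c in range(n)], axis=1)
P = project(v, direction_vectors(n, K))
```

Steps 3-6 would then locate the extrema of each projection, spline-interpolate the multivariate envelopes, and sift on the envelope mean.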
5.3 Signal Reconstruction using MEMD as a Filter Bank
Flandrin et al. [29] showed in 2004 that EMD behaves as a filter bank, and Rehman [92] later studied the filter bank property of MEMD on EEG signals. The filtering process is carried out based on the instantaneous frequency of the signal at each time instant, without requiring information from other time instants. For a multicomponent signal in which multiple intrinsic frequencies are superimposed at each time instant, EMD decomposes the signal into a series of IMFs, each with its own instantaneous frequency. This is very different from conventional filtering: the filtered result is adaptive and can separate signal components that overlap in both time and frequency, mainly because the result is not shaped by an a priori set of basis functions and no convolution procedure is required. In other methods, such as wavelet analysis, a predetermined basis is convolved with the multicomponent signal in time, and to refine the frequency resolution the 'mother' wavelet is rescaled at each decomposition level (expansion by a factor of α in time results in a 1/α change in the frequency domain).
MEMD is an analysis method that in many respects gives a better understanding of the physics behind the signals [38]. Because of its ability to describe short-time changes in frequency that cannot be resolved by Fourier spectral analysis, it can be used for nonlinear and nonstationary time series analysis. Each extracted signal admits a well-defined instantaneous frequency (see Figure 5.6).

Figure 5.4: The filter bank property of regular EMD (power spectral densities of the IMFs on a log scale)

Figure 5.5: The filter bank property of MEMD (power spectral densities of the IMFs on a log scale)

Due to the above stated properties,
MEMD can be used effectively as a filter bank to extract frequency components of interest
for EEG signal analysis. As stated in Section 3.2, physiological signals such as EEG are very noisy, and denoising them is one of the most important steps. An effective filtering process enables a better understanding of the underlying physiological processes and makes the intrinsic characteristics or sources more accessible.
Figure 5.6: Instantaneous amplitude and averaged frequency of the IMFs (the eleven IMFs have averaged instantaneous frequencies of 166.15, 107.17, 64.12, 25.05, 14.93, 8.67, 6.95, 5.51, 4.81, 4.52 and 8.09 Hz)
We now consider a unique reconstruction method based on the Hilbert-Huang spectrum, which we refer to as Hilbert-Huang (HH) reconstruction. In a recent paper, Tomasz Rutkowski [82] suggested selecting segments of IMFs based on their instantaneous frequency components, as shown on the Hilbert spectrum. Typically, we have a frequency band of interest plus a margin (for example, 10%); we then label the time intervals at which such frequencies occur in order to reconstruct the EEG signals, or to isolate the regions of interest of the IMFs for further analysis.
Given a signal d(k), we propose to remove any unwanted frequency information and construct a signal d̂(k) that retains only the desired frequency characteristics of d(k). This is achieved by first decomposing d(k) into a set of N IMFs, c_i(k), and determining their instantaneous frequencies, where f_i(k) denotes the instantaneous frequency of the i-th IMF at time instant k. For the scenario where it is required to retain frequencies greater than f_low and lower than f_high, we have

ĉ_i(k) = { c_i(k), if f_low < f_i(k) < f_high;  0, otherwise }    (5.6)

Essentially, all values of c_i(k) that do not fall within the desired frequency range are set to zero. We can then construct d̂(k) by summing the IMF values that fall within the desired frequency range, to obtain

d̂(k) = Σ_{i=1}^{N} ĉ_i(k)    (5.7)
One objective of this thesis is to examine the use of MEMD as a filter bank to eliminate the effects of noise and artefacts. While this approach has been effective, simply summing the IMF components that fall within the desired frequency band may introduce discontinuities or spurious effects into the reconstructed EEG. An alternative solution has been proposed in [61], where a weight matrix is used to optimize the selection of IMF components and maintain the continuity of the background components.
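A minimal sketch of the HH reconstruction of Eqs. (5.6)-(5.7), assuming the IMFs are already available; two synthetic tones stand in for real IMFs here:

```python
import numpy as np
from scipy.signal import hilbert

def hh_reconstruct(imfs, fs, f_low, f_high):
    """Keep each IMF sample only where its instantaneous frequency lies
    inside (f_low, f_high), then sum the masked IMFs (Eqs. 5.6-5.7)."""
    rec = np.zeros(imfs.shape[1])
    for c in imfs:
        phase = np.unwrap(np.angle(hilbert(c)))
        f_inst = np.gradient(phase) * fs / (2 * np.pi)   # IF in Hz
        mask = (f_inst > f_low) & (f_inst < f_high)      # Eq. (5.6)
        rec += np.where(mask, c, 0.0)                    # Eq. (5.7)
    return rec

# Two synthetic 'IMFs': a 6 Hz alpha-like tone and a 40 Hz gamma-like tone.
fs = 512.0
t = np.arange(0, 2, 1 / fs)
imfs = np.vstack([np.sin(2 * np.pi * 6 * t), 0.5 * np.sin(2 * np.pi * 40 * t)])
rec = hh_reconstruct(imfs, fs, 1.0, 12.0)   # retain only the 6 Hz component
```

Away from the window edges, the reconstruction retains the 6 Hz tone and suppresses the 40 Hz one entirely.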
5.3.1 MEMD for Feature Extraction
In [72], Petrantonakis presented the relation between the IMFs and the elicited emotion present in the EEG signal. In that study, by using a genetic algorithm and the
Figure 5.7: MEMD for signal reconstruction and feature analysis (initial signal → MEMD IMF extraction → IMF selection → signal reconstruction via MEMD filtering → feature analysis with statistical, narrow-band, wavelet and HOC features → feature vector → classification with LDA/kNN → affect recognition rate for positively excited, negatively excited and neutral)
fractal dimension fitness function stated in [73], 3-5 IMFs were selected for each channel. We adapt the feature extraction method presented in [72] for our own analysis, which will be discussed in Chapter 6.
5.4 Genetic Algorithm
Due to the large number of IMFs produced by the decomposition, together with the large number of EEG channels, there is considerable redundancy in features extracted directly from all IMFs. The high dimension of the feature space causes two problems: first, computational complexity; and second, a very large number of samples is required to produce a meaningful statistical model [23]. Since we have a small number of observations compared to the dimension of the features, and since it is uncertain which electrodes provide more information for discriminating one emotion class from another, a genetic algorithm is applied to reduce the number of channels and the number of IMFs used for feature extraction. The objective is to use the GA on the collected features to reduce the feature dimension, discover the main class-specific features, and boost classification performance.
Figure 5.8: The block diagram for the genetic algorithm (EEG features form the initial population; fitness calculation, crossover and mutation produce an updated population, and the loop repeats until evolution stops and the optimal features are returned)
The genetic algorithm is a non-ranking, global optimization algorithm that was introduced in [31]. The optimization process mimics natural selection within a large population. It iteratively modifies (mutates) a population of individuals (variables of the feature space) to maximize a fitness criterion. At each step, the genetic algorithm selects the best individuals, which are then used to generate offspring (with set crossover rates). Over successive generations, the population evolves towards an optimal solution. The algorithm terminates when the maximum number of generations is reached; this maximum is typically chosen as the point at which the fitness criterion (or performance) stabilizes.
Three main parameters govern the process at each step (see Figure 5.8):

1. Selection of the individuals (parents) used to generate the next generation

2. Crossover rules, applied to combine two parents to form children for the next generation

3. Mutation rules, which apply random changes to individual parents to form children
Related papers [86, 89] have shown that genetic algorithms, as non-ranking methods, can be successfully applied to feature selection. When ranking methods such as Principal Component Analysis (PCA) are used, the chosen feature vectors can contain features that are correlated with each other while contributing little new information to the classifier. When the correct classification rate is used as the fitness measure, the genetic algorithm chooses the features that are most representative of the class labels. Another major advantage of the genetic algorithm is that features and electrodes can be represented in binary form for feature reduction, which differs greatly from conventional ranking-based methods such as PCA or Linear Discriminant Analysis (LDA).
On the other hand, the main disadvantage is that the fitness criterion is evaluated on a large number of candidate solutions (the population size at each generation times the number of generations), which can take a long time to compute. Also, due to the randomness of mutation and crossover, each run of the genetic algorithm produces a slightly different set of features, and since the features are not ranked, we generally do not know which feature is more significant in the subsequent classification. Therefore, to alleviate these two shortcomings and to learn the importance of each feature, multiple runs of the genetic algorithm are performed; the features that appear most frequently are considered more significant and are selected in the channel reduction process. This also enables us to determine where to place the electrodes and which frequencies of the EEG spectrum are the most important.
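A toy version of this GA-based channel selection, with leave-one-out 1-NN accuracy as the classification-rate fitness; the synthetic data (only 2 of 10 "channels" informative), population size and rates are all illustrative assumptions, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data standing in for EEG features: 60 samples, 10 'channels',
# only channels 0 and 1 carry class information (a hypothetical setup).
X = rng.standard_normal((60, 10))
y = np.repeat([0, 1], 30)
X[y == 1, :2] += 4.0

def fitness(mask):
    """Leave-one-out 1-NN accuracy on the selected channels."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    d = np.linalg.norm(Xs[:, None] - Xs[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return float(np.mean(y[d.argmin(axis=1)] == y))

def ga_select(n_genes=10, pop_size=20, n_gen=15, p_mut=0.05):
    pop = (rng.random((pop_size, n_genes)) < 0.5).astype(int)
    pop[0] = 1                                    # seed the all-channels mask
    for _ in range(n_gen):
        fit = np.array([fitness(m) for m in pop])
        parents = pop[fit.argsort()[::-1][: pop_size // 2]]  # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_genes)        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n_genes) < p_mut] ^= 1   # mutation
            children.append(child)
        pop = np.vstack([parents, children])      # parents survive (elitism)
    fit = np.array([fitness(m) for m in pop])
    return pop[fit.argmax()], float(fit.max())

best_mask, best_fit = ga_select()
```

Because the all-ones chromosome is seeded into the initial population and elites always survive, the GA can never do worse than using every channel.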
5.4.1 Fitness Functions
The fitness function acts as a selection criterion that preserves certain characteristics of the EEG signal during feature reduction. Previously, measures that preserve such characteristics, for instance energy (event-related potential) and complexity (fractal dimension), have been used in the literature. One promising fitness function is based on the fractal dimension, a measure of the irregularity of a curve; fractal dimension methods have been used, for example, in the analysis of epileptic seizures from EEG [75]. However, such fitness functions take no account of the association between features and class labels, or of anything beyond each individual sample. Since the objective of this study is to find the most discriminating channels and frequency ranges, a different fitness function has to be considered: here, the correct classification rate is used as the fitness measure for the GA.
The aim of the genetic algorithm is to maximize the fitness function. After the IMFs have been chosen, a new EEG signal is constructed from the selected IMFs alone via the inverse operation of the EMD algorithm. Figure 5.8 illustrates the GA feature selection process on an EEG signal corresponding to the positively excited emotion.
5.5 Summary
In this chapter, a novel time-frequency signal processing method, the Hilbert-Huang Transform (HHT), was introduced. This method is useful for calculating the instantaneous frequencies of nonlinear and nonstationary signals (e.g., EEG) through the combined use of the Hilbert transform and Empirical Mode Decomposition (EMD). Like wavelet methods, EMD decomposes the original time series into a set of oscillatory modes, termed Intrinsic Mode Functions (IMFs); unlike wavelets, however, it is not restricted to a fixed, predefined basis, for which determining the most appropriate choice for a particular analysis can be challenging. HHT is completely data driven and does not assume the signal to be stationary or piecewise stationary, assumptions required by conventional signal processing methods (e.g., Fourier-based methods) but violated by most biomedical signals.
To obtain a unified understanding of the individual modes and frequency scales across all EEG channels, an extended, multivariate version of Empirical Mode Decomposition (MEMD) was investigated and applied to all EEG channels. The MEMD algorithm was further applied as a filter bank to extract the EEG components of interest (alpha and beta waves) and to reconstruct the EEG signals used for feature analysis in Chapter 6.
To reduce the computational complexity of the EEG analysis system and to better understand the association between EEG channels and affect expression, a Genetic Algorithm (GA) was researched and applied to extract the most affect-specific channels from a much larger set of channels (54). The GA is a non-ranking global optimization method that maximizes a chosen optimization criterion; for this study, the correct classification rate was used as the fitness function. Its effectiveness will be discussed further in Chapter 6 through a set of simulation results.
Chapter 6

Experimental Setup and Simulation Results
6.1 Introduction
In this chapter, a series of simulation tests on emotion (affect) detection using EEG signals is presented and discussed. The focus of the simulations is multifold. Due to the complex nature of human emotions and the non-unique mapping between emotion expression and CNS signals, four feature analysis algorithms, emphasizing either the event-related potential or the oscillation pattern variation of the EEG signal, were implemented and tested to better understand the emotion-specific characteristics of EEG. A publicly available three-class EEG emotion dataset is used to evaluate the efficacy of these features. The experimental protocol and related components are discussed and evaluated in order to determine appropriate simulation parameters. Classification performance using these features is presented and discussed towards the end of the chapter, along with application limitations.
Figure 6.1: Experimental components used for the simulations (raw EEG signal: 3 emotions × number of trials × 54 channels, with 30 trials each for positive, neutral and negative per session → multivariate EMD → genetic algorithm for channel and IMF selection → EEG reconstruction → feature analysis → classification with kNN and LDA → emotions; the loop repeats until the best performance is reached)

Given the large number of electrodes present in many research works, augmented features directly extracted from each channel provide much redundancy in discriminating
information and are also unnecessarily high in dimension. The resulting high-dimensional feature space requires a very large number of samples to produce a meaningful statistical model [23], which poses a great challenge to the experimental process. To solve this problem, a genetic algorithm, as a global optimizer, is applied to systematically reduce the number of channels and the number of IMFs required for feature construction. The outcome of this study provides two key pieces of information for understanding the relationship between emotion and brain waves: the locations of the most emotion-specific channels (discriminating power), and the frequency range of the emotion-specific brain waves (instantaneous frequency analysis of the IMFs). This provides a means to compare and validate research findings from the psychophysiological literature and to aid the study of human emotion. Practical constraints, such as the minimum required window length in the time domain and the edge effects of the windowing operation, will also be discussed.
6.2 Data Collection Protocol
The choice of emotions to be detected depends largely on the specific application. For example, detecting disgust is not as important as detecting frustration in a learning setting, while distinguishing positive feelings from negative ones is sufficient as feedback on the service a customer received. In the context of multimedia content indexing and retrieval, a larger set of emotion labels is needed to provide fast and meaningful information.
For this study, we have focused on the recognition of three affective states, for the following reasons. According to the circumplex model of emotion (see Section 2.2) and findings in the psychology literature [3, 58, 22], high arousal can be interpreted as high motivation; high valence means the current situation is pleasant and approachable, whereas low valence means it is unpleasant and to be avoided. Hence, in the domains of learning, decision making and behavior monitoring, three emotional states play the most significant role: we would like to know when a person is happy ('positively excited'), frustrated ('negatively excited'), or bored ('calm'). These three affect states therefore provide critical and helpful information for the affect-sensitive applications shown in Section 1.3.
Figure 6.2: The three emotion classes studied in this project, shown on the circumplex model (activation-deactivation and pleasant-unpleasant axes; the excited, calm, positive and negative regions cover states such as tense, nervous, stressed, upset, sad, depressed, lethargic, fatigued, calm, relaxed, serene, contented, happy, elated, excited and alert)
To test the performance of the designed system, we make use of a publicly available database (Emobrain) that was recorded during the eNTERFACE'06 workshop and was specifically designed for detecting the above three affective states. The eNTERFACE'06 EMOBRAIN database [2] was collected by a research group during a summer workshop on multimodal interfaces in 2006 at the University of Zagreb, Dubrovnik, Croatia. The objective of the database was to provide a common framework for emotion assessment from multimodal physiological signals. It consists of emotionally-driven physiological signals from both the peripheral (galvanic skin response, respiration and blood volume pressure) and central (EEG and frontal fNIRS) nervous systems. Since our study focuses only on brainwaves, we used only the EEG recordings. More details can be found in [85].
EEG data were collected from 5 participants, aged 22-38, over three different sessions with 30 trials per session. The experimental protocol is detailed in Figure 6.3.

Figure 6.3: Protocol description for the eNTERFACE'06 EMOBRAIN database

For
each session, participants were stimulated with images selected from the International Affective Picture System (IAPS) [56]. The images were divided into three categories, exciting negative, neutral and exciting positive, based on their valence and arousal scores using the thresholds shown below. These thresholds were empirically defined according to the circumplex model described in Section 2.2.2. Each trial consists of a block of five images selected from the same affect class, to ensure stability of the emotion over time. Each picture was displayed on the screen for 2.5 seconds, leading to a total of 12.5 seconds per block. Blocks of different classes were displayed in random order to avoid participant habituation. The total number of observations obtained was 5 × 3 × 30 = 450.
calm: arousal < 4 and 4 < valence < 6

positive exciting: valence > 6.8, Var(valence) < 2 and arousal > 5

negative exciting: valence < 3 and arousal > 5

This selection resulted in 106, 71, and 150 pictures, respectively, for the above three
classes, as shown in Figure 6.4.
Figure 6.4: Selected IAPS images for the three-class emotion elicitation experiment (valence scores vs. arousal scores for the positively excited, negatively excited and calm classes)
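The class thresholds above translate directly into boolean masks; the uniform random ratings below are placeholders for illustration, not the actual IAPS scores:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical IAPS-style ratings on a 1-9 scale, plus a per-image valence
# variance column (all stand-ins for the real IAPS data).
valence = rng.uniform(1, 9, 1000)
arousal = rng.uniform(1, 9, 1000)
valence_var = rng.uniform(0, 4, 1000)

calm = (arousal < 4) & (valence > 4) & (valence < 6)
positive = (valence > 6.8) & (valence_var < 2) & (arousal > 5)
negative = (valence < 3) & (arousal > 5)
```

Note that the three masks are mutually exclusive by construction, so no image can receive two labels.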
6.2.1 Recording Device: Biosemi Active 2
EEG signals were recorded using the Biosemi ActiveTwo system (shown in Figure 6.5), which has 64 surface electrodes sampled at 1024 Hz. Recorded data are saved in EDF format and can be converted to other formats such as CSV or text files. However, due to the parallel recording of fNIRS signals, ten frontal electrodes were removed, and the final EEG recordings consist of signals from only 54 electrodes (64 minus the following ten frontal electrodes, removed because of the simultaneous placement of the fNIRS sensors: F5, F8, AF7, AF8, AFz, Fp1, Fp2, Fpz, F7, F6).
Figure 6.5: Biosemi ActiveTwo: (a) sensor layout; (b) the Biosemi ActiveTwo system
6.2.2 Ground Truth Definition
Emotion is known to be very subjective and dependent on social context and previous experience [80]. However, emotion consistency across participants is important when designing a generalized emotion recognition system. One can never be sure that a person feels the emotion intended by the pictures; a self-assessment gives a good estimate of whether the pictures evoked similar emotions among participants.
To deal with the problem of emotion consistency across participants (i.e., two subjects may experience very different feelings during the same stimulus), participants were also asked to self-assess their emotions on a simplified version of the SAM (Self-Assessment Manikin) scale. Defining a ground truth for emotion assessment depends strongly on the protocol used to record emotional reactions. Since self-evaluations were collected, the ground truth can be defined either from the classes given by the IAPS evaluations or from the self-evaluations; the ground-truth analysis can be found in the original database paper [64]. Self-assessment of the images is a good way to estimate whether the desired emotion was induced in the subject, and also gives an idea of the subject's level of emotional stimulation. For this experiment, participants were asked to rank the arousal and valence components of their emotions on a scale of 1-5, using the simplified SAM scale [64]. Since the SAM scores were obtained after the projection of 5 images from the same class, a new set of IAPS scores was computed as the mean of the IAPS scores of the 5 images in that trial.
6.2.3 Ground Truth Validation Using Pearson Correlation Coefficients
To understand the relationship between the self-assessments and the IAPS scores, we calculated the Pearson correlation coefficients between the two sets of scores for each of the two variables, valence and arousal. The averaged Pearson correlation coefficient between the IAPS and SAM scores is 0.754 for the valence dimension and 0.817 for the arousal dimension. These values show that the correspondence between the expected and experienced emotions is very good, and that the images evoke the desired emotion most of the time.
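The Pearson coefficient used here can be computed directly (scipy.stats.pearsonr gives the same value); the score vectors below are made-up illustrations, not the study's ratings:

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two score vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Toy IAPS vs. SAM valence scores (hypothetical numbers).
iaps = [6.9, 2.4, 5.1, 7.2, 2.8, 5.0]
sam = [5.0, 1.0, 3.0, 5.0, 2.0, 3.0]
r = pearson_r(iaps, sam)
```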
Table 6.1: Pearson correlation coefficients between IAPS scores and self-assessments, per participant

Participant   Correlation (Valence)   Correlation (Arousal)
P1            0.6994                  0.9387
P2            0.6286                  0.8628
P3            0.6550                  0.6583
P4            0.9816                  0.8533
P5            0.8057                  0.7716
Further examination of the self-assessment inputs from Participants 2 and 5 showed that their valence and arousal inputs in each trial were mostly equal, with a variance of around 1 between the two variables. It can be a very hard task to quantize one's emotion according to the 2D model; for example, some people tend to give extreme scores while others always choose the center. This does not mean that the participants did not experience the emotion; it simply implies that they had trouble 'expressing' it (the correct recognition rate indicates the existence of an emotional state). From an application point of view, self-assessment inputs may not always be obtainable, for example from people with autism spectrum disorder. For these reasons, we used the IAPS scores for the final labeling of the data.
6.2.4 Ground Truth Validation Using Confusion Matrix
To validate the collected EEG database using the self-assessed values, we obtained a confusion matrix for each participant by comparing the IAPS label of each sample against the label obtained from self-assessment. We then compared the averaged self-assessment scores of all 5 participants against the IAPS labels; the results are shown in the confusion matrix of Table 6.2.
Table 6.2: Averaged self-assessment classification accuracy (percent) for the three chosen emotions

Emotions        Calm    Pos. Excited   Neg. Excited
Calm            73.43   23.81          2.76
Pos. Excited    36.96   60.18          2.86
Neg. Excited    18.91   0.71           80.38
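A row-normalized confusion matrix of this kind can be computed as follows; the toy label vectors are illustrative only:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent: rows are the intended
    (IAPS) labels, columns the self-assessed labels."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100 * cm / cm.sum(axis=1, keepdims=True)

# Toy labels (0 = calm, 1 = pos. excited, 2 = neg. excited).
iaps = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
sam = [0, 0, 0, 1, 1, 1, 0, 2, 2, 0]
cm = confusion_matrix(iaps, sam, 3)
```

Each row sums to 100, so the diagonal entries read directly as per-class accuracies.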
6.3 Feature Extraction
Among the recognition rates reported in the affect detection literature, the large variation in experimental protocols means there is no clear evidence as to which characteristics of EEG best represent affect. In general, however, the approaches all focus on the event-related potential and on oscillation pattern variation. To examine the effectiveness of each feature extraction method and allow a fair comparison, several state-of-the-art feature extraction algorithms (see Section 4.2 for details) were implemented and used to construct the feature sets for affect classification.
To extract the HOC features, we further examined the parameter L (the order) by sweeping L = 3-30; the optimal L (highest classification rate) was then used for the simulations in the later sections:
Figure 6.6: HOC order vs. correct recognition rate (subject-independent recognition rate using HOC and 54 channels)
As shown in the plot, L = 9 provides the highest correct classification rate; this is consistent with the findings reported in [71]. For the rest of the analysis using HOC features, we set the order to 9.
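A minimal sketch of HOC feature extraction as described in Section 4.2: apply the backward-difference operator repeatedly and count the sign changes at each order. The test signal is an assumed toy example; repeated differencing acts as a high-pass filter, so the counts grow with the order:

```python
import numpy as np

def hoc_features(x, order):
    """Higher-order crossings: the count of sign changes of the signal
    after applying the difference operator k = 0 .. order-1 times."""
    d = np.asarray(x, dtype=float)
    counts = []
    for _ in range(order):
        signs = np.signbit(d).astype(np.int8)
        counts.append(int(np.sum(np.diff(signs) != 0)))
        d = np.diff(d)                   # next-order difference
    return counts

# Toy signal (an assumption, not study data): a 4 Hz oscillation plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 4 * t) + 0.05 * rng.standard_normal(512)
hoc = hoc_features(x, 9)
```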
Figure 6.7: HOC order vs. correct recognition rate (subject-independent recognition rate using HOC and 6 channels)

Each recorded sample is 2.5 seconds long and was recorded during the image projection period for one of the three affect states. For each sample, four sets of features were generated from the MEMD-reconstructed EEG signal of each channel. Table 6.3 shows the dimension of each feature type, where Ne is the number of channels used in each analysis.
Table 6.3: The features extracted from the EEG signals

Type of features              Size         Reference
Statistical (f_st)            6 × Ne       4.2.1.1
Narrow-band energy            11 × Ne      4.2.2.1
Higher-order crossings        9 × Ne       4.2.1.2
Wavelet energy and entropy    2 × 3 × Ne   4.2.3.1
6.4 Data Splitting and K Cross Validation
For the selected Emobrain database, there were 30 trials per session and three sessions in total for each subject. Table 6.4 shows the total number of samples used in this study. Samples from subject 1, session 1 were excluded because of a different sampling rate and because of missing records of the IAPS images used in that session: the image listing file associated with the recording was a duplicate of the session 2 file, which raises questions about the legitimacy of those samples. Samples from subject 2, session 1 were also excluded due to inconsistency in the recording settings compared with the other trials.
Table 6.4: Overview of the dataset

Subject   Used sessions    Total trials   Pos. trials   Neg. trials
S1        sess. 2, 3       300            100           100
S2        sess. 2, 3       300            100           100
S3        sess. 1, 2, 3    450            150           150
S4        sess. 1, 2, 3    450            150           150
S5        sess. 1, 2, 3    450            150           150
In this supervised machine learning process, as shown in the system diagram, two stages (training and testing) are involved in obtaining the desired statistical model for classification. The training samples are used to fit the model under a fitness criterion (e.g., minimizing the probability of error), and the testing samples are used to test how well the model holds on unseen samples. There are a few ways to obtain the training and testing samples. Ideally, the testing samples should be recorded in a separate session under the conditions in which the developed model is intended to be used. However, such multi-session recordings at different times or locations can introduce severe data inconsistencies (e.g., through hardware setup) that are irrelevant to model development and unnecessary in a preliminary design. One way around this is to divide the existing dataset randomly into two disjoint groups for training and testing, where the testing data are used only once to evaluate the model; this is usually referred to as a hold-out test.
6.4.1 k-Fold Cross Validation
Hold-out testing provides an unbiased measure of performance; however, when dealing with a small dataset, the k-fold cross-validation process can be helpful. K-fold cross-validation builds on the idea of hold-out testing by rotating data through the process. The data is again divided randomly, but now into k equal-sized groups. The train-test process is repeated k times, each time leaving out a different segment of the data as the test set.
Typical values of k range from 5 to 10, with k = 10 being a common choice, resulting in 10-fold cross-validation. In 10-fold cross-validation, the observations are randomly assigned to 10 groups. Ten separate models are built, each tested on a distinct data segment. The resulting 10 performance measures are unbiased, since no model is evaluated on data that was used during its training. The single, final performance measurement is taken as the mean of these 10 measures.

For our study, due to the small dataset constraint, the k-fold cross-validation process was used to obtain the simulation results shown in the sections below.
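The fold rotation described above can be sketched as follows. This is a minimal illustration of the index bookkeeping only, not the code used in the thesis:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly assign n_samples to k roughly equal folds and yield
    (train_idx, test_idx) pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Each sample appears in exactly one test fold across the k rotations.
n, k = 25, 5
seen = np.concatenate([test for _, test in k_fold_indices(n, k)])
print(sorted(seen) == list(range(n)))   # True
```

Training a model on `train_idx` and scoring it on `test_idx` in each iteration, then averaging the k scores, gives the cross-validated performance estimate.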
6.5 Simulation Results
This section provides the simulation results on affect detection using EEG. In general, two sets of results are presented: subject-specific recognition rates and cross-subject recognition rates. Subject-specific refers to the case where the training and testing samples come from the same subject, with an unseen portion of the samples (testing) used to validate the overall performance. Cross-subject refers to the case where observations from different subjects over various sessions were concatenated into one augmented matrix; a portion of this matrix was used as training samples and the remaining portion was used for testing, following the k-fold validation process described above. The recognition rates were generated using features without feature reduction.
6.5.1 Simulation Results Using All Channels
Since the EEG feature sets are of very high dimensionality (thousands of features) compared to the number of samples in the sets (450 per trial or 2250 per image, depending on the classification scheme), there is almost always a linear boundary that can completely separate the training samples of the different classes. Another advantage of linear classifiers is that they tend to give better generalized solutions. However, in the high-dimensional EEG feature space, an exponentially larger sample size is required for a meaningful statistical analysis [23]. Linear classifiers such as Linear Discriminant Analysis (LDA) run into the singularity problem on a small sample size, and feature reduction mechanisms are needed to cope with the sparsity of the samples in the high-dimensional feature space. The choice of feature reduction method can greatly affect the recognition rate later on, which makes it harder to compare how effectively the selected features represent the variations of EEG signals under emotion stimuli. For the first stage of simulation testing, we therefore opted for a non-linear classification method without feature reduction, to avoid error propagation from such feature reduction methods. In the cases where the feature dimension is comparable to the number of available samples, LDA was also applied to test the feasibility of a linear classifier.
A k-Nearest-Neighbours (kNN) classifier with the Euclidean distance metric was used here, and odd numbers of neighbours (1, 3, 5, 7, 9) were picked to avoid ties; similar results would be obtained with even numbers when a majority rule is used to break ties between nearest points. kNN is an instance-based method. Increasing the number of neighbours reduces the effect of artefacts within classes, but also enlarges the boundary region between classes, which can degrade classification performance. The final recognition performance therefore depends mostly on the class separation in the feature space.
A 5-fold cross-validation process was carried out to test the robustness of our system and also to mitigate the small sample size; in each fold, 80% of the samples were used for training and 20% for testing. The simulation results shown in Table 6.5 were generated using both the kNN and LDA classifiers on the specified features, without applying feature reduction algorithms.
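A sketch of this evaluation setup is shown below, assuming scikit-learn is available; the feature matrix is synthetic and merely stands in for the extracted EEG features:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Synthetic stand-ins for the EEG feature vectors of three affect classes.
X = np.vstack([rng.normal(loc=c, size=(60, 20)) for c in (-1.0, 0.0, 1.0)])
y = np.repeat([0, 1, 2], 60)

# 5-fold CV: each fold uses 80% of the samples for training, 20% for testing.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for name, clf in [("5NN", KNeighborsClassifier(n_neighbors=5, metric="euclidean")),
                  ("LDA", LinearDiscriminantAnalysis())]:
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

With real EEG features the two classifiers behave quite differently, as the tables in this section show; on this toy, well-separated data both reach high accuracy.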
6.5.1.1 Subject-Specific Emotion Recognition
The following table shows the recognition rates using all the electrodes.
Table 6.5: Emotion Recognition rates using ALL 54 electrodes and 5NN
Participants Statistical Narrow-bands Power HOC Wavelet
P1 87.67 82.67 96.33 62.67
P2 88.67 90.00 97.00 91.67
P3 86.22 84.22 93.33 75.11
P4 88.00 93.33 93.33 88.22
P5 83.56 88.22 97.78 76.67
Based on the above results, we performed a sensitivity test on the kNN classifier. We examined how well kNN performs with a varying number of neighbours, K = 1, 3, 5, 7, 9, using HOC features.

As we can see, the recognition rate is rather stable across the number of neighbours; this shows that the three emotion classes are well separated in the feature space and that a linear classifier would be suitable for this classification as well.
[Figure: correct recognition rate (0-100%) for each participant vs. number of neighbours for kNN (1, 3, 5, 7, 9), using HOC features]

Figure 6.8: Recognition Rate using HOC features for K = 1, 3, 5, 7, 9
6.5.1.2 Cross-Subject Emotion Recognition

Table 6.6: Cross-Subject Emotion Recognition rates using ALL 54 electrodes

Classifier Statistical Narrow-bands Power HOC Wavelet

5NN 81.39 82.62 90.77 77.44

LDA 59.18 63.49 79.64 55.90

6.5.2 Simulation Results with Channel Reduction
In this part of the testing, we would like to find out which channels carry high class-related discriminating power.
6.5.2.1 Channel Reduction in Reference to Commercial Devices
One of the objectives of this project is to investigate the feasibility of commercially available EEG recording devices for emotion analysis. The EEG recordings from the eNTERFACE'06 project were collected using the Biosemi ActiveTwo, an EEG cap with 64 channels designed for medical applications. The Emotiv EPOC is a commercially available wireless neuro-headset with 14 EEG channels. As shown in Table 6.7, the Emotiv Software Development Kit (SDK) for research includes a 14-channel (plus CMS/DRL references at the P3/P4 locations), high-resolution system and provides wireless neuro-signal acquisition and processing.
Table 6.7: Device Specifications

Device Biosemi ActiveTwo Emotiv EPOC SDK

Data Format EDF MAT

Resolution 24-bit ADC 16-bit (14 bits effective)

Sampling Rate 1024 Hz 128 SPS (2048 Hz internal)

Channels 64 14

Channels in common AF3, F7, F3, FC5, FC6, F4, F8, AF4
However, because both caps use standardized electrode placement according to the 10-20 system, we were able, by comparing electrode locations, to select a set of EEG channels present in both devices (shown in Figure 6.9) and obtain classification results using features extracted from these common channels. Table 6.8 shows the recognition rates with the EEG channels present in both devices.
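Selecting the common montage then amounts to picking the matching columns of the data matrix. The channel ordering below is hypothetical; only the list of eight common channels comes from Table 6.7:

```python
# Hypothetical ordering of channels in the recording; the real index order
# depends on the cap layout and the recording software.
recorded = ["Fp1", "AF3", "F7", "F3", "FC5", "FC1", "C3",
            "FC6", "F4", "F8", "AF4", "Cz"]
# Channels common to both devices (Table 6.7).
wanted = ["AF3", "F7", "F3", "FC5", "FC6", "F4", "F8", "AF4"]

idx = [recorded.index(ch) for ch in wanted]
# eeg[:, idx] would then slice a (samples x channels) array down to the
# common montage before feature extraction.
print(idx)   # [1, 2, 3, 4, 7, 8, 9, 10]
```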
We carried out the simulation testing using the reduced set of electrodes that are common to our dataset and the commercial headset listed above, with the same simulation parameters for feature extraction and classifier design. The subject-specific emotion detection performance is listed below.

We further tested the classification performance in the cross-subject case.
The correct recognition rates using the kNN classifier shown in Table 6.9 are nearly identical to the results in Table 6.6, with a significantly reduced number of electrodes. These results provide evidence for the feasibility of consumer-grade headsets for real-time emotion recognition in mobile applications. However, the recognition rate using LDA decreases dramatically, which indicates that the samples of the reduced channels are not linearly
[Figure: electrode-layout map highlighting the channels common to both devices]

Figure 6.9: Channels referenced to the Emotiv EPOC
Table 6.8: Subject Specific Recognition rates using 8 electrodes and 5NN
Participants Statistical Narrow-bands Power HOC Wavelet
P1 86.33 84.33 93.67 61.67
P2 86.00 89.33 98.33 89.67
P3 83.33 86.44 95.11 74.89
P4 88.67 90.44 97.33 88.00
P5 83.33 86.44 96.89 78.89
Table 6.9: Cross-subject Recognition rate using only 8 electrodes
Classifier Statistical Narrow-bands Power HOC Wavelet
5NN 68.15 78.15 89.64 58.87
LDA 38.77 39.90 43.13 37.74
separable in the projected feature space.
6.5.2.2 Channel Reduction Using Genetic Algorithm
As mentioned in Section 5.4, we used a genetic algorithm for feature (here, channel) selection. The starting population consisted of 1000 individuals, each containing a randomly generated binary string whose length equals the total number of channels. Mutation and crossover operations were then performed (with selected probabilities); in this way genes, and consequently features, were exchanged. Only the best-adapted individuals passed to the next step of the algorithm. To determine which individuals were best adapted, a fitness function (shown in Section 5.4.1) was used, which trained the classifier, classified the data, and returned the correct classification rate. Linear Discriminant Analysis and k-Nearest Neighbours were used for classification. The fitness function performed a 10-fold cross-validation test and returned the percentage of correct classification as the fitness measure. The algorithm terminates when the maximum number of generations is reached; 200 generations were calculated here. The GA process is shown in Figure 6.10, with the correct classification performance for a single run of the genetic algorithm. The algorithm stabilized after about 100 generations; thus it can be stopped at about 100 generations, shortening its running time. Additionally, the execution time of the GA can be shortened by modifying the selected probabilities of the mutation and crossover operations. However, the time required by the GA for feature selection is fairly long in practice, so the GA is mainly suitable for off-line data analysis.
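A minimal sketch of such a GA for channel selection is given below. The population size, generation count, and especially the toy fitness function are placeholders: in the actual system the fitness is the 10-fold cross-validated accuracy of an LDA or kNN classifier trained on the channels selected by each binary mask.

```python
import numpy as np

rng = np.random.default_rng(42)
N_CH = 54            # channels in the recording
POP, GENS = 60, 40   # small values for illustration (the thesis used 1000 and 200)
P_CROSS, P_MUT = 0.8, 0.02

# Toy fitness: pretend a few channels carry class information, reward selecting
# them, and penalize mask size. The real fitness trains and cross-validates a
# classifier on the masked channels.
informative = rng.random(N_CH) < 0.2
def fitness(mask):
    hits = np.sum(mask & informative)
    return hits - 0.05 * mask.sum()

pop = rng.random((POP, N_CH)) < 0.5          # random binary strings
for _ in range(GENS):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: keep the winner of each random pairing.
    a, b = rng.integers(0, POP, POP), rng.integers(0, POP, POP)
    parents = np.where((scores[a] > scores[b])[:, None], pop[a], pop[b])
    # Single-point crossover on consecutive pairs.
    children = parents.copy()
    for i in range(0, POP - 1, 2):
        if rng.random() < P_CROSS:
            cut = rng.integers(1, N_CH)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
    # Bit-flip mutation.
    flip = rng.random(children.shape) < P_MUT
    pop = children ^ flip

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("channels selected:", np.flatnonzero(best))
```

Each run can select a different mask because of the random crossover and mutation, which is exactly why the thesis repeats the GA ten times and keeps the most frequently selected channels.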
Due to the randomness of the crossover and mutation processes, each run of the genetic algorithm may select a different set of features. It is therefore very important to determine which of these features (and hence which channels) bring important information to the classification process. To address this, the genetic algorithm was launched ten times and the selected features were compared; the features appearing most frequently were selected as the final set. The results can be seen in Figure 6.11.
[Figure: average and maximum fitness (roughly 0.35 to 0.7) vs. generation (0 to 100)]

Figure 6.10: Averaged and Maximum Fitness (correct recognition rate) in each generation using LDA and GA
From this process we obtain information about which regions or electrode locations provide more class-related discriminating information. To apply these findings further, we can compare the locations of the reduced electrode set with those of commercially available devices such as the Emotiv; together with the results obtained in this study, this allows us to conclude whether such a commercial device is feasible for emotion analysis applications.

The following are the channels selected after 10 runs of the genetic algorithm; the channel sets of sizes 6 and 10 are presented here.

Next, we tested the system performance using the channels selected by the GA.
[Figure: frequency of appearance (0-100%) of each channel index (1-54) over ten GA runs, with the 60%, 65%, and 70% thresholds marked]

Figure 6.11: Channels selected through Genetic Algorithm
[Figure: two electrode-layout maps highlighting the GA-selected channels]

(a) 8 Channels Selected using GA

(b) 14 Channels Selected using GA

Figure 6.12: Channels of significance obtained through Genetic Algorithm
Table 6.10: Channels selected using GA algorithm

≥ 60% (18 channels): Fc7, Fc5, Fc3, Cp4, Fz, Cz, P1, Af4, F4, P2, Fc4, Cp4, Cp6, Fc8, Iz, O1, Ft7, Ft8

> 65% (8 channels): Fc3, Fc5, Fc2, Fc4, F4, AF4, CP4, Cp6

> 70% (6 channels): Fc3, Fc5, Fc2, Fc4, F4, AF4
Table 6.11: Emotion Recognition rates using electrodes selected using GA

Number of Channels Classifier Statistical Narrow-bands Power HOC Wavelet

18 5NN 80.97 83.08 90.21 72.10

18 LDA 48.92 51.13 62.05 48.31

10 5NN 79.74 82.82 88.77 69.54

10 LDA 43.03 46.46 56.31 45.18

6 5NN 76.46 81.28 89.79 67.85

6 LDA 40.82 44.87 53.21 41.28
6.6 Sensitivity Testing
The aim of this part of the simulation testing is to evaluate the influence of a few parameters on the final classification performance. The findings here provide some insight into setting the parameters of a specific affect detection system. The parameters were selected based on their relevance to a practical affect detection system, which has constraints on processing power and processing time. Down-sampling is a common practice to reduce the overall computational complexity of the system; however, the consequence of this practice for the final classification performance is not clear and is worth studying. Secondly, EEG recordings are continuous in time, so a segmentation process is inevitable. The choice of window length (epoch) in the time domain, and the edge effects due to this windowing process, could potentially degrade the final classification performance and should be dealt with carefully.

Table 6.12: Recognition rates of emotion using channels selected by GA

Method All 54 channels 8 (referenced to commercial device) GA-18 GA-10 GA-6

5NN 90.77 89.64 89.23 88.77 89.79

LDA 79.64 43.13 62.05 56.31 53.21
6.6.1 The Effect Of Sampling Rate On System Performance
The EEG recordings we tested have a sampling rate of 1024 Hz, which is relatively high compared to the spectral characteristics of EEG signals (most energy concentrates in the frequency range below 50 Hz). We down-sampled the recordings from the original 1024 Hz to 512 Hz and 256 Hz and applied the same set of feature extraction algorithms and classifiers. The final classification performance is shown in Figure 6.14 for kNN and in Figure 6.13 for LDA.

From the figures, we see that the recognition rates degrade significantly for LDA when the sampling rate drops below 512 Hz, while the recognition rates using the kNN classifier stay essentially the same for all four types of features when the original data are down-sampled. The significant degradation of system performance using the time-domain features is caused by the poor
[Figure: correct recognition rate (0-100%) vs. sampling frequency (256, 512, 1024 Hz) for the statistical, narrow-band, HOC, and wavelet-based features]

Figure 6.13: Sampling Rate vs. Correct Recognition Rate using All Electrodes and LDA
[Figure: correct recognition rate (0-100%) vs. sampling rate (256, 512, 1024 Hz) for the statistical, narrow-band, HOC, and wavelet-based features]

Figure 6.14: Sampling Rate vs. Correct Recognition Rate using All Electrodes and kNN
amplitude resolution after down-sampling, which significantly reduces the separability of the classes in the HOC feature space. In [88], a sampling limit for the EMD algorithm was proposed, which suggests that a sampling rate of roughly five times the highest frequency of interest is required for the best performance of the EMD algorithm. If we assume the spectrum of interest extends to 60 Hz, the Nyquist rate would be 120 Hz and the required sampling rate for EMD would be about 300 Hz. When down-sampled to 256 Hz, the reconstructed EEG signal is less accurate, producing less representative features for each class and leading to poorer overall system performance.
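Under this reading (sampling at roughly five times the highest frequency of interest, an interpretation of the limit cited above), the three tested rates can be checked directly:

```python
f_max = 60                 # assumed highest EEG frequency of interest, Hz
nyquist_rate = 2 * f_max   # 120 Hz: bare minimum to avoid aliasing
emd_rate = 5 * f_max       # ~300 Hz: rate suggested for reliable EMD sifting

for fs in (256, 512, 1024):
    verdict = "adequate" if fs >= emd_rate else "below the suggested EMD rate"
    print(f"{fs} Hz: {verdict}")
```

This matches the observed behaviour: only the 256 Hz version falls below the suggested EMD rate, and it is exactly where the time-domain features degrade.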
6.6.2 Parameters For Setting Window Size (epoch)
A significant characteristic of EEG measurement is its high temporal resolution: the EEG sampling frequency (256-1024 Hz) is considerably higher than the native frequencies of brain signals (usually considered to be below 50 Hz). This provides temporal redundancy in the source data, i.e., a high degree of continuity, which should also be reflected in the inverse solution. Unfortunately, explicit temporal regularization is very difficult to implement, mostly because of the computational complexity required (the size of the regularization matrix would be multiplied by the size of the temporal window through a Kronecker product [35]) and because of the absence of explicit, quantifiable priors on temporal brain activity. On the other hand, one can still impose implicit temporal regularization that benefits from the aforementioned redundancy. Indeed, by performing the inversion simultaneously on several time frames, the focalization procedure acts as an implicit continuity constraint, the same volumes being identically selected and reweighted for reconstruction in successive time frames. The maximum length Tmax of the time window can be set in accordance with the Nyquist frequency:
Tmax = floor(fs / (2 * fup))    (6.1)
where fs is the EEG sampling frequency and fup is the upper limit of the EEG frequency band. The reconstruction can then be performed on sliding windows of length T ≤ Tmax.
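Equation 6.1 can be applied directly; for example, with the original 1024 Hz recordings and an assumed 50 Hz upper band limit:

```python
import math

def max_window_len(fs, f_up):
    """Maximum window length from Eq. 6.1: Tmax = floor(fs / (2 * f_up))."""
    return math.floor(fs / (2 * f_up))

print(max_window_len(1024, 50))   # 10
```

Down-sampling shrinks the admissible window accordingly: at 256 Hz with the same 50 Hz band limit, Tmax drops to 2.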
6.6.3 Parameters for Wavelet Feature Evaluation
In general, most wavelets in common use are not appropriate for filtering nonstationary signals [62]. If an inappropriate mother wavelet is used in wavelet-based multiresolution analysis, the phenomenon of energy leakage in the reconstructed power spectrum of the filtered bands becomes more apparent. Thus, the choice of an appropriate mother wavelet basis is the most important first step in wavelet-based multiresolution analysis. To determine whether the mother wavelet used is appropriate for wavelet-based feature analysis, we carried out tests on the following set of wavelets. These wavelet functions were chosen for their near-optimal time-frequency localization properties; moreover, their waveforms are similar to the waveforms to be detected in the EEG signal, so the extraction of EEG signal features is more likely to be successful [67]. The correct recognition rates are listed in Table 6.13. From these results we see that the classification performance is very similar among these wavelet functions, which can therefore be used interchangeably.
Table 6.13: Cross-subject emotion recognition rates using different wavelets for DWT
Wavelets
Classifier db4 db8 sym4 sym8 coif5 bior1.3
kNN 76.67 76.15 76.00 76.21 76.56 75.64
LDA 56.62 54.41 57.49 57.18 56.77 56.26
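A sketch of such a comparison using PyWavelets (assumed available; the test signal here is synthetic noise rather than a real EEG epoch):

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

rng = np.random.default_rng(0)
signal = rng.normal(size=1024)   # stand-in for one EEG epoch

# Decompose with each candidate mother wavelet and compare sub-band energies.
for name in ("db4", "db8", "sym4", "sym8", "coif5", "bior1.3"):
    coeffs = pywt.wavedec(signal, name, level=5)   # [A5, D5, D4, D3, D2, D1]
    energies = [float(np.sum(c ** 2)) for c in coeffs]
    print(name, [round(e, 1) for e in energies])
```

Feeding the per-band energies (or statistics of the detail coefficients) into the same classifiers reproduces the kind of comparison summarized in Table 6.13.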
6.7 Summary
In this chapter, three research objectives were tested through a set of classification simulations on an emotion-specific dataset. First, the feasibility of MEMD as a filter bank for extracting signal components of interest was tested. Four types of features were extracted from the reconstructed EEG signals and used for the classification of three affect states: positively excited, negatively excited and neutral. Since EEG signals vary greatly between subjects and often do not generalize well for a designed model, two types of classification performance were tested; the subject-specific and cross-subject recognition rates were presented for each simulation. Subject-specific classification refers to the case where the training samples for developing the statistical model and the testing samples come from the same subject, whereas the training samples for the cross-subject simulation come from different subjects, and the testing samples are mostly from different subjects as well. The maximum correct classification rate for the subject-specific case was 97.78% using HOC features and the kNN classifier, with an average of 95.56%. The correct recognition rate for the cross-subject case was 90.77% using HOC features and the kNN classifier, and 79.64% using HOC features and the LDA classifier. Although the dataset is relatively small, these results are sufficient to show that EEG signals are feasible for affect detection analysis.
Secondly, the subject-specific and cross-subject classification performance on reduced channel sets was examined. We selected the reduced channels in two ways.
• The first is to use the common channels that are also present in a commercially available, non-medical-grade EEG headset. The classification performance from this reduced channel setting provides insight into the feasibility of consumer-grade EEG headsets for affect detection applications. The correct recognition rate for the subject-specific case reached a maximum of 98.33%, with an average of 96.27%, using HOC features and the kNN classifier. The correct classification rate for the cross-subject analysis was 89.64% using HOC features with the kNN classifier and 43.13% with the LDA classifier. From these results, we see that HOC features are better at representing affect-specific EEG signals and that it is feasible to use consumer-grade EEG headsets for affect detection analysis. LDA classification performance degrades significantly with the reduction of channels.
• The second is to use channels selected through the application of a Genetic Algorithm. The classification performance using these reduced channels provides an upper bound for a pre-defined number of electrodes, given the current feature extraction and classification settings. With 10 channels, a cross-subject correct recognition rate of 88.77% was achieved using HOC features with the kNN classifier, and 56.31% with the LDA classifier. With 6 channels, a cross-subject recognition rate of 89.79% was achieved using HOC features with the kNN classifier, and 53.21% with the LDA classifier. From these results, we see that the major improvement from using the Genetic Algorithm for channel selection is the increased recognition rate using LDA. The LDA classifier is desirable because it is simple and requires less computation per test sample than the instance-based kNN classifier.
Thirdly, a set of parameters important for practical real-time signal processing applications, which are typically constrained in computational complexity (e.g., smart devices), were evaluated, and their impact on the overall classification performance was tested.
Chapter 7
Conclusions and Future Works
7.1 Conclusions
Affective computing and emotional intelligence have been two very important areas of study in the field of Human-Machine Interaction. Affect detection is the first step towards emotion-aware applications and a critical step in the success of an emotion recognition system. EEG signals are believed to be feasible for affect detection applications due to their direct association with the human brain and the short response time required to detect a shift in affect states. Compared to other affect detection methods such as the Body Sensor Network (BSN), which requires a variety of sensors to be attached to a person and can be prohibitive to normal daily activities, the commercially available, non-medical-grade EEG headsets are an outstanding alternative and can potentially provide better classification performance.
This thesis has focused on the feasibility of utilizing brainwaves for affect detection applications. It has contributed to the field of affective computing both in the understanding of the association between affective states and central nervous system signals (EEG) and in feasible ways of utilizing them from a system design point of view. Due to the increasing use of non-medical-grade EEG headsets for commercial applications, this study also set out to determine the performance of affect detection applications with a significantly reduced number of electrodes. Through the use of a novel time-spectral analysis algorithm (MEMD), the thesis has shown that the EEG signal is suitable for affect detection. It also provided quantitative evidence for the affect-related statements from the human psychology literature regarding the frontal asymmetry of activity between the hemispheres in expressing different affects.
7.1.1 Key Contributions
This study has contributed to the field of affect detection in three ways. The first contribution is a framework that utilizes EEG signals and MEMD for emotion detection applications. It has also furthered our understanding of affect (the physiological expression of emotion) through the use of multi-channel EEG signals. Affect-specific EEG channels and frequency ranges of interest were thoroughly studied and presented. This information will be beneficial in the design of reduced-channel EEG headsets for portable devices such as smartphones.
• This work provided a framework using EEG for affect detection. Through the study of the characteristics of the EEG signal in the time and spectral domains, a series of preprocessing and feature extraction methods were researched, analyzed and implemented. The oscillatory pattern in the time domain and the energy variation in the spectral domain were key to these methods and were shown to be efficient in detecting affect variations.
• This thesis also demonstrated the application of a novel time-spectral analysis algorithm, MEMD, for multi-channel EEG signal analysis. MEMD is a recently developed signal processing method that is completely data driven and preserves the non-linear and non-stationary characteristics of the EEG signal. MEMD can be applied to a wide range of biomedical signals and has shown promising results. The Hilbert-Huang transform, the combined application of the MEMD or EMD algorithm with the Hilbert transform, can provide key information, such as the instantaneous phase and amplitude, for understanding EEG signals.
• A Genetic Algorithm was applied to select the affect-specific EEG channels and to extract the frequency range of interest for better detection performance. This global optimization approach allowed us to systematically select the desired number of electrodes based on their significance in affect detection.
7.2 Future Works
While this thesis has shown the effectiveness of using EEG for affect detection, a few questions remain to be addressed in future work.
7.2.1 Directions for Future Study On Utilizing EEG Signals For
Affect Detection Applications
In future research, the approach presented in this thesis can be further improved in the following ways:
• Incorporate automatic labeling of samples from another modality during the training stage, which would shift this machine learning problem from supervised towards unsupervised learning. In many practical cases, direct feedback from the user for labeling the training samples is not possible (e.g., for neural disorders such as autism) or not reliable, as some people are not aware of, or not good at quantizing, their own feelings. Also, as stated in Section 2.4, affective expressions in EEG vary significantly between subjects; to develop a robust machine learning system, a large number of training samples is required. This calls for an alternative, and perhaps more reliable and consistent, way of labeling a large number of samples.
• An optimal fusion algorithm for multimodal systems, e.g., combining EEG signals and facial images, is not present in the literature. Various fusion techniques, at both the feature level and the decision level, should be investigated.
• The novel time-spectral signal analysis approach, MEMD, is effective in separating multi-component, non-linear and non-stationary signals. However, the current implementation is very slow, and computational improvements are needed for near-real-time analysis. In particular, parallel calculation of the different projection directions should be studied to shorten processing time, especially if real-time processing is required or a large number of EEG channels is utilized.
Appendix A
Neighbouring Electrodes for Local
Laplacian Filter
Table A.1: Associated neighbour electrodes for Local Laplacian filters [5]
Electrode Neighbours electrodes Electrode Neighbours electrodes
Fp1 F7, F5, AF7, AFz, Fpz Fpz Fp1, AFz, Fp2
AF7 Fp1, F5, F3, AF3, AFz Fp2 Fpz, AFz, AF8, F6, F8
AF3 AFz, AF7, F3, F1, Fz, AF4 AF8 Fp2, AFz, AF4, F4, F6
F1 F3, FC3, FC1, Fz, AF3 AF4 AF8, AFz, AF3, Fz, F2, F4
F3 F5, FC5, FC3, F1, AF3, AF7 AFz Fpz, Fp1, AF7, AF3, AF4, AF8, Fp2
F5 F7, FT7, FC5, F3, AF7, Fp1 Fz AF3, F1, FC1, FCz, FC2, F2, AF4
F7 FT7, F5, Fp1 F2 AF4, Fz, FC2, FC4, F4
FT7 T7, FC5, F5, F7 F4 F6, AF8, AF4, F2, FC4, FC6
FC5 FT7, T7, C5, FC3, F3, F5 F6 F8, Fp2, AF8, F4, FC6, FT8
FC3 C5, C3, C1, FC1, F1, F3, FC5 F8 Fp2, F6, FT8
FC1 F1, FC3, C1, FCz, Fz FT8 F8, F6, FC6, T8
C1 FCz, FC1, FC3, C3, CP1, Cz FC6 FT8, F6, F4, FC4, C6, T8
C3 C1, FC3, C5, CP3, CP1 FC4 FC6 F4, F2, FC2, C2, C4, C6
C5 CP5, CP3, C3, FC3, FC5, T7 FC2 F2, Fz, FCz, C2, FC4
T7 TP7, CP5, C5, FC5, FT7 FCz Fz, FC1, C1, Cz, C2, FC2
TP7 P9, P7, CP5, T7 Cz FCz, C1, CP1, CPz, CP2, C2
CP5 TP7, P7, P5, CP3, C5, T7 C2 FC4, FC2, FCz, Cz, CP2, C4
CP3 CP5, P5, P3, CP1, C3, C5 C4 C6, FC4, C2, CP2, CP4
CP1 Cz, C1, C3, CP3, P3, Pz, CPz C6 T8, FC6, FC4, C4, CP4, CP6
P1 POz, P2, Pz, P3, P5, PO3 T8 FT8, FC6, C6, CP6, TP8
P3 P5, P1, Pz, CP1, CP3 TP8 T8, CP6, P8, P10
P5 PO3, P1, P3, CP3, CP5, P7 CP6 TP8, T8, C6, CP4, P6, P8
P7 PO7, PO3, P5, CP5, TP7, P9 CP4 CP6, C6, C4, CP2, P4, P6
P9 O1, PO7, P7, TP7 CP2 C4, C2, Cz, CPz, Pz, P4, CP4
PO7 Iz, Oz, PO3, P7, P9, O1 P2 PO4, P6, P4, Pz, P1, POz
PO3 Oz, POz, P1, P5, P7, PO7 P4 P6, CP4, CP2, Pz, P2
O1 Iz, PO7, P9 P6 P8, CP6, CP4, P4, P2, PO4
Iz O2, PO8, Oz, PO7, O1 P8 TP8, CP6, P6, PO4, PO8, P10
Oz Iz, PO8, PO4, POz, PO3, PO7 P10 TP8, P8, PO8, O2
POz Oz, PO4, P2, P1, PO3 PO8 O2, P10, P8, PO4, Oz, Iz
Pz P2, P4, CP2, CPz, CP1, P3, P1 PO4 P8, P6, P2, POz, Oz, PO8
CPz Cz, CP1, Pz, CP2 O2 P10, PO8, Iz
Appendix B
List of IAPS Images Used for the
Experiment
Table B.1: List of IAPS Images Used for Session 1
Trials Images per trial
1 7090.jpg 7006.jpg 7030.jpg 5530.jpg 9700.jpg
2 1710.jpg 7270.jpg 8210.jpg 8350.jpg 2160.jpg
3 7053.jpg 2580.jpg 2221.jpg 7010.jpg 7235.jpg
4 3191.jpg 6243.jpg 9252.jpg 6550.jpg 3063.jpg
5 8501.jpg 5629.jpg 5260.jpg 8300.jpg 7220.jpg
6 9800.jpg 6571.jpg 9571.jpg 6250.jpg 3530.jpg
7 7052.jpg 6150.jpg 7710.jpg 7179.jpg 5500.jpg
8 8400.jpg 8170.jpg 8185.jpg 4641.jpg 8503.jpg
9 3016.jpg 6830.jpg 2095.jpg 3301.jpg 3101.jpg
10 9401.jpg 7025.jpg 7050.jpg 9360.jpg 2880.jpg
11 6250.1.jpg 3080.jpg 7380.jpg 3064.jpg 3051.jpg
12 2346.jpg 7508.jpg 5833.jpg 8531.jpg 4601.jpg
13 2209.jpg 8116.jpg 8470.jpg 1811.jpg 4624.jpg
14 7547.jpg 6570.2.jpg 7140.jpg 7035.jpg 7100.jpg
15 9421.jpg 9810.jpg 2730.jpg 3230.jpg 9428.jpg
16 7057.jpg 9210.jpg 7037.jpg 7031.jpg 5471.jpg
17 8500.jpg 5623.jpg 7400.jpg 5480.jpg 2058.jpg
18 7236.jpg 7180.jpg 7192.jpg 7130.jpg 7058.jpg
19 7020.jpg 7500.jpg 7207.jpg 7224.jpg 7590.jpg
20 2216.jpg 7260.jpg 7570.jpg 7501.jpg 8186.jpg
21 5533.jpg 7183.jpg 5740.jpg 7110.jpg 7285.jpg
22 3140.jpg 2751.jpg 9301.jpg 9910.jpg 9423.jpg
23 8080.jpg 5600.jpg 5910.jpg 8420.jpg 8499.jpg
24 2811.jpg 9300.jpg 3010.jpg 2900.jpg 3215.jpg
25 6415.jpg 9903.jpg 2710.jpg 8230.jpg 2352.2.jpg
26 7283.jpg 2745.1.jpg 7002.jpg 2980.jpg 7950.jpg
27 8190.jpg 5450.jpg 8371.jpg 4626.jpg 5700.jpg
28 4610.jpg 5460.jpg 8034.jpg 2352.1.jpg 8370.jpg
29 3110.jpg 9420.jpg 9405.jpg 3071.jpg 6313.jpg
30 6212.jpg 2799.jpg 3120.jpg 9520.jpg 3053.jpg
Table B.2: List of IAPS Images Used for Session 2
Trials Images per trial
1 6570.jpg 3266.jpg 6540.jpg 6825.jpg 7359.jpg
2 6570.1.jpg 9425.jpg 9140.jpg 6230.jpg 3400.jpg
3 1616.jpg 7058.jpg 7002.jpg 7004.jpg 5510.jpg
4 4599.jpg 8540.jpg 2346.jpg 5700.jpg 5623.jpg
5 8185.jpg 7502.jpg 4601.jpg 8503.jpg 8180.jpg
6 8311.jpg 7056.jpg 7020.jpg 7140.jpg 7025.jpg
7 9250.jpg 9429.jpg 9530.jpg 9253.jpg 2688.jpg
8 8485.jpg 9180.jpg 3102.jpg 3068.jpg 3100.jpg
9 8420.jpg 8502.jpg 8501.jpg 5450.jpg 5480.jpg
10 8380.jpg 8200.jpg 4640.jpg 2216.jpg 8030.jpg
11 6312.jpg 2981.jpg 6834.jpg 2800.jpg 9419.jpg
12 8496.jpg 8370.jpg 8340.jpg 8034.jpg 4623.jpg
13 8090.jpg 5910.jpg 1710.jpg 7260.jpg 5470.jpg
14 7055.jpg 7010.jpg 7040.jpg 7060.jpg 7057.jpg
15 7710.jpg 2445.jpg 2446.jpg 7190.jpg 7034.jpg
16 8400.jpg 2058.jpg 7330.jpg 1811.jpg 5660.jpg
17 7179.jpg 7036.jpg 5390.jpg 7052.jpg 7044.jpg
18 7700.jpg 1670.jpg 7009.jpg 7207.jpg 7175.jpg
19 9925.jpg 3350.jpg 3160.jpg 9500.jpg 9400.jpg
20 9570.jpg 6350.jpg 9635.1.jpg 3000.jpg 4664.2.jpg
21 7950.jpg 7550.jpg 2840.jpg 2749.jpg 7043.jpg
22 9560.jpg 3500.jpg 9340.jpg 9901.jpg 6200.jpg
23 7508.jpg 2208.jpg 8116.jpg 4641.jpg 5833.jpg
24 7160.jpg 7053.jpg 5531.jpg 7039.jpg 7217.jpg
25 7400.jpg 8499.jpg 8210.jpg 2345.jpg 8190.jpg
26 7242.jpg 2518.jpg 5120.jpg 7192.jpg 7595.jpg
27 6210.jpg 6242.jpg 9410.jpg 3180.jpg 3550.jpg
28 3062.jpg 3005.1.jpg 3017.jpg 3225.jpg 9920.jpg
29 4624.jpg 7230.jpg 8350.jpg 5270.jpg 5621.jpg
30 7080.jpg 7170.jpg 7006.jpg 7090.jpg 7000.jpg
Table B.3: List of IAPS Images Used for Session 3
Trials Images per trial
1 2058.jpg 8090.jpg 8300.jpg 7570.jpg 4610.jpg
2 3150.jpg 3220.jpg 2053.jpg 6370.jpg 6360.jpg
3 5740.jpg 7050.jpg 7150.jpg 7180.jpg 7546.jpg
4 2208.jpg 5270.jpg 5470.jpg 8496.jpg 8371.jpg
5 8470.jpg 8080.jpg 5600.jpg 7330.jpg 8190.jpg
6 6315.jpg 6821.jpg 3130.jpg 6260.jpg 6021.jpg
7 7035.jpg 7161.jpg 7234.jpg 7495.jpg 2206.jpg
8 3015.jpg 3181.jpg 3061.jpg 2683.jpg 3060.jpg
9 7547.jpg 5471.jpg 7100.jpg 8465.jpg 5130.jpg
10 7096.jpg 9360.jpg 7705.jpg 7590.jpg 9210.jpg
11 3030.jpg 9900.jpg 9254.jpg 9902.jpg 9630.jpg
12 9424.jpg 9611.jpg 2717.jpg 9007.jpg 6022.jpg
13 7059.jpg 7038.jpg 5731.jpg 2221.jpg 7233.jpg
14 1710.jpg 8170.jpg 2160.jpg 2352.1.jpg 2345.jpg
15 7270.jpg 8540.jpg 7501.jpg 8420.jpg 7502.jpg
16 2880.jpg 2890.jpg 7235.jpg 7285.jpg 6150.jpg
17 9430.jpg 2703.jpg 9006.jpg 3550.1.jpg 6510.jpg
18 8340.jpg 5621.jpg 8370.jpg 8186.jpg 5460.jpg
19 3069.jpg 9433.jpg 6213.jpg 6300.jpg 9050.jpg
20 7205.jpg 6570.2.jpg 7041.jpg 5520.jpg 7031.jpg
21 5260.jpg 8200.jpg 8030.jpg 8501.jpg 5629.jpg
22 2580.jpg 5534.jpg 2980.jpg 7110.jpg 7130.jpg
23 7283.jpg 7224.jpg 7491.jpg 9700.jpg 7500.jpg
24 6530.jpg 9181.jpg 9427.jpg 9921.jpg 3000.jpg
25 8380.jpg 8180.jpg 4599.jpg 5910.jpg 2209.jpg
26 2745.1.jpg 5532.jpg 9401.jpg 7236.jpg 7490.jpg
27 4640.jpg 4626.jpg 8531.jpg 7220.jpg 5833.jpg
28 4623.jpg 8500.jpg 8502.jpg 7230.jpg 5660.jpg
29 9911.jpg 3261.jpg 9620.jpg 9600.jpg 6838.jpg
30 3168.jpg 6560.jpg 9040.jpg 6831.jpg 3010.jpg
Appendix C
Confusion Matrix Ground Truth
Validation of the Database
This appendix presents the self-assessment values collected from each participant,
together with confusion matrices validating the ground truth of the database. A more
detailed analysis of the emotion elicitation results is given below, with a confusion
matrix for each participant:
Table C.1: Participant 1: Self-Assessment Classification Accuracy (in Percentage) for
the Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 94.74 5.26 0
Pos. Excited 56.25 43.75 0
Neg. Excited 5.26 0 94.74
Table C.2: Participant 2: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 89.66 10.34 0
Pos. Excited 90.48 9.52 0
Neg. Excited 57.14 0 42.86
Table C.3: Participant 3: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 62.07 27.59 10.34
Pos. Excited 4.76 80.95 14.29
Neg. Excited 3.57 3.57 92.86
Table C.4: Participant 4: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 100 0 0
Pos. Excited 33.33 66.67 0
Neg. Excited 21.43 0 78.57
Table C.5: Participant 5: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 20.69 75.86 3.45
Pos. Excited 0 100 0
Neg. Excited 7.14 0 92.86
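Each matrix above is row-normalized to percentages: the row gives the emotion a trial was intended to elicit, and the column gives the participant's self-assessed label. A minimal sketch of how such a matrix can be computed from per-trial labels (the function name and input lists are illustrative, not taken from the thesis):

```python
import numpy as np

# The three affect classes used in the self-assessment tables.
LABELS = ["Calm", "Pos. Excited", "Neg. Excited"]

def confusion_percent(intended, reported, labels=LABELS):
    """Row-normalized confusion matrix in percent.

    `intended` holds the emotion each trial was designed to elicit and
    `reported` holds the participant's self-assessed label for that trial.
    Rows index the intended emotion and columns the reported one,
    matching the layout of Tables C.1-C.5.
    """
    idx = {lab: i for i, lab in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(intended, reported):
        counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against empty rows to avoid division by zero.
    return 100.0 * counts / np.maximum(row_sums, 1)
```

A diagonal-heavy matrix (e.g. Participant 1) indicates that the intended emotions were successfully elicited, while off-diagonal mass (e.g. Participant 2's positively excited row) flags trials whose ground-truth labels are unreliable.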
Appendix D
Empirical Mode Decomposition
(EMD) Algorithm
EMD is an adaptive signal decomposition method with which any complicated signal
can be decomposed into a series of Intrinsic Mode Functions (IMFs). Two main criteria
must be met during this decomposition procedure, as stated in the original paper [29]:

1. For each extracted signal mode (IMF), the number of extrema and the number of
zero crossings must differ by at most one; in other words, each IMF should contain
no riding waves.

2. The mean value of the envelopes defined by the local maxima and local minima
should be zero at every point; this indicates that the maxima and minima are
located symmetrically about the local mean (zero).
Algorithm 1. The standard EMD algorithm

1. Find the locations of all the extrema of x′(k).

2. Interpolate (using cubic spline interpolation) between all the minima (respectively
maxima) to obtain the lower signal envelope emin(k) (respectively the upper envelope
emax(k)).

3. Compute the local mean m(k) = [emin(k) + emax(k)]/2.

4. Subtract the local mean from the signal to obtain the 'oscillatory mode' s(k) =
x′(k) − m(k).

5. If s(k) obeys the stopping criteria, define d(k) = s(k) as an IMF; otherwise set
x′(k) = s(k) and repeat from step 1.

The sifting process stops when the residue r(k) becomes a constant, a monotonic
function, or a function with only a single extremum, from which no further IMF can be
extracted. If the data has a trend, the final residue is that trend.
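The procedure above can be sketched in a few lines of Python. This is a minimal illustrative implementation, not the code used in the thesis: the standard-deviation stopping test (threshold 0.2) and the extrema-count guards are common choices standing in for the stopping criteria left unspecified in step 5.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift(x, sd_threshold=0.2, max_siftings=50):
    """Extract one IMF from x by sifting (steps 1-5 of Algorithm 1)."""
    s = np.asarray(x, dtype=float).copy()
    for _ in range(max_siftings):
        # Step 1: locate the interior extrema of the current candidate.
        d = np.diff(s)
        maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
        minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
        if len(maxima) < 3 or len(minima) < 3:
            break  # too few extrema to build meaningful envelopes
        # Step 2: cubic-spline envelopes through the maxima and minima.
        k = np.arange(len(s))
        e_max = CubicSpline(maxima, s[maxima])(k)
        e_min = CubicSpline(minima, s[minima])(k)
        # Step 3: local mean of the two envelopes.
        m = (e_max + e_min) / 2.0
        # Step 4: subtract the local mean to get the oscillatory mode.
        s_new = s - m
        # Step 5: a common standard-deviation stopping criterion.
        sd = np.sum((s - s_new) ** 2) / (np.sum(s ** 2) + 1e-12)
        s = s_new
        if sd < sd_threshold:
            break
    return s

def emd(x, max_imfs=8):
    """Decompose x into a list of IMFs plus a final residue (the trend)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        d = np.diff(residue)
        n_extrema = (np.sum((d[:-1] > 0) & (d[1:] < 0))
                     + np.sum((d[:-1] < 0) & (d[1:] > 0)))
        if n_extrema < 3:
            break  # residue is (near-)monotonic: no further IMFs
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

By construction the IMFs and the residue sum back to the original signal, so the decomposition is lossless regardless of the stopping rule chosen.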
Bibliography
[1] http://www.emotiv.com/apps/epoc/.
[2] http://enterface.tel.fer.hr/docs/database_files/eNTERFACE06_EMOBRAIN.html.
[3] Ralph Adolphs, Daniel Tranel, and Antonio R Damasio. Dissociable neural systems
for recognizing emotions. Brain and Cognition, 52(1):61–69, 2003.
[4] Geoffrey L. Ahern and Gary E. Schwartz.
[5] T. I. Alecu. Robust Focalized Brain Activity Reconstruction using ElectroEncephalo-
Grams. PhD thesis, University of Geneva, 2005.
[6] Nelson Torro Alves, Sérgio Sheiji Fukusima, and Antonio Aznar-Casanova. Models of
brain asymmetry in emotional processing. Psychology and Neuroscience, 1(1):63–66,
2008.
[7] Omar Alzoubi, Rafael A Calvo, and Ronald H Stevens. Classification of EEG for
affect recognition: An adaptive approach. Emotion, pages 52–61, 2009.
[8] Marian S. Bartlett, Gwen Littlewort, Ian Fasel, and Javier R. Movellan. Real time
face detection and facial expression recognition: Development and applications to
human computer interaction. In CVPRW ’03: Proceedings of the Conference on
Computer Vision and Pattern Recognition, Workshop, 2003, volume 5, page 53,
2003.
[9] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman. Eigenfaces vs. fisherfaces:
Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and
Machine Intelligence, pages 711–720, 1997.
[10] Yvonne F Birks and Ian S Watt. Emotional intelligence and patient-centred care.
JRSM, 100(8):368–374, 2007.
[11] B. Boashash. Estimating and interpreting the instantaneous frequency of a signal.
Part II: Algorithms and applications. Proceedings of the IEEE, 80(4):540–568, April 1992.
[12] D.O. Bos. EEG-based emotion recognition: The influence of visual and auditory
stimuli, 2006.
[13] Nabila Bouzida, Laurent Peyrodie, and Christian Vasseur. ICA and a gauge of filter
for the automatic filtering of an EEG signal. IEEE International Joint Conference on
Neural Networks (IJCNN '05), Montreal, Canada, 4:2508–2513, August 2005.
[14] Rafael A. Calvo and Sidney D’Mello. Affect detection: An interdisciplinary review
of models, methods, and their applications. IEEE Transactions on Affective Com-
puting, 1(1):18–37, January 2010.
[15] Israel C Christie and Bruce H Friedman. Autonomic specificity of discrete emotion
and dimensions of affective space: a multivariate approach. International Journal
of Psychophysiology, 51(2):143 – 153, 2004.
[16] James A Coan and John J.B Allen. Frontal EEG asymmetry as a moderator and
mediator of emotion. Biological Psychology, 67(1-2):7–50, 2004.
[17] T. Cover and P. Hart. Nearest neighbor pattern classification. Information Theory,
IEEE Transactions on, 13(1):21–27, January 1967.
[18] Roddy Cowie and Randolph R. Cornelius. Describing the emotional states that are
expressed in speech. Speech Communication, 40(1-2):5–32, 2003.
[19] J. D. Mayer and P. Salovey. What is emotional intelligence? In P. Salovey & D.
Sluyter (Eds.), Emotional development and emotional intelligence: Implications for
educators, pages 3–31, 1997.
[20] R J Davidson, D C Jackson, and N H Kalin. Emotion, plasticity, context, and regu-
lation: perspectives from affective neuroscience. Psychological Bulletin, 126(6):890–
909, 2000.
[21] RJ Davidson and NA Fox. Asymmetrical brain activity discriminates between pos-
itive and negative affective stimuli in human infants. Science, 218(4578):1235–1237,
1982.
[22] Samantha Dockray and Andrew Steptoe. Positive affect and psychobiological pro-
cesses. Neuroscience & Biobehavioral Reviews, 35(1):69 – 75, 2010.
[23] David L Donoho, Iain Johnstone, Bob Stine, and Gregory Piatetsky-shapiro. High-
dimensional data analysis: The curses and blessings of dimensionality. Statistics,
pages 1–33, 2000.
[24] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd
Edition). Wiley-Interscience, 2 edition, November 2001.
[25] P Ekman. Universals and cultural differences in facial expressions of emotion. Ne-
braska Symposium On Motivation, 19(4):207–283, 1971.
[26] Paul Ekman, Wallace V. Friesen, and Phoebe Ellsworth. Emotion in the Human
Face. Oxford University Press, 1972.
[27] Paul Ekman, Wallace V Friesen, and Joseph C Hager. Facial Action Coding System,
volume 97. A Human Face, 2002.
[28] Raul Fernandez and Rosalind Picard. Recognizing affect from speech prosody us-
ing hierarchical graphical models. Speech Commun., 53(9-10):1088–1103, November
2011.
[29] P. Flandrin, G. Rilling, and P. Goncalves. Empirical mode decomposition as a filter
bank. Signal Processing Letters, IEEE, 11(2):112–114, February 2004.
[30] Dennis Gabor. Theory of communication. J. Inst. Elect. Eng., 93:429–457, 1946.
[31] D E Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, 1989.
[32] Andreas Haag, Silke Goronzy, Peter Schaich, and Jason Williams. Emotion recogni-
tion using bio-sensors: First steps towards an automatic system. Affective Dialogue
Systems, i(6):36–48, 2004.
[33] Thomas Holtgraves and Adam Felton. Hemispheric asymmetry in the processing
of negative and positive words: A divided field study. Cognition and Emotion,
25(4):691–699, 2011.
[34] Robert Horlings, Dragos Datcu, and Leon J. M. Rothkrantz. Emotion recognition
using brain activity. In Proceedings of the 9th International Conference on Com-
puter Systems and Technologies and Workshop for PhD Students in Computing,
CompSysTech ’08, pages 6:II.1–6:1, New York, NY, USA, 2008. ACM.
[35] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge
University Press, June 1994.
[36] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C.
Tung, and H. H. Liu. The empirical mode decomposition and the hilbert spectrum for
nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of
London. Series A: Mathematical, Physical and Engineering Sciences, 454(1971):903–
995, March 1998.
[37] Norden E. Huang. An adaptive data analysis method for nonlinear and nonstationary
time series: The empirical mode decomposition and hilbert spectral analysis. In Tao
Qian, Mang I Vai, and Yuesheng Xu, editors, Wavelet Analysis and Applications,
Applied and Numerical Harmonic Analysis, pages 363–376. Birkhäuser Basel, 2007.
[38] Norden E. Huang, Zhaohua Wu, Steven R. Long, Kenneth C. Arnold, Xianyao Chen,
and Karin Blank. On instantaneous frequency. Advances in Adaptive Data Analysis, 1(2):177–229, 2009.
[39] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and
applications. Neural Networks, 13(4-5):411–430, 2000.
[40] Anil K. Jain, Robert P. W. Duin, and Jianchang Mao. Statistical pattern recogni-
tion: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence,
22(1):4–37, 2000.
[41] William James. What is an emotion? Mind, 9(34):188–205, Apr. 1884.
[42] Valer Jurcak, Daisuke Tsuzuki, and Ippeita Dan. 10/20, 10/10, and 10/5 systems
revisited: their validity as relative head-surface-based positioning systems. Neu-
roImage, 34(4):1600–1611, February 2007.
[43] Ashish Kapoor, Hyungil Ahn, and Rosalind W. Picard. Mixture of gaussian pro-
cesses for combining multiple modalities. In Proceedings of the 6th international
conference on Multiple Classifier Systems, MCS’05, pages 86–96, Berlin, Heidelberg,
2005. Springer-Verlag.
[44] Ashish Kapoor, Selene Mota, and Rosalind W. Picard. Towards a learning companion
that recognizes affect. In AAAI Fall Symposium, 2001.
[45] Ashish Kapoor, Rosalind W. Picard, and Yuri Ivanov. Probabilistic combination
of multiple modalities to detect interest. In Interest, International Conference on
Pattern Recognition, pages 969–972, 2004.
[46] C. D. Katsis, N. Katertsidis, G. Ganiatsas, and D. I. Fotiadis. Toward Emotion
Recognition in Car-Racing Drivers: A Biosignal Processing Approach. IEEE Trans-
actions on Systems Man and Cybernetics Part A Systems and Humans, 38(3), May
2008.
[47] Benjamin Kedem. Time Series Analysis by Higher Order Crossings. IEEE Press,
April 1994.
[48] Jonghwa Kim and E. André. Emotion recognition based on physiological changes in
music listening. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
30(12):2067 –2083, dec. 2008.
[49] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. On combin-
ing classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence,
20:226–239, 1998.
[50] W. Klimesch. EEG alpha and theta oscillations reflect cognitive and memory
performance: a review and analysis. Brain Research Reviews, 29(2-3):169–195, April
1999.
[51] R Kohavi and F Provost. Glossary of terms. Machine Learning, 30(2):271–274, 1998.
[52] Barry Kort, Rob Reilly, and Rosalind W. Picard. An affective model of inter-
play between emotions and learning: Reengineering educational pedagogy-building
a learning companion, pages 43–48. IEEE Computer Society, 2001.
[53] I. Kotsia and I. Pitas. Facial expression recognition in image sequences using
geometric deformation features and support vector machines. Image Processing, IEEE
Transactions on, 16(1):172–187, January 2007.
[54] P Senthil Kumar, R Arumuganathan, K Sivakumar, and C Vimal. A wavelet based
statistical method for de-noising of ocular artifacts in EEG signals. IJCSNS Inter-
national Journal of Computer Science and Network Security, 8(9):87–92, 2008.
[55] Ludmila I. Kuncheva. A theoretical study on six classifier fusion strategies. IEEE
Trans. Pattern Anal. Mach. Intell., 24(2):281–286, February 2002.
[56] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International affective picture sys-
tem (IAPS): Affective ratings of pictures and instruction manual. Technical report,
University of Florida, Gainesville, FL, 2008.
[57] Marc Langheinrich. Privacy by design - principles of privacy-aware ubiquitous sys-
tems. In Proceedings of UbiComp (Ubiquitous Computing), pages 273–291, sep.
[58] Joseph E. LeDoux. The emotional brain: The mysterious underpinnings of emotional
life. Simon & Schuster, March 1998.
[59] Joseph E. LeDoux. Emotion circuits in the brain. Annual Review of Neuroscience,
23(1):155–184, 2000.
[60] Enrique Leon, Graham Clarke, Victor Callaghan, and Francisco Sepulveda. A user-
independent real-time emotion recognition system for software agents in domestic
environments. Engineering Applications of Artificial Intelligence, 20(3):337 – 345,
2007.
[61] David Looney, Ling Li, Tomasz M. Rutkowski, Danilo P. Mandic, and Andrzej
Cichocki. Ocular artifacts removal from EEG using EMD. In Rubin Wang, Enhua
Shen, and Fanji Gu, editors, Advances in Cognitive Neurodynamics ICCN 2007,
pages 831–835. Springer Netherlands, 2008.
[62] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet rep-
resentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
11(7):674–693, 1989.
[63] Jon D. Morris. Observations: SAM: The Self-Assessment Manikin, an efficient cross-
cultural measurement of emotional response. Journal of Advertising Research,
35(December):1–6, 1995.
[64] Javier Cruz Mota, Luong Hong Viet, Alice Caplier, and Michele Rombaut. Emotion
detection in the loop from brain signals and facial images, 2006.
[65] R.D. Munk. Technical report, Technical University of Denmark, Department of
Informatics and Mathematical Modeling, Cognitive Systems.
[66] M. Murugappan. Human emotion classification using wavelet transform and KNN. In
Pattern Analysis and Intelligent Robotics (ICPAIR), 2011 International Conference
on, volume 1, pages 148 –153, june 2011.
[67] M. Murugappan, M. Rizon, R. Nagarajan, and S. Yaacob. EEG feature extraction
for classifying emotions using FCM and FKM. In Proceedings of the 7th WSEAS
International Conference on Applied Computer and Applied Computational Science,
pages 299–304, Stevens Point, Wisconsin, USA, 2008. World Scientific and Engi-
neering Academy and Society (WSEAS).
[68] E. Niedermeyer and F. Lopes da Silva. Electroencephalography: Basic Principles,
Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, 5th edition,
November 2004.
[69] Charles E. Osgood. The nature and measurement of meaning. Psychological Bulletin,
49(3):197 – 237, 1952.
[70] C. Parameswariah and M. Cox. Frequency characteristics of wavelets. Power Engi-
neering Review, IEEE, 22(1):72, jan. 2002.
[71] P.C. Petrantonakis and L.J. Hadjileontiadis. Adaptive extraction of emotion-related
EEG segments using multidimensional directed information in time-frequency domain.
In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International
Conference of the IEEE, pages 1–4, August 31 to September 4, 2010.
[72] P.C. Petrantonakis and L.J. Hadjileontiadis. Emotion recognition from brain sig-
nals using hybrid adaptive filtering and higher order crossings analysis. Affective
Computing, IEEE Transactions on, 1(2):81–97, July-December 2010.
[73] A. Petrosian. Kolmogorov complexity of finite sequences and recognition of different
preictal EEG patterns. In Computer-Based Medical Systems, 1995., Proceedings of
the Eighth IEEE Symposium on, pages 212 –217, jun 1995.
[74] Rosalind W. Picard. Future affective technology for autism and emotion communication.
Philosophical Transactions of the Royal Society B, 364(1535):3575–3584, 2009.
[75] G.E. Polychronaki, P. Ktonas, S. Gatzonis, P.A. Asvestas, E. Spanou, A. Siatouni,
H. Tsekou, D. Sakas, and K.S. Nikita. Comparison of fractal dimension estimation
algorithms for epileptic seizure onset detection. In BioInformatics and BioEngi-
neering, 2008. BIBE 2008. 8th IEEE International Conference on, pages 1 –6, oct.
2008.
[76] Jonathan Posner, James A Russell, and Bradley S Peterson. The circumplex model
of affect: an integrative approach to affective neuroscience, cognitive development,
and psychopathology. Development and Psychopathology, 17(3):715–734, 2005.
[77] Dale Purves, George J. Augustine, David Fitzpatrick, William C. Hall, Anthony-
Samuel LaMantia, and Leonard E. White. Neuroscience. Sinauer Associates, Inc.,
fifth edition, November 2011.
[78] N. Rehman and D. P. Mandic. Multivariate empirical mode decomposition. Pro-
ceedings of the Royal Society A: Mathematical, Physical and Engineering Science,
466(2117):1291–1302, 2010.
[79] J A Russell and L F Barrett. Core affect, prototypical emotional episodes, and other
things called emotion: dissecting the elephant. Journal of Personality and Social
Psychology, 76(5):805–819, 1999.
[80] James A. Russell. Core affect and the psychological construction of emotion. Psy-
chological Review, (1):145–172, 2003.
[81] James A Russell. Emotion, core affect, and psychological construction. Cognition
& Emotion, 23(7):1259–1283, 2009.
[82] T. M. Rutkowski, D. P. Mandic, A. Cichocki, and A. W. Przybyszewski. EMD approach to
multichannel EEG data: the amplitude and phase components clustering analysis. Journal of
Circuits, Systems, and Computers, 19(1), 2010.
[83] Dean Sabatinelli, Peter J. Lang, Andreas Keil, and Margaret M. Bradley. Emotional
perception: Correlation of functional MRI and event-related potentials. Cerebral
Cortex, 17(5):1085–1091, 2007.
[84] George E. Sakr, Imad H. Elhajj, and Huda Abou-Saad Huijer. Support vector
machines to define and detect agitation transition. IEEE Trans. Affect. Comput.,
1:98–108, July 2010.
[85] Arman Savran, Koray Ciftci, Guillaume Chanel, Javier C. Mota, Luong H. Viet,
Bülent Sankur, Lale Akarun, Alice Caplier, and Michele Rombaut. Emotion Detection in
the Loop from Brain Signals and Facial Images. In Proceedings of the eNTERFACE
2006 Workshop, Dubrovnik, Croatia, July 2006.
[86] M Schroder, M Bogdan, T Hinterberger, and N Birbaumer. Automated EEG feature
selection for brain computer interfaces, pages 626–629. IEEE, 2003.
[87] Chad L. Stephens, Israel C. Christie, and Bruce H. Friedman. Autonomic specificity
of basic emotions: Evidence from pattern classification and cluster analysis. Biological
Psychology, 84(3):463–473, 2010.
[88] N. Stevenson, M. Mesbah, and B. Boashash. A sampling limit for the empirical
mode decomposition. In Signal Processing and Its Applications, 2005. Proceedings
of the Eighth International Symposium on, volume 2, pages 647 – 650, 28-31, 2005.
[89] R. Sukanesh and R. Harikumar. A comparison of genetic algorithm & neural network
(MLP) in patient specific classification of epilepsy risk levels from EEG signals.
Engineering Letters, 14(1):96–104, 2007.
[90] Kazuhiko Takahashi. Remarks on emotion recognition from biopotential signals. In
2nd Int. Conf. on Autonomous Robots and Agents, 2004, pages 186–191.
[91] Vernon L. Towle, José Bolaños, Diane Suarez, Kim Tan, Robert Grzeszczuk, David N.
Levin, Raif Cakmur, Samuel A. Frank, and Jean-Paul Spire. The spatial location
of EEG electrodes: locating the best-fitting sphere relative to cortical anatomy.
Electroencephalography and Clinical Neurophysiology, 86(1):1 – 6, 1993.
[92] N. ur Rehman and D.P. Mandic. Filter bank property of multivariate empirical
mode decomposition. Signal Processing, IEEE Transactions on, 59(5):2421 –2426,
may 2011.
[93] J. Ville. Theory and Applications of the Notion of Complex Signal. Rand, 1958.
[94] J. Wagner, J. Kim, and E. André. From physiological signals to emotions: Implementing
and comparing selected methods for feature extraction and classification. In
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, pages
940–943. IEEE, July 2005.
[95] C. M. Whissell. The dictionary of affect in language, volume 4, pages 113–131.
Academic Press, 1989.
[96] Zhaohua Wu and Norden E. Huang. A study of the characteristics of white
noise using the empirical mode decomposition method. Proceedings of the Royal
Society of London. Series A: Mathematical, Physical and Engineering Sciences,
460(2046):1597–1611, June 2004.
[97] Yongmian Zhang and Qiang Ji. Active and dynamic information fusion for facial
expression understanding from image sequences. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 27(5):699–714, 2005.
[98] T Zikov, S Bibian, G A Dumont, M Huzmezan, and C R Ries. A wavelet based de-
noising technique for ocular artifact correction of the electroencephalogram, volume 1,
pages 98–105. IEEE, 2002.