Towards Automated Recognition of Human Emotions using
EEG
by
Haiyan Xu
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2012 by Haiyan Xu
Abstract
Towards Automated Recognition of Human Emotions using EEG
Haiyan Xu
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2012
Emotional states greatly influence many areas of our daily lives, such as learning, deci-
sion making, and interaction with others. Therefore, the ability to detect and recognize
one’s emotional states is essential in intelligent Human Machine Interaction (HMI). In
this thesis, a pattern classification framework was developed to sense and communicate
emotion changes expressed by the Central Nervous System (CNS) through the use of
EEG signals. More specifically, an EEG-based subject-dependent affect recognition sys-
tem was developed to quantitatively measure and categorize three affect states: positively
excited, neutral, and negatively excited. Several existing feature extraction algorithms
and classifiers were researched, analyzed and evaluated through a series of classification
simulations using a publicly available emotion-based EEG database. Simulation results
were presented, followed by an interpretive discussion.
The findings in this thesis can be useful for the design of affect-sensitive applications,
such as augmented means of communication for severely disabled people who cannot
directly express their emotions. Furthermore, we have shown that with a significantly
reduced number of channels, classification rates remained at a level feasible for
emotion recognition. Current HMI paradigms that integrate consumer electronics,
such as smart hand-held devices, with commercially available EEG headsets are thus
promising and will significantly broaden the range of applications.
Contents
1 Introduction 1
1.1 Emotional Intelligence and Human Machine Interaction (HMI) . . . . . . 2
1.2 Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Affect-sensitive Applications . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Human-Machine Interaction (HMI) . . . . . . . . . . . . . . . . . 6
1.3.2 Social Signal Processing . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 Health and Rehabilitation Applications . . . . . . . . . . . . . . . 7
1.4 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 State of the Art in Affect Recognition 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Modeling Affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Discrete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Dimensional Model . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Hybrid Discrete-Dimensional Model of Affect . . . . . . . . . . . . 17
2.3 Affect Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Facial Expressions and Affect . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Audio Analysis and Affect . . . . . . . . . . . . . . . . . . . . . . 20
2.3.3 Physiological Expressions of Affect . . . . . . . . . . . . . . . . . 20
2.3.4 Affect Expression Through Peripheral Nervous System (PNS) . . 22
2.3.5 Affective Expression Through Central Nervous System (CNS) . . 23
2.4 Emotion Elicitation Protocols . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1 International Affect Pictures System (IAPS) . . . . . . . . . . . . 24
2.5 Multimodality Affect Detection . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Fusion at the Feature Level . . . . . . . . . . . . . . . . . . . . . 27
2.5.2 Fusion at the Decision Level . . . . . . . . . . . . . . . . . . . . . 28
2.6 State of the Art Emotion Recognition Performances . . . . . . . . . . . . 29
2.7 Ethical and Privacy Concerns on Physiological Signal Collection . . . . . 33
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3 EEG Signal Characteristics and Preprocessing Methods 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 EEG Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.1 Electrode Placement for EEG Recording Devices . . . . . . . . . 38
3.2.2 Measuring Emotion Using EEG . . . . . . . . . . . . . . . . . . . 39
3.2.3 Spectral Characteristics of EEG . . . . . . . . . . . . . . . . . . . 41
3.3 Preprocessing of the EEG Recordings . . . . . . . . . . . . . . . . . . . . 42
3.3.1 EEG Referencing . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Rejecting Artifacts Based on Channel Statistics . . . . . . . . . . 43
3.3.3 Filter Data Using Fast Fourier Transform (FFT) . . . . . . . . . . 44
3.3.4 Independent Component Analysis . . . . . . . . . . . . . . . . . . 45
3.3.5 Wavelet Decomposition for Denoising . . . . . . . . . . . . . . . . 46
3.4 Ground Truth Definition and Validation . . . . . . . . . . . . . . . . . . 48
3.4.1 Pearson Correlation Coefficients for Ground Truth Validation . . 49
3.4.2 Confusion Matrix for Ground Truth Validation . . . . . . . . . . 50
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Methods for Emotion Assessments using EEG 53
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Previous Feature Extraction Methods . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Time Domain Analysis . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.1.1 Statistical-based Features . . . . . . . . . . . . . . . . . 55
4.2.1.2 Higher Order Crossings . . . . . . . . . . . . . . . . . . 56
4.2.2 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2.1 Event Related Potential and Spectrogram . . . . . . . . 57
4.2.3 Time-Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.3.1 Wavelet-based Features . . . . . . . . . . . . . . . . . . 59
4.2.4 Channel selection: single channel vs. multiple channels . . . . . . 61
4.3 Classification Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . 64
4.3.2 K Nearest Neighbours . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5 Empirical Mode Decomposition for Emotion Classification 68
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Empirical Mode Decomposition (EMD) . . . . . . . . . . . . . . . . . . . 69
5.2.1 Decomposition Procedure . . . . . . . . . . . . . . . . . . . . . . 71
5.2.2 Hilbert-Huang Spectrum . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.3 Multivariate EMD . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3 Signal Reconstruction using MEMD as a Filter Bank . . . . . . . . . . . 77
5.3.1 MEMD for Feature Extraction . . . . . . . . . . . . . . . . . . . . 80
5.4 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4.1 Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6 Experimental Setup and Simulation Results 86
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2 Data Collection Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.2.1 Recording Device: Biosemi Active 2 . . . . . . . . . . . . . . . . . 91
6.2.2 Ground Truth Definition . . . . . . . . . . . . . . . . . . . . . . . 92
6.2.3 Ground Truth Validation Using Pearson Correlation Coefficients . 93
6.2.4 Ground Truth Validation Using Confusion Matrix . . . . . . . . . 94
6.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.4 Data Splitting and K Cross Validation . . . . . . . . . . . . . . . . . . . 96
6.4.1 k-Fold Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . 98
6.5 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.5.1 Simulation Results Using All Channels . . . . . . . . . . . . . . . 99
6.5.1.1 Subject-Specific Emotion Recognition . . . . . . . . . . 100
6.5.1.2 Cross-Subject Emotion Recognition . . . . . . . . . . . . 101
6.5.2 Simulation Results with Channel Reduction . . . . . . . . . . . . 101
6.5.2.1 Channel Reduction in Reference to Commercial Devices 101
6.5.2.2 Channel Reduction Using Genetic Algorithm . . . . . . 104
6.6 Sensitivity Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6.1 The Effect Of Sampling Rate On System Performance . . . . . . . 108
6.6.2 Parameters For Setting Window Size (epoch) . . . . . . . . . . . 110
6.6.3 Parameters for Wavelet Feature Evaluation . . . . . . . . . . . . . 111
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7 Conclusions and Future Works 114
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.1 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.2.1 Directions for Future Study On Utilizing EEG Signals For Affect
Detection Applications . . . . . . . . . . . . . . . . . . . . . . . . 116
A Neighbouring Electrodes for Local Laplacian Filter 118
B List of IAPS Images Used for the Experiment 120
C Confusion Matrix Ground Truth Validation of the Database 124
D Empirical Mode Decomposition (EMD) Algorithm 127
Bibliography 129
List of Tables
2.1 State of the Art Emotion Recognition Performances . . . . . . . . . . . . 30
2.2 State of the Art Emotion Recognition Performances using EEG . . . . . 31
3.1 Subbands of the EEG Signals . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Wavelet Decomposition of EEG signals into various frequency bands (fs =
1024) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Sample Self-Assessment Values vs. Values provided with the chosen stimuli
(e.g., images from IAPS) . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Confusion Matrix Components . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1 EEG channels selected for analysis. . . . . . . . . . . . . . . . . . . . . . 63
5.1 Method Comparison between Fourier, Wavelet and Hilbert-Huang Trans-
form in Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Pearson correlation coefficient between IAPS scores and self assessments
per participant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 Averaged Self-Assessment Classification Accuracy (in Percentage) for the
Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3 The features extracted from the EEG signals . . . . . . . . . . . . . . . . 96
6.4 Overview of the dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 Emotion Recognition rates using ALL 54 electrodes and 5NN . . . . . . . 100
6.6 Cross-Subject Emotion Recognition rates using ALL 54 electrodes . . . . 101
6.7 Device Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.8 Subject Specific Recognition rates using 8 electrodes and 5NN . . . . . . 103
6.9 Cross-subject Recognition rate using only 8 electrodes . . . . . . . . . . . 103
6.10 Channels selected using GA algorithm . . . . . . . . . . . . . . . . . . . 107
6.11 Emotion Recognition rates using electrodes selected using GA . . . . . . 107
6.12 Recognition rates of emotion using channels selected by GA . . . . . . . 108
6.13 Cross subject emotion recognition rates using different wavelets for DWT 111
A.1 Associated neighbour electrodes for Local Laplacian filters [5] . . . . . . 119
B.1 List of IAPS Images Used for Session 1 . . . . . . . . . . . . . . . . . . . 121
B.2 List of IAPS Images Used for the Session 2 . . . . . . . . . . . . . . . . . 122
B.3 List of IAPS Images Used for the Session 3 . . . . . . . . . . . . . . . . . 123
C.1 Participant 1: Self-Assessment Classification Accuracy (in Percentage) for
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 124
C.2 Participant 2: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.3 Participant 3: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.4 Participant 4: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 125
C.5 Participant 5: Self-Assessment Classification Accuracy (in Percentage) of
the Three Chosen Emotions . . . . . . . . . . . . . . . . . . . . . . . . . 126
List of Figures
1.1 Emotion and Core Affects. Figure adapted from [79] . . . . . . . . . . . 3
1.2 Mayer and Salovey’s (1997) Four-Branch Model of Emotional Intelligence 3
1.3 Affect Intelligent Human and Machine Interface (HMI) . . . . . . . . . . 5
1.4 Diagram of a typical machine learning problem . . . . . . . . . . . . . . 9
2.1 Example of basic emotions using facial expressions . . . . . . . . . . . . . 16
2.2 Circumplex Models of Emotion (Image adapted from [48]) . . . . . . . . 16
2.3 Hybrid Discrete-Dimensional Model of Affect . . . . . . . . . . . . . . . . 18
2.4 Affect Expression Modalities. Figure adapted from [79] . . . . . . . . . . 19
2.5 Image Samples for The International Affective Picture System (IAPS) . . 25
2.6 SAM-Scales for valence (top), arousal (bottom) . . . . . . . . . . . . . . 26
2.7 Multiple Modalities Model with Decision Level Fusion [45] . . . . . . . . 28
3.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Standardized EEG Recording (F3) . . . . . . . . . . . . . . . . . . . . . 37
3.3 EEG Signal Contaminated with Eye-blinking . . . . . . . . . . . . . . . 37
3.4 Spectrum Characteristics of EEG . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Electrodes placements of different EEG recording systems . . . . . . . . 39
3.6 Cross section of the human brain . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Sample EEG Amplitude with Normal Fitting . . . . . . . . . . . . . . . 44
3.8 Four sample mother wavelet functions used for DWT decomposition of
EEG signals: (a) Mexican hat wavelet, (b) Daubechies order 8 wavelet
(db8), (c) biorthogonal wavelet order 1.3 (bior1.3), and (d) biorthogonal
wavelet order 1.5 (bior1.5). . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.9 EEG denoising using Discrete Wavelet Transform (DWT) . . . . . . . . . 48
4.1 System components in the training stage . . . . . . . . . . . . . . . . . . 54
4.2 Power within sub-bands for F3, F4 for Negative, Positive, and Calm states 59
4.3 Discrete Wavelet Decomposition using db4 Wavelet . . . . . . . . . . . . 60
4.4 Emotiv EPOC Neuroheadset . . . . . . . . . . . . . . . . . . . . . . . 62
4.5 Examples of LDA Classifier . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Examples of KNN Classifier . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1 Sample EMD Decomposition for Participant 2 Session 3 Negatively Excited 73
5.2 Sample EMD Decomposition for Participant 2 Session 3 Positively Excited 73
5.3 Hilbert-Huang Spectrum for Instantaneous Frequency between 0–70 Hz . 75
5.4 The filter bank property of Regular EMD . . . . . . . . . . . . . . . . . . 78
5.5 The filter bank property of MEMD . . . . . . . . . . . . . . . . . . . . . 78
5.6 Instantaneous Amplitude and Averaged Frequency of the IMFs . . . . . . 79
5.7 MEMD for Signal Reconstruction and Feature Analysis . . . . . . . . . . 81
5.8 The block diagram for Genetic Algorithm . . . . . . . . . . . . . . . . . . 82
6.1 Experimental Components used for Simulation . . . . . . . . . . . . . . 87
6.2 The Three Emotion Classes Studied in This Project . . . . . . . . . . . . 89
6.3 Protocol description for eNTERFACE06-EMOBRAIN database . . . . . 90
6.4 Selected IAPS images for the 3 classes emotion elicitation experiment . . 91
6.5 Biosemi Active Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.6 HOC Order vs. Correct Recognition Rate (54 Channels) . . . . . . . . . 95
6.7 HOC Order vs. Correct Recognition Rate (6 Channels) . . . . . . . . . . 96
6.8 Recognition Rate using HOC features for K = 1,3,5,7,9 . . . . . . . . . . 101
6.9 6 Channels referenced to the Emotiv EPOC . . . . . . . . . . . . . . . . 103
6.10 Averaged and Maximum Fitness (correct recognition rate) in each gener-
ation using LDA and GA . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.11 Channels selected through Genetic Algorithm . . . . . . . . . . . . . . . 106
6.12 Channel of significance obtained through Genetic Algorithm . . . . . . . 106
6.13 Sampling Rate vs. Correct Recognition Rate using All Electrodes and LDA 109
6.14 Sampling Rate vs. Correct Recognition Rate using All Electrodes and kNN 109
List of Acronyms
Acronym Description
AC Affective Computing
ANOVA ANalysis Of Variance
ANS Autonomic Nervous System
BVP Blood Volume Pulse
CNS Central Nervous System
ECG Electrocardiogram
EEG Electroencephalogram
EI Emotional Intelligence
EMD Empirical Mode Decomposition
EMG Electromyogram
ERP Event Related Potential
FFT Fast Fourier Transform
GSR Galvanic Skin Response
HMI Human Machine Interaction
HOC Higher Order Crossings
IAPS International Affective Picture Systems
ICA Independent Component Analysis
IF Instantaneous Frequency
IMF Intrinsic Mode Function
KNN K-Nearest Neighbors
LDA Linear Discriminant Analysis
MEMD Multivariate Empirical Mode Decomposition
RSP Respiration
SFFS Sequential Floating Forward Selection
STFT Short Term Fourier Transform
SVM Support Vector Machine
fMRI functional Magnetic Resonance Imaging
QDA Quadratic Discriminant Analysis
List of Symbols
Symbol Description
Xi, i = 1, · · · , N Raw N-sample EEG signal
T Duration of a trial
µx Mean of signal x
σx Standard deviation of signal x
δx The mean of the absolute values of the first differences of signal x
Ne Number of Electrodes
f^j_{s,i} = [µx, σx, δx, δ̄x, γx, γ̄x] Statistical feature vector for sample i and electrode j
ENTl Wavelet entropy at lth scale
ENGl Wavelet energy at lth scale
f^j_{w,i} = [ENG^j_l, ENT^j_l] Wavelet-based feature vector for sample i and electrode j
CX(l, n) Wavelet coefficients at lth scale for signal x
Li, i = 1, · · · , N Self-assessment scores on the projected stimuli
Ki, i = 1, · · · , N Scores provided with the projected stimuli
Chapter 1
Introduction
Emotional states greatly influence many areas of our daily lives, such as learning, deci-
sion making, and interaction with others. Our decisions and courses of action adapt to
the emotional cues we receive throughout this process, which makes the exchange of
information much more effective and smooth. Emotion is an important part of human-
to-human communication: it provides important cues for disambiguating the messages
we send.
With technological advancements, daily life is entangled with interactions both between
humans and between humans and machines. Humans are very efficient at decoding the
emotional cues presented during human-to-human interaction and at providing adjusted,
socially appropriate responses, which makes such interaction efficient and smooth.
However, in the case of human-to-machine interactions, because the machine’s re-
sponse or interface is often predefined, rigid, and unconditional, users are often left un-
satisfied and frustrated. This work aims to improve a computer’s ability to correctly
recognize human emotional states.
1.1 Emotional Intelligence and Human Machine Interaction (HMI)
The term emotion, when used in non-scientific contexts, typically refers to the proto-
typical emotions, the clearest cases of emotion, such as anger, disgust, fear, happiness,
sadness, and surprise. However, strictly speaking, as stated in cognitive theory [79] and
shown in Figure 1.1, emotion is a unique, personal expression that differs with social
context, cultural background, and personal experience. Therefore, for empirical emotion
recognition applications, the concept of emotion lacks necessary and sufficient features.
On the other hand, as an alternative and more practical interpretation of emotion, the
concept of core affect is introduced as the neurophysiological state measured as the simple
raw feelings evident in moods and emotions [80, 81]. According to this representation, at
any given time, one’s emotional state is represented as a point on the core affect model
(more details are provided in Section 2.2). In other words, affect or core affect is a
measure of the intensity of emotion from a physiological perspective.
Emotional intelligence [19] is a widely accepted concept, and incorporating this in-
formation into human-machine interactions will make such interactions more intuitive,
flexible, and efficient. As stated in [19], emotional intelligence can be encapsulated in
four branches, as shown in Figure 1.2. The model provides a summarized guideline on
how emotions are perceived and how they act on our cognitive and thinking processes.
Without emotional cues, our interaction with machines can be frustrating and counter-
productive. Consider a simple use case: after purchasing a product, a user needs to
contact customer service by phone and is connected to an Interactive Voice Response
(IVR) system, the so-called virtual agent. In many cases, there is no way to get through
the calling queue other than via the virtual agent, yet after a few tries the ’agent’ still
cannot figure out what the user is saying. The rest of the scenario has been shown in
numerous examples online: smashing the
[Figure: diagram relating Core Affect, Perception of Affective Quality, Attribution to Object, Appraisal, Action, Emotional Meta-Experience, Emotion Regulation, and the Prototype of a Specific Emotion]
Figure 1.1: Emotion and Core Affects. Figure adapted from [79]
[Figure: the four branches of Emotional Intelligence — 1. Emotional Perception: emotions are perceived and expressed; 2. Emotional Integration: emotions are sensed and begin automatic influences on cognition, entering the cognitive system as noticed signals and as influences on cognition; 3. Emotional Understanding: emotional signals about relationships are understood, along with their interactive and temporal implications, and the implications of emotions, from their feelings to their meanings, are considered; 4. Emotional Management: management encourages openness to feelings, and thoughts promote emotional, intellectual, and personal growth]
Figure 1.2: Mayer and Salovey’s (1997) Four-Branch Model of Emotional Intelligence
phone or yelling at the PC being typical responses. As we can see, with the increasing
use of machines, it is desirable to narrow the communication gap between humans and
machines so that it resembles human-to-human communication. Emotionally intelligent
Human-Machine Interface (HMI) refers to the scenario in which machines, such as a per-
sonal computer, can detect, recognize, and respond to the user’s emotional states. Such
”human-centered computing” is also referred to as Affective Computing (AC).
1.2 Affective Computing
Affective computing has been an active research topic for the past two decades and has
shown strong growth in the past few years. It aims to narrow the communication
gap between humans and machines. With the advancement of human-computer
interfaces, there is an inevitable need for machines to understand and react to the
affective state of the user. Even though the definition of affect itself is a topic of debate
within the psychological literature, it is widely accepted that affects such as moods and
emotional states significantly influence the outcomes of people’s daily activities in learning
and decision making. In the affective computing literature, there are mainly three types
of applications: affect detection, affect mimicking, and actual affect recognition as
humans perform it [48]. With enhanced emotion recognition capability, human-
computer interaction will become much more efficient and enjoyable.
Affect detection is a critical step towards affect recognition, and is the first step
towards affect-sensitive applications. A machine will not be able to ’respond’ to the
user’s affects without accurately detecting them first. In the field of affect detection, many
approaches have been established in the past twenty years utilizing audio, visual (facial
expression), and body-movement analysis, as well as peripheral physiological signals and,
more recently, autonomic signals such as ECG. However, compared to other pattern recogni-
tion and machine learning problems, the results are disappointing. While AC recognition
[Figure: block diagram of an affect-intelligent interface, with components User, Sensors, Emotional Cues, Emotion Assessment, Emotion Adaptation, Priorities, User Interface, Cloud, Remote Monitoring, and Service]
Figure 1.3: Affect Intelligent Human and Machine Interface (HMI)
rates of 60–80% [14] are common, most other recognition problems report recognition
rates above 90%, and often greater than 95%. The reason for the lower performance
relative to other pattern classification tasks is the complex, multiple mapping between
the acquired signals and the actual affect state present. Fully understanding and
classifying affect is next to impossible without all the key factors, such as the social
context and the user’s cultural background and education. Affect, like emotion, is also
very spontaneous and not a well-defined psychophysiological process. Therefore, in the
affect detection literature, only an approximate solution is possible.
There are still more challenging problems to be solved besides improving affect
detection accuracy. Currently, most research results are produced under controlled
environments. In real life, such data collection might be prohibitive, for example
requiring the availability of frontal face images or high-quality audio signals in a noisy
environment. Recently, many research results have been presented showing the specificity
between physiological signals and prototyped (or discrete) emotions. Physiolog-
ical signals, such as the Electrocardiogram (ECG), Galvanic Skin Response (GSR),
breathing, and the Electroencephalogram (EEG), can be captured continuously by
non-intrusive means and are usually less affected by external noise sources. Therefore,
the goal of this thesis is to investigate the feasibility of physiological signals, the
Electroencephalogram (EEG) in particular, for improving affective computing methods,
with the long-term goal of developing affect-sensitive HMI.
1.3 Affect-sensitive Applications
Affect-sensitive applications are being developed in fields such as gaming, health-care
and learning technology.
1.3.1 Human-Machine Interaction (HMI)
During the learning process, learners experience various emotions such as satisfaction,
happiness, frustration, or sadness. The emotional state of a learner can significantly
affect the outcome of the learning process [44, 52]. If a computer interface can recognize
and adapt to such emotional changes, as a classroom teacher does by changing the
material presented or the way it is presented, it will positively impact learning gains and
improve the overall learning experience.
1.3.2 Social Signal Processing
Affect-sensitive applications are being developed in the field of social signal processing
for indexing and retrieving information according to its associated affect state. With
the use of surveillance cameras or voice recorders, emotional cues are collected and used
for storing or retrieving multimedia content such as songs, videos, and images. This
(often) real-time feedback system can increase the dynamics of the interface and enhance the
user-centered computing experience.
1.3.3 Health and Rehabilitation Applications
Affect-sensitive applications are also very useful in behavior prediction and monitoring
[74, 22]. Autism spectrum disorders are neuro-developmental disorders typically
characterized by a combination of a lack of social interaction or communication and
repetitive patterns of violent or self-destructive behavior. Autistic people usually
experience frequent mood swings and a high level of anxiety in social interactions
due to their inability to cooperate and express their feelings or to be understood. Studies
have shown that even though autistic people may seem calm before they erupt into
problematic behaviors or self-injury, there have been dramatic changes in their
physiological signals. Most of these changes are a result of frustration and anger caused
by misunderstandings between the communicators. Affect-sensitive applications can
provide critical information by sensing such mood and physiological changes, and can
essentially decrease the occurrence of such misunderstandings and narrow the emotional
communication gap.
From the health care perspective, an appropriate assessment of a patient’s emotional
state can be a key indicator of mental and physical health status; the power of emotions
themselves over the recovery process has also been documented [10]. In a tele-health
scenario, clinicians could greatly increase the quality of their services if a system could
accurately assess a patient’s emotional states, which are not directly accessible due to
the non-physical presence of the patient. Collecting physiological signals and mapping
them to emotional states can synthesize the patient’s affective information for the
health-care provider.
1.4 Technical Challenges
To design a system for emotion assessment, the first and most critical step is to define
the emotions that the system will detect. However, emotion is currently not a well-
defined concept and remains an active research topic in the psychology literature.
Defining a specific emotion is not an easy task; for example, in the English language
there are over 3,000 words describing emotions or affects [95]. Besides the large variety
of emotion descriptions, the perception and expression of each emotion also vary greatly
for an individual under different social settings. For example, a smile at a social
gathering could be due to a truly happy feeling, a polite gesture, or an attempt to mask
the subject’s true feelings. Detecting true feelings in such natural environments is a
very challenging research problem. Several models have been proposed towards the
understanding of emotion and towards generalizing the definition of emotions across
different cultural and educational backgrounds. Although their feasibility is still a topic
of debate, a growing number of empirical studies have shown the effectiveness of models
such as the circumplex model and the basic emotions model (see Section 2.2 for details).
Ground truth definition for such an emotion detection system is another challenging
problem. Ground truth refers to the known labels for the input signals (samples), such
as the emotional state to which each sample belongs, that will be used in designing the
machine learning system, as shown in Figure 1.4. For a supervised machine learning
problem (labels are known to the system), the correct labeling of input samples is a
critical step [24]. A machine learning algorithm will try to find a pattern that is most
consistent amongst all observed samples of each class and use this pattern and its
decision rules to predict the class to which a new sample belongs. If the data are
mislabeled, the class-specific patterns amongst the observations become vague and,
as a consequence, a larger number of learning mistakes will occur, and the learning
system might fail altogether. Most current research on affect detection has been
designed on databases with emotion elicited using methods such as professional actors
Figure 1.4: Diagram of a typical machine learning problem
posing (facial), self-induced elicitation, or observing images or audio clips believed to evoke
the desired emotion. However, little work has shown how well these data truly
represent the targeted affect states. As a result, there is no way to detect 'bad' samples
other than a basic signal-quality check on the amount of noise or artefact contamination;
in other words, there is no way to identify or reject mislabeled data. This poses a
significant challenge in developing a (often statistical) data model for detection problems
[24].
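The supervised setup just described can be sketched in a few lines. The data below is synthetic, and the k-nearest-neighbour prediction rule is only one of the classifiers evaluated later in this thesis; it stands in for any learning algorithm trained on labeled samples.

```python
import numpy as np

# Hypothetical illustration: labeled training samples (feature vectors)
# for three affect classes, plus a new unlabeled sample.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(c, 0.5, size=(20, 4)) for c in (0.0, 2.0, 4.0)])
y_train = np.repeat(["negative", "neutral", "positive"], 20)

def knn_predict(x_new, X, y, k=5):
    """Simple k-nearest-neighbour prediction rule learned from labeled samples."""
    dists = np.linalg.norm(X - x_new, axis=1)
    nearest = y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

print(knn_predict(np.full(4, 4.1), X_train, y_train))  # → "positive"
```

Mislabeling part of `y_train` would blur the class-specific clusters and degrade exactly this prediction step, which is why ground truth validation matters.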
Several physiological signals, such as ECG, GSR and BVP, have been actively studied
as suitable means for recognizing affect through appropriate pattern recognition
techniques, and promising results have been reported [14]. However, these
physiological signals can also show similar changes due to factors unrelated to
emotion. For example, increased physical movement can raise the instantaneous
heart rate, and higher ambient temperature can increase skin conductance without the
presence of excitement or fear. Moreover, collecting the large variety of signals required
by such sensor networks can make a system infeasible for practical, long-term
continuous monitoring (i.e., days). For example, a subject is required to remain stationary
and keep one finger still in order to collect peripheral signals such as Galvanic
Skin Response (GSR); this would interfere with the use of the hands and is highly impractical.
Therefore, less intrusive methods are needed for a practical
system to be used in natural settings for continuous capture of sensorial signals. The
response time, i.e., the time required for an emotional cue to appear in the selected
physiological signal, is another application concern. For example, a signal originating
from the Central Nervous System varies on the order of milliseconds (ms) when the
affect state changes, but the change takes much longer to show through a peripheral signal
such as GSR.
Recent studies [58, 72, 66] have shown that signals generated by the Central Nervous
System (CNS) (i.e., EEG) can be an alternative and potentially more powerful means for
affect detection, since they are less affected by changes in physical activity.
In particular, studies using fMRI [83] have shown that affect states are
correlated with frontal cortex synchronization and with asymmetry between the left and right
frontal lobes, which can be used as a basis for automatic affect state detection. Therefore,
the objective of this study was to examine the feasibility and characterization of EEG
signals for emotion detection.
1.5 Contributions
For this project, my objective was to explore the universal characteristics of human
emotions using EEG signals; seeking a new method that can sense and communicate
autonomic nervous system (ANS) arousal in daily life. The focus of this research was on
the development of a signal processing framework for affect detection using brainwaves.
Specifically, this research initiative was to develop an EEG-based non-subject-specific
affect detection system that can quantify the emotional states of a person. Several
existing feature extraction algorithms were researched and analyzed. Their effectiveness
was evaluated through a series of classification simulations with the use of a publicly
available emotion-specific EEG database. Simulation results on the accuracy of correctly
predicting new samples were presented and discussed. Several classification methods were
selected based on the set of features they were to be used with. These classifiers were
discussed in detail and evaluated through classification simulations.
The contributions of this thesis are:
1. Identification of key points in computerized emotion assessment: emotion elicitation,
ground truth validation and experimental protocol setup. Many emotion
elicitation methods have been used to generate emotion-specific
samples; however, in many cases the choice of ground truth definition is uncertain
and often debatable. In this thesis, two methods that can be used to validate
the ground truth labeling of collected samples are researched, discussed
and analyzed.
2. Investigation of the use of a novel time-frequency analysis method, Empirical Mode
Decomposition, for EEG analysis and emotion classification. Empirical Mode Decomposition
(EMD) has been used in emotion recognition studies using EEG
and has shown promising results. However, there are many unsolved challenges
in the application of this method. Since the decomposition is entirely data-driven
and its outputs depend on the time-domain local characteristics
of the signal, the number of decomposition levels varies between channels and
recording scenarios (trials and sessions). As a result, defining a common feature
space is very challenging or impossible. To resolve this problem, an expanded
version of EMD, Multivariate EMD (MEMD), is researched, analyzed, and evaluated
through simulations using EEG signals.
3. Introduction and presentation of a framework for EEG emotion classification; various
state-of-the-art feature extraction (statistical, spectral, HOC, wavelet) and classification
algorithms (LDA [24, 9], KNN [17]) for emotion analysis using EEG signals
were investigated and implemented. Classification performance and application
limitations were presented through simulation results on a publicly available EEG-emotion
(3-class) dataset. Applying the MEMD algorithm on all
channels produces a large number of Intrinsic Mode Functions (IMFs), the time-varying
frequency components of an input EEG recording. Concatenating features
extracted directly from each IMF significantly increases the dimension of the
feature space with much redundancy in the information presented. As a result of
this high dimensionality, a very large number of samples would be required
to produce a meaningful statistical model [23]. A Genetic Algorithm (GA) is therefore applied
to reduce the number of channels and the number of IMFs used for feature extraction.
The final results provide two key pieces of information for understanding
emotion and brain waves: the location of the most emotion-specific
channels (discriminating power) and the frequency range of the emotion-specific
brain waves (instantaneous frequency analysis of the IMFs). This provides a
means to compare and validate research results from the psychophysiological
literature and to aid the study and understanding of human emotion. Practical
constraints such as the minimum required time-domain window length and the edge
effects of the windowing operation were also discussed.
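As a rough illustration of how a Genetic Algorithm can search over binary channel masks, the sketch below evolves a population of masks. The fitness function is a stand-in: in the actual framework it would be cross-validated classification accuracy on the selected channels, and the "informative" channel indices here are invented purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CHANNELS = 32  # hypothetical montage size

def fitness(mask):
    """Stand-in fitness: in the real framework this would be the
    cross-validated classification accuracy using only the selected
    channels, minus a penalty for using many channels."""
    informative = {2, 3, 18, 19}          # pretend these channels carry affect info
    hits = sum(mask[i] for i in informative)
    return hits - 0.05 * mask.sum()       # reward hits, penalize channel count

def evolve(pop_size=30, generations=40, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, N_CHANNELS))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # tournament selection: the fitter of two random individuals survives
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # single-point crossover between consecutive parent pairs
        cuts = rng.integers(1, N_CHANNELS, size=pop_size // 2)
        children = parents.copy()
        for j, c in enumerate(cuts):
            a, b = 2 * j, 2 * j + 1
            children[a, c:], children[b, c:] = parents[b, c:], parents[a, c:]
        # bit-flip mutation
        flips = rng.random(children.shape) < p_mut
        children[flips] ^= 1
        pop = children
    return max(pop, key=fitness)

best_mask = evolve()
print(np.flatnonzero(best_mask))  # indices of the selected channels
```

The surviving mask identifies which channels (and, analogously, which IMFs) the search deems most discriminative.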
1.6 Thesis Overview
The remainder of this thesis is organized as follows.
• Chapter 2 introduces the notion of affect as a means for emotion assessment
from the psychophysiological perspective. Three models for affect representation
are presented. Various affect assessment modalities and the fusion concept for
multiple-modality analysis are discussed as well. The chapter ends with a review
of state-of-the-art classification performance.
• Chapter 3 describes the use of physiological signals on affect detection. It begins
with a brief review on the correlation between various physiological signals and
affect states. Then topics on EEG signal acquisition and preprocessing approaches
are listed and discussed in detail.
• Chapter 4 starts with a system-level overview of the proposed automated affect
detection process, followed by an in-depth review of related feature extraction
methods and classifiers. State-of-the-art feature extraction algorithms were analyzed
in terms of theoretical aspects and practical implementation constraints.
• Chapter 5 first presents a novel feature extraction algorithm utilizing Multivariate
Empirical Mode Decomposition (MEMD). Secondly, the genetic algorithm, as a feature
dimension reduction method, is introduced and implemented for this research
work. Regarding the application of MEMD, several implementation factors such
as computational complexity and the edge effect due to the time-domain windowing operation
are also evaluated.
• Chapter 6 provides the experimental protocol for a publicly available database.
This dataset is used to evaluate and compare the classification performance of
various selected feature extraction algorithms. Simulation results using the new
and existing feature extraction algorithms are presented and analyzed.
• Chapter 7 concludes this thesis by summarizing the contributions of this thesis and
provides possible future research directions.
Chapter 2
State of the Art in Affect
Recognition
2.1 Introduction
Emotion expression is a result of complex interactions of our biological nature with the
surrounding environment, based on observation, personal experience, and self-regulation.
Some of these factors are spontaneous (self-regulation) and vary greatly from
person to person (personal experience). Due to this complexity, modeling such a multi-faceted
process is a very challenging research problem and has long been believed to be
impossible. In this chapter, we discuss a few basic but very important concepts
in the realm of affective computing: the emotional
models that can be used to define human emotions, the emotion elicitation methods
involved in collecting experimental data, and a brief review of the modalities that have
been used for affect recognition. Towards the end of the chapter, state-of-the-art
approaches and their recognition performances are listed and discussed.
2.2 Modeling Affect
In the psychology literature, there are mainly three models that have been used to
represent the actual emotional state of a person. The main difference between them
is the number of emotions represented within each model. For example, the
discrete model (Section 2.2.1) is associated mainly with the six basic or prototypical emotions
believed to be universal among human populations, a much smaller set
than the emotions represented in the circumplex model (Section 2.2.2). The choice of emotion model
also depends on the intended application. The discrete model of emotion has been widely
used for emotion recognition applications utilizing facial images, as it is believed
to be universal across a large population spanning genders and ethnicities [25, 27]. The
circumplex model is adopted in most emotion analysis methods involving physiological
signals [14, 76].
2.2.1 Discrete Model
The discrete model of emotion was first introduced by Ekman [25] in 1971, who analyzed
the correlation between human emotions and facial expressions among subjects from
different cultural backgrounds and declared the existence of a universal set of 'primary'
emotions (e.g., fear, anger or disgust). The six basic or universal emotions, also referred
to as the prototypical set of emotions, are joy, sadness, surprise, anger, disgust and fear.
The term basic, used interchangeably with 'universal', indicates that these emotions
are expressed universally across different cultural backgrounds, genders, etc. According
to the basic emotion model, at any time instance, a person experiencing an emotion
should be able to choose the one of the six basic emotions
that best approximates his or her true feelings. In other words, the
discrete model acts as a complete set for the description of emotions.
Figure 2.1: Example of basic emotions using facial expressions
2.2.2 Dimensional Model
The circumplex model of emotion was proposed by Russell [79] in 1999 and was devel-
oped based on Cognitive Theory, where emotions are represented on a 2D plane (see
Figure 2.2), with one dimension of judged valence (i.e., pleasure/positive or displea-
sure/negative) and the other of arousal. Valence stands for one’s judgment about a
situation as positive or negative and arousal spans from calmness to excitement, express-
ing the degree of one’s excitation.
Figure 2.2: Circumplex Model of Emotion (Image adapted from [48])
In interpreting this 2-D structure, proponents of the circumplex model of affect sug-
gest that all affective states arise from two independent neurophysiological systems, the
valence and arousal systems. Each and every affective experience is the consequence
of a linear combination of these two independent systems, which is then interpreted as
representing a particular emotion (see Figure 2.2). Fear, for example, is conceptualized
by circumplex theorists as a neurophysiological state typically involving the combination
of negative valence and heightened arousal in the CNS. The subjective experience of fear
arises out of cognitive interpretations of these patterns of physiological activity that oc-
cur in the context of eliciting stimuli. As emotions are experienced and communicated,
cognitive interpretations are employed to identify the neurophysiological changes in the
valence and arousal systems and conceptually organize these physiological changes in
relation to the eliciting stimuli, memories of prior experiences, behavioral responses, and
semantic knowledge [79].
Despite the differing descriptive labels applied to these dimensions (activation/deactivation,
valence/arousal), the 2-D structure is found consistently across a large number of studies.
For instance, Osgood (1952) [69] validated this model by factor analysis. In comparison,
this model is more suitable for emotion analysis using physiological signals than the
widely accepted Ekman model, which was based on facial expression.
2.2.3 Hybrid Discrete-Dimensional Model of Affect
Although the two models are typically presented as mutually exclusive, for emotion
analysis using physiological signals a hybrid model, which places discrete emotions in
dimensional affective space with respect to both ANS and self-report variables, has
proved to be most effective and is consistent with self-assessment reports [87, 15].
This model not only provides a way to bridge many previous findings on prototypical
emotions and cognitive processes (e.g., learning and decision making), but also ensures the
consistency between physiological expressions of emotion and the emotion modeling that the
circumplex model provides.
In this study, we have adopted the use of the hybrid model and focused our work
Figure 2.3: Hybrid Discrete-Dimensional Model of Affect
on the recognition of three key affective states. Based on the hybrid model of affect,
we infer that high arousal can be interpreted as high motivation; high valence means
the current situation is pleasant and approachable, whereas low valence means it is unpleasant
and to be avoided. Hence, in the domains of learning, decision making and behavior monitoring,
we would like to know when a person is happy ('positively excited'), frustrated
('negatively excited'), or bored ('calm').
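The three target states can thus be read off the hybrid model as regions of the valence-arousal plane. A minimal mapping follows, with illustrative mid-scale thresholds on SAM-style 1-9 ratings; the exact decision boundaries are an assumption, not taken from the thesis:

```python
def affect_state(valence, arousal, v_mid=5.0, a_mid=5.0):
    """Map SAM-style 1-9 valence/arousal ratings to the three target
    affect classes used in this thesis. Thresholds are illustrative."""
    if arousal <= a_mid:
        return "calm"
    return "positively excited" if valence > v_mid else "negatively excited"

print(affect_state(valence=8, arousal=8))  # → "positively excited"
print(affect_state(valence=2, arousal=8))  # → "negatively excited"
print(affect_state(valence=5, arousal=3))  # → "calm"
```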
2.3 Affect Expressions
Emotion can be expressed in several ways, as shown in Figure 2.4. Many emotion recognition
approaches have been established using facial images, speech, body movements,
peripheral physiological signals and, more recently, autonomic signals. Each approach
has its own advantages and application limitations; a brief review is given in the
following sections.
Figure 2.4: Affect Expression Modalities (Figure adapted from [79])
2.3.1 Facial Expressions and Affect
The face plays a significant role in human emotion perception and expression. The
association between face and affective arousal was confirmed by a series of systematic
studies in the field of psychology [25, 26]. Ekman et al. [27] proposed the use of
the Facial Action Coding System (FACS), a comprehensive and anatomically
based system that is used to measure all visually discernible facial movements in terms
of atomic facial actions called Action Units (AUs). These AUs can be used for any
higher order decision making process including recognition of basic emotions according
to Emotional FACS (EMFACS) rules and a variety of affective states according to FACS
Affect Interpretation Database (FACSAID), as well as for recognition of other complex
psychological states such as depression or pain. AUs of the FACS are very suitable to be
used in studies on human naturalistic facial behavior as the thousands of anatomically
possible facial expressions (independently of their higher-level interpretation) can be
described as combinations of 27 basic AUs and a number of AU descriptors. It is not
surprising, therefore, that an increasing number of studies on human spontaneous facial
behavior are aimed at automatic AU recognition (e.g., [8], [97], [53]).
2.3.2 Audio Analysis and Affect
Speech is another important channel of human communication. It delivers
affective information through explicit messages (what we say) and implicit messages
(how we say it) that reflect the way the words are spoken. Although cognitive scientists
have not identified the optimal set of vocal cues that reliably discriminate among affec-
tive and attitudinal states, listeners seem to be rather accurate in decoding some basic
emotions from prosody [28] and some non-basic affective states such as distress, anxiety,
boredom, and sexual interest from nonlinguistic vocalizations like laughs, cries, sighs, and
yawns. The basic emotion-related prosodic features extracted from audio signal include
pitch, energy, and speech rate. Cowie et al. [18] provided a comprehensive summary of
qualitative acoustic correlations for prototypical emotions.
However, in real life, such data collection may be prohibitive; for example, frontal
face images or high-quality audio may be unavailable in a noisy environment. Physiological
signals such as EEG, on the other hand, can be captured continuously by non-intrusive
means and are usually less affected by external noise sources.
2.3.3 Physiological Expressions of Affect
Changes in biological signals are related to many psychological constructs, and it is of
great importance to distinguish such differences in the affective computing literature.
There is a many-to-many mapping between psychological states and physiological changes, which
makes affect detection a very challenging problem. Physiological signals can be divided
into two categories: those originating from the peripheral nervous system (e.g., Electrocardiogram
(ECG), Galvanic Skin Response (GSR) and Blood Volume Pressure (BVP))
and those originating from the Central Nervous System (CNS) (e.g., EEG). In recent years,
more research studies have been carried out with the first category of signals and have
produced interesting results [3, 87, 15]. However, few studies have been done with the second
category, even though cognitive theory states that the brain is heavily
involved in emotions [16].
There are two types of affective change of interest in affect detection using
physiological signals: emotion and mood. Emotion changes are
believed to be of short duration, while moods are longer lasting. Therefore, if one
is interested in detecting emotions, short-term biological changes such as skin conductance
changes are more relevant than heart rate variability, which is better suited to
mood changes. The advantages of affect detection using physiological signals are
manifold: continuous recording, good time resolution (EEG), and robustness to social masking
and self-regulation, since it is much harder to fake emotions through physiological
changes than through facial or verbal expressions. Through the use of a sensor network, a multiple-modality
(e.g., ECG, EEG, GSR) emotion detection system using physiological signals
could outperform facial- or audio-based approaches.
The amount of information that the physiological signals can provide is increasing,
mainly due to major improvements in the accuracy of psychophysiology equipment and
associated data analysis techniques. Still, physiological signals are currently recorded
using equipment and techniques that are more intrusive than those recording facial and
vocal expression. Fortunately, some of the challenges associated with deploying intru-
sive physiological sensing devices in real-world contexts are being mitigated by recent
advances in the design of wearable sensors (e.g., [1]). Even more promising is the inte-
gration of physiological sensors with equipment that records the widely varying pattern
of actions associated with emotional experience. When statistical learning methods are
applied to these vast repositories of affective data, they can be used to find patterns
associated with different action tendencies in different contexts or individuals, linking
them to corresponding differences in autonomic changes that are associated with these
actions.
2.3.4 Affect Expression Through Peripheral Nervous System
(PNS)
The James-Lange theory [76] emphasizes that emotional experience is embodied in
peripheral physiology (e.g., heart rate, Electromyogram (EMG), Galvanic Skin Response
(GSR)). The Autonomic Nervous System (ANS) is the part of the peripheral nervous system that
acts as the control system for regulating breathing, heart rate and other bodily functions.
The ANS has two widely recognized subdivisions, sympathetic
and parasympathetic, which work together to regulate physiological arousal. When
the body faces external challenges, the sympathetic system increases metabolic
output to generate the 'fight or flight' response; the parasympathetic
system, on the other hand, works complementarily to bring the body back to
its equilibrium 'rest' state [20]. Increased sympathetic activity (sympathetic arousal)
elevates heart rate, blood pressure and sweating, and redirects blood from the intestinal
reservoir towards the skeletal muscles, lungs, heart and brain in preparation for motor
action [3].
The connections between the ANS and emotions have long been an active and often debated
topic. In [41, 58], emotion was proposed as being "changes" in the Sympathetic
Nervous System (SNS). More recent studies [3, 87, 15] have shown that the basic
emotions (anger, disgust, fear, happiness, sadness and surprise) are correlated with
ANS signals such as ECG, Galvanic Skin Response (GSR) and Blood Volume Pressure
(BVP). Even though results were often controversial between these studies, some physiological
correlates of emotion could be identified more frequently than others: increases
in heart rate, skin conductance level and systolic blood pressure have been associated with
fear (e.g., [3]), while increases in heart rate and in systolic and diastolic blood pressure have
been associated with anger (e.g., [20]). Sadness has been found to sometimes lead to an
increased and other times a decreased heart rate [87].
2.3.5 Affective Expression Through Central Nervous System
(CNS)
Brain signals are part of the Central Nervous System (CNS) and could potentially be more
advantageous than signals generated by the Autonomic Nervous System (ANS),
since physical factors unrelated to emotion can cause similar physiological
changes in the latter. For example, increased physical activity can raise the instantaneous
heart rate, and an increase in ambient temperature can increase skin conductance without
any presence of excitement or fear.
Previous EEG studies [16, 33] generally suggest that EEG correlates with affect
states through variations in oscillation patterns and through lateral (frontal) asymmetry:
greater activation of the right frontal lobe accompanies the experience of more negatively
valenced emotions, whereas greater left frontal activation accompanies more positively
valenced experiences [6].
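One common operationalization of this frontal asymmetry, not necessarily the exact formulation used in the cited studies, is the difference of log alpha-band (8-13 Hz) power between homologous right and left frontal electrodes (e.g., F4 vs. F3). A sketch using only NumPy, with synthetic channel data:

```python
import numpy as np

def alpha_power(x, fs):
    """Alpha-band (8-13 Hz) power from the FFT magnitude spectrum."""
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    pxx = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= 8) & (freqs <= 13)
    return pxx[band].sum()

def frontal_asymmetry(left, right, fs=128):
    """ln(right alpha) - ln(left alpha). Alpha power is inversely related
    to cortical activation, so a positive index suggests relatively greater
    left-frontal activation (commonly linked to positive valence)."""
    return np.log(alpha_power(right, fs)) - np.log(alpha_power(left, fs))

# Synthetic example: a stronger 10 Hz alpha rhythm on the right channel
# than on the left. Channel names and amplitudes are illustrative.
fs = 128
t = np.arange(0, 8, 1 / fs)
rng = np.random.default_rng(2)
f3 = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
f4 = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
print(frontal_asymmetry(f3, f4, fs) > 0)  # → True
```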
2.4 Emotion Elicitation Protocols
The ability to obtain affect-specific physiological signals is one of the most critical
components of any affect detection or recognition system. Facial images or audio recordings
with emotional expressions are typically collected from professional actors, who know
in advance the target expression for each emotion and mimic it. An alternative
approach to inducing emotions is to present particular stimuli to
an ordinary participant. Various stimuli can be used: images, sounds (words or
phonemes), videos or video games. This approach is advantageous because there is no need
for a professional actor and the responses are closer to those observed in real life.
Each emotion elicitation method has its own advantages. Pictures, such as those
from the International Affective Picture System (IAPS), have been chosen and proved
effective in evoking emotions [15]. However, picture stimuli can build up tolerance,
as the response will be less pronounced if the training and testing sessions are closely
spaced in time. While pictures are static, videos or films contain dynamic content; such
stimuli may evoke varying emotions, which can complicate the emotion classification
process [12]. Overall, large variations exist between subjects of different cultural backgrounds,
education and past personal experience [20]. Therefore, stimulus scores (labels)
evaluated by professional psychologists should be considered along with participants'
self-assessments for overall classification performance analysis.
2.4.1 International Affective Picture System (IAPS)
The International Affective Picture System (IAPS) was developed by Lang et al. [56]
and has been adopted for many psychophysiological studies involving emotion induction. Several
studies have shown the usefulness of images for eliciting emotional responses that trigger
discriminative patterns in both the central and peripheral nervous systems [15, 63]. The
IAPS contains 900 emotionally evocative images evaluated by several American participants
on two nine-point dimensions (1-9) (see Figure 2.6): valence (ranging
from negative/unpleasant to positive/pleasant) and arousal (ranging from calm to
exciting). The mean and variance of participant judgments for both arousal and valence
are computed from these evaluations.
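The normative scores are thus simple per-image statistics over participants' SAM ratings. With hypothetical ratings from ten participants for one image:

```python
import numpy as np

# Hypothetical SAM ratings (1-9) from ten participants for one IAPS image.
valence_ratings = np.array([7, 8, 6, 7, 9, 8, 7, 6, 8, 7])
arousal_ratings = np.array([5, 6, 4, 5, 7, 6, 5, 4, 6, 5])

# Sample mean and (unbiased) variance, as reported in the IAPS norms.
print(f"valence: mean={valence_ratings.mean():.2f}, var={valence_ratings.var(ddof=1):.2f}")
print(f"arousal: mean={arousal_ratings.mean():.2f}, var={arousal_ratings.var(ddof=1):.2f}")
```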
The labeling of the images follows the circumplex model of affect proposed by Russell
[79]. The dimensional approach implies a very different rating method, based on
the assumption that emotions can be described by their degrees of valence, arousal and
dominance. Valence refers to the quality of an emotion (from unpleasant to pleasant),
while arousal describes the activation level (from calm to excited). Dominance is an indicator
of the control a person feels to have over a situation (from weak to strong). Lang
[56] introduced rating scales for these dimensions consisting of pictures of manikins,
called SAM for Self-Assessment Manikin (see Figure 2.6). The pictures of the IAPS (Figure 2.5)
are standardized on the basis of ratings of their valence, arousal, and dominance. During
Figure 2.5: Image Samples for The International Affective Picture System (IAPS)
an emotion induction phase, subjects were prompted with evaluated pictures of people
making angry, disgusted, fearful, happy, sad or surprised faces. The corresponding emotion
word was written underneath each picture and subjects were asked to pick one. If
subjects had trouble choosing one of the pictures/words simply because they did not
feel like any of them, they were asked to say so and in addition state whether they felt a
different emotion (and name it) or did not experience an emotion at all.
Figure 2.6: SAM scales for valence (top) and arousal (bottom)
2.5 Multimodality Affect Detection
Modality fusion is required when inputs come from various physiological signals. Although
the benefit of fusion (e.g., audio-visual fusion, linguistic (vocal) and paralinguistic
(non-vocal) fusion, multi-visual-cue fusion from face, head and body gestures) for affect
recognition is expected from engineering and psychological perspectives, our knowledge
of how humans achieve this fusion is extremely limited. Neurological studies on the fusion
of sensory neurons [3] seem to support early fusion (i.e., feature-level fusion) rather
than late fusion (i.e., decision-level fusion). Feature-level fusion refers to the approach
that a decision is made on a joint feature vector composed of features from each modal-
ity. Decision-level fusion refers to the approach that a decision is made after evaluating
features from each modality independently. However, it is an open issue how to construct
suitable joint feature vectors from different modalities with different time scales, different
metric levels and different dynamic structures based on existing methods. Due to these
difficulties, most researchers choose decision-level fusion, which simplifies the fusion
problem by introducing a conditional independence assumption. Based on current knowledge
in the literature, optimization methods for multimodal fusion are largely unexplored. A
number of issues relevant to fusion require further investigation, such as the optimal level
at which to integrate the different streams, the optimal function for the integration, and
the inclusion of suitable estimates of the reliability of each stream.
2.5.1 Fusion at the Feature Level
Fusion at the feature level is to construct an augmented feature set that contains features
from each individual modality. Firstly, features are extracted from each input physiolog-
ical signal, such ECG, GSR, breathing and EEG. Typically these signals have different
sampling rate, range of amplitude and most importantly, different time and frequency
resolution. Secondly, the individual feature sets were combined linearly or non-linearly
into an augmented feature set. In the linear case, concatenation of the features was done
to combine the features extracted from the different peripheral sensors (GSR, Respiration
belt, etc.) in a unique feature set. It can also be used to fuse peripheral features with
features computed from the EEG signals.
Concatenating N_f feature sets f^1, ..., f^{N_f} into a new feature set f, which consists of the concatenation of the feature vectors f_i^j for each sample i over all feature sets j:

    f_i = [ f_i^1  · · ·  f_i^j  · · ·  f_i^{N_f} ]    (2.1)
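The concatenation of Eq. 2.1 can be sketched in a few lines. The per-modality z-scoring step below is our own assumption (added because the modalities have different amplitude ranges), not part of the equation itself:

```python
import numpy as np

def concat_features(feature_sets):
    """Feature-level fusion: z-score each modality's feature matrix
    (rows = samples, columns = features), then concatenate along the
    feature axis as in Eq. 2.1."""
    scaled = []
    for F in feature_sets:
        F = np.asarray(F, dtype=float)
        sd = F.std(axis=0)
        sd[sd == 0] = 1.0                  # guard against constant features
        scaled.append((F - F.mean(axis=0)) / sd)
    return np.hstack(scaled)               # shape: (n_samples, sum of d_j)
```

Each input matrix must be aligned row-by-row on the same samples; the result is the joint feature vector per sample.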
Test samples or newly acquired samples will be projected into this feature space and with
a selected classifier, a class label will be assigned.
2.5.2 Fusion at the Decision Level
A multimodal model with fusion at the feature level breaks down if one or more sensor inputs are missing or severely degraded by artefact interference. To overcome this problem, fusion at the decision level is proposed, which is often more practical and desirable. Many methods have been proposed in the literature for this approach, such as majority voting, in which the class label receiving the most votes is assigned as the output. More sophisticated methods, such as mixture of Gaussians models [45, 43], also take prior information into account and thus provide a statistically more reliable model.
Figure 2.7: Multiple Modalities Model with Decision Level Fusion [45]
Usually, performing late integration is chosen over performing early integration for
two primary reasons. First, the feature concatenation used in early integration would
result in a high-dimensional data space, making a large multimodal dataset necessary for robust training of the classifier [23]. Second, late integration provides greater flexibility in modeling. For instance, with late integration it is possible to train the face and posture classifiers on different data sources and with different classifiers, providing the best accuracy for each modality separately. However, designing optimal strategies for decision-level fusion is still an open research problem. Various approaches have been proposed in the literature, such as the product rule, sum rule, weighted combinations, maximum/minimum/median rules and majority vote [49, 55].
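As an illustration (not the thesis implementation), the majority-vote and sum rules mentioned above can be sketched as:

```python
import numpy as np

def majority_vote(labels):
    """Assign the class label that receives the most votes across modalities."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return values[np.argmax(counts)]

def sum_rule(posteriors):
    """Sum per-modality posterior estimates (rows = modalities,
    columns = classes) and pick the class with the largest total."""
    return int(np.argmax(np.asarray(posteriors, dtype=float).sum(axis=0)))
```

The sum rule requires each modality's classifier to output class-posterior estimates, while majority voting only needs hard labels, which is why voting is often the fallback when classifier outputs are not calibrated.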
2.6 State of the Art Emotion Recognition Performances
Affective computing has been a very active research field over the past few years, especially research involving the use of physiological signals. The goal is to define a statistical model that associates an observed physiological change (pattern) with an emotional state.

Tables 2.1 and 2.2 provide a non-exhaustive list of relevant studies from recent years concerning emotion detection applications using physiological signals. Due to the lack of a common testing platform or common testing criteria between the various approaches, it is difficult to draw a fair comparison between these studies.
29
2.6
State
ofth
eArt
Emotio
nRecognitio
nPerfo
rmances
Table 2.1: State of the Art Emotion Recognition Performances

| Study | Data Set | Emotions | Stimulus | Signals | Methods (Feature Extraction & Classifier) | Best Results (Correct Classification Rate) |
|---|---|---|---|---|---|---|
| Picard et al. [74] | 1 sub, 20 days | Neutral, anger, hate, grief | Self-induction | ECG, RSP, GSR, EMG | SFFS-Fisher projection | 81% |
| Katsis et al. [46] | 10 sub | High/low stress, disappointment and euphoria | Driving simulation | EMG, ECG, RSP, GSR | SVM | 79% |
| Wagner et al. [94] | 1 sub, 25 recordings over 25 days | Anger, sadness, joy and pleasure | Music chosen by the participant | EMG, ECG, GSR, RSP | LDA, KNN, MLP, SFFS, Fisher, ANOVA | 92% |
| Haag et al. [32] | 1 sub, several days | Arousal, valence | Images from IAPS | EMG, GSR, skin temp., BVP, ECG, RSP | Neural networks for regression | 97% |
| Takahashi et al. [90] | 12 sub | Joy, anger, sadness, fear, relaxation | Film clips | EEG, BVP, GSR | Linear SVM, one vs. all | 42% |
| Leon et al. [60] | 9 sub | Neutral, neg. excited, pos. excited | Images from IAPS | Heart rate, GSR, BVP | Sequential analysis and autoassociative networks | 71% |

*RSP: respiration; ECG: electrocardiogram; EMG: electromyography; EEG: electroencephalogram; GSR: galvanic skin response; BVP: blood volume pulse. Classification acronyms: Sequential Floating Forward Search (SFFS), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Mean Square Error (MSE), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), ANalysis Of VAriance (ANOVA).
Table 2.2: State of the Art Emotion Recognition Performances using EEG

| Study | Data Set | Emotions | Stimulus | Feature Extraction Methods | Classifier | Best Results (Correct Classification Rate) |
|---|---|---|---|---|---|---|
| AlZoubi et al. [7] | 3 sub | 10 categories | Self-induced | Narrow band power (1 Hz) | KNN, SVM | subject-specific, 39.97%-66.74% |
| Panagiotis et al. [72] | 16 sub, 9-32 years old | Six basic emotions | Images | Higher Order Crossings | QDA | Happiness (71.43%), surprise (71.34%), anger (77.33%), fear (83.67%), disgust (78.67%), sadness (61.83%) |
| Horlings et al. [34] | 10 sub, 19-29 years old | Valence, arousal | Images from IAPS | Frequency band-power, cross-correlation between band-powers, peak alpha frequency | SVM | valence (32%), arousal (37%) |
A few key points concerning the studies shown in Tables 2.1 and 2.2 are listed below; details of each feature-analysis approach are discussed in Section 4.2.
• Number of subjects: in general, findings obtained from a larger number of subjects are more generalizable and more significant. However, if the results are subject-specific (training and probing samples are collected from the same subject), then the number of subjects involved is less important.
• Temporal distance between sample collections: if time elapsed between the collected samples, the findings are more significant and better represent the actual application scenario. Habituation occurs when samples are collected close together in time from the same subjects.
• Methods used for emotion elicitation can add extra dynamics to the collected data. Therefore, the performances of different affect detection systems cannot be compared or generalized without further understanding the impact of each stimulus on the emotion-specific dataset.
• Emotion models/classes: as shown in Section 2.2, the model used to represent
the evoked emotional states is important in determining the association between
the observed physiological pattern and the true emotional states. The ground
truth definition relies especially heavily on the choice of a representative model and
system recognition performance can differ significantly between different choices of
representative models.
• Single/multiple modality: in general, a larger number of sensors provides more discriminating information between emotional classes; however, the selected set of sensors should comply with both the feasibility and the practicality requirements of an end-user system.
2.7 Ethical and Privacy Concerns on Physiological Signal Collection
With the increasing use of physiological signals in consumer electronics, there is an urgent
need to address the security and privacy implications for such commercial applications.
Currently there is no readily available model to systematically identify risk issues for commercial applications using physiological signals such as ECG or EEG.

The major concerns faced by commercial applications utilizing physiological signals can be summarized as follows. First, current sensing equipment is increasingly powerful, allowing it to be used both for simple readings, such as recording body temperature, and for complex applications such as emotion recognition. Second, these ubiquitous devices are not always visible and can obtain potentially private information without the subject's awareness.
In the case of affect detection, the goal is not to invade one's privacy or control one's actions, but to support the user by providing a better interaction environment according to his or her emotional state. EEG data collected by EEG headsets can be considered private health information, since physiological signals have conventionally been used in professional health and medical applications. This holds even when the physiological signals (EEG) are collected using commercially available headsets and are used for leisure (non-medical) purposes. Because this application of consumer electronics to physiological data collection is so new, there is no readily available regulatory framework or practical guidance on the collection, storage, and transmission of such personal information using commercial devices. If not dealt with carefully, such frontier applications can have significant security and privacy implications.
In terms of collecting such potentially sensitive physiological data, the privacy by design [57] approach has been widely endorsed and adopted in practice. 'Privacy
by design’ refers to a proactive approach at the system design level to minimize the
data collected and released for the protection of privacy and security of such data. The
main recommendation is summarized below and should be considered for any application
design that involves potentially sensitive physiological signals.
• only collect data for a well-defined purpose (no in-advance storage)
• only collect data relevant for the purpose (not more)
• only keep data as long as it is necessary for the purpose
2.8 Summary
This chapter has provided the background material for affect detection. First, three models for representing affect states were introduced and discussed. A hybrid model for affect representation was chosen for this study; it combines the universality of basic emotions with the effectiveness of the circumplex model in representing affects using physiological signals. Second, common emotion elicitation methods were presented and analyzed for their applicability, and an image-based elicitation method (e.g., IAPS images) was chosen for our analysis. Third, various affect analysis modalities were discussed, with emphasis on modalities based on physiological signals. Various fusion techniques were also discussed to illustrate the benefits and challenges of feature-level and decision-level fusion. Finally, state-of-the-art affect classification performance using physiological signals was reviewed and discussed.
Chapter 3
EEG Signal Characteristics and Preprocessing Methods
3.1 Introduction
Automated recognition of human emotions using EEG signals is a relatively new research topic that is being actively studied in the affective computing community. EEG signals are generated by the Central Nervous System (CNS) and directly reflect brain activity, which can potentially overcome the challenges faced by other physiological signals (such as galvanic skin response) that suffer undesired interference from non-emotional, physical or environmental changes. EEG signals collected from multiple channels with correlated information can potentially produce a more reliable and robust emotion recognition system. As shown in Figure 3.1, the objective of an automated EEG affective signal processing system is to develop a statistical model, based on the input training samples (supervised machine learning), that can predict the label of a test sample with the highest accuracy under given constraints. Recently, with the development of electronics such as wearable sensors, cheap, high-fidelity and unobtrusive EEG headsets have become easily accessible, which has the potential to revolutionize the current generation of
affect computing applications.
Figure 3.1: System Overview
This chapter provides background information on the characteristics of the EEG signal, the typical recording setup and application constraints. Noise and artefact interference will also be introduced, and solutions for eliminating such interference in actual signal processing applications will be presented.
3.2 EEG Signal Characteristics
Electroencephalography (EEG) is a measurement of the electrical activity within the brain [68]; it is produced by synchronous neuronal activity and captured using multiple electrodes resting on the scalp, spatially located according to a specific scheme known as the 10-20 system [91]. More details on the spatial locations of the electrodes can be found in Section 3.2.1.

EEG signals are time series reflecting the oscillatory nature of brainwaves. A sample EEG recording from an electrode located over the left frontal lobe (F3) is shown in
Figure 3.2: Standardized EEG Recording (F3)
Figure 3.2. EEG signals are non-linear, non-stationary, and usually contaminated with a substantial amount of noise caused by thermal fluctuations, artefacts (muscle movements, electrode movements) and technical interference (power line). Figure 3.3 shows a sample EEG recording contaminated by eye-blinking. For these reasons it is difficult to interpret EEG signals in their raw time-domain format. EEG analysis is also a challenging signal processing problem, since the EEG signal violates the stationarity assumption made in many conventional time-series analysis methods.
Figure 3.3: EEG Signal Contaminated with Eye-blinking
Figure 3.4: Spectrum Characteristics of EEG (single-sided amplitude spectrum)
Using the Fast Fourier Transform (FFT) with a Hamming window, we can estimate the spectral characteristics of the recorded EEG, as shown in Figure 3.4. We observe that most of the energy of the EEG signal resides below 65 Hz, with a large spike below 4 Hz (due to artefact).
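A single-sided amplitude spectrum like the one in Figure 3.4 can be estimated as follows. This is a sketch; the exact windowing and scaling used in the thesis are not specified:

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of x using a Hamming window."""
    n = len(x)
    windowed = x * np.hamming(n)
    spectrum = np.abs(np.fft.rfft(windowed)) / n
    spectrum[1:] *= 2.0                       # fold in the negative frequencies
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum
```

The Hamming window trades a slightly wider main lobe for much lower side lobes, reducing spectral leakage between neighbouring frequency bins.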
3.2.1 Electrode Placement for EEG Recording Devices
EEG recording devices typically have a sampling frequency of 512 Hz, 1024 Hz or higher, with the number of electrodes ranging from 32 to 256. Compared to the enormous number of neuronal activity sources this is a low spatial resolution, while the high sampling frequency provides high temporal resolution. Electrode placements for 32 and 64 channels are shown in Figure 3.5. A detailed comparison of electrode placements can be found in [42], where three placement systems are compared: the international 10-20 system and, with more electrodes, the 10-10 and 10-5 systems. The "10" and "20" refer to the fact that the actual
distances between adjacent electrodes are either 10% or 20% of the total front-back or
right-left distance of the skull. Smaller distance between adjacent electrodes will improve
spatial resolution of the EEG recordings but also increase the total number of electrodes.
(a) The 32-channel system (a.k.a. 10-20 system); (b) the 64-channel system

Figure 3.5: Electrode placements of different EEG recording systems
3.2.2 Measuring Emotion Using EEG
The theory of the limbic system [77] has been the dominant theory linking brain structure and emotional expression. The limbic system is often described as a positive feedback circuit, the Papez circuit [59]. The basic elements of the limbic system are the amygdala, prefrontal cortex, anterior cingulate cortex, hypothalamus and insular cortex. The physical locations of these elements are shown in Figure 3.6.
• The amygdala [20] consists of two groups of neurons deep inside the human brain. Together, they form one of the most important brain regions for emotion. The amygdala interconnects with the sensory cortical areas and acts as a translator between the (sensory) stimuli and the autonomic nervous response system. The amygdala is also responsible for helping the brain learn the association between emotional events and specific stimuli, which becomes a long-term emotional memory with repetitive occurrences.

Figure 3.6: Cross section of the human brain
• The hypothalamus is the part of the brain that coordinates emotional behaviors and expressions. It controls many processes in the body, such as body temperature, hunger and thirst, and it handles the release of some hormones. As such, the hypothalamus is involved in processing emotions and sexual arousal.

• The prefrontal cortex, located at the front of the frontal lobe, is involved in decision making and in approach/withdrawal reactions [20]. It is involved in planning, making decisions based on earlier experiences and working towards a goal.

• The insular cortex is said to be associated with emotional experience and the production of conscious feelings [3].
3.2.3 Spectral Characteristics of EEG
EEG signals can be divided into five rhythms, as shown in Table 3.1, according to the origin of the rhythm and the underlying brain networks [59]. Beta waves are connected to an alert state of mind, whereas alpha waves are more dominant in a relaxed person [3]. Research has also shown a link between alpha activity and brain inactivation, which leads to the same conclusion. Therefore, characteristics of the Beta and Alpha waves are key indicators of the subject's state of arousal. The Delta band contains mostly noise and artefacts, such as pulses, neck movement, and eye blinking.
Table 3.1: Subbands of the EEG Signals

| Band | Frequency (Hz) | Location of Origin | Reason for Activity |
|---|---|---|---|
| Delta | 0-4 | Thalamus | Slow-wave sleep, waking |
| Theta | 4-7 | Hippocampus and cortex | Idle state in cortex; emotional stress, frustration, disappointment |
| Alpha | 8-13 | Posterior regions, occipital lobe, cortex | Closed eyes, idle state in cortex, relaxed |
| Beta | 13-30 | Cortex (e.g., motor and sensory) | Active/busy, concentrating/alert, attention |
| Gamma | 30-100 | Cortex | Sensory processing and cognitive tasks |
3.3 Preprocessing of the EEG Recordings
EEG signals collected from the scalp are typically contaminated with various noises (e.g., thermal noise) and artefacts (electrode movement, and electrophysiological potentials generated by muscle activities such as eye movements, biting and chewing), which are continuous in time and very large in amplitude. This results in poor signal quality and makes the signals challenging, and often insufficient, for direct interpretation. The objective of the preprocessing step in such a signal processing system is to eliminate noise and artefact interference and to reduce the interference from adjacent neural networks due to volume conduction.
Artefact removal is essential for any robust EEG emotion recognition system, and several methods have been used in the literature for removing or reducing such interference. Simple methods such as digital band-pass filtering eliminate all frequency components beyond the cut-off frequencies, which could discard crucial information and lead to incorrect interpretation of EEG signals. More recently, Independent Component Analysis (ICA) has been applied to artefact reduction in EEG; it has proven effective in removing muscle artefacts caused by eye movements, but it also suppresses genuine brain activity (cross-talk between brain and muscle activity) [39, 13]. Wavelet-based artefact rejection methods have also been shown to be very effective.
3.3.1 EEG Referencing
The measured EEG potential is a sum of contributions from all the active sources in the brain. Deep sources tend to contribute more uniformly to all electrodes, as the distances to each electrode are of the same order of magnitude. Cortical sources, on the other hand, tend to influence only the closest electrodes. Raw EEG signals collected directly from the scalp provide a poor signal-to-noise ratio, and it is necessary to re-reference them to eliminate the interference propagated from distant sources. The CMS electrode is the typical referencing electrode in many EEG headsets (e.g., the Biosemi Active II, with which our testing data set was collected); we will therefore refer to this setting in the following description. To obtain a Laplacian reference [5], the following Laplacian operator was applied to each electrode i:
    x̄_i(n) = x_i(n) − (1/N_i) Σ_{j ∈ Neig(i)} x_j(n)    (3.1)

where x_i is the CMS-referenced signal of electrode i, x̄_i the Laplacian-referenced signal, n the sample number, Neig(i) the set of neighbor electrodes of electrode i (see Appendix A), and N_i the size of this neighborhood.
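Eq. 3.1 can be applied channel-by-channel. The dictionary-based data layout below is our own illustrative choice, not the thesis code:

```python
import numpy as np

def laplacian_reference(eeg, neighbors):
    """Apply the Laplacian operator of Eq. 3.1.
    eeg: dict mapping channel name -> 1-D sample array
    neighbors: dict mapping channel name -> list of neighbor channel names"""
    referenced = {}
    for ch, x in eeg.items():
        neigh = neighbors.get(ch, [])
        if neigh:
            # subtract the mean of the neighboring electrodes, sample by sample
            referenced[ch] = x - np.mean([eeg[j] for j in neigh], axis=0)
        else:
            referenced[ch] = x.copy()   # no neighbors defined: leave unchanged
    return referenced
```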
3.3.2 Rejecting Artifacts Based on Channel Statistics
This method is based on the observation that the amplitude of 'noise-free' EEG is much smaller than that of artefacts caused by electrode or muscle movements. A threshold-based outlier detection algorithm makes use of the mean and standard deviation (std) of the EEG amplitude. Outliers are identified by comparing the EEG amplitude to a threshold value calculated from all the samples in a trial. To increase the accuracy and robustness of this process, the mean and standard deviation should be recalculated iteratively after eliminating a predefined number of outliers (usually much smaller than the number of outliers present in the segment of recording, e.g., 5 vs. 500). To simplify the calculation, we assume that the amplitude of the recorded EEG signal has a Gaussian distribution; as shown in Figure 3.7, this assumption is supported by empirical observation. In the next step, a threshold value is defined according to the mean and standard deviation of the observed segment. Artefact or outlier regions are found as the components whose amplitudes exceed this threshold value. One way to define such a threshold is to use the inverse Q-function at p = 0.05. Let x_1, x_2, ..., x_N be a zero-mean
Figure 3.7: Sample EEG Amplitude with Normal Fitting
time series,
    A_thresh = Q^{-1}(0.05) · σ_x    (3.2)

    σ_x = √( (1/N) Σ_{t=1}^{N} (x(t) − μ_x)² )    (3.3)
Statistically speaking, a signal with amplitude larger than this threshold value has less than a 5% chance of appearing in the collected sample and can in turn be considered an outlier or 'abnormal'. In our case, we consider such recorded segments to originate from artefacts.
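Eqs. 3.2-3.3 can be sketched as follows; the constant 1.6449 is the standard-normal inverse Q-function value Q^{-1}(0.05), and the single-pass (non-iterative) form is a simplification of the procedure described above:

```python
import numpy as np

Q_INV_005 = 1.6449  # Q^{-1}(0.05) for the standard normal distribution

def artefact_outliers(x, q_inv=Q_INV_005):
    """Flag samples whose amplitude exceeds A_thresh = Q^{-1}(0.05) * sigma_x
    (Eqs. 3.2-3.3), assuming a Gaussian amplitude distribution."""
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(np.mean((x - x.mean()) ** 2))  # Eq. 3.3
    thresh = q_inv * sigma                         # Eq. 3.2
    return np.flatnonzero(np.abs(x) > thresh), thresh
```

In the iterative variant described above, the flagged samples would be removed and sigma recomputed before re-thresholding.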
3.3.3 Filter Data Using Fast Fourier Transform (FFT)
In [50], it was shown that event-related EEG oscillation pattern variations are expressed mostly in the spectral components that reside within the Alpha (8-13 Hz) and Beta (13-30 Hz) bands. Other bands, such as the Delta band (up to 4 Hz), contain mostly noise from pulses, neck movement, and eye blinking. Beta waves are connected to an alert state of mind, whereas alpha waves are more dominant in a relaxed person.
Therefore, to preprocess the input EEG signals, a 10th-order Butterworth band-pass filter (8-30 Hz) was applied to the collected EEG signals to extract the Alpha and Beta waves. The filter order was chosen as the minimum order required to meet the design constraints: no more than 3 dB loss within the passband and at least 60 dB attenuation in the stopband.
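Such a filter can be sketched with scipy (an assumption about tooling; second-order sections and zero-phase filtering are our choices for numerical stability, not necessarily the thesis setup):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_alpha_beta(x, fs, order=10):
    """10th-order Butterworth band-pass (8-30 Hz) isolating the Alpha
    and Beta bands of an EEG trace sampled at fs Hz."""
    sos = butter(order, [8.0, 30.0], btype='bandpass', fs=fs, output='sos')
    return sosfiltfilt(sos, x)   # forward-backward filtering: zero phase shift
```

A 10th-order band-pass expands to a 20th-order transfer function, which is numerically fragile in (b, a) form; the SOS representation avoids that instability.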
3.3.4 Independent Component Analysis
The EEG data consists of recordings of electrical potentials at many different locations on the scalp. These potentials are presumably generated by the mixing of some underlying components of brain activity. We would like to find the original components, but we can only observe their mixtures. ICA [39] can reveal the underlying dynamics of brain activity by giving access to its independent components [13].

A statistical model of independent component analysis is shown in Eqs. 3.4-3.5, adopted from the "latent variables" model [39]. Assume we observe n linear mixtures x_1, ..., x_n of n independent components:
    x_j = a_{j1} s_1 + a_{j2} s_2 + ... + a_{jn} s_n,  ∀j    (3.4)
It is convenient to use vector-matrix notation instead of the sums in the previous equation. Let us denote by x the random vector whose elements are the mixtures x_1, ..., x_n, likewise by s the random vector with elements s_1, ..., s_n, and by A the matrix with elements a_{ij}. Then
    x = Σ_{i=1}^{n} a_i s_i = As    (3.5)
The ICA model is a generative model, which means that it describes how the observed
data is generated by a process of mixing the components si. The independent components
are latent variables, meaning that they cannot be directly observed. Also the mixing
45
3.3 Preprocessing of the EEG Recordings
matrix is assumed to be unknown. All we observe is the random vector x, and we must
estimate both A and s using it.
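A minimal symmetric FastICA with a tanh contrast, sketched below for illustration; real EEG pipelines typically rely on a library implementation (e.g., Infomax or extended FastICA) rather than this toy version:

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Estimate s in x = As (Eq. 3.5) by maximizing non-Gaussianity.
    X: (n_components, n_samples) mixed observations."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten via eigendecomposition of the sample covariance
    d, E = np.linalg.eigh(X @ X.T / m)
    d = np.clip(d, 1e-12, None)
    Xw = (E @ np.diag(d ** -0.5) @ E.T) @ X
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Xw)                            # contrast nonlinearity
        W = G @ Xw.T / m - np.diag((1 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt                                     # symmetric decorrelation
    return W @ Xw                                      # estimated sources
```

Note that ICA recovers the sources only up to permutation, sign and scale, which is why recovered components must be matched to artefact templates (e.g., eye blinks) before removal.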
3.3.5 Wavelet Decomposition for Denoising
The Discrete Wavelet Transform (DWT) has been used for denoising and feature extraction in many biomedical signal analyses [54, 98]. Typically, biomedical signals (e.g., EEG) are non-linear and non-stationary, which degrades the performance of wavelet methods. However, it has been shown that non-linear wavelets (e.g., bior1.3) are effective in analyzing biomedical signals. Therefore, for our analysis, the biorthogonal wavelet (bior1.3) was used for DWT-based denoising [98].
To define the scales of interest in the DWT process, we use the scale-frequency relationship introduced in [70], where at each decomposition scale the upper frequency of the current band is halved. A signal with N data points can be fully decomposed into n levels when N = k × 2^n. Each wavelet level v corresponds, approximately, to the frequency band

    f ∈ [ fs / 2^{v+1}, fs / 2^v ]    (3.6)

where fs is the sampling frequency. As shown in Table 3.2, this association between sampling rate and decomposition level is not exact, but it is a good enough approximation for our study. The sampling frequency of the data used in Chapter 6 is 1024 Hz.
In summary, the proposed method involves the following steps:

1. Apply the Discrete Wavelet Transform (DWT) to the contaminated EEG signals, decomposing them into seven levels with the biorthogonal wavelet (bior1.3) as basis function.

2. Define the threshold value using a statistical thresholding function such as the rigorous Stein's unbiased risk estimate (rigorous SURE) or heuristic SURE.

3. Identify the ocular artefact and apply the desired de-noising technique.

4. Apply the wavelet reconstruction procedure to reconstruct the denoised EEG signal.
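The steps above can be sketched with a simple multi-level DWT. For brevity, the Haar wavelet and a fixed soft threshold stand in for bior1.3 and the SURE rules:

```python
import numpy as np

def haar_dwt(x):
    """One analysis level: approximation and detail coefficients."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(a, d):
    """One synthesis level (perfect reconstruction of the analysis step)."""
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def dwt_denoise(x, levels, thresh):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    details, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):                        # step 1: decomposition
        a, d = haar_dwt(a)
        details.append(d)
    details = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
               for d in details]                   # steps 2-3: thresholding
    for d in reversed(details):                    # step 4: reconstruction
        a = haar_idwt(a, d)
    return a
```

With the threshold set to zero the chain reconstructs the input exactly, which is a useful sanity check before tuning the threshold.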
Table 3.2: Wavelet decomposition of EEG signals into frequency bands (fs = 1024 Hz)

| Frequency Range (Hz) | Decomposition Level | Frequency Band | Bandwidth (Hz) |
|---|---|---|---|
| 0-4 | A7 | Delta | 4 |
| 4-8 | D7 | Theta | 4 |
| 8-16 | D6 | Alpha | 8 |
| 16-32 | D5 | Beta | 16 |
| 32-64 | D4 | Gamma | 32 |
| 64-128 | D3 | mostly noise | 64 |
Figure 3.8: Four sample mother wavelet functions used for DWT decomposition of EEG
signals: (a) Mexican hat wavelet, (b) Daubechies order 8 wavelet (db8), (c) biorthogonal
wavelet order 1.3 (bior1.3), and (d) biorthogonal wavelet order 1.5 (bior1.5).
Figure 3.9: EEG denoising using Discrete Wavelet Transform (DWT)
The final reconstructed EEG signal is shown in Figure 3.9. Compared to Fourier-transform denoising, DWT is an efficient method for noise removal and is better at preserving the shape of the peaks [54]. Since most emotion analysis algorithms [65, 66] operate on the Alpha (8-12 Hz) and Beta (13-30 Hz) bands, the wavelet coefficients obtained at decomposition levels D5 and D6 (cf. Table 3.2) can alternatively be used exclusively to reconstruct the denoised EEG signal, which serves as both a filtering and a denoising technique.
3.4 Ground Truth Definition and Validation
It is very important that the labels for the collected data accurately represent the actual affect state present in the data. However, affect expression varies with age, cultural background, and current health state [58, 80], and the transition level is influenced by the current state signal value [12]. Therefore, there are very few cross-referenced results in the literature on affect detection using EEG signals, and the claimed emotion states are also debatable.
In general, there are three ways to obtain the ground truth label:

• participants self-assess their emotions in response to the projected stimuli;

• the labels of the specific stimuli used are adopted;

• labels are obtained from a third source, such as streamed facial images recorded while collecting the desired verbal or physiological signals (e.g., ECG, EEG, GSR).
However, from an engineering design point of view, user inputs are often inaccessible or unreliable due to physical or other constraints. For example, for people with autism, such self-reporting of emotional states is prohibitive. In the process of developing a statistical model, self-assessment values are often obtained to validate the chosen ground-truth labels for the collected emotional data.
Currently, data labels are obtained either through self-assessment reports or from the labels of the projected stimuli. However, emotions are known to be very subjective and dependent on past experience, so one can never be sure whether a block elicits the expected emotion. To reduce such discrepancies, we present here two methods that can be used to validate the labels obtained through self-assessment and through known stimulus labels.
3.4.1 Pearson Correlation Coefficients for Ground Truth Validation
Correlation is a measure of the relation between two or more variables; it determines the extent to which the values of two variables are "proportional" to each other. A commonly used measure is the Pearson product-moment correlation. Let L_i, i = 1, ..., N denote the self-assessment scores and K_i, i = 1, ..., N the scores provided with the projection stimuli:
    r = (1/(N−1)) Σ_{i=1}^{N} ( (L_i − L̄)/s_L ) ( (K_i − K̄)/s_K )    (3.7)
Table 3.3: Sample self-assessment values vs. values provided with the chosen stimuli (e.g., images from IAPS)

| Predicted Valence | Predicted Arousal | Self-Assessed Valence | Self-Assessed Arousal |
|---|---|---|---|
| 7.242 | 5.516 | 7.20 | 7.20 |
| 2.430 | 6.018 | 5.40 | 3.60 |
| 4.998 | 3.020 | 7.20 | 3.60 |
| 7.272 | 5.620 | 5.40 | 5.40 |
| 7.766 | 5.880 | 5.40 | 7.20 |
| 2.184 | 6.526 | 1.80 | 9.00 |
| 4.830 | 3.226 | 3.60 | 7.20 |
| 2.110 | 5.914 | 1.80 | 7.20 |
| ... | ... | ... | ... |
where \bar{L} and \bar{K} are the sample means; s_L and s_K are the sample standard deviations; N
is the sample size. The correlation coefficient (r) represents the linear relationship
between the two variables. r takes values from −1 to +1, where −1 represents a perfect
negative correlation, in which the two variables are inversely proportional, and +1 represents
a perfect positive correlation, in which the two variables are directly proportional.
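As a concrete sketch, (3.7) translates into a few lines of NumPy; the function name pearson_r is our own choice, not part of the thesis:

```python
import numpy as np

def pearson_r(L, K):
    """Pearson product-moment correlation of Eq. (3.7) between
    self-assessment scores L and stimulus-provided scores K."""
    L = np.asarray(L, dtype=float)
    K = np.asarray(K, dtype=float)
    N = len(L)
    sL = L.std(ddof=1)   # sample standard deviations (N-1 normalization)
    sK = K.std(ddof=1)
    return np.sum((L - L.mean()) / sL * (K - K.mean()) / sK) / (N - 1)
```

Scores that move together give r near +1; inversely related scores give r near −1.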
3.4.2 Confusion Matrix for Ground Truth Validation
A confusion matrix [51] contains information about the actual and predicted classifications
produced by a classification system. The performance of such systems is commonly evaluated
using the data in the matrix. The following table shows the confusion matrix for a two-class
classifier.
The entries in the confusion matrix have the following meaning in the context of our
study:
• a is the number of correctly matched instances for which the predicted label (class 1)
matches the given label (class 1) for the training data,
• b is the number of mismatched instances for which the samples are from class 1 but were
predicted as class 2,
• c is the number of mismatched instances for which the samples are from class 2 but were
predicted as class 1,
• d is the number of correctly matched instances for which the predicted label (class 2)
matches the given label (class 2) for the training data.
Table 3.4: Confusion Matrix Components

                     Predicted
                 Class 1   Class 2
Actual  Class 1     a         b
        Class 2     c         d
The accuracy (K) is the proportion of the total number of predictions that were
correct. It is determined using the equation:

K = \frac{a + d}{a + b + c + d}    (3.8)
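The table entries and (3.8) translate directly into code; this minimal sketch assumes labels coded as 1 and 2, and the helper names are ours:

```python
def confusion_counts(actual, predicted):
    """Tally the four entries of Table 3.4 for class labels 1 and 2."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # class 1 correct
    b = sum(1 for y, p in pairs if y == 1 and p == 2)  # class 1 mismatched
    c = sum(1 for y, p in pairs if y == 2 and p == 1)  # class 2 mismatched
    d = sum(1 for y, p in pairs if y == 2 and p == 2)  # class 2 correct
    return a, b, c, d

def accuracy(a, b, c, d):
    """Proportion of correct predictions, Eq. (3.8)."""
    return (a + d) / (a + b + c + d)
```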
3.5 Summary
In this chapter, affect detection analysis using EEG signals was introduced and discussed.
It started with an introduction of the limbic system as the link between brain structure
and emotion expression. Next, the spectral characteristics of EEG signals were discussed.
It was shown that the Alpha (8−12 Hz) and Beta (13−30 Hz) rhythms originate
from the limbic system and are useful in EEG emotion analysis. Time-domain characteristics
of the EEG signal were also discussed, with a focus on dealing with artefact
interference and on preprocessing methods for signal analysis applications using EEG
signals.
Chapter 4
Methods for Emotion Assessments
using EEG
4.1 Introduction
In a supervised machine learning system, as shown in Figure 4.1, there are two stages in
the pattern classification process: the training stage develops a statistical model based
on the collected, labeled samples and optimizes the model-specific parameters for best
performance; the testing stage tests and validates how well the developed model
represents the desired phenomenon.
Since raw EEG signals are high-dimensional and contain redundant information, a typical
EEG signal processing system needs to first reduce the effects of noise (preprocessing
stage), then extract features to reduce the dimensionality of the signals and, if possible,
increase the separability of the classes by choosing the optimal projection direction using
statistical methods. For comparison purposes, four feature extraction techniques
for EEG-based affect detection will be examined, implemented and applied to the
acquired database in Chapter 6. Simulation results using these techniques will provide
insight into the effectiveness of each constructed feature vector (FV) in representing
the affect states. Details on each feature analysis algorithm are further explained in the
following subsections.

Figure 4.1: System components in the training stage
4.2 Previous Feature Extraction Methods
The feature extraction step has become an important and often essential step in a machine
learning process. In essence, the process tries to determine the most relevant set of
features for differentiating affective states through the use of an optimization criterion.
The benefits of applying a feature extraction algorithm are multifold. First of all, because
of the high dimensionality of the acquired EEG signal and the typically much lower number
of samples for each pattern class, an exponentially larger sample size would be required
for a meaningful statistical analysis [23]; this is often referred to as the curse of
dimensionality. Secondly, by projecting the training samples into a lower-dimensional
feature space, irrelevant or redundant information is removed, which leads to better
separation between the sample classes and, in the end, better classification performance.
Lastly, through the use of feature dimension reduction, the computational cost of the
subsequent classification process is lowered.
It is well known that the frequency spectrum of the EEG changes with age. In the
present study, several steps were taken to control for maturational EEG spectral changes,
including the selection of subjects with similar ages (except for the toddlers) and the use
of analysis of covariance adjusted for age. However, a study of this topic is beyond
the scope of this project, even though it would be essential for generalizing the research
findings in this study to a much larger testing population.
4.2.1 Time Domain Analysis
4.2.1.1 Statistical-based Features
EEG signals are a reflection of the oscillatory pattern of the action potentials of the
central nervous system. Therefore, analyzing the oscillatory pattern in the time domain
is a natural and most direct way to understand EEG patterns. However, as previously
stated, the raw time-domain recordings are contaminated with noise and artefacts; to
reduce some of this interference, the signal components of interest (8−30 Hz) are extracted first.
Next, six time-domain parameters proposed by Picard [74] were calculated on the N
values (5 seconds at 256 samples per second gives N = 1280). The statistical features used
to form the proposed FVs are defined as follows (X(t), t = 1 · · ·N is the raw N-sample EEG signal).
1. The mean of the raw signal

\mu_X = \frac{1}{N} \sum_{t=1}^{N} X(t) = \bar{X}    (4.1)

2. The standard deviation of the raw signal

\sigma_X = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( X(t) - \mu_X \right)^2}    (4.2)

3. The mean of the absolute values of the first differences of the raw signal

\delta_X = \frac{1}{N-1} \sum_{t=1}^{N-1} |X(t+1) - X(t)|    (4.3)

4. The mean of the absolute values of the first differences of the standardized signal

\tilde{\delta}_X = \frac{1}{N-1} \sum_{t=1}^{N-1} |\tilde{X}(t+1) - \tilde{X}(t)| = \frac{\delta_X}{\sigma_X}    (4.4)

5. The mean of the absolute values of the second differences of the raw signal

\gamma_X = \frac{1}{N-2} \sum_{t=1}^{N-2} |X(t+2) - X(t)|    (4.5)

6. The mean of the absolute values of the second differences of the standardized signal

\tilde{\gamma}_X = \frac{1}{N-2} \sum_{t=1}^{N-2} |\tilde{X}(t+2) - \tilde{X}(t)| = \frac{\gamma_X}{\sigma_X}    (4.6)

where \tilde{X}(t) = (X(t) - \mu_X)/\sigma_X is the standardized signal.

Concatenating the N_e per-electrode feature sets f^1, \cdots, f^{N_e} into a new feature set f
consists of concatenating the feature vectors f^j_{s,i} for each sample i over all electrodes j:

f^j_{s,i} = [\mu_X, \sigma_X, \delta_X, \tilde{\delta}_X, \gamma_X, \tilde{\gamma}_X], \quad f_{s,i} = [f^1_{s,i}, \cdots, f^j_{s,i}, \cdots, f^{N_e}_{s,i}]    (4.7)
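The six features above reduce to a short NumPy routine; a minimal sketch, with the standardized differences obtained through the closed forms of (4.4) and (4.6):

```python
import numpy as np

def statistical_features(X):
    """Picard's six time-domain features for one EEG channel:
    mean, standard deviation, mean absolute first/second differences
    of the raw signal, and their standardized counterparts."""
    X = np.asarray(X, dtype=float)
    mu = X.mean()                              # Eq. (4.1)
    sigma = X.std()                            # Eq. (4.2)
    delta = np.abs(np.diff(X)).mean()          # Eq. (4.3)
    gamma = np.abs(X[2:] - X[:-2]).mean()      # Eq. (4.5)
    # Standardized versions via the closed forms of Eqs. (4.4), (4.6)
    return np.array([mu, sigma, delta, delta / sigma, gamma, gamma / sigma])
```

Concatenating this output per electrode yields the feature vector of (4.7).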
4.2.1.2 Higher Order Crossings
Observed time series of physiological signals such as EEG display both local and global
up-and-down movements. Characteristics of this oscillatory mode possess discrimination
power and can be extracted as features for classification purposes. The oscillation
behaviour seen in a finite zero-mean time series X_t, t = 1 · · ·N can be expressed through
the zero-crossing count. In general, when a filter is applied to a time series it changes
the oscillation, and hence the zero-crossing count too. Under this perspective, the following
iterative procedure can be assumed: apply a filter to the time series and count the
number of zero-crossings in the filtered time series; apply yet another filter to the original
time series, and again observe the resulting zero-crossings; and so on, filter and count.
The resulting zero-crossing counts are referred to as HOC [47]. When a specific sequence
of filters is applied to a time series, the corresponding sequence of zero-crossing counts
is obtained, resulting in the so-called HOC sequence. Many different types of HOC se-
quences can be constructed by appropriate filter design, according to the desired spectral
and discrimination analysis.
Let X_1, X_2, \ldots, X_N be a zero-mean stationary time series; the zero-crossing count in
discrete time is defined as the number of symbol changes in the corresponding clipped
binary time series [47]

Z_t = \begin{cases} 1, & \text{if } X_t \ge 0 \\ 0, & \text{if } X_t < 0 \end{cases}    (4.8)

The number of zero-crossings, denoted by D, is defined in terms of Z_t:

D = \sum_{t=2}^{N} \left[ Z_t - Z_{t-1} \right]^2, \quad 0 \le D \le N - 1    (4.9)
HOC combines zero-crossing counts and linear operations: the difference operator is a linear
high-pass filter,

\nabla Z_t \equiv Z_t - Z_{t-1}    (4.10)

and the second difference \nabla^2, whose squared gain is larger, is a more pronounced high-pass filter.
To extract the HOC features, the signal components in the 8−30 Hz range, covering the
conventional Alpha and Beta waves, are typically extracted first. The signals are also
passed through a zero-mean process. Recently, Petrantonakis [71] used a combination
of EMD-based adaptive filtering and Higher Order Crossings analysis, which shows
promising results for the classification of six basic emotions.
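The filter-and-count iteration can be sketched as follows, using the difference operator of (4.10) as the filter at every step; the loop structure and function name are our own simplification:

```python
import numpy as np

def hoc_sequence(x, order=5):
    """Higher Order Crossings: zero-crossing counts D_1..D_order from
    repeated application of the difference filter (Eq. 4.10) to a
    zero-mean series, counting symbol changes as in Eqs. (4.8)-(4.9)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # enforce the zero-mean assumption
    counts = []
    for _ in range(order):
        z = (x >= 0).astype(int)           # clipped binary series, Eq. (4.8)
        counts.append(int(np.abs(np.diff(z)).sum()))  # D, Eq. (4.9)
        x = np.diff(x)                     # apply the high-pass filter
    return counts
```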
4.2.2 Spectral Analysis
4.2.2.1 Event Related Potential and Spectrogram
Event-related potential analysis is another commonly used method for EEG
pattern classification. The energy variation of the EEG signals between frequency ranges
is a direct indicator of the activation or deactivation of certain underlying neuro-networks,
or of an oscillation frequency shift of the EEG signals. In [50], it was shown
that event-related EEG oscillation pattern changes are expressed mostly in the spectral
components that reside within the Alpha (8−13 Hz) and Beta (13−30 Hz) bands.
Other bands, such as the Delta band (up to 4 Hz), contain mostly noise such as pulses,
neck movement and eye blinking. Beta waves are connected to an alert state of mind,
whereas Alpha waves are more dominant in a relaxed person.
Therefore, to construct the narrow-band energy feature set, the Alpha and Beta waves
(8−30 Hz) are extracted from the raw EEG time series first. Following this step, narrow-band
energy features were computed from the EEG signals by applying the FFT algorithm
to the whole duration of a sample (2.5 seconds). Frequency components within the
Alpha and Beta waves are further divided into 1 or 2 Hz sub-bands, and the energy within
each frequency band is calculated and used as a feature. The underlying assumption
is that EEG signals are stationary for the duration of each sample (2.5 seconds); this
assumption is also made by any other power spectrum calculation method, such as the
Short Term Fourier Transform (STFT). The use of narrow frequency bands reduces
the danger that frequency-specific effects go undetected.
The feature vector f_e for a given sample is then constructed by concatenating all the
power values of the 11 frequency bands for each electrode. The energy-based
feature vector f^j_{e,i} for a sample i and electrode j is thus

f^j_{e,i} = [E_1, E_2, \cdots, E_{11}],    (4.11)

f_{e,i} = [f^1_{e,i}, \cdots, f^j_{e,i}, \cdots, f^{N_e}_{e,i}] \quad \text{for } j = 1 \cdots N_e    (4.12)

where 11 is the number of frequency bands and N_e is the number of electrodes.
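The band-energy computation can be sketched as below; the 2 Hz sub-band grid is one plausible reading of the 11-band layout and is our assumption, not stated in the thesis:

```python
import numpy as np

def band_energies(x, fs, bands):
    """Energy per frequency band from the FFT power spectrum of one
    EEG epoch (assumed stationary over its duration)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin frequencies in Hz
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in bands])

# An assumed 2 Hz sub-band grid covering the 8-30 Hz range: 11 bands
sub_bands = [(f, f + 2) for f in range(8, 30, 2)]
```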
Figure 4.2: Power within sub-bands for F3 and F4 for the Negative, Positive and Calm states
4.2.3 Time-Spectral Analysis
4.2.3.1 Wavelet-based Features
Wavelet-based methods are a subset of time-frequency analysis methods. They overcome
the lack of event time-stamps associated with Fourier-based methods while providing an
in-depth view of the spectral domain. Murugappan et al. [66, 67] proposed a
new approach for the feature extraction process. In this approach, the EEG signals first go
through a zero-mean, unit-variance (standardization) process, and the resulting preprocessed
data are subjected to the discrete wavelet transform [67].
The Discrete Wavelet Transform (DWT) was applied individually to each channel.
Scale is related to frequency as shown in Table 3.2. Figure 4.3 shows sample
wavelet coefficients obtained during this process. Wavelet coefficients are calculated for
the frequencies of interest, 8−30 Hz, with the spectral resolution set to 1 Hz. Daubechies
fourth-order orthonormal bases (db4) were employed to calculate the wavelet coefficients
at the lth scale, C_X(l, n); the coefficients corresponding to the Alpha band (8−12 Hz)
and Beta band (13−30 Hz) were used to estimate the wavelet energy and
Figure 4.3: Discrete Wavelet Decomposition using db4 Wavelet
wavelet entropy, given by

ENG_l = \sum_{n=1}^{2^{S-l}-1} |C_X(l,n)|^2, \quad N = 2^S, \; 1 < l < S    (4.13)

ENT_l = -\sum_{n=1}^{2^{S-l}-1} |C_X(l,n)|^2 \log\left( |C_X(l,n)|^2 \right), \quad N = 2^S, \; 1 < l < S    (4.14)

The parameters of Equations (4.13) and (4.14) were used to form a feature vector f_w (i.e., f^j_{w,i} = [ENG_l, ENT_l]).
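A dependency-free sketch of the energy and entropy features follows; note that a hand-rolled Haar DWT stands in for the thesis's db4 basis purely to keep the example short (in practice pywt.wavedec with the 'db4' wavelet would be used), and the signal length is assumed divisible by 2^levels:

```python
import numpy as np

def dwt_energy_entropy(x, levels=3):
    """Per-scale wavelet energy (Eq. 4.13) and entropy (Eq. 4.14).
    Uses a Haar DWT for brevity; substitute a db4 decomposition
    (e.g. pywt.wavedec) to match the thesis exactly."""
    a = np.asarray(x, dtype=float)
    features = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # low-pass branch
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # high-pass branch
        p = detail ** 2
        energy = p.sum()                              # Eq. (4.13)
        entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))  # Eq. (4.14), log(0) guarded
        features.append((energy, entropy))
        a = approx                                    # recurse on the approximation
    return features
```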
4.2.4 Channel selection: single channel vs. multiple channels
Multi-channel recordings would intuitively produce more robust recognition rates, as
correlated inputs provide more, or higher-dimensional, information in the feature space.
However, a large number of electrodes (greater than 32 channels) is not always feasible for
many applications, especially those geared towards non-medical, commercial
uses such as augmented emotion communication, health services or behavioural
monitoring. It would be very beneficial for researchers and consumer electronics developers
to examine the feasibility of using a subset of the channels for affect detection and recognition
applications, for example with commercially available EEG headsets (e.g., the Emotiv
headset [1]).
One way to reduce the number of channels is based on previous results on the association
of affect with different regions of the brain. Numerous EEG studies [4], [21] generally
suggest that greater activation of the right frontal lobe accompanies the experience
of more negatively valenced emotions, whereas greater left frontal activation accompanies
more positively valenced experiences. Due to this asymmetrical neural
activity within the brain when an emotional state is present, our intuitive choice of
channels was those collected from the left and right frontal hemispheres. To test the
feasibility of commercially available headsets, the Emotiv EPOC Neuroheadset (Figure 4.4)
was chosen for our study. The Emotiv Software Development Kit for research [1] includes
a 14-channel (plus CMS/DRL references at the P3/P4 locations) high-resolution, wireless
neuro-signal acquisition and processing headset. The channel names, based on the International
10−20 locations, are: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4,
F8, AF4.
(a) Emotiv sensor layout (b) Emotiv EPOC Neuroheadset
Figure 4.4: Emotiv EPOC Neuroheadset
Due to differences between the recording devices and recording conditions for the
database used in Chapter 6, the number of channels is further reduced to the subset
shown in Table 4.1. Simulation results will be discussed in detail in Chapter 6.
4.3 Classification Methods
The choice of classifier is mostly determined by how well the classes separate in the
feature space. If the projected class samples are linearly separable, then linear classifiers
(e.g., SVM [84], LDA [24, 9]) are preferred; otherwise, non-linear classifiers such as K
Nearest Neighbours (KNN) are preferred. There is no 'best' classifier that fits all types of
features. Since we are going to evaluate the effectiveness of four rather different feature
extraction algorithms and the class separability is uncertain, the following three classifiers
were implemented and used to generate the simulation results in Chapter 6.

Table 4.1: EEG channels selected for analysis.

Device              Database        Sampling Rate   Channels
Biosemi Active 2    eNTERFACE06     1024 Hz         54
Emotiv EPOC         Self-collected  128 Hz          14
Channels selected for analysis: AF3, F7, F3, FC5, FC6, F4, F8, AF4
Given training data (y_i, \mathbf{x}_i) \in \{-1,+1\} \times \mathbb{R}^n, i = 1, \cdots, l, where y_i is the two-class
label and \mathbf{x}_i is the feature vector, some classification methods construct the following
decision function:

d(\mathbf{x}) \equiv \mathbf{w}^T \phi(\mathbf{x}) + b    (4.15)

where \mathbf{w} is the weight vector and b is an intercept, also called the bias. A non-linear classifier
maps each instance \mathbf{x} to a higher-dimensional vector \phi(\mathbf{x}) if the data are not linearly
separable. If \phi(\mathbf{x}) = \mathbf{x}, the data points are not mapped, and we say (4.15) is a linear classifier.
Because non-linear classifiers use more features, they generally outperform linear classifiers in
terms of prediction accuracy.
The projection of a test sample \mathbf{z} in the feature space is obtained as:

d(\mathbf{z}) = \mathbf{w}^T \phi(\mathbf{z}) + b    (4.16)

In this two-class estimation problem, \mathbf{z} is estimated to be from class C_1 if d(\mathbf{z}) < 0 and
from C_2 if d(\mathbf{z}) > 0.
4.3.1 Linear Discriminant Analysis
Since the EEG feature sets can be of very high dimensionality (thousands of features)
compared to the number of samples in the sets, there is always a linear boundary that
can completely separate training samples of the different classes [40]. Another advantage
of linear classifiers is that they give better-generalized solutions. LDA can also
provide probabilistic output if a decision fusion approach is implemented.
Training of LDA is carried out through scatter matrix analysis of the training samples.
LDA aims to derive the most discriminating features in the produced feature space based
on the maximization of the so-called Fisher's discriminant criterion [24]:
W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}    (4.17)

where S_B and S_W are the between-class and within-class scatter matrices:

S_B = \sum_{c=1}^{C} N_c (\bar{X}_c - \bar{X})(\bar{X}_c - \bar{X})^T

S_W = \sum_{c=1}^{C} \sum_{n=1}^{N_c} (X_{cn} - \bar{X}_c)(X_{cn} - \bar{X}_c)^T

In the definitions above, C is the number of classes, N_c is the number of training samples in
class c, X_{cn} is the nth sample of class c, \bar{X}_c = \frac{1}{N_c} \sum_{n=1}^{N_c} X_{cn} is the mean of class c,
and \bar{X} is the overall sample mean.
The projection of a test sample z into the feature space is obtained as:

L = W_{LDA}^T z    (4.18)
In the case of a small sample size, where the number of samples is less than twice the
dimension of the features, the LDA algorithm should not be applied directly [24]. Since LDA
is based on the sample covariance, the within-class scatter matrix S_W may become
singular and sparse. To resolve the singularity problem, a regularized version of the
LDA algorithm should be considered, in which a scaled identity matrix of the same size
is added to the sample covariance. This approach introduces a small bias into the final
results, but this is often tolerable [40]. A three-class classification example using LDA is shown
in Figure 4.5, where the lines are the decision boundaries.
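For the two-class case, the regularized criterion has a closed-form solution; the sketch below (function name and regularization constant are our own) adds the scaled identity to S_W before solving:

```python
import numpy as np

def regularized_lda_direction(X1, X2, reg=1e-3):
    """Fisher discriminant direction maximizing Eq. (4.17) for two
    classes. reg * I regularizes S_W against singularity when the
    feature dimension exceeds the number of samples."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter, summed over both classes
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    Sw += reg * np.eye(Sw.shape[0])        # regularization term
    w = np.linalg.solve(Sw, m1 - m2)       # closed form: w proportional to Sw^-1 (m1 - m2)
    return w / np.linalg.norm(w)
```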
Figure 4.5: Example of an LDA classifier: samples from three classes plotted against two feature variables, with lines marking the decision boundaries between the class regions
4.3.2 K Nearest Neighbours
K-Nearest Neighbours (kNN) [17] is an instance-based classifier in which the label for a
probe or new sample is determined by the labels of adjacent instances using a voting criterion
(typically a distance measure). For example, given a set of training samples with known
labels, to find the class membership of a new probe point we first
find the K neighbours that are closest to the new point using a chosen distance measure,
e.g., the Euclidean distance. When a tie between multiple points of the same distance occurs,
a tie-breaking method such as the majority rule is used. In
general, an odd number of neighbours (1, 3, 5, 7, 9) is picked to avoid ties. The output label
for the probe sample is the label of the majority of the K neighbours.
The number of neighbours, the parameter K, is very important in determining the final
performance of this classifier. By increasing the number of neighbours, the effect of
artefacts within classes is reduced, but the class boundaries are also enlarged, which can
potentially degrade the classification performance. Therefore, the final recognition
performance is most dependent on the class separation in the feature space. For any given
problem, a small value of K will lead to a large variance in predictions; alternatively,
setting K to a large value may lead to a large model bias. Thus, K should be set to
a value large enough to minimize the probability of misclassification and small enough
(with respect to the number of cases in the sample) that the K nearest points
are close enough to the query point. Like any smoothing parameter, there is an
optimal value of K that achieves the right trade-off between the bias and the variance of
the model.
The advantage of KNN is that it is easy to implement, and when the training set
is not too large it has low computational complexity. However, since KNN builds an
instance-based model that depends largely on the training samples in the set, it does
not generalize well and needs to be trained with samples that closely resemble the testing
samples.
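The voting procedure amounts to only a few lines; a minimal sketch with Euclidean distance and simple majority voting:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, z, k=3):
    """Label a probe point z by majority vote among its k nearest
    training samples under Euclidean distance (odd k avoids ties
    in the two-class case)."""
    X = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X - np.asarray(z, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```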
Figure 4.6: Examples of KNN Classifier
4.4 Summary
In this chapter, various feature analysis algorithms for EEG signal analysis in the affect
detection application were introduced and discussed in depth. These algorithms
were grouped into two main categories: time-domain oscillatory-pattern-based methods
and spectral-domain energy-based methods. Two commonly used classifiers, k Nearest
Neighbours (kNN) and Linear Discriminant Analysis (LDA), were presented and their
theoretical foundations were discussed.
Chapter 5
Empirical Mode Decomposition for
Emotion Classification
5.1 Introduction
With the previously stated feature extraction methods, several problems remain unsolved.
First of all, the EEG signal is non-linear (due to contamination by noise, both deterministic
and Gaussian white, and by stochastic artefacts) and non-stationary. As shown in
Table 5.1, to apply the previously discussed signal analysis methods, such as Fourier and
wavelet analysis, we usually assume the signal is stationary, or piecewise stationary (STFT).
Features extracted using such conventional methods do not preserve the non-linear
or non-stationary characteristics of the original signal. Also, for transient detection
applications, conventional (classical) signal processing methods like Fourier analysis do
not provide the correlation between the occurrence of an event and its time stamp.
In the research on Brain-Computer Interfaces and affective computing, we are still at
the very beginning of fully understanding the structure of EEG signals and the underlying
neuro-networks associated with emotion. In the application of understanding
human emotional expression through the Central Nervous System (CNS) (i.e., EEG), there is
no concrete evidence as to which channels and which frequency ranges are the most
representative for this machine learning, pattern classification problem.
The approaches presented in this chapter aim to answer the following two questions.
1. Through the use of a novel signal processing method, we decompose the EEG
recordings from different channels into instantaneous frequencies and, through the selection
of IMFs, draw conclusions as to which frequency range is the most affect-specific.
The challenges of frequency alignment and common scales faced by the IMFs
extracted using standard EMD are solved through the use of an extended version of
EMD, Multivariate EMD.
2. We rank the channels by significance for affect detection through the use of a Genetic
Algorithm. After this study, we will be able to define the most information-rich channels
for affect detection, which will provide key information for the design of consumer
electronics for affect-related applications. A detailed discussion of the GA is presented
towards the end of the chapter.
5.2 Empirical Mode Decomposition (EMD)
In recent years, the Hilbert-Huang Transform (HHT) [37], a time-frequency analysis
method resorting to the use of Instantaneous Frequency (IF), has been proven effective
in analyzing non-linear and non-stationary signals. To compute a meaningful IF from a
multi-component signal through the construction of an Analytic Signal (AS) or Hilbert
transform, we must first reduce this multi-component signal to a collection of mono-component
functions [38]. Multi-component refers to the case in which there are multiple
extrema between two consecutive zero-crossings of an oscillating signal, which indicates
the coexistence of multiple frequency components at a given time instant. In Section
5.2.2, we will see that this is the case with EEG signals (see Figure 5.3) and that EEG
signals are multi-component in nature. However, the elements of these coexisting
frequency components also vary with time, as is typical for non-stationary signals. This
is fundamentally different from Fourier-based analysis methods, where the amplitude
and frequency of the harmonics are fixed for the chosen time frame.
Table 5.1: Method comparison between Fourier, Wavelet and Hilbert-Huang Transform in data analysis

Property          Fourier                    Wavelet                    Hilbert-Huang
Basis             Harmonics with constant    Predefined basis, with     Adaptive basis based on
                  amplitude and frequency    scaling and translation    local properties; time-varying
                  for each component         applied                    amplitude and frequency
Linearity         Yes, superposition of      No                         No
                  harmonics
Time-spectral     Convolution; occurrence    Convolution; approximate   Phase differentiation;
correspondence    time stamp unavailable     time stamp                 precise time stamp
Presentation      Energy-frequency           Energy-time-frequency      Energy-time-frequency
In this new approach [37], Empirical Mode Decomposition (EMD) was presented and
used to decompose a multi-component signal into a set of mono-component signals. EMD
[36] is data-driven, and is therefore able to preserve the non-linear and non-stationary
properties of the signal. Compared to other time-frequency analysis algorithms, another
main advantage of the EMD algorithm is that the decomposition does not require
a predefined or a priori basis, unlike wavelet analysis or the FFT. Like wavelet methods, the EMD
method decomposes the original time series into a set of oscillatory modes, termed
Intrinsic Mode Functions (IMFs); but unlike wavelets, we are not restricted to a fixed,
predefined set of basis functions, for which it can be challenging to determine the most
appropriate choice for a particular analysis.
It is my objective to examine and evaluate the use of the HHT and EMD methods on a
non-linear and non-stationary signal, such as the EEG, in emotion (affect) detection
applications. However, EMD is still an evolving, empirically based signal processing
method that lacks a theoretical base, and many problems exist in practical
use, especially with signals such as EEG, which consist of multiple channels and are correlated
with neurological events. This chapter will first provide a brief explanation of the EMD
algorithm. Some of these application issues and constraints will then be discussed in detail,
and possible solutions presented as well.
5.2.1 Decomposition Procedure
EMD is an adaptive signal decomposition method with which any complicated signal
can be decomposed into a series of Intrinsic Mode Functions (IMFs). There are two main
criteria to meet during this decomposition procedure, as stated in the original paper
[29]:
1. For each extracted signal mode (IMF), the number of extrema and the number of
zero crossings must differ by at most one; in other words, each IMF should have no
riding waves.
2. The mean value of the envelopes defined by the local maxima and local minima
should be zero at any point; this indicates that the maxima and minima are located
symmetrically about the local mean (zero).
The first condition ensures the applicability of the Hilbert transform for calculating a
physically meaningful Instantaneous Frequency [38]; the application of EMD followed by
the Hilbert transform is often referred to as the Hilbert-Huang Transform (HHT). The detailed
procedure, also called sifting, of the EMD algorithm is shown below for a given time series
x(k), initializing x'(k) = x(k).

Algorithm 1. The standard EMD algorithm
1. Find the locations of all the extrema of x'(k).
2. Interpolate (using cubic spline interpolation) between all the minima (respectively
maxima) to obtain the lower signal envelope e_min(k) (respectively the upper envelope e_max(k)).
3. Compute the local mean m(k) = [e_min(k) + e_max(k)]/2.
4. Subtract the local mean from the signal to obtain the 'oscillatory mode' s(k) =
x'(k) − m(k).
5. If s(k) obeys the stopping criteria, define d(k) = s(k) as an IMF; otherwise
set x'(k) = s(k) and repeat the process from step 1.

The sifting process stops when the residue r(k) becomes a constant, a monotonic
function, or a function containing only a single extremum, from which no further IMF can be
extracted. If the data has a trend, the final residue is that trend.
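Algorithm 1 can be sketched as follows; for brevity, linear interpolation between extrema stands in for the cubic splines of step 2, and a crude bound on the local mean replaces the usual sifting stopping criteria:

```python
import numpy as np

def local_extrema(x):
    """Indices of the interior local maxima and minima of a series (step 1)."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] > x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] < x[i + 1])]
    return maxima, minima

def sift_imf(x, max_iter=50, tol=1e-3):
    """Extract one IMF via the sifting loop of Algorithm 1."""
    s = np.asarray(x, dtype=float).copy()
    k = np.arange(len(s))
    for _ in range(max_iter):
        maxima, minima = local_extrema(s)
        if len(maxima) < 2 or len(minima) < 2:
            break                                # too few extrema to build envelopes
        e_max = np.interp(k, maxima, s[maxima])  # upper envelope (step 2, linearized)
        e_min = np.interp(k, minima, s[minima])  # lower envelope
        m = (e_max + e_min) / 2.0                # local mean (step 3)
        if np.max(np.abs(m)) < tol:
            break                                # crude stopping rule
        s = s - m                                # step 4
    return s
```

A pure sinusoid is already an IMF, so sifting leaves it essentially unchanged; applying the same loop to the residue x − d(k) repeatedly would yield the full IMF set.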
A sample of the EMD decomposition outputs is shown in Figures 5.1 and 5.2; the
total number of IMFs is determined by the oscillatory complexity of the original signal
and the stopping criterion.
5.2.2 Hilbert-Huang Spectrum
In [38], it is shown that in order to obtain a physically meaningful Instantaneous Frequency
(IF) through the use of the analytic signal, the crucial and necessary condition is that the
signal be mono-component, locally zero-mean, and symmetric with respect
to the zero mean. In other words, the construction of the analytic signal for the
calculation of IF works only with mono-component signals, not with complex waves with
Figure 5.1: Sample EMD decomposition for Participant 2, Session 3, negatively excited
Figure 5.2: Sample EMD decomposition for Participant 2, Session 3, positively excited
riding waves. By applying EMD before the construction of the analytic signal and the
application of the Hilbert transform, we are able to extract meaningful instantaneous
frequencies for multi-component signals. More information on instantaneous frequency
can be found in [96].
First, let us define instantaneous frequency. Gabor [30] was the first to introduce the
complex analytic signal, which was later employed by Ville [93] to define the instantaneous
frequency as the time derivative of the phase of a signal. For a mathematical definition,
consider the analytic signal, whose imaginary part is the Hilbert transform \hat{x}(t) of the
signal x(t). Then we can write

z(t) = x(t) + j\hat{x}(t), \quad \hat{x}(t) = x(t) * \frac{1}{\pi t}    (5.1)

or, in exponential form,

z(t) = a(t) e^{j\Phi(t)}    (5.2)

where the amplitude a(t) and the phase \Phi(t) are defined as

a(t) = \sqrt{x^2(t) + \hat{x}^2(t)}, \quad \Phi(t) = \arctan\frac{\hat{x}(t)}{x(t)}    (5.3)

Therefore, the instantaneous frequency of the signal x(t) is

IF(t) = \frac{d\Phi(t)}{dt} = \frac{x(t)\dot{\hat{x}}(t) - \hat{x}(t)\dot{x}(t)}{x^2(t) + \hat{x}^2(t)}    (5.4)
However, there are also challenges in the interpretation of instantaneous frequency,
as discussed in [11].
From Figure 5.3 we see that there are multiple frequency components at any given time
instant. From this plot we can also see the 50 Hz power-line interference (the sample data
was recorded at the University of Zagreb, Department of Telecommunications, Croatia).
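Equation (5.4) is equivalent to differentiating the unwrapped phase of the analytic signal, which SciPy's hilbert makes direct; a minimal sketch (edge samples, where the Hilbert transform is least accurate, are simply retained):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """IF(t) of Eq. (5.4), computed as the derivative of the unwrapped
    phase of the analytic signal z(t) = x(t) + j*x_hat(t)."""
    z = hilbert(np.asarray(x, dtype=float))      # analytic signal, Eq. (5.1)
    phase = np.unwrap(np.angle(z))               # Phi(t), Eq. (5.2)
    return np.diff(phase) * fs / (2.0 * np.pi)   # dPhi/dt converted to Hz
```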
Figure 5.3: Hilbert-Huang spectrum of the instantaneous frequency between 0−70 Hz, showing the multi-component structure of an EEG signal (instantaneous frequency vs. number of samples)
5.2.3 Multivariate EMD
EEG signals typically comprise multiple channels, and although we can apply EMD
independently to each channel to extract the IMFs, and could potentially obtain the same
number of IMFs for each channel, there is one fundamental problem: since the
signals come from different sources (channels), the frequency content of a particular IMF
is unmatched across channels. This makes the interpretation of a particular
IMF (or subset of IMFs) across different channels or trials very difficult, and also
challenges the interpretation of the significance of the results. More details on the
challenges with multivariate inputs and the misalignment of instantaneous frequencies from
corresponding IMFs can be found in [78]. This prompts the adoption of the multivariate
version of EMD, which ensures the same number of IMFs across all variables
and resolves the IMF indexing problem.
Mode alignment in multivariate data corresponds to finding a set of common scales
or modes across different components (variables) of a multivariate signal, thus ensuring
that the IMFs are matched both in number and in scale properties. This is essential, since the same number of features must be produced across channels, trials, sessions and subjects; a feature extraction algorithm cannot operate on feature vectors of varying length. The final dimension of the feature vector is determined by the total number of IMFs and the specific feature extraction algorithm applied to each IMF. Therefore, if the number of IMFs varies from channel to channel or from trial to trial, it is extremely hard to define a common space for all samples. New samples are projected onto the feature space spanned by the features extracted from the IMFs.
To solve this problem, an augmented matrix of size M × N is used as input to the EMD feature extraction module, where M equals the number of electrodes per sample (54) multiplied by the total number of samples available, and N equals the signal length per channel (2.5 × fs). A sliding window of length 30 with an overlap of 10 is used to calculate the IMF components for each electrode, and the IMF extraction method is applied across all desired channels (electrodes).
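The augmented-matrix construction described above can be sketched as follows. The epoch count (20) and the random placeholder data are assumptions for illustration, and the window step of 20 samples reflects one reading of "size 30 with 10 overlapping":

```python
import numpy as np

# Hypothetical dimensions following the text: 54 electrodes per epoch and a
# 2.5 s epoch at fs = 1024 Hz; the epoch count (20) is an arbitrary stand-in.
fs = 1024
n_epochs, n_channels, n_time = 20, 54, int(2.5 * fs)
epochs = np.random.randn(n_epochs, n_channels, n_time)   # placeholder EEG

# Augmented matrix: M = (number of electrodes) x (number of epochs) rows,
# N = signal length per channel columns.
augmented = epochs.reshape(n_epochs * n_channels, n_time)

# Sliding windows of length 30; an overlap of 10 is read here as a step of 20.
windows = np.lib.stride_tricks.sliding_window_view(augmented, 30, axis=1)[:, ::20, :]
```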
Algorithm 2. Multivariate extension of EMD (MEMD)

1. Choose a suitable point set for sampling on an (n − 1)-sphere.

2. Calculate the projections {p^θk(t)}, t = 1..T, of the input signal {v(t)}, t = 1..T, along each direction vector x^θk, for all k (the whole set of direction vectors), giving {p^θk(t)}, k = 1..K, as the set of projections.

3. Find the time instants {t_i^θk} corresponding to the maxima of the set of projected signals.

4. Interpolate [t_i^θk, v(t_i^θk)] to obtain the multivariate envelope curves {e^θk(t)}, k = 1..K.

5. For a set of K direction vectors, the mean m(t) of the envelope curves is calculated as

m(t) = (1/K) Σ_{k=1}^{K} e^θk(t)    (5.5)
6. Extract the 'detail' d(t) using d(t) = x(t) − m(t). If d(t) fulfills the stoppage criterion for a multivariate IMF, apply the above procedure to x(t) − d(t); otherwise, apply it to d(t).
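Steps 1-2 of Algorithm 2 can be sketched as below. As a simplification, normalized Gaussian sampling stands in for the low-discrepancy (e.g., Hammersley) point sets usually used for MEMD direction vectors, and the 4-channel sinusoidal signal is purely illustrative:

```python
import numpy as np

def direction_vectors(n_dim, n_dirs, seed=0):
    """Step 1: quasi-uniform directions on the (n_dim - 1)-sphere.
    Normalized Gaussian sampling is a simple stand-in for the
    low-discrepancy point sets used in the MEMD literature."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_dirs, n_dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def project(signal, dirs):
    """Step 2: projections p_k(t) of the T x n signal along each direction."""
    return signal @ dirs.T   # shape (T, K)

T, n, K = 512, 4, 8                      # 4-channel toy signal, 8 directions
t = np.linspace(0, 1, T)
v = np.stack([np.sin(2 * np.pi * (5 + c) * t) for c in range(n)], axis=1)
P = project(v, direction_vectors(n, K))
```

Steps 3-6 would then locate the extrema of each projection, spline-interpolate the multivariate envelopes, and sift on the envelope mean.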
5.3 Signal Reconstruction using MEMD as a Filter Bank
Flandrin et al. [29] showed in 2004 that EMD behaves as a filter bank, and Rehman [92] later studied the filter bank property of MEMD on EEG signals. The filtering process is carried out based on the instantaneous frequency of the signal at each time instant, without requiring information from other time instants. For a multicomponent signal in which multiple intrinsic frequencies are superimposed at each time instant, EMD decomposes the signal into a series of IMFs, each with its own instantaneous frequency. This is very different from conventional filtering: the filtered result is adaptive and can separate signal components that overlap in both time and frequency, mainly because the result is not shaped by an a priori set of basis functions and no convolution procedure is required. In other methods, such as wavelet analysis, a predetermined basis is convolved with the multicomponent signal in time, and to refine the frequency resolution the 'mother' wavelet is rescaled at each decomposition level (expansion by a factor of α in time results in a 1/α change in the frequency domain).
MEMD is an analysis method that in many respects gives a better understanding of the physics behind the signals [38]. Because of its ability to describe short-time changes in frequency that cannot be resolved by Fourier spectral analysis, it can be used for nonlinear and nonstationary time series analysis. Each extracted signal admits a well-defined instantaneous frequency (see Figure 5.6).

Figure 5.4: The filter bank property of regular EMD (power spectral densities of the IMFs on a log scale)

Figure 5.5: The filter bank property of MEMD (power spectral densities of the IMFs on a log scale)

Due to the above stated properties,
MEMD can be used effectively as a filter bank to extract frequency components of interest
for EEG signal analysis. As stated in Section 3.2, physiological signals such as EEG are very noisy, and denoising them is one of the most important steps. An effective filtering process enables a better understanding of the underlying physiological processes and makes the intrinsic characteristics or sources more accessible.
Figure 5.6: Instantaneous amplitude and averaged frequency of the IMFs (the eleven IMFs have averaged instantaneous frequencies of 166.15, 107.17, 64.12, 25.05, 14.93, 8.67, 6.95, 5.51, 4.81, 4.52 and 8.09 Hz)
We now consider a unique reconstruction method based on the Hilbert-Huang spectrum, which we refer to as Hilbert-Huang (HH) reconstruction. In a recent paper, Tomasz Rutkowski [82] suggested selecting segments of IMFs based on their instantaneous frequency components, as shown on the Hilbert spectrum. Typically, we have a frequency band of interest plus a margin (for example, 10%); we then label the time intervals at which such frequencies occur in order to reconstruct the EEG signals, or to isolate the regions of interest of the IMFs for further analysis.
Given a signal d(k), we propose to remove any unwanted frequency information and construct a signal d̂(k) that retains only the desired frequency characteristics of d(k). This is achieved by first decomposing d(k) into a set of N IMFs, c_i(k), and determining their instantaneous frequencies, where f_i(k) denotes the instantaneous frequency of the i-th IMF at time instant k. For the scenario where it is required to retain frequencies greater than f_low and lower than f_high, we have

ĉ_i(k) = { c_i(k), if f_low < f_i(k) < f_high;  0, otherwise }    (5.6)

Essentially, all values of c_i(k) that do not fall within the desired frequency range are set to zero. We can then construct d̂(k) by summing the IMF values that fall within the desired frequency range, to obtain

d̂(k) = Σ_{i=1}^{N} ĉ_i(k)    (5.7)
One objective of this thesis is to examine the use of MEMD as a filter bank to eliminate the effects of noise and artefacts. While this approach has been effective, simply summing the IMF components that fall within the desired frequency band may introduce discontinuities or spurious effects into the reconstructed EEG. An alternative solution has been proposed in [61], where a weight matrix is used to optimize the selection of IMF components and maintain the continuity of the background components.
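A minimal sketch of the HH reconstruction of Eqs. (5.6)-(5.7), assuming the IMFs are already available; two synthetic tones stand in for real IMFs here:

```python
import numpy as np
from scipy.signal import hilbert

def hh_reconstruct(imfs, fs, f_low, f_high):
    """Keep each IMF sample only where its instantaneous frequency lies
    inside (f_low, f_high), then sum the masked IMFs (Eqs. 5.6-5.7)."""
    rec = np.zeros(imfs.shape[1])
    for c in imfs:
        phase = np.unwrap(np.angle(hilbert(c)))
        f_inst = np.gradient(phase) * fs / (2 * np.pi)   # IF in Hz
        mask = (f_inst > f_low) & (f_inst < f_high)      # Eq. (5.6)
        rec += np.where(mask, c, 0.0)                    # Eq. (5.7)
    return rec

# Two synthetic 'IMFs': a 6 Hz alpha-like tone and a 40 Hz gamma-like tone.
fs = 512.0
t = np.arange(0, 2, 1 / fs)
imfs = np.vstack([np.sin(2 * np.pi * 6 * t), 0.5 * np.sin(2 * np.pi * 40 * t)])
rec = hh_reconstruct(imfs, fs, 1.0, 12.0)   # retain only the 6 Hz component
```

Away from the window edges, the reconstruction retains the 6 Hz tone and suppresses the 40 Hz one entirely.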
5.3.1 MEMD for Feature Extraction
In [72], Petrantonakis presented the relation between the IMFs and the elicited emotion present in the EEG signal. In that study, by using a genetic algorithm and the
Figure 5.7: MEMD for signal reconstruction and feature analysis (initial signal → MEMD IMF extraction → IMF selection → signal reconstruction via MEMD filtering → feature analysis with statistical, narrow-band, wavelet and HOC features → feature vector → classification with LDA/kNN → affect recognition rate for positively excited, negatively excited and neutral)
fractal dimension fitness function stated in [73], 3-5 IMFs were selected for each channel. We adapt the feature extraction method presented in [72] for our own analysis, which will be discussed in Chapter 6.
5.4 Genetic Algorithm
Due to the large number of IMFs produced by the decomposition, together with the large number of EEG channels, there is considerable redundancy in features extracted directly from all IMFs. The high dimension of the feature space causes two problems: first, computational complexity; and second, a very large number of samples is required to produce a meaningful statistical model [23]. Since we have a small number of observations compared to the dimension of the features, and since it is uncertain which electrodes provide more information for discriminating one emotion class from another, a genetic algorithm is applied to reduce the number of channels and the number of IMFs used for feature extraction. The objective is to use the GA on the collected features to reduce the feature dimension, discover the main class-specific features, and boost classification performance.
Figure 5.8: The block diagram for the genetic algorithm (EEG features form the initial population; fitness calculation, crossover and mutation produce an updated population, and the loop repeats until evolution stops and the optimal features are returned)
The genetic algorithm is a non-ranking, global optimization algorithm that was introduced in [31]. The optimization process mimics natural selection within a large population. It iteratively modifies (mutates) a population of individuals (variables of the feature space) to maximize a fitness criterion. At each step, the genetic algorithm selects the best individuals, which are then used to generate offspring (with set crossover rates). Over successive generations, the population evolves towards an optimal solution. The algorithm terminates when the maximum number of generations is reached; this maximum is typically chosen as the point at which the fitness criterion (or performance) stabilizes.
Three main parameters govern the process at each step (see Figure 5.8):

1. Selection of the individuals (parents) used to generate the next generation

2. Crossover rules, applied to combine two parents to form children for the next generation

3. Mutation rules, which apply random changes to individual parents to form children
Related papers [86, 89] have shown that genetic algorithms, as non-ranking methods, can be successfully applied to feature selection. When ranking methods such as Principal Component Analysis (PCA) are used, the chosen feature vectors can contain features that are correlated with each other while contributing little new information to the classifier. When the correct classification rate is used as the fitness measure, the genetic algorithm chooses the features that are most representative of the class labels. Another major advantage of the genetic algorithm is that features and electrodes can be represented in binary form for feature reduction, which differs greatly from conventional ranking-based methods such as PCA or Linear Discriminant Analysis (LDA).
On the other hand, the main disadvantage is that the fitness criterion is evaluated on a large number of candidate solutions (the population size at each generation times the number of generations), which can take a long time to compute. Also, due to the randomness of mutation and crossover, each run of the genetic algorithm produces a slightly different set of features, and since the features are not ranked, we generally do not know which feature is more significant in the subsequent classification. Therefore, to alleviate these two shortcomings and to learn the importance of each feature, multiple runs of the genetic algorithm are performed; the features that appear most frequently are considered more significant and are selected in the channel reduction process. This also enables us to determine where to place the electrodes and which frequencies of the EEG spectrum are the most important.
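A toy version of this GA-based channel selection, with leave-one-out 1-NN accuracy as the classification-rate fitness; the synthetic data (only 2 of 10 "channels" informative), population size and rates are all illustrative assumptions, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data standing in for EEG features: 60 samples, 10 'channels',
# only channels 0 and 1 carry class information (a hypothetical setup).
X = rng.standard_normal((60, 10))
y = np.repeat([0, 1], 30)
X[y == 1, :2] += 4.0

def fitness(mask):
    """Leave-one-out 1-NN accuracy on the selected channels."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    d = np.linalg.norm(Xs[:, None] - Xs[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return float(np.mean(y[d.argmin(axis=1)] == y))

def ga_select(n_genes=10, pop_size=20, n_gen=15, p_mut=0.05):
    pop = (rng.random((pop_size, n_genes)) < 0.5).astype(int)
    pop[0] = 1                                    # seed the all-channels mask
    for _ in range(n_gen):
        fit = np.array([fitness(m) for m in pop])
        parents = pop[fit.argsort()[::-1][: pop_size // 2]]  # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_genes)        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n_genes) < p_mut] ^= 1   # mutation
            children.append(child)
        pop = np.vstack([parents, children])      # parents survive (elitism)
    fit = np.array([fitness(m) for m in pop])
    return pop[fit.argmax()], float(fit.max())

best_mask, best_fit = ga_select()
```

Because the all-ones chromosome is seeded into the initial population and elites always survive, the GA can never do worse than using every channel.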
5.4.1 Fitness Functions
The fitness function acts as a selection criterion that preserves certain characteristics of the EEG signal during feature reduction. Previously, measures that preserve such characteristics, for instance energy (event-related potential) and complexity (fractal dimension), have been used in the literature. One promising fitness function is based on the fractal dimension, a measure of the irregularity of a curve; fractal dimension methods have been used, for example, in the analysis of epileptic seizures from EEG [75]. However, such fitness functions take no account of the association between features and class labels, or of anything beyond each individual sample. Since the objective of this study is to find the most discriminating channels and frequency ranges, a different fitness function has to be considered: here, the correct classification rate is used as the fitness measure for the GA.
The aim of the genetic algorithm is to maximize the fitness function. After the IMFs have been chosen, a new EEG signal is constructed from the selected IMFs alone via the inverse operation of the EMD algorithm. Figure 5.8 illustrates the GA feature selection process on an EEG signal corresponding to the positively excited emotion.
5.5 Summary
In this chapter, a novel time-frequency signal processing method, the Hilbert-Huang Transform (HHT), was introduced. This method is useful for calculating the instantaneous frequencies of nonlinear and nonstationary signals (e.g., EEG) through the combined use of the Hilbert transform and Empirical Mode Decomposition (EMD). Like wavelet methods, EMD decomposes the original time series into a set of oscillatory modes, termed Intrinsic Mode Functions (IMFs); unlike wavelets, however, it is not restricted to a fixed, predefined basis, for which determining the most appropriate choice for a particular analysis can be challenging. HHT is completely data driven and does not assume the signal to be stationary or piecewise stationary, assumptions required by conventional signal processing methods (e.g., Fourier-based methods) but violated by most biomedical signals.
To obtain a unified understanding of the individual modes and frequency scales across all EEG channels, an extended, multivariate version of Empirical Mode Decomposition (MEMD) was investigated and applied to all EEG channels. The MEMD algorithm was further applied as a filter bank to extract the EEG components of interest (alpha and beta waves) and to reconstruct the EEG signals used for feature analysis in Chapter 6.
To reduce the computational complexity of the EEG analysis system and to better understand the association between EEG channels and affect expression, a Genetic Algorithm (GA) was researched and applied to extract the most affect-specific channels from a much larger set of channels (54). The GA is a non-ranking global optimization method that maximizes a chosen optimization criterion; for this study, the correct classification rate was used as the fitness function. Its effectiveness will be discussed further in Chapter 6 through a set of simulation results.
Chapter 6

Experimental Setup and Simulation Results
6.1 Introduction
In this chapter, a series of simulation tests on emotion (affect) detection using EEG signals is presented and discussed. The focus of the simulations is multifold. Due to the complex nature of human emotions and the non-unique mapping between emotion expression and CNS signals, four feature analysis algorithms, emphasizing either the event-related potential or the oscillation pattern variation of the EEG signal, were implemented and tested to better understand the emotion-specific characteristics of EEG. A publicly available three-class EEG emotion dataset is used to evaluate the efficacy of these features. The experimental protocol and related components are discussed and evaluated in order to determine appropriate simulation parameters. Classification performance using these features is presented and discussed towards the end of the chapter, along with application limitations.
Figure 6.1: Experimental components used for the simulations (raw EEG signal: 3 emotions × number of trials × 54 channels, with 30 trials each for positive, neutral and negative per session → multivariate EMD → genetic algorithm for channel and IMF selection → EEG reconstruction → feature analysis → classification with kNN and LDA → emotions; the loop repeats until the best performance is reached)

Given the large number of electrodes present in many research works, augmented features directly extracted from each channel provide much redundancy in discriminating
information and are also unnecessarily high in dimension. The resulting high-dimensional feature space requires a very large number of samples to produce a meaningful statistical model [23], which poses a great challenge to the experimental process. To solve this problem, a genetic algorithm, as a global optimizer, is applied to systematically reduce the number of channels and the number of IMFs required for feature construction. The outcome of this study provides two key pieces of information for understanding the relationship between emotion and brain waves: the locations of the most emotion-specific channels (discriminating power), and the frequency range of the emotion-specific brain waves (instantaneous frequency analysis of the IMFs). This provides a means to compare and validate research findings from the psychophysiological literature and to aid the study of human emotion. Practical constraints, such as the minimum required window length in the time domain and the edge effects of the windowing operation, will also be discussed.
6.2 Data Collection Protocol
The choice of emotions to be detected depends largely on the specific application. For example, detecting disgust is not as important as detecting frustration in a learning setting, while distinguishing positive feelings from negative ones is sufficient as feedback on the service a customer received. In the context of multimedia content indexing and retrieval, a larger set of emotion labels is needed to provide fast and meaningful information.
For this study, we have focused on the recognition of three affective states, for the following reasons. According to the circumplex model of emotion (see Section 2.2) and findings in the psychology literature [3, 58, 22], high arousal can be interpreted as high motivation; high valence means the current situation is pleasant and approachable, whereas low valence means it is unpleasant and to be avoided. Hence, in the domains of learning, decision making and behavior monitoring, three emotional states play the most significant role: we would like to know when a person is happy ('positively excited'), frustrated ('negatively excited'), or bored ('calm'). These three affect states therefore provide critical and helpful information for the affect-sensitive applications shown in Section 1.3.
Figure 6.2: The three emotion classes studied in this project, shown on the circumplex model (activation-deactivation and pleasant-unpleasant axes; the excited, calm, positive and negative regions cover states such as tense, nervous, stressed, upset, sad, depressed, lethargic, fatigued, calm, relaxed, serene, contented, happy, elated, excited and alert)
To test the performance of the designed system, we make use of a publicly available database (Emobrain) that was recorded during the eNTERFACE'06 workshop and was specifically designed for detecting the above three affective states. The eNTERFACE'06 EMOBRAIN database [2] was collected by a research group during a summer workshop on multimodal interfaces in 2006 at the University of Zagreb, Dubrovnik, Croatia. The objective of the database was to provide a common framework for emotion assessment from multimodal physiological signals. It consists of emotionally-driven physiological signals from both the peripheral (galvanic skin response, respiration and blood volume pressure) and central (EEG and frontal fNIRS) nervous systems. Since our study focuses only on brainwaves, we used only the EEG recordings. More details can be found in [85].
EEG data were collected from 5 participants, aged 22-38, over three different sessions with 30 trials per session. The experimental protocol is detailed in Figure 6.3.

Figure 6.3: Protocol description for the eNTERFACE'06 EMOBRAIN database

For
each session, participants were stimulated with images selected from the International Affective Picture System (IAPS) [56]. The images were divided into three categories, exciting negative, neutral and exciting positive, based on their valence and arousal scores using the thresholds shown below. These thresholds were empirically defined according to the circumplex model described in Section 2.2.2. Each trial consists of a block of five images selected from the same affect class, to ensure stability of the emotion over time. Each picture was displayed on the screen for 2.5 seconds, leading to a total of 12.5 seconds per block. Blocks of different classes were displayed in random order to avoid participant habituation. The total number of observations obtained was 5 × 3 × 30 = 450.
calm: arousal < 4 and 4 < valence < 6

positive exciting: valence > 6.8, Var(valence) < 2 and arousal > 5

negative exciting: valence < 3 and arousal > 5

This selection resulted in 106, 71, and 150 pictures, respectively, for the above three
classes, as shown in Figure 6.4.
Figure 6.4: Selected IAPS images for the three-class emotion elicitation experiment (valence scores vs. arousal scores for the positively excited, negatively excited and calm classes)
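The class thresholds above translate directly into boolean masks; the uniform random ratings below are placeholders for illustration, not the actual IAPS scores:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical IAPS-style ratings on a 1-9 scale, plus a per-image valence
# variance column (all stand-ins for the real IAPS data).
valence = rng.uniform(1, 9, 1000)
arousal = rng.uniform(1, 9, 1000)
valence_var = rng.uniform(0, 4, 1000)

calm = (arousal < 4) & (valence > 4) & (valence < 6)
positive = (valence > 6.8) & (valence_var < 2) & (arousal > 5)
negative = (valence < 3) & (arousal > 5)
```

Note that the three masks are mutually exclusive by construction, so no image can receive two labels.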
6.2.1 Recording Device: Biosemi Active 2
EEG signals were recorded using the Biosemi ActiveTwo system (shown in Figure 6.5), which has 64 surface electrodes sampled at 1024 Hz. Recorded data are saved in EDF format and can be converted to other formats such as CSV or text files. However, due to the parallel recording of fNIRS signals, ten frontal electrodes were removed, and the final EEG recordings consist of signals from only 54 electrodes (64 minus the following ten frontal electrodes, removed because of the simultaneous placement of the fNIRS sensors: F5, F8, AF7, AF8, AFz, Fp1, Fp2, Fpz, F7, F6).
Figure 6.5: Biosemi ActiveTwo: (a) sensor layout; (b) the Biosemi ActiveTwo system
6.2.2 Ground Truth Definition
Emotion is known to be very subjective and dependent on social context and previous experience [80]. However, emotion consistency across participants is important when designing a generalized emotion recognition system. One can never be sure that a person feels the emotion intended by the pictures; a self-assessment gives a good estimate of whether the pictures evoked similar emotions among participants.
To deal with the problem of emotion consistency across participants (i.e., two subjects may experience very different feelings during the same stimulus), participants were also asked to self-assess their emotions on a simplified version of the SAM (Self-Assessment Manikin) scale. Defining a ground truth for emotion assessment depends strongly on the protocol used to record emotional reactions. Since self-evaluations were collected, the ground truth can be defined either from the classes given by the IAPS evaluations or from the self-evaluations; the ground-truth analysis can be found in the original database paper [64]. Self-assessment of the images is a good way to estimate whether the desired emotion was induced in the subject, and also gives an idea of the subject's level of emotional stimulation. For this experiment, participants were asked to rank the arousal and valence components of their emotions on a scale of 1-5, using the simplified SAM scale [64]. Since the SAM scores were obtained after the projection of 5 images from the same class, a new set of IAPS scores was computed as the mean of the IAPS scores of the 5 images in that trial.
6.2.3 Ground Truth Validation Using Pearson Correlation Coefficients
To understand the relationship between the self-assessments and the IAPS scores, we calculated the Pearson correlation coefficients between the two sets of scores for each of the two variables, valence and arousal. The averaged Pearson correlation coefficient between the IAPS and SAM scores is 0.754 for the valence dimension and 0.817 for the arousal dimension. These values show that the correspondence between the expected and experienced emotions is very good, and that the images evoke the desired emotion most of the time.
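The Pearson coefficient used here can be computed directly (scipy.stats.pearsonr gives the same value); the score vectors below are made-up illustrations, not the study's ratings:

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two score vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Toy IAPS vs. SAM valence scores (hypothetical numbers).
iaps = [6.9, 2.4, 5.1, 7.2, 2.8, 5.0]
sam = [5.0, 1.0, 3.0, 5.0, 2.0, 3.0]
r = pearson_r(iaps, sam)
```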
Table 6.1: Pearson correlation coefficients between IAPS scores and self-assessments, per participant

Participant   Correlation (Valence)   Correlation (Arousal)
P1            0.6994                  0.9387
P2            0.6286                  0.8628
P3            0.6550                  0.6583
P4            0.9816                  0.8533
P5            0.8057                  0.7716
Further examination of the self-assessment inputs from Participants 2 and 5 showed that their valence and arousal inputs in each trial were mostly equal, with a variance of around 1 between the two variables. It can be a very hard task to quantize one's emotion according to the 2D model; for example, some people tend to give extreme scores while others always choose the center. This does not mean that the participants did not experience the emotion; it simply implies that they had trouble 'expressing' it (the correct recognition rate indicates the existence of an emotional state). From an application point of view, self-assessment inputs may not always be obtainable, for example from people with autism spectrum disorder. For these reasons, we used the IAPS scores for the final labeling of the data.
6.2.4 Ground Truth Validation Using Confusion Matrix
To validate the collected EEG database using the self-assessed values, we obtained a confusion matrix for each participant by comparing the IAPS label of each sample against the label obtained from self-assessment. We then compared the averaged self-assessment scores of all 5 participants against the IAPS labels; the results are shown in the confusion matrix of Table 6.2.
Table 6.2: Averaged self-assessment classification accuracy (percent) for the three chosen emotions

Emotions        Calm    Pos. Excited   Neg. Excited
Calm            73.43   23.81          2.76
Pos. Excited    36.96   60.18          2.86
Neg. Excited    18.91   0.71           80.38
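A row-normalized confusion matrix of this kind can be computed as follows; the toy label vectors are illustrative only:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent: rows are the intended
    (IAPS) labels, columns the self-assessed labels."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100 * cm / cm.sum(axis=1, keepdims=True)

# Toy labels (0 = calm, 1 = pos. excited, 2 = neg. excited).
iaps = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
sam = [0, 0, 0, 1, 1, 1, 0, 2, 2, 0]
cm = confusion_matrix(iaps, sam, 3)
```

Each row sums to 100, so the diagonal entries read directly as per-class accuracies.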
6.3 Feature Extraction
Among the recognition rates reported in the affect detection literature, the large variation in experimental protocols means there is no clear evidence as to which characteristics of EEG best represent affect. In general, however, the approaches all focus on the event-related potential and on oscillation pattern variation. To examine the effectiveness of each feature extraction method and allow a fair comparison, several state-of-the-art feature extraction algorithms (see Section 4.2 for details) were implemented and used to construct the feature sets for affect classification.
To extract the HOC features, we further examined the parameter L (the order) by sweeping L = 3-30; the optimal L (highest classification rate) was then used for the simulations in the later sections:
Figure 6.6: HOC order vs. correct recognition rate (subject-independent recognition rate using HOC and 54 channels)
As shown in the plot, L = 9 provides the highest correct classification rate; this is consistent with the findings reported in [71]. For the rest of the analysis using HOC features, we set the order to 9.
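A minimal sketch of HOC feature extraction as described in Section 4.2: apply the backward-difference operator repeatedly and count the sign changes at each order. The test signal is an assumed toy example; repeated differencing acts as a high-pass filter, so the counts grow with the order:

```python
import numpy as np

def hoc_features(x, order):
    """Higher-order crossings: the count of sign changes of the signal
    after applying the difference operator k = 0 .. order-1 times."""
    d = np.asarray(x, dtype=float)
    counts = []
    for _ in range(order):
        signs = np.signbit(d).astype(np.int8)
        counts.append(int(np.sum(np.diff(signs) != 0)))
        d = np.diff(d)                   # next-order difference
    return counts

# Toy signal (an assumption, not study data): a 4 Hz oscillation plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 4 * t) + 0.05 * rng.standard_normal(512)
hoc = hoc_features(x, 9)
```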
Figure 6.7: HOC order vs. correct recognition rate (subject-independent recognition rate using HOC and 6 channels)

Each recorded sample is 2.5 seconds long and was recorded during the image projection period for one of the three affect states. For each sample, four sets of features were generated from the MEMD-reconstructed EEG signal of each channel. Table 6.3 shows the dimension of each feature type, where Ne is the number of channels used in each analysis.
Table 6.3: The features extracted from the EEG signals

Type of features              Size         Reference
Statistical (f_st)            6 × Ne       4.2.1.1
Narrow-band energy            11 × Ne      4.2.2.1
Higher-order crossings        9 × Ne       4.2.1.2
Wavelet energy and entropy    2 × 3 × Ne   4.2.3.1
6.4 Data Splitting and K Cross Validation
For the selected Emobrain database, there were 30 trials per session and three sessions in total for each subject. Table 6.4 shows the total number of samples used in this study. Samples from subject 1, session 1 were excluded because of a different sampling rate and because of missing records of the IAPS images used in that session: the image listing file associated with the recording was a duplicate of the session 2 file, which raises questions about the legitimacy of those samples. Samples from subject 2, session 1 were also excluded due to inconsistency in the recording settings compared with the other trials.
Table 6.4: Overview of the dataset

Subject   Used sessions    Total trials   Pos. trials   Neg. trials
S1        sess. 2, 3       300            100           100
S2        sess. 2, 3       300            100           100
S3        sess. 1, 2, 3    450            150           150
S4        sess. 1, 2, 3    450            150           150
S5        sess. 1, 2, 3    450            150           150
In this supervised machine learning process, as shown in the system diagram, two stages (training and testing) are involved in obtaining the desired statistical model for classification. The training samples are used to fit the model under a fitness criterion (e.g., minimizing the probability of error), and the testing samples are used to test how well the model holds on unseen samples. There are a few ways to obtain the training and testing samples. Ideally, the testing samples should be recorded in a separate session under the conditions in which the developed model is intended to be used. However, such multi-session recordings at different times or locations can introduce severe data inconsistencies (e.g., through hardware setup) that are irrelevant to model development and unnecessary in a preliminary design. One way around this is to divide the existing dataset randomly into two disjoint groups for training and testing, where the testing data are used only once to evaluate the model; this is usually referred to as a hold-out test.
6.4.1 k-Fold Cross Validation
Hold-out testing provides an unbiased measure of performance; however, when dealing with a small dataset, the k-fold cross-validation process can be helpful. K-fold cross-validation builds on the idea of hold-out testing by rotating data through the process. The data is again divided randomly, but now into k equal-sized groups. The train-test process is repeated k times, each time leaving out a different segment of the data as the test set.
Typical values of k range from 5 to 10, with k = 10 being a common choice, resulting in 10-fold cross-validation. In 10-fold cross-validation, the observations are randomly assigned to 10 groups. Ten separate models are built, each tested on a distinct data segment. The resulting 10 performance measures are unbiased, since no model is evaluated on data that was used during its training. The single, final performance measurement is taken as the mean of these 10 measures.

For our study, due to the small dataset constraint, the k-fold cross-validation process was used to obtain the simulation results shown in the sections below.
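The fold rotation described above can be sketched as follows. This is a minimal illustration of the index bookkeeping only, not the code used in the thesis:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly assign n_samples to k roughly equal folds and yield
    (train_idx, test_idx) pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Each sample appears in exactly one test fold across the k rotations.
n, k = 25, 5
seen = np.concatenate([test for _, test in k_fold_indices(n, k)])
print(sorted(seen) == list(range(n)))   # True
```

Training a model on `train_idx` and scoring it on `test_idx` in each iteration, then averaging the k scores, gives the cross-validated performance estimate.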
6.5 Simulation Results
This section provides the simulation results on affect detection using EEG. In general, two sets of results are presented: subject-specific recognition rates and cross-subject recognition rates. Subject-specific refers to the case where the training and testing samples come from the same subject, with an unseen portion of the samples (testing) used to validate the overall performance. Cross-subject refers to the case where observations from different subjects over various sessions were concatenated into one augmented matrix; a portion of this matrix was used as training samples and the remaining portion was used for testing, following the k-fold validation process described above. The recognition rates were generated using features without feature reduction.
6.5.1 Simulation Results Using All Channels
Since the EEG feature sets are of very high dimensionality (thousands of features) compared to the number of samples in the sets (450 per trial or 2250 per image, depending on the classification scheme), there is almost always a linear boundary that can completely separate the training samples of the different classes. Another advantage of linear classifiers is that they tend to give better generalized solutions. However, in the high-dimensional EEG feature space, an exponentially larger sample size is required for a meaningful statistical analysis [23]. Linear classifiers such as Linear Discriminant Analysis (LDA) run into the singularity problem on a small sample size, and feature reduction mechanisms are needed to cope with the sparsity of the samples in the high-dimensional feature space. The choice of feature reduction method can greatly affect the recognition rate later on, which makes it harder to compare how effectively the selected features represent the variations of EEG signals under emotion stimuli. For the first stage of simulation testing, we therefore opted for a non-linear classification method without feature reduction, to avoid error propagation from such feature reduction methods. In the cases where the feature dimension is comparable to the number of available samples, LDA was also applied to test the feasibility of a linear classifier.
A k-Nearest-Neighbours (kNN) classifier with the Euclidean distance metric was used here, and odd numbers of neighbours (1, 3, 5, 7, 9) were picked to avoid ties; similar results would be obtained with even numbers when a majority rule is used to break ties between nearest points. kNN is an instance-based method. Increasing the number of neighbours reduces the effect of artefacts within classes, but also enlarges the boundary region between classes, which can degrade classification performance. The final recognition performance therefore depends mostly on the class separation in the feature space.
A 5-fold cross-validation process was carried out to test the robustness of our system and also to mitigate the small sample size; in each fold, 80% of the samples were used for training and 20% for testing. The simulation results shown in Table 6.5 were generated using both the kNN and LDA classifiers on the specified features, without applying feature reduction algorithms.
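A sketch of this evaluation setup is shown below, assuming scikit-learn is available; the feature matrix is synthetic and merely stands in for the extracted EEG features:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Synthetic stand-ins for the EEG feature vectors of three affect classes.
X = np.vstack([rng.normal(loc=c, size=(60, 20)) for c in (-1.0, 0.0, 1.0)])
y = np.repeat([0, 1, 2], 60)

# 5-fold CV: each fold uses 80% of the samples for training, 20% for testing.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for name, clf in [("5NN", KNeighborsClassifier(n_neighbors=5, metric="euclidean")),
                  ("LDA", LinearDiscriminantAnalysis())]:
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

With real EEG features the two classifiers behave quite differently, as the tables in this section show; on this toy, well-separated data both reach high accuracy.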
6.5.1.1 Subject-Specific Emotion Recognition
The following table shows the recognition rates using all the electrodes.
Table 6.5: Emotion Recognition rates using ALL 54 electrodes and 5NN
Participants Statistical Narrow-bands Power HOC Wavelet
P1 87.67 82.67 96.33 62.67
P2 88.67 90.00 97.00 91.67
P3 86.22 84.22 93.33 75.11
P4 88.00 93.33 93.33 88.22
P5 83.56 88.22 97.78 76.67
Based on the above results, we performed a sensitivity test on the kNN classifier. We examined how well kNN performs with a varying number of neighbours, K = 1, 3, 5, 7, 9, using HOC features.

As we can see, the recognition rate is rather stable across the number of neighbours; this shows that the three emotion classes are well separated in the feature space and that a linear classifier would be suitable for this classification as well.
[Figure: correct recognition rate (0-100%) for each participant vs. number of neighbours for kNN (1, 3, 5, 7, 9), using HOC features]

Figure 6.8: Recognition Rate using HOC features for K = 1, 3, 5, 7, 9
6.5.1.2 Cross-Subject Emotion Recognition

Table 6.6: Cross-Subject Emotion Recognition rates using ALL 54 electrodes

Classifier Statistical Narrow-bands Power HOC Wavelet

5NN 81.39 82.62 90.77 77.44

LDA 59.18 63.49 79.64 55.90

6.5.2 Simulation Results with Channel Reduction
In this part of the testing, we would like to find out which channels carry high class-related discriminating power.
6.5.2.1 Channel Reduction in Reference to Commercial Devices
One of the objectives of this project is to investigate the feasibility of commercially available EEG recording devices for emotion analysis. The EEG recordings from the eNTERFACE'06 project were collected using the Biosemi ActiveTwo, an EEG cap with 64 channels designed for medical applications. The Emotiv EPOC is a commercially available wireless neuro-headset with 14 EEG channels. As shown in Table 6.7, the Emotiv Software Development Kit (SDK) for research includes a 14-channel (plus CMS/DRL references at the P3/P4 locations), high-resolution system and provides wireless neuro-signal acquisition and processing.
Table 6.7: Device Specifications

Device Biosemi ActiveTwo Emotiv EPOC SDK

Data Format EDF MAT

Resolution 24-bit ADC 16-bit (14 bits effective)

Sampling Rate 1024 Hz 128 SPS (2048 Hz internal)

Channels 64 14

Channels in common AF3, F7, F3, FC5, FC6, F4, F8, AF4
However, because both caps use standardized electrode placement according to the 10-20 system, we were able, by comparing electrode locations, to select a set of EEG channels present in both devices (shown in Figure 6.9) and obtain classification results using features extracted from these common channels. Table 6.8 shows the recognition rates with the EEG channels present in both devices.
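Selecting the common montage then amounts to picking the matching columns of the data matrix. The channel ordering below is hypothetical; only the list of eight common channels comes from Table 6.7:

```python
# Hypothetical ordering of channels in the recording; the real index order
# depends on the cap layout and the recording software.
recorded = ["Fp1", "AF3", "F7", "F3", "FC5", "FC1", "C3",
            "FC6", "F4", "F8", "AF4", "Cz"]
# Channels common to both devices (Table 6.7).
wanted = ["AF3", "F7", "F3", "FC5", "FC6", "F4", "F8", "AF4"]

idx = [recorded.index(ch) for ch in wanted]
# eeg[:, idx] would then slice a (samples x channels) array down to the
# common montage before feature extraction.
print(idx)   # [1, 2, 3, 4, 7, 8, 9, 10]
```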
We carried out the simulation testing using the reduced set of electrodes that are common to our dataset and the commercial headset listed above, with the same simulation parameters for feature extraction and classifier design. The subject-specific emotion detection performance is listed below.

We further tested the classification performance in the cross-subject case.
The correct recognition rates using the kNN classifier shown in Table 6.9 are nearly identical to the results in Table 6.6, with a significantly reduced number of electrodes. These results provide evidence for the feasibility of consumer-grade headsets for real-time emotion recognition in mobile applications. However, the recognition rate using LDA decreases dramatically, which indicates that the samples of the reduced channels are not linearly
[Figure: electrode-layout map highlighting the channels common to both devices]

Figure 6.9: Channels referenced to the Emotiv EPOC
Table 6.8: Subject Specific Recognition rates using 8 electrodes and 5NN
Participants Statistical Narrow-bands Power HOC Wavelet
P1 86.33 84.33 93.67 61.67
P2 86.00 89.33 98.33 89.67
P3 83.33 86.44 95.11 74.89
P4 88.67 90.44 97.33 88.00
P5 83.33 86.44 96.89 78.89
Table 6.9: Cross-subject Recognition rate using only 8 electrodes
Classifier Statistical Narrow-bands Power HOC Wavelet
5NN 68.15 78.15 89.64 58.87
LDA 38.77 39.90 43.13 37.74
separable in the projected feature space.
6.5.2.2 Channel Reduction Using Genetic Algorithm
As mentioned in Section 5.4, we used a genetic algorithm for feature (here, channel) selection. The starting population consisted of 1000 individuals, each containing a randomly generated binary string whose length equals the total number of channels. Mutation and crossover operations were then performed (with selected probabilities); in this way genes, and consequently features, were exchanged. Only the best-adapted individuals passed to the next step of the algorithm. To determine which individuals were best adapted, a fitness function (shown in Section 5.4.1) was used, which trained the classifier, classified the data, and returned the correct classification rate. Linear Discriminant Analysis and k-Nearest Neighbours were used for classification. The fitness function performed a 10-fold cross-validation test and returned the percentage of correct classification as the fitness measure. The algorithm terminates when the maximum number of generations is reached; 200 generations were calculated here. The GA process is shown in Figure 6.10, with the correct classification performance for a single run of the genetic algorithm. The algorithm stabilized after about 100 generations; thus it can be stopped at about 100 generations, shortening its running time. Additionally, the execution time of the GA can be shortened by modifying the selected probabilities of the mutation and crossover operations. However, the time required by the GA for feature selection is fairly long in practice, so the GA is mainly suitable for off-line data analysis.
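A minimal sketch of such a GA for channel selection is given below. The population size, generation count, and especially the toy fitness function are placeholders: in the actual system the fitness is the 10-fold cross-validated accuracy of an LDA or kNN classifier trained on the channels selected by each binary mask.

```python
import numpy as np

rng = np.random.default_rng(42)
N_CH = 54            # channels in the recording
POP, GENS = 60, 40   # small values for illustration (the thesis used 1000 and 200)
P_CROSS, P_MUT = 0.8, 0.02

# Toy fitness: pretend a few channels carry class information, reward selecting
# them, and penalize mask size. The real fitness trains and cross-validates a
# classifier on the masked channels.
informative = rng.random(N_CH) < 0.2
def fitness(mask):
    hits = np.sum(mask & informative)
    return hits - 0.05 * mask.sum()

pop = rng.random((POP, N_CH)) < 0.5          # random binary strings
for _ in range(GENS):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: keep the winner of each random pairing.
    a, b = rng.integers(0, POP, POP), rng.integers(0, POP, POP)
    parents = np.where((scores[a] > scores[b])[:, None], pop[a], pop[b])
    # Single-point crossover on consecutive pairs.
    children = parents.copy()
    for i in range(0, POP - 1, 2):
        if rng.random() < P_CROSS:
            cut = rng.integers(1, N_CH)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
    # Bit-flip mutation.
    flip = rng.random(children.shape) < P_MUT
    pop = children ^ flip

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("channels selected:", np.flatnonzero(best))
```

Each run can select a different mask because of the random crossover and mutation, which is exactly why the thesis repeats the GA ten times and keeps the most frequently selected channels.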
Due to the randomness of the crossover and mutation processes, each run of the genetic algorithm may select a different set of features. It is therefore very important to determine which of these features (and hence which channels) bring important information to the classification process. To address this, the genetic algorithm was launched ten times and the selected features were compared; the features appearing most frequently were selected as the final set. The results can be seen in Figure 6.11.
[Figure: average and maximum fitness (roughly 0.35 to 0.7) vs. generation (0 to 100)]

Figure 6.10: Averaged and Maximum Fitness (correct recognition rate) in each generation using LDA and GA
From this process we obtain information about which regions or electrode locations provide more class-related discriminating information. To apply these findings further, we can compare the locations of the reduced electrode set with those of commercially available devices such as the Emotiv; together with the results obtained in this study, this allows us to conclude whether such a commercial device is feasible for emotion analysis applications.

The following are the channels selected after 10 runs of the genetic algorithm; the channel sets of sizes 6 and 10 are presented here.

Next, we tested the system performance using the channels selected by the GA.
[Figure: frequency of appearance (0-100%) of each channel index (1-54) over ten GA runs, with the 60%, 65%, and 70% thresholds marked]

Figure 6.11: Channels selected through Genetic Algorithm
[Figure: two electrode-layout maps highlighting the GA-selected channels]

(a) 8 Channels Selected using GA

(b) 14 Channels Selected using GA

Figure 6.12: Channels of significance obtained through Genetic Algorithm
Table 6.10: Channels selected using GA algorithm

≥ 60% (18 channels): Fc7, Fc5, Fc3, Cp4, Fz, Cz, P1, Af4, F4, P2, Fc4, Cp4, Cp6, Fc8, Iz, O1, Ft7, Ft8

> 65% (8 channels): Fc3, Fc5, Fc2, Fc4, F4, AF4, CP4, Cp6

> 70% (6 channels): Fc3, Fc5, Fc2, Fc4, F4, AF4
Table 6.11: Emotion Recognition rates using electrodes selected using GA

Number of Channels Classifier Statistical Narrow-bands Power HOC Wavelet

18 5NN 80.97 83.08 90.21 72.10

18 LDA 48.92 51.13 62.05 48.31

10 5NN 79.74 82.82 88.77 69.54

10 LDA 43.03 46.46 56.31 45.18

6 5NN 76.46 81.28 89.79 67.85

6 LDA 40.82 44.87 53.21 41.28
6.6 Sensitivity Testing
The aim of this part of the simulation testing is to evaluate the influence of a few parameters on the final classification performance. The findings here provide some insight into setting the parameters of a specific affect detection system. The parameters were selected based on their relevance to a practical affect detection system, which has constraints on processing power and processing time. Down-sampling is a common practice to reduce the overall computational complexity of the system; however, the consequence of this practice for the final classification performance is not clear and is worth studying. Secondly, EEG recordings are continuous in time, so a segmentation process is inevitable. The choice of window length (epoch) in the time domain, and the edge effects due to this windowing process, could potentially degrade the final classification performance and should be dealt with carefully.

Table 6.12: Recognition rates of emotion using channels selected by GA

Method All 54 channels 8 (referenced to commercial device) GA-18 GA-10 GA-6

5NN 90.77 89.64 89.23 88.77 89.79

LDA 79.64 43.13 62.05 56.31 53.21
6.6.1 The Effect Of Sampling Rate On System Performance
The EEG recordings we tested have a sampling rate of 1024 Hz, which is relatively high compared to the spectral characteristics of EEG signals (most energy concentrates in the frequency range below 50 Hz). We down-sampled the recordings from the original 1024 Hz to 512 Hz and 256 Hz and applied the same set of feature extraction algorithms and classifiers. The final classification performance is shown in Figure 6.14 for kNN and in Figure 6.13 for LDA.

From the figures, we see that the recognition rates degrade significantly for LDA when the sampling rate drops below 512 Hz, while the recognition rates using the kNN classifier stay essentially the same for all four types of features when the original data are down-sampled. The significant degradation of system performance using the time-domain features is caused by the poor
[Figure: correct recognition rate (0-100%) vs. sampling frequency (256, 512, 1024 Hz) for the statistical, narrow-band, HOC, and wavelet-based features]

Figure 6.13: Sampling Rate vs. Correct Recognition Rate using All Electrodes and LDA
[Figure: correct recognition rate (0-100%) vs. sampling rate (256, 512, 1024 Hz) for the statistical, narrow-band, HOC, and wavelet-based features]

Figure 6.14: Sampling Rate vs. Correct Recognition Rate using All Electrodes and kNN
amplitude resolution after down-sampling, which significantly reduces the separability of the classes in the HOC feature space. In [88], a sampling limit for the EMD algorithm was proposed, which suggests that a sampling rate of roughly five times the highest frequency of interest is required for the best performance of the EMD algorithm. If we assume the spectrum of interest extends to 60 Hz, the Nyquist rate would be 120 Hz and the required sampling rate for EMD would be about 300 Hz. When down-sampled to 256 Hz, the reconstructed EEG signal is less accurate, producing less representative features for each class and leading to poorer overall system performance.
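Under this reading (sampling at roughly five times the highest frequency of interest, an interpretation of the limit cited above), the three tested rates can be checked directly:

```python
f_max = 60                 # assumed highest EEG frequency of interest, Hz
nyquist_rate = 2 * f_max   # 120 Hz: bare minimum to avoid aliasing
emd_rate = 5 * f_max       # ~300 Hz: rate suggested for reliable EMD sifting

for fs in (256, 512, 1024):
    verdict = "adequate" if fs >= emd_rate else "below the suggested EMD rate"
    print(f"{fs} Hz: {verdict}")
```

This matches the observed behaviour: only the 256 Hz version falls below the suggested EMD rate, and it is exactly where the time-domain features degrade.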
6.6.2 Parameters For Setting Window Size (epoch)
A significant characteristic of EEG measurement is its high temporal resolution: the EEG sampling frequency (256-1024 Hz) is considerably higher than the native frequencies of brain signals (usually considered to be below 50 Hz). This provides temporal redundancy in the source data, i.e., a high degree of continuity, which should also be reflected in the inverse solution. Unfortunately, explicit temporal regularization is very difficult to implement, mostly because of the computational complexity required (the size of the regularization matrix would be multiplied by the size of the temporal window through a Kronecker product [35]) and because of the absence of explicit, quantifiable priors on temporal brain activity. On the other hand, one can still impose implicit temporal regularization that benefits from the aforementioned redundancy. Indeed, by performing the inversion simultaneously on several time frames, the focalization procedure acts as an implicit continuity constraint, the same volumes being identically selected and reweighted for reconstruction in successive time frames. The maximum length Tmax of the time window can be set in accordance with the Nyquist frequency:
Tmax = floor(fs / (2 * fup))    (6.1)
where fs is the EEG sampling frequency and fup is the upper limit of the EEG frequency band. The reconstruction can then be performed on sliding windows of length T ≤ Tmax.
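Equation 6.1 can be applied directly; for example, with the original 1024 Hz recordings and an assumed 50 Hz upper band limit:

```python
import math

def max_window_len(fs, f_up):
    """Maximum window length from Eq. 6.1: Tmax = floor(fs / (2 * f_up))."""
    return math.floor(fs / (2 * f_up))

print(max_window_len(1024, 50))   # 10
```

Down-sampling shrinks the admissible window accordingly: at 256 Hz with the same 50 Hz band limit, Tmax drops to 2.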
6.6.3 Parameters for Wavelet Feature Evaluation
In general, most wavelets in common use are not appropriate for filtering nonstationary signals [62]. If an inappropriate mother wavelet is used in wavelet-based multiresolution analysis, the phenomenon of energy leakage in the reconstructed power spectrum of the filtered bands becomes more apparent. Thus, the choice of an appropriate mother wavelet basis is the most important first step in wavelet-based multiresolution analysis. To determine whether the mother wavelet used is appropriate for wavelet-based feature analysis, we carried out tests on the following set of wavelets. These wavelet functions were chosen for their near-optimal time-frequency localization properties; moreover, their waveforms are similar to the waveforms to be detected in the EEG signal, so the extraction of EEG signal features is more likely to be successful [67]. The correct recognition rates are listed in Table 6.13. From these results we see that the classification performance is very similar among these wavelet functions, which can therefore be used interchangeably.
Table 6.13: Cross-subject emotion recognition rates using different wavelets for DWT
Wavelets
Classifier db4 db8 sym4 sym8 coif5 bior1.3
kNN 76.67 76.15 76.00 76.21 76.56 75.64
LDA 56.62 54.41 57.49 57.18 56.77 56.26
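A sketch of such a comparison using PyWavelets (assumed available; the test signal here is synthetic noise rather than a real EEG epoch):

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

rng = np.random.default_rng(0)
signal = rng.normal(size=1024)   # stand-in for one EEG epoch

# Decompose with each candidate mother wavelet and compare sub-band energies.
for name in ("db4", "db8", "sym4", "sym8", "coif5", "bior1.3"):
    coeffs = pywt.wavedec(signal, name, level=5)   # [A5, D5, D4, D3, D2, D1]
    energies = [float(np.sum(c ** 2)) for c in coeffs]
    print(name, [round(e, 1) for e in energies])
```

Feeding the per-band energies (or statistics of the detail coefficients) into the same classifiers reproduces the kind of comparison summarized in Table 6.13.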
6.7 Summary
In this chapter, three research objectives were tested through a set of classification simulations on an emotion-specific dataset. First, the feasibility of MEMD as a filter bank for extracting signal components of interest was tested. Four types of features were extracted from the reconstructed EEG signals and used for the classification of three affect states: positively excited, negatively excited and neutral. Since EEG signals vary greatly between subjects and often do not generalize well for a designed model, two types of classification performance were tested; the subject-specific and cross-subject recognition rates were presented for each simulation. Subject-specific classification refers to the case where the training samples for developing the statistical model and the testing samples come from the same subject, whereas the training samples for the cross-subject simulation come from different subjects, and the testing samples are mostly from different subjects as well. The maximum correct classification rate for the subject-specific case was 97.78% using HOC features and the kNN classifier, with an average of 95.56%. The correct recognition rate for the cross-subject case was 90.77% using HOC features and the kNN classifier, and 79.64% using HOC features and the LDA classifier. Although the dataset is relatively small, these results are sufficient to show that EEG signals are feasible for affect detection analysis.
Secondly, the subject-specific and cross-subject classification performance on reduced channel sets was examined. We selected the reduced channels in two ways.
• The first is to use the common channels that are also present in a commercially available, non-medical-grade EEG headset. The classification performance from this reduced channel setting provides insight into the feasibility of consumer-grade EEG headsets for affect detection applications. The correct recognition rate for the subject-specific case reached a maximum of 98.33%, with an average of 96.27%, using HOC features and the kNN classifier. The correct classification rate for the cross-subject analysis was 89.64% using HOC features with the kNN classifier and 43.13% with the LDA classifier. From these results, we see that HOC features are better at representing affect-specific EEG signals and that it is feasible to use consumer-grade EEG headsets for affect detection analysis. LDA classification performance degrades significantly with the reduction of channels.
• The second is to use channels selected through the application of a Genetic Algorithm. The classification performance using these reduced channels provides an upper bound for a pre-defined number of electrodes, given the current feature extraction and classification settings. With 10 channels, a cross-subject correct recognition rate of 88.77% was achieved using HOC features with the kNN classifier, and 56.31% with the LDA classifier. With 6 channels, a cross-subject recognition rate of 89.79% was achieved using HOC features with the kNN classifier, and 53.21% with the LDA classifier. From these results, we see that the major improvement from using the Genetic Algorithm for channel selection is the increased recognition rate using LDA. The LDA classifier is desirable because it is simple and requires less computation per test sample than the instance-based kNN classifier.
Thirdly, a set of parameters important for practical real-time signal processing applications, which are typically constrained in computational complexity (e.g., smart devices), were evaluated, and their impact on the overall classification performance was tested.
Chapter 7
Conclusions and Future Works
7.1 Conclusions
Affective computing and emotional intelligence have been two very important areas of study in the field of Human-Machine Interaction. Affect detection is the first step towards emotion-aware applications and a critical step in the success of an emotion recognition system. EEG signals are believed to be feasible for affect detection applications due to their direct association with the human brain and the short response time required to detect a shift in affect states. Compared to other affect detection methods such as the Body Sensor Network (BSN), which requires a variety of sensors to be attached to a person and can be prohibitive to normal daily activities, the commercially available, non-medical-grade EEG headsets are an outstanding alternative and can potentially provide better classification performance.
This thesis has focused on the feasibility of utilizing brainwaves for affect detection applications. It has contributed to the field of affective computing both in the understanding of the association between affective states and central nervous system signals (EEG) and in feasible ways of utilizing them from a system design point of view. Due to the increasing use of non-medical-grade EEG headsets for commercial applications, this study also set out to determine the performance of affect detection applications with a significantly reduced number of electrodes. Through the use of a novel time-spectral analysis algorithm (MEMD), the thesis has shown that the EEG signal is suitable for affect detection. It also provided quantitative evidence for the affect-related statements from the human psychology literature regarding the frontal asymmetry of activity between the hemispheres in expressing different affects.
7.1.1 Key Contributions
This study has contributed to the field of affect detection in three ways. The first contribution is a framework that utilizes EEG signals and MEMD for emotion detection applications. It has also furthered our understanding of affect (the physiological expression of emotion) through the use of multi-channel EEG signals. Affect-specific EEG channels and frequency ranges of interest were thoroughly studied and presented. This information will be beneficial in the design of reduced-channel EEG headsets for portable devices such as smartphones.
• This work provided a framework using EEG for affect detection. Through the study of the characteristics of the EEG signal in the time and spectral domains, a series of preprocessing and feature extraction methods were researched, analyzed and implemented. The oscillatory pattern in the time domain and the energy variation in the spectral domain were key to these methods and were shown to be efficient in detecting affect variations.
• This thesis also demonstrated the application of a novel time-spectral analysis algorithm, MEMD, for multi-channel EEG signal analysis. MEMD is a recently developed signal processing method that is completely data driven and preserves the non-linear and non-stationary characteristics of the EEG signal. MEMD can be applied to a wide range of biomedical signals and has shown promising results. The Hilbert-Huang transform, the combined application of the MEMD or EMD algorithm with the Hilbert transform, can provide key information, such as the instantaneous phase and amplitude, for understanding EEG signals.
• A Genetic Algorithm was applied to select the affect-specific EEG channels and to extract the frequency range of interest for better detection performance. This global optimization approach allowed us to systematically select the desired number of electrodes based on their significance in affect detection.
7.2 Future Works
While this thesis has shown the effectiveness of using EEG for affect detection, a few questions remain to be addressed in future work.
7.2.1 Directions for Future Study On Utilizing EEG Signals For
Affect Detection Applications
In future research, the approach presented in this thesis can be further improved in the following ways:
• Incorporate automatic labeling of samples from another modality during the training stage, which would shift this machine learning problem from supervised towards unsupervised learning. In many practical cases, direct feedback from the user for labeling the training samples is not possible (e.g., for neural disorders such as autism) or not reliable, as some people are not aware of, or not good at quantizing, their own feelings. Also, as stated in Section 2.4, affective expressions in EEG vary significantly between subjects; to develop a robust machine learning system, a large number of training samples is required. This calls for an alternative, and perhaps more reliable and consistent, way of labeling a large number of samples.
• An optimal fusion algorithm for multimodal systems, e.g., combining EEG signals and facial images, is not present in the literature. Various fusion techniques, at both the feature level and the decision level, should be investigated.
• The novel time-spectral signal analysis approach, MEMD, is effective in separating multi-component, non-linear and non-stationary signals. However, the current implementation is very slow, and computational improvements are needed for near-real-time analysis. In particular, parallel calculation of the different projection directions should be studied to shorten processing time, especially if real-time processing is required or a large number of EEG channels is utilized.
Appendix A
Neighbouring Electrodes for Local
Laplacian Filter
Table A.1: Associated neighbour electrodes for Local Laplacian filters [5]
Electrode Neighbours electrodes Electrode Neighbours electrodes
Fp1 F7, F5, AF7, AFz, Fpz Fpz Fp1, AFz, Fp2
AF7 Fp1, F5, F3, AF3, AFz Fp2 Fpz, AFz, AF8, F6, F8
AF3 AFz, AF7, F3, F1, Fz, AF4 AF8 Fp2, AFz, AF4, F4, F6
F1 F3, FC3, FC1, Fz, AF3 AF4 AF8, AFz, AF3, Fz, F2, F4
F3 F5, FC5, FC3, F1, AF3, AF7 AFz Fpz, Fp1, AF7, AF3, AF4, AF8, Fp2
F5 F7, FT7, FC5, F3, AF7, Fp1 Fz AF3, F1, FC1, FCz, FC2, F2, AF4
F7 FT7, F5, Fp1 F2 AF4, Fz, FC2, FC4, F4
FT7 T7, FC5, F5, F7 F4 F6, AF8, AF4, F2, FC4, FC6
FC5 FT7, T7, C5, FC3, F3, F5 F6 F8, Fp2, AF8, F4, FC6, FT8
FC3 C5, C3, C1, FC1, F1, F3, FC5 F8 Fp2, F6, FT8
FC1 F1, FC3, C1, FCz, Fz FT8 F8, F6, FC6, T8
C1 FCz, FC1, FC3, C3, CP1, Cz FC6 FT8, F6, F4, FC4, C6, T8
C3 C1, FC3, C5, CP3, CP1 FC4 FC6 F4, F2, FC2, C2, C4, C6
C5 CP5, CP3, C3, FC3, FC5, T7 FC2 F2, Fz, FCz, C2, FC4
T7 TP7, CP5, C5, FC5, FT7 FCz Fz, FC1, C1, Cz, C2, FC2
TP7 P9, P7, CP5, T7 Cz FCz, C1, CP1, CPz, CP2, C2
CP5 TP7, P7, P5, CP3, C5, T7 C2 FC4, FC2, FCz, Cz, CP2, C4
CP3 CP5, P5, P3, CP1, C3, C5 C4 C6, FC4, C2, CP2, CP4
CP1 Cz, C1, C3, CP3, P3, Pz, CPz C6 T8, FC6, FC4, C4, CP4, CP6
P1 POz, P2, Pz, P3, P5, PO3 T8 FT8, FC6, C6, CP6, TP8
P3 P5, P1, Pz, CP1, CP3 TP8 T8, CP6, P8, P10
P5 PO3, P1, P3, CP3, CP5, P7 CP6 TP8, T8, C6, CP4, P6, P8
P7 PO7, PO3, P5, CP5, TP7, P9 CP4 CP6, C6, C4, CP2, P4, P6
P9 O1, PO7, P7, TP7 CP2 C4, C2, Cz, CPz, Pz, P4, CP4
PO7 Iz, Oz, PO3, P7, P9, O1 P2 PO4, P6, P4, Pz, P1, POz
PO3 Oz, POz, P1, P5, P7, PO7 P4 P6, CP4, CP2, Pz, P2
O1 Iz, PO7, P9 P6 P8, CP6, CP4, P4, P2, PO4
Iz O2, PO8, Oz, PO7, O1 P8 TP8, CP6, P6, PO4, PO8, P10
Oz Iz, PO8, PO4, POz, PO3, PO7 P10 TP8, P8, PO8, O2
POz Oz, PO4, P2, P1, PO3 PO8 O2, P10, P8, PO4, Oz, Iz
Pz P2, P4, CP2, CPz, CP1, P3, P1 PO4 P8, P6, P2, POz, Oz, PO8
CPz Cz, CP1, Pz, CP2 O2 P10, PO8, Iz
Appendix B
List of IAPS Images Used for the
Experiment
Table B.1: List of IAPS Images Used for Session 1
Trials Images per trial
1 7090.jpg 7006.jpg 7030.jpg 5530.jpg 9700.jpg
2 1710.jpg 7270.jpg 8210.jpg 8350.jpg 2160.jpg
3 7053.jpg 2580.jpg 2221.jpg 7010.jpg 7235.jpg
4 3191.jpg 6243.jpg 9252.jpg 6550.jpg 3063.jpg
5 8501.jpg 5629.jpg 5260.jpg 8300.jpg 7220.jpg
6 9800.jpg 6571.jpg 9571.jpg 6250.jpg 3530.jpg
7 7052.jpg 6150.jpg 7710.jpg 7179.jpg 5500.jpg
8 8400.jpg 8170.jpg 8185.jpg 4641.jpg 8503.jpg
9 3016.jpg 6830.jpg 2095.jpg 3301.jpg 3101.jpg
10 9401.jpg 7025.jpg 7050.jpg 9360.jpg 2880.jpg
11 6250.1.jpg 3080.jpg 7380.jpg 3064.jpg 3051.jpg
12 2346.jpg 7508.jpg 5833.jpg 8531.jpg 4601.jpg
13 2209.jpg 8116.jpg 8470.jpg 1811.jpg 4624.jpg
14 7547.jpg 6570.2.jpg 7140.jpg 7035.jpg 7100.jpg
15 9421.jpg 9810.jpg 2730.jpg 3230.jpg 9428.jpg
16 7057.jpg 9210.jpg 7037.jpg 7031.jpg 5471.jpg
17 8500.jpg 5623.jpg 7400.jpg 5480.jpg 2058.jpg
18 7236.jpg 7180.jpg 7192.jpg 7130.jpg 7058.jpg
19 7020.jpg 7500.jpg 7207.jpg 7224.jpg 7590.jpg
20 2216.jpg 7260.jpg 7570.jpg 7501.jpg 8186.jpg
21 5533.jpg 7183.jpg 5740.jpg 7110.jpg 7285.jpg
22 3140.jpg 2751.jpg 9301.jpg 9910.jpg 9423.jpg
23 8080.jpg 5600.jpg 5910.jpg 8420.jpg 8499.jpg
24 2811.jpg 9300.jpg 3010.jpg 2900.jpg 3215.jpg
25 6415.jpg 9903.jpg 2710.jpg 8230.jpg 2352.2.jpg
26 7283.jpg 2745.1.jpg 7002.jpg 2980.jpg 7950.jpg
27 8190.jpg 5450.jpg 8371.jpg 4626.jpg 5700.jpg
28 4610.jpg 5460.jpg 8034.jpg 2352.1.jpg 8370.jpg
29 3110.jpg 9420.jpg 9405.jpg 3071.jpg 6313.jpg
30 6212.jpg 2799.jpg 3120.jpg 9520.jpg 3053.jpg
Table B.2: List of IAPS Images Used for Session 2
Trials Images per trial
1 6570.jpg 3266.jpg 6540.jpg 6825.jpg 7359.jpg
2 6570.1.jpg 9425.jpg 9140.jpg 6230.jpg 3400.jpg
3 1616.jpg 7058.jpg 7002.jpg 7004.jpg 5510.jpg
4 4599.jpg 8540.jpg 2346.jpg 5700.jpg 5623.jpg
5 8185.jpg 7502.jpg 4601.jpg 8503.jpg 8180.jpg
6 8311.jpg 7056.jpg 7020.jpg 7140.jpg 7025.jpg
7 9250.jpg 9429.jpg 9530.jpg 9253.jpg 2688.jpg
8 8485.jpg 9180.jpg 3102.jpg 3068.jpg 3100.jpg
9 8420.jpg 8502.jpg 8501.jpg 5450.jpg 5480.jpg
10 8380.jpg 8200.jpg 4640.jpg 2216.jpg 8030.jpg
11 6312.jpg 2981.jpg 6834.jpg 2800.jpg 9419.jpg
12 8496.jpg 8370.jpg 8340.jpg 8034.jpg 4623.jpg
13 8090.jpg 5910.jpg 1710.jpg 7260.jpg 5470.jpg
14 7055.jpg 7010.jpg 7040.jpg 7060.jpg 7057.jpg
15 7710.jpg 2445.jpg 2446.jpg 7190.jpg 7034.jpg
16 8400.jpg 2058.jpg 7330.jpg 1811.jpg 5660.jpg
17 7179.jpg 7036.jpg 5390.jpg 7052.jpg 7044.jpg
18 7700.jpg 1670.jpg 7009.jpg 7207.jpg 7175.jpg
19 9925.jpg 3350.jpg 3160.jpg 9500.jpg 9400.jpg
20 9570.jpg 6350.jpg 9635.1.jpg 3000.jpg 4664.2.jpg
21 7950.jpg 7550.jpg 2840.jpg 2749.jpg 7043.jpg
22 9560.jpg 3500.jpg 9340.jpg 9901.jpg 6200.jpg
23 7508.jpg 2208.jpg 8116.jpg 4641.jpg 5833.jpg
24 7160.jpg 7053.jpg 5531.jpg 7039.jpg 7217.jpg
25 7400.jpg 8499.jpg 8210.jpg 2345.jpg 8190.jpg
26 7242.jpg 2518.jpg 5120.jpg 7192.jpg 7595.jpg
27 6210.jpg 6242.jpg 9410.jpg 3180.jpg 3550.jpg
28 3062.jpg 3005.1.jpg 3017.jpg 3225.jpg 9920.jpg
29 4624.jpg 7230.jpg 8350.jpg 5270.jpg 5621.jpg
30 7080.jpg 7170.jpg 7006.jpg 7090.jpg 7000.jpg
Table B.3: List of IAPS Images Used for Session 3
Trials Images per trial
1 2058.jpg 8090.jpg 8300.jpg 7570.jpg 4610.jpg
2 3150.jpg 3220.jpg 2053.jpg 6370.jpg 6360.jpg
3 5740.jpg 7050.jpg 7150.jpg 7180.jpg 7546.jpg
4 2208.jpg 5270.jpg 5470.jpg 8496.jpg 8371.jpg
5 8470.jpg 8080.jpg 5600.jpg 7330.jpg 8190.jpg
6 6315.jpg 6821.jpg 3130.jpg 6260.jpg 6021.jpg
7 7035.jpg 7161.jpg 7234.jpg 7495.jpg 2206.jpg
8 3015.jpg 3181.jpg 3061.jpg 2683.jpg 3060.jpg
9 7547.jpg 5471.jpg 7100.jpg 8465.jpg 5130.jpg
10 7096.jpg 9360.jpg 7705.jpg 7590.jpg 9210.jpg
11 3030.jpg 9900.jpg 9254.jpg 9902.jpg 9630.jpg
12 9424.jpg 9611.jpg 2717.jpg 9007.jpg 6022.jpg
13 7059.jpg 7038.jpg 5731.jpg 2221.jpg 7233.jpg
14 1710.jpg 8170.jpg 2160.jpg 2352.1.jpg 2345.jpg
15 7270.jpg 8540.jpg 7501.jpg 8420.jpg 7502.jpg
16 2880.jpg 2890.jpg 7235.jpg 7285.jpg 6150.jpg
17 9430.jpg 2703.jpg 9006.jpg 3550.1.jpg 6510.jpg
18 8340.jpg 5621.jpg 8370.jpg 8186.jpg 5460.jpg
19 3069.jpg 9433.jpg 6213.jpg 6300.jpg 9050.jpg
20 7205.jpg 6570.2.jpg 7041.jpg 5520.jpg 7031.jpg
21 5260.jpg 8200.jpg 8030.jpg 8501.jpg 5629.jpg
22 2580.jpg 5534.jpg 2980.jpg 7110.jpg 7130.jpg
23 7283.jpg 7224.jpg 7491.jpg 9700.jpg 7500.jpg
24 6530.jpg 9181.jpg 9427.jpg 9921.jpg 3000.jpg
25 8380.jpg 8180.jpg 4599.jpg 5910.jpg 2209.jpg
26 2745.1.jpg 5532.jpg 9401.jpg 7236.jpg 7490.jpg
27 4640.jpg 4626.jpg 8531.jpg 7220.jpg 5833.jpg
28 4623.jpg 8500.jpg 8502.jpg 7230.jpg 5660.jpg
29 9911.jpg 3261.jpg 9620.jpg 9600.jpg 6838.jpg
30 3168.jpg 6560.jpg 9040.jpg 6831.jpg 3010.jpg
Appendix C
Confusion Matrix Ground Truth
Validation of the Database
This appendix presents the self-assessment values collected from each participant,
together with confusion matrices validating the ground truth of the database. A more
detailed analysis of the emotion elicitation results is given below, with a confusion
matrix for each participant:
Table C.1: Participant 1: Self-Assessment Classification Accuracy (in Percentage) for
the Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 94.74 5.26 0
Pos. Excited 56.25 43.75 0
Neg. Excited 5.26 0 94.74
Table C.2: Participant 2: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 89.66 10.34 0
Pos. Excited 90.48 9.52 0
Neg. Excited 57.14 0 42.86
Table C.3: Participant 3: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 62.07 27.59 10.34
Pos. Excited 4.76 80.95 14.29
Neg. Excited 3.57 3.57 92.86
Table C.4: Participant 4: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 100 0 0
Pos. Excited 33.33 66.67 0
Neg. Excited 21.43 0 78.57
Table C.5: Participant 5: Self-Assessment Classification Accuracy (in Percentage) of the
Three Chosen Emotions
Emotions Calm Pos. Excited Neg. Excited
Calm 20.69 75.86 3.45
Pos. Excited 0 100 0
Neg. Excited 7.14 0 92.86
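Each matrix above is row-normalized to percentages: the row gives the emotion a trial was intended to elicit, and the column gives the participant's self-assessed label. A minimal sketch of how such a matrix can be computed from per-trial labels (the function name and input lists are illustrative, not taken from the thesis):

```python
import numpy as np

# The three affect classes used in the self-assessment tables.
LABELS = ["Calm", "Pos. Excited", "Neg. Excited"]

def confusion_percent(intended, reported, labels=LABELS):
    """Row-normalized confusion matrix in percent.

    `intended` holds the emotion each trial was designed to elicit and
    `reported` holds the participant's self-assessed label for that trial.
    Rows index the intended emotion and columns the reported one,
    matching the layout of Tables C.1-C.5.
    """
    idx = {lab: i for i, lab in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(intended, reported):
        counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against empty rows to avoid division by zero.
    return 100.0 * counts / np.maximum(row_sums, 1)
```

A diagonal-heavy matrix (e.g. Participant 1) indicates that the intended emotions were successfully elicited, while off-diagonal mass (e.g. Participant 2's positively excited row) flags trials whose ground-truth labels are unreliable.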
Appendix D
Empirical Mode Decomposition
(EMD) Algorithm
EMD is an adaptive signal decomposition method with which any complicated signal
can be decomposed into a series of Intrinsic Mode Functions (IMFs). Two main criteria
must be met during this decomposition procedure, as stated in the original paper [29]:

1. For each extracted signal mode (IMF), the number of extrema and the number of
zero crossings must differ by at most one; in other words, each IMF should contain
no riding waves.

2. The mean value of the envelopes defined by the local maxima and local minima
should be zero at every point; this indicates that the maxima and minima are
located symmetrically about the local mean (zero).
Algorithm 1. The standard EMD algorithm

1. Find the locations of all the extrema of x′(k).

2. Interpolate (using cubic spline interpolation) between all the minima (respectively
maxima) to obtain the lower signal envelope emin(k) (respectively the upper envelope
emax(k)).

3. Compute the local mean m(k) = [emin(k) + emax(k)]/2.

4. Subtract the local mean from the signal to obtain the 'oscillatory mode' s(k) =
x′(k) − m(k).

5. If s(k) obeys the stopping criteria, define d(k) = s(k) as an IMF; otherwise set
x′(k) = s(k) and repeat from step 1.

The sifting process stops when the residue r(k) becomes a constant, a monotonic
function, or a function with only a single extremum, from which no further IMF can be
extracted. If the data has a trend, the final residue is that trend.
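The procedure above can be sketched in a few lines of Python. This is a minimal illustrative implementation, not the code used in the thesis: the standard-deviation stopping test (threshold 0.2) and the extrema-count guards are common choices standing in for the stopping criteria left unspecified in step 5.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift(x, sd_threshold=0.2, max_siftings=50):
    """Extract one IMF from x by sifting (steps 1-5 of Algorithm 1)."""
    s = np.asarray(x, dtype=float).copy()
    for _ in range(max_siftings):
        # Step 1: locate the interior extrema of the current candidate.
        d = np.diff(s)
        maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
        minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
        if len(maxima) < 3 or len(minima) < 3:
            break  # too few extrema to build meaningful envelopes
        # Step 2: cubic-spline envelopes through the maxima and minima.
        k = np.arange(len(s))
        e_max = CubicSpline(maxima, s[maxima])(k)
        e_min = CubicSpline(minima, s[minima])(k)
        # Step 3: local mean of the two envelopes.
        m = (e_max + e_min) / 2.0
        # Step 4: subtract the local mean to get the oscillatory mode.
        s_new = s - m
        # Step 5: a common standard-deviation stopping criterion.
        sd = np.sum((s - s_new) ** 2) / (np.sum(s ** 2) + 1e-12)
        s = s_new
        if sd < sd_threshold:
            break
    return s

def emd(x, max_imfs=8):
    """Decompose x into a list of IMFs plus a final residue (the trend)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        d = np.diff(residue)
        n_extrema = (np.sum((d[:-1] > 0) & (d[1:] < 0))
                     + np.sum((d[:-1] < 0) & (d[1:] > 0)))
        if n_extrema < 3:
            break  # residue is (near-)monotonic: no further IMFs
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

By construction the IMFs and the residue sum back to the original signal, so the decomposition is lossless regardless of the stopping rule chosen.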
Bibliography
[1] http://www.emotiv.com/apps/epoc/.
[2] http://enterface.tel.fer.hr/docs/database_files/eNTERFACE06_EMOBRAIN.html.
[3] Ralph Adolphs, Daniel Tranel, and Antonio R Damasio. Dissociable neural systems
for recognizing emotions. Brain and Cognition, 52(1):61–69, 2003.
[4] Geoffrey L. Ahern and Gary E. Schwartz.
[5] T. I. Alecu. Robust Focalized Brain Activity Reconstruction using ElectroEncephalo-
Grams. PhD thesis, University of Geneva, 2005.
[6] Nelson Torro Alves, Sérgio Sheiji Fukusima, and Antonio Aznar-Casanova. Models of
brain asymmetry in emotional processing. Psychology and Neuroscience, 1(1):63–66,
2008.
[7] Omar Alzoubi, Rafael A Calvo, and Ronald H Stevens. Classification of EEG for
affect recognition: An adaptive approach. Emotion, pages 52–61, 2009.
[8] Marian S. Bartlett, Gwen Littlewort, Ian Fasel, and Javier R. Movellan. Real time
face detection and facial expression recognition: Development and applications to
human computer interaction. In CVPRW ’03: Proceedings of the Conference on
Computer Vision and Pattern Recognition, Workshop, 2003, volume 5, page 53,
2003.
[9] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman. Eigenfaces vs. fisherfaces:
Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and
Machine Intelligence, pages 711–720, 1997.
[10] Yvonne F Birks and Ian S Watt. Emotional intelligence and patient-centred care.
JRSM, 100(8):368–374, 2007.
[11] B. Boashash. Estimating and interpreting the instantaneous frequency of a signal.
Part II: Algorithms and applications. Proceedings of the IEEE, 80(4):540–568, April 1992.
[12] D.O. Bos. EEG-based emotion recognition: The influence of visual and auditory
stimuli, 2006.
[13] Nabila Bouzida, Laurent Peyrodie, and Christian Vasseur. ICA and a gauge of filter
for the automatic filtering of an EEG signal. IEEE International Joint Conference on
Neural Networks (IJCNN '05), Montreal, Canada, 4:2508–2513, August 2005.
[14] Rafael A. Calvo and Sidney D’Mello. Affect detection: An interdisciplinary review
of models, methods, and their applications. IEEE Transactions on Affective Com-
puting, 1(1):18–37, January 2010.
[15] Israel C Christie and Bruce H Friedman. Autonomic specificity of discrete emotion
and dimensions of affective space: a multivariate approach. International Journal
of Psychophysiology, 51(2):143 – 153, 2004.
[16] James A Coan and John J.B Allen. Frontal EEG asymmetry as a moderator and
mediator of emotion. Biological Psychology, 67(1-2):7–50, 2004.
[17] T. Cover and P. Hart. Nearest neighbor pattern classification. Information Theory,
IEEE Transactions on, 13(1):21–27, January 1967.
[18] Roddy Cowie and Randolph R. Cornelius. Describing the emotional states that are
expressed in speech. Speech Communication, 40(1-2):5–32, 2003.
[19] J. D. Mayer and P. Salovey. What is emotional intelligence? In P. Salovey & D.
Sluyter (Eds.), Emotional development and emotional intelligence: Implications for
educators, pages 3–31, 1997.
[20] R J Davidson, D C Jackson, and N H Kalin. Emotion, plasticity, context, and regu-
lation: perspectives from affective neuroscience. Psychological Bulletin, 126(6):890–
909, 2000.
[21] RJ Davidson and NA Fox. Asymmetrical brain activity discriminates between pos-
itive and negative affective stimuli in human infants. Science, 218(4578):1235–1237,
1982.
[22] Samantha Dockray and Andrew Steptoe. Positive affect and psychobiological pro-
cesses. Neuroscience & Biobehavioral Reviews, 35(1):69 – 75, 2010.
[23] David L Donoho, Iain Johnstone, Bob Stine, and Gregory Piatetsky-shapiro. High-
dimensional data analysis: The curses and blessings of dimensionality. Statistics,
pages 1–33, 2000.
[24] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd
Edition). Wiley-Interscience, 2 edition, November 2001.
[25] P Ekman. Universals and cultural differences in facial expressions of emotion. Ne-
braska Symposium On Motivation, 19(4):207–283, 1971.
[26] Paul Ekman, Wallace V. Friesen, and Phoebe Ellsworth. Emotion in the Human
Face. Oxford University Press, 1972.
[27] Paul Ekman, Wallace V Friesen, and Joseph C Hager. Facial Action Coding System,
volume 97. A Human Face, 2002.
[28] Raul Fernandez and Rosalind Picard. Recognizing affect from speech prosody us-
ing hierarchical graphical models. Speech Commun., 53(9-10):1088–1103, November
2011.
[29] P. Flandrin, G. Rilling, and P. Goncalves. Empirical mode decomposition as a filter
bank. Signal Processing Letters, IEEE, 11(2):112–114, February 2004.
[30] Dennis Gabor. Theory of communication. J. Inst. Elect. Eng., 93:429–457, 1946.
[31] D E Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, 1989.
[32] Andreas Haag, Silke Goronzy, Peter Schaich, and Jason Williams. Emotion recogni-
tion using bio-sensors: First steps towards an automatic system. Affective Dialogue
Systems, i(6):36–48, 2004.
[33] Thomas Holtgraves and Adam Felton. Hemispheric asymmetry in the processing
of negative and positive words: A divided field study. Cognition and Emotion,
25(4):691–699, 2011.
[34] Robert Horlings, Dragos Datcu, and Leon J. M. Rothkrantz. Emotion recognition
using brain activity. In Proceedings of the 9th International Conference on Com-
puter Systems and Technologies and Workshop for PhD Students in Computing,
CompSysTech ’08, pages 6:II.1–6:1, New York, NY, USA, 2008. ACM.
[35] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge
University Press, June 1994.
[36] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C.
Tung, and H. H. Liu. The empirical mode decomposition and the hilbert spectrum for
nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of
London. Series A: Mathematical, Physical and Engineering Sciences, 454(1971):903–
995, March 1998.
[37] Norden E. Huang. An adaptive data analysis method for nonlinear and nonstationary
time series: The empirical mode decomposition and hilbert spectral analysis. In Tao
Qian, Mang I Vai, and Yuesheng Xu, editors, Wavelet Analysis and Applications,
Applied and Numerical Harmonic Analysis, pages 363–376. Birkhäuser Basel, 2007.
[38] Norden E. Huang, Zhaohua Wu, Steven R. Long, Kenneth C. Arnold, Xianyao Chen,
and Karin Blank. On instantaneous frequency. Advances in Adaptive Data Analysis, 1(2):177–229, 2009.
[39] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and
applications. Neural Networks, 13(4-5):411–430, 2000.
[40] Anil K. Jain, Robert P. W. Duin, and Jianchang Mao. Statistical pattern recogni-
tion: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence,
22(1):4–37, 2000.
[41] William James. What is an emotion? Mind, 9(34):188–205, Apr. 1884.
[42] Valer Jurcak, Daisuke Tsuzuki, and Ippeita Dan. 10/20, 10/10, and 10/5 systems
revisited: their validity as relative head-surface-based positioning systems. Neu-
roImage, 34(4):1600–1611, February 2007.
[43] Ashish Kapoor, Hyungil Ahn, and Rosalind W. Picard. Mixture of gaussian pro-
cesses for combining multiple modalities. In Proceedings of the 6th international
conference on Multiple Classifier Systems, MCS’05, pages 86–96, Berlin, Heidelberg,
2005. Springer-Verlag.
[44] Ashish Kapoor, Selene Mota, and Rosalind W. Picard. Towards a learning companion
that recognizes affect. In AAAI Fall Symposium, 2001.
[45] Ashish Kapoor, Rosalind W. Picard, and Yuri Ivanov. Probabilistic combination
of multiple modalities to detect interest. In Interest, International Conference on
Pattern Recognition, pages 969–972, 2004.
[46] C. D. Katsis, N. Katertsidis, G. Ganiatsas, and D. I. Fotiadis. Toward Emotion
Recognition in Car-Racing Drivers: A Biosignal Processing Approach. IEEE Trans-
actions on Systems Man and Cybernetics Part A Systems and Humans, 38(3), May
2008.
[47] Benjamin Kedem. Time Series Analysis by Higher Order Crossings. IEEE Press,
April 1994.
[48] Jonghwa Kim and E. André. Emotion recognition based on physiological changes in
music listening. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
30(12):2067 –2083, dec. 2008.
[49] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. On combin-
ing classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence,
20:226–239, 1998.
[50] W. Klimesch. EEG alpha and theta oscillations reflect cognitive and memory
performance: a review and analysis. Brain Research Reviews, 29(2-3):169–195, April
1999.
[51] R Kohavi and F Provost. Glossary of terms. Machine Learning, 30(2):271–274, 1998.
[52] Barry Kort, Rob Reilly, and Rosalind W. Picard. An affective model of inter-
play between emotions and learning: Reengineering educational pedagogy-building
a learning companion, pages 43–48. IEEE Computer Society, 2001.
[53] I. Kotsia and I. Pitas. Facial expression recognition in image sequences using
geometric deformation features and support vector machines. Image Processing, IEEE
Transactions on, 16(1):172–187, January 2007.
[54] P Senthil Kumar, R Arumuganathan, K Sivakumar, and C Vimal. A wavelet based
statistical method for de-noising of ocular artifacts in EEG signals. IJCSNS Inter-
national Journal of Computer Science and Network Security, 8(9):87–92, 2008.
[55] Ludmila I. Kuncheva. A theoretical study on six classifier fusion strategies. IEEE
Trans. Pattern Anal. Mach. Intell., 24(2):281–286, February 2002.
[56] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International affective picture sys-
tem (IAPS): Affective ratings of pictures and instruction manual. Technical report,
University of Florida, Gainesville, FL, 2008.
[57] Marc Langheinrich. Privacy by design - principles of privacy-aware ubiquitous sys-
tems. In Proceedings of UbiComp (Ubiquitous Computing), pages 273–291, sep.
[58] Joseph E. LeDoux. The emotional brain: The mysterious underpinnings of emotional
life. Simon & Schuster, March 1998.
[59] Joseph E. LeDoux. Emotion circuits in the brain. Annual Review of Neuroscience,
23(1):155–184, 2000.
[60] Enrique Leon, Graham Clarke, Victor Callaghan, and Francisco Sepulveda. A user-
independent real-time emotion recognition system for software agents in domestic
environments. Engineering Applications of Artificial Intelligence, 20(3):337 – 345,
2007.
[61] David Looney, Ling Li, Tomasz M. Rutkowski, Danilo P. Mandic, and Andrzej
Cichocki. Ocular artifacts removal from EEG using EMD. In Rubin Wang, Enhua
Shen, and Fanji Gu, editors, Advances in Cognitive Neurodynamics ICCN 2007,
pages 831–835. Springer Netherlands, 2008.
[62] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet rep-
resentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
11(7):674–693, 1989.
[63] Jon D. Morris. Observations: SAM: The Self-Assessment Manikin, an efficient cross-
cultural measurement of emotional response. Journal of Advertising Research,
35(December):1–6, 1995.
[64] Javier Cruz Mota, Luong Hong Viet, Alice Caplier, and Michele Rombaut. Emotion
detection in the loop from brain signals and facial images, 2006.
[65] R.D. Munk. Technical report, Technical University of Denmark, Department of
Informatics and Mathematical Modeling, Cognitive Systems.
[66] M. Murugappan. Human emotion classification using wavelet transform and KNN. In
Pattern Analysis and Intelligent Robotics (ICPAIR), 2011 International Conference
on, volume 1, pages 148 –153, june 2011.
[67] M. Murugappan, M. Rizon, R. Nagarajan, and S. Yaacob. EEG feature extraction
for classifying emotions using FCM and FKM. In Proceedings of the 7th WSEAS
International Conference on Applied Computer and Applied Computational Science,
pages 299–304, Stevens Point, Wisconsin, USA, 2008. World Scientific and Engi-
neering Academy and Society (WSEAS).
[68] E. Niedermeyer and F. Lopes da Silva. Electroencephalography: Basic Principles,
Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, 5th edition,
November 2004.
[69] Charles E. Osgood. The nature and measurement of meaning. Psychological Bulletin,
49(3):197 – 237, 1952.
[70] C. Parameswariah and M. Cox. Frequency characteristics of wavelets. Power Engi-
neering Review, IEEE, 22(1):72, jan. 2002.
[71] P.C. Petrantonakis and L.J. Hadjileontiadis. Adaptive extraction of emotion-related
EEG segments using multidimensional directed information in time-frequency domain.
In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International
Conference of the IEEE, pages 1–4, August 31 to September 4, 2010.
[72] P.C. Petrantonakis and L.J. Hadjileontiadis. Emotion recognition from brain sig-
nals using hybrid adaptive filtering and higher order crossings analysis. Affective
Computing, IEEE Transactions on, 1(2):81–97, July-December 2010.
[73] A. Petrosian. Kolmogorov complexity of finite sequences and recognition of different
preictal EEG patterns. In Computer-Based Medical Systems, 1995., Proceedings of
the Eighth IEEE Symposium on, pages 212 –217, jun 1995.
[74] Rosalind W. Picard. Future affective technology for autism and emotion communication.
Philosophical Transactions of the Royal Society B, 364(1535):3575–3584, 2009.
[75] G.E. Polychronaki, P. Ktonas, S. Gatzonis, P.A. Asvestas, E. Spanou, A. Siatouni,
H. Tsekou, D. Sakas, and K.S. Nikita. Comparison of fractal dimension estimation
algorithms for epileptic seizure onset detection. In BioInformatics and BioEngi-
neering, 2008. BIBE 2008. 8th IEEE International Conference on, pages 1 –6, oct.
2008.
[76] Jonathan Posner, James A Russell, and Bradley S Peterson. The circumplex model
of affect: an integrative approach to affective neuroscience, cognitive development,
and psychopathology. Development and Psychopathology, 17(3):715–734, 2005.
[77] Dale Purves, George J. Augustine, David Fitzpatrick, William C. Hall, Anthony-
Samuel LaMantia, and Leonard E. White. Neuroscience. Sinauer Associates, Inc.,
fifth edition, November 2011.
[78] N. Rehman and D. P. Mandic. Multivariate empirical mode decomposition. Pro-
ceedings of the Royal Society A: Mathematical, Physical and Engineering Science,
466(2117):1291–1302, 2010.
[79] J A Russell and L F Barrett. Core affect, prototypical emotional episodes, and other
things called emotion: dissecting the elephant. Journal of Personality and Social
Psychology, 76(5):805–819, 1999.
[80] James A. Russell. Core affect and the psychological construction of emotion. Psy-
chological Review, (1):145–172, 2003.
[81] James A Russell. Emotion, core affect, and psychological construction. Cognition
& Emotion, 23(7):1259–1283, 2009.
[82] T. M. Rutkowski, D. P. Mandic, A. Cichocki, and A. W. Przybyszewski. EMD approach to
multichannel EEG data: the amplitude and phase components clustering analysis. Journal of
Circuits, Systems, and Computers, 19(1), 2010.
[83] Dean Sabatinelli, Peter J. Lang, Andreas Keil, and Margaret M. Bradley. Emotional
perception: Correlation of functional MRI and event-related potentials. Cerebral
Cortex, 17(5):1085–1091, 2007.
[84] George E. Sakr, Imad H. Elhajj, and Huda Abou-Saad Huijer. Support vector
machines to define and detect agitation transition. IEEE Trans. Affect. Comput.,
1:98–108, July 2010.
[85] Arman Savran, Koray Ciftci, Guillaume Chanel, Javier C. Mota, Luong H. Viet,
Bülent Sankur, Lale Akarun, Alice Caplier, and Michele Rombaut. Emotion Detection in
the Loop from Brain Signals and Facial Images. In Proceedings of the eNTERFACE
2006 Workshop, Dubrovnik, Croatia, July 2006.
[86] M Schroder, M Bogdan, T Hinterberger, and N Birbaumer. Automated EEG feature
selection for brain computer interfaces, pages 626–629. IEEE, 2003.
[87] Chad L. Stephens, Israel C. Christie, and Bruce H. Friedman. Autonomic specificity
of basic emotions: Evidence from pattern classification and cluster analysis. Biological
Psychology, 84(3):463–473, 2010.
[88] N. Stevenson, M. Mesbah, and B. Boashash. A sampling limit for the empirical
mode decomposition. In Signal Processing and Its Applications, 2005. Proceedings
of the Eighth International Symposium on, volume 2, pages 647 – 650, 28-31, 2005.
[89] R. Sukanesh and R. Harikumar. A comparison of genetic algorithm & neural network
(MLP) in patient specific classification of epilepsy risk levels from EEG signals.
Engineering Letters, 14(1):96–104, 2007.
[90] Kazuhiko Takahashi. Remarks on emotion recognition from biopotential signals. In
2nd Int. Conf. on Autonomous Robots and Agents, 2004, pages 186–191.
[91] Vernon L. Towle, José Bolaños, Diane Suarez, Kim Tan, Robert Grzeszczuk, David N.
Levin, Raif Cakmur, Samuel A. Frank, and Jean-Paul Spire. The spatial location
of EEG electrodes: locating the best-fitting sphere relative to cortical anatomy.
Electroencephalography and Clinical Neurophysiology, 86(1):1 – 6, 1993.
[92] N. ur Rehman and D.P. Mandic. Filter bank property of multivariate empirical
mode decomposition. Signal Processing, IEEE Transactions on, 59(5):2421 –2426,
may 2011.
[93] J. Ville. Theory and Applications of the Notion of Complex Signal. Rand, 1958.
[94] J. Wagner, J. Kim, and E. André. From physiological signals to emotions: Implementing
and comparing selected methods for feature extraction and classification. In
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, pages
940–943. IEEE, July 2005.
[95] C. M. Whissell. The dictionary of affect in language, volume 4, pages 113–131.
Academic Press, 1989.
[96] Zhaohua Wu and Norden E. Huang. A study of the characteristics of white
noise using the empirical mode decomposition method. Proceedings of the Royal
Society of London. Series A: Mathematical, Physical and Engineering Sciences,
460(2046):1597–1611, June 2004.
[97] Yongmian Zhang and Qiang Ji. Active and dynamic information fusion for facial
expression understanding from image sequences. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 27(5):699–714, 2005.
[98] T Zikov, S Bibian, G A Dumont, M Huzmezan, and C R Ries. A wavelet based de-
noising technique for ocular artifact correction of the electroencephalogram, volume 1,
pages 98–105. IEEE, 2002.