Emotional pattern recognition using machine learning
A report submitted to Dublin City University, School of Computing for module CA652: Information
Access, 2011/2012. I/We hereby certify that the work presented and the material contained
herein is my/our own except where explicitly stated references to other material are made.
Emotional pattern recognition using machine learning
Information access assignment
Module: CA652
Lecturers: Dr Alan SMEATON
Dr Cathal GURRIN
11211509
MECT1
Contents
Abstract
1. System objectives
1.1 Issues and related functionalities
1.2 System overview
1.3 Constraints and limitations
2. Functional description of the system
2.1 Standard definition
2.2 The OCC Model
2.3 Review of existing ERS
2.4 The central neural network
2.5 Toward a cloud-based application
3. Evaluation plan
3.1 Training phase
3.2 Assessment phase
3.3 Expected results
4. How could this system form the basis of a successful business?
4.1 Plenty of potential applications
4.2 Competitors Analysis
4.3 Sustainable advantages
4.4 Constraints
4.5 Conclusion
References
Abstract
We have entered an era of pervasive computing. Computers and the Internet have become
ubiquitous in our everyday lives. However, most human-computer interaction (HCI)
interfaces are still based on the traditional model of passively responding only to the
user's commands [1]. Recently, the automated analysis of human affective behaviour has
attracted increasing attention from researchers and companies. Indeed, equipping a
machine with the ability to respond to the user's emotional state has proved
successful in many fields, including tutoring systems [2], call centres [3], intelligent
automobile systems [4] and the game industry [5].
In this paper, we propose an emotional pattern recognition system based on the inputs of
several types of sensors. We outline the characteristics of such a system before detailing the
issues and constraints that should be considered when designing it. Next we describe a
possible architecture and conclude on its potential for commercialisation.
1. System objectives
1.1 Issues and related functionalities
The value proposition of our system is to turn various sensor data into an understanding of an
individual's emotions. Therefore, a first attempt at summarizing its functional model is shown in Figure 1.
When developing an emotion recognition system (ERS), the first task is to define which emotional
states it will recognize. There are three different ways of looking at affect [6]:
- Discrete categories: we can refer to different classifications which have already been made
by philosophers (Spinoza, Prinz), scientists (Ekman [7]) or both (Descartes [8]). The most widely used
is the set of six basic emotions (Happiness, Anger, Fear, Sadness, Disgust and Surprise), with the other
emotions considered varieties of these.
- Dimensional description: an affective state is described in terms of a small number of latent
dimensions such as evaluation, activation, control, power, etc.
- Appraisal-based approach: an emotion is described through a set of stimulus evaluation
checks including novelty, intrinsic pleasantness, goal-based significance, coping potential,
and compatibility with standards.
The appraisal-based approach defines a set of variables which maximizes the distinction between two
different emotional states, making it more practical and more suitable for the Artificial Intelligence
(AI) field than the basic classification model. Given its successful implementation in various projects
(e.g. [9]), we choose to be compliant with the OCC model [10], which we detail further in §2.2.
Even with that model, affects are often difficult to distinguish. The more distinct data about the
subject's internal state the system obtains, the more likely it is to recognize the relevant
feeling. Therefore, gathering all types of sensor inputs (audio-visual, physiological, etc.) in the ERS is
critical. However, most off-the-shelf sensors do not necessarily share a common representation
(metric) of a given indicator, and the same emotion recognition technique cannot be applied to all
sensor types. Thus we need to specify a data formatting standard.
Finally, the functionalities of our system are summarized by the following functional models:
[Figure 1. Broad functional model: various sensors' data → System → emotion recognized]
[Figure 2. Functional model: standardized sensor data → System → OCC model emotional state recognized]
1.2 System overview
Because our emotional state changes very quickly, the sensors will produce large volumes of data. From this mass, our
system has to extract high-level information in a time-efficient way in order to conclude on a
particular emotion. Rather than proposing a computational model based on controlling the way
emotions are triggered, we want our system to learn when each one is generated. Thus, machine
learning systems such as Support Vector Machines (SVM) or neural networks seem to be well-suited
choices for this application. They all accept formatted data as inputs and "fire" a decision as output.
Rather than computing the data directly, they learn from training experiences what their output should be.
More about machine learning techniques is given in Section 2.4.
Moreover, as we have previously seen, the system needs to adapt to many different sensor types,
but the same ERS cannot be applied with maximal efficiency to all of them. Therefore
we propose a multi-layered architecture with a specialised ERS for each type of sensor and
a central neural network which takes the final decision. Thus we end up with the final architecture
and challenges shown in Figure 3.
In the Challenges part of Figure 3, we summarize the different issues that we address more
specifically in the functional description in Section 2.
Figure 3. System architecture and Challenges
1.3 Constraints and limitations
We identify three types of limitations for our system:
Related to social and governmental aspects.
Approval by regulatory entities and market adoption may be affected by issues regarding data
privacy and tracking. Indeed, data about people's emotions is highly sensitive.
Related to the machine learning algorithms.
The training phase requires access to a lot of data and to high computational power, thus leading to
constraints on a powerful and efficient (and therefore expensive) infrastructure and on access to large
corpora or databases of emotional samples. Moreover, the results of this type of system are not
100% certain but are subject to fluctuations. This could also cause regulatory and adoption
limitations.
Related to emotion recognition.
Feelings are difficult to distinguish, and people may not react to a particular emotion with the
same internal stimuli. For example, Ekman [7] discovered that facial expression, speech and body
gesture indicators depend both on the affective state and on the environment (cultural,
demographic) where the affective behaviour occurs. Therefore, the accuracy of such a system can
only be improved to a limited extent.
2. Functional description of the system
In the following we further describe the technical issues presented in Section 1.2.
2.1 Standard definition
In order to guarantee sensor data interoperability and provide an easy language to manipulate
it, we choose XML for the standard data format definition. For each type of sensor (e.g.
Temperature, web …), a common indicator and metric will be defined (e.g. internal temperature, °C).
In order to integrate with our solution, one will have to transform their data to fit this specification
and present it in an XML document, typically of the form shown in Figure 4.
The system will then acquire this data, check whether it matches the specification and, if successful,
forward it to serve as input to the related machine learning systems.
Figure 4. XML Standard definition
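As a sketch of how such a standardised XML reading might look and be checked before being forwarded to the specialised ERS, consider the following. The element and attribute names here are illustrative assumptions, not the report's actual specification.

```python
import xml.etree.ElementTree as ET

# A hypothetical sensor reading following the assumed specification:
# one <reading> element with a sensor type, an indicator, a value and a timestamp.
SAMPLE = """
<reading sensor="temperature" metric="celsius">
  <indicator>internal_temperature</indicator>
  <value>37.2</value>
  <timestamp>2012-03-22T10:15:00</timestamp>
</reading>
"""

REQUIRED_CHILDREN = {"indicator", "value", "timestamp"}

def validate(xml_text: str) -> dict:
    """Check a reading against the (assumed) specification and return
    the parsed fields, or raise ValueError if it does not match."""
    root = ET.fromstring(xml_text)
    if root.tag != "reading" or "sensor" not in root.attrib:
        raise ValueError("not a sensor reading")
    children = {child.tag: child.text for child in root}
    missing = REQUIRED_CHILDREN - children.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {"sensor": root.attrib["sensor"], **children}

print(validate(SAMPLE)["sensor"])  # temperature
```

A reading that fails validation would be rejected rather than forwarded to a machine learning system.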
2.2 The OCC Model
The appraisal theory of Orthony and al. [10], also known as the OCC model, is the concept our system
use to represent the user’s emotional state. It defines three categories of emotional situation:
- an event having consequences on one’s goal;
- an agent’s action that matches or deviates from the standards;
- an object (idea or concept) that suits or not one’s tastes.
An event can then be categorized according to whether it:
- affects the fortunes of others;
- influences one's plans and prospects;
- acts on one's well-being.
This gives six different classes of emotion, which can also be differentiated into eleven distinct pairs
of opposite emotions (see below).
In order to translate these emotions into comprehensible variables that will then feed the ERS, we
base our proposition on the work of Trabelsi et al. [1], which defines a set of 14 indicators:
- Sense of reality (real/unreal)
- Strength of Cognitive Unit
- Unexpectedness
- Desirability
- Praiseworthiness
- Appealingness
- Desire-for-other
- Liking
- Effort
- Realization
- Expect-Dev
- Familiarity
These indicators are described by values ranging from -1 (extremely low) to 1 (extremely high)
according to their respective intensities, giving a vector of 14 values describing an emotional state.
These OCC model vectors are the outputs of the specialised ERS and serve as inputs to the neural
network, which will, based on them, retrieve the user's emotion.
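As an illustration, such an emotional-state vector can be sketched as an ordered list of clipped intensities. We use the twelve indicator names listed above (Trabelsi et al. [1] define fourteen; the snake_case names are our own):

```python
INDICATORS = [
    "sense_of_reality", "strength_of_cognitive_unit", "unexpectedness",
    "desirability", "praiseworthiness", "appealingness",
    "desire_for_other", "liking", "effort", "realization",
    "expect_dev", "familiarity",
]

def occ_vector(intensities: dict) -> list:
    """Build an ordered OCC-style vector, defaulting absent indicators
    to 0.0 and clipping every intensity to the [-1, 1] range."""
    return [max(-1.0, min(1.0, float(intensities.get(name, 0.0))))
            for name in INDICATORS]

# Out-of-range intensities are clipped; unspecified indicators default to 0.
v = occ_vector({"desirability": 0.8, "unexpectedness": 1.7})
print(v[2], v[3])  # 1.0 0.8
```

Each specialised ERS would emit one such vector per recognition, and the central neural network would consume it as a fixed-length input.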
2.3 Review of existing ERS
Developing an efficient ERS for a given type of sensor is a difficult and expensive task. Therefore, we do not
propose new specialised ERS in this study but rather focus on the integration of existing ones. For an
ERS to be compliant with our system, it must accept standardised data as input and output an OCC
model vector (or at least involve this theory at some point). As this model is relatively widespread,
multiple implementations have already been successfully developed, tested and applied. We review
some of them in the following.
Thanks to the data standardisation, other ERS can integrate with the proposed system with slight
changes: they just need to change their output to OCC model vectors. Therefore specialised
Support Vector Machines and Dynamic Bayesian Networks could also be suitable targets.
Figure 5. Neural network model
Figure 6. Neural network architecture
2.4 The central neural network
Machine learning is the branch of artificial intelligence that studies and develops architectures and
algorithms to equip an agent (a machine, usually a computer) with a certain behaviour and an
ability to build internal models from empirical training data in order to solve a certain task [11].
Among these techniques, we single out the neural network, which we will use for the final ERS.
A neural network is a multi-categorical classifier. It is composed of an interconnected, multi-layered
set of entities called "neurons", where each neuron can be "activated", outputting its "activity", which
is a level of confidence in the recognition of a pattern. Each neuron is connected to the neurons of
the next layer by weighted links.
The whole concept relies on the "firing" function φ(). When the sum of all inputs multiplied by their
assigned weights exceeds a certain threshold, the neuron is activated and outputs a value y_j, as
explained in Figure 5. Thus the decision-making algorithm is the combination of multiple neurons'
decisions. Initially, scientists configure the network hierarchy and the "firing" rule of each neuron.
The training algorithms then "teach" the network by changing the weights assigned to the links.
Training phase algorithms
For each data sample, some neurons will "fire" (i.e. signal to the next level that they recognize the
pattern) and some will not. During training, data samples are marked as belonging to one category
or another. Their features are extracted and serve as inputs to the neural network. The objective of
the training algorithms is then to minimize the quadratic error of the output by reducing the weights of
the neurons that decided wrongly and strengthening the others, depending on the level of confidence
they output.
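The firing rule and weight-update idea described above can be sketched with a single neuron. This is a perceptron-style toy under our own assumptions (step firing rule, error-driven update), not the report's actual training algorithm:

```python
import random

def phi(x, threshold=0.0):
    """Step firing rule: activate when the weighted sum exceeds the threshold."""
    return 1.0 if x > threshold else 0.0

def train(samples, n_inputs, epochs=50, rate=0.1):
    """samples: list of (inputs, target) pairs with target in {0, 1}.
    Weights of connections that contributed to a wrong decision are
    reduced; those that should have fired more strongly are increased."""
    random.seed(0)  # deterministic initial weights for the sketch
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            y = phi(sum(wi * xi for wi, xi in zip(w, inputs)) + bias)
            error = target - y  # 0 when correct, ±1 when wrong
            w = [wi + rate * error * xi for wi, xi in zip(w, inputs)]
            bias += rate * error
    return w, bias

# Learn logical OR as a toy stand-in for an "emotion present / absent" decision.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train(data, n_inputs=2)
print([phi(w[0] * x0 + w[1] * x1 + b) for (x0, x1), _ in data])  # [0.0, 1.0, 1.0, 1.0]
```

A real network would stack many such neurons in layers and use a gradient-based update (e.g. backpropagation) instead of this single-step rule.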
Application to our system
The central neural network will accept the OCC model emotional vectors as inputs and will be trained
with data samples extracted from the specialised ERS outputs. Its last layer will be composed of a
single neuron which takes the final decision regarding the recognised emotional state.
[Figure 5 shows the neuron model: inputs y_1 … y_n from the previous level, weights w_1 … w_n, weighted sum x_j = Σ_{i=1..n} w_i · y_i, and output for the next level y_j = φ(x_j).]
Figure 7. Final cloud-based architecture
2.5 Toward a cloud-based application
In Section 1.3, we reported the constraints on the powerful and scalable infrastructure needed to
deploy our ERS. Moreover, the system needs a way to improve its market reach. In order to
solve both problems, we propose in the following to host our system on a cloud computing
infrastructure.
The idea is to build a service-oriented architecture (SOA) where each service is a specialised ERS for one
sensor type. Each service will then "fire" its decision through the network to the central neural
network, which will retrieve the emotion and send the result to the client. This SOA approach will
allow us to:
- solve the problem of interoperability between different sensor types;
- build a scalable infrastructure which provisions high computational power on demand;
- train and redeploy the ERS systems in real time;
- combine different services in order to offer highly customized services to clients;
- offer an easy way for customers to interact with our system through the Internet;
- open the system to external innovation through an API, allowing clients to build their own
applications and their own ERS compliant with our system.
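The SOA idea can be sketched as follows: one stand-in "service" per sensor type, each returning an OCC-style vector, and a simple per-component average standing in for the central neural network's decision stage. The service names, vectors and averaging rule are fabricated for illustration:

```python
def facial_ers(data):
    """Stand-in for a specialised ERS service for facial-expression data."""
    return [0.2, 0.9, -0.1]

def speech_ers(data):
    """Stand-in for a specialised ERS service for speech data."""
    return [0.4, 0.7, 0.1]

# Registry mapping a sensor type to its specialised ERS service.
SERVICES = {"facial": facial_ers, "speech": speech_ers}

def recognise(readings):
    """Dispatch each standardised reading to its specialised ERS and
    combine the resulting OCC vectors for the central decision stage."""
    vectors = [SERVICES[sensor](data) for sensor, data in readings]
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

combined = recognise([("facial", b"..."), ("speech", b"...")])
print([round(v, 2) for v in combined])  # [0.3, 0.8, 0.0]
```

In the actual architecture, each function would be a network service and the aggregation would be performed by the trained central neural network rather than an average.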
Finally, we end up with the system architecture shown in Figure 7.
3. Evaluation plan
3.1 Training phase
The performance and accuracy of a machine learning system are highly dependent on the databases
used to train it. Having enough authentic labelled data of human affective expressions is challenging
because such expressions are short-lived, rare and context-sensitive [6], but also expensive to collect, as they
involve humans in the research process. Several attempts have been made to cope with these problems, such
as hiring professional actors or using cinema footage, but training the system with such data affects the
results' accuracy when it comes to retrieving real-world emotions.
Therefore, we review here some of the databases that should be considered for the different ERS types.
After the ERS have been trained, they can generate accurate OCC model vectors which will, in turn,
be used to train the central neural network. Thanks to the cloud hosting, the different parts of the
system can be trained and redeployed seamlessly, improving their accuracy over time without
disturbing the users.
3.2 Assessment phase
In order to reliably assess a customer-facing application, we cannot avoid including humans in the
evaluation process. Because the demographic and cultural background of the audience can influence
the results (see Ekman [7]), the subjects will first have to fill in a questionnaire. Then the sensors
already integrated in the system will be installed on each subject, and a set of images and videos
likely to elicit some target emotions will be displayed. Finally, for a set of 22 emotions, we will
compare to what extent (as a percentage) each of them is correctly recognized by the system.
We will also pay attention to the percentage of false positives for each category. The results will be
matched with the demographic data collected previously, and we will draw a conclusion on the accuracy of
our ERS.
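The per-emotion comparison described above might be computed as follows. The trial data is fabricated purely for illustration (joy/distress is one of the OCC opposite pairs):

```python
from collections import defaultdict

def evaluate(trials):
    """trials: list of (elicited_emotion, recognised_emotion) pairs.
    Returns {emotion: (recognition_rate_pct, false_positive_count)}:
    the percentage of trials where the elicited emotion was correctly
    recognised, and how often the emotion was reported when a different
    one was elicited."""
    correct = defaultdict(int)
    elicited_total = defaultdict(int)
    false_pos = defaultdict(int)
    for truth, predicted in trials:
        elicited_total[truth] += 1
        if predicted == truth:
            correct[truth] += 1
        else:
            false_pos[predicted] += 1
    return {e: (100.0 * correct[e] / elicited_total[e], false_pos[e])
            for e in elicited_total}

trials = [("joy", "joy"), ("joy", "distress"),
          ("distress", "distress"), ("distress", "distress")]
print(evaluate(trials)["joy"])       # (50.0, 0)
print(evaluate(trials)["distress"])  # (100.0, 1)
```

The same computation, run per demographic group from the questionnaire, would expose the cultural effects Ekman describes.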
3.3 Expected results
As seen in §2.3, the average results for ERS techniques range between 60 and 70%. Merging the data
acquired from several sensor types (and notably the possible integration of physiological sensors) may
improve this ratio. Moreover, by integrating more and more sensor types and enriching the training
set of the central neural network, this project should keep improving over time. Therefore, we expect
to reach a ratio between 70 and 75% before launching a first customer-facing application.
4. How could this system form the basis of a successful business?
4.1 Plenty of potential applications
Emotional feedback is one of the hottest topics in physiological computing. Indeed, enabling a
machine to know its user's internal state could allow it to respond more accurately to the user's
requests. Recently, mood-based user interfaces have proved successful on the web. Websites such as
stereomood.com, which provides music according to the visitor's mood, attract large audiences,
but they ask users to click on their current mood in order to acquire this information. Thus an
automatic mood-based UI, which is one of the most straightforward applications we could imagine,
would benefit greatly from our system. In the same vein, the user's emotional state could be reported
as a status update on social networks.
Knowing about their emotional state could also benefit the person themselves and allow the building of a
personal coach for emotion containment. It could also interest external parties such as marketers, who
want to know the emotional response to their latest campaign, or security guards, who want to track
dangerous individuals and check that they do not commit crimes. Finally, to a certain extent,
company-customized apps can be developed for frustration or stress monitoring.
And these are just a few applications; most of them have yet to be discovered. Through an API
definition, customers could make their own ideas real, enabling open innovation. The
business model would then be that of a web service provider, where customers pay according to their
service use and the complexity of their requests.
4.2 Competitors Analysis
The most serious actor in the field is currently Visual Recognition, a Dutch spin-off company of the
ISLA Laboratory at the Universiteit van Amsterdam, which recently released emotion recognition
software called eMotion. Based on facial expressions, it aims at retrieving your emotion
automatically. It has been successfully incorporated in a website, GladOrSad.com, which recognizes
these two antagonistic feelings from an uploaded photo [12]. It has also been deployed for a
marketing study at Unilever to determine what food makes people happier [13]. Although this
company is established in the field and has two successful products to its credit, its ERS is not web-
based, so it is not as interoperable and easily customizable as the proposed system.
4.3 Sustainable advantages
As we intend to use technologies currently at the research stage, it is unlikely that our competitors are
planning to integrate them into their products. This gives us the opportunity to secure our position
through partnerships with scientists in the field, thus developing technical barriers to entry.
Furthermore, by providing a scalable, highly customizable and open infrastructure, we aim at
establishing a strong customer base and retaining it through optional personalized features, such
as a dedicated neural network.
4.4 Constraints
However, the regulatory constraints identified in Section 1.3 remain a strong barrier for a business-to-
consumer (B2C) start-up. Ethical issues such as the tracking of a person's emotions and privacy
will have to be considered before launching any application.
Moreover, machine learning algorithms being uncertain by nature, a system output which
could possibly be wrong can raise legal issues. For example, when tracking criminals, if the system
fires "anger" while the subject is completely calm, it can cause unexpected trouble.
Finally, this system being the combination of several research works across laboratories, intellectual
property and lobbying issues might arise on its path to commercialisation.
4.5 Conclusion
The proposed system can be the basis of a successful business, because it enables several applications
which could achieve great success. The architecture being hosted on the cloud, there is no
need for an expensive investment in servers and, the computational resources being provisioned on
demand, the cloud hosting cost will match the web service usage and thus the revenues. There is
currently just one competitor in the field, and it is not oriented towards a web integration of its
solution. The most important risk is related to the ethical and regulatory issues that could arise from the
use of the web services by illegal applications. Nevertheless, we strongly believe that the proposed
system should be studied further in the coming years.
References
[1] A. Trabelsi and C. Frasson, “The Emotional Machine, a Machine Learning Approach to Online
Prediction of User’s Emotion and Intensity”, 10th IEEE International Conference on Advanced
Learning Technologies, 2010, pp. 613-617.
[2] M. Ochs and C. Frasson, “Emotionally Intelligent Tutoring Systems”, Proc. International Florida
Artificial Intelligence Research Society Conference (FLAIRS 04), May 2004, pp. 251-256.
[3] C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE Tran. Speech
and Audio Processing, vol. 13, Mar. 2005, pp. 293-303, doi:10.1109/TSA.2004.838534.
[4] Q. Ji, P. Lan and C. Looney, “A probabilistic framework for modelling and real-time monitoring
human fatigue,” IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 36, Sep.
2006, pp. 862-875, doi:10.1109/TSMCA.2005.855922.
[5] S. Slater, R. Moreton, K. Buckley and A. Bridges, “A Review of Agent Emotion Architectures,”
Eludamos Journal for Computer Game Culture, vol.2, 2008, pp. 203-214.
[6] Z. Zeng, M. Pantic, G. I. Roisman and T. S. Huang, “A Survey of Affect Recognition Methods: Audio,
Visual, and Spontaneous Expressions,” IEEE Tran. Pattern Analysis and Machine Intelligence, vol. 31,
Jan. 2009, pp. 39-58, doi:10.1109/TPAMI.2008.52
[7] Emotion in the Human Face, P. Ekman, ed., second ed. Cambridge Univ. Press, 1982.
[8] R. Descartes, The Passions. Paris: Librairie Philosophique J. Vrin, 1983, 353 p.
[9] J. Bates, A. B. Loyall and W. S. Reilly, “An Architecture for Action, Emotion, and Social Behaviour,”
European Workshop on Modelling and Autonomous Agents in a Multi-Agent World (MAAMAW 92),
Jul. 1992, pp. 55-68.
[10] A. Ortony, G. L. Clore and A. Collins, The cognitive structure of emotions. New York: Cambridge
University Press, 1988.
[11] J. Martel, “Convolutional Neural Networks - A Short Introduction to Deep Learning,”
unpublished, 2012.
[12] Available at http://www.gladorsad.com/, Last accessed 22nd March 2012.
[13] Available at http://www.wired.com/science/discoveries/news/2007/07/expression_research,
Last accessed 22nd March 2012.
[14] C. Conati and H. Maclaren, “Empirically building and evaluating a probabilistic model of user
affect,” User Modelling and User-Adapted Interaction, vol. 19, Aug. 2009, pp. 267-303.
[15] M. Shaikh, H. Prendinger and M. Ishizuka, “A cognitively based approach to affect sensing from
text,” Proc. 11th International Conference on Intelligent User Interfaces (IUI ’06), 2006, pp. 303-305.
[16] A. Salway and M. Graham, “Extracting information about emotions in films,” Proc. ACM
Multimedia ’03, Berkeley, CA, USA, 2003.
[17] S. Ioannou et al., “Emotion recognition through facial expression analysis based on a neurofuzzy
method,” J. of Neural Networks, vol. 18, pp. 423–435, 2005.
[18] T. Kanade, J. Cohn, and Y. Tian, “Comprehensive Database for Facial Expression Analysis,” Proc.
IEEE Int’l Conf. Face and Gesture Recognition (AFGR ’00), pp. 46-53, 2000.
[19] L. Yin, X. Wei, Y. Sun, J. Wang, and M.J. Rosato, “A 3D Facial Expression Database for Facial
Behavior Research,” Proc. IEEE Int’l Conf. Automatic Face and Gesture Recognition (AFGR ’06), pp.
211-216, 2006.
[20] H. Gunes and M. Piccardi, “A Bimodal Face and Body Gesture Database for Automatic Analysis of
Human Nonverbal Affective Behavior,” Proc. 18th Int’l Conf. Pattern Recognition (ICPR ’06), vol. 1,
pp. 1148-1153, 2006.
[21] R. Banse and K.R. Scherer, “Acoustic Profiles in Vocal Emotion Expression,” J. Personality Social
Psychology, vol. 70, no. 3, pp. 614-636, 1996.
[22] Available at: http://cpk.auc.dk/~tb/speech/Emotions/, Last accessed 4th April 2012.