Emotional pattern recognition using machine learning
A report submitted to Dublin City University, School of Computing for module CA652: Information
Access, 2011/2012. I/We hereby certify that the work presented and the material contained
herein is my/our own except where explicitly stated references to other material are made.
Emotional pattern recognition using machine learning
Information access assignment
Module: CA652
Lecturers: Dr Alan SMEATON
Dr Cathal GURRIN
11211509
MECT1
Contents
Abstract
1. System objectives
1.1 Issues and related functionalities
1.2 System overview
1.3 Constraints and limitations
2. Functional description of the system
2.1 Standard definition
2.2 The OCC Model
2.3 Review of existing ERS
2.4 The central neural network
2.5 Toward a cloud-based application
3. Evaluation plan
3.1 Training phase
3.2 Assessment phase
3.3 Expected results
4. How could this system form the basis of a successful business?
4.1 Plenty of potential applications
4.2 Competitors Analysis
4.3 Sustainable advantages
4.4 Constraints
4.5 Conclusion
References
Abstract
We have entered an era of pervasive computing. Computers and the Internet have become
ubiquitous in our everyday lives. However, most human-computer interaction (HCI)
interfaces are still based on the traditional model of passively responding only to the
user's commands [1]. Recently, the automated analysis of human affective behaviour has
attracted increasing attention from researchers and companies. Indeed, equipping a
machine with the ability to respond to the user's emotional state has proved
successful in many fields, including tutoring systems [2], call centres [3], intelligent
automobile systems [4] and the game industry [5].
In this paper, we propose an emotional pattern recognition system based on the inputs of
several types of sensors. We outline the characteristics of such a system before detailing the
issues and constraints that should be considered when designing it. Next we describe a
possible architecture and conclude on its potential for commercialisation.
1. System objectives
1.1 Issues and related functionalities
The value proposition of our system is to turn various sensor data into an understanding of an
individual's emotions. Therefore, a first attempt at summarizing its functional model is shown in Figure 1.
When developing an emotion recognition system (ERS), the first task is to define which emotional
states it will recognize. There are three different ways of looking at affect [6]:
- Discrete categories: we can refer to different classifications which have already been made
by philosophers (Spinoza, Prinz), scientists (Ekman [7]) or both (Descartes [8]). The most widely used
is the set of six basic emotions (Happiness, Anger, Fear, Sadness, Disgust and Surprise), with the other
emotions considered varieties of these.
- Dimensional description: an affective state is described in terms of a small number of latent
dimensions such as evaluation, activation, control, power, etc.
- Appraisal-based approach: an emotion is described through a set of stimulus evaluation
checks including novelty, intrinsic pleasantness, goal-based significance, coping potential,
and compatibility with standards.
The appraisal-based approach defines a set of variables which maximizes the distinction between two
different emotional states, making it more practical and more suitable for the Artificial Intelligence
(AI) field than the basic classification model. Given its successful implementation in various projects
(e.g. [9]), we choose to be compliant with the OCC model [10], which we detail further in §2.2.
Even with that model, affects are often difficult to distinguish. The more distinct data about the
subject's internal state the system obtains, the more likely it is to recognize the relevant
feeling. Therefore, gathering all types of sensor inputs (audio-visual, physiological, etc.) in the ERS is
critical. However, most off-the-shelf sensors do not necessarily share a common representation
(metric) of a given indicator, and the same emotion recognition technique cannot be applied to all
sensor types. Thus we need to specify a data formatting standard.
Finally, the functionalities of our system are summarized by the following functional models:
[Figure 1. Broad functional model: various sensors' data → System → emotion recognized]
[Figure 2. Functional model: standardized sensor data → System → OCC model emotional state recognized]
1.2 System overview
Because our emotional state changes very quickly, the sensors will produce large volumes of data. From this mass, our
system has to extract high-level information in a time-efficient way in order to conclude on a
particular emotion. Rather than proposing a computational model based on controlling the way
emotions are triggered, we want our system to learn when each one is generated. Thus, machine
learning systems such as Support Vector Machines (SVM) or neural networks seem to be well-suited
choices for this application. They all accept formatted data as inputs and "fire" a decision as output.
Rather than computing the data directly, they learn from training experiences what their output should be.
More about machine learning techniques is given in Section 2.4.
Moreover, as we have previously seen, the system needs to adapt to many different sensor types,
but the same ERS cannot be applied with maximal efficiency to all of them. Therefore
we propose a multi-layered architecture with a specialised ERS for each type of sensor and
a central neural network which takes the final decision. Thus we end up with the final architecture
and challenges shown in Figure 3.
In the Challenges part of Figure 3, we summarize the different issues that we address more
specifically in the functional description in Section 2.
Figure 3. System architecture and Challenges
1.3 Constraints and limitations
We identify three types of limitations for our system:
Related to social and governmental aspects.
Approval by regulatory entities and market adoption may be affected by issues regarding data
privacy and tracking. Indeed, data about people's emotions is highly sensitive.
Related to the machine learning algorithms.
The training phase requires access to a lot of data and to high computational power, thus leading to
constraints on a powerful and efficient (and therefore expensive) infrastructure and on access to large
corpora or databases of emotional samples. Moreover, the results of this type of system are not
100% certain but are subject to fluctuations. This could also cause regulatory and adoption
limitations.
Related to emotion recognition.
Feelings are difficult to distinguish, and people may not react to a particular emotion with the
same internal stimuli. For example, Ekman [7] discovered that facial expression, speech and body
gesture indicators depend both on the affective state and on the environment (cultural,
demographic) where the affective behaviour occurs. Therefore, the accuracy of such a system can
only be improved to a limited extent.
2. Functional description of the system
In the following we further describe the technical issues presented in Section 1.2.
2.1 Standard definition
In order to guarantee sensor data interoperability and provide an easy language to manipulate
it, we choose XML for the standard data format definition. For each type of sensor (e.g.
Temperature, web …), a common indicator and metric will be defined (e.g. internal temperature, °C).
In order to integrate with our solution, one will have to transform their data to fit this specification
and present it in an XML document, typically of the form shown in Figure 4.
The system will then acquire this data, check whether it matches the specification and, if successful,
forward it to serve as input to the related machine learning systems.
Figure 4. XML Standard definition
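As a sketch of how such a standardised XML reading might look and be checked before being forwarded to the specialised ERS, consider the following. The element and attribute names here are illustrative assumptions, not the report's actual specification.

```python
import xml.etree.ElementTree as ET

# A hypothetical sensor reading following the assumed specification:
# one <reading> element with a sensor type, an indicator, a value and a timestamp.
SAMPLE = """
<reading sensor="temperature" metric="celsius">
  <indicator>internal_temperature</indicator>
  <value>37.2</value>
  <timestamp>2012-03-22T10:15:00</timestamp>
</reading>
"""

REQUIRED_CHILDREN = {"indicator", "value", "timestamp"}

def validate(xml_text: str) -> dict:
    """Check a reading against the (assumed) specification and return
    the parsed fields, or raise ValueError if it does not match."""
    root = ET.fromstring(xml_text)
    if root.tag != "reading" or "sensor" not in root.attrib:
        raise ValueError("not a sensor reading")
    children = {child.tag: child.text for child in root}
    missing = REQUIRED_CHILDREN - children.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {"sensor": root.attrib["sensor"], **children}

print(validate(SAMPLE)["sensor"])  # temperature
```

A reading that fails validation would be rejected rather than forwarded to a machine learning system.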
2.2 The OCC Model
The appraisal theory of Orthony and al. [10], also known as the OCC model, is the concept our system
use to represent the user’s emotional state. It defines three categories of emotional situation:
- an event having consequences on one’s goal;
- an agent’s action that matches or deviates from the standards;
- an object (idea or concept) that suits or not one’s tastes.
An event can then be categorized according to whether it:
- affects the fortunes of others;
- influences one's plans and prospects;
- acts on one's well-being.
This gives six different classes of emotion, which can also be differentiated into eleven distinct pairs
of opposite emotions (see below).
In order to translate these emotions into comprehensible variables that will then feed the ERS, we
base our proposition on the work of Trabelsi et al. [1], which defines a set of 14 indicators:
- Sense of reality (real/unreal)
- Strength of Cognitive Unit
- Unexpectedness
- Desirability
- Praiseworthiness
- Appealingness
- Desire-for-other
- Liking
- Effort
- Realization
- Expect-Dev
- Familiarity
These indicators are described by values ranging from -1 (extremely low) to 1 (extremely high)
according to their respective intensities, giving a vector of 14 values describing an emotional state.
These OCC model vectors are the outputs of the specialised ERS and serve as inputs to the neural
network, which will, based on them, retrieve the user's emotion.
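As an illustration, such an emotional-state vector can be sketched as an ordered list of clipped intensities. We use the twelve indicator names listed above (Trabelsi et al. [1] define fourteen; the snake_case names are our own):

```python
INDICATORS = [
    "sense_of_reality", "strength_of_cognitive_unit", "unexpectedness",
    "desirability", "praiseworthiness", "appealingness",
    "desire_for_other", "liking", "effort", "realization",
    "expect_dev", "familiarity",
]

def occ_vector(intensities: dict) -> list:
    """Build an ordered OCC-style vector, defaulting absent indicators
    to 0.0 and clipping every intensity to the [-1, 1] range."""
    return [max(-1.0, min(1.0, float(intensities.get(name, 0.0))))
            for name in INDICATORS]

# Out-of-range intensities are clipped; unspecified indicators default to 0.
v = occ_vector({"desirability": 0.8, "unexpectedness": 1.7})
print(v[2], v[3])  # 1.0 0.8
```

Each specialised ERS would emit one such vector per recognition, and the central neural network would consume it as a fixed-length input.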
2.3 Review of existing ERS
Developing an efficient ERS for a given type of sensor is a difficult and expensive task. Therefore, we do not
propose new specialised ERS in this study but rather focus on the integration of existing ones. For an
ERS to be compliant with our system, it must accept standardised data as input and output an OCC
model vector (or at least involve this theory at some point). As this model is relatively widespread,
multiple implementations have already been successfully developed, tested and applied. We review
some of them in the following.
Thanks to the data standardisation, other ERS can integrate with the proposed system with slight
changes: they just need to change their output to OCC model vectors. Therefore specialised
Support Vector Machines and Dynamic Bayesian Networks could also be suitable targets.
Figure 5. Neural network model
Figure 6. Neural network architecture
2.4 The central neural network
Machine learning is the branch of artificial intelligence that studies and develops architectures and
algorithms to equip an agent (a machine, usually a computer) with a certain behaviour and an
ability to build internal models from empirical training data in order to solve a certain task [11].
Among these techniques, we single out the neural network, which we will use for the final ERS.
A neural network is a multi-categorical classifier. It is composed of an interconnected, multi-layered
set of entities called "neurons", where each neuron can be "activated", outputting its "activity", which
is a level of confidence in the recognition of a pattern. Each neuron is connected to the neurons of
the next layer by weighted links.
The whole concept relies on the "firing" function φ(). When the sum of all inputs multiplied by their
assigned weights exceeds a certain threshold, the neuron is activated and outputs a value y_j, as
explained in Figure 5. Thus the decision-making algorithm is the combination of multiple neurons'
decisions. Initially, scientists configure the network hierarchy and the "firing" rule of each neuron.
The training algorithms then "teach" the network by changing the weights assigned to the links.
Training phase algorithms
For each data sample, some neurons will "fire" (i.e. signal to the next level that they recognize the
pattern) and some will not. During training, data samples are marked as belonging to one category
or another. Their features are extracted and serve as inputs to the neural network. The objective of
the training algorithms is then to minimize the quadratic error of the output by reducing the weights of
the neurons that decided wrongly and strengthening the others, depending on the level of confidence
they output.
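The firing rule and weight-update idea described above can be sketched with a single neuron. This is a perceptron-style toy under our own assumptions (step firing rule, error-driven update), not the report's actual training algorithm:

```python
import random

def phi(x, threshold=0.0):
    """Step firing rule: activate when the weighted sum exceeds the threshold."""
    return 1.0 if x > threshold else 0.0

def train(samples, n_inputs, epochs=50, rate=0.1):
    """samples: list of (inputs, target) pairs with target in {0, 1}.
    Weights of connections that contributed to a wrong decision are
    reduced; those that should have fired more strongly are increased."""
    random.seed(0)  # deterministic initial weights for the sketch
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            y = phi(sum(wi * xi for wi, xi in zip(w, inputs)) + bias)
            error = target - y  # 0 when correct, ±1 when wrong
            w = [wi + rate * error * xi for wi, xi in zip(w, inputs)]
            bias += rate * error
    return w, bias

# Learn logical OR as a toy stand-in for an "emotion present / absent" decision.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train(data, n_inputs=2)
print([phi(w[0] * x0 + w[1] * x1 + b) for (x0, x1), _ in data])  # [0.0, 1.0, 1.0, 1.0]
```

A real network would stack many such neurons in layers and use a gradient-based update (e.g. backpropagation) instead of this single-step rule.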
Application to our system
The central neural network will accept the OCC model emotional vectors as inputs and will be trained
with data samples extracted from the specialised ERS outputs. Its last layer will be composed of a
single neuron which takes the final decision regarding the recognised emotional state.
[Figure 5 shows the neuron model: inputs y_1 … y_n from the previous level, weights w_1 … w_n, weighted sum x_j = Σ_{i=1..n} w_i · y_i, and output for the next level y_j = φ(x_j).]
Figure 7. Final cloud-based architecture
2.5 Toward a cloud-based application
In Section 1.3, we reported the constraints on the powerful and scalable infrastructure needed to
deploy our ERS. Moreover, the system needs a way to improve its market reach. In order to
solve both problems, we propose in the following to host our system on a cloud computing
infrastructure.
The idea is to build a service-oriented architecture (SOA) where each service is a specialised ERS for one
sensor type. Each service will then "fire" its decision through the network to the central neural
network, which will retrieve the emotion and send the result to the client. This SOA approach will
allow us to:
- solve the problem of interoperability between different sensor types;
- build a scalable infrastructure which provisions high computational power on demand;
- train and redeploy the ERS systems in real time;
- combine different services in order to offer highly customized services to clients;
- offer an easy way for customers to interact with our system through the Internet;
- open the system to external innovation through an API, allowing clients to build their own
applications and their own ERS compliant with our system.
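The SOA idea can be sketched as follows: one stand-in "service" per sensor type, each returning an OCC-style vector, and a simple per-component average standing in for the central neural network's decision stage. The service names, vectors and averaging rule are fabricated for illustration:

```python
def facial_ers(data):
    """Stand-in for a specialised ERS service for facial-expression data."""
    return [0.2, 0.9, -0.1]

def speech_ers(data):
    """Stand-in for a specialised ERS service for speech data."""
    return [0.4, 0.7, 0.1]

# Registry mapping a sensor type to its specialised ERS service.
SERVICES = {"facial": facial_ers, "speech": speech_ers}

def recognise(readings):
    """Dispatch each standardised reading to its specialised ERS and
    combine the resulting OCC vectors for the central decision stage."""
    vectors = [SERVICES[sensor](data) for sensor, data in readings]
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

combined = recognise([("facial", b"..."), ("speech", b"...")])
print([round(v, 2) for v in combined])  # [0.3, 0.8, 0.0]
```

In the actual architecture, each function would be a network service and the aggregation would be performed by the trained central neural network rather than an average.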
Finally, we end up with the system architecture shown in Figure 7.
3. Evaluation plan
3.1 Training phase
The performance and accuracy of a machine learning system are highly dependent on the databases
used to train it. Having enough authentic labelled data of human affective expressions is challenging
because such expressions are short-lived, rare and context-sensitive [6], but also expensive to collect, as they
involve humans in the research process. Several attempts have been made to cope with these problems, such
as hiring professional actors or using cinema footage, but training the system with such data affects the
results' accuracy when it comes to retrieving real-world emotions.
Therefore, we review here some of the databases that should be considered for the different ERS types.
After the ERS have been trained, they can generate accurate OCC model vectors which will, in turn,
be used to train the central neural network. Thanks to the cloud hosting, the different parts of the
system can be trained and redeployed seamlessly, improving their accuracy over time without
disturbing the users.
3.2 Assessment phase
In order to reliably assess a customer-facing application, we cannot avoid including humans in the
evaluation process. Because the demographic and cultural background of the audience can influence
the results (see Ekman [7]), the subjects will first have to fill in a questionnaire. Then the sensors
already integrated in the system will be installed on each subject, and a set of images and videos
likely to elicit some target emotions will be displayed. Finally, for a set of 22 emotions, we will
compare to what extent (as a percentage) each of them is correctly recognized by the system.
We will also pay attention to the percentage of false positives for each category. The results will be
matched with the demographic data collected previously, and we will draw a conclusion on the accuracy of
our ERS.
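The per-emotion comparison described above might be computed as follows. The trial data is fabricated purely for illustration (joy/distress is one of the OCC opposite pairs):

```python
from collections import defaultdict

def evaluate(trials):
    """trials: list of (elicited_emotion, recognised_emotion) pairs.
    Returns {emotion: (recognition_rate_pct, false_positive_count)}:
    the percentage of trials where the elicited emotion was correctly
    recognised, and how often the emotion was reported when a different
    one was elicited."""
    correct = defaultdict(int)
    elicited_total = defaultdict(int)
    false_pos = defaultdict(int)
    for truth, predicted in trials:
        elicited_total[truth] += 1
        if predicted == truth:
            correct[truth] += 1
        else:
            false_pos[predicted] += 1
    return {e: (100.0 * correct[e] / elicited_total[e], false_pos[e])
            for e in elicited_total}

trials = [("joy", "joy"), ("joy", "distress"),
          ("distress", "distress"), ("distress", "distress")]
print(evaluate(trials)["joy"])       # (50.0, 0)
print(evaluate(trials)["distress"])  # (100.0, 1)
```

The same computation, run per demographic group from the questionnaire, would expose the cultural effects Ekman describes.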
3.3 Expected results
As seen in §2.3, the average results for ERS techniques range between 60 and 70%. Merging the data
acquired from several sensor types (and notably the possible integration of physiological sensors) may
improve this ratio. Moreover, by integrating more and more sensor types and enriching the training
set of the central neural network, this project should keep improving over time. Therefore, we expect
to reach a ratio between 70 and 75% before launching a first customer-facing application.
4. How could this system form the basis of a successful business?
4.1 Plenty of potential applications
Emotional feedback is one of the hottest topics in physiological computing. Indeed, enabling a
machine to know its user's internal state could allow it to respond more accurately to the user's
requests. Recently, mood-based user interfaces have proved successful on the web. Websites such as
stereomood.com, which provides music according to the visitor's mood, attract large audiences,
but they ask users to click on their current mood in order to acquire this information. Thus an
automatic mood-based UI, which is one of the most straightforward applications we could imagine,
would benefit greatly from our system. In the same vein, the user's emotional state could be reported
as a status update on social networks.
Knowing about their emotional state could also benefit the person themselves and allow the building of a
personal coach for emotion containment. It could also interest external parties such as marketers, who
want to know the emotional response to their latest campaign, or security guards, who want to track
dangerous individuals and check that they do not commit crimes. Finally, to a certain extent,
company-customized apps can be developed for frustration or stress monitoring.
And these are just a few applications; most of them have yet to be discovered. Through an API
definition, customers could make their own ideas real, enabling open innovation. The
business model would then be that of a web service provider, where customers pay according to their
service use and the complexity of their requests.
4.2 Competitors Analysis
The most serious actor in the field is currently Visual Recognition, a Dutch spin-off company of the
ISLA Laboratory at the Universiteit van Amsterdam, which recently released emotion recognition
software called eMotion. Based on facial expressions, it aims at retrieving your emotion
automatically. It has been successfully incorporated in a website, GladOrSad.com, which recognizes
these two antagonistic feelings from an uploaded photo [12]. It has also been deployed for a
marketing study at Unilever to determine what food makes people happier [13]. Although this
company is established in the field and has two successful products to its credit, its ERS is not web-
based, so it is not as interoperable and easily customizable as the proposed system.
4.3 Sustainable advantages
As we intend to use technologies currently at the research stage, it is unlikely that our competitors are
planning to integrate them into their products. This gives us the opportunity to secure our position
through partnerships with scientists in the field, thus developing technical barriers to entry.
Furthermore, by providing a scalable, highly customizable and open infrastructure, we aim at
establishing a strong customer base and retaining it through optional personalized features, such
as a dedicated neural network.
4.4 Constraints
However, the regulatory constraints identified in Section 1.3 remain a strong barrier for a business-to-
consumer (B2C) start-up. Ethical issues such as the tracking of a person's emotions and privacy
will have to be considered before launching any application.
Moreover, machine learning algorithms being uncertain by nature, a system output which
could possibly be wrong can raise legal issues. For example, when tracking criminals, if the system
fires "anger" while the subject is completely calm, it can cause unexpected trouble.
Finally, this system being the combination of several research works across laboratories, intellectual
property and lobbying issues might arise on its path to commercialisation.
4.5 Conclusion
The proposed system can be the basis of a successful business, because it enables several applications
which could achieve great success. The architecture being hosted on the cloud, there is no
need for an expensive investment in servers and, the computational resources being provisioned on
demand, the cloud hosting cost will match the web service usage and thus the revenues. There is
currently just one competitor in the field, and it is not oriented towards a web integration of its
solution. The most important risk is related to the ethical and regulatory issues that could arise from the
use of the web services by illegal applications. Nevertheless, we strongly believe that the proposed
system should be studied further in the coming years.
References
[1] A. Trabelsi and C. Frasson, “The Emotional Machine, a Machine Learning Approach to Online
Prediction of User’s Emotion and Intensity”, 10th IEEE International Conference on Advanced
Learning Technologies, 2010, pp. 613-617.
[2] M. Ochs and C. Frasson, “Emotionally Intelligent Tutoring Systems”, Proc. International Florida
Artificial Intelligence Research Society Conference (FLAIRS 04), May 2004, pp. 251-256.
[3] C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE Tran. Speech
and Audio Processing, vol. 13, Mar. 2005, pp. 293-303, doi:10.1109/TSA.2004.838534.
[4] Q. Ji, P. Lan and C. Looney, “A probabilistic framework for modelling and real-time monitoring
human fatigue,” IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 36, Sep.
2006, pp. 862-875, doi:10.1109/TSMCA.2005.855922.
[5] S. Slater, R. Moreton, K. Buckley and A. Bridges, “A Review of Agent Emotion Architectures,”
Eludamos Journal for Computer Game Culture, vol.2, 2008, pp. 203-214.
[6] Z. Zeng, M. Pantic, G. I. Roisman and T. S. Huang, “A Survey of Affect Recognition Methods: Audio,
Visual, and Spontaneous Expressions,” IEEE Tran. Pattern Analysis and Machine Intelligence, vol. 31,
Jan. 2009, pp. 39-58, doi:10.1109/TPAMI.2008.52
[7] Emotion in the Human Face, P. Ekman, ed., second ed. Cambridge Univ. Press, 1982.
[8] R. Descartes, The Passions. Paris: Librairie Philosophique J. Vrin, 1983, 353 p.
[9] J. Bates, A. B. Loyall and W. S. Reilly, “An Architecture for Action, Emotion, and Social Behaviour,”
European Workshop on Modelling and Autonomous Agents in a Multi-Agent World (MAAMAW 92),
Jul. 1992, pp. 55-68.
[10] A. Ortony, G. L. Clore and A. Collins, The cognitive structure of emotions. New York: Cambridge
University Press, 1988.
[11] J. Martel, “Convolutional Neural Networks - A Short Introduction to Deep Learning,”
unpublished, 2012.
[12] Available at http://www.gladorsad.com/, Last accessed 22nd March 2012.
[13] Available at http://www.wired.com/science/discoveries/news/2007/07/expression_research,
Last accessed 22nd March 2012.
[14] C. Conati and H. Maclaren, “Empirically building and evaluating a probabilistic model of user
affect,” User Modelling and User-Adapted Interaction, vol. 19, Aug. 2009, pp. 267-303.
[15] M. Shaikh, H. Prendinger and M. Ishizuka, “A cognitively based approach to affect sensing from
text,” Proc. 11th International Conference on Intelligent User Interfaces (IUI ’06), 2006, pp. 303-305.
[16] A. Salway and M. Graham, “Extracting information about emotions in films,” Proc. ACM
Multimedia ’03, Berkeley, CA, USA, 2003.
[17] S. Ioannou et al., “Emotion recognition through facial expression analysis based on a neurofuzzy
method,” J. of Neural Networks, vol. 18, pp. 423–435, 2005.
[18] T. Kanade, J. Cohn, and Y. Tian, “Comprehensive Database for Facial Expression Analysis,” Proc.
IEEE Int’l Conf. Face and Gesture Recognition (AFGR ’00), pp. 46-53, 2000.
[19] L. Yin, X. Wei, Y. Sun, J. Wang, and M.J. Rosato, “A 3D Facial Expression Database for Facial
Behavior Research,” Proc. IEEE Int’l Conf. Automatic Face and Gesture Recognition (AFGR ’06), pp.
211-216, 2006.
[20] H. Gunes and M. Piccardi, “A Bimodal Face and Body Gesture Database for Automatic Analysis of
Human Nonverbal Affective Behavior,” Proc. 18th Int’l Conf. Pattern Recognition (ICPR ’06), vol. 1,
pp. 1148-1153, 2006.
[21] R. Banse and K.R. Scherer, “Acoustic Profiles in Vocal Emotion Expression,” J. Personality Social
Psychology, vol. 70, no. 3, pp. 614-636, 1996.
[22] Available at: http://cpk.auc.dk/~tb/speech/Emotions/, Last accessed 4th April 2012.